MMCircuitEval

Description

MMCircuitEval is the first multimodal benchmark for evaluating LLMs on Electronic Design Automation (EDA) tasks. It contains 3,614 question-answer pairs across 2,871 circuit problems, covering both digital and analog circuits. Questions span four stages (general knowledge, specifications, front-end design, and back-end design) and are accompanied by circuit diagram images.

Capabilities

  • Multimodal circuit understanding with schematic images
  • Evaluation across 4 EDA workflow stages
  • Multiple question types: single-choice, multi-choice, fill-in-blank, open-ended
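
Because single-choice and multi-choice answers can be written many ways ("b)", "A, C", "CA"), a grader typically normalizes them before comparison. A minimal sketch, assuming nothing about the environment's actual code (the helper names below are illustrative):

```python
# Illustrative answer normalization for choice questions; helper names are
# assumptions, not the environment's actual implementation.

def normalize_choice(answer: str) -> str:
    """Reduce a single-choice answer like ' b) ' to the bare letter 'B'."""
    return answer.strip().strip("().").upper()[:1]

def normalize_multi_choice(answer: str) -> str:
    """Sort selected option letters so 'CA' and 'A, C' compare equal."""
    return "".join(sorted(c.upper() for c in answer if c.isalpha()))
```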

Compute Requirements

Agents are given a standard environment with no sandbox or file system access.

Tasks

This environment provides four splits:

  • general: 905 tasks (general circuit knowledge)
  • spec: 877 tasks (circuit specifications)
  • frontend: 922 tasks (front-end design)
  • backend: 910 tasks (back-end design)

Total: 3,614 question-answer pairs across 2,871 circuit problems.
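
As a quick consistency check, the per-split counts listed above sum to the stated total (counts copied from this README):

```python
# Per-split task counts as listed in this README.
SPLIT_SIZES = {"general": 905, "spec": 877, "frontend": 922, "backend": 910}

total = sum(SPLIT_SIZES.values())
assert total == 3614  # matches the stated number of question-answer pairs
```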

Reward Structure

Single-turn evaluation with LLM-graded rewards. The agent submits an answer via the answer tool. Answers are graded by gpt-5-mini using question-type-specific prompts for single-choice, multi-choice, fill-in-blank, and open-ended questions. Reward is 1.0 if correct, 0.0 if incorrect.
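
The question-type-specific grading described above can be sketched as a prompt dispatch table. This is a minimal sketch under stated assumptions: the prompt wording, function names, and judge interface are illustrative, not the environment's actual prompts, and the judge is passed in as a plain callable so any model wrapper can be used.

```python
# Illustrative sketch of question-type-specific LLM grading; prompt wording
# and the judge interface are assumptions, not the environment's actual code.

GRADING_PROMPTS = {
    "single-choice": "Does the submitted option match the reference option?",
    "multi-choice": "Does the submitted option set exactly match the reference set?",
    "fill-in-blank": "Is the submitted value equivalent to the reference value?",
    "open-ended": "Is the submission factually consistent with the reference answer?",
}

def grade(question_type: str, submission: str, reference: str, judge) -> float:
    """Return 1.0 if the judge deems the answer correct, else 0.0.

    `judge` is any callable taking a prompt string and returning the judge
    model's reply, e.g. a wrapper around a chat-completion API.
    """
    prompt = (
        f"{GRADING_PROMPTS[question_type]}\n"
        f"Submission: {submission}\n"
        f"Reference: {reference}\n"
        "Reply with yes or no."
    )
    verdict = judge(prompt)
    return 1.0 if verdict.strip().lower().startswith("yes") else 0.0
```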

Data

Four parquet files (~192 MB total) sourced from the Hugging Face dataset charlie314159/MMCircuitEval. Stored on the OpenReward platform.

Tools

  • answer: Submit an answer. LLM-graded with question-type-specific evaluation. Ends the episode.

Time Horizon

Single-turn. The agent reads the multimodal circuit question (text and schematic images) and submits one answer.

Environment Difficulty

MMCircuitEval evaluates circuit understanding across digital and analog domains. Extensive evaluations reveal significant performance gaps among existing LLMs, particularly in back-end design and complex computations. Current models are generally underdeveloped in circuit and EDA fields compared to general-purpose tasks.

Other Environment Requirements

OpenAI API key required for LLM-based grading. Pass via secrets={"openai_api_key": "..."} when creating a session.
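
A minimal sketch of wiring the key in from the environment; the session-creation call shown in the comment is hypothetical, so consult the OpenReward client documentation for the real API:

```python
import os

# Read the key from the conventional OPENAI_API_KEY environment variable.
secrets = {"openai_api_key": os.environ.get("OPENAI_API_KEY", "")}

# Hypothetical call shape; the actual OpenReward client API may differ:
# session = client.create_session("GeneralReasoning/MMCircuitEval", secrets=secrets)
```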

Safety

Agents in MMCircuitEval solve circuit analysis problems in a standard environment. The environment does not present direct safety risks.

Citation

@inproceedings{zhao2025mmcircuiteval,
  title={MMCircuitEval: A Comprehensive Multimodal Circuit-Focused Benchmark for Evaluating LLMs},
  author={Zhao, Chenchen and Shi, Zhengyuan and Wen, Xiangyu and Liu, Chengjie and Liu, Yi and Zhou, Yunhao and Zhao, Yuxiang and Feng, Hefei and Zhu, Yinan and Wan, Gwok-Waa and Cheng, Xin and Chen, Weiyu and Fu, Yongqi and Chen, Chujie and Xue, Chenhao and Sun, Guangyu and Wang, Ying and Lin, Yibo and Yang, Jun and Xu, Ning and Wang, Xi and Xu, Qiang},
  booktitle={ICCAD},
  year={2025}
}