QCircuitBench

⭐ OpenReward Environment

Description

QCircuitBench is an environment for evaluating an agent's ability to design and implement quantum algorithms. Agents are given quantum computing tasks (e.g., Bernstein-Vazirani, Grover's search, Shor's algorithm) and must produce correct quantum circuit implementations in OpenQASM 3.0 with Qiskit-based post-processing code.
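For illustration only (not an actual benchmark task), a minimal OpenQASM 3.0 program in the style agents submit, preparing a two-qubit Bell state:

```qasm
OPENQASM 3.0;
include "stdgates.inc";

qubit[2] q;
bit[2] c;

h q[0];          // superposition on the first qubit
cx q[0], q[1];   // entangle: |00> + |11>
c = measure q;
```

Real tasks additionally require Qiskit-based post-processing code to extract the algorithm's answer from measurement results.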

This OpenReward implementation is ported from the original Harbor Framework implementation by Estel Yang.

Capabilities

  • Designing quantum circuits for standard algorithms
  • Implementing circuits in OpenQASM 3.0
  • Writing post-processing code with Qiskit and AerSimulator
  • Debugging and testing quantum programs

Compute Requirements

Agents are given a sandboxed environment with bash access, file-editing tools, and a Qiskit runtime. The sandbox provides 1 CPU and 2 GB of RAM.

License

CC BY 4.0.

Tasks

There is one split in this environment:

  • Test: 28 quantum circuit tasks

Tasks cover algorithms including Bernstein-Vazirani, Deutsch-Jozsa, Grover's search, Shor's factoring, quantum Fourier transform, Simon's algorithm, and others, at varying qubit sizes.

Reward Structure

This is a multi-turn, sandbox-based environment. The agent writes a solution.py file containing OpenQASM 3.0 code, then calls submit_answer to trigger verification. The verifier parses the agent's quantum circuit and computes state fidelity against a ground truth circuit using Qiskit's state_fidelity function.

  • 1.0: The agent's circuit produces a quantum state identical to the expected state (fidelity 1.0).
  • 0.0-1.0: Partial credit equal to the state fidelity between the agent's output state and the ground-truth state.
  • 0.0: Invalid QASM syntax, a missing solution, or a circuit whose output has zero fidelity with the expected state.

Data

Each task directory contains an instruction.md with the problem specification and a tests/ directory with verification scripts. Task data is stored on the OpenReward platform.

Tools

| Tool | Description |
| --- | --- |
| bash | Execute shell commands in the sandbox. |
| str_replace | Replace a unique string in a file. |
| view | View file contents or list directory contents. |
| create_file | Create a new file with specified content. |
| submit_answer | Submit work for automated verification. Triggers test execution and returns reward. |

Time Horizon

QCircuitBench is a multi-turn environment. Agents read task instructions, write quantum circuits and post-processing code, test their solutions, and submit for verification.

Environment Difficulty

QCircuitBench is a challenging benchmark. The original paper evaluates LLMs on quantum algorithm design and finds that semantic correctness (producing functionally correct circuits) remains difficult even for frontier models:

| Model | QASM Syntax (5-shot) | Semantic Correctness (5-shot) |
| --- | --- | --- |
| GPT-4o | 0.578 | 0.201 |
| Llama3-8B | 0.460 | 0.032 |
| Qwen 2.5 | 0.431 | 0.100 |
| DeepSeek-R1 | 0.177 | 0.010 |
| Human baseline | 0.686 | 0.137 |

LLMs exhibit consistent error patterns in quantum algorithm design, and fine-tuning does not always outperform few-shot learning.

Other Environment Requirements

There are no further environment requirements; QCircuitBench works out of the box with the OpenReward endpoint without any external API keys.

Safety

Agents in QCircuitBench write and execute quantum computing code in a sandboxed environment. The environment does not present direct safety risks.

Citations

@inproceedings{yang2024qcircuitbench,
  author    = {Yang, Rui and Wang, Ziruo and Gu, Yuntian and Chen, Tianyi and Liang, Yitao and Li, Tongyang},
  title     = {QCircuitBench: A Large-Scale Dataset for Benchmarking Quantum Algorithm Design},
  booktitle = {NeurIPS 2025 Datasets and Benchmarks Track},
  year      = {2025},
  url       = {https://arxiv.org/abs/2410.07961}
}