Implementation of arXiv/superchem

SUPERChem

Description

SUPERChem is an environment for evaluating multimodal chemistry reasoning with 500 expert-curated problems. Questions feature molecular structure images and cover four core domains: Structure and Properties, Reaction and Synthesis, Principles and Calculations, and Experimental Design and Analysis. Answers are multiple choice (A-H).

Capabilities

Multimodal chemistry reasoning with molecular structure images
Multiple-choice evaluation across 4 chemistry domains
Expert-curated questions from non-public examinations

Compute Requirements

Agents are given a standard environment with no sandbox or file system access.

License

MIT.

Tasks

There is one split in this environment:

test: 500 tasks

Questions span four chemistry domains: Structure and Properties, Reaction and Synthesis, Principles and Calculations, and Experimental Design and Analysis.

Reward Structure

Evaluation is single-turn with deterministic grading. The agent submits a single-letter answer (A-H) via the submit_answer tool, and the submission is compared against the ground truth by exact match. The reward is 1.0 if the answer is correct and 0.0 otherwise.
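The grading rule described above can be sketched in a few lines of Python. This is an illustrative sketch, not the environment's actual implementation: the function name `grade_answer` and the constant `VALID_CHOICES` are hypothetical, and the sketch assumes no case or whitespace normalization is applied before the exact-match comparison.

```python
# Hypothetical sketch of exact-match grading for a single-letter answer.
# Names and the no-normalization behavior are assumptions, not the real API.
VALID_CHOICES = frozenset("ABCDEFGH")


def grade_answer(submitted: str, ground_truth: str) -> float:
    """Return 1.0 if the submitted letter exactly matches the ground truth, else 0.0."""
    # Anything other than one of the letters A-H cannot match a valid answer.
    if submitted not in VALID_CHOICES:
        return 0.0
    return 1.0 if submitted == ground_truth else 0.0


if __name__ == "__main__":
    print(grade_answer("C", "C"))
    print(grade_answer("B", "C"))
```

Because the comparison is a plain string equality, the same submission always yields the same reward, which is what makes the grading deterministic.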

Data

The dataset is SUPERChem-500.parquet (40.5 MB, 500 problems), sourced from the Hugging Face dataset ZehuaZhao/SUPERChem and stored on the OpenReward platform.

Tools

| Tool | Description |
| --- | --- |
| `submit_answer` | Submit a single letter answer (A-H). Deterministic evaluation via exact match. Ends the episode. |

Time Horizon

Single-turn. The agent reads the multimodal chemistry question (text and molecular images) and submits one answer.

Environment Difficulty

SUPERChem evaluates multimodal chemistry reasoning at an expert level:

| Model | Accuracy |
| --- | --- |
| GPT-5 (High) | 38.5% |
| Human (2nd-year chemistry majors) | 40.3% |

Frontier models struggle most on higher-order reasoning tasks, particularly predicting product structures, elucidating reaction mechanisms, and analyzing structure-activity relationships.

Other Environment Requirements

There are no further environment requirements; SUPERChem works out of the box with the OpenReward endpoint without any external API keys.

Safety

Agents in SUPERChem solve chemistry reasoning problems in a standard environment. The environment does not present direct safety risks.

Citation

@article{zhao2025superchem,
  title={SUPERChem: A Multimodal Reasoning Benchmark in Chemistry},
  author={Zhao, Zehua and Huang, Zhixian and Li, Junren and Lin, Siyu and Zhou, Junting and Cao, Fengqi and Zhou, Kun and Ge, Rui and Long, Tingting and Zhu, Yuexiang and Liu, Yan and Zheng, Jie and Wei, Junnian and Zhu, Rong and Zou, Peng and Li, Wenyu and Cheng, Zekai and Ding, Tian and Wang, Yaxuan and Yan, Yizhao and Wei, Tingru and Ming, Haowei and Mao, Weijie and Sun, Chen and Liu, Yiming and Wang, Zichen and Zhang, Zuo and Yang, Tong and Ma, Hao and Gao, Zhen and Pei, Jian},
  journal={arXiv preprint arXiv:2512.01274},
  year={2025}
}

Repository

Source repository: EnvCommons/SuperCHEM

Compute Configuration

Resource allocation for this environment.

| Component | Configuration |
| --- | --- |
| Environment Server | 1 vCPU / 4 GB RAM |
| Sandbox Machine | Not configured |

Estimated Cost

Pay per second of active session usage. Billing starts when your session begins and stops when it ends.

| Component | Cost / second |
| --- | --- |
| Environment | $0.0000320 |
| Sandbox | Not configured |
| Total | $0.0000320 |

Examples

5-minute session: $0.0096

1-hour session: $0.1152
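The example figures follow directly from the per-second rate. A minimal sketch of that arithmetic is shown here, assuming billing is simply the rate multiplied by the session length in seconds; the names `ENVIRONMENT_COST_PER_SECOND` and `session_cost` are illustrative and not part of any platform API.

```python
from decimal import Decimal

# Per-second rate taken from the cost table; the sandbox is not configured,
# so the environment rate is the total rate.
ENVIRONMENT_COST_PER_SECOND = Decimal("0.0000320")


def session_cost(seconds: int) -> Decimal:
    """Estimated cost in USD for a session of the given length (hypothetical helper)."""
    return ENVIRONMENT_COST_PER_SECOND * seconds


if __name__ == "__main__":
    print(f"5-minute session: ${session_cost(5 * 60):.4f}")
    print(f"1-hour session: ${session_cost(60 * 60):.4f}")
```

`Decimal` is used instead of `float` so that small per-second amounts multiply without binary rounding error.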
