SUPERChem
Description
SUPERChem is an environment for evaluating multimodal chemistry reasoning with 500 expert-curated problems. Questions feature molecular structure images and cover four core domains: Structure and Properties, Reaction and Synthesis, Principles and Calculations, and Experimental Design and Analysis. Answers are multiple choice (A-H).
Capabilities
- Multimodal chemistry reasoning with molecular structure images
- Multiple-choice evaluation across 4 chemistry domains
- Expert-curated questions from non-public examinations
Compute Requirements
Agents are given a standard environment with no sandbox or file system access.
License
MIT.
Tasks
There is one split in this environment:
- test: 500 tasks
Questions span four chemistry domains: Structure and Properties, Reaction and Synthesis, Principles and Calculations, and Experimental Design and Analysis.
Reward Structure
Single-turn evaluation with deterministic grading. The agent submits a single letter answer (A-H) via the submit_answer tool. The submitted answer is compared via exact match against the ground truth. Reward is 1.0 if correct, 0.0 if incorrect.
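The grading rule above can be sketched in a few lines. This is an illustrative sketch only, not the environment's actual implementation: the function name `grade` and the constant `VALID_CHOICES` are hypothetical, and the sketch assumes a strict, case-sensitive comparison because the description specifies exact match.

```python
# Hypothetical sketch of exact-match grading; names are illustrative.
VALID_CHOICES = frozenset("ABCDEFGH")  # the eight answer letters, A through H


def grade(submitted: str, ground_truth: str) -> float:
    """Return 1.0 if the submitted letter equals the ground truth, else 0.0."""
    if ground_truth not in VALID_CHOICES:
        raise ValueError(f"ground truth must be one of A-H, got {ground_truth!r}")
    # Strict comparison: no trimming or case folding is applied (an assumption).
    return 1.0 if submitted == ground_truth else 0.0
```

Under this assumption, a lowercase `"a"` would not match a ground truth of `"A"`. Whether the real grader normalizes input is not stated here.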
Data
SUPERChem-500.parquet (40.5 MB, 500 problems), sourced from the Hugging Face repository ZehuaZhao/SUPERChem and stored on the OpenReward platform.
Tools
| Tool | Description |
|---|---|
| submit_answer | Submit a single letter answer (A-H). Deterministic evaluation via exact match. Ends the episode. |
Time Horizon
Single-turn. The agent reads the multimodal chemistry question (text and molecular images) and submits one answer.
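The single-turn flow can be pictured as an episode that accepts exactly one submission. The sketch below is a hedged illustration, not the OpenReward API: the `Episode` class and its fields are hypothetical, and only the behavior described here (one answer, a 1.0 or 0.0 reward, episode ends) is modeled.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """Hypothetical model of one SUPERChem task; not the platform's real class."""

    ground_truth: str      # correct letter, A-H
    done: bool = False     # becomes True after the single submission
    reward: float = 0.0

    def submit_answer(self, answer: str) -> float:
        """Score the answer by exact match and end the episode."""
        if self.done:
            # Single-turn: a second submission is rejected in this sketch.
            raise RuntimeError("episode already ended")
        self.reward = 1.0 if answer == self.ground_truth else 0.0
        self.done = True
        return self.reward
```

For example, `Episode(ground_truth="C").submit_answer("C")` returns `1.0`, after which the episode is marked done. Raising on a second call is a design choice of this sketch; the real environment's behavior after the episode ends is not specified here.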
Environment Difficulty
SUPERChem evaluates multimodal chemistry reasoning at expert level:
| Model | Accuracy |
|---|---|
| GPT-5 (High) | 38.5% |
| Human (2nd-year chemistry majors) | 40.3% |
Frontier models struggle most on higher-order reasoning tasks, particularly predicting product structures, elucidating reaction mechanisms, and analyzing structure-activity relationships.
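Because each task yields a reward of 1.0 or 0.0, accuracy is the mean reward, and a per-domain breakdown helps locate where a model struggles. The following is a minimal sketch under assumptions: the helper names and the `(domain, reward)` record shape are hypothetical, and it does not reproduce how the figures in the table above were computed.

```python
from collections import defaultdict


def overall_accuracy(results):
    """Mean reward over a list of (domain, reward) pairs; 0.0 if empty."""
    rewards = [reward for _, reward in results]
    return sum(rewards) / len(rewards) if rewards else 0.0


def accuracy_by_domain(results):
    """Map each domain name to its mean reward."""
    totals = defaultdict(lambda: [0.0, 0])  # domain -> [reward sum, task count]
    for domain, reward in results:
        totals[domain][0] += reward
        totals[domain][1] += 1
    return {domain: total / count for domain, (total, count) in totals.items()}
```

A usage example with made-up rewards (not benchmark results):

```python
results = [
    ("Reaction and Synthesis", 1.0),
    ("Reaction and Synthesis", 0.0),
    ("Structure and Properties", 1.0),
]
by_domain = accuracy_by_domain(results)
```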
Other Environment Requirements
There are no further environment requirements; SUPERChem works out of the box with the OpenReward endpoint without any external API keys.
Safety
Agents in SUPERChem solve chemistry reasoning problems in a standard environment. The environment does not present direct safety risks.
Citation
@article{zhao2025superchem,
title={SUPERChem: A Multimodal Reasoning Benchmark in Chemistry},
author={Zhao, Zehua and Huang, Zhixian and Li, Junren and Lin, Siyu and Zhou, Junting and Cao, Fengqi and Zhou, Kun and Ge, Rui and Long, Tingting and Zhu, Yuexiang and Liu, Yan and Zheng, Jie and Wei, Junnian and Zhu, Rong and Zou, Peng and Li, Wenyu and Cheng, Zekai and Ding, Tian and Wang, Yaxuan and Yan, Yizhao and Wei, Tingru and Ming, Haowei and Mao, Weijie and Sun, Chen and Liu, Yiming and Wang, Zichen and Zhang, Zuo and Yang, Tong and Ma, Hao and Gao, Zhen and Pei, Jian},
journal={arXiv preprint arXiv:2512.01274},
year={2025}
}