MMMU
Description
MMMU (Massive Multi-discipline Multimodal Understanding) is an environment for evaluating college-level multimodal reasoning across 6 disciplines, 30 subjects, and 183 subfields. Each question includes up to 7 heterogeneous images (charts, diagrams, tables, chemical structures, music notation, etc.) and requires understanding complex visual and textual information.
Capabilities
- College-level multimodal question answering
- Up to 7 images per question with 30+ image types
- Multiple-choice evaluation across expert-level reasoning tasks
Compute Requirements
Agents are given a standard environment with no sandbox or file system access.
License
Tasks
This environment provides three splits:
- dev: 150 tasks
- validation: 900 tasks
- test: 10,500 tasks
Total: 11,550 college-level questions spanning Art & Design, Business, Science, Health & Medicine, Humanities & Social Science, and Tech & Engineering.
Reward Structure
Single-turn evaluation with deterministic grading. The agent submits a single letter answer via the submit_answer tool, and the submission is compared against the ground truth by exact match. Reward is 1.0 if correct, 0.0 otherwise.
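As a minimal sketch of this grading rule (the case and whitespace normalization shown is an assumption for illustration; the platform's exact-match comparison may be stricter):

```python
def grade(submitted: str, ground_truth: str) -> float:
    """Exact-match grading: reward 1.0 for the correct letter, else 0.0.

    Stripping whitespace and upper-casing are illustrative assumptions,
    not the platform's documented normalization.
    """
    return 1.0 if submitted.strip().upper() == ground_truth.strip().upper() else 0.0

print(grade("b", "B"))  # 1.0
print(grade("A", "B"))  # 0.0
```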
Data
Parquet files (~605 MB total) for the dev, validation, and test splits, sourced from the HuggingFace MMMU/MMMU dataset and stored on the OpenReward platform.
Tools
| Tool | Description |
|---|---|
| submit_answer | Submit a single letter answer. Deterministic evaluation via exact match. Ends the episode. |
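A hypothetical submit_answer tool-call payload might look like the following; the field names and wire format are assumptions for illustration, not the platform's documented schema:

```python
import json

# Hypothetical submit_answer call (field names are illustrative assumptions).
call = {"tool": "submit_answer", "arguments": {"answer": "C"}}

payload = json.dumps(call)             # what the agent would send
decoded = json.loads(payload)          # what the grader would parse
print(decoded["arguments"]["answer"])  # C
```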
Time Horizon
Single-turn. The agent reads the multimodal question (text and images) and submits one answer.
Environment Difficulty
Representative accuracies on MMMU:
| Model | Accuracy |
|---|---|
| Gemini 3 Flash | 87.6% |
| Gemini 3 Pro | 87.5% |
| GPT-5.2 | 86.7% |
| Claude 4.5 Sonnet | 77.8% |
| Human Expert | 88.6% |
Models have now surpassed the performance of average human experts (76.2%) but still trail top human experts (88.6%).
Other Environment Requirements
There are no further environment requirements; MMMU works out of the box with the OpenReward endpoint without any external API keys.
Safety
Agents in MMMU answer college-level multimodal questions in a standard environment. The environment does not present direct safety risks.
Citation
@article{yue2023mmmu,
  title={MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI},
  author={Yue, Xiang and Ni, Yuansheng and Zhang, Kai and Zheng, Tianyu and Liu, Ruoqi and Zhang, Ge and Stevens, Samuel and Jiang, Dongfu and Ren, Weiming and Sun, Yuxuan and others},
  journal={arXiv preprint arXiv:2311.16502},
  year={2023}
}