DAPO-Math


Description

DAPO-Math is an environment for evaluating mathematical reasoning on competition-level problems from the DAPO-Math-17k dataset. The dataset was curated by ByteDance Seed and Tsinghua AIR as the training set for DAPO (Decoupled Clip and Dynamic sAmpling Policy Optimization), an open-source reinforcement learning system for large language models. Problems span algebra, geometry, number theory, and combinatorics, with integer ground-truth answers verified via rule-based matching.

Capabilities

  • Solving competition-level mathematics problems
  • Step-by-step mathematical reasoning
  • Producing precise numerical answers

Compute Requirements

Minimal. No sandbox or code execution is used. The environment runs rule-based answer verification only.

License

Apache 2.0, matching the original dataset license.

Tasks

There are approximately 14,100 tasks in a single train split (English subset of the deduplicated DAPO-Math-17k dataset). Each task is a competition-level math problem with an integer answer.

Reward Structure

The reward is binary, determined by rule-based answer verification with the math_verify library (verification style: rule-lighteval/MATH_v2):

  • 1.0 if the submitted answer is mathematically equivalent to the ground truth
  • 0.0 otherwise

No LLM grader is used.
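The binary scheme above can be sketched in a few lines. This is a simplified illustration, not the actual math_verify logic: since all ground-truth answers in this dataset are integers, the sketch normalizes both strings and compares the parsed values. Function and helper names here are hypothetical.

```python
def binary_reward(submitted: str, ground_truth: str) -> float:
    """Return 1.0 if the submitted answer matches the ground truth, else 0.0.

    Simplified stand-in for rule-based verification: real verifiers such as
    math_verify also handle LaTeX, fractions, and symbolic equivalence.
    """
    def normalize(s: str):
        # Strip whitespace, thousands separators, and a leading '+' sign.
        s = s.strip().replace(",", "").lstrip("+")
        try:
            return int(s)          # compare as integers when possible
        except ValueError:
            return s               # fall back to exact string comparison

    return 1.0 if normalize(submitted) == normalize(ground_truth) else 0.0
```

For example, `binary_reward("  1,024", "1024")` yields 1.0, while `binary_reward("1023", "1024")` yields 0.0; there is no partial credit.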

Data

Problems are sourced from the DAPO-Math-17k dataset, using the deduplicated English subset provided by open-r1/DAPO-Math-17k-Processed. The dataset is loaded from Hugging Face at server startup.

Tools

  • answer: Submit a final answer for evaluation against the ground truth.
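A single-tool interface like this is typically exposed to the model as a function-calling schema. The README does not specify the exact wire format, so the following JSON-schema-style definition is a hypothetical sketch of what the `answer` tool might look like:

```python
# Hypothetical schema for the environment's single tool; the actual
# OpenReward tool-call format is not documented in this README.
ANSWER_TOOL = {
    "name": "answer",
    "description": "Submit a final answer for evaluation against the ground truth.",
    "parameters": {
        "type": "object",
        "properties": {
            "answer": {
                "type": "string",
                "description": "The final integer answer, as a string.",
            },
        },
        "required": ["answer"],
    },
}
```

Because the episode is single-turn, the first call to this tool ends the task and triggers verification.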

Time Horizon

Single-turn. The agent receives a math problem and submits one answer.

Environment Difficulty

Problems are competition-level (olympiad-style). As a reference point, the DAPO system trained on this dataset reached 50 points on AIME 2024 with a Qwen2.5-32B base model.

Other Environment Requirements

No external API keys or secrets are required. Grading is entirely rule-based.

Safety

This environment poses minimal safety risk. Agents solve self-contained math problems with no access to external systems, file systems, or network resources.

Citations

@article{yu2025dapo,
  title={DAPO: An Open-Source LLM Reinforcement Learning System at Scale},
  author={Yu, Qiying and Zhang, Zheng and Zhu, Ruofei and Yuan, Yufeng and Zuo, Xiaochen and Yue, Yu and Fan, Tiantian and Liu, Gaohong and Liu, Lingjun and Liu, Xin and Lin, Haibin and Lin, Zhiqi and Ma, Bole and Sheng, Guangming and Tong, Yuxuan and Zhang, Chi and Zhang, Mofan and Zhang, Wang and Zhu, Hang and Zhu, Jinhua and Chen, Jiaze and Chen, Jiangjie and Wang, Chengyi and Yu, Hongli and Dai, Weinan and Song, Yuxuan and Wei, Xiangpeng and Zhou, Hao and Liu, Jingjing and Ma, Wei-Ying and Zhang, Ya-Qin and Yan, Lin and Qiao, Mu and Wu, Yonghui and Wang, Mingxuan},
  journal={arXiv preprint arXiv:2503.14476},
  year={2025}
}