HMMT
Description
HMMT is an environment for evaluating mathematical reasoning on problems from the Harvard-MIT Mathematics Tournament (HMMT). Agents solve competition-level mathematics problems from the February 2025 and November 2025 HMMT competitions. Answer verification uses the math-verify library for semantic equivalence checking.
Capabilities
- Competition-level mathematical problem solving
- Harvard-MIT Mathematics Tournament problem evaluation
- Multi-step mathematical reasoning
- Symbolic answer verification
Compute Requirements
Agents are given a standard environment with no sandbox or file system access.
Tasks
There are two splits in this environment:
- feb_2025: HMMT February 2025 problems
- nov_2025: HMMT November 2025 problems
Problems span various mathematical topics from the Harvard-MIT Mathematics Tournament.
Reward Structure
This is a sparse, verifiable reward environment. The agent calls the answer tool to submit a solution, which receives one of two rewards:
- 1.0: Answer is mathematically equivalent to the reference solution
- 0.0: Answer is incorrect
Answer verification uses the math-verify library to check semantic equivalence.
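The binary scoring above can be illustrated with a minimal sketch. It is not the environment's grader and does not use math-verify: the functions `normalize`, `is_equivalent`, and `reward` are hypothetical names, and the equivalence check is a simplified standard-library stand-in that only recognizes equal rational numbers (for example `0.5` and `1/2`) and otherwise falls back to exact string comparison.

```python
from fractions import Fraction


def normalize(answer: str) -> str:
    """Trim whitespace and one pair of surrounding $...$ delimiters."""
    text = answer.strip()
    if len(text) >= 2 and text.startswith("$") and text.endswith("$"):
        text = text[1:-1].strip()
    return text


def is_equivalent(submitted: str, reference: str) -> bool:
    """Simplified stand-in for a semantic check (NOT math-verify).

    Compares the answers as exact rationals when both parse as
    numbers; otherwise compares the normalized strings.
    """
    a, b = normalize(submitted), normalize(reference)
    try:
        return Fraction(a) == Fraction(b)
    except (ValueError, ZeroDivisionError):
        return a == b


def reward(submitted: str, reference: str) -> float:
    """Sparse reward: 1.0 for an equivalent answer, 0.0 otherwise."""
    return 1.0 if is_equivalent(submitted, reference) else 0.0


if __name__ == "__main__":
    print(reward("0.5", "1/2"))
    print(reward("3", "4"))
```

A real semantic checker handles far more than this sketch, such as symbolic expressions and LaTeX markup; the point here is only the shape of the reward, which has no partial credit.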
Data
Data is sourced from the MathArena/hmmt_feb_2025 and MathArena/hmmt_nov_2025 datasets on Hugging Face.
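For readers who want to inspect the source data directly, the following sketch maps each split name to its dataset identifier. It assumes the third-party `datasets` package is installed and that the datasets are publicly accessible; `DATASET_IDS`, `dataset_id_for`, and `load_split` are hypothetical helper names, not part of the environment.

```python
# Split name -> Hugging Face dataset identifier, as listed in this README.
DATASET_IDS = {
    "feb_2025": "MathArena/hmmt_feb_2025",
    "nov_2025": "MathArena/hmmt_nov_2025",
}


def dataset_id_for(split: str) -> str:
    """Return the dataset identifier for an environment split name."""
    try:
        return DATASET_IDS[split]
    except KeyError:
        valid = ", ".join(sorted(DATASET_IDS))
        raise ValueError(f"unknown split {split!r}; expected one of: {valid}") from None


def load_split(split: str):
    """Download the dataset behind a split (needs network access)."""
    # Imported lazily so the mapping above works without the package.
    from datasets import load_dataset

    return load_dataset(dataset_id_for(split))


if __name__ == "__main__":
    print(dataset_id_for("nov_2025"))
```

Calling `load_split("feb_2025")` returns whatever `load_dataset` yields for that identifier; the column names and internal split layout are not documented here, so inspect the returned object before relying on specific fields.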
Tools
| Tool | Description |
|---|---|
| answer | Submit final answer for verification |
Time Horizon
Single-turn. The agent receives a problem and submits one answer.
Environment Difficulty
Nov 2025 split:
| Model | Accuracy |
|---|---|
| Qwen3.5-397B-A17B | 100% |
| Step 3.5 Flash (parallel thinking) | 98% |
| GLM-5 | 96.9% |
| Qwen3-Max-Thinking | 94.7% |
| Step 3.5 Flash | 94% |
| Qwen 3 Coder Next | 75.57% |
Other Environment Requirements
There are no further environment requirements; HMMT works out of the box with the OpenReward endpoint.
Safety
Agents in HMMT solve mathematical problems in a standard environment. The environment does not present direct safety risks.