HMMT

Description

HMMT is an environment for evaluating mathematical reasoning on problems from the Harvard-MIT Mathematics Tournament. Agents solve competition-level mathematics problems from the HMMT February 2025 and November 2025 competitions. Answers are verified with the math-verify library, which checks semantic equivalence against the reference solution.

Capabilities

Competition-level mathematical problem solving
Harvard-MIT Mathematics Tournament problem evaluation
Multi-step mathematical reasoning
Symbolic answer verification

Compute Requirements

Agents are given a standard environment with no sandbox or file system access.

Tasks

There are two splits in this environment:

feb_2025: HMMT February 2025 problems
nov_2025: HMMT November 2025 problems

Problems span various mathematical topics from the Harvard-MIT Mathematics Tournament.

Reward Structure

This is a sparse, verifiable reward environment. The agent calls the `answer` tool to submit a solution and receives one of two rewards:

1.0: Answer is mathematically equivalent to the reference solution
0.0: Answer is incorrect

Answer verification uses the math-verify library to check semantic equivalence.
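The binary reward described above can be sketched in a few lines. This is a minimal illustration, not the environment's implementation: `toy_equivalent` and `sparse_reward` are hypothetical names, and the equivalence check is a simplified stand-in that only understands plain rational numbers, whereas the environment uses math-verify for full semantic equivalence.

```python
from fractions import Fraction


def toy_equivalent(submitted: str, reference: str) -> bool:
    """Simplified stand-in for math-verify (an assumption for illustration).

    Treats "0.5", "1/2" and "2/4" as equal; anything that is not a plain
    rational number falls back to an exact string comparison.
    """
    try:
        return Fraction(submitted.strip()) == Fraction(reference.strip())
    except (ValueError, ZeroDivisionError):
        return submitted.strip() == reference.strip()


def sparse_reward(submitted: str, reference: str) -> float:
    """Return 1.0 for an equivalent answer and 0.0 otherwise."""
    return 1.0 if toy_equivalent(submitted, reference) else 0.0


print(sparse_reward("0.5", "1/2"))  # equivalent forms of the same value
print(sparse_reward("1/3", "1/2"))  # different values
```

The point of the sketch is the shape of the signal: there is no partial credit, so an answer that is close but not equivalent earns the same reward as a blank one.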

Data

Data is sourced from the MathArena/hmmt_feb_2025 and MathArena/hmmt_nov_2025 Hugging Face datasets.
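As a small sketch of how the two splits might relate to these datasets, the lookup below pairs each split name with a dataset ID. The split names and dataset IDs come from this README; pairing them by month is an assumption, and `dataset_for_split` is a hypothetical helper.

```python
# Split names and dataset IDs are taken from the README; pairing them by
# month is an assumption made for illustration.
SPLIT_TO_DATASET = {
    "feb_2025": "MathArena/hmmt_feb_2025",
    "nov_2025": "MathArena/hmmt_nov_2025",
}


def dataset_for_split(split: str) -> str:
    """Return the Hugging Face dataset ID assumed to back a split."""
    try:
        return SPLIT_TO_DATASET[split]
    except KeyError:
        known = ", ".join(sorted(SPLIT_TO_DATASET))
        raise ValueError(f"unknown split {split!r}; expected one of: {known}") from None


print(dataset_for_split("nov_2025"))
```

If the Hugging Face `datasets` package is installed, the returned ID can be passed to `load_dataset` to inspect the raw problems. The column names in those datasets are not documented here, so check them before relying on any field.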

Tools

Tool	Description
`answer`	Submit final answer for verification

Time Horizon

Single-turn. The agent receives a problem and submits one answer.

Environment Difficulty

Accuracy on the nov_2025 split:

Model	Accuracy
Qwen3.5-397B-A17B	100%
Step 3.5 Flash (parallel thinking)	98%
GLM-5	96.9%
Qwen3-Max-Thinking	94.7%
Step 3.5 Flash	94%
Qwen 3 Coder Next	75.57%

Other Environment Requirements

There are no further environment requirements; HMMT works out of the box with the OpenReward endpoint.

Safety

Agents in HMMT solve mathematical problems in a standard environment. The environment does not present direct safety risks.


Compute Configuration

Resource allocation for this environment.

Component	Configuration
Environment Server	1 vCPU / 4 GB RAM
Sandbox Machine	Not configured

Estimated Cost

Pay per second of active session usage. Billing starts when your session begins and stops when it ends.

Component	Cost / second
Environment	$0.0000320
Sandbox	Not configured
Total	$0.0000320

Examples

5-minute session	$0.0096
1-hour session	$0.1152
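The example figures follow directly from the per-second rate. The sketch below reproduces that arithmetic, assuming billing is exactly rate times session length in seconds with no sandbox configured; `session_cost` is a hypothetical helper, and `Decimal` is used to avoid floating-point rounding noise.

```python
from decimal import Decimal

# Per-second rate for the environment server, as listed in the cost table.
ENVIRONMENT_RATE_PER_SECOND = Decimal("0.0000320")


def session_cost(seconds: int) -> Decimal:
    """Cost in USD of a session lasting `seconds`, with no sandbox configured."""
    return ENVIRONMENT_RATE_PER_SECOND * seconds


print(session_cost(5 * 60))   # 5-minute session
print(session_cost(60 * 60))  # 1-hour session
```

The two results are numerically equal to the $0.0096 and $0.1152 examples listed above.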