HMMT


Description

HMMT is an environment for evaluating mathematical reasoning on problems from the Harvard-MIT Mathematics Tournament. Agents solve competition-level mathematics problems from the HMMT February and November 2025 competitions. Answer verification uses the math-verify library for semantic equivalence checking.

Capabilities

  • Competition-level mathematical problem solving
  • Harvard-MIT Mathematics Tournament problem evaluation
  • Multi-step mathematical reasoning
  • Symbolic answer verification

Compute Requirements

Agents are given a standard environment with no sandbox or file system access.

Tasks

There are two splits in this environment:

  • feb_2025: HMMT February 2025 problems
  • nov_2025: HMMT November 2025 problems

Problems span various mathematical topics from the Harvard-MIT Mathematics Tournament.

Reward Structure

This is a sparse, verifiable reward environment. The agent calls the answer tool to submit a solution:

  • 1.0: Answer is mathematically equivalent to the reference solution
  • 0.0: Answer is incorrect

Answer verification uses the math-verify library to check semantic equivalence.
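The reward mapping can be sketched as a thin wrapper around a symbolic equivalence check. The snippet below illustrates the idea with sympy rather than math-verify itself; the function names equivalent and grade are hypothetical, not part of the environment's API.

```python
import sympy as sp

def equivalent(submitted: str, reference: str) -> bool:
    """Check whether two answer strings are mathematically equivalent."""
    try:
        # Parse both strings and test whether their difference simplifies to zero
        diff = sp.simplify(sp.sympify(submitted) - sp.sympify(reference))
        return diff == 0
    except (sp.SympifyError, TypeError):
        # Unparsable submissions count as incorrect
        return False

def grade(submitted: str, reference: str) -> float:
    """Sparse reward: 1.0 for an equivalent answer, 0.0 otherwise."""
    return 1.0 if equivalent(submitted, reference) else 0.0

print(grade("sqrt(8)", "2*sqrt(2)"))  # equivalent forms -> 1.0
print(grade("1/3", "0.25"))           # not equivalent -> 0.0
```

Semantic checking of this kind is what makes the reward robust to surface form: "sqrt(8)" and "2*sqrt(2)" score identically, while string comparison would not.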

Data

Data is sourced from the MathArena/hmmt_feb_2025 and MathArena/hmmt_nov_2025 Hugging Face datasets.

Tools

Tool | Description
answer | Submit final answer for verification
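In a function-calling setup, the answer tool might be declared with a JSON schema along these lines. This is a sketch only; the parameter name and exact shape are assumptions, not the environment's published specification.

```python
# Hypothetical JSON-schema declaration for the answer tool
ANSWER_TOOL = {
    "name": "answer",
    "description": "Submit final answer for verification",
    "parameters": {
        "type": "object",
        "properties": {
            "answer": {
                "type": "string",
                "description": "Final answer, e.g. a number or a LaTeX expression",
            }
        },
        "required": ["answer"],
    },
}

print(ANSWER_TOOL["name"])  # -> answer
```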

Time Horizon

Single-turn. The agent receives a problem and submits one answer.

Environment Difficulty

Nov 2025 split:

Model | Accuracy
Qwen3.5-397B-A17B | 100%
Step 3.5 Flash (parallel thinking) | 98%
GLM-5 | 96.9%
Qwen3-Max-Thinking | 94.7%
Step 3.5 Flash | 94%
Qwen 3 Coder Next | 75.57%

Other Environment Requirements

There are no further environment requirements; HMMT works out of the box with the OpenReward endpoint.

Safety

Agents in HMMT solve mathematical problems in a standard environment. The environment does not present direct safety risks.

GeneralReasoning/HMMT | OpenReward