AIME2026


OpenReward Environment

Description

AIME2026 is an environment for evaluating mathematical reasoning on the 30 problems of the 2026 American Invitational Mathematics Examination (AIME). AIME is an invitational competition for high school students who score in approximately the top 2.5% on the AMC 10 or the top 5% on the AMC 12. Problems cover algebra, geometry, number theory, combinatorics, and probability; no calculus is required. Answers are integers, validated via symbolic mathematical equivalence.

Capabilities

  • High school competition mathematics at AIME difficulty
  • Integer answer validation with symbolic equivalence checking
  • Coverage of algebra, geometry, number theory, combinatorics, and probability

Compute Requirements

Agents are given a standard environment with no sandbox or file system access.

Tasks

There is one split in this environment:

  • test: 30 tasks

Each problem requires a single integer answer between 0 and 999 inclusive, the standard AIME answer format.
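As an illustration of the answer format, the following is a minimal sketch of a client-side check that a candidate answer is an integer in the AIME range before it is submitted. The function name `is_valid_aime_answer` is hypothetical and is not part of the environment.

```python
def is_valid_aime_answer(text: str) -> bool:
    """Return True if `text` is a base-10 integer in the AIME range 0..999.

    Hypothetical helper for illustration; not part of the environment.
    """
    stripped = text.strip()
    # isdecimal() rejects signs, decimal points, and the empty string;
    # isascii() additionally rejects non-ASCII digit characters.
    if not (stripped.isascii() and stripped.isdecimal()):
        return False
    # Leading zeros are accepted, so "042" is treated as 42.
    return 0 <= int(stripped) <= 999
```

A check like this only guards against malformed output; it says nothing about whether the answer is correct.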

Reward Structure

Single-turn evaluation with deterministic grading. The agent submits an answer via the answer tool. The answer is verified using the math_verify library for symbolic mathematical equivalence. Reward is 1.0 if correct, 0.0 if incorrect.
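The binary reward can be sketched as below. This is a simplified stand-in that compares plain integers using only the standard library. It is not the environment's grader, which uses math_verify and can therefore accept answers in forms this sketch rejects. The function name `grade` is an assumption made for illustration.

```python
def grade(submitted: str, gold: int) -> float:
    """Return 1.0 if `submitted` parses to the gold integer, else 0.0.

    Simplified stand-in for the environment's math_verify-based check.
    """
    try:
        value = int(submitted.strip())
    except ValueError:
        # Anything that does not parse as an integer earns no reward.
        return 0.0
    return 1.0 if value == gold else 0.0
```

Because grading is deterministic and there is no partial credit, the score over the test split is simply the mean of these per-problem rewards.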

Data

aime_2026_problems.parquet (30 problems). Stored on the OpenReward platform.

Tools

| Tool | Description |
| --- | --- |
| answer | Submit an integer answer. Evaluated via symbolic equivalence checking. Ends the episode. |

Time Horizon

Single-turn. The agent reads the problem and submits one answer.

Environment Difficulty

AIME 2026 is of standard AIME difficulty. MathArena reports the following accuracies for frontier models on it:

| Model | Accuracy |
| --- | --- |
| Step 3.5 Flash | 96.7% |
| Kimi K2.5 | 95.8% |
| GLM 5 | 95.8% |
| DeepSeek-V3.2 | 94.2% |
| Qwen3.5-397B-A17B | 93.3% |

Top reasoning models now achieve near-perfect scores on AIME-level problems.
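To relate these percentages to problem counts, the sketch below checks whether a reported accuracy, rounded to one decimal place, can come from a whole number of correct answers out of a given number of attempts. On a single pass over 30 problems, 96.7% corresponds to 29 correct, while 95.8% matches no whole count. That suggests the reported figures are averaged over several runs. The four-run case (120 attempts) is an assumption used for illustration and is not stated in this README.

```python
NUM_PROBLEMS = 30

def consistent_with(accuracy_pct: float, attempts: int) -> bool:
    """True if some whole number of correct answers out of `attempts`
    rounds to `accuracy_pct` at one decimal place."""
    nearest = round(accuracy_pct / 100 * attempts)
    # A value rounded to one decimal lies within 0.05 of the true percentage.
    return abs(100 * nearest / attempts - accuracy_pct) < 0.05

# Single pass over the 30 problems.
print(consistent_with(96.7, NUM_PROBLEMS))      # 29/30
print(consistent_with(95.8, NUM_PROBLEMS))      # no whole count fits
# Assumed four runs per problem (120 attempts in total).
print(consistent_with(95.8, 4 * NUM_PROBLEMS))  # 115/120
```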

Other Environment Requirements

There are no further environment requirements; AIME2026 works out of the box with the OpenReward endpoint without any external API keys.

Safety

Agents in AIME2026 solve competition mathematics problems in a standard environment. The environment does not present direct safety risks.
