AIME2026

Description

AIME2026 is an environment for evaluating mathematical reasoning on the 30 problems of the 2026 American Invitational Mathematics Examination (AIME). The AIME is an invitational competition for high school students who score near the top (roughly the top 2.5% to 5%) on the AMC 10/12. Problems cover algebra, geometry, number theory, combinatorics, and probability. Every answer is an integer, and submissions are validated by symbolic mathematical equivalence.

Capabilities

High school competition mathematics at AIME difficulty
Integer answer validation with symbolic equivalence checking
Coverage of algebra, geometry, number theory, combinatorics, and probability

Compute Requirements

Agents are given a standard environment with no sandbox or file system access.

Tasks

There is one split in this environment:

test: 30 tasks

Each problem has a single integer answer. AIME answers always lie between 0 and 999 inclusive.

Reward Structure

Single-turn evaluation with deterministic grading. The agent submits an answer via the answer tool. The answer is verified using the math_verify library for symbolic mathematical equivalence. Reward is 1.0 if correct, 0.0 if incorrect.
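
The grading rule can be sketched as follows. This is a simplified, standard-library stand-in and not the math_verify API: the `grade` function and its parsing rules are assumptions made for illustration, and it handles only plain numeric strings rather than general symbolic expressions.

```python
from fractions import Fraction


def grade(submitted: str, gold: int) -> float:
    """Return 1.0 if `submitted` is numerically equal to `gold`, else 0.0.

    Hypothetical helper: a simplified stand-in for the symbolic
    equivalence check, limited to strings that Fraction can parse
    (for example "42", "042", "42.0", or "84/2").
    """
    try:
        value = Fraction(submitted.strip())
    except (ValueError, ZeroDivisionError):
        # Unparseable submissions are treated as incorrect.
        return 0.0
    return 1.0 if value == gold else 0.0
```

Comparing exact rationals instead of raw strings means that equivalent spellings of the same integer earn the same reward, which is the property the equivalence check is meant to provide.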

Data

The problems are stored in aime_2026_problems.parquet (30 problems), hosted on the OpenReward platform.

Tools

Tool	Description
`answer`	Submit an integer answer. Evaluated via symbolic equivalence checking. Ends the episode.

Time Horizon

Single-turn. The agent reads the problem and submits one answer.
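
A minimal sketch of the single-turn flow, assuming a hypothetical `SingleTurnEpisode` class. The class, its fields, and the placeholder problem are illustrative only and are not the environment's actual interface. The sketch shows one property: a single `answer` call sets the reward and ends the episode.

```python
from dataclasses import dataclass


@dataclass
class SingleTurnEpisode:
    """Hypothetical single-turn episode: one problem, one answer."""

    problem: str
    gold: int
    done: bool = False
    reward: float = 0.0

    def answer(self, value: int) -> float:
        """Submit an integer answer; the episode ends after this call."""
        if self.done:
            raise RuntimeError("episode already ended")
        self.reward = 1.0 if value == self.gold else 0.0
        self.done = True
        return self.reward


# Placeholder problem for illustration, not taken from the dataset.
episode = SingleTurnEpisode(problem="What is 6 * 7?", gold=42)
reward = episode.answer(42)
```

Because the first submission is final, there is no opportunity to retry within an episode.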

Environment Difficulty

AIME 2026 represents standard AIME difficulty. MathArena reports the following accuracies for frontier models:

Model	Accuracy
Step 3.5 Flash	96.7%
Kimi K2.5	95.8%
GLM 5	95.8%
DeepSeek-V3.2	94.2%
Qwen3.5-397B-A17B	93.3%

Top reasoning models now achieve near-perfect scores on AIME-level problems.
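
To put the percentages in terms of the 30-problem split, the short sketch below converts each listed accuracy into an average number of problems solved. The helper name `mean_solved` is an assumption for illustration. Results that are not whole numbers would be consistent with accuracies averaged over several runs, but that reading is an inference and is not stated here.

```python
NUM_PROBLEMS = 30  # size of the test split

# Accuracies (percent) as listed in the table above.
accuracy_percent = {
    "Step 3.5 Flash": 96.7,
    "Kimi K2.5": 95.8,
    "GLM 5": 95.8,
    "DeepSeek-V3.2": 94.2,
    "Qwen3.5-397B-A17B": 93.3,
}


def mean_solved(percent: float, n: int = NUM_PROBLEMS) -> float:
    """Convert an accuracy percentage into an average count of solved problems."""
    return percent / 100 * n


for model, pct in accuracy_percent.items():
    print(f"{model}: about {mean_solved(pct):.1f} of {NUM_PROBLEMS} problems")
```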

Other Environment Requirements

There are no further environment requirements; AIME2026 works out of the box with the OpenReward endpoint without any external API keys.

Safety

Agents in AIME2026 solve competition mathematics problems in a standard environment. The environment does not present direct safety risks.

Repository

Source repository

EnvCommons/AIME2026

Compute Configuration

Resource allocation for this environment.

Component	Configuration
Environment Server	1 vCPU / 4 GB RAM
Sandbox Machine	Not configured

Estimated Cost

Pay per second of active session usage. Billing starts when your session begins and stops when it ends.

Component	Cost / second
Environment	$0.0000320
Sandbox	Not configured
Total	$0.0000320

Examples

5-minute session: $0.0096

1-hour session: $0.1152
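
The example figures follow from multiplying the per-second rate by the session length. A minimal sketch, assuming a hypothetical helper named `session_cost` and using `Decimal` to avoid floating-point rounding in the currency arithmetic:

```python
from decimal import Decimal

# Environment rate per second of active session, from the cost table above.
RATE_PER_SECOND = Decimal("0.0000320")


def session_cost(seconds: int) -> Decimal:
    """Return the cost in dollars of a session lasting `seconds` seconds."""
    return RATE_PER_SECOND * seconds


five_minutes = session_cost(5 * 60)   # 300 seconds
one_hour = session_cost(60 * 60)      # 3600 seconds
print(f"5-minute session: ${five_minutes:.4f}")
print(f"1-hour session: ${one_hour:.4f}")
```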
