AIME2024
Description
AIME2024 is an environment for evaluating mathematical reasoning on problems from the 2024 American Invitational Mathematics Examination (AIME). The AIME is a prestigious high school mathematics competition administered by the Mathematical Association of America (MAA), serving as the second stage of the AMC pathway toward selection for the International Mathematical Olympiad (IMO). Problems require deep mathematical reasoning across algebra, combinatorics, geometry, and number theory, with answers that are integers in the range 000--999.
Capabilities
- Solving competition-level mathematics problems
- Multi-step mathematical reasoning
- Algebraic manipulation and computation
- Combinatorial and geometric reasoning
Compute Requirements
AIME2024 is a lightweight, single-turn environment. The agent receives a problem, reasons about it, and submits a single integer answer. No sandbox or significant compute resources are required beyond the agent's own inference.
Tasks
There are 30 tasks in a single test split, consisting of:
- AIME I 2024: 15 problems (problems 0--14)
- AIME II 2024: 15 problems (problems 15--29)
Each task presents the agent with a single AIME problem statement. The agent must solve the problem and submit an integer answer using the answer tool.
Reward Structure
AIME2024 uses a binary, deterministic reward:
- 1.0 if the submitted answer is correct
- 0.0 if the submitted answer is incorrect
Grading is performed using the math-verify library, which parses and verifies mathematical expressions. No LLM grader is used; the reward is fully deterministic.
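For concreteness, here is a minimal sketch of the grading step, assuming the parse/verify API documented by the math-verify library; the exact call sites inside AIME2024 are an assumption:

```python
# Minimal sketch of the deterministic grading step. Assumes the
# parse/verify API from https://github.com/huggingface/math-verify;
# how AIME2024 actually wires this up internally is an assumption.
from math_verify import parse, verify

def grade(submitted: str, ground_truth: str) -> float:
    """Return 1.0 iff the submitted answer matches the ground truth."""
    gold = parse(ground_truth)   # e.g. "204"
    answer = parse(submitted)    # tolerant of formats like "$204$"
    return 1.0 if verify(gold, answer) else 0.0

assert grade("204", "204") == 1.0
assert grade("205", "204") == 0.0
```

Because no LLM grader is involved, repeated runs on the same submission always yield the same reward.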
Data
The 30 AIME 2024 problems are sourced from the Maxwell-Jia/AIME_2024 dataset on Hugging Face. Each record contains the problem statement and the ground-truth integer answer. The data is stored as a Parquet file (aime_2024_problems.parquet) and downloaded at build time from HuggingFace.
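The records can be inspected directly with the standard datasets library; the split and column names below are assumptions based on the Maxwell-Jia/AIME_2024 dataset card and may differ:

```python
# Sketch of loading the source data with the `datasets` library.
# The split name and column names ("Problem", "Answer") are
# assumptions taken from the Maxwell-Jia/AIME_2024 dataset card.
from datasets import load_dataset

ds = load_dataset("Maxwell-Jia/AIME_2024", split="train")
print(len(ds))          # 30 problems (AIME I + AIME II, 2024)
row = ds[0]
print(row["Problem"])   # problem statement (assumed column name)
print(row["Answer"])    # ground-truth integer answer (assumed column name)
```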
Tools
AIME2024 exposes a single tool:
| Tool | Parameters | Description |
|---|---|---|
| answer | answer: str | Submits the agent's final answer. The answer is parsed and verified against the ground truth using math-verify. This call ends the episode. |
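Rendered as a JSON schema, the tool surface might look like the sketch below; the exact schema emitted by the environment is an assumption:

```python
# Hypothetical JSON-schema rendering of the single `answer` tool.
# The exact schema the environment emits is an assumption.
ANSWER_TOOL = {
    "name": "answer",
    "description": "Submit the final answer to the AIME problem; ends the episode.",
    "parameters": {
        "type": "object",
        "properties": {
            "answer": {
                "type": "string",
                "description": 'The final integer answer, e.g. "204".',
            }
        },
        "required": ["answer"],
    },
}
```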
Time Horizon
AIME2024 is a single-turn environment. The agent receives the problem in the prompt and is expected to make exactly one tool call (answer) to submit its solution. There is no multi-turn interaction.
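A full episode therefore reduces to a single call-and-grade step, sketched here with a hypothetical `solve` stand-in for the agent's inference:

```python
# Sketch of one single-turn episode. `solve` is a hypothetical
# placeholder for the agent's inference; grading reuses math-verify.
from math_verify import parse, verify

def solve(problem: str) -> str:
    """Placeholder for the agent's reasoning and answer tool call."""
    return "0"

def run_episode(problem: str, ground_truth: str) -> float:
    submitted = solve(problem)  # exactly one turn: one answer submission
    return 1.0 if verify(parse(ground_truth), parse(submitted)) else 0.0
```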
Environment Difficulty
AIME problems are challenging competition mathematics requiring creative multi-step reasoning. Selected model scores on AIME 2024:
| Model | Score (%) |
|---|---|
| GPT-5 pro (python) | 100 |
| o1 | 96 |
| o3 (no tools) | 91.6 |
| Qwen 3 Coder Next | 89.01 |
| o3-mini (high) | 87.3 |
| DeepSeek-R1 | 79.8 |
| o3-mini (medium) | 79.6 |
| OpenAI-o1-0912 | 74.4 |
| Magistral Medium | 73.6 |
| DeepSeek-R1-Zero | 71 |
Other Environment Requirements
There are no further environment requirements; AIME2024 works out of the box with the OpenReward endpoint and requires no external API keys.
Safety
AIME2024 is a purely mathematical evaluation environment. The agent solves well-defined competition problems with known correct answers and does not interact with external systems, APIs, or other agents. The environment presents no direct or indirect safety risks.
Citation
```bibtex
@dataset{GRAIME2024,
  author    = {General Reasoning Inc. Team},
  title     = {AIME2024},
  year      = {2026},
  publisher = {OpenReward},
  url       = {https://openreward.ai/GeneralReasoning/AIME2024}
}
```