AIME2024

Description

AIME2024 is an environment for evaluating mathematical reasoning on problems from the 2024 American Invitational Mathematics Examination (AIME). The AIME is a prestigious high school mathematics competition administered by the Mathematical Association of America (MAA), serving as the second stage of the AMC pathway toward selection for the International Mathematical Olympiad (IMO). Problems require deep mathematical reasoning across algebra, combinatorics, geometry, and number theory, with answers that are integers in the range 000--999.

Capabilities

  • Solving competition-level mathematics problems
  • Multi-step mathematical reasoning
  • Algebraic manipulation and computation
  • Combinatorial and geometric reasoning

Compute Requirements

AIME2024 is a lightweight, single-turn environment. The agent receives a problem, reasons about it, and submits a single integer answer. No sandbox or significant compute resources are required beyond the agent's own inference.

Tasks

There are 30 tasks in a single test split, consisting of:

  • AIME I 2024: 15 problems (task indices 0--14)
  • AIME II 2024: 15 problems (task indices 15--29)

Each task presents the agent with a single AIME problem statement. The agent must solve the problem and submit an integer answer using the answer tool.
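The index layout above can be sketched in a few lines of Python. The helper name here is hypothetical, purely to illustrate how the 30 task indices map onto the two exams:

```python
def task_to_problem(index: int) -> tuple[str, int]:
    """Map a task index (0-29) to (exam name, problem number 1-15).

    Indices 0-14 are AIME I 2024; indices 15-29 are AIME II 2024.
    """
    if not 0 <= index <= 29:
        raise ValueError(f"task index out of range: {index}")
    exam = "AIME I 2024" if index < 15 else "AIME II 2024"
    problem = index % 15 + 1
    return exam, problem

print(task_to_problem(0))   # ('AIME I 2024', 1)
print(task_to_problem(29))  # ('AIME II 2024', 15)
```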

Reward Structure

AIME2024 uses a binary, deterministic reward:

  • 1.0 if the submitted answer is correct
  • 0.0 if the submitted answer is incorrect

Grading is performed using the math-verify library, which parses and verifies mathematical expressions. No LLM grader is used; the reward is fully deterministic.
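The reward logic can be approximated in a few lines of Python. This is a simplified stand-in that handles only plain integer answers; the actual environment uses the math-verify library, which parses and compares general mathematical expressions:

```python
def grade(submitted: str, ground_truth: int) -> float:
    """Binary, deterministic reward: 1.0 iff the submitted string
    parses to the ground-truth integer. AIME answers are integers
    000-999, so a leading-zero form like '033' is accepted."""
    try:
        value = int(submitted.strip())
    except ValueError:
        return 0.0  # unparseable answers score 0.0
    return 1.0 if value == ground_truth else 0.0

print(grade("033", 33))  # 1.0
print(grade("34", 33))   # 0.0
```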

Data

The 30 AIME 2024 problems are sourced from the Maxwell-Jia/AIME_2024 dataset on Hugging Face. Each record contains the problem statement and the ground-truth integer answer. The data is stored as a Parquet file (aime_2024_problems.parquet) and downloaded at build time from HuggingFace.

Tools

AIME2024 exposes a single tool:

  • answer (answer: str): Submits the agent's final answer. The answer is parsed and verified against the ground truth using math-verify. This call ends the episode.
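As an illustration, a single answer tool call might be serialized like this. The field names below are assumptions for the sketch, not the actual OpenReward wire format:

```python
import json

# Illustrative tool-call payload (field names are assumptions).
tool_call = {
    "tool": "answer",
    "arguments": {"answer": "033"},  # AIME answers are integers 000-999
}

payload = json.dumps(tool_call)
print(payload)
```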

Time Horizon

AIME2024 is a single-turn environment. The agent receives the problem in the prompt and is expected to make exactly one tool call (answer) to submit its solution. There is no multi-turn interaction.

Environment Difficulty

AIME problems are challenging competition mathematics requiring creative multi-step reasoning. Selected model scores on AIME 2024:

  Model                Score
  GPT-5 pro (python)   100
  o1                   96
  o3 (no tools)        91.6
  Qwen 3 Coder Next    89.01
  o3-mini (high)       87.3
  DeepSeek-R1          79.8
  o3-mini (medium)     79.6
  OpenAI-o1-0912       74.4
  Magistral Medium     73.6
  DeepSeek-R1-Zero     71

Other Environment Requirements

There are no further environment requirements; AIME2024 works out of the box with the OpenReward endpoint without any external API keys.

Safety

AIME2024 is a purely mathematical evaluation environment. The agent solves well-defined competition problems with known correct answers and does not interact with external systems, APIs, or other agents. The environment presents no direct or indirect safety risks.

Citation

@dataset{GRAIME2024,
  author    = {General Reasoning Inc. Team},
  title     = {AIME2024},
  year      = {2026},
  publisher = {OpenReward},
  url       = {https://openreward.ai/GeneralReasoning/AIME2024}
}