SMT2025
Description
SMT2025 is an environment for evaluating mathematical reasoning on problems from the Stanford Math Tournament (SMT) 2025. Agents solve competition-level mathematics problems and submit answers in LaTeX boxed format. The environment uses a specialized grader for answer parsing and verification.
Capabilities
- Competition-level mathematical problem solving
- LaTeX answer parsing and verification
- Stanford Math Tournament problem evaluation
- Multi-step mathematical reasoning
Compute Requirements
Agents are given a standard environment with no sandbox or file system access.
Tasks
There is one split in this environment:
- test: SMT 2025 competition problems
Problems cover various mathematical topics from the Stanford Math Tournament.
Reward Structure
This is a sparse, verifiable-reward environment. The agent calls the answer tool to submit a solution:
- 1.0: Answer matches the gold answer after parsing
- 0.0: Answer is incorrect
The grader parses both model and gold answers to handle LaTeX formatting variations.
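The kind of normalization involved can be sketched as follows. This is a crude stand-in for the real grader, assuming it tolerates cosmetic LaTeX differences such as `\dfrac` vs. `\frac` and `\left`/`\right` delimiters; the function names are hypothetical.

```python
def normalize(ans: str) -> str:
    """Crude LaTeX normalization (illustrative only; the real grader
    is more thorough): drop whitespace and cosmetic commands."""
    for tok in (r"\left", r"\right", r"\!", r"\,", "$"):
        ans = ans.replace(tok, "")
    ans = ans.replace(r"\dfrac", r"\frac").replace(r"\tfrac", r"\frac")
    return "".join(ans.split())

def grade(model_answer: str, gold_answer: str) -> float:
    """Sparse reward: 1.0 on a normalized match, else 0.0."""
    return 1.0 if normalize(model_answer) == normalize(gold_answer) else 0.0
```

Under this sketch, `grade(r"\dfrac{1}{2}", r"\frac{1}{2}")` returns 1.0 even though the strings differ, while any substantive mismatch returns 0.0.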
Data
Data is sourced from the MathArena/smt_2025 HuggingFace dataset.
Tools
| Tool | Description |
|---|---|
| answer | Submit final answer (use \boxed{} format) |
Time Horizon
Single-turn. The agent receives a problem and submits one answer.
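The single-turn episode structure can be sketched as below, under the assumption that the agent reduces to a function from problem text to an answer string; the harness names are hypothetical and exact string match stands in for the real grader.

```python
def run_episode(agent, problem: str, gold: str) -> float:
    """One SMT2025-style episode: present the problem, accept exactly
    one answer submission, and return the sparse reward.
    (Sketch only; exact match stands in for the real grader.)"""
    submission = agent(problem)  # the agent's single turn
    return 1.0 if submission == gold else 0.0

# A toy agent that "solves" one arithmetic problem.
reward = run_episode(lambda p: r"\boxed{2}", "Compute 1 + 1.", r"\boxed{2}")
```

Because there is only one turn, there is no tool-use loop: the episode ends as soon as the answer tool is called.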
Environment Difficulty
[Put environment difficulty statistics here]
Other Environment Requirements
No secrets are required other than an OpenReward API key.
Safety
Agents in SMT2025 solve mathematical problems in a standard environment. The environment does not present direct safety risks.
Citation
@inproceedings{balunovic2025matharena,
  title={MathArena: Evaluating LLMs on Uncontaminated Math Competitions},
  author={Balunovi{\'c}, Mislav and Dekoninck, Jasper and Petrov, Ivo and Jovanovi{\'c}, Nikola and Vechev, Martin},
  booktitle={Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks (NeurIPS)},
  year={2025}
}