SMT2025
Description
SMT2025 is an environment for evaluating mathematical reasoning on problems from the Stanford Math Tournament (SMT) 2025. Agents solve competition-level mathematics problems and submit answers in LaTeX boxed format. The environment uses a specialized grader for answer parsing and verification.
Capabilities
- Competition-level mathematical problem solving
- LaTeX answer parsing and verification
- Stanford Math Tournament problem evaluation
- Multi-step mathematical reasoning
Compute Requirements
Agents are given a standard environment with no sandbox or file system access.
Tasks
There is one split in this environment:
- test: SMT 2025 competition problems
Problems cover various mathematical topics from the Stanford Math Tournament.
Reward Structure
This is a sparse, verifiable-reward environment. The agent calls the answer tool to submit a solution:
- 1.0: Answer matches the gold answer after parsing
- 0.0: Answer is incorrect
The grader parses both model and gold answers to handle LaTeX formatting variations.
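The kind of normalization involved can be sketched as follows. This is a crude stand-in for the real grader, assuming it tolerates cosmetic LaTeX differences such as `\dfrac` vs. `\frac` and `\left`/`\right` delimiters; the function names are hypothetical.

```python
def normalize(ans: str) -> str:
    """Crude LaTeX normalization (illustrative only; the real grader
    is more thorough): drop whitespace and cosmetic commands."""
    for tok in (r"\left", r"\right", r"\!", r"\,", "$"):
        ans = ans.replace(tok, "")
    ans = ans.replace(r"\dfrac", r"\frac").replace(r"\tfrac", r"\frac")
    return "".join(ans.split())

def grade(model_answer: str, gold_answer: str) -> float:
    """Sparse reward: 1.0 on a normalized match, else 0.0."""
    return 1.0 if normalize(model_answer) == normalize(gold_answer) else 0.0
```

Under this sketch, `grade(r"\dfrac{1}{2}", r"\frac{1}{2}")` returns 1.0 even though the strings differ, while any substantive mismatch returns 0.0.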
Data
Data is sourced from the MathArena/smt_2025 HuggingFace dataset.
Tools
| Tool | Description |
|---|---|
| answer | Submit final answer (use \boxed{} format) |
Time Horizon
Single-turn. The agent receives a problem and submits one answer.
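The single-turn episode structure can be sketched as below, under the assumption that the agent reduces to a function from problem text to an answer string; the harness names are hypothetical and exact string match stands in for the real grader.

```python
def run_episode(agent, problem: str, gold: str) -> float:
    """One SMT2025-style episode: present the problem, accept exactly
    one answer submission, and return the sparse reward.
    (Sketch only; exact match stands in for the real grader.)"""
    submission = agent(problem)  # the agent's single turn
    return 1.0 if submission == gold else 0.0

# A toy agent that "solves" one arithmetic problem.
reward = run_episode(lambda p: r"\boxed{2}", "Compute 1 + 1.", r"\boxed{2}")
```

Because there is only one turn, there is no tool-use loop: the episode ends as soon as the answer tool is called.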
Environment Difficulty
[Put environment difficulty statistics here]
Other Environment Requirements
No secrets are required other than an OpenReward API key.
Safety
Agents in SMT2025 solve mathematical problems in a standard environment. The environment does not present direct safety risks.
Citation
@inproceedings{balunovic2025matharena,
  title={MathArena: Evaluating LLMs on Uncontaminated Math Competitions},
  author={Balunovi{\'c}, Mislav and Dekoninck, Jasper and Petrov, Ivo and Jovanovi{\'c}, Nikola and Vechev, Martin},
  booktitle={Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks (NeurIPS)},
  year={2025}
}