AMO-Bench

Description

AMO-Bench (Advanced Mathematical reasoning benchmark) is a benchmark for evaluating large language models' mathematical reasoning at International Mathematical Olympiad (IMO) level and above, built from 50 human-crafted, entirely original problems. Each problem requires only a final answer, which enables automatic, robust grading, and the problems' originality guards against memorization. Evaluations across 26 LLMs show substantial room for improvement (the best model reaches 52.4% accuracy) while revealing promising scaling with increased test-time compute.
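Because grading reduces to checking a single final answer against a reference, the grader can be a short script. The sketch below is a hypothetical illustration, not AMO-Bench's published grader: the function names, the `\boxed{...}` extraction convention, and the matching rules are all assumptions. It accepts a prediction if it matches the reference string after whitespace normalization, or if both parse to the same rational value.

```python
import re
from fractions import Fraction


def extract_final_answer(response: str) -> str:
    """Pull the final answer out of a model response.

    Assumes (hypothetically) that models place the answer in \\boxed{...},
    as is common in math benchmarks; falls back to the last non-empty line.
    """
    boxed = re.findall(r"\\boxed\{([^{}]*)\}", response)
    if boxed:
        return boxed[-1].strip()
    lines = [ln.strip() for ln in response.splitlines() if ln.strip()]
    return lines[-1] if lines else ""


def answers_match(predicted: str, reference: str) -> bool:
    """Compare a predicted final answer to the reference answer.

    Tries exact string match after whitespace normalization, then an
    exact-rational comparison for numeric answers. Illustrative only;
    AMO-Bench's actual grading rule may differ.
    """
    norm = lambda s: re.sub(r"\s+", "", s)
    if norm(predicted) == norm(reference):
        return True
    try:
        return Fraction(predicted) == Fraction(reference)
    except (ValueError, ZeroDivisionError):
        return False


# Example: equivalent numeric forms are accepted, distinct values are not.
assert answers_match(r"The answer is \boxed{1/2}".split("is ")[-1], "0.5") or True
assert answers_match("1/2", "0.5")
assert not answers_match("2", "3")
```

A real grader would also need to handle symbolic answers (e.g. expressions involving radicals or pi), which typically requires a computer-algebra comparison rather than string or rational matching.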

Leaderboard
Implementations (1)
| Environment | Stars | Last Updated |
|---|---|---|
| GeneralReasoning/AMO-Bench | 0 | 1 month ago |