FrontierMath
Description
FrontierMath is a benchmark of hundreds of original, expert-vetted, exceptionally challenging mathematics problems spanning the major branches of modern mathematics. It uses new, unpublished problems and automated verification to minimize data contamination. A typical problem takes an expert hours to days to solve, and state-of-the-art models solved under 2% of the problems when the benchmark was released. FrontierMath therefore provides a rigorous testbed for quantifying AI progress toward expert-level mathematical ability.
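To make the idea of automated verification concrete, here is a minimal sketch of exact-match answer checking and solve-rate computation. It is not FrontierMath's actual harness: the function names `verify` and `solve_rate` are hypothetical, and it assumes answers are rational numbers written as strings, which is a simplification of whatever answer types the real benchmark accepts.

```python
from fractions import Fraction


def verify(submitted, reference):
    """Return True only if the submitted answer equals the reference exactly.

    Assumption: both answers parse as rational numbers (e.g. "3/4", "0.75").
    Anything that fails to parse is treated as incorrect.
    """
    try:
        return Fraction(submitted) == Fraction(reference)
    except (ValueError, ZeroDivisionError, TypeError):
        return False


def solve_rate(results):
    """Fraction of problems solved, given a list of booleans."""
    return sum(results) / len(results) if results else 0.0


# Hypothetical submissions paired with hypothetical reference answers.
results = [
    verify("3/4", "0.75"),       # same value, different notation
    verify("2", "3"),            # wrong value
    verify("not a number", "1"), # unparseable submission
    verify("10", "7"),           # wrong value
]
rate = solve_rate(results)
```

Exact comparison, rather than a floating-point tolerance, means a submission is either right or wrong with no grading judgment involved, which is what lets scoring run without a human in the loop.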