imo-bench

Description

IMO-Bench (International Mathematical Olympiad Benchmark) is a suite of advanced mathematical reasoning benchmarks, vetted by top specialists, that targets IMO-level problems to push foundation models beyond easy or short-answer evaluations. It comprises three parts: IMO-AnswerBench (400 diverse Olympiad problems with verifiable short answers), IMO-ProofBench (basic and advanced proof-writing problems with detailed grading guidelines for automatic grading), and IMO-GradingBench (1,000 human gradings for validating autograders). Together they enable robust testing and automatic evaluation of long-form mathematical reasoning.

Implementations (1)
Environment: GeneralReasoning/IMO-Bench
Stars: 0
Last Updated: 1 month ago
Google/imo-bench | OpenReward