polymath

Description

PolyMath is a multilingual mathematical reasoning benchmark that spans 18 languages and four difficulty levels, ranging from easy to hard. It is built for comprehensive difficulty coverage, broad language diversity, and high-quality translations. The result is a highly discriminative testbed for evaluating both the reasoning capabilities and the language consistency of large language models.

Implementations (1)

Environment: GeneralReasoning/polymath
Stars: 0
Last updated: 1 month ago
arXiv/polymath | OpenReward