putnamgap

Name: arXiv/putnamgap
Author: arXiv

Description

PutnamGAP is a benchmark for assessing the robustness of LLMs' mathematical reasoning. It stress-tests models on competition-level math problems, pairing each original problem with multiple mathematically equivalent variants (e.g., surface renaming and parametric changes). Because the variants differ only linguistically or parametrically, differences in a model's performance across them measure its sensitivity to non-mathematical perturbations.
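
As a rough illustration of how such sensitivity could be quantified, the sketch below computes accuracy per variant type and the drop relative to the original problems. It is a minimal sketch under stated assumptions, not the benchmark's official scoring: the record fields (`problem_id`, `variant`, `correct`), the variant labels, and the function names are all hypothetical, and the data is made up for the example.

```python
from collections import defaultdict


def accuracy_by_variant(records):
    """Return {variant: accuracy} from an iterable of graded records.

    Each record is assumed (hypothetically) to be a dict with the keys
    "variant" (str) and "correct" (bool).
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for rec in records:
        totals[rec["variant"]] += 1
        hits[rec["variant"]] += int(rec["correct"])
    return {variant: hits[variant] / totals[variant] for variant in totals}


def robustness_gaps(records, baseline="original"):
    """Return {variant: baseline accuracy - variant accuracy}.

    A positive gap means the model does worse on that variant than on
    the original problems.
    """
    acc = accuracy_by_variant(records)
    if baseline not in acc:
        raise ValueError(f"no records for baseline variant {baseline!r}")
    return {v: acc[baseline] - a for v, a in acc.items() if v != baseline}


# Toy, made-up grading results: two problems, three versions of each.
records = [
    {"problem_id": "p1", "variant": "original", "correct": True},
    {"problem_id": "p2", "variant": "original", "correct": True},
    {"problem_id": "p1", "variant": "surface_renaming", "correct": True},
    {"problem_id": "p2", "variant": "surface_renaming", "correct": False},
    {"problem_id": "p1", "variant": "parametric", "correct": False},
    {"problem_id": "p2", "variant": "parametric", "correct": False},
]

gaps = robustness_gaps(records)
for variant, gap in sorted(gaps.items()):
    print(f"{variant}: accuracy drop of {gap:.2f} vs. original")
```

Reporting the gap per variant type, rather than one pooled number, keeps linguistic and parametric perturbations separate, so a model that is robust to renaming but brittle under parameter changes is not hidden by the average.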
