usaco

Description

The USACO benchmark evaluates language models on competitive programming using 307 problems from the USA Computing Olympiad, accompanied by high-quality unit tests, reference code, and official analyses. It enables systematic testing of LM inference methods and baselines, on which current models achieve low pass@1, and supports human-in-the-loop studies showing that small, targeted hints can dramatically improve model performance.
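For readers unfamiliar with the pass@1 metric mentioned above, the following is a minimal sketch of how it could be computed from per-problem test results. It assumes the commonly used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n is the number of sampled solutions for a problem and c is the number that pass every unit test. The description above does not state which estimator the benchmark uses, and the function names `pass_at_k` and `mean_pass_at_k` are hypothetical, chosen for illustration only.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: number of solutions sampled for the problem
    c: number of those solutions that pass all unit tests
    k: sample budget being evaluated (k = 1 gives pass@1)
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples fail, every size-k subset contains a passing one.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains at least one pass.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; results holds (n, c) pairs (assumed format)."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With k = 1 the estimator reduces to c / n for each problem, so the benchmark-level pass@1 under this assumption is simply the mean fraction of sampled solutions that pass all tests.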

Implementations (1)
Environment              Stars   Last Updated
GeneralReasoning/USACO   0       2 months ago