researchcodebench

Description

ResearchCodeBench is a benchmark for evaluating LLMs' ability to translate novel machine-learning research from top 2024–2025 papers into correct, executable code. It consists of 212 coding challenges drawn from cutting-edge ML papers and provides a rigorous, community-driven platform to assess model performance, contamination, and error patterns.

Implementations (1)
Environment                      Stars   Last Updated
PatrickHua/research-code-bench   1       2 months ago
arXiv/researchcodebench | OpenReward