ResearchCodeBench
Description
ResearchCodeBench is a benchmark for evaluating how well LLMs translate novel machine-learning research from top 2024–2025 papers into correct, executable code. It consists of 212 coding challenges drawn from these papers and provides a rigorous, community-driven platform for assessing model performance, data contamination, and error patterns.
Implementations (1)
| Environment | Stars | Last Updated |
|---|---|---|
| | 1 | 2 months ago |