researchcodebench

Name: arXiv/researchcodebench
Author: arXiv

Implementing Novel Machine Learning Research Code

Description

ResearchCodeBench is a benchmark that evaluates how well LLMs translate novel machine-learning research from top 2024–2025 papers into correct, executable code. It consists of 212 coding challenges drawn from those papers and provides a community-driven platform for assessing model performance, contamination, and error patterns.

Implementations (1)

Environment	Stars	Last Updated
PatrickHua/research-code-bench	1	3 months ago