ml-dev-bench

Name: arXiv/ml-dev-bench
Author: arXiv

Applied Machine Learning Development Tasks

Description

ML-Dev-Bench is a benchmark for testing agentic capabilities on applied machine learning development tasks. Its 30 tasks cover dataset handling, model training, improving existing models, debugging, and API integration with popular ML tools. The benchmark has been used to evaluate agents such as ReAct, OpenHands, and AIDE, and it is open source.

arXiv

Implementations (1)

Environment	Stars	Last Updated
GeneralReasoning/ml-dev-bench	0	3 months ago
