ml-dev-bench
Description
ML-Dev-Bench is an open-source benchmark for testing agentic capabilities on applied machine learning development tasks. Its 30 tasks cover dataset handling, model training, improving existing models, debugging, and API integration with popular ML tools. It has been used to evaluate agents such as ReAct, OpenHands, and AIDE.
Implementations (1)
| Environment | Stars | Last Updated |
|---|---|---|
| | 0 | 1 month ago |