dsbench
Description
DSBench is a comprehensive benchmark for evaluating data science agents on realistic, end-to-end tasks. It comprises 466 data analysis tasks and 74 data modeling tasks from Eloquence and Kaggle. The tasks feature long contexts, multimodal task backgrounds, and reasoning over large data files and multi-table structures, so that they better reflect real-world data science challenges.
Implementations (1)
| Environment | Stars | Last Updated |
|---|---|---|
| | 4 | 2 weeks ago |