dsbench

Description

DSBench is a comprehensive benchmark for evaluating data science agents on realistic, end-to-end tasks. It comprises 466 data analysis tasks and 74 data modeling tasks drawn from Eloquence and Kaggle. The tasks feature long contexts, multimodal task backgrounds, large data files, and multi-table structures that agents must reason over, so they better reflect real-world data science challenges.

Implementations (1)
Environment | Stars | Last Updated
Liqiang/DSBench | 4 | 2 weeks ago
TencentAILab/dsbench | OpenReward