kramabench

Description

KRAMABENCH is a benchmark that evaluates how well AI systems can design and execute real-world data science pipelines end to end. Starting from a high-level task, a system must handle data discovery, wrangling and cleaning, efficient processing, statistical reasoning, and orchestration of the processing steps. The benchmark consists of 104 manually curated real-world data science pipelines spanning 1,700 data files from 24 data sources across 6 domains.
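To make the stages named above concrete, here is a minimal sketch of a toy pipeline that goes from a high-level question ("what is the mean temperature across station files?") to an answer. It is not taken from the benchmark: the in-memory `DATA_LAKE`, the file names, the column name, and every function here are hypothetical, chosen only to illustrate discovery, wrangling, cleaning, and a final statistical step.

```python
import csv
import io
import statistics

# Hypothetical in-memory "data lake": file name -> raw CSV text.
# These files and values are invented for illustration only.
DATA_LAKE = {
    "stations_2020.csv": "station,temp_c\nA,21.5\nB,\nC,19.0\n",
    "stations_2021.csv": "station,temp_c\nA,22.0\nB,n/a\nC,20.5\n",
    "readme.txt": "not a data file",
}


def discover(lake, keyword):
    """Data discovery: select CSV files whose name mentions the keyword."""
    return sorted(
        name for name in lake if keyword in name and name.endswith(".csv")
    )


def load(lake, name):
    """Wrangling: parse raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(lake[name])))


def clean(rows, column):
    """Cleaning: keep only the values in `column` that parse as floats."""
    values = []
    for row in rows:
        try:
            values.append(float(row[column]))
        except (TypeError, ValueError):
            # Skip missing or malformed entries such as "" or "n/a".
            continue
    return values


def run_pipeline(lake, keyword, column):
    """Orchestration: chain discovery, wrangling, cleaning, and a statistic."""
    values = []
    for name in discover(lake, keyword):
        values.extend(clean(load(lake, name), column))
    return statistics.mean(values)


if __name__ == "__main__":
    print(run_pipeline(DATA_LAKE, "stations", "temp_c"))
```

In this toy setting every stage is a few lines. The description above suggests that the real tasks are harder mainly because the relevant files must be found among many and the cleaning rules are not given in advance.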

Implementations (1)
Environment                  Stars  Last Updated
GeneralReasoning/kramabench  0      1 month ago