dacomp
Description
DAComp is a benchmark of 210 tasks that mirrors complex enterprise data intelligence workflows. It combines two task types: repository-level data engineering tasks, which involve designing and building multi-stage SQL pipelines and evolving existing systems on industrial schemas, and open-ended data analysis tasks, which demand strategic planning, iterative exploratory coding, interpretation of intermediate results, and synthesis of actionable recommendations. Engineering tasks are evaluated with execution-based, multi-metric scoring, while analysis tasks are scored by an experimentally validated LLM judge guided by hierarchical rubrics. The resulting success rates are low and expose distinct deficiencies in pipeline orchestration and in open-ended reasoning.
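As a rough illustration of what rubric-guided scoring can look like, the sketch below aggregates leaf scores up a weighted rubric tree. It is not DAComp's actual scoring code: the `RubricNode` class, the `aggregate` function, the criterion names, the weights, and the scores are all hypothetical, and weighted averaging is only one plausible way to combine a hierarchical rubric.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RubricNode:
    """One criterion in a hypothetical hierarchical rubric."""
    name: str
    weight: float = 1.0
    score: Optional[float] = None  # leaf score in [0, 1], e.g. assigned by a judge
    children: List["RubricNode"] = field(default_factory=list)


def aggregate(node: RubricNode) -> float:
    """Return the weighted average of child scores, recursing down to the leaves."""
    if not node.children:
        if node.score is None:
            raise ValueError(f"leaf criterion {node.name!r} has no score")
        return node.score
    total_weight = sum(child.weight for child in node.children)
    return sum(child.weight * aggregate(child) for child in node.children) / total_weight


# Hypothetical rubric for one open-ended analysis task (all values are made up).
rubric = RubricNode("analysis_task", children=[
    RubricNode("planning", weight=1.0, score=0.5),
    RubricNode("analysis", weight=2.0, children=[
        RubricNode("correctness", score=1.0),
        RubricNode("interpretation", score=0.5),
    ]),
    RubricNode("recommendations", weight=1.0, score=0.0),
])

print(aggregate(rubric))
```

In this made-up example the `analysis` branch averages to 0.75, and the weights bring the overall score to 0.5.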
Implementations (2)
| Environment | Stars | Last Updated | |
|---|---|---|---|
| | 0 | 2 months ago | |
| | 0 | 2 months ago | |