DAComp

Description

DAComp is a benchmark of 210 tasks that mirrors complex enterprise data intelligence workflows. It combines two task families. Repository-level data engineering tasks require designing and building multi-stage SQL pipelines and evolving existing systems on industrial schemas. Open-ended data analysis tasks demand strategic planning, iterative exploratory coding, interpretation of intermediate results, and synthesis of actionable recommendations. Engineering tasks are scored by execution, using multiple metrics, while analysis tasks are graded by an experimentally validated LLM judge guided by hierarchical rubrics. Results show low success rates, exposing distinct deficiencies in pipeline orchestration and open-ended reasoning.
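The description does not specify how DAComp's hierarchical rubrics are combined into a single score. As an illustration only, the sketch below assumes a tree of weighted criteria: each leaf holds an LLM-judge score in [0, 1], and each parent takes the weighted average of its children. The `RubricNode` class, the criterion names, the weights, and the averaging rule are all hypothetical and are not DAComp's actual rubric format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RubricNode:
    """One criterion in a hypothetical hierarchical rubric.

    Leaves carry a judge-assigned score in [0, 1]; internal nodes
    combine their children's scores as a weighted average.
    """
    name: str
    weight: float = 1.0
    score: Optional[float] = None  # set only on leaf criteria
    children: List["RubricNode"] = field(default_factory=list)

    def aggregate(self) -> float:
        # Leaf: return the judge's score directly.
        if not self.children:
            if self.score is None:
                raise ValueError(f"leaf {self.name!r} has no score")
            return self.score
        # Internal node: weighted average of the children's aggregates.
        total = sum(c.weight for c in self.children)
        if total <= 0:
            raise ValueError(f"node {self.name!r} has non-positive total weight")
        return sum(c.weight * c.aggregate() for c in self.children) / total


# Hypothetical rubric for one analysis task.
rubric = RubricNode("analysis", children=[
    RubricNode("planning", weight=1.0, score=0.6),
    RubricNode("insights", weight=2.0, children=[
        RubricNode("correctness", score=1.0),
        RubricNode("actionability", score=0.5),
    ]),
])

print(round(rubric.aggregate(), 3))  # (0.6 + 2 * 0.75) / 3 -> 0.7
```

A weighted average keeps every level of the tree on the same [0, 1] scale, so sub-rubrics can be nested to any depth; a real rubric might instead use gating criteria or sums of points.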

Implementations (2)

Environment                   Stars   Last Updated
GeneralReasoning/DAComp-DE    0       2 months ago
GeneralReasoning/DAComp-DA    0       2 months ago
arXiv/dacomp | OpenReward