Implementation of arXiv/dacomp

DAComp-DE

Description

DAComp-DE (Data Agent Competition — Data Engineering) is an environment for evaluating AI agents on multi-stage data engineering tasks. Agents build, extend, or design dbt-style SQL pipelines.

DE-Impl (30 tasks): Build a complete SQL pipeline from scratch (staging → intermediate → marts).
DE-Evol (50 tasks): Modify or extend an existing pipeline to meet new requirements.
DE-Arch (30 tasks): Design a comprehensive data architecture blueprint in YAML.
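
As a rough illustration of the staging → intermediate → marts layering named above, the sketch below builds a three-layer pipeline in plain SQL. It is not taken from any DAComp-DE task: the table names, columns, and data are hypothetical, and Python's built-in `sqlite3` stands in for DuckDB so the example runs without extra dependencies.

```python
import sqlite3

# In-memory SQLite stands in for DuckDB; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_orders (order_id INTEGER, customer TEXT, amount_cents TEXT, status TEXT);
INSERT INTO raw_orders VALUES
    (1, ' alice ', '1250', 'paid'),
    (2, 'bob',     '800',  'refunded'),
    (3, 'alice',   '450',  'paid');

-- Staging: clean and type the raw source.
CREATE VIEW stg_orders AS
SELECT order_id,
       TRIM(customer) AS customer,
       CAST(amount_cents AS INTEGER) AS amount_cents,
       status
FROM raw_orders;

-- Intermediate: apply business logic shared by downstream models.
CREATE VIEW int_paid_orders AS
SELECT order_id, customer, amount_cents
FROM stg_orders
WHERE status = 'paid';

-- Marts: final, analysis-ready table.
CREATE TABLE mart_customer_revenue AS
SELECT customer,
       COUNT(*) AS paid_orders,
       SUM(amount_cents) AS revenue_cents
FROM int_paid_orders
GROUP BY customer;
""")

rows = conn.execute(
    "SELECT customer, paid_orders, revenue_cents "
    "FROM mart_customer_revenue ORDER BY customer"
).fetchall()
print(rows)
```

Each layer reads only from the layer before it, which is the property that lets a grader inspect staging, intermediate, and mart tables separately.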

Capabilities

SQL pipeline construction (DuckDB, dbt-style layers)
Repository exploration and modification
Data architecture design (YAML blueprints)
Python scripting and data tooling

Compute Requirements

Sandbox: 2 vCPUs / 4 GB memory per session
LLM evaluation: OpenAI API access (gpt-5-mini) for DE-Arch scoring only

License

MIT License

Tasks

Sub-type	Split	Count	Description
DE-Impl	test	30	Build SQL pipeline from scratch
DE-Evol	test	50	Extend existing SQL pipeline
DE-Arch	test	30	Design architecture blueprint

Reward Structure

DE-Impl/Evol (Deterministic, 0–100 scale)

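Each output table is compared against the gold DuckDB database by row-hash multiset comparison (same rows with the same multiplicities, in any order). Layer results are combined with these weights: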

Staging: 15%
Intermediate: 25%
Marts: 60%
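
The scoring idea above can be sketched roughly as follows. This is a minimal illustration, not the environment's actual grader: the function names are hypothetical, tables are plain lists of tuples rather than DuckDB relations, and the within-layer aggregation (fraction of gold tables matched exactly) is an assumption, since only the layer weights are stated here.

```python
import hashlib
from collections import Counter

# Layer weights in percent, as listed above.
LAYER_WEIGHTS = {"staging": 15, "intermediate": 25, "marts": 60}

def row_hash(row):
    """Hash one row; repr() keeps values such as 1 and '1' distinct."""
    return hashlib.sha256(repr(tuple(row)).encode("utf-8")).hexdigest()

def tables_match(pred_rows, gold_rows):
    """True when both tables hold the same rows with the same multiplicities."""
    return Counter(map(row_hash, pred_rows)) == Counter(map(row_hash, gold_rows))

def layer_weighted_score(pred, gold):
    """Score on a 0-100 scale; pred and gold map layer -> {table_name: rows}.

    Assumption: a layer's score is the fraction of its gold tables that the
    prediction reproduces exactly; a missing table counts as a mismatch.
    """
    score = 0.0
    for layer, weight in LAYER_WEIGHTS.items():
        gold_tables = gold.get(layer, {})
        if not gold_tables:
            continue
        pred_tables = pred.get(layer, {})
        matched = sum(
            name in pred_tables and tables_match(pred_tables[name], rows)
            for name, rows in gold_tables.items()
        )
        score += weight * matched / len(gold_tables)
    return score

gold = {
    "staging": {"stg_orders": [(1, "a"), (2, "b")]},
    "intermediate": {"int_orders": [(1, "a")]},
    "marts": {"mart_a": [("a", 10)], "mart_b": [("b", 5), ("b", 5)]},
}
pred = {
    "staging": {"stg_orders": [(2, "b"), (1, "a")]},       # same rows, other order
    "intermediate": {"int_orders": [(1, "x")]},            # wrong value
    "marts": {"mart_a": [("a", 10)], "mart_b": [("b", 5)]},  # duplicate row missing
}
print(layer_weighted_score(pred, gold))
```

In this toy case staging matches despite the row order, the intermediate table fails, and one of two marts fails because a multiset comparison counts duplicate rows.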

DE-Arch (LLM-judged, 0–100 scale)

An LLM judge scores the submitted YAML blueprint against the task's rubric, using evidence-based scoring.

Data

Source: Hugging Face (dacomp-de, dacomp-de-gold)
DE: 110 task repositories, 80 gold DuckDB databases, 30 architecture rubrics

Tools

Tool	Description
`bash`	Execute bash commands in the sandbox (Python, SQL, DuckDB, file I/O)
`submit`	Submit work for evaluation (a YAML blueprint for DE-Arch; for DE-Impl/Evol, submission triggers a pipeline run)

Time Horizon

Multi-turn. DE-Impl: 20–50 tool calls, DE-Evol: 10–30, DE-Arch: 5–15.

Environment Difficulty

Even state-of-the-art agents achieve success rates under 20% on DE-Impl/Evol.

Other Environment Requirements

OpenAI API key for DE-Arch LLM evaluation
OpenReward API key for sandbox access

Safety

Tasks use synthetic or public data engineering schemas and contain no sensitive personal data. Sandboxes are network-isolated.

Citations

@misc{lei2025dacomp,
      title={DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle},
      author={Fangyu Lei and Jinxiang Meng and Yiming Huang and Junjie Zhao and Yitong Zhang and Jianwen Luo and Xin Zou and Ruiyi Yang and Wenbo Shi and Yan Gao and Shizhu He and Zuo Wang and Qian Liu and Yang Wang and Ke Wang and Jun Zhao and Kang Liu},
      year={2025},
      eprint={2512.04324},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.04324},
}

Repository

Source repository: EnvCommons/DAComp-DE

Compute Configuration

Resource allocation for this environment.

Component	Configuration
Environment Server	1 vCPU / 4 GB RAM
Sandbox Machine	2 vCPUs / 4 GB RAM

Estimated Cost

Pay per second of active session usage. Billing starts when your session begins and stops when it ends.

Component	Cost / second
Environment	$0.0000320
Sandbox	$0.0000460
Total	$0.0000780

Examples

5-minute session: $0.0234

1-hour session: $0.2808
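
The example figures follow directly from the per-second rates in the cost table. A small sketch of the arithmetic, assuming billing is simply rate times session seconds with no minimums or rounding rules (the function name is hypothetical):

```python
from decimal import Decimal

# Per-second rates from the cost table.
RATES_PER_SECOND = {
    "environment": Decimal("0.0000320"),
    "sandbox": Decimal("0.0000460"),
}

def session_cost(seconds):
    """Total cost in USD for a session of the given length in seconds."""
    return sum(RATES_PER_SECOND.values()) * seconds

print(f"5-minute session: ${session_cost(5 * 60):.4f}")
print(f"1-hour session:   ${session_cost(60 * 60):.4f}")
```

`Decimal` is used instead of floats so the small per-second rates add up exactly.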
