Implementation of arXiv/dacomp

DAComp-DA

Description

DAComp-DA (Data Agent Competition — Data Analysis) is an environment for evaluating AI agents on open-ended data analysis tasks. Agents analyze SQLite databases and produce markdown reports with visualizations, evaluated by LLM judges on rubric fidelity, readability, analytical depth, and visualization quality.

Capabilities

SQL querying and database exploration
Python-based data analysis (pandas, numpy, scipy, scikit-learn)
Data visualization (matplotlib, seaborn)
Report writing (Markdown)
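
A first exploration step might look like the sketch below, which lists every table in a SQLite database with its columns and row count using only Python's standard `sqlite3` module. The helper name `summarize_database` and the example schema are illustrative assumptions, not part of the environment.

```python
import sqlite3


def summarize_database(conn: sqlite3.Connection) -> dict[str, dict]:
    """Return column names and row counts for every user table."""
    summary = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Quote the identifier so unusual table names do not break the query.
        quoted = '"' + table.replace('"', '""') + '"'
        columns = [row[1] for row in conn.execute(f"PRAGMA table_info({quoted})")]
        (row_count,) = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()
        summary[table] = {"columns": columns, "rows": row_count}
    return summary


if __name__ == "__main__":
    # In-memory stand-in for a task database; this schema is invented.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders (region, amount) VALUES (?, ?)",
        [("north", 120.0), ("south", 80.5), ("north", 42.0)],
    )
    print(summarize_database(conn))
```

Against a real task database, the same function would take `sqlite3.connect(path)` for whatever path the task provides.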

Compute Requirements

Sandbox: 2 CPU / 4GB memory per session
LLM evaluation: OpenAI API access (gpt-5-mini) for rubric/GSB scoring

License

MIT License

Tasks

Split	Count	Description
test	100	Open-ended data analysis with SQLite databases

Reward Structure

LLM-judged, 0–100 scale:

Rubric scoring (60%): LLM evaluates the report against task-specific rubrics with dimensions: Completeness, Accuracy, Conclusiveness.
GSB readability (10%): Pairwise comparison against 5 reference reports on readability.
GSB professionalism (10%): Pairwise comparison on analytical depth and professionalism.
GSB visualization (20%): Pairwise comparison on visualization quality (requires images).

Each raw GSB score is threshold-mapped (below -3 → -1; from -3 to 3 inclusive → 0; above 3 → +1). The mapped values are then averaged, and the average is clamped to [0, 1].
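
The weighting and threshold mapping can be sketched as follows. This is an illustration, not the environment's grader: it assumes the rubric result is already normalized to a fraction in [0, 1] and that the four components combine as a plain weighted sum scaled to 100. All function names are hypothetical.

```python
# Weights taken from the reward description; they sum to 1.0.
WEIGHTS = {
    "rubric": 0.6,
    "readability": 0.1,
    "professionalism": 0.1,
    "visualization": 0.2,
}


def map_gsb(raw: float) -> int:
    """Threshold-map one raw GSB score to -1, 0, or +1."""
    if raw < -3:
        return -1
    if raw > 3:
        return 1
    return 0


def gsb_component(raw_scores: list[float]) -> float:
    """Average the mapped scores, then clamp the average to [0, 1]."""
    mapped = [map_gsb(r) for r in raw_scores]
    averaged = sum(mapped) / len(mapped)
    return min(1.0, max(0.0, averaged))


def total_score(
    rubric_fraction: float,  # assumed to be normalized to [0, 1]
    readability_raw: list[float],
    professionalism_raw: list[float],
    visualization_raw: list[float],
) -> float:
    """Combine the four components into a 0-100 score (assumed weighted sum)."""
    combined = (
        WEIGHTS["rubric"] * rubric_fraction
        + WEIGHTS["readability"] * gsb_component(readability_raw)
        + WEIGHTS["professionalism"] * gsb_component(professionalism_raw)
        + WEIGHTS["visualization"] * gsb_component(visualization_raw)
    )
    return 100.0 * combined
```

Because of the clamp, a report that loses most comparisons contributes 0 for that component rather than a negative amount.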

Data

Source: HuggingFace (dacomp-da)
100 SQLite databases (~6GB total), 100 evaluation rubrics with 5 reference reports each

Tools

Tool	Description
`bash`	Execute bash commands in the sandbox (Python, SQL, file I/O)
`submit`	Submit a markdown report with optional image paths for grading
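
Before calling `submit`, an agent needs a Markdown report and the paths of any images it produced. The sketch below shows one way to assemble both. The helper `build_report` and its arguments are hypothetical, and how `submit` actually receives the report and image paths is not specified here.

```python
from pathlib import Path


def build_report(title: str, sections: dict[str, str], image_paths: list[str]) -> str:
    """Assemble a Markdown report that embeds each image by its path."""
    lines = [f"# {title}", ""]
    for heading, body in sections.items():
        lines += [f"## {heading}", "", body.strip(), ""]
    if image_paths:
        lines += ["## Figures", ""]
        for path in image_paths:
            # Use the file stem as alt text, e.g. "revenue_by_region".
            lines += [f"![{Path(path).stem}]({path})", ""]
    # End with exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    report = build_report(
        "Regional Sales Analysis",  # invented example content
        {"Findings": "North outsells South.", "Conclusion": "Focus on North."},
        ["charts/revenue_by_region.png"],
    )
    print(report)
```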

Time Horizon

Multi-turn. Tasks typically require 10–30 tool calls (exploration → analysis → visualization → report).

Environment Difficulty

Even state-of-the-art agents achieve average scores below 40/100.

Other Environment Requirements

OpenAI API key for LLM evaluation (rubric/GSB scoring)

Safety

Tasks analyze synthetic or public business data and involve no sensitive personal data. Sandboxes are network-isolated.

Citations

@misc{lei2025dacomp,
      title={DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle},
      author={Fangyu Lei and Jinxiang Meng and Yiming Huang and Junjie Zhao and Yitong Zhang and Jianwen Luo and Xin Zou and Ruiyi Yang and Wenbo Shi and Yan Gao and Shizhu He and Zuo Wang and Qian Liu and Yang Wang and Ke Wang and Jun Zhao and Kang Liu},
      year={2025},
      eprint={2512.04324},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.04324},
}

Repository

Source repository

EnvCommons/DAComp-DA

Compute Configuration

Resource allocation for this environment.

Component	Configuration
Environment Server	1 vCPU / 4 GB RAM
Sandbox Machine	2 vCPUs / 4 GB RAM

Estimated Cost

Pay per second of active session usage. Billing starts when your session begins and stops when it ends.

Component	Cost / second
Environment	$0.0000320
Sandbox	$0.0000460
Total	$0.0000780

Examples

5-minute session: $0.0234

1-hour session: $0.2808
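
The example figures follow from multiplying the combined per-second rate by the session length. Here is a minimal sketch of that arithmetic, using `Decimal` to avoid floating-point rounding. The function name `session_cost` is illustrative.

```python
from decimal import Decimal

ENVIRONMENT_RATE = Decimal("0.0000320")  # USD per second
SANDBOX_RATE = Decimal("0.0000460")      # USD per second


def session_cost(seconds: int) -> Decimal:
    """Cost in USD of one session of the given length, billed per second."""
    return (ENVIRONMENT_RATE + SANDBOX_RATE) * seconds


if __name__ == "__main__":
    print(f"5-minute session: ${session_cost(5 * 60):.4f}")
    print(f"1-hour session:   ${session_cost(60 * 60):.4f}")
```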
