DAComp-DA

OpenReward Environment

Description

DAComp-DA (Data Agent Competition — Data Analysis) is an environment for evaluating AI agents on open-ended data analysis tasks. Agents analyze SQLite databases and produce markdown reports with visualizations, evaluated by LLM judges on rubric fidelity, readability, analytical depth, and visualization quality.

Capabilities

  • SQL querying and database exploration
  • Python-based data analysis (pandas, numpy, scipy, scikit-learn)
  • Data visualization (matplotlib, seaborn)
  • Report writing (Markdown)

Compute Requirements

  • Sandbox: 2 CPUs / 4 GB memory per session
  • LLM evaluation: OpenAI API access (gpt-5-mini) for rubric/GSB scoring

License

MIT License

Tasks

| Split | Count | Description |
| --- | --- | --- |
| test | 100 | Open-ended data analysis with SQLite databases |

Reward Structure

Rewards are LLM-judged on a 0–100 scale and combine four weighted components:

  • Rubric scoring (60%): An LLM evaluates the report against task-specific rubrics along three dimensions: Completeness, Accuracy, and Conclusiveness.
  • GSB readability (10%): Pairwise comparison against 5 reference reports on readability.
  • GSB professionalism (10%): Pairwise comparison on analytical depth and professionalism.
  • GSB visualization (20%): Pairwise comparison on visualization quality (requires images).
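
The weighting above can be sketched as a simple weighted sum. This is an illustration only, not the environment's actual scoring code: it assumes each component has already been normalized to [0, 1] and that the final reward is the weighted sum scaled to 0–100. The names `WEIGHTS` and `combine_reward` are hypothetical.

```python
# Weights come from the component list above; everything else is an assumption.
WEIGHTS = {
    "rubric": 0.60,
    "gsb_readability": 0.10,
    "gsb_professionalism": 0.10,
    "gsb_visualization": 0.20,
}


def combine_reward(components: dict[str, float]) -> float:
    """Combine per-component scores in [0, 1] into a single 0-100 reward."""
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")

    total = 0.0
    for name, weight in WEIGHTS.items():
        score = components[name]
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {score}")
        total += weight * score
    return 100.0 * total


if __name__ == "__main__":
    example = {
        "rubric": 0.5,
        "gsb_readability": 1.0,
        "gsb_professionalism": 0.0,
        "gsb_visualization": 1.0,
    }
    print(round(combine_reward(example), 2))
```

Under these assumptions, a report with a perfect rubric score but no GSB wins would top out at 60, which shows how much of the reward depends on the pairwise comparisons.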

GSB raw scores are threshold-mapped (below -3 → -1; from -3 to 3 inclusive → 0; above 3 → +1), then averaged and clamped to [0, 1].
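
The GSB post-processing described above might look like the following. This is a sketch under stated assumptions: there is one raw score per reference-report comparison, and the bounds at -3 and 3 are inclusive, as the interval notation suggests. `map_gsb` and `gsb_component` are hypothetical names, not part of the environment's API.

```python
def map_gsb(raw: float) -> int:
    """Threshold-map one raw GSB score to -1, 0, or +1."""
    if raw < -3:
        return -1
    if raw > 3:
        return 1
    return 0  # raw lies in the closed interval [-3, 3]


def gsb_component(raw_scores: list[float]) -> float:
    """Map each raw score, average the results, and clamp the mean to [0, 1]."""
    if not raw_scores:
        raise ValueError("at least one raw score is required")
    mapped = [map_gsb(raw) for raw in raw_scores]
    mean = sum(mapped) / len(mapped)
    return min(1.0, max(0.0, mean))


if __name__ == "__main__":
    # Assumed: five comparisons, one per reference report.
    print(gsb_component([5, 4, 0, -5, 6]))    # 0.4
    print(gsb_component([-5, -4, 0, 0, 2]))   # 0.0
```

The clamp means a report that loses most comparisons scores 0 on that dimension, no lower, so GSB components cannot subtract from the rubric score.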

Data

  • Source: HuggingFace (dacomp-da)
  • 100 SQLite databases (~6 GB total), 100 evaluation rubrics with 5 reference reports each

Tools

| Tool | Description |
| --- | --- |
| bash | Execute bash commands in the sandbox (Python, SQL, file I/O) |
| submit | Submit a markdown report with optional image paths for grading |

Time Horizon

Multi-turn. Tasks typically require 10–30 tool calls (exploration → analysis → visualization → report).
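
As a rough illustration of the exploration → analysis → report steps, the sketch below uses only Python's standard `sqlite3` module against a small in-memory database. The `orders` table, its columns, and all function names are invented for the example; real task databases have their own schemas, and the visualization step (for example with matplotlib) is left out to keep the sketch dependency-free.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory database standing in for a task database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("north", 120.0), ("south", 80.0), ("north", 30.0)],
    )
    return conn


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Exploration: discover which tables exist."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def revenue_by_region(conn: sqlite3.Connection) -> list[tuple[str, float]]:
    """Analysis: aggregate the hypothetical orders table by region."""
    return conn.execute(
        "SELECT region, SUM(amount) AS revenue "
        "FROM orders GROUP BY region ORDER BY revenue DESC"
    ).fetchall()


def build_report(rows: list[tuple[str, float]]) -> str:
    """Report: render the aggregate as a Markdown table."""
    lines = ["# Revenue by region", "", "| Region | Revenue |", "| --- | --- |"]
    for region, revenue in rows:
        lines.append(f"| {region} | {revenue:.2f} |")
    return "\n".join(lines)


if __name__ == "__main__":
    conn = make_demo_db()
    print(list_tables(conn))
    print(build_report(revenue_by_region(conn)))
```

In the environment itself, each of these steps would typically be one or more `bash` tool calls, with the final Markdown passed to `submit`.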

Environment Difficulty

Even state-of-the-art agents achieve average scores below 40/100.

Other Environment Requirements

  • OpenAI API key for LLM evaluation (rubric/GSB scoring)

Safety

Tasks involve analysis of synthetic/public business data. No sensitive personal data. Sandboxes are network-isolated.

Citations

@misc{lei2025dacomp,
      title={DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle},
      author={Fangyu Lei and Jinxiang Meng and Yiming Huang and Junjie Zhao and Yitong Zhang and Jianwen Luo and Xin Zou and Ruiyi Yang and Wenbo Shi and Yan Gao and Shizhu He and Zuo Wang and Qian Liu and Yang Wang and Ke Wang and Jun Zhao and Kang Liu},
      year={2025},
      eprint={2512.04324},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.04324},
}