discoverybench
Description
DiscoveryBench is the first comprehensive benchmark that formalizes and evaluates the multi-step process of data-driven discovery using large language models. It comprises 264 tasks across six diverse domains, derived from published workflows; each of these tasks is defined by a dataset, its metadata, and a natural-language discovery goal. A further 903 synthetic tasks allow controlled evaluation across levels of complexity. A structured, facet-based evaluation helps analyze failure modes.
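The task definition in the description (a dataset, metadata, and a natural-language discovery goal) can be pictured as a small record type. The following is a minimal sketch only: the class `DiscoveryTask`, its field names, and the helper `count_by_origin` are hypothetical illustrations, not the benchmark's actual schema or API.

```python
from dataclasses import dataclass


@dataclass
class DiscoveryTask:
    """Hypothetical container for one task; NOT the benchmark's real schema."""
    dataset_path: str        # where the task's data lives (assumed field name)
    metadata: dict           # e.g. column descriptions (assumed structure)
    goal: str                # natural-language discovery goal
    domain: str = "unknown"  # domain label (assumed field)
    synthetic: bool = False  # True for synthetic tasks (assumed flag)


def count_by_origin(tasks):
    """Return (real, synthetic) counts for a list of DiscoveryTask objects."""
    synthetic = sum(1 for t in tasks if t.synthetic)
    return len(tasks) - synthetic, synthetic
```

A harness built along these lines could filter tasks by `domain` or by the `synthetic` flag before scoring, which is one way to keep results on workflow-derived and synthetic tasks separate.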
Implementations (1)
| Environment | Stars | Last Updated |
|---|---|---|
| | 0 | 2 months ago |