astabench
Description
AstaBench is a benchmark suite for evaluating how well AI agents perform scientific research. It comprises more than 2,400 problems spanning the entire scientific discovery process across multiple domains, including tasks inspired by real user requests to deployed Asta agents. It also includes the first production-grade scientific research environment with reproducible search tools and standardized interfaces, along with a comprehensive set of nine science-optimized agent classes and baselines, to enable controlled, confounder-aware comparisons.
Implementations (1)
| Environment | Stars | Last Updated |
|---|---|---|
| | 0 | 2 months ago |