astabench

Description

AstaBench is a benchmark suite for evaluating how well AI agents perform scientific research. It comprises more than 2,400 problems that span the entire scientific discovery process across multiple domains, including tasks inspired by real user requests to deployed Asta agents. The suite also provides the first production-grade scientific research environment, with reproducible search tools and standardized interfaces. It further includes a comprehensive set of nine science-optimized agent classes and baselines, which enable controlled, confounder-aware comparisons.

Implementations (1)

Environment                  Stars  Last Updated
GeneralReasoning/E2EBench    0      2 months ago