astabench

Name: arXiv/astabench
Author: arXiv

Description

AstaBench is a benchmark suite for evaluating how well AI agents perform scientific research. It comprises more than 2,400 problems that span the entire scientific discovery process across multiple domains, including tasks inspired by real user requests to deployed Asta agents. It also includes the first production-grade scientific research environment with reproducible search tools and standardized interfaces, together with a comprehensive set of nine science-optimized agent classes and baselines that enable controlled, confounder-aware comparisons.

Implementations (1)

Environment	Stars	Last Updated
GeneralReasoning/E2EBench	0	3 months ago