replicationbench

Name: arXiv/replicationbench
Author: arXiv

Replication of Astrophysics Research Papers

Description

ReplicationBench is an evaluation framework that tests whether AI agents, acting as scientific research assistants, can faithfully and correctly replicate entire astrophysics research papers. Each paper is split into tasks, co-developed with the paper's authors, that target its core contributions: experimental setup, derivations, data analysis, and codebase. This decomposition enables objective assessment of both faithfulness to the original methods and technical correctness.

Implementations (1)

Environment	Stars	Last Updated
GeneralReasoning/replicationbench	0	3 months ago