replicationbench

Description

ReplicationBench is an evaluation framework that tests whether AI agents, acting as scientific research assistants, can faithfully and correctly replicate entire astrophysics research papers. It splits each paper into tasks, co-developed with the paper's authors, that target its core contributions (experimental setup, derivations, data analysis, and codebase). This allows objective assessment of both faithfulness to the original methods and technical correctness.
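The task-based structure described above can be pictured with a small sketch. This is only an illustration under stated assumptions: the `Task` schema, its field names, the absolute-tolerance check, and the `paper_score` aggregation are all hypothetical and are not taken from the ReplicationBench codebase. Only the four contribution types come from the description.

```python
from dataclasses import dataclass
from enum import Enum


class Contribution(Enum):
    # The four kinds of core contribution named in the description.
    EXPERIMENTAL_SETUP = "experimental setup"
    DERIVATION = "derivation"
    DATA_ANALYSIS = "data analysis"
    CODEBASE = "codebase"


@dataclass(frozen=True)
class Task:
    # Hypothetical schema: these field names are illustrative assumptions.
    task_id: str
    contribution: Contribution
    expected: float   # reference value reported in the paper
    tolerance: float  # absolute tolerance a grader would accept


def is_correct(task: Task, answer: float) -> bool:
    """Return True when the answer lies within the task's tolerance."""
    return abs(answer - task.expected) <= task.tolerance


def paper_score(tasks: list[Task], answers: dict[str, float]) -> float:
    """Fraction of tasks answered within tolerance.

    Tasks with no submitted answer count as incorrect.
    """
    if not tasks:
        return 0.0
    passed = sum(
        1
        for t in tasks
        if t.task_id in answers and is_correct(t, answers[t.task_id])
    )
    return passed / len(tasks)


# Made-up example values, purely for illustration.
example_tasks = [
    Task("t1", Contribution.DATA_ANALYSIS, expected=0.70, tolerance=0.05),
    Task("t2", Contribution.DERIVATION, expected=1.5, tolerance=0.1),
]
example_answers = {"t1": 0.72, "t2": 2.0}
example_score = paper_score(example_tasks, example_answers)
```

In this sketch a paper's score is simply the share of its tasks reproduced within tolerance, so an agent that matches one of two tasks scores 0.5. An actual grader may weight tasks differently or also assess the method used, not only the final number.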

Implementations (1)
Environment                         Stars   Last Updated
GeneralReasoning/replicationbench   0       1 month ago
arXiv/replicationbench | OpenReward