ReplicationBench
Description
ReplicationBench is an evaluation framework that tests whether AI agents, acting as scientific research assistants, can faithfully and correctly replicate entire astrophysics research papers. It splits each paper into tasks, co-developed with the paper's original authors, that target its core contributions: experimental setup, derivations, data analysis, and codebase. This enables objective assessment of both faithfulness to the original methods and technical correctness.
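As a rough illustration of what objective, per-task assessment can look like, the following minimal sketch assumes each task reduces to a single numeric answer that is checked against the value reported in the original paper within a relative tolerance. The `Task` fields and the `grade` and `paper_score` functions are hypothetical names chosen for illustration; they are not ReplicationBench's actual task schema or grader.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Task:
    # Hypothetical schema for illustration only; not ReplicationBench's real format.
    task_id: str
    paper_id: str
    category: str    # e.g. "experimental setup", "derivation", "data analysis", "codebase"
    expected: float  # assumed numeric reference value taken from the original paper
    rel_tol: float   # assumed relative tolerance accepted as a correct replication


def grade(task: Task, answer: Optional[float]) -> bool:
    """Return True if the agent's answer matches the reference within the relative tolerance."""
    if answer is None:
        return False  # an unanswered task counts as failed
    return math.isclose(answer, task.expected, rel_tol=task.rel_tol)


def paper_score(tasks: List[Task], answers: Dict[str, float]) -> float:
    """Fraction of a paper's tasks that the agent replicated correctly (0.0 if no tasks)."""
    if not tasks:
        return 0.0
    passed = sum(grade(t, answers.get(t.task_id)) for t in tasks)
    return passed / len(tasks)
```

With made-up values, an answer of 2.05 for a task expecting 2.0 at a 5% tolerance passes, while 11.0 for a task expecting 10.0 at a 1% tolerance fails, so `paper_score` returns 0.5 for that pair. Scoring per task rather than per paper is what lets partial replications receive partial credit in this sketch.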
Implementations (1)
| Environment | Stars | Last Updated |
|---|---|---|
| | 0 | 1 month ago |