gravity-bench-v1
Description
Gravity-Bench-v1 is an environment-based benchmark for evaluating AI agents' scientific discovery abilities by challenging them to uncover gravitational physics concealed within rigorous dynamic simulations, including out-of-distribution cases that deviate from real-world physics. Agents must plan experiments under a limited data-collection budget and perform dynamic data analysis and reasoning to solve tasks efficiently, with reference solutions provided for calibration.
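To make the budgeted-observation idea concrete, here is a minimal sketch. It is not the benchmark's actual API: `BudgetedBinary`, `observe`, `estimate_total_mass`, and all numeric values are hypothetical. It assumes a circular two-body orbit and observation times spaced less than half a period apart. The agent spends a fixed number of observations on the relative position of the two bodies, then recovers the total mass from Kepler's third law, G·M = ω²·a³.

```python
import math

G = 6.674e-11  # gravitational constant, SI units


class BudgetedBinary:
    """Hypothetical toy environment: circular binary with a capped observation count."""

    def __init__(self, total_mass, separation, budget):
        # Angular velocity of a circular orbit from Kepler's third law.
        self._omega = math.sqrt(G * total_mass / separation**3)
        self._a = separation
        self.remaining = budget

    def observe(self, t):
        """Return the relative position (x, y) at time t, consuming one observation."""
        if self.remaining <= 0:
            raise RuntimeError("observation budget exhausted")
        self.remaining -= 1
        angle = self._omega * t
        return (self._a * math.cos(angle), self._a * math.sin(angle))


def estimate_total_mass(env, times):
    """Estimate the total mass from sparse observations of the relative orbit."""
    points = [env.observe(t) for t in times]
    separation = sum(math.hypot(x, y) for x, y in points) / len(points)
    phases = [math.atan2(y, x) for x, y in points]
    swept = 0.0
    for prev, cur in zip(phases, phases[1:]):
        # Wrap each increment into [-pi, pi); valid only if consecutive
        # samples are less than half an orbital period apart.
        swept += (cur - prev + math.pi) % (2 * math.pi) - math.pi
    omega = swept / (times[-1] - times[0])
    return omega**2 * separation**3 / G


true_mass = 3.0e30  # kg, illustrative value
env = BudgetedBinary(total_mass=true_mass, separation=1.5e11, budget=10)
times = [i * 2.0e7 / 9 for i in range(10)]  # 10 samples spanning 2e7 seconds
estimate = estimate_total_mass(env, times)
```

The design point is that the choice of `times` is the experiment plan: with only ten observations, spacing them too widely would alias the orbital phase, while clustering them too tightly would give a poor estimate of the angular velocity.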
Implementations (1)
| Environment | Stars | Last Updated |
|---|---|---|
| | 1 | 1 month ago |