gravity-bench-v1

Description

Gravity-Bench-v1 is an environment-based benchmark that evaluates the scientific discovery abilities of AI agents. It challenges them to uncover gravitational physics concealed within rigorous dynamic simulations, including out-of-distribution cases that deviate from real-world physics. Agents must plan experiments under a limited data-collection budget and perform dynamic data analysis and reasoning to solve tasks efficiently. Reference solutions are provided for calibration.
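To make the idea of a limited data-collection budget concrete, here is a minimal toy sketch in Python. It is not the GravityBench API: the class `BudgetedOrbit`, the function `estimate_total_mass`, and all parameter values are hypothetical and chosen for illustration only. It assumes a circular two-body orbit under ordinary Newtonian gravity and shows how an agent might recover the total mass from just two observations using Kepler's third law.

```python
import math

G = 6.674e-11  # Newtonian gravitational constant in SI units


class BudgetedOrbit:
    """Hypothetical toy environment (not the GravityBench API).

    Models the relative position of a circular two-body orbit and
    allows only a fixed number of observations.
    """

    def __init__(self, total_mass, separation, budget):
        # Angular velocity of a circular orbit: omega^2 = G * M / a^3
        self._omega = math.sqrt(G * total_mass / separation**3)
        self._a = separation
        self.remaining = budget

    def observe(self, t):
        """Return the relative position (x, y) at time t, spending one unit of budget."""
        if self.remaining <= 0:
            raise RuntimeError("observation budget exhausted")
        self.remaining -= 1
        phase = self._omega * t
        return (self._a * math.cos(phase), self._a * math.sin(phase))


def estimate_total_mass(env, dt):
    """Infer the total mass from two observations taken dt seconds apart.

    Assumes a circular orbit and that dt is well under half an orbital
    period, so the swept angle is unambiguous.
    """
    x0, y0 = env.observe(0.0)
    x1, y1 = env.observe(dt)
    a = math.hypot(x0, y0)
    # Signed angle between the two position vectors (cross and dot products).
    dtheta = math.atan2(x0 * y1 - y0 * x1, x0 * x1 + y0 * y1)
    omega = dtheta / dt
    # Kepler's third law rearranged for the mass: M = omega^2 * a^3 / G
    return omega**2 * a**3 / G


# Illustrative values only: roughly one solar mass at roughly 1 au.
env = BudgetedOrbit(total_mass=2.0e30, separation=1.5e11, budget=2)
mass = estimate_total_mass(env, dt=86400.0)
print(f"estimated total mass: {mass:.3e} kg, budget left: {env.remaining}")
```

The sketch spends its entire budget on two well-spaced samples. In an out-of-distribution case, where the force law deviates from Newtonian gravity, the rearranged Kepler relation would no longer hold, and an agent would need to spend observations testing that assumption before relying on it.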

Implementations (1)
Environment                    Stars  Last Updated
GeneralReasoning/GravityBench  1      1 month ago