physicseval

Description

PhysicsEval is a benchmark for evaluating large language models on mathematical and descriptive physics problems, including assessments that use inference-time techniques and multi-agent verification frameworks. It consists of 19,609 problems sourced from various physics textbooks, with corresponding correct solutions scraped from physics forums and educational websites.

Implementations (1)
Environment: GeneralReasoning/PhysicsEval
Stars: 0
Last Updated: 1 month ago