PhysicsEval
Description
PhysicsEval is a benchmark for evaluating how well large language models solve mathematical and descriptive physics problems, including evaluations that use inference-time techniques and multi-agent verification frameworks. It consists of 19,609 problems drawn from various physics textbooks, each paired with a correct solution scraped from physics forums and educational websites.
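PhysicsEval pairs each problem with a reference solution, so evaluation comes down to comparing a model's output against that solution. The following is a minimal sketch of one common way to score numeric final answers: match each answer to its reference within a relative tolerance, then aggregate into accuracy. The record layout (`Problem`), the function names, and the 1% tolerance are hypothetical assumptions made for illustration. They are not the benchmark's documented scoring protocol. Descriptive answers would need a different grader, which this sketch does not cover.

```python
import math
from dataclasses import dataclass


@dataclass
class Problem:
    # Hypothetical record layout; PhysicsEval's actual schema may differ.
    problem_id: str
    question: str
    reference_answer: float


def is_correct(predicted: float, reference: float, rel_tol: float = 1e-2) -> bool:
    """Count a numeric answer as correct if it lies within rel_tol of the reference."""
    return math.isclose(predicted, reference, rel_tol=rel_tol, abs_tol=1e-12)


def accuracy(problems: list[Problem], predictions: dict[str, float]) -> float:
    """Fraction of problems whose predicted answer matches the reference.

    `predictions` maps problem_id -> predicted value; missing ids count as wrong.
    """
    if not problems:
        return 0.0
    correct = sum(
        1
        for p in problems
        if p.problem_id in predictions
        and is_correct(predictions[p.problem_id], p.reference_answer)
    )
    return correct / len(problems)


if __name__ == "__main__":
    problems = [
        Problem("p1", "Gravitational acceleration at Earth's surface (m/s^2)?", 9.81),
        Problem("p2", "Speed of light in vacuum (m/s)?", 3.0e8),
        Problem("p3", "Hypothetical problem with no model answer", 0.5),
    ]
    predictions = {"p1": 9.8, "p2": 2.998e8}  # p3 left unanswered
    print(f"accuracy = {accuracy(problems, predictions):.3f}")  # prints "accuracy = 0.667"
```

A relative tolerance is used here because physics answers span many orders of magnitude. A fixed absolute threshold would be far too loose for small quantities and too strict for large ones.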
Implementations (1)
| Environment | Stars | Last Updated |
|---|---|---|
| | 0 | 1 month ago |