ATLAS-Science

Description

ATLAS (AGI-Oriented Testbed for Logical Application in Science) is a large-scale, high-difficulty, cross-disciplinary benchmark for assessing the scientific reasoning of frontier LLMs and their ability to integrate knowledge across domains. It comprises approximately 800 original, expert-created problems spanning seven fields: mathematics, physics, chemistry, biology, computer science, earth science, and materials science. The problems are designed to resist contamination and call for open-ended, multi-step answers written in LaTeX. They undergo rigorous expert peer review and adversarial testing, and model answers are evaluated by an LLM judge.

Implementations (1)
Environment             Stars  Last Updated
GeneralReasoning/ATLAS  0      1 month ago