ATLAS-Science
Description
ATLAS (AGI-Oriented Testbed for Logical Application in Science) is a large-scale, high-difficulty, cross-disciplinary benchmark for assessing the scientific reasoning of frontier LLMs and their ability to integrate knowledge across domains. It comprises approximately 800 original, expert-created problems spanning seven fields: mathematics, physics, chemistry, biology, computer science, earth science, and materials science. The benchmark emphasizes contamination resistance, open-ended multi-step answers written in LaTeX, rigorous expert peer review with adversarial testing, and an LLM-judged evaluation paradigm.
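As a rough illustration of what a judged evaluation over open-ended answers could look like, the sketch below computes per-field accuracy from a pluggable judge. It is not the benchmark's actual harness: the `Problem` record, the `Judge` signature, `accuracy_by_field`, and `exact_match_judge` are all hypothetical names introduced here, and the whitespace-insensitive string comparison is only a stand-in for a real LLM judge.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence


@dataclass
class Problem:
    """Hypothetical record for one benchmark item (not the real schema)."""
    field: str       # e.g. "physics"
    question: str
    reference: str   # reference answer in LaTeX


# Assumed judge signature: (question, reference, candidate) -> True if judged correct.
Judge = Callable[[str, str, str], bool]


def accuracy_by_field(
    problems: Sequence[Problem], answers: Sequence[str], judge: Judge
) -> Dict[str, float]:
    """Return the fraction of answers the judge accepts, grouped by field."""
    totals: Dict[str, int] = {}
    correct: Dict[str, int] = {}
    for problem, answer in zip(problems, answers):
        totals[problem.field] = totals.get(problem.field, 0) + 1
        if judge(problem.question, problem.reference, answer):
            correct[problem.field] = correct.get(problem.field, 0) + 1
    return {field: correct.get(field, 0) / n for field, n in totals.items()}


def exact_match_judge(question: str, reference: str, candidate: str) -> bool:
    """Stand-in for an LLM judge: compares answers with all whitespace removed."""
    return "".join(reference.split()) == "".join(candidate.split())
```

In practice the stand-in judge would be replaced by a call to a model that compares the candidate answer against the reference; the aggregation step stays the same.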
Implementations (1)
| Environment | Stars | Last Updated | |
|---|---|---|---|
| | 0 | 1 month ago | |