supergpqa
Description
SuperGPQA is a benchmark for evaluating graduate-level knowledge and reasoning across 285 specialized disciplines. It is built with a Human-LLM collaborative filtering mechanism that iteratively refines questions and removes trivial or ambiguous ones, based on LLM responses and expert feedback. The benchmark measures LLM performance across diverse domains, including light industry, agriculture, and service-oriented fields, and exposes substantial gaps in current state-of-the-art models.
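As a rough illustration of the filtering idea described above, one screening round might look like the following sketch. It is not the benchmark's actual pipeline: the `Question` fields, the `filter_round` helper, and the triviality rule (every queried model already answers correctly) are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    qid: str
    answer: str  # gold option label, e.g. "B"
    # Hypothetical record of model predictions: model name -> predicted label.
    model_answers: dict[str, str] = field(default_factory=dict)
    # Hypothetical flag set by an expert reviewer.
    expert_flagged_ambiguous: bool = False


def filter_round(questions: list[Question]) -> list[Question]:
    """Keep questions that are neither trivial nor flagged as ambiguous."""
    kept = []
    for q in questions:
        answers = list(q.model_answers.values())
        # Assumed triviality rule: every queried model already answers correctly.
        trivial = bool(answers) and all(a == q.answer for a in answers)
        if trivial or q.expert_flagged_ambiguous:
            continue
        kept.append(q)
    return kept
```

In an iterative setup, the surviving questions would be revised and passed through the same check again until the pool stabilizes.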
Implementations (1)
| Environment | Stars | Last Updated |
|---|---|---|
| | 0 | 1 month ago |