supergpqa

Description

SuperGPQA is a benchmark for evaluating graduate-level knowledge and reasoning across 285 specialized disciplines. It is built with a novel Human-LLM collaborative filtering mechanism that iteratively refines questions and removes trivial or ambiguous ones, based on LLM responses and expert feedback. The benchmark measures LLM performance across diverse domains, including light industry, agriculture, and service-oriented fields, and exposes substantial gaps in current state-of-the-art models.

Implementations (1)

Environment | Stars | Last Updated
GeneralReasoning/SuperGPQA | 0 | 1 month ago
arXiv/supergpqa | OpenReward