simpleqaverified

Description

SimpleQA Verified is a 1,000-prompt benchmark for evaluating the short-form factuality of LLMs, derived from OpenAI's SimpleQA. It was built through a rigorous multi-stage filtering process (de-duplication, topic balancing, and source reconciliation) that removes noisy and incorrect labels and reduces topical bias and redundancy. The result is a more reliable and more challenging evaluation set, which also ships with an improved autorater prompt.
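The description mentions an autorater but not how its per-prompt judgments become a score. The sketch below assumes SimpleQA Verified keeps the original SimpleQA grading scheme: each response is graded CORRECT, INCORRECT, or NOT_ATTEMPTED, and results are summarized as overall accuracy, accuracy on attempted questions, and an F1 score (the harmonic mean of those two). The function name `factuality_metrics` is hypothetical and not part of any released evaluation harness.

```python
from collections import Counter

# Grade labels from the original SimpleQA autorater (assumed to carry over).
CORRECT, INCORRECT, NOT_ATTEMPTED = "CORRECT", "INCORRECT", "NOT_ATTEMPTED"
VALID_GRADES = {CORRECT, INCORRECT, NOT_ATTEMPTED}


def factuality_metrics(grades: list[str]) -> dict[str, float]:
    """Aggregate per-prompt autorater grades into SimpleQA-style metrics (hypothetical helper)."""
    counts = Counter(grades)
    unknown = set(counts) - VALID_GRADES
    if unknown:
        raise ValueError(f"unexpected grade labels: {sorted(unknown)}")

    total = len(grades)
    attempted = counts[CORRECT] + counts[INCORRECT]

    # Fraction of all prompts answered correctly.
    overall_correct = counts[CORRECT] / total if total else 0.0
    # Accuracy restricted to prompts the model chose to answer.
    correct_given_attempted = counts[CORRECT] / attempted if attempted else 0.0
    # Harmonic mean of the two accuracy figures.
    denom = overall_correct + correct_given_attempted
    f1 = 2 * overall_correct * correct_given_attempted / denom if denom else 0.0

    return {
        "overall_correct": overall_correct,
        "correct_given_attempted": correct_given_attempted,
        "f1": f1,
    }


if __name__ == "__main__":
    # Toy grades for 10 prompts: 6 correct, 2 wrong, 2 abstentions.
    sample = [CORRECT] * 6 + [INCORRECT] * 2 + [NOT_ATTEMPTED] * 2
    print(factuality_metrics(sample))
```

Using the harmonic mean means a model cannot inflate its score purely by abstaining (which raises accuracy on attempted questions) or by guessing on everything (which lowers it).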

Implementations (1)
Environment                          Stars   Last Updated
GeneralReasoning/SimpleQAVerified    0       1 month ago
Google/simpleqaverified | OpenReward