simpleqaverified

Name: Google/simpleqaverified
Author: Google

Factuality Evaluation of Language Models

Description

SimpleQA Verified is a 1,000-prompt benchmark for evaluating short-form factuality in LLMs, built from OpenAI's SimpleQA. It was created through a multi-stage filtering process (de-duplication, topic balancing, and source reconciliation) that removes noisy and incorrect labels and reduces topical bias and redundancy. The result is a more reliable and more challenging evaluation set, paired with an improved autorater prompt.
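To make two of the filtering stages concrete, here is a minimal Python sketch of de-duplication followed by topic balancing. It is an illustration only and not the authors' pipeline: the record fields (`prompt`, `topic`), the helper names, the exact-match normalization, and the per-topic cap are all assumptions made for the example, and the toy records are invented.

```python
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different prompts compare equal."""
    return " ".join(text.lower().split())


def deduplicate(items):
    """Keep the first item seen for each normalized prompt (assumed exact-match rule)."""
    seen = set()
    kept = []
    for item in items:
        key = normalize(item["prompt"])
        if key not in seen:
            seen.add(key)
            kept.append(item)
    return kept


def balance_topics(items, max_per_topic):
    """Cap the number of items per topic, preserving input order (assumed balancing rule)."""
    counts = defaultdict(int)
    kept = []
    for item in items:
        if counts[item["topic"]] < max_per_topic:
            counts[item["topic"]] += 1
            kept.append(item)
    return kept


# Hypothetical toy records, not taken from the benchmark.
raw = [
    {"prompt": "Who wrote Dracula?", "topic": "literature"},
    {"prompt": "who wrote  dracula?", "topic": "literature"},
    {"prompt": "What is the capital of Peru?", "topic": "geography"},
    {"prompt": "Who wrote Emma?", "topic": "literature"},
    {"prompt": "Who wrote Ulysses?", "topic": "literature"},
]

filtered = balance_topics(deduplicate(raw), max_per_topic=2)
for item in filtered:
    print(item["topic"], "|", item["prompt"])
```

In this sketch the second Dracula prompt is dropped as a duplicate, and the cap of two per topic then drops the last literature prompt. A real pipeline would likely need fuzzier matching than exact normalized strings.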

arXiv

Implementations (1)

Environment	Stars	Last Updated
GeneralReasoning/SimpleQAVerified	0	3 months ago