simpleqaverified
Description
SimpleQA Verified is a 1,000-prompt benchmark for evaluating LLM short-form factuality, built from OpenAI's SimpleQA. It was created via a rigorous multi-stage filtering process (de-duplication, topic balancing, and source reconciliation) to remove noisy and incorrect labels, reduce topical bias and redundancy, and provide a more reliable, challenging evaluation set with an improved autorater prompt.
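To make the filtering stages above more concrete, here is a minimal sketch of two of them, de-duplication and topic balancing, applied to toy data. It is not the benchmark authors' pipeline: the record fields (`question`, `topic`), the helper names, the exact-match normalization, and the per-topic cap are all assumptions chosen for illustration, and source reconciliation is not covered.

```python
from collections import defaultdict


def normalize(question: str) -> str:
    """Lowercase and collapse whitespace so trivially different prompts compare equal."""
    return " ".join(question.lower().split())


def deduplicate(records):
    """Keep only the first record seen for each normalized question."""
    seen = set()
    kept = []
    for rec in records:
        key = normalize(rec["question"])
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept


def balance_topics(records, max_per_topic):
    """Cap how many records are kept per topic, preserving input order."""
    counts = defaultdict(int)
    kept = []
    for rec in records:
        if counts[rec["topic"]] < max_per_topic:
            counts[rec["topic"]] += 1
            kept.append(rec)
    return kept


# Hypothetical toy records; the real dataset schema may differ.
records = [
    {"question": "Who wrote Dune?", "topic": "art"},
    {"question": "who wrote  dune?", "topic": "art"},
    {"question": "What year was Dune published?", "topic": "art"},
    {"question": "What is the capital of Peru?", "topic": "geography"},
    {"question": "Who painted Guernica?", "topic": "art"},
]

filtered = balance_topics(deduplicate(records), max_per_topic=2)
```

A real pipeline would likely use fuzzier matching (for example, embedding similarity) to catch paraphrased duplicates; exact matching on normalized text is used here only to keep the sketch short.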
Implementations (1)
| Environment | Stars | Last Updated | |
|---|---|---|---|
| | 0 | 1 month ago | |