GPQA

Name: arXiv/GPQA
Author: arXiv


Graduate-Level Scientific Question Answering

Description

GPQA is a benchmark of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry. The questions are designed to be "Google-proof" and are extremely difficult for both human experts (≈65% accuracy) and state-of-the-art AI systems (GPT-4 baseline ≈39%). The dataset is intended to support realistic scalable oversight experiments, which study how humans can reliably supervise AI systems on very hard scientific questions where those systems may surpass human capabilities.
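As a rough illustration of how accuracy figures like those quoted above are computed for a multiple-choice benchmark, the sketch below scores a list of predicted option indices against the gold answers and compares the result with the random-guess baseline. The Item class, its field names, and the toy questions are hypothetical placeholders for illustration only; they are not the GPQA data schema or the API of any listed implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One multiple-choice question (hypothetical schema, not GPQA's)."""
    question: str
    choices: tuple[str, ...]
    answer_index: int  # index into `choices` of the correct option


def accuracy(items: list[Item], predictions: list[int]) -> float:
    """Fraction of items whose predicted index matches the gold index."""
    if not items:
        raise ValueError("need at least one item")
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    correct = sum(
        1 for item, pred in zip(items, predictions) if pred == item.answer_index
    )
    return correct / len(items)


def chance_accuracy(items: list[Item]) -> float:
    """Expected accuracy of uniform random guessing over each item's choices."""
    if not items:
        raise ValueError("need at least one item")
    return sum(1 / len(item.choices) for item in items) / len(items)


# Toy placeholder questions (assumption: four options each).
toy_items = [
    Item("placeholder question 1", ("A", "B", "C", "D"), 0),
    Item("placeholder question 2", ("A", "B", "C", "D"), 1),
    Item("placeholder question 3", ("A", "B", "C", "D"), 0),
    Item("placeholder question 4", ("A", "B", "C", "D"), 3),
]
toy_predictions = [0, 1, 2, 3]  # the third prediction is wrong

print(f"accuracy: {accuracy(toy_items, toy_predictions):.2f}")
print(f"chance:   {chance_accuracy(toy_items):.2f}")
```

Reporting the chance baseline next to the measured accuracy makes it easier to judge how far a score such as ≈39% sits above random guessing.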



Implementations (1)

Environment	Stars	Last Updated
GeneralReasoning/GPQA	0	3 months ago