hle-verified

Description

HLE-Verified is a verified and revised version of the Humanity's Last Exam (HLE) benchmark for evaluating frontier large language models on challenging, multi-domain questions, built around a transparent verification protocol and a fine-grained error taxonomy. It was constructed via a two-stage validation-and-repair workflow yielding 641 verified items, 1,170 revised-and-certified items, and 689 documented uncertain items released with explicit uncertainty sources and expertise tags for future refinement.
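As a quick sanity check on the composition described above, the three item counts can be summed and expressed as shares of the whole. This is only an illustrative sketch: the dictionary keys and variable names are hypothetical labels chosen here, not field names from the released dataset, and the total is simply the sum of the three counts quoted in the description.

```python
# Item counts as quoted in the description above.
# The key names are illustrative labels, not dataset field names.
counts = {
    "verified": 641,
    "revised_and_certified": 1170,
    "uncertain": 689,
}

# Total number of items across the three categories.
total = sum(counts.values())

# Share of each category as a percentage of the total, rounded to 2 decimals.
shares = {name: round(100 * n / total, 2) for name, n in counts.items()}

print(f"total: {total}")
for name, pct in shares.items():
    print(f"{name}: {counts[name]} items ({pct}%)")
```

Under these numbers, the revised-and-certified items form the largest group, and the uncertain items account for a little over a quarter of the total.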

Implementations (1)
Environment: GeneralReasoning/HLE-Verified
Stars: 2
Last updated: 1 month ago