longbenchv2

Description

LongBench v2 is a benchmark that assesses how well LLMs handle long-context problems requiring deep understanding and reasoning across a range of real-world tasks. It contains 503 multiple-choice questions with contexts ranging from 8k to 2M words, spanning six task categories: single-document QA, multi-document QA, long in-context learning, long-dialogue history understanding, code-repository understanding, and long structured-data understanding. The questions were collected from nearly 100 expert contributors and quality-controlled through automated and manual review.
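Because every question is multiple-choice, a result on this benchmark reduces to accuracy, which is often broken down by task category. The sketch below shows one way to compute that. The record layout (the `id`, `category`, and `answer` keys), the sample items, and the `accuracy_by_category` helper are all illustrative assumptions, not the dataset's actual schema or the official evaluation script.

```python
from collections import defaultdict

# Hypothetical records: the keys and values here are assumptions made
# for illustration, not the real LongBench v2 schema.
SAMPLE_ITEMS = [
    {"id": "q1", "category": "single-document QA", "answer": "B"},
    {"id": "q2", "category": "single-document QA", "answer": "D"},
    {"id": "q3", "category": "code-repository understanding", "answer": "A"},
]


def accuracy_by_category(items, predictions):
    """Return {category: fraction correct}, plus an "overall" entry.

    `predictions` maps an item id to a predicted answer letter.
    A missing prediction counts as wrong.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in items:
        # Normalize the prediction so "a" and " A " both match "A".
        pred = str(predictions.get(item["id"], "")).strip().upper()
        for key in (item["category"], "overall"):
            total[key] += 1
            correct[key] += int(pred == item["answer"])
    return {key: correct[key] / total[key] for key in total}


scores = accuracy_by_category(SAMPLE_ITEMS, {"q1": "B", "q2": "A", "q3": "a"})
for name, value in sorted(scores.items()):
    print(f"{name}: {value:.3f}")
```

Counting a missing prediction as wrong keeps the denominator fixed at the full question count, so scores from different runs stay comparable even when a model fails to produce an answer.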

Implementations (1)

Environment            Stars  Last Updated
bys0318/LongBench-v2   5      2 months ago
arXiv/longbenchv2 | OpenReward