longbenchv2

Name: arXiv/longbenchv2
Author: arXiv

Long-context Multitask Reasoning and Understanding

Description

LongBench v2 is a benchmark that assesses the ability of LLMs to handle long-context problems requiring deep understanding and reasoning across a range of real-world tasks. It contains 503 multiple-choice questions with contexts ranging from 8k to 2M words, spanning six task categories: single-document QA, multi-document QA, long in-context learning, long-dialogue history understanding, code-repository understanding, and long structured-data understanding. The questions were collected from nearly 100 experts and quality-controlled through both automated and manual review.
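
Because every item is a multiple-choice question assigned to one task category, results are naturally reported as accuracy overall and per category. The following is a minimal sketch of that scoring step, not the benchmark's official evaluation code. The record keys (`id`, `category`, `answer`), the function name `accuracy_by_category`, and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def accuracy_by_category(records, predictions):
    """Score multiple-choice predictions overall and per task category.

    records: list of dicts with hypothetical keys 'id', 'category', 'answer'.
    predictions: dict mapping a record id to the predicted option letter.
    A missing prediction counts as incorrect.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        cat = rec["category"]
        total[cat] += 1
        pred = predictions.get(rec["id"])
        # Compare option letters case-insensitively, ignoring whitespace.
        if pred is not None and pred.strip().upper() == rec["answer"].strip().upper():
            correct[cat] += 1
    per_category = {cat: correct[cat] / total[cat] for cat in total}
    n = sum(total.values())
    overall = sum(correct.values()) / n if n else 0.0
    return overall, per_category


# Toy data (made up for illustration, not taken from the benchmark).
toy_records = [
    {"id": "q1", "category": "single-document QA", "answer": "A"},
    {"id": "q2", "category": "code-repository understanding", "answer": "C"},
    {"id": "q3", "category": "code-repository understanding", "answer": "D"},
]
toy_predictions = {"q1": "a", "q2": "B"}  # q3 has no prediction

overall, per_category = accuracy_by_category(toy_records, toy_predictions)
print(overall, per_category)
```

Counting unanswered questions as wrong is a design choice in this sketch; a harness could instead report them separately.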

Implementations (1)

Environment	Stars	Last Updated
bys0318/LongBench-v2	5	3 months ago