longbenchv2
Description
LongBench v2 is a benchmark that assesses the ability of LLMs to handle long-context problems requiring deep understanding and reasoning across real-world tasks. It contains 503 multiple-choice questions with contexts ranging from 8k to 2M words, spanning six task categories: single-document QA, multi-document QA, long in-context learning, long-dialogue history understanding, code-repository understanding, and long structured-data understanding. The questions were collected from nearly 100 experts and quality-controlled through automated and manual review.
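Because every question is multiple-choice, results on a benchmark like this are naturally reported as accuracy, overall and per task category. The sketch below shows one way to compute that. The record fields (`category`, `answer`, `prediction`) and the sample rows are hypothetical, chosen for illustration; they are not the dataset's actual schema or real results.

```python
from collections import defaultdict


def accuracy_by_category(records):
    """Return {category: fraction correct}, plus an "overall" entry.

    Each record is a dict with hypothetical keys:
    "category", "answer" (gold choice) and "prediction" (model choice).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        hit = int(r["prediction"] == r["answer"])
        # Count each record once for its category and once for the overall score.
        for key in (r["category"], "overall"):
            total[key] += 1
            correct[key] += hit
    return {key: correct[key] / total[key] for key in total}


# Made-up sample rows, only to exercise the function.
sample = [
    {"category": "single-document QA", "answer": "A", "prediction": "A"},
    {"category": "single-document QA", "answer": "C", "prediction": "B"},
    {"category": "code-repository understanding", "answer": "D", "prediction": "D"},
]

scores = accuracy_by_category(sample)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

Reporting per-category scores alongside the overall number helps show whether a model is weak on one task type, which a single aggregate would hide.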
Implementations (1)
| Environment | Stars | Last Updated | |
|---|---|---|---|
| | 5 | 2 months ago | |