browsecomp-zh

Description

BrowseComp-ZH is a high-difficulty benchmark purpose-built to comprehensively evaluate LLM agents on the Chinese web, measuring real-time browsing, retrieval, and multi-hop reasoning abilities. It consists of 289 multi-hop questions spanning 11 diverse domains, each reverse-engineered from a short verifiable answer and vetted via a two-stage quality control protocol to ensure high difficulty and answer uniqueness.
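Because every question is built around a short, verifiable answer, a model's responses can be scored automatically. This page does not describe the benchmark's official grading procedure. The sketch below is an illustrative stand-in that assumes normalized exact match. The `Item` record, its field names, and the helper functions are hypothetical. The sketch computes overall accuracy and a per-domain breakdown that mirrors the benchmark's 11 domains.

```python
import unicodedata
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Item:
    # Hypothetical record layout; field names are illustrative only.
    question: str
    answer: str   # short reference answer
    domain: str   # one of the benchmark's domains


def normalize(text: str) -> str:
    # Assumed normalization: NFKC folds full-width characters (common in
    # Chinese text) to half-width forms, then whitespace is removed and
    # Latin letters are lowercased.
    text = unicodedata.normalize("NFKC", text)
    return "".join(text.split()).lower()


def is_correct(prediction: str, item: Item) -> bool:
    # Stand-in for the real grader: normalized exact match.
    return normalize(prediction) == normalize(item.answer)


def accuracy(items: list[Item], predictions: list[str]) -> float:
    if len(items) != len(predictions):
        raise ValueError("exactly one prediction per item is required")
    if not items:
        return 0.0
    hits = sum(is_correct(p, it) for it, p in zip(items, predictions))
    return hits / len(items)


def per_domain_accuracy(items: list[Item], predictions: list[str]) -> dict[str, float]:
    if len(items) != len(predictions):
        raise ValueError("exactly one prediction per item is required")
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for it, p in zip(items, predictions):
        totals[it.domain] += 1
        hits[it.domain] += is_correct(p, it)
    return {d: hits[d] / totals[d] for d in totals}


if __name__ == "__main__":
    items = [
        Item("q1", "北京", "geography"),
        Item("q2", "ＡＢＣ", "technology"),
        Item("q3", "1949年", "history"),
    ]
    preds = ["北京 ", "abc", "1950年"]
    print(accuracy(items, preds))
    print(per_domain_accuracy(items, preds))
```

With 289 questions, each additional correct answer raises overall accuracy by 1/289, or roughly 0.35 percentage points. This granularity is worth keeping in mind when comparing closely ranked agents.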

Implementations (1)
Environment                       Stars   Last Updated
GeneralReasoning/BrowseComp-ZH    0       1 month ago
arXiv/browsecomp-zh | OpenReward