chinesesimpleqa

Description

Chinese SimpleQA is the first comprehensive Chinese benchmark for evaluating the factuality of language models on short question answering. It covers six major topics and 99 diverse subtopics and is designed around five properties: Chinese, Diverse, High-quality, Static, and Easy-to-evaluate. Its short question–answer pairs are static and easy to grade (e.g., via the OpenAI API), which enables systematic evaluation of LLM factuality and helps guide model developers.
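
To illustrate why short question–answer pairs are easy to grade, here is a minimal sketch in Python. It is not the benchmark's actual grader: the description mentions grading via the OpenAI API, whereas this sketch swaps in a plain normalized exact-match check so that it runs offline. The names `QAPair`, `normalize`, `grade`, and `accuracy`, as well as the sample questions, are hypothetical and chosen only for illustration.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class QAPair:
    """One short question with its reference answer (hypothetical schema)."""
    question: str
    answer: str


def normalize(text: str) -> str:
    # NFKC folds full-width letters and digits to their ASCII forms;
    # lowercasing and keeping only alphanumeric characters drops
    # whitespace and punctuation (Chinese characters count as alphanumeric).
    folded = unicodedata.normalize("NFKC", text).lower()
    return "".join(ch for ch in folded if ch.isalnum())


def grade(pair: QAPair, prediction: str) -> bool:
    # Stand-in for an LLM-based judge: exact match after normalization.
    return normalize(pair.answer) == normalize(prediction)


def accuracy(pairs: list[QAPair], predictions: list[str]) -> float:
    # Fraction of predictions that match their reference answers.
    if len(pairs) != len(predictions):
        raise ValueError("pairs and predictions must have the same length")
    if not pairs:
        return 0.0
    return sum(grade(p, y) for p, y in zip(pairs, predictions)) / len(pairs)


# Illustrative data only; these items are not taken from the benchmark.
pairs = [
    QAPair("中国的首都是哪座城市？", "北京"),
    QAPair("水的化学式是什么？", "H2O"),
]
score = accuracy(pairs, ["北京。", "上海"])
```

Because each reference answer is short and fixed, a grader only has to compare one brief string per question. An exact-match check like this would miss valid paraphrases, which is one reason a model-based judge may be preferred in practice.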

Implementations (1)
Environment | Stars | Last Updated
GeneralReasoning/ChineseSimpleQA | 0 | 1 month ago