chinesesimpleqa

Name: arXiv/chinesesimpleqa
Author: arXiv


Factuality Evaluation in Chinese Question Answering

Description

Chinese SimpleQA is the first comprehensive Chinese benchmark for evaluating the factuality of language models on short-question answering. It covers six major topics and 99 diverse subtopics, and it is designed around five properties: Chinese, Diverse, High-quality, Static, and Easy-to-evaluate. Its question-answer pairs are high-quality and static, with short reference answers that are easy to grade automatically (e.g., via the OpenAI API). This enables systematic evaluation of LLM factuality that can guide model developers.
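This page does not spell out how graded answers become scores. The sketch below is a minimal illustration that assumes a SimpleQA-style scheme: a grader (for example, an LLM called through the OpenAI API) labels each model answer CORRECT, INCORRECT, or NOT_ATTEMPTED. Those labels are then aggregated into overall correct (CO), not attempted (NA), incorrect (IN), correct given attempted (CGA), and an F-score, the harmonic mean of CO and CGA. The function name `factuality_metrics` and the label strings are hypothetical and are not taken from the benchmark's code.

```python
from collections import Counter

# Hypothetical grade labels, assuming a SimpleQA-style grader that assigns
# each model answer exactly one of these three outcomes.
CORRECT, INCORRECT, NOT_ATTEMPTED = "CORRECT", "INCORRECT", "NOT_ATTEMPTED"


def factuality_metrics(grades: list[str]) -> dict[str, float]:
    """Aggregate per-question grades into SimpleQA-style summary metrics (a sketch)."""
    if not grades:
        raise ValueError("need at least one graded answer")
    counts = Counter(grades)
    unknown = set(counts) - {CORRECT, INCORRECT, NOT_ATTEMPTED}
    if unknown:
        raise ValueError(f"unexpected grade labels: {sorted(unknown)}")

    total = len(grades)
    co = counts[CORRECT] / total          # fraction correct over all questions
    na = counts[NOT_ATTEMPTED] / total    # fraction where the model declined to answer
    inc = counts[INCORRECT] / total       # fraction attempted but wrong
    attempted = counts[CORRECT] + counts[INCORRECT]
    # Correct given attempted: accuracy restricted to questions the model answered.
    cga = counts[CORRECT] / attempted if attempted else 0.0
    # F-score: harmonic mean of CO and CGA (0 when both are 0).
    f = 2 * co * cga / (co + cga) if (co + cga) else 0.0
    return {"CO": co, "NA": na, "IN": inc, "CGA": cga, "F": f}


if __name__ == "__main__":
    # Toy grades for 10 questions: 6 correct, 2 incorrect, 2 not attempted.
    demo = [CORRECT] * 6 + [INCORRECT] * 2 + [NOT_ATTEMPTED] * 2
    print(factuality_metrics(demo))
```

In this scheme, CGA rewards a model for being right when it chooses to answer, while CO penalizes abstaining. The harmonic-mean F-score balances the two, so a model cannot score well only by declining hard questions.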



Implementations (1)

Environment	Stars	Last Updated
GeneralReasoning/ChineseSimpleQA	0	3 months ago