chinesesimpleqa
Description
Chinese SimpleQA is the first comprehensive Chinese benchmark for evaluating the factuality of language models on short-form question answering. It covers six major topics and 99 diverse subtopics and is designed around five properties: Chinese, Diverse, High-quality, Static, and Easy-to-evaluate. Its question–answer pairs are short and static, so responses are easy to grade (e.g., via the OpenAI API), which enables systematic evaluation of LLM factuality to guide model developers.
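To illustrate why short, static answers are easy to evaluate, here is a minimal sketch of how per-question grades might be aggregated into summary scores. It assumes each response has already been graded (for example by an LLM judge) into one of three labels. The label names, the function `summarize_grades`, and the F-score definition (harmonic mean of overall accuracy and accuracy on attempted questions) are assumptions modeled on SimpleQA-style grading, not details taken from this page.

```python
from collections import Counter

# Hypothetical label set; the benchmark's actual grader output may differ.
VALID_LABELS = {"correct", "incorrect", "not_attempted"}


def summarize_grades(grades):
    """Aggregate per-question grade labels into summary scores."""
    if not grades:
        raise ValueError("grades must not be empty")
    unknown = set(grades) - VALID_LABELS
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")

    counts = Counter(grades)
    total = len(grades)
    correct = counts["correct"]
    attempted = correct + counts["incorrect"]

    # Share of all questions answered correctly.
    overall = correct / total
    # Share answered correctly among questions the model attempted.
    given_attempted = correct / attempted if attempted else 0.0
    # Assumed F-score: harmonic mean of the two accuracies above.
    denom = overall + given_attempted
    f_score = 2 * overall * given_attempted / denom if denom else 0.0

    return {
        "correct": overall,
        "incorrect": counts["incorrect"] / total,
        "not_attempted": counts["not_attempted"] / total,
        "correct_given_attempted": given_attempted,
        "f_score": f_score,
    }


if __name__ == "__main__":
    sample = ["correct", "correct", "incorrect", "not_attempted"]
    print(summarize_grades(sample))
```

Reporting accuracy on attempted questions separately from overall accuracy distinguishes a model that declines to answer from one that answers incorrectly.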
Implementations (1)
| Environment | Stars | Last Updated |
|---|---|---|
| | 0 | 1 month ago |