Encyclo-K
Description
Encyclo-K is an environment for evaluating multi-statement knowledge comprehension using dynamically composed questions. Each question aggregates 8-10 knowledge statements from authoritative textbooks into a 4-option multiple-choice format. The benchmark covers 11 disciplines, 44 fields, and 62 subfields in Chinese and English. Because questions are composed combinatorially from individual statements, the design resists contamination.
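To give a rough sense of why combinatorial composition resists contamination, the sketch below counts how many distinct sets of 8-10 statements can be drawn from a pool. The pool size and the function name `distinct_statement_sets` are hypothetical. The source does not describe how statements are sampled or how options are built, so this is only an illustrative count.

```python
import math


def distinct_statement_sets(pool_size: int, min_k: int = 8, max_k: int = 10) -> int:
    """Count unordered statement sets of size min_k..max_k drawn from a pool.

    Assumption: statements are drawn without replacement and order is ignored.
    The real composition procedure may differ.
    """
    return sum(math.comb(pool_size, k) for k in range(min_k, max_k + 1))


# Hypothetical pool of 100 statements for a single subfield.
print(distinct_statement_sets(100))
```

Even for a small pool, the number of possible sets grows quickly, so any one composed question is unlikely to have appeared verbatim in training data.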
Capabilities
- Multi-statement knowledge comprehension across academic disciplines
- Multiple-choice evaluation with 4 options (A, B, C, D)
- Bilingual evaluation (Chinese and English)
Compute Requirements
Agents are given a standard environment with no sandbox or file system access.
License
MIT.
Tasks
There is one split in this environment:
- test: 5,038 tasks
Tasks span 11 disciplines, 44 fields, and 62 subfields with three difficulty levels (easy, middle, hard).
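Since tasks carry a difficulty level, results can be broken down per level. The following is a minimal sketch that assumes each result is a dictionary with hypothetical `difficulty` and `reward` keys. These field names are not taken from the dataset schema, and the sample records are invented for illustration.

```python
from collections import defaultdict


def accuracy_by_difficulty(results):
    """Average the per-task reward (1.0 or 0.0) within each difficulty level."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for record in results:
        totals[record["difficulty"]] += record["reward"]
        counts[record["difficulty"]] += 1
    return {level: totals[level] / counts[level] for level in counts}


# Invented example records; not real benchmark results.
results = [
    {"difficulty": "easy", "reward": 1.0},
    {"difficulty": "easy", "reward": 0.0},
    {"difficulty": "hard", "reward": 1.0},
]
print(accuracy_by_difficulty(results))  # {'easy': 0.5, 'hard': 1.0}
```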
Reward Structure
Single-turn evaluation with deterministic grading. The agent submits a single letter answer (A, B, C, or D) via the answer tool. The submitted answer is compared via exact match against the ground truth. Reward is 1.0 if correct, 0.0 if incorrect.
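The grading rule described above can be sketched as follows. The function name `grade` is hypothetical, and treating anything other than an uppercase A-D letter as incorrect is an assumption. The source only states that the answer is compared by exact match.

```python
VALID_LETTERS = {"A", "B", "C", "D"}


def grade(submitted: str, ground_truth: str) -> float:
    """Return 1.0 on an exact match with the ground truth, otherwise 0.0."""
    # Assumption: no normalization (case folding, whitespace stripping) is applied.
    if submitted not in VALID_LETTERS:
        return 0.0
    return 1.0 if submitted == ground_truth else 0.0


print(grade("B", "B"))  # 1.0
print(grade("A", "B"))  # 0.0
```

Because grading is a deterministic string comparison, the same submission always yields the same reward.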
Data
The data in encyclo_k_data/ is sourced from the Hugging Face dataset m-a-p/Encyclo-K and is stored on the OpenReward platform.
Tools
| Tool | Description |
|---|---|
| answer | Submit a single letter answer (A, B, C, or D). Deterministic evaluation via exact match. Ends the episode. |
Time Horizon
Single-turn. The agent reads the question with options and submits one answer.
Environment Difficulty
Encyclo-K has been used to evaluate multi-statement knowledge comprehension across 50+ LLMs. The best-performing model of each type is:
| Model Type | Best Model | Accuracy |
|---|---|---|
| Reasoning Models | GPT-5.1-high | 62.1% |
| Chat Models | Qwen3-235B-A22B | 50.4% |
The benchmark shows substantial discriminative power: the best reasoning model outperforms the best chat model by 11.7 percentage points (62.1% vs. 50.4%) on multi-statement comprehension.
Other Environment Requirements
There are no further environment requirements; Encyclo-K works out of the box with the OpenReward endpoint without any external API keys.
Safety
Agents in Encyclo-K answer multiple-choice knowledge questions in a standard environment. The environment does not present direct safety risks.
Citation
@article{liang2025encyclok,
title={Encyclo-K: Evaluating LLMs with Dynamically Composed Knowledge Statements},
author={Liang, Yiming and Li, Yizhi and Du, Yantao and Zhang, Ge and Zhou, Jiayi and Wu, Yuchen and Piao, Yinzhu and Cao, Denghui and Sun, Tong and Li, Ziniu and Du, Li and Lei, Bo and Liu, Jiaheng and Lin, Chenghua and Zhang, Zhaoxiang and Huang, Wenhao and Zhang, Jiajun},
journal={arXiv preprint arXiv:2512.24867},
year={2025}
}