Encyclo-K

Description

Encyclo-K is an environment for evaluating multi-statement knowledge comprehension using dynamically composed questions. Each question aggregates 8-10 knowledge statements from authoritative textbooks into a 4-option multiple-choice format. The benchmark covers 11 disciplines, 44 fields, and 62 subfields in Chinese and English, and its combinatorial composition makes it resistant to contamination.
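
To illustrate why combinatorial composition makes memorized questions unlikely to recur, the sketch below counts how many distinct 8-10 statement subsets can be drawn from a statement pool. The function name `composition_count` and the pool size of 100 are illustrative assumptions, not figures from the benchmark, and the count ignores statement ordering and how options are built.

```python
from math import comb


def composition_count(pool_size: int, min_statements: int = 8, max_statements: int = 10) -> int:
    """Count the distinct statement subsets of size min_statements..max_statements.

    pool_size is a hypothetical number of statements available for one subfield.
    """
    return sum(comb(pool_size, k) for k in range(min_statements, max_statements + 1))


if __name__ == "__main__":
    # Hypothetical pool of 100 statements; the real pool sizes are not stated here.
    print(composition_count(100))
```

Even a modest pool yields far more possible questions than any fixed test set, which is the intuition behind re-composing questions rather than reusing them.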

Capabilities

  • Multi-statement knowledge comprehension across academic disciplines
  • Multiple-choice evaluation with 4 options (A, B, C, D)
  • Bilingual evaluation (Chinese and English)

Compute Requirements

Agents are given a standard environment with no sandbox or file system access.

License

MIT.

Tasks

There is one split in this environment:

  • test: 5,038 tasks

Tasks span 11 disciplines, 44 fields, and 62 subfields with three difficulty levels (easy, middle, hard).

Reward Structure

Single-turn evaluation with deterministic grading. The agent submits a single letter answer (A, B, C, or D) via the answer tool. The submitted answer is compared via exact match against the ground truth. Reward is 1.0 if correct, 0.0 if incorrect.
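
The grading rule above can be sketched as a small function. This is a minimal illustration, not the environment's actual implementation: the name `grade` is hypothetical, and the sketch assumes a strict comparison with no case or whitespace normalization, which the description does not specify.

```python
VALID_LETTERS = ("A", "B", "C", "D")


def grade(submitted: str, ground_truth: str) -> float:
    """Return 1.0 if the submitted letter exactly matches the ground truth, else 0.0.

    Assumption: comparison is strict, so "a" or " A" does not match "A".
    """
    if ground_truth not in VALID_LETTERS:
        raise ValueError(f"ground truth must be one of {VALID_LETTERS}")
    return 1.0 if submitted == ground_truth else 0.0


if __name__ == "__main__":
    print(grade("B", "B"))
    print(grade("C", "B"))
```

Because the rule is a pure function of the two strings, repeated grading of the same submission always gives the same reward.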

Data

The data in encyclo_k_data/ is sourced from the Hugging Face dataset m-a-p/Encyclo-K and stored on the OpenReward platform.

Tools

Tool   | Description
answer | Submit a single letter answer (A, B, C, or D). Deterministic evaluation via exact match. Ends the episode.

Time Horizon

Single-turn. The agent reads the question with options and submits one answer.
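
Since the episode ends on the first submission, an agent harness may want to reduce free-form model output to a single letter before calling the answer tool. The helper below is a hypothetical sketch of that step, not part of the environment: `extract_letter` is an assumed name, and taking the last standalone A-D letter is a heuristic that can misfire when the output ends with the English article "A".

```python
import re
from typing import Optional

# Matches a standalone capital letter A-D, e.g. "C", "(B)", or "Answer: D".
_LETTER = re.compile(r"\b([ABCD])\b")


def extract_letter(model_output: str) -> Optional[str]:
    """Return the last standalone option letter in the text, or None if absent."""
    matches = _LETTER.findall(model_output)
    return matches[-1] if matches else None


if __name__ == "__main__":
    print(extract_letter("Statements 2 and 5 are wrong, so the answer is C"))
```

Returning None rather than a default letter lets the caller decide whether to retry generation or submit a fallback.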

Environment Difficulty

Encyclo-K has been evaluated on 50+ LLMs. The best result for each model type is:

Model Type       | Best Model      | Accuracy
Reasoning Models | GPT-5.1-high    | 62.1%
Chat Models      | Qwen3-235B-A22B | 50.4%

The benchmark demonstrates substantial discriminative power: the best reasoning model outperforms the best chat model by 11.7 percentage points on multi-statement comprehension.

Other Environment Requirements

There are no further environment requirements; Encyclo-K works out of the box with the OpenReward endpoint without any external API keys.

Safety

Agents in Encyclo-K answer multiple-choice knowledge questions in a standard environment. The environment does not present direct safety risks.

Citation

@article{liang2025encyclok,
  title={Encyclo-K: Evaluating LLMs with Dynamically Composed Knowledge Statements},
  author={Liang, Yiming and Li, Yizhi and Du, Yantao and Zhang, Ge and Zhou, Jiayi and Wu, Yuchen and Piao, Yinzhu and Cao, Denghui and Sun, Tong and Li, Ziniu and Du, Li and Lei, Bo and Liu, Jiaheng and Lin, Chenghua and Zhang, Zhaoxiang and Huang, Wenhao and Zhang, Jiajun},
  journal={arXiv preprint arXiv:2512.24867},
  year={2025}
}