Implementation of arXiv/encyclo-k

Encyclo-K

Description

Encyclo-K is an environment for evaluating multi-statement knowledge comprehension using dynamically composed questions. Each question aggregates 8-10 knowledge statements from authoritative textbooks into a 4-option multiple-choice format. The benchmark covers 11 disciplines, 44 fields, and 62 subfields in Chinese and English. Because questions are composed combinatorially from individual statements, the benchmark is designed to resist data contamination.

Capabilities

Multi-statement knowledge comprehension across academic disciplines
Multiple-choice evaluation with 4 options (A, B, C, D)
Bilingual evaluation (Chinese and English)

Compute Requirements

Agents are given a standard environment with no sandbox or file system access.

License

MIT.

Tasks

There is one split in this environment:

test: 5,038 tasks

Tasks span 11 disciplines, 44 fields, and 62 subfields with three difficulty levels (easy, middle, hard).
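Since tasks carry a difficulty level, results are often easier to read when broken down per level. The following is a minimal sketch of that aggregation, assuming each graded task is available as a `(difficulty, reward)` pair; the function name and the pair layout are hypothetical and not part of the environment's API.

```python
from collections import defaultdict

def accuracy_by_difficulty(results):
    """Average the rewards per difficulty level.

    `results` is an iterable of (difficulty, reward) pairs, where reward
    is 1.0 for a correct answer and 0.0 otherwise (assumed layout).
    """
    totals = defaultdict(lambda: [0.0, 0])  # level -> [reward sum, task count]
    for difficulty, reward in results:
        totals[difficulty][0] += reward
        totals[difficulty][1] += 1
    return {level: reward_sum / count for level, (reward_sum, count) in totals.items()}


if __name__ == "__main__":
    sample = [("easy", 1.0), ("easy", 0.0), ("middle", 1.0), ("hard", 0.0)]
    for level, accuracy in accuracy_by_difficulty(sample).items():
        print(f"{level}: {accuracy:.1%}")
```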

Reward Structure

Single-turn evaluation with deterministic grading. The agent submits a single letter answer (A, B, C, or D) via the answer tool. The submitted answer is compared via exact match against the ground truth. Reward is 1.0 if correct, 0.0 if incorrect.
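The grading rule above can be sketched in a few lines. This is an illustration only, not the environment's actual grader: the function name `grade` and the validation of the ground-truth letter are assumptions. The comparison is kept strict, without case or whitespace normalization, since the description specifies an exact match.

```python
VALID_CHOICES = {"A", "B", "C", "D"}

def grade(submitted: str, ground_truth: str) -> float:
    """Return 1.0 if the submitted letter exactly matches the ground truth, else 0.0."""
    if ground_truth not in VALID_CHOICES:
        # Assumption: ground-truth labels are always one of the four option letters.
        raise ValueError(f"unexpected ground truth: {ground_truth!r}")
    return 1.0 if submitted == ground_truth else 0.0


if __name__ == "__main__":
    print(grade("B", "B"))  # matching letter
    print(grade("C", "B"))  # wrong letter
```

Under this strict reading, a lowercase `b` or an answer such as `B.` would score 0.0; whether the real grader normalizes input is not stated here.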

Data

The data in encyclo_k_data/ is sourced from the Hugging Face dataset m-a-p/Encyclo-K and stored on the OpenReward platform.

Tools

Tool	Description
`answer`	Submit a single letter answer (A, B, C, or D). Deterministic evaluation via exact match. Ends the episode.

Time Horizon

Single-turn. The agent reads the question with options and submits one answer.

Environment Difficulty

Encyclo-K has been used to evaluate multi-statement knowledge comprehension across 50+ LLMs. The best result for each model type is:

Model Type	Best Model	Accuracy
Reasoning Models	GPT-5.1-high	62.1%
Chat Models	Qwen3-235B-A22B	50.4%

The benchmark discriminates clearly between model types: the best reasoning model outperforms the best chat model by 11.7 percentage points (62.1% vs. 50.4%) on multi-statement comprehension.

Other Environment Requirements

There are no further environment requirements; Encyclo-K works out of the box with the OpenReward endpoint without any external API keys.

Safety

Agents in Encyclo-K answer multiple-choice knowledge questions in a standard environment. The environment does not present direct safety risks.

Citation

@article{liang2025encyclok,
  title={Encyclo-K: Evaluating LLMs with Dynamically Composed Knowledge Statements},
  author={Liang, Yiming and Li, Yizhi and Du, Yantao and Zhang, Ge and Zhou, Jiayi and Wu, Yuchen and Piao, Yinzhu and Cao, Denghui and Sun, Tong and Li, Ziniu and Du, Li and Lei, Bo and Liu, Jiaheng and Lin, Chenghua and Zhang, Zhaoxiang and Huang, Wenhao and Zhang, Jiajun},
  journal={arXiv preprint arXiv:2512.24867},
  year={2025}
}

Repository

Source repository

EnvCommons/Encyclo-K

Compute Configuration

Resource allocation for this environment.

Component	Configuration
Environment Server	1 vCPU / 4 GB RAM
Sandbox Machine	Not configured

Estimated Cost

Pay per second of active session usage. Billing starts when your session begins and stops when it ends.

Component	Cost / second
Environment	$0.0000320
Sandbox	Not configured
Total	$0.0000320

Examples

5-minute session: $0.0096

1-hour session: $0.1152
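The example figures follow directly from the per-second rate in the table above. A minimal sketch of the arithmetic, assuming billing is strictly linear in session length with no rounding or minimum charge (the helper name `session_cost` is hypothetical):

```python
from decimal import Decimal

# Per-second rate for the environment server, taken from the cost table.
ENVIRONMENT_COST_PER_SECOND = Decimal("0.0000320")

def session_cost(seconds: int) -> Decimal:
    """Cost in USD of a session lasting `seconds`, assuming linear per-second billing."""
    return ENVIRONMENT_COST_PER_SECOND * seconds


if __name__ == "__main__":
    print(f"5-minute session: ${session_cost(5 * 60):.4f}")
    print(f"1-hour session:   ${session_cost(60 * 60):.4f}")
```

`Decimal` is used instead of `float` so that the small per-second rate multiplies exactly.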
