encyclo-k
Description
Encyclo-K is a statement-based benchmark for evaluating LLMs' comprehensive understanding. It extracts standalone knowledge statements from authoritative textbooks and dynamically composes them into evaluation questions at test time. Because each question is built by randomly sampling 8–10 statements, this design prevents data contamination, enables multi-knowledge-point assessment, reduces annotation cost by requiring only formatting checks, and yields stable, discriminative model rankings.
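The test-time composition step can be illustrated with a minimal sketch. It is not the benchmark's actual implementation: the `Statement` type, the `is_correct` label, the `compose_question` helper, and the question wording are all hypothetical. Only the random sampling of 8–10 statements per question comes from the description above.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    text: str
    is_correct: bool  # assumption: each statement carries a truth label


def compose_question(pool, rng, min_k=8, max_k=10):
    """Sample 8-10 statements from the pool and build one question.

    Returns the prompt text and the 1-based indices of the correct statements.
    The prompt wording is illustrative, not the benchmark's real template.
    """
    if len(pool) < max_k:
        raise ValueError("pool must hold at least max_k statements")
    k = rng.randint(min_k, max_k)   # inclusive on both ends
    chosen = rng.sample(pool, k)    # sampling without replacement
    lines = [f"{i}. {s.text}" for i, s in enumerate(chosen, start=1)]
    prompt = "Which of the following statements are correct?\n" + "\n".join(lines)
    answer = [i for i, s in enumerate(chosen, start=1) if s.is_correct]
    return prompt, answer


# Hypothetical pool of 20 placeholder statements.
pool = [Statement(f"Placeholder statement {n}", n % 2 == 0) for n in range(20)]
prompt, answer = compose_question(pool, random.Random(0))
```

Because the statements are drawn fresh for every question, a different seed produces a different question from the same pool, which is what makes memorizing fixed test items ineffective.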
Implementations (1)
| Environment | Stars | Last Updated |
|---|---|---|
| | 0 | 1 month ago |