kumo
Description
KUMO is a generative evaluation framework for assessing reasoning in large language models (LLMs). It combines LLMs with symbolic engines to dynamically produce diverse, partially observable, multi-turn reasoning tasks with adjustable difficulty. Because an automated pipeline continuously generates novel tasks across open-ended domains, models must demonstrate genuine generalization rather than recall memorized answers, which makes KUMO a contamination-resistant benchmark suited to long-term evaluation.
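The description leaves the task format abstract. As a rough illustration only, the sketch below assumes one plausible shape for a partially observable, multi-turn task: a hidden truth among several candidate truths, actions whose observations rule candidates out, and a symbolic consistency check standing in for the symbolic engine. All names (`ReasoningTask`, `generate_task`, `greedy_solve`), the binary `pos`/`neg` observation model, and the difficulty knobs are hypothetical and are not KUMO's actual API; no LLM is involved here, only the symbolic side.

```python
import random
from dataclasses import dataclass


@dataclass
class ReasoningTask:
    """Hypothetical task shape: identify one hidden truth among candidates
    by choosing actions whose observations rule candidates out."""
    truths: list[str]
    outcomes: dict[str, dict[str, str]]  # action -> {truth: observation}
    hidden: str

    def act(self, action: str) -> str:
        # The agent only ever sees the observation, never the hidden truth.
        return self.outcomes[action][self.hidden]

    def consistent(self, history: list[tuple[str, str]]) -> list[str]:
        # Symbolic step: keep every truth that agrees with all observations.
        return [t for t in self.truths
                if all(self.outcomes[a][t] == obs for a, obs in history)]

    def solvable(self) -> bool:
        # Solvable if taking every action narrows the candidates to one.
        full = [(a, self.act(a)) for a in self.outcomes]
        return len(self.consistent(full)) == 1


def generate_task(num_truths: int, num_actions: int, seed: int = 0,
                  max_tries: int = 1000) -> ReasoningTask:
    """Assumed difficulty knobs: more truths and actions make a harder task.
    Rejection sampling keeps only tasks that are provably solvable."""
    rng = random.Random(seed)
    truths = [f"truth_{i}" for i in range(num_truths)]
    for _ in range(max_tries):
        outcomes = {
            f"action_{j}": {t: rng.choice(["pos", "neg"]) for t in truths}
            for j in range(num_actions)
        }
        task = ReasoningTask(truths, outcomes, rng.choice(truths))
        if task.solvable():
            return task
    raise RuntimeError("no solvable task found; add actions or reduce truths")


def greedy_solve(task: ReasoningTask) -> tuple[str, int]:
    """Baseline multi-turn agent: each turn, pick the untried action whose
    worst-case observation leaves the fewest candidates."""
    history: list[tuple[str, str]] = []
    remaining = task.truths
    untried = list(task.outcomes)
    while len(remaining) > 1 and untried:
        def worst_case(action: str) -> int:
            counts: dict[str, int] = {}
            for t in remaining:
                obs = task.outcomes[action][t]
                counts[obs] = counts.get(obs, 0) + 1
            return max(counts.values())

        action = min(untried, key=worst_case)
        untried.remove(action)
        history.append((action, task.act(action)))
        remaining = task.consistent(history)
    return remaining[0], len(history)


if __name__ == "__main__":
    task = generate_task(num_truths=5, num_actions=6, seed=42)
    guess, turns = greedy_solve(task)
    print(guess == task.hidden, turns)
```

Rejection sampling is used here because it is the simplest way to guarantee every generated task has a unique answer; a real pipeline would more likely rely on a dedicated solver for that check.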
Implementations (1)
| Environment | Stars | Last Updated |
|---|---|---|
|  | 0 | 1 month ago |