medcogeval

Description

A multi-cognitive-level evaluation framework, inspired by Bloom’s Taxonomy, for assessing large language models in the medical domain. It covers three cognitive levels: preliminary knowledge grasp, comprehensive knowledge application, and scenario-based problem solving. The framework integrates existing medical datasets into targeted tasks and is used to systematically evaluate state-of-the-art general and medical LLMs from six model families. The evaluation reveals sharp performance declines as cognitive complexity increases, with model size becoming more important at the higher levels.
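As a rough illustration of how results might be grouped by the three cognitive levels, the sketch below computes a per-level score from labeled outcomes. The description does not state which metric the framework uses, so plain accuracy is an assumption here, and the function name `accuracy_by_level` and the toy records are hypothetical, not part of the benchmark or its results.

```python
from collections import defaultdict

# The three cognitive levels named in the description.
LEVELS = (
    "preliminary knowledge grasp",
    "comprehensive knowledge application",
    "scenario-based problem solving",
)


def accuracy_by_level(records):
    """Return {level: accuracy} for an iterable of (level, is_correct) pairs.

    Hypothetical helper: accuracy is an assumed metric, not one confirmed
    by the benchmark description.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for level, is_correct in records:
        if level not in LEVELS:
            raise ValueError(f"unknown cognitive level: {level!r}")
        total[level] += 1
        correct[level] += int(is_correct)
    # Only report levels that actually have at least one record.
    return {lvl: correct[lvl] / total[lvl] for lvl in LEVELS if total[lvl]}


# Toy records for illustration only; these are NOT real benchmark results.
toy_records = [
    (LEVELS[0], True), (LEVELS[0], True),
    (LEVELS[1], True), (LEVELS[1], False),
    (LEVELS[2], False), (LEVELS[2], False),
]
scores = accuracy_by_level(toy_records)
for level, acc in scores.items():
    print(f"{level}: {acc:.2f}")
```

Reporting one score per level, rather than a single pooled number, is what makes a decline across levels visible at all.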


arXiv/medcogeval | OpenReward