clindef

Description

ClinDEF is a dynamic benchmark for assessing clinical reasoning in LLMs through simulated diagnostic dialogues grounded in a disease knowledge graph. It generates patient cases and multi-turn interactions between an LLM doctor and an automated patient on the fly, and evaluates models on diagnostic accuracy, fine-grained efficiency metrics, and rubric-based assessments of diagnostic quality.
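
The loop described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not ClinDEF's actual code or data: the three-disease `GRAPH`, the function names, and the rule-based "doctor" that stands in for an LLM are all hypothetical. It covers only case generation, the doctor-patient exchange, and two simple metrics (accuracy and mean number of questions); rubric-based scoring is left out.

```python
import random

# Toy disease -> symptom graph. Entries are illustrative, not taken from ClinDEF.
GRAPH = {
    "influenza": {"fever", "cough", "myalgia"},
    "migraine": {"headache", "photophobia", "nausea"},
    "gastroenteritis": {"nausea", "diarrhea", "fever"},
}


def generate_case(rng):
    """Sample a hidden diagnosis and its symptom set from the graph."""
    disease = rng.choice(sorted(GRAPH))
    return {"diagnosis": disease, "symptoms": set(GRAPH[disease])}


def patient_answer(case, symptom):
    """Automated patient: truthfully reports whether a symptom is present."""
    return symptom in case["symptoms"]


def rule_based_doctor(case, max_turns=10):
    """Hypothetical stand-in for an LLM doctor: asks yes/no symptom questions
    and prunes candidate diagnoses until one remains."""
    candidates = set(GRAPH)
    turns = 0
    for symptom in sorted(set().union(*GRAPH.values())):
        if len(candidates) == 1 or turns >= max_turns:
            break
        # Skip questions that cannot separate the remaining candidates.
        if len({symptom in GRAPH[d] for d in candidates}) < 2:
            continue
        present = patient_answer(case, symptom)
        turns += 1
        candidates = {d for d in candidates if (symptom in GRAPH[d]) == present}
    return sorted(candidates)[0], turns


def evaluate(n_cases=20, seed=0):
    """Run simulated dialogues and report accuracy and mean questions asked."""
    rng = random.Random(seed)
    correct = 0
    total_turns = 0
    for _ in range(n_cases):
        case = generate_case(rng)
        guess, turns = rule_based_doctor(case)
        correct += guess == case["diagnosis"]
        total_turns += turns
    return {"accuracy": correct / n_cases, "mean_turns": total_turns / n_cases}


if __name__ == "__main__":
    print(evaluate())
```

In a real setup the rule-based doctor would be replaced by calls to the model under evaluation, and the number of turns is only one possible proxy for the efficiency metrics the description mentions.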

arXiv/clindef | OpenReward