ClinDEF
Description
ClinDEF is a dynamic benchmark for assessing clinical reasoning in large language models (LLMs) through simulated diagnostic dialogues grounded in a disease knowledge graph. It generates patient cases on the fly and runs multi-turn interactions between an LLM acting as the doctor and an automated patient. Models are evaluated on diagnostic accuracy, fine-grained efficiency metrics, and rubric-based assessments of diagnostic quality.
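The loop described above (sample a case from a knowledge graph, let a doctor agent question a simulated patient, then score the diagnosis and its cost in turns) can be illustrated with a minimal sketch. Everything here is a hypothetical toy, not ClinDEF's actual code or data: the four-disease graph `KG`, the helpers `generate_case`, `SimulatedPatient`, `rule_based_doctor` and `evaluate` are invented for illustration. A simple rule-based questioner stands in for the LLM doctor, only yes/no symptom questions are modeled, and "mean turns" is used as a single example of an efficiency metric. The rubric-based quality assessment is not modeled.

```python
import random
from dataclasses import dataclass

# Hypothetical toy knowledge graph: disease -> linked symptoms (not ClinDEF's real graph).
KG = {
    "influenza": {"fever", "cough", "myalgia"},
    "common_cold": {"cough", "rhinorrhea", "sore_throat"},
    "migraine": {"headache", "photophobia", "nausea"},
    "gastroenteritis": {"nausea", "diarrhea", "fever"},
}


@dataclass
class Case:
    disease: str
    symptoms: set


def generate_case(rng: random.Random) -> Case:
    """Sample a disease and use its linked symptoms as the patient's hidden findings."""
    disease = rng.choice(sorted(KG))
    return Case(disease, set(KG[disease]))


class SimulatedPatient:
    """Automated patient: answers yes/no symptom questions truthfully from the hidden case."""

    def __init__(self, case: Case):
        self.case = case

    def answer(self, symptom: str) -> bool:
        return symptom in self.case.symptoms


def rule_based_doctor(patient: SimulatedPatient, max_turns: int = 10):
    """Stand-in for the LLM doctor: asks splitting questions until one candidate remains."""
    candidates = set(KG)
    turns = 0
    while len(candidates) > 1 and turns < max_turns:
        # Keep only symptoms that are present in some, but not all, remaining candidates.
        counts = {
            s: sum(s in KG[d] for d in candidates)
            for d in candidates
            for s in KG[d]
        }
        splitting = [s for s, n in counts.items() if 0 < n < len(candidates)]
        if not splitting:
            break
        # Prefer the most balanced split; break ties alphabetically for determinism.
        symptom = min(splitting, key=lambda s: (abs(2 * counts[s] - len(candidates)), s))
        has_it = patient.answer(symptom)
        turns += 1
        candidates = {d for d in candidates if (symptom in KG[d]) == has_it}
    return sorted(candidates)[0], turns


def evaluate(n_cases: int = 20, seed: int = 0) -> dict:
    """Report diagnostic accuracy and mean dialogue turns (an example efficiency metric)."""
    rng = random.Random(seed)
    correct, total_turns = 0, 0
    for _ in range(n_cases):
        case = generate_case(rng)
        diagnosis, turns = rule_based_doctor(SimulatedPatient(case))
        correct += diagnosis == case.disease
        total_turns += turns
    return {"accuracy": correct / n_cases, "mean_turns": total_turns / n_cases}


print(evaluate())
```

Because the toy doctor reasons exactly over a tiny graph, it always reaches the right answer. In a real benchmark the interesting signal comes from swapping in an LLM for `rule_based_doctor` and comparing how accuracy trades off against the number of questions asked.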