medr-bench

Description

MedR-Bench is a benchmark for evaluating reasoning-enhanced LLMs in clinical settings. It comprises 1,453 structured patient cases spanning 13 body systems and 10 specialties, each annotated with reasoning references derived from clinical case reports. The benchmark covers the full patient care journey: examination recommendation, diagnostic decision-making, and treatment planning. It also includes a novel automated Reasoning Evaluator that scores free-text reasoning on three dimensions: efficiency, actuality, and completeness.
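To make the three reasoning dimensions more concrete, the following is a minimal Python sketch of ratio-style scores for a single case. It rests on assumptions that are not stated on this page. Each step of a model's free-text reasoning is taken to have already been judged as effective or not and as factually correct or not. The number of reference reasoning steps that the model's reasoning covers is also taken as known. The names `StepJudgement` and `reasoning_scores` are hypothetical. The actual Reasoning Evaluator in Pengcheng/MedR-Bench may define and compute these scores differently.

```python
from dataclasses import dataclass


@dataclass
class StepJudgement:
    # Hypothetical per-step labels; in MedR-Bench such judgements would come
    # from the automated Reasoning Evaluator, not from this sketch.
    effective: bool  # does the step contribute toward the final answer?
    correct: bool    # is the step factually accurate?


def reasoning_scores(steps, n_reference_steps, n_reference_covered):
    """Hypothetical efficiency / actuality / completeness ratios for one case."""
    if n_reference_covered > n_reference_steps:
        raise ValueError("cannot cover more reference steps than exist")
    n = len(steps)
    effective = [s for s in steps if s.effective]
    # Efficiency (assumed): share of the model's steps that are effective.
    efficiency = len(effective) / n if n else 0.0
    # Actuality (assumed): share of effective steps that are factually correct.
    actuality = sum(s.correct for s in effective) / len(effective) if effective else 0.0
    # Completeness (assumed): share of reference steps the reasoning covers.
    completeness = n_reference_covered / n_reference_steps if n_reference_steps else 0.0
    return {"efficiency": efficiency, "actuality": actuality, "completeness": completeness}
```

For example, four judged steps of which three are effective (two of those correct), covering three of four reference steps, would score 0.75 efficiency, about 0.67 actuality, and 0.75 completeness under these assumed definitions. Keeping the three ratios separate, rather than collapsing them into one number, mirrors the page's framing of them as distinct dimensions.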

Implementations (1)
Environment             Stars   Last Updated
Pengcheng/MedR-Bench    1       2 months ago