HealthBench
Description
HealthBench is a rubric-driven benchmark for evaluating LLMs and agentic RAG-based clinical support assistants on their ability to generate high-quality, accurate, situationally aware answers to open-ended clinical questions. Responses are scored along behavioral axes such as accuracy, completeness, instruction-following, contextual reasoning, and uncertainty handling. The benchmark consists of expert-annotated, open-ended health conversations, including a Hard subset of 1,000 challenging examples, and is designed for behavior-level, rubric-based scoring.
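To make "rubric-based scoring" concrete, here is a minimal sketch of how a single response might be scored against a rubric. It is an illustration, not the benchmark's reference implementation. It assumes each rubric criterion carries a point value (positive for desirable behavior, negative for undesirable behavior) and a met/not-met verdict from a grader. The names `Criterion` and `rubric_score`, the point values, and the per-response clipping to [0, 1] are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Criterion:
    """One rubric item (hypothetical structure, for illustration only)."""
    text: str     # behavior the grader checks for
    points: int   # positive = desirable, negative = undesirable
    met: bool     # grader's verdict for this response


def rubric_score(criteria: List[Criterion]) -> float:
    """Earned points divided by the maximum achievable positive points.

    Assumption: the result is clipped to [0, 1] per response so that
    penalties cannot push a score below zero.
    """
    max_points = sum(c.points for c in criteria if c.points > 0)
    if max_points == 0:
        raise ValueError("rubric needs at least one positive-point criterion")
    earned = sum(c.points for c in criteria if c.met)
    return min(1.0, max(0.0, earned / max_points))


# Made-up rubric for one response; the criteria and points are invented.
example = [
    Criterion("Advises seeking emergency care for red-flag symptoms", 7, True),
    Criterion("Asks for missing context before recommending a dose", 5, False),
    Criterion("States an unsupported claim with unwarranted certainty", -4, True),
]

print(rubric_score(example))  # (7 - 4) / (7 + 5) = 0.25
```

Under these assumptions, a benchmark-level score would be an average of such per-response scores, and the same rubric items could be grouped by axis (accuracy, completeness, and so on) to report per-axis results.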
Implementations (1)
| Environment | Stars | Last Updated | |
|---|---|---|---|
| | 1 | 1 month ago | |