HealthBench

Name: OpenAI/HealthBench
Author: OpenAI

Description

HealthBench is a rubric-driven benchmark for evaluating LLMs and agentic RAG-based clinical support assistants on their ability to generate high-quality, accurate, situationally aware answers to open-ended clinical questions. Responses are assessed along behavioral axes such as accuracy, completeness, instruction-following, contextual reasoning, and uncertainty handling. The benchmark consists of expert-annotated, open-ended health conversations, including a Hard subset of 1,000 challenging examples, and is designed for behavior-level, rubric-based scoring.
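
To make "rubric-based scoring" concrete, here is a minimal sketch of how a single response might be scored. It assumes each rubric criterion carries a point value (positive for desirable behavior, negative for undesirable behavior) and that a grader has already decided whether the response meets it. The names `Criterion`, `rubric_score`, and `aggregate` are hypothetical and are not taken from the HealthBench codebase; the normalization shown is one plausible reading of the scheme, not a definitive implementation.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Criterion:
    """One rubric item (hypothetical structure, for illustration only)."""
    description: str
    points: int  # > 0 rewards a behavior, < 0 penalizes one
    met: bool    # grader's verdict on whether the response meets the criterion


def rubric_score(criteria: List[Criterion]) -> Optional[float]:
    """Points earned on met criteria, divided by the maximum positive points."""
    max_points = sum(c.points for c in criteria if c.points > 0)
    if max_points == 0:
        return None  # nothing to earn, so the example is not scorable
    earned = sum(c.points for c in criteria if c.met)
    return earned / max_points


def aggregate(scores: Iterable[Optional[float]]) -> float:
    """Mean of per-example scores, clipped to the range [0, 1]."""
    valid = [s for s in scores if s is not None]
    if not valid:
        return 0.0
    mean = sum(valid) / len(valid)
    return min(1.0, max(0.0, mean))


example = [
    Criterion("Advises seeking emergency care for red-flag symptoms", 5, True),
    Criterion("Asks about current medications", 3, False),
    Criterion("States a definitive diagnosis without enough context", -2, True),
]
print(rubric_score(example))  # (5 - 2) / (5 + 3)
```

Because met negative criteria subtract points, a single response can score below zero under this scheme, which is why the sketch clips only the aggregate rather than each example.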

arXiv

Implementations (1)

Environment	Stars	Last Updated
GeneralReasoning/HealthBench	1	3 months ago