ICUSepsis

Description

ICU-Sepsis is an environment for evaluating agents on a tabular Markov Decision Process (MDP) that models sepsis treatment in the intensive care unit. Agents select treatment actions representing combinations of vasopressor and IV fluid doses to maximize patient survival probability. The MDP has 716 discrete states and 25 discrete actions, with transition dynamics derived from the MIMIC-III clinical dataset.

Capabilities

Sequential clinical treatment decision-making under uncertainty
Balancing vasopressor and IV fluid dosing across 25 action combinations
Optimizing sparse binary rewards (survival vs. death)
Reasoning about admissible actions observed in real clinical data
Monitoring patient severity via SOFA scores

Compute Requirements

Minimal. The environment is a tabular MDP with no GPU or significant memory requirements.

License

MIT License (original ICU-Sepsis package).

Tasks

There is one split:

train: 1,000 tasks (seeds 0-999)

Each task uses a unique random seed that determines the initial patient state sampled from the learned initial state distribution. All tasks share the same underlying MDP dynamics (transition probabilities, reward structure).
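The role of the seed can be sketched as follows. This is an illustration only, not the environment's actual code: the distribution `d0` is a made-up 4-state example, and `sample_initial_state` is a hypothetical helper name.

```python
import numpy as np

def sample_initial_state(d0: np.ndarray, seed: int) -> int:
    """Draw one initial state index from distribution d0, reproducibly for a given seed."""
    rng = np.random.default_rng(seed)
    return int(rng.choice(len(d0), p=d0))

# Toy 4-state initial distribution (hypothetical values, for illustration only).
d0 = np.array([0.1, 0.2, 0.3, 0.4])

# One start state per task seed, mirroring seeds 0-999 of the train split.
starts = [sample_initial_state(d0, seed) for seed in range(1000)]
```

Re-running a task with the same seed reproduces the same start state, while the dynamics stay identical across tasks.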

Reward Structure

This is a sparse, verifiable reward environment. All intermediate steps yield zero reward. Terminal rewards are:

Patient survival (state 714): +1.0
Patient death (state 713): 0.0

The discount factor is 1.0 (undiscounted returns). We do not use LLM graders for this environment.
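A minimal sketch of how the return follows from these rules. The state numbers and reward values come from the list above; the function names are hypothetical and not part of the package.

```python
from typing import Sequence

SURVIVAL_STATE = 714  # terminal state with reward +1.0
DEATH_STATE = 713     # terminal state with reward 0.0

def terminal_reward(state: int) -> float:
    """Reward received on reaching `state`; every non-survival state yields 0."""
    return 1.0 if state == SURVIVAL_STATE else 0.0

def episode_return(rewards: Sequence[float], gamma: float = 1.0) -> float:
    """Discounted sum of rewards; with gamma = 1.0 this is the plain sum."""
    total, discount = 0.0, 1.0
    for reward in rewards:
        total += discount * reward
        discount *= gamma
    return total
```

Because all intermediate rewards are zero and returns are undiscounted, an episode's return is 1.0 if the patient survives and 0.0 otherwise, so the average return over many episodes is an estimate of the survival rate.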

Data

MDP parameters (transition matrix, reward matrix, initial state distribution, expert policy) ship with the icu-sepsis pip package as a compressed NumPy archive (dynamics.npz). These parameters were derived from the MIMIC-III clinical dataset using the methodology of Komorowski et al. (2018).
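The array names stored inside dynamics.npz are not listed here, so a cautious first step is to inspect the archive rather than assume its keys. The sketch below uses only standard NumPy calls; `describe_npz` is a hypothetical helper, and the demo archive with its array names is a stand-in, not the real file.

```python
import os
import tempfile

import numpy as np

def describe_npz(path: str) -> dict:
    """Return {array_name: shape} for every array stored in an .npz archive."""
    with np.load(path) as archive:
        return {name: archive[name].shape for name in archive.files}

# Demo on a tiny stand-in archive; names and sizes below are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_dynamics.npz")
    np.savez_compressed(
        demo_path,
        transitions=np.zeros((3, 2, 3)),  # (state, action, next_state)
        initial_dist=np.ones(3) / 3,      # distribution over start states
    )
    shapes = describe_npz(demo_path)
```

Pointing `describe_npz` at the archive shipped with the installed package would show which arrays are actually available and how they are shaped.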

Tools

Agents are given two tools:

treat(vasopressor_level, iv_fluid_level): Administer treatment by choosing vasopressor dose (0-4) and IV fluid volume (0-4) independently. Returns the new patient state, SOFA score, list of admissible treatments, and current step count.
info(): Display a reference of the state space, treatment parameters, reward structure, and other environment details.
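The `treat` tool takes the two dose levels separately, while the MDP has 25 flat actions. One way to relate the two is sketched below. The ordering used here (IV fluid level as the major index) is an assumption for illustration and may not match the package's internal numbering; `encode_action` and `decode_action` are hypothetical helper names.

```python
from typing import Tuple

N_LEVELS = 5  # dose levels 0-4 for each of the two treatments

def encode_action(vasopressor_level: int, iv_fluid_level: int) -> int:
    """Map a (vasopressor, IV fluid) pair to a flat index in 0..24.

    Assumed ordering: IV fluid is the major index, vasopressor the minor one.
    """
    for level in (vasopressor_level, iv_fluid_level):
        if not 0 <= level < N_LEVELS:
            raise ValueError(f"dose level must be in 0..{N_LEVELS - 1}, got {level}")
    return iv_fluid_level * N_LEVELS + vasopressor_level

def decode_action(action: int) -> Tuple[int, int]:
    """Inverse of encode_action: flat index -> (vasopressor_level, iv_fluid_level)."""
    if not 0 <= action < N_LEVELS * N_LEVELS:
        raise ValueError(f"action must be in 0..{N_LEVELS * N_LEVELS - 1}, got {action}")
    iv_fluid_level, vasopressor_level = divmod(action, N_LEVELS)
    return vasopressor_level, iv_fluid_level
```

Whatever the true ordering, the mapping is a bijection between the 5 x 5 dose grid and the 25 actions, which is why the two descriptions of the action space are interchangeable.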

Time Horizon

ICU-Sepsis is a multi-turn environment. Episodes terminate when the patient reaches a survival or death state, or when the maximum step limit (20 steps) is reached. Based on baseline evaluations from the original paper, episodes typically last 9-11 steps.
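The two stopping conditions can be sketched as a rollout loop over a tabular MDP. This is a toy illustration under stated assumptions: the 3-state transition array and its probabilities are invented, and `rollout` is a hypothetical helper rather than the environment's own loop.

```python
import numpy as np

MAX_STEPS = 20  # step limit stated in the text

def rollout(P, terminal_states, policy, start_state, rng, max_steps=MAX_STEPS):
    """Simulate one episode; return (final_state, steps_taken, terminated)."""
    state, steps = start_state, 0
    # Stop on a terminal state or once the step limit is reached.
    while state not in terminal_states and steps < max_steps:
        action = policy(state)
        state = int(rng.choice(P.shape[2], p=P[state, action]))
        steps += 1
    return state, steps, state in terminal_states

# Toy MDP (hypothetical numbers): state 0 = patient, 1 = death, 2 = survival; one action.
P = np.zeros((3, 1, 3))
P[0, 0] = [0.80, 0.05, 0.15]
P[1, 0, 1] = 1.0  # death is absorbing
P[2, 0, 2] = 1.0  # survival is absorbing

final_state, steps, terminated = rollout(
    P, {1, 2}, lambda s: 0, start_state=0, rng=np.random.default_rng(0)
)
```

When `terminated` is false, the episode was cut off by the step limit rather than ended by a survival or death state.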

Environment Difficulty

From the original paper (Choudhary et al., 2024):

Policy	Avg. Return	Avg. Episode Length
Random	0.78	9.45
Expert (clinician-derived)	0.78	9.22
Optimal (value iteration)	0.88	10.99

The gap of roughly 0.10 in average return between the expert and random policies (both about 0.78) and the optimal policy (about 0.88) indicates room for improvement. The high random baseline reflects that most patients in the underlying data survive regardless of treatment.
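The "Optimal" row is attributed to value iteration. The sketch below shows that technique on a toy 3-state, 2-action MDP with invented probabilities; it is not the paper's code. It assumes undiscounted returns with reward 1 on reaching survival, so each state's value is its survival probability under the best policy.

```python
import numpy as np

def value_iteration(P, death, survival, tol=1e-10, max_iters=10_000):
    """Undiscounted value iteration for a tabular MDP with two absorbing outcomes.

    P has shape (states, actions, next_states). Terminal values are held fixed:
    0 for death and 1 for survival.
    """
    n_states = P.shape[0]
    terminal = np.zeros(n_states, dtype=bool)
    terminal[[death, survival]] = True
    V = np.zeros(n_states)
    V[survival] = 1.0
    for _ in range(max_iters):
        Q = P @ V  # expected next-state value, shape (states, actions)
        V_new = np.where(terminal, V, Q.max(axis=1))
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    greedy_policy = (P @ V).argmax(axis=1)
    return V, greedy_policy

# Toy MDP (hypothetical numbers): state 0 = patient, 1 = death, 2 = survival.
P = np.zeros((3, 2, 3))
P[0, 0] = [0.5, 0.2, 0.3]  # action 0: survival less likely
P[0, 1] = [0.5, 0.1, 0.4]  # action 1: survival more likely
P[1, :, 1] = 1.0
P[2, :, 2] = 1.0

V, greedy_policy = value_iteration(P, death=1, survival=2)
```

In this toy case the patient state's value converges to 0.8 with action 1 preferred, since solving V = 0.5 V + 0.4 gives V = 0.8.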

Other Environment Requirements

There are no further environment requirements; ICU-Sepsis works out of the box without any secrets or API keys.

Safety

Agents interact only with a tabular MDP simulation derived from anonymized clinical records (MIMIC-III). There is no access to real patient data, external systems, or the internet during task execution.

Citations

@inproceedings{choudhary2024icusepsis,
  title={{ICU-Sepsis}: A Benchmark {MDP} Built from Real Medical Data},
  author={Kartik Choudhary and Dhawal Gupta and Philip S. Thomas},
  booktitle={Reinforcement Learning Conference},
  year={2024},
  url={https://arxiv.org/abs/2406.05646}
}

Repository

Source repository

EnvCommons/icu_sepsis

Compute Configuration

Resource allocation for this environment.

Component	Configuration
Environment Server	1 vCPU / 4 GB RAM
Sandbox Machine	Not configured

Estimated Cost

Pay per second of active session usage. Billing starts when your session begins and stops when it ends.

Component	Cost / second
Environment	$0.0000320
Sandbox	Not configured
Total	$0.0000320

Examples

5-minute session: $0.0096
1-hour session: $0.1152
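The example figures follow directly from the per-second rate. A small sketch of the arithmetic, where `session_cost` is a hypothetical helper name:

```python
RATE_PER_SECOND = 0.0000320  # USD per second, environment rate from the cost table

def session_cost(seconds: float, rate: float = RATE_PER_SECOND) -> float:
    """Cost in USD of a session billed per second of active use."""
    return seconds * rate

five_minute_cost = session_cost(5 * 60)  # 300 seconds
one_hour_cost = session_cost(60 * 60)    # 3600 seconds
```

Rounded to four decimal places, these come to $0.0096 and $0.1152, matching the examples.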
