AirlineRM

Description

AirlineRM is a realistic airline network revenue management environment in which an agent operates a hub-and-spoke carrier over a 30-day horizon. Each day the agent sets fare class availability (opening or closing 8 nested fare buckets), sets overbooking limits, and responds to disruptions (weather, mechanical failures, crew shortages, competitor fare wars). The simulation is calibrated against real-world airline industry parameters, including DOT-regulated denied boarding compensation, FAR Part 117 crew legality constraints, and empirical booking curve shapes.

Note: this is a synthetic environment that should be tested thoroughly before use in an RL pipeline.

Capabilities

Dynamic fare class management across an 8-class nested inventory (Y/B/M/H/Q/V/T/L)
Overbooking optimization balancing fill rates against denied boarding costs
Disruption management: weather events, mechanical failures, crew shortages
Aircraft swap decisions under range and capacity constraints
Competitor fare war response and demand adaptation
Multi-day strategy development with dense per-day reward feedback
Long-horizon multi-turn execution (30+ advance_day calls plus analysis)

Compute Requirements

Agents in AirlineRM are given a sandbox with 1 GB of RAM and 0.5 vCPUs running a Python 3.12 data science image. Network access is enabled.

License

ORLv1.

Tasks

There are 12 tasks across 4 seasonal scenarios:

Training (9 tasks):

summer_peak_v{1,2,3}: Peak summer travel with high leisure demand and thunderstorm risk (demand multiplier 1.30x)
winter_holiday_v{1,2,3}: Holiday season with mixed demand and snowstorm risk (demand multiplier 1.15x)
shoulder_spring_v{1,2,3}: Shoulder season with moderate demand and competitor pressure (demand multiplier 0.90x)

Test (3 tasks):

fall_business_v{1,2,3}: Fall business travel season with conference demand and occasional fog (demand multiplier 1.05x)

Each task simulates 30 operating days across a 12-route hub-and-spoke network with 29 daily departures serving business, mixed, and leisure markets using a fleet of E175 (76 seats), 737-700 (138 seats), and 737-800 (175 seats) aircraft. Variants within the same scenario share the same demand parameters but have different random seeds producing different disruption patterns and booking flows.
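The task list above can be enumerated programmatically. The sketch below assumes task IDs follow the `{scenario}_v{n}` pattern used in the listing; the dictionary layout and the helper name `list_tasks` are illustrative and not part of the environment's API.

```python
# Splits and demand multipliers as stated in the task list above.
SCENARIOS = {
    "summer_peak": {"split": "train", "demand_multiplier": 1.30},
    "winter_holiday": {"split": "train", "demand_multiplier": 1.15},
    "shoulder_spring": {"split": "train", "demand_multiplier": 0.90},
    "fall_business": {"split": "test", "demand_multiplier": 1.05},
}
VARIANTS = (1, 2, 3)  # each variant uses a different random seed


def list_tasks(split=None):
    """Return task IDs, optionally filtered by split ('train' or 'test')."""
    return [
        f"{name}_v{variant}"
        for name, meta in SCENARIOS.items()
        if split is None or meta["split"] == split
        for variant in VARIANTS
    ]


train_tasks = list_tasks("train")
test_tasks = list_tasks("test")
print(len(train_tasks), len(test_tasks))  # 9 3
```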

Reward Structure

This is a dense, verifiable reward environment. Rewards are computed after each operating day (via the advance_day tool). The agent is rewarded for generating higher net revenue through effective fare management, overbooking decisions, and disruption handling. Costly operational outcomes, such as denied boardings, flight cancellations, and excessive delays, reduce the reward. The reference point is a naive baseline policy that opens all fare classes without strategic management and never overbooks.

We do not use LLM graders for this environment.
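The exact reward formula is not given here. As a rough mental model only, a baseline-relative reward could look like the sketch below, where the function name `daily_reward`, the scaling constant, and the revenue figures are all assumptions made for illustration. It reproduces one property stated in this document: an agent that matches the baseline scores 0.0.

```python
def daily_reward(agent_net_usd: float, baseline_net_usd: float,
                 scale_usd: float = 100_000.0) -> float:
    """Hypothetical reward: net revenue gain over the baseline, rescaled.

    The real environment may normalize differently; scale_usd is invented.
    """
    return (agent_net_usd - baseline_net_usd) / scale_usd


# Made-up daily net revenue for the baseline policy (USD).
baseline_days = [110_000.0, 95_000.0, 102_500.0]

# An agent that reproduces the baseline exactly earns zero every day.
rewards = [daily_reward(net, net) for net in baseline_days]
cumulative = sum(rewards)
print(cumulative)
```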

Data

All data is synthetically generated and deterministic given the task seed. No external datasets are required. The simulation generates:

Network topology (12 routes with realistic distances, frequencies, and fare structures)
Demand via parametric booking curves calibrated to industry data (business routes book late, leisure routes book early)
Disruptions via seasonal stochastic models (thunderstorms in summer, snowstorms in winter, mechanical failures year-round)
Competitor fare wars with demand reduction effects
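To make the booking-curve idea concrete, the sketch below uses a simple exponential form in which the share of final demand already booked shrinks with lead time. The functional form and the rate values are assumptions chosen for illustration; they are not the environment's calibrated curves. They only reproduce the qualitative pattern that business markets book late and leisure markets book early.

```python
import math

# Hypothetical decay rates per day; a larger rate means bookings
# arrive closer to departure.
BOOKING_RATE = {"business": 0.15, "mixed": 0.08, "leisure": 0.04}


def fraction_booked(days_before_departure: float, market: str) -> float:
    """Share of final demand already booked at a given lead time.

    Assumed form: exp(-rate * days_before_departure), which equals 1.0
    on the day of departure.
    """
    return math.exp(-BOOKING_RATE[market] * days_before_departure)


for market in BOOKING_RATE:
    print(market, round(fraction_booked(14, market), 2))
```

Under these assumed rates, a leisure market has a larger share of its demand on the books two weeks out than a business market does.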

Agents are given access to a sandbox filesystem where they can write analysis scripts and build models.

Simulation parameters are calibrated against: BTS LOADFACTORD (pre-pandemic domestic load factor averages), DOT 14 CFR 250.5 (denied boarding compensation tiers), Smith, Leimkuhler & Darrow 1992 (no-show rates by fare class), Belobaba 1989 (booking curve shapes), FAA/DOT Form 41 Schedule P-5.2 (aircraft operating costs), and Gallego & van Ryzin 1994 (willingness-to-pay models).

Tools

Agents have access to 6 domain-specific tools and 9 CLI tools:

Domain-specific tools:

Tool	Description
`view_network_status`	Current day status: flights, bookings, disruptions, performance metrics
`view_flight_details`	Detailed view of a specific flight's bookings by fare class
`set_fare_availability`	Open/close fare classes for upcoming flights
`set_overbooking_limit`	Set overbooking authorization (0 to 15% of capacity)
`handle_disruption`	Respond to IRROPS: cancel, delay, swap aircraft, or do nothing
`advance_day`	Process departures, bookings, and disruptions; advance to next day

CLI tools: bash, glob, grep, ls, read, write, edit, multi_edit, todo_write

Time Horizon

AirlineRM is a multi-turn environment requiring at least 30 advance_day calls (one per operating day), plus additional calls for viewing network status, adjusting fares, setting overbooking limits, and handling disruptions.
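The overall shape of an episode can be sketched as a loop of observe, act, advance. The tool names below come from the Tools table, but the `call_tool` interface, the policy signature, and the return values are assumptions; the stand-in `fake_call_tool` exists only so the sketch runs on its own.

```python
from typing import Any, Callable

HORIZON_DAYS = 30  # one advance_day call per operating day


def run_episode(call_tool: Callable[..., Any],
                policy: Callable[[Any], list]) -> list:
    """Drive one episode: view status, apply the policy's actions, advance."""
    rewards = []
    for _ in range(HORIZON_DAYS):
        status = call_tool("view_network_status")
        # The policy returns (tool_name, kwargs) pairs, e.g. fare or
        # overbooking changes. This shape is hypothetical.
        for tool_name, kwargs in policy(status):
            call_tool(tool_name, **kwargs)
        rewards.append(call_tool("advance_day"))
    return rewards


# Stand-in for the real tool interface (not the environment's API).
calls = []


def fake_call_tool(name, **kwargs):
    calls.append(name)
    return 0.0 if name == "advance_day" else {}


def do_nothing_policy(status):
    return []


rewards = run_episode(fake_call_tool, do_nothing_policy)
print(len(rewards), calls.count("advance_day"))  # 30 30
```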

Environment Difficulty

A simple heuristic policy (closing discount fare classes near departure, 7% overbooking, and aircraft swaps in response to disruptions) achieves positive cumulative reward in all four scenarios, demonstrating that the environment is solvable. However, the heuristic's reward ranges from +0.02 to +14.0 depending on the scenario, indicating that scenario-specific strategy is required:

Scenario	Heuristic Reward	Baseline 30-Day Net Revenue (USD)
summer_peak	+6.3	~$3.3M
winter_holiday	+14.0	~$2.8M
shoulder_spring	+0.2	~$2.3M
fall_business	+0.02	~$2.7M

A do-nothing agent (replicating baseline behavior) scores exactly 0.0. A deliberately destructive policy (closing all fares, cancelling all flights) scores below -5.0, confirming the reward signal correctly differentiates agent quality.
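A heuristic of the kind described above might be sketched as follows. The fare class order and seat counts come from this document; the lead-time cutoffs, the assumption that Y is the highest fare and L the lowest, and the reading of the overbooking limit as a percentage added to capacity are illustrative guesses rather than the environment's actual rules.

```python
# Assumed to be ordered from highest fare (Y) to lowest fare (L).
FARE_CLASSES = ["Y", "B", "M", "H", "Q", "V", "T", "L"]

# (max days to departure, number of top classes left open).
# These cutoffs are hypothetical.
CLOSURE_SCHEDULE = [(3, 3), (7, 4), (21, 6)]

SEATS = {"E175": 76, "737-700": 138, "737-800": 175}


def open_classes(days_to_departure: int) -> list[str]:
    """Close the cheapest buckets first as departure approaches."""
    for max_days, n_open in CLOSURE_SCHEDULE:
        if days_to_departure <= max_days:
            return FARE_CLASSES[:n_open]
    return list(FARE_CLASSES)


def authorized_bookings(aircraft: str, overbook_pct: float = 0.07) -> int:
    """Capacity plus the overbooking allowance, rounded down."""
    if not 0.0 <= overbook_pct <= 0.15:
        raise ValueError("overbooking limit must be between 0 and 15%")
    capacity = SEATS[aircraft]
    return capacity + int(capacity * overbook_pct)


print(open_classes(2))                 # ['Y', 'B', 'M']
print(authorized_bookings("737-700"))  # 147
```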

Other Environment Requirements

There are no further environment requirements; AirlineRM works out of the box with the OpenReward endpoint without any external API keys.

Safety

Agents in AirlineRM are told to maximize net revenue through revenue management decisions. The environment does not present direct safety risks as agents interact only with a synthetic simulation. No real airline operations, financial transactions, or passenger data are involved. All operations are sandboxed.

There may be indirect risks in that an agent optimizing purely for revenue may learn aggressive overbooking or cancellation strategies that would be ethically problematic in real operations. The cost structure is calibrated to penalize such behavior (each denied boarding costs $775 per passenger), but multi-environment training should include environments that reinforce ethical decision-making.
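To see how the $775 denied boarding cost penalizes aggressive overbooking, the sketch below computes the expected number of denied boardings when each booked passenger shows up independently with a fixed probability. The binomial show-up model and the 92% show probability are assumptions for illustration; the environment's no-show rates vary by fare class and are not reproduced here.

```python
from math import comb

DENIED_BOARDING_COST_USD = 775  # per passenger, as stated above


def expected_denied_boardings(capacity: int, bookings: int,
                              show_prob: float) -> float:
    """E[max(0, shows - capacity)] with shows ~ Binomial(bookings, show_prob)."""
    return sum(
        (k - capacity)
        * comb(bookings, k)
        * show_prob**k
        * (1 - show_prob) ** (bookings - k)
        for k in range(capacity + 1, bookings + 1)
    )


capacity = 138    # 737-700
show_prob = 0.92  # hypothetical show-up probability
for bookings in (138, 142, 147):
    denied = expected_denied_boardings(capacity, bookings, show_prob)
    print(bookings, round(denied * DENIED_BOARDING_COST_USD, 2))
```

The expected cost is zero without overbooking and rises as more bookings are accepted beyond capacity, which is the trade-off against the revenue from the extra seats sold.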

Citations

@dataset{GRAirlineRM,
  author    = {General Reasoning Inc. Team},
  title     = {AirlineRM: Airline Network Revenue Management Environment},
  year      = {2026},
  publisher = {OpenReward},
  url       = {https://openreward.ai/GeneralReasoning/airlinerm}
}

Compute Configuration

Component	Configuration
Environment Server	1 vCPU / 4 GB RAM
Sandbox Machine	0.5 vCPUs / 1 GB RAM

Estimated Cost

Component	Cost / second
Environment	$0.0000320
Sandbox	$0.0000115
Total	$0.0000435
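The per-second rates above can be turned into a per-episode estimate. In the sketch below, the rates come from the table, while the one-hour episode duration is a hypothetical figure chosen only to show the arithmetic.

```python
ENV_COST_PER_SECOND = 0.0000320      # USD, environment server
SANDBOX_COST_PER_SECOND = 0.0000115  # USD, sandbox machine
TOTAL_COST_PER_SECOND = ENV_COST_PER_SECOND + SANDBOX_COST_PER_SECOND


def episode_cost(duration_seconds: float) -> float:
    """Estimated cost in USD for an episode of the given wall-clock length."""
    return TOTAL_COST_PER_SECOND * duration_seconds


# Hypothetical one-hour episode: 0.0000435 * 3600 = 0.1566 USD.
print(f"{episode_cost(3600):.4f}")  # 0.1566
```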
