Artemis2

Artemis II

OpenReward Environment

Description

Artemis2 is a multi-step decision environment simulating NASA's Artemis II crewed lunar flyby mission. The agent acts as Flight Director, making Go/No-Go decisions, executing trajectory correction burns, managing spacecraft systems, and responding to anomalies across 10 mission phases spanning 240 hours of mission elapsed time.

All spacecraft parameters are grounded in publicly available NASA mission data for the real Artemis II mission (SLS Block 1 / Orion / European Service Module).

Capabilities

  • Mission-critical Go/No-Go decision-making at multiple gates
  • Trajectory correction burn execution with 3-axis delta-V specification
  • Real-time anomaly diagnosis and response under time pressure
  • Consumable management (propellant, O2, water, power)
  • Communication blackout handling during lunar far-side passage

Compute Requirements

No additional compute requirements beyond the OpenReward platform.

License

MIT

Tasks

There are 5 training tasks and 3 test tasks:

Training:

| Task ID | Scenario | Anomalies |
|---|---|---|
| artemis2_nominal_000 | Nominal mission | None |
| artemis2_nominal_001 | Nominal mission (different seed) | None |
| artemis2_rcs_failure_010 | RCS failure | RCS thruster failure |
| artemis2_o2_anomaly_020 | O2 anomaly | O2 flow sensor anomaly |
| artemis2_multi_fault_030 | Multiple faults | Star tracker drift + solar array degradation |

Test:

| Task ID | Scenario | Anomalies |
|---|---|---|
| artemis2_test_nominal_100 | Nominal mission | None |
| artemis2_test_co2_thermal_110 | CO2 + thermal | CO2 scrubber partial + cabin temp excursion |
| artemis2_test_rcs_comm_120 | RCS + comm | RCS thruster failure + comm antenna misalignment |

Each task traverses 10 mission phases with ~20-30 decision points per mission.

Reward Structure

This is a dense, verifiable reward environment. Rewards are issued after each decision:

  • Burns: Scored by direction accuracy (cosine similarity between the commanded and optimal delta-V vectors) and magnitude accuracy relative to the optimal correction. Reward in [0, 1].
  • Go/No-Go: 1.0 for a correct decision with full systems verification, 0.8 for a correct decision with incomplete verification, 0.0 for an incorrect decision.
  • System adjustments: Scored by closeness to the optimal parameter value. Reward in [0, 1].
  • Anomaly responses: 1.0 for the exactly correct procedure, partial credit (0.2-0.7) for a partially correct response, 0.0 for a wrong response.
  • Phase advancement: 0.1 bonus per phase reached.
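
The burn and Go/No-Go rules above can be sketched in a few lines of Python. This is an illustrative sketch, not the environment's implementation: the function names are hypothetical, combining the direction and magnitude terms by multiplication is an assumption (the actual weighting is not documented here), and full verification is modeled as a simple boolean.

```python
import math


def burn_score(commanded, optimal):
    """Score a 3-axis burn in [0, 1] from direction and magnitude agreement.

    Assumption: the two terms are multiplied. The environment may combine
    them differently.
    """
    mag_c = math.sqrt(sum(c * c for c in commanded))
    mag_o = math.sqrt(sum(o * o for o in optimal))
    if mag_c == 0.0 or mag_o == 0.0:
        # Direction is undefined for a zero vector: full credit only if
        # both the commanded and optimal burns are zero.
        return 1.0 if mag_c == mag_o else 0.0
    cosine = sum(c * o for c, o in zip(commanded, optimal)) / (mag_c * mag_o)
    # Clamp to [0, 1]: a burn pointing away from the optimum earns nothing.
    direction = min(1.0, max(0.0, cosine))
    # Relative magnitude error, floored at zero credit.
    magnitude = max(0.0, 1.0 - abs(mag_c - mag_o) / mag_o)
    return direction * magnitude


def go_no_go_score(decision, correct_decision, fully_verified):
    """Return 1.0, 0.8, or 0.0 following the Go/No-Go rule above."""
    if decision != correct_decision:
        return 0.0
    return 1.0 if fully_verified else 0.8
```

Under these assumptions, a burn in the optimal direction with half the optimal magnitude scores 0.5, and a burn in the opposite direction scores 0.0.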

No LLM graders are used.

Data

No external data files are required. All mission parameters and anomaly scenarios are procedurally generated from fixed seeds embedded in the environment code.
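
As a rough illustration of what seed-based procedural generation means in practice, the sketch below derives a scenario from a seed using a private random generator, so the same seed always yields the same scenario. The function name, the `dispersion_mps` field, and the Gaussian model are hypothetical and are not taken from the environment code.

```python
import random


def generate_scenario(seed, n_phases=10):
    """Build a reproducible toy scenario from a fixed seed (hypothetical)."""
    # A private generator keeps results independent of global random state.
    rng = random.Random(seed)
    return {
        "seed": seed,
        # Hypothetical per-phase trajectory dispersions in m/s.
        "dispersion_mps": [round(rng.gauss(0.0, 1.0), 3) for _ in range(n_phases)],
    }


# Two calls with the same seed produce identical scenarios.
first = generate_scenario(0)
second = generate_scenario(0)
```

Because the generator is seeded per scenario rather than globally, results stay reproducible even if other code draws random numbers in between.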

Tools

| Tool | Description | Reward |
|---|---|---|
| check_telemetry() | View spacecraft state, systems, consumables | 0.0 (informational) |
| execute_burn(dv_prograde, dv_normal, dv_radial) | Fire ESM main engine | [0, 1] |
| go_no_go(decision, systems_verified) | Issue Go/No-Go at decision gates | {0, 0.8, 1.0} |
| adjust_system(system, parameter, value) | Configure spacecraft subsystem | [0, 1] |
| respond_to_anomaly(anomaly_id, action, parameters) | Handle active anomaly | [0, 1] |
| advance_phase() | Move to next mission phase | 0.1 |
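
The sketch below shows one way an agent harness might assemble calls to these tools. The tool and parameter names come from the table above. Everything else is an assumption made for illustration: the `make_call` helper, the `{"name", "arguments"}` dictionary shape, and the argument values and types (for example `decision="go"` and delta-V values in m/s).

```python
# Tool names and parameter names as listed in the table above.
TOOLS = {
    "check_telemetry": (),
    "execute_burn": ("dv_prograde", "dv_normal", "dv_radial"),
    "go_no_go": ("decision", "systems_verified"),
    "adjust_system": ("system", "parameter", "value"),
    "respond_to_anomaly": ("anomaly_id", "action", "parameters"),
    "advance_phase": (),
}


def make_call(name, **arguments):
    """Build a tool-call dict, rejecting unknown tools or argument names."""
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    expected = set(TOOLS[name])
    if set(arguments) != expected:
        raise ValueError(f"{name} expects arguments {sorted(expected)}")
    return {"name": name, "arguments": arguments}


# A hypothetical sequence for one phase: inspect, decide, burn, advance.
plan = [
    make_call("check_telemetry"),
    make_call("go_no_go", decision="go", systems_verified=True),
    make_call("execute_burn", dv_prograde=1.5, dv_normal=0.0, dv_radial=-0.2),
    make_call("advance_phase"),
]
```

Validating argument names before sending a call catches typos locally. Since `check_telemetry()` is listed as informational with a reward of 0.0, the sketch calls it before each scored decision.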

Time Horizon

Artemis2 is a multi-turn environment with ~20-30 decisions per mission across 10 phases.

Other Environment Requirements

There are no further environment requirements; Artemis2 works out of the box with the OpenReward endpoint and requires no secrets.

Safety

Agents in Artemis2 interact with a simulated spacecraft environment. The environment does not present direct safety risks, as all decisions affect only the simulation state. No real systems, people, or infrastructure are impacted.

Citations

@dataset{GRArtemis2,
  author    = {General Reasoning Inc. Team},
  title     = {Artemis2},
  year      = {2026},
  publisher = {OpenReward},
  url       = {https://openreward.ai/GeneralReasoning/artemis2}
}