Artemis2
Artemis II
Description
Artemis2 is a multi-step decision environment simulating NASA's Artemis II crewed lunar flyby mission. The agent acts as Flight Director, making Go/No-Go decisions, executing trajectory correction burns, managing spacecraft systems, and responding to anomalies across 10 mission phases spanning 240 hours of mission elapsed time.
All spacecraft parameters are grounded in publicly available NASA mission data for the real Artemis II mission (SLS Block 1 / Orion / European Service Module).
Capabilities
- Mission-critical Go/No-Go decision-making at multiple gates
- Trajectory correction burn execution with 3-axis delta-V specification
- Real-time anomaly diagnosis and response under time pressure
- Consumable management (propellant, O2, water, power)
- Communication blackout handling during lunar far-side passage
Compute Requirements
No additional compute requirements beyond the OpenReward platform.
License
MIT
Tasks
There are 5 training tasks and 3 test tasks:
Training:
| Task ID | Scenario | Anomalies |
|---|---|---|
| artemis2_nominal_000 | Nominal mission | None |
| artemis2_nominal_001 | Nominal mission (different seed) | None |
| artemis2_rcs_failure_010 | RCS failure | RCS thruster failure |
| artemis2_o2_anomaly_020 | O2 anomaly | O2 flow sensor anomaly |
| artemis2_multi_fault_030 | Multiple faults | Star tracker drift + solar array degradation |
Test:
| Task ID | Scenario | Anomalies |
|---|---|---|
| artemis2_test_nominal_100 | Nominal mission | None |
| artemis2_test_co2_thermal_110 | CO2 + thermal | CO2 scrubber partial + cabin temp excursion |
| artemis2_test_rcs_comm_120 | RCS + comm | RCS thruster failure + comm antenna misalignment |
Each task traverses 10 mission phases with ~20-30 decision points per mission.
Reward Structure
This is a dense, verifiable reward environment. Rewards are issued after each decision:
- Burns: Scored by direction accuracy (cosine similarity) and magnitude accuracy relative to the optimal correction burn. Reward in [0, 1].
- Go/No-Go: 1.0 for correct decision with full systems verification, 0.8 for correct with incomplete verification, 0.0 for incorrect.
- System adjustments: Scored by closeness to optimal parameter value. Reward in [0, 1].
- Anomaly responses: 1.0 for the exactly correct procedure, partial credit (0.2-0.7) for a partially correct response, 0.0 for a wrong one.
- Phase advancement: 0.1 bonus per phase reached.
No LLM graders are used.
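The burn and Go/No-Go rules above can be sketched in a few lines of Python. This is an illustrative sketch, not the environment's actual scoring code: the function names are hypothetical, and multiplying the direction term by the magnitude term is an assumption, since the exact weighting is not stated here.

```python
import math


def burn_reward(commanded, optimal):
    """Score a burn in [0, 1] from direction and magnitude agreement.

    Assumption: the two terms are multiplied. The environment's real
    weighting may differ.
    """
    mag_c = math.sqrt(sum(c * c for c in commanded))
    mag_o = math.sqrt(sum(o * o for o in optimal))
    if mag_c == 0.0 or mag_o == 0.0:
        # Direction is undefined for a zero vector: give full credit only
        # when no burn was needed and none was commanded.
        return 1.0 if mag_c == mag_o else 0.0
    cosine = sum(c * o for c, o in zip(commanded, optimal)) / (mag_c * mag_o)
    direction = min(1.0, max(0.0, cosine))  # opposite-direction burns earn nothing
    magnitude = max(0.0, 1.0 - abs(mag_c - mag_o) / mag_o)  # relative magnitude error
    return direction * magnitude


def go_no_go_reward(correct, fully_verified):
    """Map a Go/No-Go outcome to the documented values {0.0, 0.8, 1.0}."""
    if not correct:
        return 0.0
    return 1.0 if fully_verified else 0.8
```

Under these assumptions, a burn along the optimal direction at half the optimal magnitude would score 0.5, and a correct Go/No-Go call with incomplete verification would score 0.8.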
Data
No external data files are required. All mission parameters and anomaly scenarios are procedurally generated from fixed seeds embedded in the environment code.
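As a rough illustration of what seed-based procedural generation means in practice, the sketch below derives a scenario from a fixed seed so that the same seed always yields the same mission. The function name, the field names, and the value ranges are all hypothetical; only the 240-hour mission duration comes from the description above.

```python
import random


def generate_scenario(seed, anomaly_pool):
    """Build a reproducible scenario from a fixed seed (hypothetical fields)."""
    # A private generator keeps results independent of global random state.
    rng = random.Random(seed)
    return {
        # Assumed: a small initial velocity dispersion on each of three axes.
        "initial_dispersion_mps": [round(rng.uniform(-1.0, 1.0), 3) for _ in range(3)],
        # Assumed: each anomaly starts at some point within the 240-hour mission.
        "anomaly_onset_met_hours": {
            name: round(rng.uniform(0.0, 240.0), 1) for name in anomaly_pool
        },
    }


first = generate_scenario(10, ["rcs_thruster_failure"])
second = generate_scenario(10, ["rcs_thruster_failure"])
print(first == second)  # identical seeds give identical scenarios
```

Because the seed fully determines the scenario, repeated runs of a task are directly comparable.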
Tools
| Tool | Description | Reward |
|---|---|---|
| check_telemetry() | View spacecraft state, systems, consumables | 0.0 (informational) |
| execute_burn(dv_prograde, dv_normal, dv_radial) | Fire ESM main engine | [0, 1] |
| go_no_go(decision, systems_verified) | Issue Go/No-Go at decision gates | {0, 0.8, 1.0} |
| adjust_system(system, parameter, value) | Configure spacecraft subsystem | [0, 1] |
| respond_to_anomaly(anomaly_id, action, parameters) | Handle active anomaly | [0, 1] |
| advance_phase() | Move to next mission phase | 0.1 |
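An agent harness might check a proposed tool call against the signatures in the table before sending it. The sketch below is one possible way to do that. The tool and parameter names are taken from the table, but `TOOL_PARAMETERS` and `validate_call` are hypothetical helpers, and treating every listed parameter as required is an assumption (the environment may define defaults).

```python
# Tool names and parameter names as listed in the table above.
TOOL_PARAMETERS = {
    "check_telemetry": (),
    "execute_burn": ("dv_prograde", "dv_normal", "dv_radial"),
    "go_no_go": ("decision", "systems_verified"),
    "adjust_system": ("system", "parameter", "value"),
    "respond_to_anomaly": ("anomaly_id", "action", "parameters"),
    "advance_phase": (),
}


def validate_call(tool, arguments):
    """Return a list of problems with a proposed call (empty if none found)."""
    if tool not in TOOL_PARAMETERS:
        return [f"unknown tool: {tool}"]
    expected = set(TOOL_PARAMETERS[tool])
    given = set(arguments)
    # Assumption: every listed parameter is required and no extras are allowed.
    problems = [f"missing argument: {name}" for name in sorted(expected - given)]
    problems += [f"unexpected argument: {name}" for name in sorted(given - expected)]
    return problems


print(validate_call("go_no_go", {"decision": "GO"}))
```

Here the call is flagged because `systems_verified` is absent; a well-formed call returns an empty list.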
Time Horizon
Artemis2 is a multi-turn environment with ~20-30 decisions per mission across 10 phases.
Other Environment Requirements
There are no further environment requirements; Artemis2 works out of the box with the OpenReward endpoint and requires no secrets.
Safety
Agents in Artemis2 interact with a simulated spacecraft environment. The environment does not present direct safety risks, as all decisions affect only the simulation state. No real systems, people, or infrastructure are impacted.
Citations
@dataset{GRArtemis2,
author = {General Reasoning Inc. Team},
title = {Artemis2},
year = {2026},
publisher = {OpenReward},
url = {https://openreward.ai/GeneralReasoning/artemis2}
}