MicrogridGym

OpenReward Environment

Description

MicrogridGym is an environment for tuning cascaded PI controllers and droop coefficients for three-phase power electronic inverters in microgrid configurations. Based on the physics from the OpenModelica Microgrid Gym (OMG) toolbox, it implements a pure Python simulation of inverters with LC output filters supplying RL loads.

Capabilities

  • Understanding three-phase AC power electronics and LC filter dynamics
  • Tuning cascaded PI controller gains (voltage and current loops)
  • Configuring droop control for multi-inverter power sharing
  • Adapting controller parameters to load disturbances
  • Reasoning about control stability and tracking performance tradeoffs
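
To make the cascaded structure concrete, here is a minimal single-axis sketch of an outer voltage PI loop feeding an inner current PI loop. It is an illustration under stated assumptions, not the environment's controller: the `PI` class, the `cascaded_step` helper, the output limits, and the clamping anti-windup are all hypothetical, and a real three-phase controller would typically operate on dq-frame quantities.

```python
from dataclasses import dataclass


@dataclass
class PI:
    """Discrete PI controller with output clamping (hypothetical helper)."""
    kp: float
    ki: float
    limit: float          # symmetric output limit, an assumed value
    integral: float = 0.0

    def step(self, error: float, dt: float) -> float:
        candidate = self.integral + self.ki * error * dt
        out = self.kp * error + candidate
        if abs(out) <= self.limit:
            # Only accept the new integral while unsaturated (anti-windup).
            self.integral = candidate
            return out
        return max(-self.limit, min(self.limit, out))


def cascaded_step(v_pi: PI, i_pi: PI, v_ref: float, v_meas: float,
                  i_meas: float, dt: float) -> float:
    """Outer voltage loop yields a current reference; inner loop tracks it."""
    i_ref = v_pi.step(v_ref - v_meas, dt)
    return i_pi.step(i_ref - i_meas, dt)
```

In this sketch the voltage-loop gains play the role of `kP_v` and `kI_v`, and the current-loop gains play the role of `kP_i` and `kI_i`. The inner loop usually needs to be noticeably faster than the outer one for the cascade to behave well.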

Compute Requirements

MicrogridGym runs a lightweight numerical simulation (RK4 ODE integration) and requires minimal compute resources.
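
As a rough indication of how small such a simulation is, the sketch below integrates a single-phase equivalent of an LC filter feeding a series RL load with a classical RK4 step. The state layout, function names, and any component values passed in are assumptions for illustration; the environment's actual three-phase model and parameters may differ.

```python
import numpy as np


def lc_rl_derivative(x, v_in, L_f, R_f, C_f, R_load, L_load):
    """State x = [filter inductor current, capacitor voltage, load current]."""
    i_L, v_C, i_load = x
    return np.array([
        (v_in - R_f * i_L - v_C) / L_f,     # filter inductor
        (i_L - i_load) / C_f,               # filter capacitor
        (v_C - R_load * i_load) / L_load,   # series RL load
    ])


def rk4_step(f, x, dt, *args):
    """One classical fourth-order Runge-Kutta step for dx/dt = f(x, *args)."""
    k1 = f(x, *args)
    k2 = f(x + 0.5 * dt * k1, *args)
    k3 = f(x + 0.5 * dt * k2, *args)
    k4 = f(x + dt * k3, *args)
    return x + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def simulate(v_in, t_end, dt, params):
    """Integrate from rest with a constant input voltage; returns final state."""
    x = np.zeros(3)
    for _ in range(int(round(t_end / dt))):
        x = rk4_step(lc_rl_derivative, x, dt, v_in, *params)
    return x
```

With a constant input the state should settle to the resistive divider between the filter resistance and the load resistance, which is a convenient sanity check for the integrator.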

License

GPLv3 (matching the original OMG toolbox license).

Tasks

There are 126 tasks across 4 scenario types and 2 splits:

| Scenario | Description | Agent Action | Steps | Train | Test |
|---|---|---|---|---|---|
| Voltage Control | Single inverter, maintain 3-phase output voltage with load step disturbance | kP_v, kI_v, kP_i, kI_i | 10 | 60 | 9 |
| Current Control | Single inverter, track current reference with load step disturbance | kP_v, kI_v, kP_i, kI_i | 8 | 15 | 6 |
| Droop Control | Two parallel inverters, proportional power sharing | Droop coefficients + PI gains | 12 | 12 | 6 |
| Load Following | Single inverter, time-varying load profile | kP_v, kI_v, kP_i, kI_i (per step) | 12 | 12 | 6 |

Total: 99 train + 27 test = 126 tasks

Each task defines a specific circuit configuration (filter inductance, capacitance, load resistance) and disturbance scenario. Train and test splits use different parameter values.
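
For the Droop Control scenario, the idea of proportional power sharing can be sketched with the textbook frequency droop law f_i = f_nom - m_i * P_i. In steady state both inverters settle at the same frequency, so m_1 * P_1 = m_2 * P_2 and each inverter's share is inversely proportional to its droop gain. The function below is a hypothetical helper built on that convention; the environment's droop parameterization (units, sign, or whether the gain is defined as its inverse) is an assumption here.

```python
def droop_power_sharing(p_load, droop_gains, f_nom=50.0):
    """Steady-state active power split under f_i = f_nom - m_i * P_i.

    p_load:       total active load in W
    droop_gains:  one gain m_i per inverter in Hz/W (assumed convention)
    Returns (per-inverter powers in W, common steady-state frequency in Hz).
    """
    if any(m <= 0 for m in droop_gains):
        raise ValueError("droop gains must be positive")
    inverse = [1.0 / m for m in droop_gains]
    total = sum(inverse)
    shares = [p_load * g / total for g in inverse]
    freq = f_nom - p_load / total
    return shares, freq
```

For example, with gains of 1e-4 and 2e-4 Hz/W the first inverter carries twice the power of the second, and the common frequency sags in proportion to the total load.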

Reward Structure

This is a dense, verifiable reward environment. Rewards are computed algorithmically at each step, following the OMG reward formulation:

Voltage tracking error (per-phase root-error):

$$\text{voltage\_err} = \frac{1}{N}\sum_{t}\sum_{k=1}^{3} \sqrt{\frac{|V_{\text{ref},k} - V_{\text{actual},k}|}{V_{\text{nom}}}}$$

Current tracking error (per-phase root-error):

$$\text{current\_err} = \frac{1}{N}\sum_{t}\sum_{k=1}^{3} \sqrt{\frac{|i_{\text{ref},k} - i_{\text{actual},k}|}{i_{\text{lim}}}}$$

Log-barrier current constraint penalty:

$$\text{barrier} = -\frac{1}{N}\sum_{t}\sum_{k=1}^{3} \mu \cdot \ln\left(1 - \frac{\max(|i_k| - i_{\text{nom}},\, 0)}{i_{\text{lim}} - i_{\text{nom}}}\right)$$

where $\mu = 2$, $i_{\text{nom}} = 20\,\text{A}$, $i_{\text{lim}} = 30\,\text{A}$.

Step reward (mapped to [0, 1]):

$$\text{step\_reward} = \exp\left(-\frac{\text{voltage\_err} + \text{current\_err} + \text{barrier}}{3}\right)$$

The episode reward is the mean of all step rewards.
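
The formulas above can be sketched in NumPy as follows. This is an illustrative reading, not the environment's code: the array layout (N samples by 3 phases), the function names, the use of the measured current in the barrier term, and returning 0 once a phase current reaches the limit (where the logarithm diverges) are all assumptions. `v_nom` is left as a parameter because its value is not stated here.

```python
import numpy as np

MU, I_NOM, I_LIM = 2.0, 20.0, 30.0  # constants as given in the text


def step_reward(v_ref, v_act, i_ref, i_act, v_nom):
    """Step reward from arrays of shape (N, 3): N samples, 3 phases."""
    v_ref, v_act, i_ref, i_act = (
        np.asarray(a, dtype=float) for a in (v_ref, v_act, i_ref, i_act)
    )
    n = v_ref.shape[0]
    voltage_err = np.sqrt(np.abs(v_ref - v_act) / v_nom).sum() / n
    current_err = np.sqrt(np.abs(i_ref - i_act) / I_LIM).sum() / n
    # Fraction of the nominal-to-limit band used by each phase current.
    excess = np.maximum(np.abs(i_act) - I_NOM, 0.0) / (I_LIM - I_NOM)
    if np.any(excess >= 1.0):
        return 0.0  # assumed handling: barrier is unbounded at the limit
    barrier = -(MU * np.log(1.0 - excess)).sum() / n
    return float(np.exp(-(voltage_err + current_err + barrier) / 3.0))


def episode_reward(step_rewards):
    """Episode reward is the mean of the step rewards."""
    return float(np.mean(step_rewards))
```

Because each of the three terms is non-negative, the exponential keeps the step reward at or below 1, with 1 reached only for perfect tracking at currents within the nominal rating.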

No LLM graders are used.

Data

No external data files are required. All circuit parameters and task definitions are generated programmatically from physically meaningful parameter ranges based on the OMG toolbox defaults.

Tools

Agents are given two tools:

  • set_controller: Set controller parameters (PI gains and/or droop coefficients) and advance the simulation by one control interval (5 ms). Returns voltage/current measurements, tracking error, and the step reward.
  • info: Reference documentation about the circuit topology, control architecture, parameter ranges, and tuning guidelines.

Time Horizon

Each episode consists of 8 to 12 tool calls depending on the scenario type: 8 for Current Control, 10 for Voltage Control, and 12 for Droop Control and Load Following. The agent sets controller parameters at each step, observes the resulting voltage/current waveforms, and can adjust gains for the next step.

Other Environment Requirements

There are no further environment requirements; MicrogridGym works out of the box with the OpenReward platform without any secrets.

Safety

Agents in MicrogridGym interact with a numerical simulation of power electronics. The environment does not connect to real hardware or external systems. Unstable controller parameters cause clean simulation termination with zero reward, not real-world damage.

Citations

@article{Heid2020OMG,
  author  = {Stefan Heid and Daniel Weber and Henrik Bode and Eyke H{\"u}llermeier and Oliver Wallscheid},
  title   = {{OMG}: A Scalable and Flexible Simulation and Testing Environment Toolbox for Intelligent Microgrid Control},
  journal = {Journal of Open Source Software},
  volume  = {5},
  number  = {54},
  pages   = {2435},
  year    = {2020},
  doi     = {10.21105/joss.02435}
}