MicrogridGym
Description
MicrogridGym is an environment for tuning cascaded PI controllers and droop coefficients for three-phase power electronic inverters in microgrid configurations. Based on the physics from the OpenModelica Microgrid Gym (OMG) toolbox, it implements a pure Python simulation of inverters with LC output filters supplying RL loads.
Capabilities
- Understanding three-phase AC power electronics and LC filter dynamics
- Tuning cascaded PI controller gains (voltage and current loops)
- Configuring droop control for multi-inverter power sharing
- Adapting controller parameters to load disturbances
- Reasoning about control stability and tracking performance tradeoffs
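The cascaded structure behind the voltage and current loop gains can be sketched as follows. This is a minimal, single-axis illustration, not MicrogridGym's implementation: the `PI` class, the `cascaded_pi_step` helper, the clamping-based anti-windup, and all limits are assumptions chosen for clarity. The outer voltage loop produces a current reference, and the inner current loop turns that reference into a voltage command for the inverter.

```python
from dataclasses import dataclass


@dataclass
class PI:
    """Discrete PI controller with a clamped integrator (hypothetical helper)."""
    kP: float
    kI: float
    limit: float          # symmetric output and integrator limit
    integral: float = 0.0

    def step(self, error: float, dt: float) -> float:
        # Accumulate the integral term, then clamp it as simple anti-windup.
        self.integral += self.kI * error * dt
        self.integral = max(-self.limit, min(self.limit, self.integral))
        out = self.kP * error + self.integral
        # Saturate the output to the same limit.
        return max(-self.limit, min(self.limit, out))


def cascaded_pi_step(v_ref, v_meas, i_meas, voltage_pi, current_pi, dt):
    """One update of the cascade: voltage loop (outer) feeds current loop (inner)."""
    i_ref = voltage_pi.step(v_ref - v_meas, dt)   # outer loop: current reference
    v_cmd = current_pi.step(i_ref - i_meas, dt)   # inner loop: inverter command
    return i_ref, v_cmd
```

In this arrangement the inner loop is normally tuned to be noticeably faster than the outer loop, which is one reason the four gains interact when tuned together.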
Compute Requirements
MicrogridGym runs a lightweight numerical simulation (RK4 ODE integration) and requires minimal compute resources.
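To give a sense of how small this workload is, the following sketch integrates a single-phase LC filter feeding a purely resistive load with a classical RK4 step. It is an illustration only: the state layout, the function names, the resistive (rather than RL) load, and every component value are assumptions, not the environment's actual model or defaults.

```python
import numpy as np


def lc_filter_dynamics(x, v_in, L=2.3e-3, C=10e-6, R_f=0.4, R_load=20.0):
    """State derivative for x = [inductor current, capacitor voltage].

    All component values are illustrative assumptions.
    """
    i_L, v_C = x
    di = (v_in - R_f * i_L - v_C) / L      # inductor: L di/dt = v_in - R_f*i - v_C
    dv = (i_L - v_C / R_load) / C          # capacitor: C dv/dt = i_L - i_load
    return np.array([di, dv])


def rk4_step(f, x, u, dt):
    """One classical fourth-order Runge-Kutta step with input u held constant."""
    k1 = f(x, u)
    k2 = f(x + 0.5 * dt * k1, u)
    k3 = f(x + 0.5 * dt * k2, u)
    k4 = f(x + dt * k3, u)
    return x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def simulate(v_in=100.0, dt=1e-5, t_end=5e-3):
    """Integrate from rest over one 5 ms interval and return the final state."""
    x = np.zeros(2)
    for _ in range(int(round(t_end / dt))):
        x = rk4_step(lc_filter_dynamics, x, v_in, dt)
    return x
```

With these values the loop runs 500 RK4 steps per 5 ms interval, and the capacitor voltage settles close to the resistive-divider value `v_in * R_load / (R_load + R_f)`.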
License
GPLv3 (matching the original OMG toolbox license).
Tasks
There are 126 tasks across 4 scenario types and 2 splits:
| Scenario | Description | Agent Action | Steps | Train | Test |
|---|---|---|---|---|---|
| Voltage Control | Single inverter, maintain 3-phase output voltage with load step disturbance | kP_v, kI_v, kP_i, kI_i | 10 | 60 | 9 |
| Current Control | Single inverter, track current reference with load step disturbance | kP_v, kI_v, kP_i, kI_i | 8 | 15 | 6 |
| Droop Control | Two parallel inverters, proportional power sharing | Droop coefficients + PI gains | 12 | 12 | 6 |
| Load Following | Single inverter, time-varying load profile | kP_v, kI_v, kP_i, kI_i (per step) | 12 | 12 | 6 |
Total: 99 train + 27 test = 126 tasks
Each task defines a specific circuit configuration (filter inductance, capacitance, load resistance) and disturbance scenario. Train and test splits use different parameter values.
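For the Droop Control scenario, the idea of proportional power sharing can be illustrated with the conventional P-f droop law, `f = f_nom - m * P`. The sketch below assumes this textbook form and an ideal steady state in which both inverters settle at the same frequency; the function names, gain values, and units (Hz per watt) are assumptions and may differ from how MicrogridGym parameterizes its droop coefficients.

```python
def droop_frequency(p, f_nom=50.0, m=1e-4):
    """Conventional P-f droop: frequency falls linearly with active power."""
    return f_nom - m * p


def steady_state_sharing(p_total, m1, m2, f_nom=50.0):
    """Split p_total between two inverters that settle at a common frequency.

    Equal frequencies imply m1 * P1 == m2 * P2, so each inverter's share is
    inversely proportional to its droop gain.
    """
    p1 = p_total * m2 / (m1 + m2)
    p2 = p_total - p1
    f = droop_frequency(p1, f_nom, m1)
    return p1, p2, f


# Example: the inverter with half the droop gain supplies twice the power.
p1, p2, f = steady_state_sharing(3000.0, m1=1e-4, m2=2e-4)
```

Under these assumptions, a smaller droop gain makes an inverter "stiffer" and gives it a larger share of the load, at the cost of a smaller frequency deviation available for sharing.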
Reward Structure
This is a dense, verifiable reward environment. Rewards are computed algorithmically at each step, following the OMG reward formulation:
The step reward is built from the following terms:
- Voltage tracking error (per-phase root error)
- Current tracking error (per-phase root error)
- Log-barrier penalty on the current constraint
The resulting value is mapped to a step reward in [0, 1].
The episode reward is the mean of all step rewards.
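A reward of this shape might be sketched as below. This is not the environment's exact formula: the normalization by a limit value, the parameter names `i_nom`, `i_lim`, and `mu`, the form of the barrier, and the `clip(1 - cost)` mapping to [0, 1] are all assumptions made for illustration. Only the overall structure (per-phase root error, log-barrier penalty, mean over steps) follows the description above.

```python
import numpy as np


def root_error(ref, meas, limit):
    """Mean over phases of sqrt(|ref - meas| / limit) (assumed normalization)."""
    ref = np.asarray(ref, dtype=float)
    meas = np.asarray(meas, dtype=float)
    return float(np.mean(np.sqrt(np.abs(ref - meas) / limit)))


def barrier_penalty(i_meas, i_nom, i_lim, mu):
    """Log-barrier that grows as the peak current moves from i_nom toward i_lim."""
    excess = max(float(np.max(np.abs(i_meas))) - i_nom, 0.0)
    # Keep the ratio strictly below 1 so the logarithm stays finite.
    ratio = min(excess / (i_lim - i_nom), 1.0 - 1e-9)
    return float(-mu * np.log(1.0 - ratio))


def step_reward(tracking_error, penalty):
    """Map a non-negative cost to [0, 1]; this particular mapping is an assumption."""
    return float(np.clip(1.0 - (tracking_error + penalty), 0.0, 1.0))


def episode_reward(step_rewards):
    """Episode reward as the mean of all step rewards."""
    return float(np.mean(step_rewards))
```

The square root makes the error term steep near zero, so small residual tracking errors still cost reward, while the barrier stays at zero until the current exceeds its nominal value.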
No LLM graders are used.
Data
No external data files are required. All circuit parameters and task definitions are generated programmatically from physically meaningful parameter ranges based on the OMG toolbox defaults.
Tools
Agents are given two tools:
- set_controller: Set controller parameters (PI gains and/or droop coefficients) and advance the simulation by one control interval (5 ms). Returns voltage/current measurements, tracking error, and step reward.
- info: Reference documentation about the circuit topology, control architecture, parameter ranges, and tuning guidelines.
Time Horizon
Each episode consists of 8-12 tool calls depending on the scenario type. The agent sets controller parameters at each step, observes the resulting voltage/current waveforms, and can adjust gains for the next step.
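The set-observe-adjust loop described above could look like the following sketch. Everything about the interface here is hypothetical: the `set_controller` callable, its keyword arguments, the `"reward"` key in its return value, and the stub used to exercise the loop are stand-ins, not the platform's actual tool schema. The adjustment rule is a simple random perturbation that keeps a change only if the step reward improves.

```python
import random


def run_episode(set_controller, initial_gains, n_steps=10, scale=0.2, seed=0):
    """Call the tool n_steps times, keeping the best gains seen so far."""
    rng = random.Random(seed)
    best = dict(initial_gains)
    best_reward = set_controller(**best)["reward"]
    rewards = [best_reward]
    for _ in range(n_steps - 1):
        # Perturb every gain by up to +/- scale (relative) around the best set.
        trial = {k: v * (1.0 + rng.uniform(-scale, scale)) for k, v in best.items()}
        reward = set_controller(**trial)["reward"]
        rewards.append(reward)
        if reward > best_reward:
            best, best_reward = trial, reward
    # Episode reward is the mean of the step rewards.
    return best, sum(rewards) / len(rewards)


def stub_set_controller(kP_v, kI_v, kP_i, kI_i):
    """Stand-in for the real tool: reward peaks at an arbitrary kP_v of 0.05."""
    return {"reward": max(0.0, 1.0 - abs(kP_v - 0.05) / 0.05)}
```

Because every tool call contributes to the episode mean, exploratory steps with poor gains lower the final reward, so large perturbations are costly with only 8-12 calls available.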
Other Environment Requirements
There are no further environment requirements; MicrogridGym works out of the box with the OpenReward platform without any secrets.
Safety
Agents in MicrogridGym interact with a numerical simulation of power electronics. The environment does not connect to real hardware or external systems. Unstable controller parameters cause clean simulation termination with zero reward, not real-world damage.
Citations
@article{Heid2020OMG,
author = {Stefan Heid and Daniel Weber and Henrik Bode and Eyke H{\"u}llermeier and Oliver Wallscheid},
title = {{OMG}: A Scalable and Flexible Simulation and Testing Environment Toolbox for Intelligent Microgrid Control},
journal = {Journal of Open Source Software},
volume = {5},
number = {54},
pages = {2435},
year = {2020},
doi = {10.21105/joss.02435}
}