ControlEval

OpenReward Environment

Description

ControlEval is an environment for evaluating LLM agents on classical control system design. Given a plant transfer function G(s) and performance specifications, agents must design a controller C(s) such that the closed-loop system simultaneously satisfies stability, robustness, and time-domain performance constraints.
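
To make "closed-loop" concrete, the sketch below assumes a standard unity negative-feedback loop (an assumption; the loop configuration is not stated here). Under that assumption, the closed-loop poles are the roots of den_G·den_C + num_G·num_C. The helper names, the example plant, and the example controller are all hypothetical. The root solver only handles a second-order characteristic polynomial, so the example needs no external libraries.

```python
import cmath

def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (descending powers of s)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_add(a, b):
    """Add two polynomials, right-aligning their constant terms."""
    n = max(len(a), len(b))
    a = [0.0] * (n - len(a)) + list(a)
    b = [0.0] * (n - len(b)) + list(b)
    return [x + y for x, y in zip(a, b)]

def characteristic_poly(num_g, den_g, num_c, den_c):
    """den_G*den_C + num_G*num_C, assuming unity negative feedback."""
    return poly_add(poly_mul(den_g, den_c), poly_mul(num_g, num_c))

def quadratic_roots(coeffs):
    """Roots of a*s^2 + b*s + c (second-order case only)."""
    a, b, c = coeffs
    disc = cmath.sqrt(b * b - 4 * a * c)
    return [(-b + disc) / (2 * a), (-b - disc) / (2 * a)]

# Hypothetical example: plant G(s) = 1/(s + 1), PI controller C(s) = (2s + 4)/s
char_poly = characteristic_poly([1.0], [1.0, 1.0], [2.0, 4.0], [1.0, 0.0])
poles = quadratic_roots(char_poly)
is_stable = all(p.real < 0 for p in poles)
```

Here the characteristic polynomial is s² + 3s + 4, whose roots both have real part -1.5, so this hypothetical loop is stable.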

Capabilities

  • Transfer function analysis and manipulation
  • Controller design (PI, PID, lead/lag compensators, etc.)
  • Frequency-domain analysis (phase/gain margins)
  • Time-domain performance evaluation (settling time, overshoot, steady-state error)
  • Iterative design refinement based on performance feedback

Compute Requirements

No special compute requirements. Evaluation uses the python-control library for deterministic transfer function computations.

Tasks

500 tasks in a single test split, across 10 categories (50 tasks each):

Category                        System Type
first_order_stable_fast         1st-order stable, fast response
first_order_stable_moderate     1st-order stable, moderate response
first_order_stable_slow         1st-order stable, slow response
first_order_unstable            1st-order unstable
first_order_w_delay             1st-order with time delay
second_order_stable_fast        2nd-order stable, fast response
second_order_stable_moderate    2nd-order stable, moderate response
second_order_stable_slow        2nd-order stable, slow response
second_order_unstable           2nd-order unstable
higher_order                    3rd–5th order systems

Each task specifies a plant G(s) via numerator/denominator polynomial coefficients, an optional time delay, and numerical performance constraints.
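
As a rough illustration of how such a specification defines a plant, the sketch below evaluates G(jω) from coefficient lists, treating the delay T as a factor e^(-sT). The dictionary keys `num`, `den`, and `delay` are hypothetical field names chosen for this example, not the environment's confirmed task schema. Coefficients are assumed to be in descending powers of s, as for the controller tools.

```python
import cmath
import math

def polyval(coeffs, s):
    """Evaluate a polynomial (descending powers of s) at s using Horner's rule."""
    result = 0j
    for c in coeffs:
        result = result * s + c
    return result

def plant_response(task, omega):
    """Return G(j*omega) for a plant given by num/den coefficients and an optional delay."""
    s = 1j * omega
    g = polyval(task["num"], s) / polyval(task["den"], s)
    delay = task.get("delay", 0.0)  # hypothetical field name; 0 means no delay
    return g * cmath.exp(-delay * s)

# Hypothetical task: G(s) = exp(-0.5 s) / (s + 1)
task = {"num": [1.0], "den": [1.0, 1.0], "delay": 0.5}
g = plant_response(task, omega=1.0)
magnitude = abs(g)
phase_deg = math.degrees(cmath.phase(g))
```

The delay leaves the magnitude unchanged and only subtracts ωT radians of phase, which is why plants with delay tend to make phase-margin constraints harder to meet.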

Reward Structure

Binary (sparse). The reward is 1.0 only if all of the following constraints are satisfied, and 0.0 otherwise:

  1. Stability: all closed-loop poles have real part < -0.01
  2. Phase margin ≥ specified minimum (degrees)
  3. Settling time within specified [min, max] range (2% criterion)
  4. Steady-state error ≤ specified maximum
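
The all-or-nothing rule above can be sketched as a small function. This is a minimal illustration, not the environment's grading code: the metric and specification field names (`max_pole_real`, `phase_margin_deg`, and so on) are hypothetical, and only the -0.01 stability threshold comes from the list above.

```python
def compute_reward(metrics, spec):
    """Return 1.0 only if every constraint holds, else 0.0 (field names are hypothetical)."""
    checks = [
        # 1. Stability: the right-most closed-loop pole must lie left of -0.01
        metrics["max_pole_real"] < -0.01,
        # 2. Phase margin at or above the specified minimum (degrees)
        metrics["phase_margin_deg"] >= spec["min_phase_margin_deg"],
        # 3. Settling time inside the specified [min, max] window
        spec["settling_time_min"] <= metrics["settling_time"] <= spec["settling_time_max"],
        # 4. Steady-state error at or below the specified maximum
        metrics["steady_state_error"] <= spec["max_steady_state_error"],
    ]
    return 1.0 if all(checks) else 0.0

# Hypothetical specification and two candidate results
spec = {
    "min_phase_margin_deg": 45.0,
    "settling_time_min": 0.5,
    "settling_time_max": 3.0,
    "max_steady_state_error": 0.01,
}
good = {"max_pole_real": -1.5, "phase_margin_deg": 62.0,
        "settling_time": 2.1, "steady_state_error": 0.0}
too_fast = dict(good, settling_time=0.2)  # violates the minimum settling time

reward_good = compute_reward(good, spec)
reward_too_fast = compute_reward(too_fast, spec)
```

Because the settling-time constraint is a window rather than an upper bound, a controller that is too aggressive fails just as one that is too sluggish does.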

Tools

Tool      Description
evaluate  Test a candidate controller C(s). Returns all performance metrics (stability, margins, settling time, steady-state error) without ending the episode.
submit    Submit a final controller C(s) for grading. Returns metrics and reward. Ends the episode.

Both tools accept controller transfer function coefficients: num (numerator) and den (denominator) as lists of floats in descending powers of s.
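
For illustration, the helpers below show how common controller forms might be expanded into `num`/`den` lists in descending powers of s. The helper names and the filtered-derivative PID form are choices made for this sketch; only the `num` and `den` argument names come from the tool description above.

```python
def pi_controller(kp, ki):
    """C(s) = kp + ki/s = (kp*s + ki) / s."""
    return {"num": [kp, ki], "den": [1.0, 0.0]}

def lead_lag(k, zero, pole):
    """C(s) = k * (s + zero) / (s + pole); lead if zero < pole, lag if zero > pole."""
    return {"num": [k, k * zero], "den": [1.0, pole]}

def pid_controller(kp, ki, kd, tau):
    """C(s) = kp + ki/s + kd*s/(tau*s + 1), using a first-order derivative filter.

    Over the common denominator s*(tau*s + 1) the numerator is
    (kp*tau + kd)*s^2 + (kp + ki*tau)*s + ki.
    """
    return {
        "num": [kp * tau + kd, kp + ki * tau, ki],
        "den": [tau, 1.0, 0.0],
    }

# Example argument sets for an evaluate or submit call
pi_args = pi_controller(kp=2.0, ki=4.0)
lead_args = lead_lag(k=10.0, zero=1.0, pole=10.0)
pid_args = pid_controller(kp=2.0, ki=3.0, kd=1.0, tau=0.1)
```

The trailing 0.0 in a denominator encodes the integrator's pole at the origin. An ideal derivative term without the filter would make the numerator degree exceed the denominator degree, which is why a filtered form is used in this sketch.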

Time Horizon

Multi-turn. Agents can call evaluate iteratively to refine their controller design before calling submit. Typical solutions require 1–10 evaluate calls.
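
The evaluate-then-submit pattern might look like the loop below. This is a sketch under stated assumptions: `evaluate_fn` is a hypothetical callable standing in for the evaluate tool, the `passed` field in its result is an invented name, and `fake_evaluate` is a local stub that only exists to make the example runnable.

```python
def search_controller(evaluate_fn, candidates, max_calls=10):
    """Evaluate candidates in order until one passes or the call budget runs out.

    Returns (controller, calls_used); controller is None if nothing passed.
    """
    calls = 0
    for controller in candidates[:max_calls]:
        calls += 1
        metrics = evaluate_fn(num=controller["num"], den=controller["den"])
        if metrics["passed"]:  # hypothetical field name
            return controller, calls
    return None, calls

def fake_evaluate(num, den):
    """Local stand-in for the evaluate tool: 'passes' once the leading gain reaches 2."""
    return {"passed": num[0] >= 2.0}

# PI controllers C(s) = kp*(s + 1)/s with increasing gain kp
candidates = [{"num": [kp, kp], "den": [1.0, 0.0]} for kp in (0.5, 1.0, 2.0, 4.0)]
best, calls_used = search_controller(fake_evaluate, candidates)
```

An agent following this pattern would call submit only once, with the controller that passed, since submit ends the episode.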

Environment Difficulty

Difficulty varies by category. First-order stable systems are easiest; higher-order and unstable systems are hardest. The original ControlAgent paper reports 53–95% success rates across categories using GPT-4 with domain-specific prompting.

Safety

This environment involves mathematical computation only. No safety concerns.

Citations

@article{guo2024controlagent,
  title={ControlAgent: Automating Control System Design via Novel Integration of LLM Agents and Domain Expertise},
  author={Guo, Xingang and Keivan, Darioush and Syed, Usman and Qin, Lianhui and Zhang, Huan and Dullerud, Geir and Seiler, Peter and Hu, Bin},
  journal={arXiv preprint arXiv:2410.19811},
  year={2024}
}