ControlEval
Description
ControlEval is an environment for evaluating LLM agents on classical control system design. Given a plant transfer function G(s) and performance specifications, agents must design a controller C(s) such that the closed-loop system simultaneously satisfies stability, robustness, and time-domain performance constraints.
Capabilities
- Transfer function analysis and manipulation
- Controller design (PI, PID, lead/lag compensators, etc.)
- Frequency-domain analysis (phase/gain margins)
- Time-domain performance evaluation (settling time, overshoot, steady-state error)
- Iterative design refinement based on performance feedback
Compute Requirements
No special compute requirements. Evaluation uses the python-control library for deterministic transfer function computations.
Tasks
500 tasks in a single test split, across 10 categories (50 tasks each):
| Category | System Type |
|---|---|
| first_order_stable_fast | 1st-order stable, fast response |
| first_order_stable_moderate | 1st-order stable, moderate response |
| first_order_stable_slow | 1st-order stable, slow response |
| first_order_unstable | 1st-order unstable |
| first_order_w_delay | 1st-order with time delay |
| second_order_stable_fast | 2nd-order stable, fast response |
| second_order_stable_moderate | 2nd-order stable, moderate response |
| second_order_stable_slow | 2nd-order stable, slow response |
| second_order_unstable | 2nd-order unstable |
| higher_order | 3rd–5th order systems |
Each task specifies a plant G(s) via numerator/denominator polynomial coefficients, an optional time delay, and numerical performance constraints.
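Because the plant is given as coefficient lists, the closed-loop poles for a candidate controller can be checked offline before spending a tool call. The sketch below is an illustration only: it assumes unity negative feedback, ignores any time delay, and uses NumPy polynomial helpers rather than the environment's own python-control code. The function name `closed_loop_poles` and the example plant are hypothetical.

```python
import numpy as np

def closed_loop_poles(plant_num, plant_den, ctrl_num, ctrl_den):
    """Poles of the unity negative-feedback loop around C(s)G(s).

    All arguments are coefficient lists in descending powers of s.
    The characteristic polynomial is den_C*den_G + num_C*num_G.
    Assumption: no time delay and no pole-zero cancellation handling.
    """
    open_loop_den = np.polymul(ctrl_den, plant_den)
    open_loop_num = np.polymul(ctrl_num, plant_num)
    char_poly = np.polyadd(open_loop_den, open_loop_num)
    return np.roots(char_poly)

# Illustrative (not from the task set): unstable plant G(s) = 1 / (s - 1)
# with a proportional controller C(s) = 3.
poles = closed_loop_poles([1.0], [1.0, -1.0], [3.0], [1.0])
is_stable = all(p.real < -0.01 for p in poles)
```

Here the characteristic polynomial is s + 2, so the single closed-loop pole sits at s = -2 and the loop passes a stability check of the form "real part < -0.01".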
Reward Structure
Binary (sparse). Reward is 1.0 if ALL of the following constraints are satisfied, 0.0 otherwise:
- Stability: all closed-loop poles have real part < -0.01
- Phase margin ≥ specified minimum (degrees)
- Settling time within specified [min, max] range (2% criterion)
- Steady-state error ≤ specified maximum
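The all-or-nothing rule above can be sketched as a small function. This is a minimal illustration, not the environment's grader: the dictionary keys (`closed_loop_poles`, `phase_margin_deg`, and so on) are hypothetical names, and the settling-time bounds are assumed to be inclusive.

```python
def binary_reward(metrics, spec):
    """Return 1.0 only if every constraint holds, otherwise 0.0.

    `metrics` and `spec` use hypothetical key names chosen for this sketch.
    """
    stable = all(p.real < -0.01 for p in metrics["closed_loop_poles"])
    pm_ok = metrics["phase_margin_deg"] >= spec["min_phase_margin_deg"]
    ts_ok = (spec["settling_time_min"]
             <= metrics["settling_time"]
             <= spec["settling_time_max"])
    ess_ok = metrics["steady_state_error"] <= spec["max_steady_state_error"]
    return 1.0 if (stable and pm_ok and ts_ok and ess_ok) else 0.0

# Illustrative numbers only.
spec = {
    "min_phase_margin_deg": 45.0,
    "settling_time_min": 0.5,
    "settling_time_max": 3.0,
    "max_steady_state_error": 0.01,
}
good = {
    "closed_loop_poles": [complex(-2.0, 1.0), complex(-2.0, -1.0)],
    "phase_margin_deg": 60.0,
    "settling_time": 1.5,
    "steady_state_error": 0.0,
}
low_margin = dict(good, phase_margin_deg=30.0)

r_good = binary_reward(good, spec)
r_low_margin = binary_reward(low_margin, spec)
```

A design that misses a single constraint, such as `low_margin` here, scores the same 0.0 as one that misses all of them, which is what makes the reward sparse.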
Tools
| Tool | Description |
|---|---|
| evaluate | Test a candidate controller C(s). Returns all performance metrics (stability, margins, settling time, steady-state error) without ending the episode. |
| submit | Submit a final controller C(s) for grading. Returns metrics and reward. Ends the episode. |
Both tools accept controller transfer function coefficients: num (numerator) and den (denominator) as lists of floats in descending powers of s.
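As an illustration of the coefficient convention, the helpers below expand a few standard controller forms into `num`/`den` lists in descending powers of s. The helper names are hypothetical, and the derivative filter time constant `tau` is an assumption added so that the PID transfer function stays proper; whether the environment also accepts an improper (unfiltered) PID is not stated here.

```python
def pi_controller(kp, ki):
    # C(s) = kp + ki/s = (kp*s + ki) / s
    return {"num": [kp, ki], "den": [1.0, 0.0]}

def pid_controller(kp, ki, kd, tau):
    # C(s) = kp + ki/s + kd*s/(tau*s + 1), put over the common
    # denominator s*(tau*s + 1) = tau*s^2 + s
    return {
        "num": [kp * tau + kd, kp + ki * tau, ki],
        "den": [tau, 1.0, 0.0],
    }

def lead_lag(k, z, p):
    # C(s) = k*(s + z)/(s + p); lead if z < p, lag if z > p
    return {"num": [k, k * z], "den": [1.0, p]}

pi = pi_controller(2.0, 0.5)
pid = pid_controller(1.0, 2.0, 0.5, 0.1)
lead = lead_lag(10.0, 1.0, 5.0)
```

For example, `pi_controller(2.0, 0.5)` corresponds to C(s) = (2s + 0.5)/s, so the trailing `0.0` in `den` is what encodes the integrator.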
Time Horizon
Multi-turn. Agents can call evaluate iteratively to refine their controller design before calling submit. Typical solutions require 1–10 evaluate calls.
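The evaluate-then-submit pattern might look like the loop below. This is a toy sketch under stated assumptions: `evaluate_stub` is a hypothetical stand-in for the real `evaluate` tool, it models only a proportional gain around the illustrative plant G(s) = 1/(s + 1), and it omits phase margin. The doubling strategy is just one simple way to refine a design.

```python
import math

def evaluate_stub(num, den):
    """Hypothetical stand-in for the environment's evaluate tool.

    Treats the controller as a pure gain K = num[0]/den[0] around
    G(s) = 1/(s + 1); the closed loop is then K / (s + 1 + K).
    """
    k = num[0] / den[0]
    pole = -(1.0 + k)
    return {
        "stable": pole < -0.01,
        "settling_time": math.log(50.0) / (1.0 + k),  # 2% criterion
        "steady_state_error": 1.0 / (1.0 + k),        # unit-step input
    }

def refine_gain(evaluate, max_ss_error, ts_range, k=1.0, max_calls=10):
    """Double the gain until the metrics meet the targets or calls run out."""
    for call in range(1, max_calls + 1):
        m = evaluate([k], [1.0])
        ok = (m["stable"]
              and m["steady_state_error"] <= max_ss_error
              and ts_range[0] <= m["settling_time"] <= ts_range[1])
        if ok:
            return k, call  # this is the design one would pass to submit
        k *= 2.0
    return None, max_calls

gain, calls = refine_gain(evaluate_stub, max_ss_error=0.05, ts_range=(0.05, 1.0))
```

With these illustrative targets the loop stops at a gain of 32 after six evaluate calls, which falls inside the typical 1–10 call range quoted above.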
Environment Difficulty
Difficulty varies by category. First-order stable systems are easiest; higher-order and unstable systems are hardest. The original ControlAgent paper reports 53–95% success rates across categories using GPT-4 with domain-specific prompting.
Safety
This environment involves mathematical computation only. No safety concerns.
Citations
@article{guo2024controlagent,
title={ControlAgent: Automating Control System Design via Novel Integration of LLM Agents and Domain Expertise},
author={Guo, Xingang and Keivan, Darioush and Syed, Usman and Qin, Lianhui and Zhang, Huan and Dullerud, Geir and Seiler, Peter and Hu, Bin},
journal={arXiv preprint arXiv:2410.19811},
year={2024}
}