ControlEval

Description

ControlEval is an environment for evaluating LLM agents on classical control system design. Given a plant transfer function G(s) and performance specifications, agents must design a controller C(s) such that the closed-loop system simultaneously satisfies stability, robustness, and time-domain performance constraints.

Capabilities

Transfer function analysis and manipulation
Controller design (PI, PID, lead/lag compensators, etc.)
Frequency-domain analysis (phase/gain margins)
Time-domain performance evaluation (settling time, overshoot, steady-state error)
Iterative design refinement based on performance feedback

Compute Requirements

No special compute requirements. Evaluation uses the python-control library for deterministic transfer function computations.

Tasks

500 tasks in a single test split, across 10 categories (50 tasks each):

Category	System Type
`first_order_stable_fast`	1st-order stable, fast response
`first_order_stable_moderate`	1st-order stable, moderate response
`first_order_stable_slow`	1st-order stable, slow response
`first_order_unstable`	1st-order unstable
`first_order_w_delay`	1st-order with time delay
`second_order_stable_fast`	2nd-order stable, fast response
`second_order_stable_moderate`	2nd-order stable, moderate response
`second_order_stable_slow`	2nd-order stable, slow response
`second_order_unstable`	2nd-order unstable
`higher_order`	3rd–5th order systems

Each task specifies a plant G(s) via numerator/denominator polynomial coefficients, an optional time delay, and numerical performance constraints.
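
As a rough illustration of this task format, a plant can be held as two coefficient lists plus a delay and evaluated at any complex frequency. The dictionary keys (`num`, `den`, `delay`), the example values, and the assumption that plant coefficients are in descending powers of s are illustrative only, not the environment's actual schema.

```python
import cmath

def polyval(coeffs, s):
    """Evaluate a polynomial given in descending powers of s (Horner's rule)."""
    result = 0.0
    for c in coeffs:
        result = result * s + c
    return result

def plant_response(task, s):
    """Return G(s) * exp(-delay * s) for a task dictionary (hypothetical layout)."""
    g = polyval(task["num"], s) / polyval(task["den"], s)
    return g * cmath.exp(-task.get("delay", 0.0) * s)

# Hypothetical task: G(s) = 2 / (0.5 s + 1) with a 0.1 s time delay.
task = {"num": [2.0], "den": [0.5, 1.0], "delay": 0.1}

dc_gain = plant_response(task, 0.0).real   # plant gain at s = 0
g_at_1 = plant_response(task, 1j * 1.0)    # frequency response at 1 rad/s
```

The delay term has unit magnitude on the imaginary axis, so it changes only the phase of `g_at_1`, which is why delayed plants are harder to give adequate phase margin.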

Reward Structure

Binary (sparse). Reward is 1.0 if ALL of the following constraints are satisfied, 0.0 otherwise:

Stability: all closed-loop poles have real part < -0.01
Phase margin ≥ specified minimum (degrees)
Settling time within specified [min, max] range (2% criterion)
Steady-state error ≤ specified maximum
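
The all-or-nothing rule above can be sketched as a single function. This is a minimal sketch, not the environment's grader: the key names in `metrics` and `spec` are hypothetical, and treating the settling-time bounds as inclusive is an assumption.

```python
def reward(metrics, spec):
    """Return 1.0 only if every constraint holds, otherwise 0.0."""
    # Stability: every closed-loop pole must lie left of Re(s) = -0.01.
    stable = all(p.real < -0.01 for p in metrics["closed_loop_poles"])
    pm_ok = metrics["phase_margin_deg"] >= spec["min_phase_margin_deg"]
    # Bounds assumed inclusive for this sketch.
    ts_ok = spec["settling_time_min"] <= metrics["settling_time"] <= spec["settling_time_max"]
    ess_ok = metrics["steady_state_error"] <= spec["max_steady_state_error"]
    return 1.0 if (stable and pm_ok and ts_ok and ess_ok) else 0.0

# Hypothetical specification and metrics, for illustration only.
spec = {
    "min_phase_margin_deg": 45.0,
    "settling_time_min": 0.2,
    "settling_time_max": 0.4,
    "max_steady_state_error": 0.01,
}
good_metrics = {
    "closed_loop_poles": [complex(-4.0, 3.0), complex(-4.0, -3.0)],
    "phase_margin_deg": 60.0,
    "settling_time": 0.3,
    "steady_state_error": 0.0,
}
```

Because the reward is sparse, a design that misses one constraint by a small amount scores the same as one that is unstable.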

Tools

Tool	Description
`evaluate`	Test a candidate controller C(s). Returns all performance metrics (stability, margins, settling time, steady-state error) without ending the episode.
`submit`	Submit a final controller C(s) for grading. Returns metrics and reward. Ends the episode.

Both tools accept controller transfer function coefficients: `num` (numerator) and `den` (denominator) as lists of floats in descending powers of s.
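
To make the coefficient convention concrete, the sketch below converts a few common controller forms into numerator and denominator lists in descending powers of s. The helper names are hypothetical, and whether the environment accepts improper controllers is not stated here, so the PID variant uses a filtered derivative to stay proper.

```python
def pi_controller(kp, ki):
    """C(s) = kp + ki/s = (kp*s + ki) / s."""
    return {"num": [kp, ki], "den": [1.0, 0.0]}

def pid_filtered(kp, ki, kd, tf):
    """C(s) = kp + ki/s + kd*s/(tf*s + 1), over the common denominator s*(tf*s + 1)."""
    num = [kp * tf + kd, kp + ki * tf, ki]
    den = [tf, 1.0, 0.0]
    return {"num": num, "den": den}

def lead_lag(k, zero, pole):
    """C(s) = k * (s + zero) / (s + pole); lead if zero < pole, lag otherwise."""
    return {"num": [k, k * zero], "den": [1.0, pole]}

# Example: PI controller C(s) = (2 s + 0.5) / s.
pi = pi_controller(2.0, 0.5)
```

A trailing `0.0` in `den` encodes the integrator pole at the origin, so dropping it would change the controller's order.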

Time Horizon

Multi-turn. Agents can call `evaluate` iteratively to refine their controller design before calling `submit`. Typical solutions require 1–10 `evaluate` calls.
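
One way to picture this refinement loop is sketched below. The plant, the `fake_evaluate` stand-in, and the response key `settling_time` are all assumptions made so the example runs on its own; the real tool's response format may differ.

```python
import math

# Hypothetical plant G(s) = K / (TAU*s + 1); values chosen for illustration only.
K, TAU = 2.0, 0.5

def make_controller(kc):
    """PI controller whose zero cancels the plant pole: C(s) = kc*(TAU*s + 1)/s."""
    return {"num": [kc * TAU, kc], "den": [1.0, 0.0]}

def fake_evaluate(num, den):
    """Stand-in for the real evaluate tool, valid only for make_controller designs.

    With exact cancellation the loop is L(s) = K*kc/s, so the closed loop is
    first order with its pole at -K*kc and 2% settling time ln(50)/(K*kc).
    """
    kc = num[-1]
    return {"settling_time": math.log(50.0) / (K * kc)}

def refine(evaluate, ts_min, ts_max, kc=1.0, max_calls=10):
    """Rescale the gain until the settling time falls inside [ts_min, ts_max]."""
    target = 0.5 * (ts_min + ts_max)
    for calls in range(1, max_calls + 1):
        controller = make_controller(kc)
        ts = evaluate(**controller)["settling_time"]
        if ts_min <= ts <= ts_max:
            return controller, calls
        kc *= ts / target  # settling time scales as 1/kc for this loop
    return None, max_calls

controller, calls = refine(fake_evaluate, ts_min=0.2, ts_max=0.4)
```

In an actual episode the agent would pass the final `num` and `den` to `submit` once every reported metric meets its specification.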

Environment Difficulty

Difficulty varies by category. First-order stable systems are easiest; higher-order and unstable systems are hardest. The original ControlAgent paper reports 53–95% success rates across categories using GPT-4 with domain-specific prompting.

Safety

This environment involves mathematical computation only. No safety concerns.

Citations

@article{guo2024controlagent,
  title={ControlAgent: Automating Control System Design via Novel Integration of LLM Agents and Domain Expertise},
  author={Guo, Xingang and Keivan, Darioush and Syed, Usman and Qin, Lianhui and Zhang, Huan and Dullerud, Geir and Seiler, Peter and Hu, Bin},
  journal={arXiv preprint arXiv:2410.19811},
  year={2024}
}

Repository

Source repository

EnvCommons/ControlEval


Compute Configuration

Resource allocation for this environment.

Component	Configuration
Environment Server	1 vCPU / 4 GB RAM
Sandbox Machine	Not configured

Estimated Cost

Pay per second of active session usage. Billing starts when your session begins and stops when it ends.

Component	Cost / second
Environment	$0.0000320
Sandbox	Not configured
Total	$0.0000320

Examples

5-minute session: $0.0096

1-hour session: $0.1152
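
The example figures follow from multiplying the per-second rate by the session length. A small sketch, assuming the listed rate applies uniformly for the whole session (the function name is hypothetical):

```python
RATE_PER_SECOND = 0.0000320  # total cost per second from the table, in USD

def session_cost(seconds, rate=RATE_PER_SECOND):
    """Cost of a session billed per second, rounded to four decimal places."""
    return round(seconds * rate, 4)

five_minutes = session_cost(5 * 60)   # 300 seconds
one_hour = session_cost(60 * 60)      # 3600 seconds
```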
