CodePDE
Description
CodePDE is an environment for evaluating an agent's ability to generate numerical solvers for partial differential equations (PDEs). Given a PDE problem description, the agent must implement a correct and efficient Python solver. Tasks cover fundamental PDEs including advection, Burgers' equation, compressible Navier-Stokes, Darcy flow, and reaction-diffusion.
This OpenReward implementation is a port of the Harbor Framework implementation originally written by Shanda Li.
Capabilities
- Implementing numerical PDE solvers (finite difference, spectral methods, etc.)
- Understanding PDE formulations and boundary conditions
- Optimizing solver accuracy and stability
- Working with NumPy/SciPy for scientific computing
Compute Requirements
Agents are given a sandboxed environment with bash access, file editing tools, and scientific Python libraries (NumPy, SciPy). Default sandbox size is 1 CPU and 2 GB RAM.
Tasks
There is one split in this environment:
- Test: 5 PDE solver tasks
  - Advection equation (1D)
  - Burgers' equation
  - Compressible Navier-Stokes (1D)
  - Darcy flow
  - Reaction-diffusion (1D)
Each task provides a PDE specification with initial/boundary conditions and requires implementing a solver as a Python function.
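As a rough illustration of what such a solver function can look like, the sketch below handles the 1D advection equation u_t + beta * u_x = 0 with a Fourier phase shift. The function name `solver`, its arguments (`u0_batch`, `t_coordinate`, `beta`, `domain_length`), the array shapes, and the periodic unit-length domain are all assumptions made for this example; the actual template and conventions are defined in each task's instruction.md.

```python
import numpy as np

def solver(u0_batch, t_coordinate, beta, domain_length=1.0):
    """Sketch: solve u_t + beta * u_x = 0 on a periodic grid (assumed setup).

    u0_batch:     array [batch, N], initial conditions on a uniform grid.
    t_coordinate: array [T], output times (hypothetical argument name).
    Returns an array of shape [batch, T, N].
    """
    u0_batch = np.asarray(u0_batch, dtype=float)
    t = np.asarray(t_coordinate, dtype=float)
    n = u0_batch.shape[-1]

    # Angular wavenumbers for a periodic domain of length domain_length.
    k = 2.0 * np.pi * np.fft.rfftfreq(n, d=domain_length / n)

    # Transform the initial condition once: shape [batch, n // 2 + 1].
    u0_hat = np.fft.rfft(u0_batch, axis=-1)

    # The exact solution is u0(x - beta * t), which in Fourier space is a
    # per-mode phase factor exp(-i * k * beta * t): shape [T, n // 2 + 1].
    phase = np.exp(-1j * k[None, :] * beta * t[:, None])

    # Broadcast to [batch, T, modes] and transform back to physical space.
    u_hat = u0_hat[:, None, :] * phase[None, :, :]
    return np.fft.irfft(u_hat, n=n, axis=-1)
```

For smooth periodic initial data this approach has no time-stepping error, but it does not carry over to the nonlinear tasks such as Burgers' equation or compressible Navier-Stokes, which need a genuine time integrator.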
Reward Structure
This is a multi-turn, sandbox-based environment. The agent implements a solver, tests it, and calls submit_answer for verification. The verifier compares the agent's numerical solution against reference data using normalized RMSE (nRMSE).
- 1.0: Solution achieves nRMSE below the task-specific threshold.
- 0.0: Solution exceeds the error threshold or fails to run.
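The scoring rule above can be sketched as follows. This assumes nRMSE is the L2 norm of the error divided by the L2 norm of the reference, computed per sample and averaged over the batch; the verifier's exact normalization and the threshold values are set by the task's verification scripts, so `nrmse`, `reward`, and `threshold` here are hypothetical names for illustration only.

```python
import numpy as np

def nrmse(pred, ref):
    """Mean per-sample normalized RMSE (assumed definition)."""
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    # Flatten everything except the leading batch axis.
    diff = (pred - ref).reshape(pred.shape[0], -1)
    ref_flat = ref.reshape(ref.shape[0], -1)
    # The ratio of L2 norms equals the ratio of RMS values, because both
    # arrays have the same number of elements per sample.
    per_sample = np.linalg.norm(diff, axis=1) / np.linalg.norm(ref_flat, axis=1)
    return float(per_sample.mean())

def reward(pred, ref, threshold):
    """Binary reward: 1.0 if nRMSE is below the threshold, else 0.0."""
    return 1.0 if nrmse(pred, ref) < threshold else 0.0
```

Running a check like this locally against any available reference data, before calling submit_answer, gives an early signal on whether a solver is likely to pass.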
Data
Each task directory contains an instruction.md with the PDE specification, a function template, and a tests/ directory with verification scripts. Task data is stored on the OpenReward platform.
Tools
| Tool | Description |
|---|---|
| bash | Execute shell commands in the sandbox. |
| str_replace | Replace a unique string in a file. |
| view | View file contents or list directory contents. |
| create_file | Create a new file with specified content. |
| submit_answer | Submit work for automated verification against reference solutions. |
Time Horizon
CodePDE is a multi-turn environment. Agents read the PDE specification, implement a numerical solver, test and debug, and submit for verification.
Environment Difficulty
The original paper evaluates LLMs on PDE solver generation. Single-shot success rates vary by equation type:
| PDE Type | Bug-Free Rate | Best nRMSE |
|---|---|---|
| Advection | Higher | 9.74×10⁻⁴ (o3) |
| Burgers | Moderate | 1.23×10⁻⁴ (Gemini 2.5) |
| Navier-Stokes | 16.6% | 1.31×10⁻² (o3) |
| Reaction-Diffusion | Consistent failure | 1.74×10⁻² (DeepSeek-V3) |
All models struggle with reaction-diffusion without hints. Debug success rates improve from 41% to 84% after iterative refinement.
Other Environment Requirements
There are no further environment requirements; CodePDE works out of the box with the OpenReward endpoint without any external API keys.
Safety
Agents in CodePDE implement numerical solvers in a sandboxed environment. The environment does not present direct safety risks.
Citations
@article{li2025codepde,
author = {Li, Shanda and Marwah, Tanya and Shen, Junhong and Sun, Weiwei and Risteski, Andrej and Yang, Yiming and Talwalkar, Ameet},
title = {CodePDE: An Inference Framework for LLM-driven PDE Solver Generation},
journal = {arXiv preprint arXiv:2505.08783},
year = {2025},
url = {https://arxiv.org/abs/2505.08783}
}