CodePDE

⭐ OpenReward Environment

Description

CodePDE is an environment for evaluating an agent's ability to generate numerical solvers for partial differential equations (PDEs). Given a PDE problem description, the agent must implement a correct and efficient Python solver. Tasks cover fundamental PDEs including advection, Burgers' equation, compressible Navier-Stokes, Darcy flow, and reaction-diffusion.

This OpenReward implementation is ported from the Harbor Framework implementation originally made by Shanda Li.

Capabilities

  • Implementing numerical PDE solvers (finite difference, spectral methods, etc.)
  • Understanding PDE formulations and boundary conditions
  • Optimizing solver accuracy and stability
  • Working with NumPy/SciPy for scientific computing
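
To make the accuracy and stability trade-off concrete, here is a minimal sketch of an explicit finite-difference step for the heat equation u_t = nu * u_xx on a periodic grid. It is not taken from any CodePDE task. The function name `diffuse_explicit`, its parameters, and the `safety` factor are hypothetical. The only fact relied on is the standard stability bound nu * dt / dx**2 <= 0.5 for this scheme.

```python
import math
import numpy as np

def diffuse_explicit(u0, nu, dx, t_final, safety=0.4):
    """Advance u_t = nu * u_xx to t_final with forward Euler in time and
    centered differences in space, assuming a periodic, uniform grid.

    Hypothetical helper for illustration only.
    """
    # Largest step kept safely below the stability bound nu*dt/dx**2 <= 0.5.
    dt_max = safety * dx**2 / nu
    n_steps = max(1, math.ceil(t_final / dt_max))
    dt = t_final / n_steps  # equal steps that land exactly on t_final

    u = np.array(u0, dtype=float)
    for _ in range(n_steps):
        # Periodic second difference via np.roll.
        lap = (np.roll(u, -1) - 2.0 * u + np.roll(u, 1)) / dx**2
        u = u + dt * nu * lap
    return u
```

Choosing the step count from the stability bound, rather than fixing `dt` by hand, keeps the scheme from blowing up when the grid is refined. An implicit or spectral method would lift the restriction at the cost of more code.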

Compute Requirements

Agents are given a sandboxed environment with bash access, file editing tools, and scientific Python libraries (NumPy, SciPy). Default sandbox size is 1 CPU and 2 GB RAM.

Tasks

There is one split in this environment:

  • Test: 5 PDE solver tasks
    • Advection equation (1D)
    • Burgers' equation
    • Compressible Navier-Stokes (1D)
    • Darcy flow
    • Reaction-diffusion (1D)

Each task provides a PDE specification with initial/boundary conditions and requires implementing a solver as a Python function.
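
As a rough illustration of what "a solver as a Python function" can look like, the sketch below solves the 1D advection equation u_t + beta * u_x = 0. It assumes a periodic domain and a uniform grid, neither of which is stated here, and the name `solve_advection` and its signature are hypothetical. The actual function template is the one given in each task's instruction.md.

```python
import numpy as np

def solve_advection(u0, t_out, beta, length=1.0):
    """Solve u_t + beta * u_x = 0 on a periodic domain [0, length).

    Hypothetical signature for illustration; not the task template.
    u0    : array of shape (N,), initial condition on a uniform grid
    t_out : sequence of output times
    Returns an array of shape (len(t_out), N).
    """
    u0 = np.asarray(u0, dtype=float)
    n = u0.shape[-1]
    # Angular wavenumbers matching NumPy's FFT ordering.
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=length / n)
    u0_hat = np.fft.fft(u0)
    t = np.asarray(t_out, dtype=float)[:, None]
    # Each Fourier mode is shifted in phase: exp(-i * k * beta * t).
    phase = np.exp(-1j * k[None, :] * beta * t)
    return np.fft.ifft(u0_hat[None, :] * phase, axis=-1).real
```

For example, with `x = np.arange(64) / 64`, calling `solve_advection(np.sin(2 * np.pi * x), [0.0, 0.25], beta=1.0)` returns the initial profile and a copy shifted by a quarter of the domain. The Fourier approach needs no time stepping for this linear problem. The nonlinear tasks would call for a different method.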

Reward Structure

This is a multi-turn, sandbox-based environment. The agent implements a solver, tests it, and calls submit_answer for verification. The verifier compares the agent's numerical solution against reference data using normalized RMSE (nRMSE).

  • 1.0: Solution achieves nRMSE below the task-specific threshold.
  • 0.0: Solution exceeds the error threshold or fails to run.
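
The binary reward can be sketched as follows. This is not the verifier's code. It assumes one common definition of nRMSE, the L2 norm of the error divided by the L2 norm of the reference, averaged over samples. The function names and the `threshold` argument are hypothetical, and the real task-specific thresholds are not reproduced here.

```python
import numpy as np

def nrmse(pred, ref):
    """Per-sample L2 error normalized by the reference norm, then averaged.

    Assumed definition for illustration; axis 0 indexes samples.
    """
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    diff = (pred - ref).reshape(ref.shape[0], -1)
    scale = ref.reshape(ref.shape[0], -1)
    per_sample = np.linalg.norm(diff, axis=1) / np.linalg.norm(scale, axis=1)
    return float(per_sample.mean())

def reward(pred, ref, threshold):
    """Return 1.0 if nRMSE is finite and below threshold, else 0.0."""
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    if pred.shape != ref.shape:
        return 0.0  # treat a malformed output like a failed run
    err = nrmse(pred, ref)
    return 1.0 if np.isfinite(err) and err < threshold else 0.0
```

Computing the same metric locally against a trusted solution, such as an analytic one or a fine-grid run, is one way to estimate whether a solver is likely to pass before calling submit_answer.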

Data

Each task directory contains an instruction.md with the PDE specification, a function template, and a tests/ directory with verification scripts. Task data is stored on the OpenReward platform.

Tools

Tool            Description
bash            Execute shell commands in the sandbox.
str_replace     Replace a unique string in a file.
view            View file contents or list directory contents.
create_file     Create a new file with specified content.
submit_answer   Submit work for automated verification against reference solutions.

Time Horizon

CodePDE is a multi-turn environment. Agents read the PDE specification, implement a numerical solver, test and debug, and submit for verification.

Environment Difficulty

The original paper evaluates LLMs on PDE solver generation. Single-shot success rates vary by equation type:

PDE Type             Bug-Free Rate        Best nRMSE
Advection            Higher               9.74×10⁻⁴ (o3)
Burgers              Moderate             1.23×10⁻⁴ (Gemini 2.5)
Navier-Stokes        16.6%                1.31×10⁻² (o3)
Reaction-Diffusion   Consistent failure   1.74×10⁻² (DeepSeek-V3)

All models struggle with reaction-diffusion without hints. Debug success rates improve from 41% to 84% after iterative refinement.

Other Environment Requirements

There are no further environment requirements; CodePDE works out of the box with the OpenReward endpoint without any external API keys.

Safety

Agents in CodePDE implement numerical solvers in a sandboxed environment. The environment does not present direct safety risks.

Citations

@article{li2025codepde,
  author    = {Li, Shanda and Marwah, Tanya and Shen, Junhong and Sun, Weiwei and Risteski, Andrej and Yang, Yiming and Talwalkar, Ameet},
  title     = {CodePDE: An Inference Framework for LLM-driven PDE Solver Generation},
  journal   = {arXiv preprint arXiv:2505.08783},
  year      = {2025},
  url       = {https://arxiv.org/abs/2505.08783}
}