CodePDE

Description

CodePDE is an environment for evaluating an agent's ability to generate numerical solvers for partial differential equations (PDEs). Given a PDE problem description, the agent must implement a correct and efficient Python solver. Tasks cover fundamental PDEs including advection, Burgers' equation, compressible Navier-Stokes, Darcy flow, and reaction-diffusion.

This OpenReward implementation is ported from the Harbor Framework implementation originally created by Shanda Li.

Capabilities

Implementing numerical PDE solvers (finite difference, spectral methods, etc.)
Understanding PDE formulations and boundary conditions
Optimizing solver accuracy and stability
Working with NumPy/SciPy for scientific computing

Compute Requirements

Agents are given a sandboxed environment with bash access, file editing tools, and scientific Python libraries (NumPy, SciPy). Default sandbox size is 1 CPU and 2 GB RAM.

Tasks

There is one split in this environment:

Test: 5 PDE solver tasks
- Advection equation (1D)
- Burgers' equation
- Compressible Navier-Stokes (1D)
- Darcy flow
- Reaction-diffusion (1D)

Each task provides a PDE specification with initial/boundary conditions and requires implementing a solver as a Python function.
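As an illustration of what "a solver as a Python function" can look like, here is a minimal sketch for the 1D advection equation u_t + beta * u_x = 0. The function name `solver`, its arguments, the `[batch, T, N]` output layout, the periodic boundary conditions, and the unit-length domain are all assumptions made for this example; the actual signature and conventions are defined by each task's instruction.md and function template.

```python
import numpy as np


def solver(u0_batch, t_coordinate, beta):
    """Sketch: exact Fourier solution of u_t + beta * u_x = 0.

    Assumptions (hypothetical, not taken from the task template):
      - u0_batch has shape [batch, N], sampled on a uniform periodic grid
        covering the unit interval [0, 1)
      - t_coordinate has shape [T] and lists the output times
      - the return value has shape [batch, T, N]
    """
    u0_batch = np.asarray(u0_batch, dtype=float)
    t_coordinate = np.asarray(t_coordinate, dtype=float)
    n = u0_batch.shape[-1]

    # Angular wavenumbers for a domain of length 1 (2*pi times integer modes).
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=1.0 / n)

    # Advection shifts each Fourier mode by the phase exp(-i * k * beta * t).
    u0_hat = np.fft.fft(u0_batch, axis=-1)                      # [batch, N]
    phase = np.exp(-1j * beta * k[None, None, :]
                   * t_coordinate[None, :, None])               # [1, T, N]
    u_hat = u0_hat[:, None, :] * phase                          # [batch, T, N]

    return np.fft.ifft(u_hat, axis=-1).real


# Usage: a sine wave advected by half a period flips sign.
x = np.arange(64) / 64
u0 = np.sin(2.0 * np.pi * x)[None, :]
u = solver(u0, np.array([0.0, 0.5]), beta=1.0)
```

A spectral approach is used here only because it is exact for smooth periodic data; for the nonlinear tasks (Burgers, Navier-Stokes) a time-stepping scheme with a stability-limited step size would be needed instead.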

Reward Structure

This is a multi-turn, sandbox-based environment. The agent implements a solver, tests it, and calls submit_answer for verification. The verifier compares the agent's numerical solution against reference data using normalized RMSE (nRMSE).

1.0: The solution achieves an nRMSE below the task-specific threshold.
0.0: The solution exceeds the error threshold or fails to run.
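The binary reward above can be sketched as follows. This README does not specify the exact normalization or the threshold values, so the code uses one common definition of nRMSE (the L2 norm of the error divided by the L2 norm of the reference, averaged over samples), and `threshold` is a hypothetical parameter; the verification scripts in each task's tests/ directory are authoritative.

```python
import numpy as np


def nrmse(pred, ref):
    """Per-sample relative L2 error, averaged over the batch.

    Assumes both arrays have shape [batch, ...] with at least two dimensions.
    This is one common nRMSE definition, not necessarily the verifier's.
    """
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    if pred.shape != ref.shape:
        raise ValueError("prediction and reference shapes differ")
    axes = tuple(range(1, ref.ndim))          # reduce over all non-batch axes
    err = np.sqrt(np.sum((pred - ref) ** 2, axis=axes))
    norm = np.sqrt(np.sum(ref ** 2, axis=axes))
    return float(np.mean(err / norm))


def reward(pred, ref, threshold):
    """Return 1.0 if nRMSE is finite and below `threshold`, else 0.0."""
    try:
        score = nrmse(pred, ref)
    except Exception:
        return 0.0                            # a failing solution scores zero
    return 1.0 if np.isfinite(score) and score < threshold else 0.0


# Usage: a uniform 10% overshoot gives an nRMSE of about 0.1.
ref = np.ones((2, 3, 4))
pred = 1.1 * ref
score = nrmse(pred, ref)
```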

Data

Each task directory contains an instruction.md with the PDE specification, a function template, and a tests/ directory with verification scripts. Task data is stored on the OpenReward platform.

Tools

Tool	Description
`bash`	Execute shell commands in the sandbox.
`str_replace`	Replace a unique string in a file.
`view`	View file contents or list directory contents.
`create_file`	Create a new file with specified content.
`submit_answer`	Submit work for automated verification against reference solutions.

Time Horizon

CodePDE is a multi-turn environment. Agents read the PDE specification, implement a numerical solver, test and debug, and submit for verification.

Environment Difficulty

The original paper evaluates LLMs on PDE solver generation. Single-shot success rates vary by equation type:

PDE Type	Bug-Free Rate	Best nRMSE
Advection	Higher	9.74×10⁻⁴ (o3)
Burgers	Moderate	1.23×10⁻⁴ (Gemini 2.5)
Navier-Stokes	16.6%	1.31×10⁻² (o3)
Reaction-Diffusion	Consistent failure	1.74×10⁻² (DeepSeek-V3)

All models struggle with reaction-diffusion without hints. Debug success rates improve from 41% to 84% after iterative refinement.

Other Environment Requirements

There are no further environment requirements; CodePDE works out of the box with the OpenReward endpoint without any external API keys.

Safety

Agents in CodePDE implement numerical solvers in a sandboxed environment. The environment does not present direct safety risks.

Citations

@article{li2025codepde,
  author    = {Li, Shanda and Marwah, Tanya and Shen, Junhong and Sun, Weiwei and Risteski, Andrej and Yang, Yiming and Talwalkar, Ameet},
  title     = {CodePDE: An Inference Framework for LLM-driven PDE Solver Generation},
  journal   = {arXiv preprint arXiv:2505.08783},
  year      = {2025},
  url       = {https://arxiv.org/abs/2505.08783}
}

Compute Configuration

Resource allocation for this environment.

Component	Configuration
Environment Server	1 vCPU / 4 GB RAM
Sandbox Machine	Not configured

Estimated Cost

Pay per second of active session usage. Billing starts when your session begins and stops when it ends.

Component	Cost / second
Environment	$0.0000320
Sandbox	Not configured
Total	$0.0000320

Examples

5-minute session: $0.0096
1-hour session: $0.1152
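The example figures follow directly from the per-second rate in the table. A minimal sketch of the arithmetic, where `RATE_PER_SECOND` and `session_cost` are names introduced here for illustration:

```python
# Total per-second rate taken from the cost table (sandbox not configured).
RATE_PER_SECOND = 0.0000320


def session_cost(seconds):
    """Cost in dollars for a session of the given length in seconds."""
    return RATE_PER_SECOND * seconds


five_minutes = session_cost(5 * 60)    # 300 s  -> about $0.0096
one_hour = session_cost(60 * 60)       # 3600 s -> about $0.1152
```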
