CRUST-Bench

Description

CRUST-Bench is an environment for evaluating C-to-safe-Rust transpilation. Each task provides a C repository and a manually written safe Rust interface, and the agent must produce a complete Rust implementation that conforms to the interface, compiles without errors, and passes all test cases. Tasks span diverse software categories, including compilers, algorithmic libraries, system utilities, and data structures.
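As a hypothetical illustration (not taken from any CRUST-Bench task), suppose the C repository exposes a fixed-capacity integer stack whose `stack_push`/`stack_pop` functions return status codes, and the provided interface asks for an owned Rust type instead. A conforming safe implementation might look like the sketch below; the names `Stack`, `new`, `push`, `pop`, and `len` are assumptions chosen for illustration, not part of the benchmark.

```rust
// Hypothetical C original, shown for comparison:
//   typedef struct { int *data; size_t len, cap; } Stack;
//   int stack_push(Stack *s, int v);    /* returns -1 when full  */
//   int stack_pop(Stack *s, int *out);  /* returns -1 when empty */

/// Safe Rust counterpart: the Vec owns its buffer, so there is no manual malloc/free.
pub struct Stack {
    data: Vec<i32>,
    cap: usize,
}

impl Stack {
    /// Creates an empty stack that holds at most `cap` elements.
    pub fn new(cap: usize) -> Self {
        Stack { data: Vec::with_capacity(cap), cap }
    }

    /// Mirrors `stack_push`: an `Err` replaces the C `-1` status code.
    pub fn push(&mut self, v: i32) -> Result<(), i32> {
        if self.data.len() == self.cap {
            return Err(v); // hand the rejected value back to the caller
        }
        self.data.push(v);
        Ok(())
    }

    /// Mirrors `stack_pop`: `None` replaces the out-parameter plus `-1` status.
    pub fn pop(&mut self) -> Option<i32> {
        self.data.pop()
    }

    /// Number of elements currently stored.
    pub fn len(&self) -> usize {
        self.data.len()
    }
}

fn main() {
    let mut s = Stack::new(2);
    s.push(1).unwrap();
    s.push(2).unwrap();
    assert_eq!(s.push(3), Err(3)); // stack is full
    assert_eq!(s.pop(), Some(2));
    println!("len = {}", s.len()); // prints "len = 1"
}
```

The design choice here is to let the type system carry the error cases that C encodes in integer return values, which is what a safe interface typically asks for.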

This OpenReward implementation is ported from the Harbor Framework implementation originally made by Slimshilin.

Capabilities

Translating C code to idiomatic, safe Rust
Understanding and implementing Rust interfaces with ownership and borrowing
Working with Rust's type system, traits, and lifetime annotations
Debugging compilation errors and test failures
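To make the ownership, borrowing, and lifetime items above concrete, here is a small hypothetical sketch of two common C idioms and one plausible safe-Rust translation of each. The functions `find_max` and `skip_spaces` are invented for illustration and do not come from a specific task.

```rust
// Hypothetical C original:
//   int find_max(const int *arr, size_t len, int *out);  /* -1 if len == 0 */
// A (pointer, length) pair becomes a borrowed slice, and the out-parameter
// plus status code becomes an Option.
fn find_max(arr: &[i32]) -> Option<i32> {
    arr.iter().copied().max()
}

// Hypothetical C original:
//   const char *skip_spaces(const char *s);  /* returns a pointer into s */
// The returned pointer aliases the input, which Rust expresses with a
// lifetime: the result cannot outlive the string it borrows from.
fn skip_spaces<'a>(s: &'a str) -> &'a str {
    s.trim_start_matches(' ')
}

fn main() {
    println!("{:?}", find_max(&[3, 9, 2]));  // prints "Some(9)"
    println!("{:?}", find_max(&[]));         // prints "None"
    println!("{}", skip_spaces("   hello")); // prints "hello"
}
```

The explicit `'a` could be elided here; it is written out only to show the relationship the C signature leaves implicit.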

Compute Requirements

Agents are given a sandboxed environment with Rust and C toolchains, bash access, and file-editing tools. The default sandbox size is 1 CPU and 2 GB of RAM.

License

GPL-3.0.

Tasks

There is one split in this environment:

Test: 100 C-to-Rust transpilation tasks

Tasks are sourced from open-source C repositories on GitHub, covering programming language infrastructure, algorithmic libraries, system utilities, and more. Each task is paired with expert-written Rust interfaces and test suites.

Reward Structure

This is a multi-turn, sandbox-based environment. The agent reads the C source code, implements the corresponding Rust code against the provided interface, and calls `submit_answer` for verification. The verifier runs `cargo build` and `cargo test`.

1.0: Code compiles and all tests pass.
0.0: Compilation fails or any test fails.
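The binary reward above can be sketched as follows. This is a minimal illustration assuming only that the verifier treats a non-zero exit status from `cargo build` or `cargo test` as failure; the helper names `reward`, `cargo_succeeds`, and `verify` are hypothetical, and this is not the actual OpenReward verifier.

```rust
use std::path::Path;
use std::process::Command;

/// Binary reward: 1.0 only if the build succeeds and every test passes.
fn reward(build_ok: bool, tests_ok: bool) -> f64 {
    if build_ok && tests_ok { 1.0 } else { 0.0 }
}

/// Runs `cargo <subcommand>` in `dir` and reports whether it exited with
/// status 0. A missing `cargo` binary or directory counts as failure.
fn cargo_succeeds(dir: &Path, subcommand: &str) -> bool {
    Command::new("cargo")
        .arg(subcommand)
        .current_dir(dir)
        .status()
        .map(|s| s.success())
        .unwrap_or(false)
}

/// Hypothetical verifier: build first, and run tests only if the build passed.
/// `cargo test` exits non-zero when any test fails, which maps to 0.0.
fn verify(dir: &Path) -> f64 {
    let build_ok = cargo_succeeds(dir, "build");
    let tests_ok = build_ok && cargo_succeeds(dir, "test");
    reward(build_ok, tests_ok)
}

fn main() {
    let dir = std::env::args().nth(1).unwrap_or_else(|| ".".to_string());
    println!("reward = {}", verify(Path::new(&dir)));
}
```

Because there is no partial credit, a single failing test scores the same as a compilation error.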

Data

Each task directory contains an `instruction.md` describing the transpilation task, along with the C source files, Rust interface definitions, and test suites. Task data is stored on the OpenReward platform.

Tools

Tool	Description
`bash`	Execute shell commands in the sandbox (e.g., `cargo build`, `cargo test`).
`str_replace`	Replace a unique string in a file.
`view`	View file contents or list directory contents.
`create_file`	Create a new file with specified content.
`submit_answer`	Submit work for automated compilation and test verification.
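The `str_replace` tool is described as replacing a unique string. A minimal sketch of that contract, assuming "unique" means the target must occur exactly once (counting non-overlapping matches), is shown below; `str_replace_unique` is a hypothetical helper and the tool's real error messages are not documented here.

```rust
/// Hypothetical sketch of `str_replace` semantics: the edit succeeds only
/// when `old` occurs exactly once, so the target location is unambiguous.
fn str_replace_unique(content: &str, old: &str, new: &str) -> Result<String, String> {
    if old.is_empty() {
        return Err("old string must not be empty".to_string());
    }
    // `matches` counts non-overlapping occurrences.
    match content.matches(old).count() {
        0 => Err(format!("`{}` not found", old)),
        1 => Ok(content.replacen(old, new, 1)),
        n => Err(format!("`{}` occurs {} times; add surrounding context", old, n)),
    }
}

fn main() {
    let src = "fn a() {}\nfn b() {}\n";
    match str_replace_unique(src, "fn b()", "fn c()") {
        Ok(updated) => print!("{}", updated),
        Err(e) => eprintln!("error: {}", e),
    }
}
```

Requiring uniqueness forces the agent to include enough surrounding context to pin down one edit site, rather than silently changing several.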

Time Horizon

CRUST-Bench is a multi-turn environment. Agents read C source code, study the Rust interface, implement the transpilation, compile and test iteratively, and submit for verification.

Environment Difficulty

CRUST-Bench is challenging. The original paper evaluates frontier models on C-to-Rust transpilation:

Model	One-Shot	With Repair (3 rounds)
o3	19%	48%
Claude Opus 4	22%	40%
o1	15%	37%
Claude 3.7 Sonnet	13%	32%

Even with three rounds of iterative repair, the best model (o3) succeeds on only 48% of tasks, indicating that generating safe, idiomatic Rust remains challenging.
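The gain from repair can be read off the table directly. The sketch below copies the reported percentages and computes the absolute improvement in percentage points; `RESULTS` and `repair_gain` are illustrative names only.

```rust
/// Success rates (%) copied from the table: (model, one-shot, with repair).
const RESULTS: [(&str, u32, u32); 4] = [
    ("o3", 19, 48),
    ("Claude Opus 4", 22, 40),
    ("o1", 15, 37),
    ("Claude 3.7 Sonnet", 13, 32),
];

/// Absolute improvement, in percentage points, from three repair rounds.
fn repair_gain(one_shot: u32, with_repair: u32) -> u32 {
    with_repair - one_shot
}

fn main() {
    for &(model, one_shot, repaired) in RESULTS.iter() {
        println!("{}: +{} points", model, repair_gain(one_shot, repaired));
    }
}
```

By this arithmetic o3 gains the most (29 points, versus 22 for o1, 19 for Claude 3.7 Sonnet, and 18 for Claude Opus 4), which is why it overtakes Claude Opus 4 despite a lower one-shot score.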

Other Environment Requirements

There are no further environment requirements; CRUST-Bench works out of the box with the OpenReward endpoint without any external API keys.

Safety

Agents in CRUST-Bench translate open-source C code to Rust in a sandboxed environment. The environment does not present direct safety risks.

Citations

@article{khatry2025crustbench,
  author    = {Anirudh Khatry and Robert Zhang and Jia Pan and Ziteng Wang and Qiaochu Chen and Greg Durrett and Isil Dillig},
  title     = {CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation},
  journal   = {arXiv preprint arXiv:2504.15254},
  year      = {2025},
  url       = {https://arxiv.org/abs/2504.15254}
}

Compute Configuration

Component	Configuration
Environment Server	1 vCPU / 4 GB RAM
Sandbox Machine	Not configured

Estimated Cost

Component	Cost / second
Environment	$0.0000320
Sandbox	Not configured
Total	$0.0000320
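Since cost is quoted per second, the hourly figure follows by multiplying by 3600. This assumes the sandbox remains unconfigured and therefore adds nothing to the total listed above.

```latex
\text{cost per hour} = \$0.0000320\,\text{s}^{-1} \times 3600\,\text{s} = \$0.1152
```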
