WasmInterpInRust

A long-horizon OpenReward environment: the agent builds wasmrun from
scratch in /workspace, and the wasm-runner harness grades the built
artifact against a public conformance corpus baked into the sandbox image.
Published as GeneralReasoning/WasmInterpInRust.

See DESIGN.md for the full design and build/PROTOCOL.md for the artifact
invocation + grading contract.

Reward

Each build earns a delta against the previous build: +1 for every corpus case
the current build passes that the previous build did not, −1 for every
regression. Because reward tracks the current artifact, the trajectory total
telescopes to the pass count of the build the agent finishes with.
finished=True is set only when a single build passes every case at once.
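
The telescoping property can be illustrated with a minimal sketch. This is not the code in `wasminterp.py`; the names `build_reward` and `is_finished`, the representation of results as sets of case IDs, and the empty starting baseline are all assumptions made for illustration.

```python
# Hypothetical sketch of the per-build delta reward; not the real implementation.

def build_reward(prev_passed: set[str], curr_passed: set[str]) -> int:
    """+1 per newly passing case, -1 per case that regressed."""
    gained = curr_passed - prev_passed
    regressed = prev_passed - curr_passed
    return len(gained) - len(regressed)


def is_finished(curr_passed: set[str], all_cases: set[str]) -> bool:
    """True only when this single build passes every case in the corpus."""
    return curr_passed >= all_cases


# Toy corpus of four cases and four successive builds (made-up data).
all_cases = {"a", "b", "c", "d"}
builds = [{"a"}, {"a", "b", "c"}, {"b", "c"}, {"a", "b", "c", "d"}]

prev: set[str] = set()  # assumed baseline: nothing passes before the first build
rewards = []
for passed in builds:
    rewards.append(build_reward(prev, passed))
    prev = passed

total = sum(rewards)
print(rewards)                       # [1, 2, -1, 2]
print(total == len(builds[-1]))      # True: the total equals the final pass count
print(is_finished(prev, all_cases))  # True
```

The third build regresses case `a` and is penalised, but the total still ends at the pass count of the last build, so intermediate regressions cannot inflate the trajectory reward.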

Layout

| File | Role |
| --- | --- |
| `wasminterp.py` | the `WasmInterpInRust(Environment)` class, tools, reward |
| `server.py` | `Server([WasmInterpInRust]).run()` |
| `Dockerfile` | lightweight env-server image (built by OpenReward) |
| `sandbox.Dockerfile` | heavyweight sandbox image (toolchain + corpus + harness); pushed as `ghcr.io/generalreasoning/wasminterp:latest` |
| `harness/` | the `wasm-runner` grading binary (Rust) |
| `build/filter_corpus.py` | clones + filters the corpus at image-build time |
| `build/PROTOCOL.md` | the artifact invocation + grading contract |
| `golden_tests.py` | no-Docker invariant tests (reward telescoping, anti-cheat) |
| `test_agent.py` | end-to-end smoke test against a running server |
| `reference_solution/` | known-good impl for Phase-4 reward validation |

Develop / test

# No-Docker invariant tests:
uv run python -m pytest golden_tests.py -v

# Local sandbox-image build (for iterating or checking the reference solution).
# On push to main, .github/workflows/build-sandbox.yml builds and pushes :latest
# and :sha-<gitsha> to GHCR; afterwards, repin SANDBOX_IMAGE to the new
# :sha-<gitsha> (NOT :latest, because the GHCR mirror serves a stale :latest).
# Build and push by hand only as a fallback:
docker build --platform linux/amd64 -f sandbox.Dockerfile -t ghcr.io/generalreasoning/wasminterp:latest .
docker push ghcr.io/generalreasoning/wasminterp:latest

# Local server smoke test:
uv run python server.py            # in one shell
uv run python test_agent.py        # in another

Corpus

Built from WebAssembly/testsuite
pinned at a8101597d3c3c660086c3cd1eedee608ff18d3c3 (just before GC/reference-type
syntax was folded into the core MVP files), translated by a pinned wast2json
(wabt 1.0.37). ~22k directive-level cases across six tiers (numeric → control →
variables → memory → tables → decoder/validation). Bump WABT_VERSION /
CORPUS_COMMIT together in both Dockerfiles if either changes.

Compute Configuration

Resource allocation for this environment.

| Component | Configuration |
| --- | --- |
| Environment Server | 1 vCPU / 4 GB RAM |
| Sandbox Machine | Not configured |

Estimated Cost

Pay per second of active session usage. Billing starts when your session begins and stops when it ends.

| Component | Cost / second |
| --- | --- |
| Environment | $0.0000320 |
| Sandbox | Not configured |
| Total | $0.0000320 |

Examples

5-minute session: $0.0096

1-hour session: $0.1152
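
The example figures follow directly from the per-second rate. The short sketch below reproduces them; `session_cost` is a hypothetical helper, not part of any billing API. It uses only the environment rate, since the sandbox is listed as not configured.

```python
# Hypothetical helper reproducing the listed example costs from the per-second rate.
RATE_PER_SECOND = 0.0000320  # environment rate in USD; sandbox is not configured


def session_cost(seconds: float, rate: float = RATE_PER_SECOND) -> float:
    """Cost in USD of a session billed per second of active usage."""
    return seconds * rate


print(f"${session_cost(5 * 60):.4f}")   # $0.0096
print(f"${session_cost(60 * 60):.4f}")  # $0.1152
```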