WasmInterpInRust
A long-horizon OpenReward environment: the agent builds wasmrun from
scratch in /workspace, and the wasm-runner harness grades the built
artifact against a public conformance corpus baked into the sandbox image.
Published as GeneralReasoning/WasmInterpInRust.
See DESIGN.md for the full design and build/PROTOCOL.md for the artifact
invocation + grading contract.
Reward
Per-build delta: +1 per corpus case the current build passes
that the previous build did not, −1 per regression. Reward tracks the current
artifact, so the trajectory total telescopes to the pass count of the build the
agent finishes with. finished=True only when a single build passes every
case at once.
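The delta rule above can be sketched in a few lines. This is an illustration, not the code in wasminterp.py: the names `build_reward` and `run_trajectory` are hypothetical, case IDs are assumed to be strings, and the baseline before the first build is assumed to be an empty pass set.

```python
def build_reward(prev_passed: set[str], curr_passed: set[str]) -> int:
    """+1 per newly passing case, -1 per regression (hypothetical helper)."""
    gained = curr_passed - prev_passed
    lost = prev_passed - curr_passed
    return len(gained) - len(lost)


def run_trajectory(builds: list[set[str]], total_cases: int) -> tuple[list[int], bool]:
    """Score a sequence of builds, each given as its set of passing case IDs."""
    prev: set[str] = set()  # assumption: nothing passes before the first build
    rewards: list[int] = []
    finished = False
    for passed in builds:
        rewards.append(build_reward(prev, passed))
        # finished only when one build passes every case at once
        finished = finished or len(passed) == total_cases
        prev = passed
    return rewards, finished


# Three builds over a three-case corpus: the second build regresses "b".
builds = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
rewards, finished = run_trajectory(builds, total_cases=3)
# rewards == [2, 0, 1]; sum(rewards) == 3 == len(builds[-1])
```

Because each reward is the difference between consecutive pass counts, the sum collapses to the final build's pass count regardless of how many regressions happened along the way.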
Layout
| File | Role |
|---|---|
| wasminterp.py | the WasmInterpInRust(Environment) class, tools, reward |
| server.py | Server([WasmInterpInRust]).run() |
| Dockerfile | lightweight env-server image (built by OpenReward) |
| sandbox.Dockerfile | heavyweight sandbox image (toolchain + corpus + harness); pushed as ghcr.io/generalreasoning/wasminterp:latest |
| harness/ | the wasm-runner grading binary (Rust) |
| build/filter_corpus.py | clones + filters the corpus at image-build time |
| build/PROTOCOL.md | the artifact invocation + grading contract |
| golden_tests.py | no-Docker invariant tests (reward telescoping, anti-cheat) |
| test_agent.py | end-to-end smoke test against a running server |
| reference_solution/ | known-good impl for Phase-4 reward validation |
Develop / test
# No-Docker invariant tests:
uv run python -m pytest golden_tests.py -v
# Local sandbox-image build (iterate / reference-solution check). On push to main,
# .github/workflows/build-sandbox.yml builds + pushes :latest + :sha-<gitsha> to
# GHCR; then repin SANDBOX_IMAGE to the new :sha-<gitsha> (NOT :latest, because the
# GHCR mirror serves :latest stale). Build + push by hand only as a fallback:
docker build --platform linux/amd64 -f sandbox.Dockerfile -t ghcr.io/generalreasoning/wasminterp:latest .
docker push ghcr.io/generalreasoning/wasminterp:latest
# Local server smoke test:
uv run python server.py # in one shell
uv run python test_agent.py # in another
Corpus
Built from WebAssembly/testsuite
pinned at a8101597d3c3c660086c3cd1eedee608ff18d3c3 (just before GC/reference-type
syntax was folded into the core MVP files), translated by a pinned wast2json
(wabt 1.0.37). ~22k directive-level cases across six tiers (numeric → control →
variables → memory → tables → decoder/validation). Bump WABT_VERSION /
CORPUS_COMMIT together in both Dockerfiles if either changes.
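A small check can catch the two Dockerfiles drifting apart. The sketch below assumes the pins are declared as `ARG NAME=value` or `ENV NAME=value` lines, which this README does not confirm. The helpers `read_pins` and `pins_match` are hypothetical and work on Dockerfile text, so they can be tried without touching the repository.

```python
import re

# Pin names taken from this README; how they are declared is an assumption.
PINS = ("WABT_VERSION", "CORPUS_COMMIT")


def read_pins(dockerfile_text: str) -> dict[str, str]:
    """Collect `ARG NAME=value` / `ENV NAME=value` pins from Dockerfile text."""
    pins: dict[str, str] = {}
    for line in dockerfile_text.splitlines():
        m = re.match(r"\s*(?:ARG|ENV)\s+(\w+)=(\S+)", line)
        if m and m.group(1) in PINS:
            pins[m.group(1)] = m.group(2).strip('"')
    return pins


def pins_match(env_dockerfile: str, sandbox_dockerfile: str) -> bool:
    """True only if every pin is present in both files with the same value."""
    a, b = read_pins(env_dockerfile), read_pins(sandbox_dockerfile)
    return all(name in a and a.get(name) == b.get(name) for name in PINS)
```

In use, one would read Dockerfile and sandbox.Dockerfile with `pathlib.Path(...).read_text()` and assert `pins_match(...)` in a no-Docker test.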