WasmInterpInRust

A long-horizon OpenReward environment: the agent builds wasmrun, a WebAssembly
interpreter in Rust, from scratch in /workspace, and the wasm-runner harness
grades the built artifact against a public conformance corpus baked into the
sandbox image. Published as GeneralReasoning/WasmInterpInRust.

See DESIGN.md for the full design and build/PROTOCOL.md for the artifact
invocation + grading contract.

Reward

Per-build delta: +1 for each corpus case the current build passes that the
previous build did not, −1 for each regression. Reward therefore tracks the
current artifact: since the agent starts from an empty workspace that passes
nothing, the trajectory total telescopes to the pass count of the build the
agent finishes with. finished=True only when a single build passes every case
at once.
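The telescoping works because each build's delta equals (newly passing) − (regressions) = |P_t| − |P_{t−1}|, where P_t is the set of cases build t passes; summing over builds leaves |P_T| − |P_0| = |P_T|. The sketch below models that rule on sets of case IDs. `Step`, `build_delta`, and `run_trajectory` are hypothetical names for illustration, not the API of wasminterp.py, and ending the episode at the first all-passing build is an assumption.

```python
# Hypothetical model of the per-build reward rule; not wasminterp.py's real API.
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    reward: int     # +1 per newly passing case, -1 per regression
    finished: bool  # True only when this single build passes every case


def build_delta(prev_passed: frozenset, cur_passed: frozenset,
                all_cases: frozenset) -> Step:
    gained = len(cur_passed - prev_passed)  # cases this build newly passes
    lost = len(prev_passed - cur_passed)    # regressions vs. the previous build
    return Step(reward=gained - lost, finished=cur_passed == all_cases)


def run_trajectory(builds: list, all_cases: frozenset) -> tuple:
    """Sum per-build rewards, starting from an empty workspace that passes nothing."""
    prev: frozenset = frozenset()
    total, finished = 0, False
    for cur in builds:
        step = build_delta(prev, cur, all_cases)
        total += step.reward
        prev = cur
        if step.finished:  # assumption: the episode ends once one build passes everything
            finished = True
            break
    return total, finished


if __name__ == "__main__":
    cases = frozenset({"a", "b", "c"})
    builds = [frozenset({"a"}), frozenset({"a", "b"}), frozenset({"b", "c"})]
    print(run_trajectory(builds, cases))  # (2, False): the total equals len(builds[-1])
```

The last build fixes "c" but regresses "a", so its delta is 0 and the total stays equal to the final build's pass count.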

Layout

| File | Role |
|---|---|
| wasminterp.py | the WasmInterpInRust(Environment) class, tools, reward |
| server.py | Server([WasmInterpInRust]).run() |
| Dockerfile | lightweight env-server image (built by OpenReward) |
| sandbox.Dockerfile | heavyweight sandbox image (toolchain + corpus + harness); pushed as ghcr.io/generalreasoning/wasminterp:latest |
| harness/ | the wasm-runner grading binary (Rust) |
| build/filter_corpus.py | clones + filters the corpus at image-build time |
| build/PROTOCOL.md | the artifact invocation + grading contract |
| golden_tests.py | no-Docker invariant tests (reward telescoping, anti-cheat) |
| test_agent.py | end-to-end smoke test against a running server |
| reference_solution/ | known-good implementation for Phase-4 reward validation |

Develop / test

```bash
# No-Docker invariant tests:
uv run python -m pytest golden_tests.py -v

# Sandbox image. On every push to main, .github/workflows/build-sandbox.yml
# builds and pushes both :latest and :sha-<gitsha> to GHCR; afterwards, repin
# SANDBOX_IMAGE to the new :sha-<gitsha> tag, NOT :latest (the GHCR mirror
# serves a stale :latest). Build locally to iterate or to check the reference
# solution; build + push by hand only as a fallback:
docker build --platform linux/amd64 -f sandbox.Dockerfile -t ghcr.io/generalreasoning/wasminterp:latest .
docker push ghcr.io/generalreasoning/wasminterp:latest

# Local server smoke test:
uv run python server.py            # in one shell
uv run python test_agent.py        # in another
```
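Because the GHCR mirror can serve a stale :latest, it can help to fail fast when SANDBOX_IMAGE is still unpinned. A hypothetical sketch follows: `pinned_image` and `is_pinned` are illustrative names, where SANDBOX_IMAGE is defined is not specified here, and whether build-sandbox.yml tags with a short or full SHA is an assumption, so the helper accepts 7 to 40 hex characters.

```python
# Hypothetical helpers for the "repin SANDBOX_IMAGE to :sha-<gitsha>" step.
# Assumption: <gitsha> is a 7-40 character lowercase hex git SHA.
import re

REPO = "ghcr.io/generalreasoning/wasminterp"


def pinned_image(git_sha: str) -> str:
    """Return the :sha-<gitsha> reference to pin SANDBOX_IMAGE to."""
    if not re.fullmatch(r"[0-9a-f]{7,40}", git_sha):
        raise ValueError(f"not a hex git sha: {git_sha!r}")
    return f"{REPO}:sha-{git_sha}"


def is_pinned(image_ref: str) -> bool:
    """True if the reference carries a :sha-... tag rather than :latest or no tag."""
    last = image_ref.rsplit("/", 1)[-1]  # drop registry/namespace (a registry port has its own ':')
    tag = last.split(":", 1)[1] if ":" in last else "latest"  # an untagged ref means :latest
    return tag.startswith("sha-")
```

For example, `is_pinned("ghcr.io/generalreasoning/wasminterp:latest")` is False, while `is_pinned(pinned_image("0123abc"))` is True.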

Corpus

Built from WebAssembly/testsuite pinned at
a8101597d3c3c660086c3cd1eedee608ff18d3c3 (just before GC/reference-type syntax
was folded into the core MVP files) and translated by a pinned wast2json
(wabt 1.0.37). The result is ~22k directive-level cases across six tiers:
numeric → control → variables → memory → tables → decoder/validation. Bump
WABT_VERSION and CORPUS_COMMIT together, in both Dockerfiles, whenever either
changes.
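To see what a "directive-level case" looks like, recall that wast2json writes one JSON file per .wast script, shaped like {"source_filename": ..., "commands": [...]}, where each command has a "type" (for example "module", "assert_return", "assert_trap", "assert_invalid") and a "line". The sketch below tallies cases per directive type. Treating every command except the setup directives "module" and "register" as one case is an assumption; build/PROTOCOL.md and build/filter_corpus.py define the real counting and filtering.

```python
# Minimal sketch: tally directive-level cases in wast2json output.
# Assumption: every command except setup directives counts as one case.
import json
from collections import Counter
from pathlib import Path

SETUP_TYPES = {"module", "register"}  # define/register modules; not graded on their own here


def count_cases(spec: dict) -> Counter:
    """Count the commands in one wast2json document by type, skipping setup."""
    return Counter(cmd["type"] for cmd in spec["commands"]
                   if cmd["type"] not in SETUP_TYPES)


def count_file(path: Path) -> Counter:
    """Load a wast2json .json file from disk and tally its cases."""
    with path.open() as f:
        return count_cases(json.load(f))
```

Summing `count_file` over every .json file in the filtered corpus would give a figure comparable to the ~22k total, subject to the counting assumption above.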