agent-world-model
Agent World Model
Description
Agent World Model (AWM) is an OpenReward port of the Agent World Model benchmark by Wang et al. (Snowflake AI Research / UNC-Chapel Hill). It provides 1,000 fully synthetic, SQL database-backed tool-use environments for multi-turn agentic reinforcement learning. Each scenario exposes a FastAPI/MCP server with domain-specific tools (e.g., e-commerce, healthcare, education) backed by a SQLite database. Agents must discover available tools, interact with the environment through API calls, and complete user-specified tasks that modify database state.
Capabilities
- Tool discovery and multi-turn tool calling across diverse domains
- SQL database state manipulation through generated API endpoints
- Planning and reasoning over complex, multi-step tasks
- Adapting to novel tool interfaces without prior training data
Compute Requirements
Each task launches a FastAPI subprocess serving the scenario's generated API. No sandbox or Docker container is required per task; the environment server manages subprocess lifecycles internally. The server itself can run in a standard container with 1 CPU and 1 GB RAM.
License
Apache-2.0. The underlying Agent World Model dataset and code are subject to their own license terms.
Tasks
AWM contains over 1,000 scenarios with ~10 tasks each (10,000+ total tasks). Because of the large number of tasks, list_tasks is not implemented; sessions should instead be created directly with a task_spec containing scenario, task_idx, and task.
Available splits:
- train: Training tasks
- test: Held-out evaluation tasks
Each task specifies a scenario (e.g., e_commerce_33), a task index, and a natural language instruction describing what the agent should accomplish within that scenario's environment.
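Putting those fields together, a session request can be sketched as below. The field names (scenario, task_idx, task) and the example scenario e_commerce_33 come from this card; the create_session helper and the instruction text are hypothetical placeholders, not the real client API.

```python
# Minimal sketch of creating a session from a task_spec.
# create_session is a hypothetical stand-in for the real client call.
task_spec = {
    "scenario": "e_commerce_33",  # scenario identifier from the card
    "task_idx": 0,                # index of the task within that scenario
    "task": "Add two units of product 17 to the cart and place the order.",  # illustrative
}

def create_session(task_spec: dict) -> dict:
    """Hypothetical stand-in: validate the spec and echo a session object."""
    required = {"scenario", "task_idx", "task"}
    missing = required - task_spec.keys()
    if missing:
        raise ValueError(f"task_spec missing fields: {sorted(missing)}")
    return {"session_id": "sess-demo", "task_spec": task_spec}

session = create_session(task_spec)
```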
Reward Structure
This is a multi-turn environment with binary reward:
- 1.0 — Verification code confirms the task was completed successfully (database state matches expected outcome)
- 0.0 — Verification fails or encounters an error
Verification is performed automatically on submission using generated verification code that inspects database state changes (SQL mode) or evaluates a final answer (code mode).
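The SQL-mode check can be illustrated with a minimal sketch: a verifier inspects SQLite state and maps the outcome (or any error) to the binary reward. The schema, table, and expected state below are invented for illustration and are not the dataset's actual verification code.

```python
import sqlite3

def verify(conn: sqlite3.Connection) -> float:
    """Illustrative SQL-mode verifier: 1.0 iff the expected database
    state is present, 0.0 on a mismatch or any error."""
    try:
        row = conn.execute(
            "SELECT status FROM orders WHERE order_id = ?", (42,)
        ).fetchone()
        return 1.0 if row is not None and row[0] == "shipped" else 0.0
    except sqlite3.Error:
        return 0.0  # verification errors also score 0.0

# Invented schema standing in for a scenario database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO orders VALUES (42, 'pending')")
before = verify(conn)   # agent has not acted yet -> 0.0
conn.execute("UPDATE orders SET status = 'shipped' WHERE order_id = 42")
after = verify(conn)    # state now matches the expected outcome -> 1.0
```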
Data
Data is loaded from JSONL files mounted at /data:
- gen_envs.jsonl — Generated environment code per scenario
- gen_tasks.jsonl — Task descriptions per scenario
- gen_verifier.jsonl — Verification code per task
- databases/ — Pre-built SQLite databases per scenario
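Reading these files follows the standard JSONL pattern of one JSON object per line. A minimal sketch, using an in-memory stream and invented record fields (the dataset's actual schema may differ):

```python
import json
from io import StringIO

def load_jsonl(stream) -> list[dict]:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in stream if line.strip()]

# Stand-in for /data/gen_tasks.jsonl; fields are illustrative only.
sample = StringIO(
    '{"scenario": "e_commerce_33", "tasks": ["task A", "task B"]}\n'
    '{"scenario": "healthcare_7", "tasks": ["task C"]}\n'
)
records = load_jsonl(sample)
```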
The dataset is derived from the Snowflake/AgentWorldModel-1K collection on HuggingFace.
Tools
| Tool | Description |
|---|---|
| list_scenario_tools | Discover what API tools are available in the current scenario. Call this first. |
| call_scenario_tool | Call a scenario tool by name with JSON arguments. |
| submit | Submit when the task is complete. Runs verification and returns reward. |
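The intended call order (discover, act, submit) can be sketched with a stub client. The class below is a hypothetical stand-in for the real environment interface, and the tool names it returns are invented; it exists only to show the sequence of the three tools above.

```python
class StubEnvClient:
    """Hypothetical stand-in for the AWM environment interface,
    recording calls so the tool-use order can be demonstrated."""
    def __init__(self):
        self.calls = []

    def list_scenario_tools(self):
        self.calls.append("list_scenario_tools")
        return ["search_products", "add_to_cart"]  # invented tool names

    def call_scenario_tool(self, name, args):
        self.calls.append(f"call:{name}")
        return {"ok": True}

    def submit(self):
        self.calls.append("submit")
        return {"reward": 1.0}  # illustrative verification result

env = StubEnvClient()
tools = env.list_scenario_tools()                                    # 1. discover tools
env.call_scenario_tool("add_to_cart", {"product_id": 17, "qty": 2})  # 2. act on the environment
result = env.submit()                                                # 3. verify and collect reward
```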
Time Horizon
AWM is a multi-turn environment. Agents first discover available tools, then make a series of API calls to modify database state and complete the task. A typical task may involve 5-20+ tool calls depending on complexity.
Environment Difficulty
Difficulty varies across scenarios and tasks. Some tasks require simple single-step API calls, while others demand multi-step reasoning, data lookups, and chained operations. The synthetic nature of the environments means agents encounter novel tool interfaces not seen in training data.
Safety
Each task runs in an isolated subprocess with its own copy of the SQLite database. No network access to external services is required. The environment does not involve private data or production systems.
Citations
@article{wang2026agentworldmodelinfinity,
title={Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning},
author={Zhaoyang Wang and Canwen Xu and Boyi Liu and Yite Wang and Siwei Han and Zhewei Yao and Huaxiu Yao and Yuxiong He},
year={2026},
eprint={2602.10090},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2602.10090},
}