Agent World Model

⭐ OpenReward Environment

Description

Agent World Model (AWM) is an OpenReward port of the Agent World Model benchmark by Wang et al. (Snowflake AI Research / UNC-Chapel Hill). It provides 1,000 fully synthetic, SQL database-backed tool-use environments for multi-turn agentic reinforcement learning. Each scenario exposes a FastAPI/MCP server with domain-specific tools (e.g., e-commerce, healthcare, education) backed by a SQLite database. Agents must discover available tools, interact with the environment through API calls, and complete user-specified tasks that modify database state.

Capabilities

  • Tool discovery and multi-turn tool calling across diverse domains
  • SQL database state manipulation through generated API endpoints
  • Planning and reasoning over complex, multi-step tasks
  • Adapting to novel tool interfaces without prior training data

Compute Requirements

Each task launches a FastAPI subprocess serving the scenario's generated API. No sandbox or Docker container is required per-task — the environment server manages subprocess lifecycle internally. The server itself can be run in a standard container with 1 CPU and 1 GB RAM.
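
The subprocess-per-task lifecycle described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the environment's actual code: `scenario_process` is a hypothetical helper, and the command it receives is a placeholder for however the server launches a scenario's FastAPI app.

```python
import subprocess
import sys
from contextlib import contextmanager


@contextmanager
def scenario_process(cmd):
    """Start one child process for a task and always clean it up afterwards.

    Hypothetical helper: `cmd` stands in for the real server launch command.
    """
    proc = subprocess.Popen(cmd)
    try:
        yield proc
    finally:
        proc.terminate()              # ask the child to exit
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            proc.kill()               # force-stop if it ignores the request
            proc.wait()


# Placeholder child that only sleeps; a real server command would go here.
placeholder_cmd = [sys.executable, "-c", "import time; time.sleep(60)"]
with scenario_process(placeholder_cmd) as proc:
    print("running:", proc.poll() is None)
print("stopped:", proc.poll() is not None)
```

Wrapping the child in a context manager means the process is stopped even if the task raises, which is one way a server could manage the lifecycle without a per-task container.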

License

Apache-2.0. The underlying Agent World Model dataset and code are subject to their own license terms.

Tasks

AWM contains 1,000 scenarios with about 10 tasks each, for roughly 10,000 tasks in total. Because of the large number of tasks, list_tasks is not implemented; sessions should be created directly with a task_spec containing scenario, task_idx, and task.

Available splits:

  • train: Training tasks
  • test: Held-out evaluation tasks

Each task specifies a scenario (e.g., e_commerce_33), a task index, and a natural language instruction describing what the agent should accomplish within that scenario's environment.
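
A task_spec with the three fields named in this section might be assembled as shown here. Only the field names and the scenario name e_commerce_33 come from this README; `make_task_spec` is a hypothetical helper, the index value and instruction text are placeholders, and how the spec is passed to a session is not shown because it depends on the client being used.

```python
import json


def make_task_spec(scenario: str, task_idx: int, task: str) -> dict:
    """Build a task_spec dict with the three documented fields (hypothetical helper)."""
    return {"scenario": scenario, "task_idx": task_idx, "task": task}


spec = make_task_spec(
    scenario="e_commerce_33",
    task_idx=0,  # placeholder index
    task="<natural language instruction for this task>",  # placeholder text
)
print(json.dumps(spec, indent=2))
```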

Reward Structure

This is a multi-turn environment with binary reward:

  • 1.0 — Verification code confirms the task was completed successfully (database state matches expected outcome)
  • 0.0 — Verification fails or encounters an error

Verification is performed automatically on submission using generated verification code that inspects database state changes (SQL mode) or evaluates a final answer (code mode).
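
The binary reward rule can be illustrated with a small sketch. It assumes a verifier is a callable that inspects a SQLite connection and returns a truthy value on success; `binary_reward`, the `orders` table, and both verifiers are invented for illustration and are not the generated verification code.

```python
import sqlite3


def binary_reward(conn, verifier) -> float:
    """Return 1.0 if the verifier passes, 0.0 if it fails or raises."""
    try:
        return 1.0 if verifier(conn) else 0.0
    except Exception:
        return 0.0


# Toy in-memory database standing in for a scenario database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO orders (id, status) VALUES (1, 'pending')")


def order_cancelled(c):
    """Hypothetical verifier: has order 1 been cancelled?"""
    row = c.execute("SELECT status FROM orders WHERE id = 1").fetchone()
    return row is not None and row[0] == "cancelled"


def broken_verifier(c):
    """Hypothetical verifier that errors by querying a missing table."""
    c.execute("SELECT * FROM missing_table")


before = binary_reward(conn, order_cancelled)    # task not done yet
conn.execute("UPDATE orders SET status = 'cancelled' WHERE id = 1")
after = binary_reward(conn, order_cancelled)     # state now matches
errored = binary_reward(conn, broken_verifier)   # error counts as failure

print(before, after, errored)  # 0.0 1.0 0.0
```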

Data

Data is loaded from JSONL files mounted at /data:

  • gen_envs.jsonl — Generated environment code per scenario
  • gen_tasks.jsonl — Task descriptions per scenario
  • gen_verifier.jsonl — Verification code per task
  • databases/ — Pre-built SQLite databases per scenario

The dataset is derived from the Snowflake/AgentWorldModel-1K collection on HuggingFace.
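
Reading one of the JSONL files and grouping its records per scenario could look like the sketch below. The record layout is an assumption: the `scenario` and `tasks` field names and the `demo_scenario` entry are invented for the demo, which runs against a throwaway file rather than the real files under /data.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def read_jsonl(path):
    """Yield one parsed JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def index_by(records, key):
    """Group records by a field; the field name is an assumption."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec[key]].append(rec)
    return dict(grouped)


with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "gen_tasks.jsonl"
    demo.write_text(
        '{"scenario": "e_commerce_33", "tasks": ["<task 0>", "<task 1>"]}\n'
        "\n"
        '{"scenario": "demo_scenario", "tasks": ["<task 0>"]}\n',
        encoding="utf-8",
    )
    by_scenario = index_by(read_jsonl(demo), "scenario")

print(sorted(by_scenario))  # ['demo_scenario', 'e_commerce_33']
```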

Tools

Tool                 Description
list_scenario_tools  Discover what API tools are available in the current scenario. Call this first.
call_scenario_tool   Call a scenario tool by name with JSON arguments.
submit               Submit when the task is complete. Runs verification and returns reward.

Time Horizon

AWM is a multi-turn environment. Agents first discover available tools, then make a series of API calls to modify database state and complete the task. A typical task may involve 5 to 20 or more tool calls, depending on complexity.
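
The discover, call, submit flow can be sketched against an in-memory stand-in. Only the three tool names come from this README; `FakeSession`, its `call` method, the `name` and `arguments` keys, the `add_to_cart` scenario tool, and the response shapes are all hypothetical and exist only to make the loop runnable.

```python
import json


class FakeSession:
    """In-memory stand-in for an AWM session (hypothetical, for illustration)."""

    def __init__(self):
        self.cart = []  # toy state in place of a SQLite database

    def call(self, tool, args=None):
        args = args or {}
        if tool == "list_scenario_tools":
            return {"tools": [{"name": "add_to_cart", "parameters": ["item_id"]}]}
        if tool == "call_scenario_tool":
            if args["name"] != "add_to_cart":
                return {"error": "unknown tool"}
            self.cart.append(json.loads(args["arguments"])["item_id"])
            return {"ok": True}
        if tool == "submit":
            # Toy verification: the task is "put item 42 in the cart".
            return {"reward": 1.0 if self.cart == [42] else 0.0}
        raise ValueError(f"unexpected tool: {tool}")


def run_episode(session):
    # 1. Discover the scenario's tools first.
    tools = session.call("list_scenario_tools")["tools"]
    names = [t["name"] for t in tools]
    # 2. Call scenario tools (a real agent would choose these with a model).
    if "add_to_cart" in names:
        session.call(
            "call_scenario_tool",
            {"name": "add_to_cart", "arguments": json.dumps({"item_id": 42})},
        )
    # 3. Submit to trigger verification and receive the reward.
    return session.call("submit")["reward"]


reward = run_episode(FakeSession())
print(reward)  # 1.0
```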

Environment Difficulty

Difficulty varies across scenarios and tasks. Some tasks require simple single-step API calls, while others demand multi-step reasoning, data lookups, and chained operations. The synthetic nature of the environments means agents encounter novel tool interfaces not seen in training data.

Safety

Each task runs in an isolated subprocess with its own copy of the SQLite database. No network access to external services is required. The environment does not involve private data or production systems.
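
Per-task database isolation can be approximated by copying the pre-built database before any writes happen. The sketch below is one way to do this, not necessarily how the environment server does it: `private_db_copy`, the `items` table, and the file names are hypothetical.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def private_db_copy(source_db, workdir):
    """Copy a scenario database so one task can modify it freely (hypothetical helper)."""
    target = Path(workdir) / Path(source_db).name
    shutil.copy2(source_db, target)
    return target


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for a pre-built database under /data/databases/.
    source = Path(tmp) / "scenario.db"
    conn = sqlite3.connect(source)
    conn.execute("CREATE TABLE items (name TEXT)")
    conn.execute("INSERT INTO items VALUES ('original')")
    conn.commit()
    conn.close()

    task_dir = Path(tmp) / "task_0"
    task_dir.mkdir()
    copy = private_db_copy(source, task_dir)

    # Writes made during the task go to the copy only.
    conn = sqlite3.connect(copy)
    conn.execute("INSERT INTO items VALUES ('added by task')")
    conn.commit()
    copy_rows = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
    conn.close()

    conn = sqlite3.connect(source)
    source_rows = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
    conn.close()

print(source_rows, copy_rows)  # 1 2
```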

Citations

@article{wang2026agentworldmodelinfinity,
      title={Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning},
      author={Zhaoyang Wang and Canwen Xu and Boyi Liu and Yite Wang and Siwei Han and Zhewei Yao and Huaxiu Yao and Yuxiong He},
      year={2026},
      eprint={2602.10090},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2602.10090},
}