agent-world-model
Agent World Model
Description
Agent World Model (AWM) is an OpenReward port of the Agent World Model benchmark by Wang et al. (Snowflake AI Research / UNC-Chapel Hill). It provides 1,000 fully synthetic, SQL database-backed tool-use environments for multi-turn agentic reinforcement learning. Each scenario exposes a FastAPI/MCP server with domain-specific tools (e.g., e-commerce, healthcare, education) backed by a SQLite database. Agents must discover available tools, interact with the environment through API calls, and complete user-specified tasks that modify database state.
Capabilities
- Tool discovery and multi-turn tool calling across diverse domains
- SQL database state manipulation through generated API endpoints
- Planning and reasoning over complex, multi-step tasks
- Adapting to novel tool interfaces without prior training data
Compute Requirements
Each task launches a FastAPI subprocess serving the scenario's generated API. No sandbox or Docker container is required per task; the environment server manages subprocess lifecycles internally. The server itself can run in a standard container with 1 CPU and 1 GB RAM.
License
Apache-2.0. The underlying Agent World Model dataset and code are subject to their own license terms.
Tasks
AWM contains over 1,000 scenarios with ~10 tasks each (10,000+ total tasks). Because of the large number of tasks, list_tasks is not implemented; sessions should instead be created directly with a task_spec containing scenario, task_idx, and task.
Available splits:
- train: Training tasks
- test: Held-out evaluation tasks
Each task specifies a scenario (e.g., e_commerce_33), a task index, and a natural language instruction describing what the agent should accomplish within that scenario's environment.
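Putting those fields together, a session request can be sketched as below. The field names (scenario, task_idx, task) and the example scenario e_commerce_33 come from this card; the create_session helper and the instruction text are hypothetical placeholders, not the real client API.

```python
# Minimal sketch of creating a session from a task_spec.
# create_session is a hypothetical stand-in for the real client call.
task_spec = {
    "scenario": "e_commerce_33",  # scenario identifier from the card
    "task_idx": 0,                # index of the task within that scenario
    "task": "Add two units of product 17 to the cart and place the order.",  # illustrative
}

def create_session(task_spec: dict) -> dict:
    """Hypothetical stand-in: validate the spec and echo a session object."""
    required = {"scenario", "task_idx", "task"}
    missing = required - task_spec.keys()
    if missing:
        raise ValueError(f"task_spec missing fields: {sorted(missing)}")
    return {"session_id": "sess-demo", "task_spec": task_spec}

session = create_session(task_spec)
```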
Reward Structure
This is a multi-turn environment with binary reward:
- 1.0 — Verification code confirms the task was completed successfully (database state matches expected outcome)
- 0.0 — Verification fails or encounters an error
Verification is performed automatically on submission using generated verification code that inspects database state changes (SQL mode) or evaluates a final answer (code mode).
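The SQL-mode check can be illustrated with a minimal sketch: a verifier inspects SQLite state and maps the outcome (or any error) to the binary reward. The schema, table, and expected state below are invented for illustration and are not the dataset's actual verification code.

```python
import sqlite3

def verify(conn: sqlite3.Connection) -> float:
    """Illustrative SQL-mode verifier: 1.0 iff the expected database
    state is present, 0.0 on a mismatch or any error."""
    try:
        row = conn.execute(
            "SELECT status FROM orders WHERE order_id = ?", (42,)
        ).fetchone()
        return 1.0 if row is not None and row[0] == "shipped" else 0.0
    except sqlite3.Error:
        return 0.0  # verification errors also score 0.0

# Invented schema standing in for a scenario database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO orders VALUES (42, 'pending')")
before = verify(conn)   # agent has not acted yet -> 0.0
conn.execute("UPDATE orders SET status = 'shipped' WHERE order_id = 42")
after = verify(conn)    # state now matches the expected outcome -> 1.0
```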
Data
Data is loaded from JSONL files mounted at /data:
- gen_envs.jsonl — Generated environment code per scenario
- gen_tasks.jsonl — Task descriptions per scenario
- gen_verifier.jsonl — Verification code per task
- databases/ — Pre-built SQLite databases per scenario
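Reading these files follows the standard JSONL pattern of one JSON object per line. A minimal sketch, using an in-memory stream and invented record fields (the dataset's actual schema may differ):

```python
import json
from io import StringIO

def load_jsonl(stream) -> list[dict]:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in stream if line.strip()]

# Stand-in for /data/gen_tasks.jsonl; fields are illustrative only.
sample = StringIO(
    '{"scenario": "e_commerce_33", "tasks": ["task A", "task B"]}\n'
    '{"scenario": "healthcare_7", "tasks": ["task C"]}\n'
)
records = load_jsonl(sample)
```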
The dataset is derived from the Snowflake/AgentWorldModel-1K collection on HuggingFace.
Tools
| Tool | Description |
|---|---|
| list_scenario_tools | Discover what API tools are available in the current scenario. Call this first. |
| call_scenario_tool | Call a scenario tool by name with JSON arguments. |
| submit | Submit when the task is complete. Runs verification and returns reward. |
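The intended call order (discover, act, submit) can be sketched with a stub client. The class below is a hypothetical stand-in for the real environment interface, and the tool names it returns are invented; it exists only to show the sequence of the three tools above.

```python
class StubEnvClient:
    """Hypothetical stand-in for the AWM environment interface,
    recording calls so the tool-use order can be demonstrated."""
    def __init__(self):
        self.calls = []

    def list_scenario_tools(self):
        self.calls.append("list_scenario_tools")
        return ["search_products", "add_to_cart"]  # invented tool names

    def call_scenario_tool(self, name, args):
        self.calls.append(f"call:{name}")
        return {"ok": True}

    def submit(self):
        self.calls.append("submit")
        return {"reward": 1.0}  # illustrative verification result

env = StubEnvClient()
tools = env.list_scenario_tools()                                    # 1. discover tools
env.call_scenario_tool("add_to_cart", {"product_id": 17, "qty": 2})  # 2. act on the environment
result = env.submit()                                                # 3. verify and collect reward
```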
Time Horizon
AWM is a multi-turn environment. Agents first discover available tools, then make a series of API calls to modify database state and complete the task. A typical task may involve 5-20+ tool calls depending on complexity.
Environment Difficulty
Difficulty varies across scenarios and tasks. Some tasks require simple single-step API calls, while others demand multi-step reasoning, data lookups, and chained operations. The synthetic nature of the environments means agents encounter novel tool interfaces not seen in training data.
Safety
Each task runs in an isolated subprocess with its own copy of the SQLite database. No network access to external services is required. The environment does not involve private data or production systems.
Citations
@article{wang2026agentworldmodelinfinity,
title={Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning},
author={Zhaoyang Wang and Canwen Xu and Boyi Liu and Yite Wang and Siwei Han and Zhewei Yao and Huaxiu Yao and Yuxiong He},
year={2026},
eprint={2602.10090},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2602.10090},
}