WildTicTacToe

OpenReward Environment

Description

WildTicTacToe is an environment for evaluating agents on tactical gameplay in Wild Tic-Tac-Toe, a variant where players can place either X or O on any empty position. This environment wraps the WildTicTacToe implementation from TextArena, a framework for text-based game environments.

Capabilities

  • Testing strategic flexibility in unconventional game mechanics
  • Evaluating decision-making when mark choice adds complexity to traditional Tic-Tac-Toe
  • Assessing agent ability to exploit and defend against non-standard gameplay patterns
  • Testing tactical planning with expanded move options

Compute Requirements

WildTicTacToe does not require a sandbox. It has minimal compute requirements.

License

MIT.

Tasks

There are two splits: train (50 tasks) and test (50 tasks). All tasks in both splits use a single variant:

  • WildTicTacToe-v0

Each task is seeded for reproducibility.

Reward Structure

This is a sparse reward environment. Rewards are mapped from TextArena's native range of {-1, 0, 1} to {0.0, 0.5, 1.0} via (raw + 1) / 2.
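The mapping above can be sketched in a few lines of Python. The function name `map_reward` is hypothetical and is not taken from the environment's code; only the formula (raw + 1) / 2 and the input set {-1, 0, 1} come from this README.

```python
def map_reward(raw: int) -> float:
    """Map a raw reward in {-1, 0, 1} to {0.0, 0.5, 1.0} via (raw + 1) / 2.

    `map_reward` is an illustrative name, not the environment's actual API.
    """
    if raw not in (-1, 0, 1):
        raise ValueError(f"expected a raw reward in {{-1, 0, 1}}, got {raw!r}")
    return (raw + 1) / 2


if __name__ == "__main__":
    for raw in (-1, 0, 1):
        print(raw, "->", map_reward(raw))
```

Rejecting values outside {-1, 0, 1} is a design choice of this sketch; it keeps an unexpected raw value from silently producing a reward outside the documented set.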

We do not use LLM graders for this environment; reward is determined programmatically.

Data

Game state is generated procedurally by the TextArena engine using seeded randomness. No external data files are required.

Tools

Agents are given a single tool:

  • place_mark(position, mark): Place your mark (X or O) on the board at the given position (0-8). 0=top-left, 4=center, 8=bottom-right.
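As an illustration of the position numbering, the sketch below converts a position index to board coordinates and validates arguments before a call. It assumes row-major numbering (1=top-middle, 3=middle-left), which is consistent with the three positions named above but is not confirmed by this README. The helper names and the representation of the mark as the string "X" or "O" are also assumptions.

```python
VALID_MARKS = ("X", "O")


def position_to_row_col(position: int) -> tuple[int, int]:
    """Convert a position in 0-8 to (row, col), assuming row-major numbering."""
    if not 0 <= position <= 8:
        raise ValueError(f"position must be in 0-8, got {position!r}")
    return divmod(position, 3)


def build_place_mark_args(position: int, mark: str) -> dict:
    """Validate and bundle arguments for a place_mark call (hypothetical helper)."""
    position_to_row_col(position)  # raises if the position is out of range
    if mark not in VALID_MARKS:
        raise ValueError(f"mark must be 'X' or 'O', got {mark!r}")
    return {"position": position, "mark": mark}


if __name__ == "__main__":
    print(position_to_row_col(0))  # top-left
    print(position_to_row_col(4))  # center
    print(position_to_row_col(8))  # bottom-right
    print(build_place_mark_args(4, "O"))
```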

Time Horizon

WildTicTacToe is a multi-turn environment.

Environment Difficulty

Easy to Medium. While the wild variant adds strategic depth by allowing mark choice, the core Tic-Tac-Toe mechanics remain straightforward. Agents must learn when placing X vs O provides tactical advantage.
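To make the X-versus-O choice concrete, here is a minimal sketch that lists every (position, mark) pair that would complete a line on a given board. It assumes the usual Wild Tic-Tac-Toe rule that the player who completes three identical marks in a row wins, regardless of which mark it is; this README does not state the win condition, so treat that as an assumption. The board representation (a list of nine cells holding "X", "O", or None) and the function names are hypothetical.

```python
# The eight winning lines on a 3x3 board indexed 0-8. This set is the same
# whether positions are numbered row-major or column-major.
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def completes_line(board: list, position: int, mark: str) -> bool:
    """Return True if placing `mark` at `position` makes three identical marks in a line."""
    if board[position] is not None:
        raise ValueError(f"position {position} is already occupied")
    trial = list(board)
    trial[position] = mark
    return any(
        position in line and all(trial[i] == mark for i in line)
        for line in LINES
    )


def winning_moves(board: list) -> list:
    """List all (position, mark) pairs that complete a line, trying both marks."""
    return [
        (pos, mark)
        for pos in range(9)
        if board[pos] is None
        for mark in ("X", "O")
        if completes_line(board, pos, mark)
    ]


if __name__ == "__main__":
    board = ["X", "X", None,
             None, "O", None,
             None, None, "O"]
    print(winning_moves(board))
```

Under the assumed rule, running the same check after a candidate move shows whether that move would hand the opponent an immediate win with either mark, which is the defensive side of the same decision.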

Other Environment Requirements

This environment requires an OpenAI API key (passed via secrets) to power the LLM opponent.

Safety

Agents in WildTicTacToe interact only with a board game and have no access to external systems, the internet, or sensitive data. The environment does not present safety risks.

Citations

@software{textarena2024,
  author    = {Guertler, Leon and Banting, Wilfried and Pignatelli, Eduardo},
  title     = {TextArena},
  year      = {2024},
  publisher = {GitHub},
  url       = {https://github.com/LeonGuertler/TextArena}
}