Chess

OpenReward Environment

Description

Chess is an environment for evaluating agents on playing chess against Stockfish. Agents play full games of chess by submitting moves in UCI notation. Stockfish responds at a configurable skill level (1-20). The environment provides two sub-environments: ChessTextEnv (FEN text observations) and ChessImageEnv (rendered board image observations). Reward is computed per move using a logistic mapping of Stockfish's centipawn evaluation.

Capabilities

  • Playing chess against an engine at varying difficulty levels
  • Strategic planning and tactical reasoning in chess
  • Understanding FEN notation and UCI move format
  • Multi-turn decision-making in a competitive game setting

Compute Requirements

Chess requires 4 GB RAM and 4 CPUs to run the Stockfish chess engine efficiently. The Stockfish binary must be available on the server.

License

GPL-3.0 (due to Stockfish engine dependency).

Tasks

There is one split: train (40 tasks). Tasks are parameterized by two dimensions:

  • Skill level (1-20): Controls Stockfish's playing strength.
  • Player color (white or black): Determines which side the agent plays.

This gives 20 skill levels × 2 colors = 40 tasks.
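As a rough illustration, the task grid can be enumerated as the Cartesian product of the two dimensions. The field names `skill_level` and `color` are assumptions made for this sketch and may not match the environment's actual task schema.

```python
from itertools import product

# Parameter ranges as stated in this README.
SKILL_LEVELS = range(1, 21)      # Stockfish skill levels 1..20
COLORS = ("white", "black")      # side played by the agent

# Hypothetical task records; the real task schema may use different keys.
TASKS = [
    {"skill_level": level, "color": color}
    for level, color in product(SKILL_LEVELS, COLORS)
]

print(len(TASKS))  # 40
```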

Reward Structure

This is a dense reward environment with continuous scoring. After each move, the environment evaluates the board position using Stockfish (depth 8) and maps the centipawn score to a reward in [-1, 1] using a logistic function:

$$\text{reward} = 2 \cdot \sigma(k \cdot \text{cp}) - 1$$

where $k = 0.004$, $\sigma(x) = 1 / (1 + e^{-x})$ is the logistic sigmoid, and cp is the centipawn evaluation from the agent's perspective. Mate detection maps to ±1.0. Invalid moves receive a reward of -1.0.
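The centipawn mapping can be sketched in a few lines of Python. This is an illustration of the formula only, not the environment's source code. The function name `centipawn_to_reward` is hypothetical, and mate scores and invalid moves (which map to ±1.0 and -1.0) are not handled here. The sketch uses the identity 2σ(x) - 1 = tanh(x / 2), which avoids overflow for very large evaluations.

```python
import math

K = 0.004  # scaling constant stated in this README


def centipawn_to_reward(cp: float, k: float = K) -> float:
    """Map a centipawn score (agent's perspective) to a reward in (-1, 1).

    Computes 2 * sigmoid(k * cp) - 1 via the equivalent tanh form.
    """
    return math.tanh(0.5 * k * cp)


# An even position scores 0; a one-pawn edge (100 cp) is a modest reward.
print(round(centipawn_to_reward(0), 3))     # 0.0
print(round(centipawn_to_reward(100), 3))   # 0.197
print(round(centipawn_to_reward(-500), 3))  # -0.762
```

With k = 0.004 the reward saturates slowly: an advantage of about five pawns (500 cp) still yields only about 0.76, so the signal keeps distinguishing between winning positions of different sizes.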

We do not use LLM graders for this task.

Data

No external data is required. Games are played in real time against the Stockfish engine.

Tools

Agents are given a single tool across both sub-environments:

  • step: Submit a move in UCI format (e.g., "e2e4"). Returns the updated board state (FEN text in ChessTextEnv, board image in ChessImageEnv) after Stockfish responds. The game ends when a checkmate, stalemate, or draw condition is reached.
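An agent may want to sanity-check a move string and read the side to move from a FEN observation before calling step. The helpers below are a minimal sketch using only the standard library. The names `parse_uci` and `side_to_move` are hypothetical and are not part of the environment's API. They check syntax only: a well-formed move can still be illegal in the current position.

```python
import re
from typing import Optional, Tuple

# UCI long algebraic form: from-square, to-square, optional promotion piece.
_UCI_MOVE = re.compile(r"([a-h][1-8])([a-h][1-8])([qrbn])?")


def parse_uci(move: str) -> Tuple[str, str, Optional[str]]:
    """Split a UCI move such as 'e2e4' or 'e7e8q' into its parts."""
    match = _UCI_MOVE.fullmatch(move)
    if match is None:
        raise ValueError(f"not a well-formed UCI move: {move!r}")
    return match.group(1), match.group(2), match.group(3)


def side_to_move(fen: str) -> str:
    """Return 'white' or 'black' from the second field of a FEN string."""
    fields = fen.split()
    if len(fields) != 6 or fields[1] not in ("w", "b"):
        raise ValueError(f"not a well-formed FEN string: {fen!r}")
    return "white" if fields[1] == "w" else "black"


START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
print(side_to_move(START_FEN))  # white
print(parse_uci("e2e4"))        # ('e2', 'e4', None)
print(parse_uci("e7e8q"))       # ('e7', 'e8', 'q')
```

Because invalid moves receive a reward of -1.0, a cheap syntax check like this can catch malformed strings (for example, SAN such as "Nf3") before they are submitted.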

Time Horizon

Chess is a multi-turn environment. Each task is a full game of chess, with the agent and Stockfish alternating moves until the game ends.

[How many average tool calls?]

Environment Difficulty

[Statistics on environment difficulty here]

Other Environment Requirements

There are no further environment requirements; Chess works out of the box with the OpenReward endpoint.

Safety

Agents in Chess play chess against a Stockfish engine. The environment does not present direct safety risks, as agents only submit chess moves with no access to external systems.

Citations

@dataset{GRChess,
  author    = {General Reasoning Inc. Team},
  title     = {Chess},
  year      = {2026},
  publisher = {OpenReward},
  url       = {https://openreward.ai/GeneralReasoning/Chess}
}