Chess
Description
Chess is an environment for evaluating agents on playing chess against Stockfish. Agents play full games of chess by submitting moves in UCI notation. Stockfish responds at a configurable skill level (1-20). The environment provides two sub-environments: ChessTextEnv (FEN text observations) and ChessImageEnv (rendered board image observations). Reward is computed per move using a logistic mapping of Stockfish's centipawn evaluation.
Capabilities
- Playing chess against an engine at varying difficulty levels
- Strategic planning and tactical reasoning in chess
- Understanding FEN notation and UCI move format
- Multi-turn decision-making in a competitive game setting
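For readers unfamiliar with FEN, the sketch below splits a FEN string into its six standard fields. It assumes that ChessTextEnv returns a full six-field FEN string, which the description above does not confirm; `FenFields` and `parse_fen` are illustrative names, not part of the environment.

```python
from typing import NamedTuple


class FenFields(NamedTuple):
    """The six space-separated fields of a standard FEN string."""
    placement: str        # piece placement, rank 8 down to rank 1
    side_to_move: str     # "w" or "b"
    castling: str         # e.g. "KQkq", or "-" if no castling rights remain
    en_passant: str       # en passant target square, or "-"
    halfmove_clock: int   # half-moves since the last capture or pawn move
    fullmove_number: int  # starts at 1, incremented after Black moves


def parse_fen(fen: str) -> FenFields:
    """Split a FEN string into fields (hypothetical helper, syntax only)."""
    parts = fen.split()
    if len(parts) != 6:
        raise ValueError(f"expected 6 FEN fields, got {len(parts)}")
    placement, side, castling, en_passant, halfmove, fullmove = parts
    return FenFields(placement, side, castling, en_passant,
                     int(halfmove), int(fullmove))


# Standard starting position.
START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
fields = parse_fen(START_FEN)
print(fields.side_to_move)     # w
print(fields.fullmove_number)  # 1
```

The `side_to_move` field is the quickest way for an agent to confirm that the observation it received is a position where it is expected to move.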
Compute Requirements
Chess requires 4 GB RAM and 4 CPUs to run the Stockfish chess engine efficiently. The Stockfish binary must be available on the server.
License
GPL-3.0 (due to Stockfish engine dependency).
Tasks
There is one split: train (40 tasks). Tasks are parameterized by two dimensions:
- Skill level (1-20): Controls Stockfish's playing strength.
- Player color (white or black): Determines which side the agent plays.
This gives 20 skill levels × 2 colors = 40 tasks.
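The task grid can be enumerated as a Cartesian product of the two dimensions. This is a minimal sketch; the dictionary keys `skill_level` and `color` are hypothetical and may not match the environment's actual task schema.

```python
from itertools import product

SKILL_LEVELS = range(1, 21)    # Stockfish skill levels 1..20
COLORS = ("white", "black")    # side played by the agent

# Hypothetical task records; the field names are illustrative only.
tasks = [
    {"skill_level": level, "color": color}
    for level, color in product(SKILL_LEVELS, COLORS)
]

print(len(tasks))  # 40
print(tasks[0])    # {'skill_level': 1, 'color': 'white'}
```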
Reward Structure
This is a dense reward environment with continuous scoring. After each move, the environment evaluates the board position using Stockfish (depth 8) and maps the centipawn score to a reward in [-1, 1] using a logistic function of cp, the centipawn evaluation from the agent's perspective. Mate detection maps to +/-1.0. Invalid moves receive a reward of -1.0.
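One plausible form of such a mapping is sketched below. The scale constant `ASSUMED_SCALE_CP` is an assumption chosen for illustration, since the environment's actual constant is not given here. The sign convention for `mate` (positive means the agent is delivering mate) is likewise assumed. The function names are hypothetical.

```python
import math
from typing import Optional

# ASSUMPTION: illustrative scale only; the environment's real constant
# is not stated in this document.
ASSUMED_SCALE_CP = 400.0


def logistic_reward(cp: float, scale: float = ASSUMED_SCALE_CP) -> float:
    """Map a centipawn score (agent's perspective) into (-1, 1).

    Uses the identity 2 / (1 + exp(-x)) - 1 == tanh(x / 2), which is the
    logistic function rescaled to (-1, 1) and avoids overflow for large |cp|.
    """
    return math.tanh(cp / (2.0 * scale))


def move_reward(cp: Optional[float],
                mate: Optional[int] = None,
                legal: bool = True) -> float:
    """Combine the three cases described above (hypothetical helper)."""
    if not legal:
        return -1.0                       # invalid move
    if mate is not None:
        # ASSUMPTION: mate > 0 means the agent is the side delivering mate.
        return 1.0 if mate > 0 else -1.0
    return logistic_reward(cp)            # ordinary centipawn evaluation


print(logistic_reward(0))              # 0.0
print(move_reward(None, legal=False))  # -1.0
```

Under this form, an equal position yields a reward of 0 and the reward saturates smoothly toward +1 or -1 as the evaluation grows, so large material advantages are not rewarded linearly.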
We do not use LLM graders for this task.
Data
No external data is required. Games are played in real time against the Stockfish engine.
Tools
Agents are given a single tool across both sub-environments:
step: Submit a move in UCI format (e.g., "e2e4"). Returns the updated board state (FEN text in ChessTextEnv, board image in ChessImageEnv) after Stockfish responds. The game ends on checkmate, stalemate, or another draw condition.
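Because invalid moves are penalized, an agent may want to check that a move string is at least syntactically valid UCI before submitting it. The sketch below checks syntax only: source square, target square, and an optional promotion piece. It does not check legality in the current position, and `looks_like_uci` is a hypothetical helper, not part of the environment.

```python
import re

# Source square, target square, optional promotion piece (q, r, b, or n).
UCI_MOVE_RE = re.compile(r"[a-h][1-8][a-h][1-8][qrbn]?")


def looks_like_uci(move: str) -> bool:
    """Return True if `move` has the shape of a UCI move (syntax only)."""
    return UCI_MOVE_RE.fullmatch(move) is not None


print(looks_like_uci("e2e4"))   # True
print(looks_like_uci("e7e8q"))  # True  (pawn promotion to queen)
print(looks_like_uci("Nf3"))    # False (algebraic notation, not UCI)
```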
Time Horizon
Chess is a multi-turn environment. Each task is a full game of chess, with the agent and Stockfish alternating moves until the game ends.
Environment Difficulty
[Statistics on environment difficulty here]
Other Environment Requirements
There are no further environment requirements; Chess works out of the box with the OpenReward endpoint.
Safety
Agents only submit chess moves to a Stockfish engine and have no access to external systems, so the environment does not present direct safety risks.
Citations
@dataset{GRChess,
author = {General Reasoning Inc. Team},
title = {Chess},
year = {2026},
publisher = {OpenReward},
url = {https://openreward.ai/GeneralReasoning/Chess}
}