ReasoningGym
Reasoning-Gym-Envs
Description
Reasoning-Gym-Envs is an environment wrapper for the reasoning-gym Python package, providing 105+ procedurally generated reasoning datasets as OpenReward environments. It covers 12 categories: algebra, algorithmic problems, ARC variants, arithmetic, code execution, cognition, games, geometry, graphs, induction, logic, and probability.
Capabilities
- Procedurally generated reasoning tasks
- Algorithmic answer verification
- Multi-category reasoning evaluation
- Deterministic task generation with seeding
Compute Requirements
Agents are given a standard environment with no sandbox or file system access.
License
Tasks
Each dataset in this environment has a single split:
- train: 500 tasks per dataset by default (configurable)
Datasets span 12 categories:
- Algebra (6): Complex arithmetic, polynomial equations, integration
- Algorithmic (34): Ciphers, string manipulation, graph problems
- ARC (3): Abstraction & Reasoning Corpus variants
- Arithmetic (18): Basic math, GCD, LCM, prime factorization
- Code (2): Brainfuck execution, code I/O
- Cognition (7): Rubik's cube, pattern recognition, ASCII art
- Games (17): Sudoku, chess puzzles, logic games
- Geometry (2): Basic and advanced geometric calculations
- Graphs (5): Shortest path, topological sort, relationships
- Induction (2): Causal reasoning, function learning
- Logic (7): Knights & Knaves, propositional logic, syllogisms
- Probability (1): Coin flips and probability reasoning
Reward Structure
This is a single-turn environment. The agent submits an answer via the submit_answer tool, which ends the episode. The answer is verified algorithmically by reasoning-gym's score_answer() function. Most datasets use exact-match scoring (0.0 or 1.0); some award partial credit (e.g., Rubik's cube: 0.0 to 1.0 based on solution quality).
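The two scoring styles described above can be illustrated with a minimal sketch. This is not reasoning-gym's score_answer() implementation, whose logic varies per dataset; the function names, the whitespace stripping, and the position-wise partial-credit rule are all assumptions made for illustration only.

```python
from typing import List


def exact_match_score(submitted: str, expected: str) -> float:
    """Hypothetical exact-match scorer: 1.0 if the answers are identical, else 0.0.

    Stripping surrounding whitespace is an assumption, not documented behavior.
    """
    return 1.0 if submitted.strip() == expected.strip() else 0.0


def partial_credit_score(submitted: List[str], expected: List[str]) -> float:
    """Hypothetical partial-credit scorer: fraction of positions that match.

    A stand-in for "solution quality"; the result always lies in [0.0, 1.0].
    """
    if not expected:
        return 0.0
    hits = sum(1 for s, e in zip(submitted, expected) if s == e)
    return hits / len(expected)


print(exact_match_score(" 12 ", "12"))
print(partial_credit_score(["R", "U", "F", "D"], ["R", "U", "L", "D"]))
```

In both cases the reward is a float between 0.0 and 1.0, so a training loop can treat all datasets uniformly regardless of which scoring style a dataset uses.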
Data
No external data files required. All tasks are procedurally generated in-memory using deterministic seeding from the reasoning-gym package.
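As a rough illustration of deterministic, in-memory task generation, the sketch below builds GCD questions from a seeded random number generator. It uses only the Python standard library and does not call reasoning-gym; the generate_gcd_tasks function, its parameters, and the question wording are hypothetical.

```python
import math
import random
from typing import Dict, List


def generate_gcd_tasks(size: int, seed: int) -> List[Dict[str, str]]:
    """Hypothetical generator: the same (size, seed) always yields the same tasks."""
    rng = random.Random(seed)  # private RNG, so global random state is untouched
    tasks = []
    for _ in range(size):
        a, b = rng.randint(1, 999), rng.randint(1, 999)
        tasks.append({
            "question": f"Find the greatest common divisor of {a} and {b}.",
            "answer": str(math.gcd(a, b)),  # ground truth computed, not stored on disk
        })
    return tasks


first = generate_gcd_tasks(size=3, seed=42)
second = generate_gcd_tasks(size=3, seed=42)
assert first == second  # identical seed reproduces identical tasks
print(first[0]["question"])
```

Because each task and its reference answer are derived from the seed alone, nothing needs to be downloaded or cached, and any run can be reproduced by reusing the seed.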
Tools
| Tool | Description |
|---|---|
| submit_answer | Submit your answer for algorithmic verification. Ends the episode. |
Time Horizon
Single-turn. The agent reads the reasoning problem and submits one answer.
Environment Difficulty
[Put environment difficulty here]
Other Environment Requirements
None. Tasks are procedurally generated and evaluation is deterministic.
Safety
Agents in Reasoning-Gym-Envs solve reasoning problems in a standard environment. The environment does not present direct safety risks.
Citation
@misc{stojanovski2025reasoninggymreasoningenvironments,
title={REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards},
author={Zafir Stojanovski and Oliver Stanley and Joe Sharratt and Richard Jones and Abdulhakeem Adefioye and Jean Kaddour and Andreas Köpf},
year={2025},
eprint={2505.24760},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2505.24760}
}