ReasoningGym


Reasoning-Gym-Envs

OpenReward Environment

Description

Reasoning-Gym-Envs is an environment wrapper for the reasoning-gym Python package that exposes over 100 procedurally generated reasoning datasets as OpenReward environments. The datasets are grouped into 12 categories: algebra, algorithmic problems, ARC variants, arithmetic, code execution, cognition, games, geometry, graphs, induction, logic, and probability.

Capabilities

  • Procedurally-generated reasoning tasks
  • Algorithmic answer verification
  • Multi-category reasoning evaluation
  • Deterministic task generation with seeding

Compute Requirements

Agents are given a standard environment with no sandbox or file system access.

License

Apache 2.0.

Tasks

Each dataset in this environment has a single split:

  • train: 500 tasks per dataset (default, configurable)

Datasets span 12 categories:

  • Algebra (6): Complex arithmetic, polynomial equations, integration
  • Algorithmic (34): Ciphers, string manipulation, graph problems
  • ARC (3): Abstraction & Reasoning Corpus variants
  • Arithmetic (18): Basic math, GCD, LCM, prime factorization
  • Code (2): Brainfuck execution, code I/O
  • Cognition (7): Rubik's cube, pattern recognition, ASCII art
  • Games (17): Sudoku, chess puzzles, logic games
  • Geometry (2): Basic and advanced geometric calculations
  • Graphs (5): Shortest path, topological sort, relationships
  • Induction (2): Causal reasoning, function learning
  • Logic (7): Knights & Knaves, propositional logic, syllogisms
  • Probability (1): Coin flips and probability reasoning
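
As a quick consistency check, the per-category counts listed above can be tallied in a few lines of Python. This is plain arithmetic over the list, not a query against the reasoning-gym package, so the totals may differ for other package versions.

```python
# Per-category dataset counts, copied from the list above.
CATEGORY_COUNTS = {
    "Algebra": 6,
    "Algorithmic": 34,
    "ARC": 3,
    "Arithmetic": 18,
    "Code": 2,
    "Cognition": 7,
    "Games": 17,
    "Geometry": 2,
    "Graphs": 5,
    "Induction": 2,
    "Logic": 7,
    "Probability": 1,
}

# Default size of the train split, per dataset (from the Tasks section).
DEFAULT_SPLIT_SIZE = 500


def total_datasets(counts: dict[str, int]) -> int:
    """Sum the number of datasets across all categories."""
    return sum(counts.values())


if __name__ == "__main__":
    total = total_datasets(CATEGORY_COUNTS)
    print(f"{len(CATEGORY_COUNTS)} categories, {total} datasets, "
          f"{total * DEFAULT_SPLIT_SIZE} default train tasks")
    # → 12 categories, 104 datasets, 52000 default train tasks
```

With the counts as listed, the categories sum to 104 datasets, or 52,000 tasks at the default split size.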

Reward Structure

This is a single-turn environment: the agent submits one answer via the submit_answer tool, and that answer is verified algorithmically by reasoning-gym's score_answer() function. Most datasets use exact-match scoring (0.0 or 1.0); some award partial credit (e.g., Rubik's cube scores range from 0.0 to 1.0 based on solution quality).
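
The two scoring styles can be illustrated with a minimal sketch. The functions below are hypothetical stand-ins written for illustration; they are not reasoning-gym's score_answer() logic, which is defined per dataset and may normalize answers differently. The partial-credit rule (fraction of matching positions) is chosen only to show a score between 0.0 and 1.0 and is not the Rubik's cube scoring rule.

```python
from typing import Optional, Sequence


def exact_match_score(answer: Optional[str], expected: str) -> float:
    """Hypothetical exact-match scorer: 1.0 if the trimmed strings are equal, else 0.0."""
    if answer is None:
        # A missing answer earns no reward.
        return 0.0
    return 1.0 if answer.strip() == expected.strip() else 0.0


def partial_credit_score(answer: Sequence[str], expected: Sequence[str]) -> float:
    """Hypothetical partial-credit scorer: fraction of positions that match.

    Answers of the wrong length (or an empty reference) score 0.0,
    so the result always lies in [0.0, 1.0].
    """
    if len(answer) != len(expected) or not expected:
        return 0.0
    matches = sum(a == e for a, e in zip(answer, expected))
    return matches / len(expected)


print(exact_match_score(" 42\n", "42"))  # → 1.0
print(partial_credit_score(["R", "U", "F", "D"], ["R", "U", "L", "D"]))  # → 0.75
```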

Data

No external data files are required. All tasks are generated procedurally in memory by the reasoning-gym package, using deterministic seeding.
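
Deterministic seeding means each task is a pure function of its seed and index, so the same configuration always reproduces the same split. Below is a minimal sketch of the idea using Python's standard random module. The toy addition task, the generate_task/generate_split names, and the seed-mixing formula are all hypothetical and do not describe how reasoning-gym derives its per-task seeds.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    question: str
    answer: str


def generate_task(seed: int, index: int) -> Task:
    """Build one toy addition task from (seed, index) with its own RNG (hypothetical)."""
    # A per-task RNG seeded from both values makes each task reproducible
    # regardless of the order in which tasks are generated.
    rng = random.Random(seed * 1_000_003 + index)
    a, b = rng.randint(10, 99), rng.randint(10, 99)
    return Task(question=f"What is {a} + {b}?", answer=str(a + b))


def generate_split(seed: int, size: int = 500) -> list[Task]:
    """Generate `size` tasks deterministically; 500 mirrors the default split size."""
    return [generate_task(seed, i) for i in range(size)]


if __name__ == "__main__":
    first = generate_split(seed=42, size=3)
    again = generate_split(seed=42, size=3)
    print(first == again)  # → True
```

Because each task gets its own Random instance, task 3 can be regenerated directly without producing tasks 0 to 2 first.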

Tools

Tool           Description
submit_answer  Submit your answer for algorithmic verification. Ends the episode.
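
The tool's contract (score the submission, then end the episode) can be sketched as a small state machine. The SingleTurnEpisode class, its return dictionary, and the scorer callback are hypothetical; they illustrate the single-submission rule, not OpenReward's actual tool interface.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SingleTurnEpisode:
    """Hypothetical single-turn episode: one prompt, one submit_answer call, then done."""
    prompt: str
    expected: str
    scorer: Callable[[str, str], float]
    done: bool = False
    reward: float = 0.0

    def submit_answer(self, answer: str) -> dict:
        """Score the answer once and end the episode; later calls are rejected."""
        if self.done:
            raise RuntimeError("episode already finished")
        self.reward = self.scorer(answer, self.expected)
        self.done = True
        return {"reward": self.reward, "finished": True}


if __name__ == "__main__":
    ep = SingleTurnEpisode(
        prompt="What is 2 + 3?",
        expected="5",
        scorer=lambda a, e: 1.0 if a.strip() == e else 0.0,
    )
    print(ep.submit_answer("5"))  # → {'reward': 1.0, 'finished': True}
```

A second call raises an error, mirroring the single-turn time horizon.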

Time Horizon

Single-turn. The agent reads the reasoning problem and submits one answer.

Environment Difficulty

[Put environment difficulty here]

Other Environment Requirements

None. Tasks are procedurally generated and evaluation is deterministic.

Safety

Agents in Reasoning-Gym-Envs solve reasoning problems in a standard environment. The environment does not present direct safety risks.

Citation

@misc{stojanovski2025reasoninggymreasoningenvironments,
  title={REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards},
  author={Zafir Stojanovski and Oliver Stanley and Joe Sharratt and Richard Jones and Abdulhakeem Adefioye and Jean Kaddour and Andreas Köpf},
  year={2025},
  eprint={2505.24760},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2505.24760}
}
GeneralReasoning/ReasoningGym | OpenReward