Implementation of arXiv/reasoninggym

Reasoning-Gym-Envs

Description

Reasoning-Gym-Envs wraps the reasoning-gym Python package, exposing more than 100 procedurally generated reasoning datasets as OpenReward environments. The datasets fall into 12 categories: algebra, algorithmic problems, ARC variants, arithmetic, code execution, cognition, games, geometry, graphs, induction, logic, and probability.

Capabilities

Procedurally generated reasoning tasks
Algorithmic answer verification
Multi-category reasoning evaluation
Deterministic task generation with seeding

Compute Requirements

Agents are given a standard environment with no sandbox or file system access.

License

Apache 2.0.

Tasks

Each dataset in this environment has a single split:

train: 500 tasks per dataset by default (configurable)

Datasets span 12 categories:

Algebra (6): Complex arithmetic, polynomial equations, integration
Algorithmic (34): Ciphers, string manipulation, graph problems
ARC (3): Abstraction & Reasoning Corpus variants
Arithmetic (18): Basic math, GCD, LCM, prime factorization
Code (2): Brainfuck execution, code I/O
Cognition (7): Rubik's cube, pattern recognition, ASCII art
Games (17): Sudoku, chess puzzles, logic games
Geometry (2): Basic and advanced geometric calculations
Graphs (5): Shortest path, topological sort, relationships
Induction (2): Causal reasoning, function learning
Logic (7): Knights & Knaves, propositional logic, syllogisms
Probability (1): Coin flips and probability reasoning
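
As a rough sense of scale, the per-category counts above can be tallied and multiplied by the default split size. This is a minimal sketch: the dictionary simply mirrors the list, and the variable names are illustrative rather than part of the environment.

```python
# Dataset counts per category, copied from the list above.
CATEGORY_COUNTS = {
    "Algebra": 6,
    "Algorithmic": 34,
    "ARC": 3,
    "Arithmetic": 18,
    "Code": 2,
    "Cognition": 7,
    "Games": 17,
    "Geometry": 2,
    "Graphs": 5,
    "Induction": 2,
    "Logic": 7,
    "Probability": 1,
}
TASKS_PER_DATASET = 500  # default train split size stated above

total_datasets = sum(CATEGORY_COUNTS.values())
total_tasks = total_datasets * TASKS_PER_DATASET
print(
    f"{len(CATEGORY_COUNTS)} categories, {total_datasets} datasets listed, "
    f"{total_tasks} tasks at the default split size"
)
```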

Reward Structure

This is a single-turn environment. The agent submits an answer via the submit_answer tool, and the answer is verified algorithmically by reasoning-gym's score_answer() function. Most datasets use exact-match scoring (0.0 or 1.0); some support partial credit (for example, Rubik's cube scores range from 0.0 to 1.0 based on solution quality).
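
The difference between the two scoring styles can be illustrated with a toy sketch. Both functions below are hypothetical stand-ins written for this example; they are not reasoning-gym's score_answer() implementation, and the whitespace stripping and position-wise partial-credit rule are assumptions.

```python
from typing import Sequence


def exact_match_score(submitted: str, expected: str) -> float:
    """Binary scoring: 1.0 when the stripped strings match, else 0.0."""
    return 1.0 if submitted.strip() == expected.strip() else 0.0


def partial_credit_score(submitted: Sequence[int], expected: Sequence[int]) -> float:
    """Hypothetical partial-credit rule: fraction of expected positions matched."""
    if not expected:
        return 0.0
    hits = sum(1 for s, e in zip(submitted, expected) if s == e)
    return hits / len(expected)


print(exact_match_score(" 42 ", "42"))
print(partial_credit_score([1, 2, 3, 4], [1, 2, 0, 4]))
```

Either way the reward is a float in the range 0.0 to 1.0, so downstream code can treat both kinds of dataset uniformly.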

Data

No external data files are required. All tasks are generated in memory by the reasoning-gym package using deterministic seeding.
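
The idea of seeded, in-memory generation can be sketched with the standard library alone. The generator below is a hypothetical toy (simple addition problems), not one of reasoning-gym's datasets; the function name and task fields are assumptions made for illustration.

```python
import random
from typing import Dict, List


def generate_tasks(seed: int, size: int) -> List[Dict[str, str]]:
    """Generate `size` toy addition problems from a private, seeded RNG."""
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    tasks = []
    for _ in range(size):
        a, b = rng.randint(1, 99), rng.randint(1, 99)
        tasks.append({"question": f"What is {a} + {b}?", "answer": str(a + b)})
    return tasks


# The same seed reproduces the same task list; nothing is read from disk.
first = generate_tasks(seed=42, size=5)
second = generate_tasks(seed=42, size=5)
print(first == second)
```

Because every task is a pure function of the seed, a split can be rebuilt on demand instead of being stored.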

Tools

Tool	Description
`submit_answer`	Submit your answer for algorithmic verification. Ends the episode.

Time Horizon

Single-turn. The agent reads the reasoning problem and submits one answer.
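
A single-turn episode can be pictured as one question followed by exactly one submission. The class below is a hypothetical sketch of that flow; `ToyEpisode` and its fields are invented for illustration and do not reflect the environment's actual interface.

```python
from dataclasses import dataclass


@dataclass
class ToyEpisode:
    """Hypothetical single-turn episode: one question, one submission."""

    question: str
    expected: str
    done: bool = False

    def submit_answer(self, answer: str) -> float:
        """Score the answer and end the episode; a second call is rejected."""
        if self.done:
            raise RuntimeError("episode already ended")
        self.done = True
        return 1.0 if answer.strip() == self.expected else 0.0


episode = ToyEpisode(question="What is 2 + 3?", expected="5")
reward = episode.submit_answer("5")
print(reward, episode.done)
```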

Other Environment Requirements

None. Tasks are procedurally generated and evaluation is deterministic.

Safety

Agents in Reasoning-Gym-Envs solve reasoning problems in a standard environment. The environment does not present direct safety risks.

Citation

@misc{stojanovski2025reasoninggymreasoningenvironments,
  title={REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards},
  author={Zafir Stojanovski and Oliver Stanley and Joe Sharratt and Richard Jones and Abdulhakeem Adefioye and Jean Kaddour and Andreas Köpf},
  year={2025},
  eprint={2505.24760},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2505.24760}
}

Repository

Source repository

EnvCommons/ReasoningGym

Compute Configuration

Resource allocation for this environment.

Component	Configuration
Environment Server	1 vCPU / 4 GB RAM
Sandbox Machine	Not configured

Estimated Cost

Pay per second of active session usage. Billing starts when your session begins and stops when it ends.

Component	Cost / second
Environment	$0.0000320
Sandbox	Not configured
Total	$0.0000320

Examples

Session	Cost
5-minute session	$0.0096
1-hour session	$0.1152
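
The example figures follow from multiplying the per-second rate by the session length. A minimal sketch of that arithmetic, using `Decimal` to avoid floating-point rounding; the function name is illustrative, and the rate is the environment rate quoted above.

```python
from decimal import Decimal

ENVIRONMENT_RATE = Decimal("0.0000320")  # USD per second, from the cost table


def session_cost(seconds: int, rate: Decimal = ENVIRONMENT_RATE) -> Decimal:
    """Cost in USD of a session billed per second of active usage."""
    return rate * seconds


five_minutes = session_cost(5 * 60)
one_hour = session_cost(60 * 60)
print(f"5-minute session: ${five_minutes:.4f}")
print(f"1-hour session: ${one_hour:.4f}")
```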
