Nemotron-RL-ReasoningGym

Name: NVIDIA/Nemotron-RL-ReasoningGym-v1
Author: NVIDIA

Description

Nemotron-RL-ReasoningGym is a procedural reasoning environment sourced from NVIDIA's Nemotron-RL-ReasoningGym-v1 dataset. It covers 104 distinct task types across 12 categories including logic puzzles, math problems, games (sudoku, sokoban), graph algorithms, string manipulation, cognitive tasks, and family relationship reasoning.

Capabilities

Solving diverse procedural reasoning tasks
Logic puzzle and game solving (sudoku, sokoban, etc.)
Graph and string algorithm reasoning
Mathematical problem solving
Family relationship inference

License

CC-BY-4.0.

Tasks

Tasks are indexed, so each one can be addressed individually by its position within a split.

Split	Tasks
`train`	15,000

Each task presents a reasoning problem with a deterministic, algorithmically verifiable answer.

Reward Structure

This is a sparse, verifiable reward environment. The agent receives a reward of 1.0 if its submitted answer is an exact string match with the expected answer and 0.0 otherwise. Expected answers are generated procedurally along with each task, so the ground truth is correct by construction.

No LLM graders are used.
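The reward rule described above can be sketched in a few lines. This is a minimal illustration, not the environment's actual grader: the function name `exact_match_reward` is hypothetical, and it assumes a strict comparison with no whitespace or case normalization, which the description does not specify.

```python
def exact_match_reward(submitted: str, expected: str) -> float:
    """Return 1.0 only when the two strings are identical, else 0.0.

    Assumption: no trimming or case folding is applied before comparing.
    """
    return 1.0 if submitted == expected else 0.0


print(exact_match_reward("42", "42"))   # 1.0
print(exact_match_reward("42 ", "42"))  # 0.0 under the strict-comparison assumption
```

Under this assumption, formatting differences such as a trailing space count as a wrong answer, so an agent should follow the answer format requested in the task text exactly.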

Data

Data is sourced from nvidia/Nemotron-RL-ReasoningGym-v1 on Hugging Face. Tasks are procedurally generated with ground-truth answers.
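For readers who want to inspect the underlying data directly, the following sketch shows one way to fetch it with the Hugging Face `datasets` library. The helper name `load_train_split` is hypothetical. It is also an assumption that the Hub dataset exposes a `train` split matching the table above, and the column names are not documented here.

```python
DATASET_ID = "nvidia/Nemotron-RL-ReasoningGym-v1"


def load_train_split():
    """Download and return the (assumed) train split from the Hugging Face Hub."""
    # Imported lazily so this module loads without the dependency installed.
    from datasets import load_dataset  # requires: pip install datasets

    return load_dataset(DATASET_ID, split="train")
```

Calling `load_train_split()` needs network access. Since the schema is not described in this card, checking `column_names` on the returned dataset is a reasonable first step before relying on any field.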

Tools

Tool	Description
`submit_answer`	Submit your final answer. Ends the episode.

Time Horizon

This is a single-turn environment. The agent reads the problem and submits one answer.

Other Environment Requirements

No external API keys or secrets are required.

Safety

This environment presents standard reasoning puzzles and poses no known safety risks.

Citations

@dataset{nvidia_nemotron_reasoninggym,
  author    = {NVIDIA},
  title     = {Nemotron-RL-ReasoningGym-v1},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/datasets/nvidia/Nemotron-RL-ReasoningGym-v1}
}

Repository

Source repository

EnvCommons/Nemotron-RL-ReasoningGym-v1

Compute Configuration

Resource allocation for this environment.

Component	Configuration
Environment Server	1 vCPU / 4 GB RAM
Sandbox Machine	Not configured

Estimated Cost

Pay per second of active session usage. Billing starts when your session begins and stops when it ends.

Component	Cost / second
Environment	$0.0000320
Sandbox	Not configured
Total	$0.0000320

Examples

Session length	Cost
5-minute session	$0.0096
1-hour session	$0.1152
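The example figures follow from multiplying the per-second rate by the session length in seconds. A small sketch of that arithmetic, using the rate from the cost table (the function and constant names are illustrative, not part of any billing API):

```python
ENVIRONMENT_COST_PER_SECOND = 0.0000320  # USD per second, from the cost table above


def session_cost(seconds: float, rate: float = ENVIRONMENT_COST_PER_SECOND) -> float:
    """Estimated cost in USD for a session of the given length."""
    return seconds * rate


print(f"${session_cost(5 * 60):.4f}")   # $0.0096
print(f"${session_cost(60 * 60):.4f}")  # $0.1152
```

Because no sandbox is configured, the environment rate is the only component in the total.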