Nemotron-RL-ReasoningGym-v1
Nemotron-RL-ReasoningGym
Description
Nemotron-RL-ReasoningGym is a procedural reasoning environment sourced from NVIDIA's Nemotron-RL-ReasoningGym-v1 dataset. It covers 104 distinct task types across 12 categories, including logic puzzles, math problems, games (sudoku, sokoban), graph algorithms, string manipulation, cognitive tasks, and family relationship reasoning.
Capabilities
- Solving diverse procedural reasoning tasks
- Logic puzzle and game solving (sudoku, sokoban, etc.)
- Graph and string algorithm reasoning
- Mathematical problem solving
- Family relationship inference
License
Tasks
Tasks in this environment are accessed by index, which allows individual tasks to be retrieved efficiently.
| Split | Tasks |
|---|---|
| train | 15,000 |
Each task presents a reasoning problem with a deterministic, algorithmically verifiable answer.
Reward Structure
This is a sparse, verifiable reward environment. The agent receives a reward of 1.0 for an exact string match with the expected answer and 0.0 otherwise. Answers are generated procedurally, ensuring correctness.
No LLM graders are used.
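The reward rule above can be sketched as a small function. This is a minimal illustration, not the environment's actual grader: the name `exact_match_reward` is hypothetical, and the sketch assumes a strict comparison with no whitespace trimming or case folding, since the description only says "exact string match".

```python
def exact_match_reward(submitted: str, expected: str) -> float:
    """Return 1.0 only when the submitted answer equals the expected answer exactly."""
    # Assumption: no stripping or case normalization is applied before comparing.
    return 1.0 if submitted == expected else 0.0


if __name__ == "__main__":
    print(exact_match_reward("42", "42"))   # matching strings
    print(exact_match_reward("42 ", "42"))  # trailing space breaks the match in this sketch
```

Under this strict reading, formatting differences such as trailing whitespace would score 0.0, so an agent should return the answer in exactly the form the task requests.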
Data
Data is sourced from nvidia/Nemotron-RL-ReasoningGym-v1 on HuggingFace. Tasks are procedurally generated with ground-truth answers.
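To illustrate what "procedurally generated with ground-truth answers" can mean in practice, here is a toy generator. It is a sketch under stated assumptions: `ToyTask` and `make_toy_task` are hypothetical names, and the addition task is an invented example rather than one of the dataset's task types. The point is only that the generator computes the answer itself, so the ground truth is correct by construction.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyTask:
    question: str
    answer: str  # ground truth, computed by the generator itself


def make_toy_task(seed: int) -> ToyTask:
    """Build a hypothetical addition task; the same seed always yields the same task."""
    rng = random.Random(seed)  # local RNG so generation is reproducible
    a, b = rng.randint(0, 999), rng.randint(0, 999)
    return ToyTask(question=f"What is {a} + {b}?", answer=str(a + b))


if __name__ == "__main__":
    task = make_toy_task(seed=7)
    print(task.question)
    print(task.answer)
```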
Tools
| Tool | Description |
|---|---|
| submit_answer | Submit your final answer. Ends the episode. |
Time Horizon
This is a single-turn environment. The agent reads the problem and submits one answer.
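The single-turn flow can be sketched as follows. This is an illustrative model, not the environment's real interface: the class `SingleTurnEpisode` and its attributes are hypothetical, and only the tool name `submit_answer` and the 1.0 / 0.0 exact-match reward come from the description above.

```python
class SingleTurnEpisode:
    """Hypothetical episode wrapper: one problem, one submission, then done."""

    def __init__(self, problem: str, expected: str) -> None:
        self.problem = problem
        self._expected = expected
        self.done = False

    def submit_answer(self, answer: str) -> float:
        """Score the answer with an exact string match and end the episode."""
        if self.done:
            raise RuntimeError("episode already ended; only one answer is allowed")
        self.done = True
        return 1.0 if answer == self._expected else 0.0


if __name__ == "__main__":
    episode = SingleTurnEpisode(problem="What is 2 + 3?", expected="5")
    print(episode.problem)
    print(episode.submit_answer("5"))
    print(episode.done)
```

Raising on a second submission is a design choice of this sketch; it makes the "one answer per episode" constraint explicit rather than silently ignoring later calls.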
Other Environment Requirements
No external API keys or secrets are required.
Safety
This environment presents standard reasoning puzzles. There are no safety risks.
Citations
@dataset{nvidia_nemotron_reasoninggym,
author = {NVIDIA},
title = {Nemotron-RL-ReasoningGym-v1},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/datasets/nvidia/Nemotron-RL-ReasoningGym-v1}
}