Nemotron-RL-ReasoningGym-v1

Nemotron-RL-ReasoningGym

Description

Nemotron-RL-ReasoningGym is a procedural reasoning environment sourced from NVIDIA's Nemotron-RL-ReasoningGym-v1 dataset. It covers 104 distinct task types across 12 categories, including logic puzzles, math problems, games (Sudoku, Sokoban), graph algorithms, string manipulation, cognitive tasks, and family relationship reasoning.

Capabilities

  • Solving diverse procedural reasoning tasks
  • Logic puzzle solving (sudoku, sokoban, etc.)
  • Graph and string algorithm reasoning
  • Mathematical problem solving
  • Family relationship inference

License

CC-BY-4.0.

Tasks

This environment uses task indexing for efficient access.

Split    Tasks
train    15,000

Each task presents a reasoning problem with a deterministic, algorithmically verifiable answer.

Reward Structure

This is a sparse, verifiable reward environment. The agent receives a reward of 1.0 for an exact string match with the expected answer and 0.0 otherwise. Answers are generated procedurally, ensuring correctness.

No LLM graders are used.
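
The scoring rule described above can be sketched in a few lines. This is only an illustration, not the environment's actual grader: the function name `exact_match_reward` is hypothetical, and the sketch assumes a strict comparison with no whitespace or case normalization, since the README does not say whether any normalization is applied.

```python
def exact_match_reward(submitted: str, expected: str) -> float:
    """Sparse reward: 1.0 for an exact string match, 0.0 otherwise.

    Assumption: strings are compared as-is, with no trimming or
    case folding. The real environment may differ on this point.
    """
    return 1.0 if submitted == expected else 0.0


# Under this strict reading, only an identical string earns the reward.
print(exact_match_reward("42", "42"))
print(exact_match_reward("42 ", "42"))
```

Under this reading, a trailing space or a difference in capitalization would score 0.0, so an agent should emit the answer in exactly the format the problem requests.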

Data

Data is sourced from nvidia/Nemotron-RL-ReasoningGym-v1 on Hugging Face. Tasks are procedurally generated with ground-truth answers.

Tools

Tool             Description
submit_answer    Submit your final answer. Ends the episode.

Time Horizon

This is a single-turn environment. The agent reads the problem and submits one answer.
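
A single-turn episode of this kind can be modeled as follows. This is a minimal sketch under stated assumptions: the class `SingleTurnEpisode` and its fields are hypothetical and do not reflect the OpenReward API. Only the tool name `submit_answer`, the single submission, and the 1.0/0.0 exact-match reward come from this README.

```python
from dataclasses import dataclass


@dataclass
class SingleTurnEpisode:
    """Hypothetical stand-in for one task: a prompt and its expected answer."""
    prompt: str
    expected: str
    done: bool = False

    def submit_answer(self, answer: str) -> float:
        """Score the one allowed submission and end the episode."""
        if self.done:
            # Single-turn: a second submission is rejected in this sketch.
            raise RuntimeError("episode already ended")
        self.done = True
        return 1.0 if answer == self.expected else 0.0


# Illustrative task; the prompt and answer are made up for this example.
episode = SingleTurnEpisode(prompt="What is 2 + 3?", expected="5")
reward = episode.submit_answer("5")
print(reward, episode.done)
```

Because the first call to `submit_answer` ends the episode, the agent has no opportunity to revise its answer after seeing the reward.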

Other Environment Requirements

No external API keys or secrets are required.

Safety

This environment presents standard reasoning puzzles and poses no known safety risks.

Citations

@dataset{nvidia_nemotron_reasoninggym,
  author    = {NVIDIA},
  title     = {Nemotron-RL-ReasoningGym-v1},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/datasets/nvidia/Nemotron-RL-ReasoningGym-v1}
}