Nemotron-RL-coding-competitive_coding


Nemotron Competitive Coding


Description

Nemotron Competitive Coding is an environment for evaluating agents on competitive programming problems. It wraps NVIDIA's Nemotron-RL-coding-competitive_coding dataset, which contains 16,083 competitive programming problems to be solved in Python. The problems are sourced from CodeContests (DeepMind) and Codeforces (Open-R1), and each includes hidden unit tests for automated verification. Agents use CLI tools (bash, write, read, edit) to develop and test a Python solution in a sandbox. They then submit the solution's file path for evaluation against the hidden test cases.
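Problems from CodeContests and Codeforces conventionally read input from standard input and write answers to standard output. This README does not state the I/O convention explicitly, so the template below assumes stdin/stdout. The example problem (summing integers for each of t test cases) is hypothetical, and so are the `solve` helper and the file name `solution.py`. Keeping the parsing logic in a pure `solve` function makes it easy to test locally before submitting.

```python
import sys


def solve(data: str) -> str:
    """Hypothetical example problem: the first token is t; each of the next
    t lines starts with n followed by n integers. Output each line's sum."""
    tokens = data.split()
    t = int(tokens[0])
    pos = 1
    answers = []
    for _ in range(t):
        n = int(tokens[pos])
        pos += 1
        values = map(int, tokens[pos:pos + n])
        pos += n
        answers.append(str(sum(values)))
    return "\n".join(answers)


def main() -> None:
    # Read all of stdin at once; usually faster than repeated input() calls.
    data = sys.stdin.read()
    sys.stdout.write(solve(data) + "\n")


if __name__ == "__main__":
    main()
```

Saved as `solution.py`, the file can be exercised in the sandbox with something like `printf '2\n3 1 2 3\n2 10 -4\n' | python3 solution.py`, which should print `6` twice.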

Capabilities

  • Competitive programming problem solving
  • Algorithm design and implementation
  • Handling edge cases and constraints
  • Producing correct I/O format from problem descriptions

Compute Requirements

Submitted code is executed in a sandbox with 0.5 CPUs and 1 GB of RAM. Each test case has a 10-second time limit.
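To approximate the per-test time limit while developing, an agent could run its solution through a small harness such as the one below. This is a hypothetical local sketch, not the environment's actual runner. It enforces only the 10-second timeout and treats a timeout or a non-zero exit as a failed test. It does not reproduce the sandbox's 0.5 CPU and 1 GB memory limits, so a solution that passes here can still fail when submitted.

```python
import subprocess
import sys
from typing import Optional

TIME_LIMIT_SECONDS = 10  # per-test-case limit stated in this README


def run_case(solution_path: str, stdin_text: str) -> Optional[str]:
    """Run a Python solution on one input and return its stdout.

    Returns None on timeout or non-zero exit (both count as failures).
    Local approximation only: CPU and memory limits are not enforced.
    """
    try:
        result = subprocess.run(
            [sys.executable, solution_path],
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=TIME_LIMIT_SECONDS,
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        return None
    return result.stdout
```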

License

CC-BY-SA-4.0.

Tasks

There is one split: train (16,083 tasks). Each task presents a competitive programming problem. Problems are sourced from CodeContests (DeepMind) and Codeforces (Open-R1). Test cases per problem range from 1 to 430, with an average of ~59.

Reward Structure

This is a sparse reward environment with continuous scoring. The agent calls the submit tool once with a file path to its Python solution. The environment executes the code in a sandbox against hidden unit tests. The reward is the fraction of test cases passed:

$$\text{Reward} = \frac{\text{passed test cases}}{\text{total test cases}}$$

Scores range from 0.0 to 1.0. We do not use LLM graders for this task. Verification is purely test-case-based.
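The reward formula reduces to a simple fraction. The sketch below computes it from a list of captured outputs, assuming that a timed-out or crashed test is recorded as `None`. It also assumes a whitespace-insensitive, token-by-token comparison against the expected output. That comparison rule is hypothetical, because the README does not document how the environment's checker compares outputs.

```python
from typing import Optional, Sequence


def outputs_match(actual: Optional[str], expected: str) -> bool:
    """Hypothetical checker: compare whitespace-separated tokens so that
    trailing spaces and newlines do not matter."""
    if actual is None:  # timeout or runtime error counts as a failure
        return False
    return actual.split() == expected.split()


def reward(actual_outputs: Sequence[Optional[str]],
           expected_outputs: Sequence[str]) -> float:
    """Reward = passed test cases / total test cases, in [0.0, 1.0]."""
    if len(actual_outputs) != len(expected_outputs):
        raise ValueError("one actual output is required per test case")
    if not expected_outputs:
        return 0.0  # every problem has at least one test; guard anyway
    passed = sum(
        outputs_match(a, e) for a, e in zip(actual_outputs, expected_outputs)
    )
    return passed / len(expected_outputs)
```

For example, outputs `["6\n", "7", None, "1 2 \n"]` against expected `["6", "6", "0", "1 2"]` pass the first and last cases, giving a reward of 2/4 = 0.5.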

Data

Problems are sourced from the Nemotron-RL-coding-competitive_coding dataset by NVIDIA, which is part of the NeMo Gym framework for reinforcement learning on LLMs. The original problems come from CodeContests and Codeforces. Data files are stored on the OpenReward platform.

Tools

CLI tools (inherited from CLIEnvironment):

  • bash: Execute bash commands in the sandbox
  • write: Write content to a file
  • read: Read file contents
  • edit: Perform exact string replacement in a file
  • multi_edit: Perform multiple edits on a single file
  • glob: Find files matching a glob pattern
  • grep: Search for patterns in files
  • ls: List files and directories
  • todo_write: Manage a todo list for task planning

Evaluation tool:

  • submit: Submit a Python solution file for evaluation. Takes a file_path parameter pointing to the solution, executes the code in the sandbox against the hidden test cases, and returns the number of test cases passed and the reward. This tool can only be called once per task.
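Because submit can be called only once, it may be worth checking that the file at file_path is usable before calling it. The hypothetical helper below is one way to do that. It confirms the file exists, is non-empty UTF-8 text, and parses as valid Python. It uses the built-in `compile`, which checks syntax without executing the code or writing any `.pyc` files. It cannot catch runtime or logic errors, so it complements local testing rather than replacing it.

```python
def ready_to_submit(file_path: str) -> bool:
    """Hypothetical pre-submission check: True if the file exists, is
    non-empty UTF-8 text, and compiles as Python source."""
    try:
        with open(file_path, encoding="utf-8") as f:
            source = f.read()
    except (OSError, UnicodeDecodeError):
        return False  # missing, unreadable, or not UTF-8 text
    if not source.strip():
        return False  # empty or whitespace-only file
    try:
        compile(source, file_path, "exec")  # parse/compile only; nothing runs
    except (SyntaxError, ValueError):
        return False  # e.g. a syntax error or a null byte in the source
    return True
```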

Time Horizon

Nemotron Competitive Coding is a multi-step environment. The agent receives a problem statement, develops and tests a solution using CLI tools, then submits the file path for evaluation.

Other Environment Requirements

Nemotron Competitive Coding requires an OpenReward API key (api_key secret) for sandbox execution. No OpenAI API key is needed.

Safety

Agents are asked to write Python solutions to competitive programming problems. Submitted code is executed in a sandboxed environment with network access blocked. The environment does not present direct safety risks.

Citations

@dataset{nvidia_nemotron_rl_coding,
  author    = {NVIDIA Corporation},
  title     = {Nemotron-RL-coding-competitive\_coding},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/datasets/nvidia/Nemotron-RL-coding-competitive_coding},
  license   = {CC-BY-SA-4.0}
}
