Nemotron-RL-coding-competitive_coding
Nemotron Competitive Coding
Description
Nemotron Competitive Coding is an environment for evaluating agents on competitive programming problems. It wraps the Nemotron-RL-coding-competitive_coding dataset from NVIDIA, consisting of 16,083 competitive coding problems in Python sourced from CodeContests (DeepMind) and Codeforces (Open-R1). Each problem includes hidden unit tests for automated verification. Agents use CLI tools (bash, write, read, edit) to develop and test their Python solution in a sandbox, then submit the file path for evaluation against hidden test cases.
Capabilities
- Competitive programming problem solving
- Algorithm design and implementation
- Handling edge cases and constraints
- Producing correct I/O format from problem descriptions
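As an illustration of the stdin/stdout convention that competitive programming problems typically follow, here is a minimal sketch of a solution file. The toy problem (print the sum of each line of integers) and the names `solve` and `main` are hypothetical and are not taken from the dataset.

```python
import sys


def solve(text):
    """Toy problem (hypothetical): line 1 holds t, then t lines of integers.

    Returns one sum per test case, newline-separated.
    """
    lines = text.splitlines()
    t = int(lines[0])
    sums = []
    for i in range(1, t + 1):
        sums.append(str(sum(int(x) for x in lines[i].split())))
    return "\n".join(sums)


def main():
    # Read the whole input at once and write the whole answer at once.
    sys.stdout.write(solve(sys.stdin.read()) + "\n")


# A real solution file would end with a call to main().
# Here solve() runs on a literal sample so the sketch needs no piped input.
sample_output = solve("2\n1 2 3\n4 5\n")
print(sample_output)
```

Keeping the parsing and logic in a function that takes text, separate from the stdin wrapper, makes it easy to check sample inputs from the problem statement before submitting.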
Compute Requirements
Submitted code is executed in a sandbox with 0.5 CPUs and 1 GB of RAM. Each test case has a 10-second time limit.
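To get a rough local feel for the per-test time limit, an agent could run its solution under a timeout before submitting. The sketch below is an assumption about how one might do this with the standard library; it is not the environment's own harness, and it does not reproduce the CPU or memory limits. The helper name `run_with_limit` and the demo scripts are hypothetical.

```python
import os
import subprocess
import sys
import tempfile


def run_with_limit(solution_path, stdin_text, time_limit_s=10.0):
    """Run a Python solution on one input and return (status, stdout)."""
    try:
        proc = subprocess.run(
            [sys.executable, solution_path],
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=time_limit_s,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return "timeout", ""
    if proc.returncode != 0:
        return "runtime_error", proc.stdout
    return "ok", proc.stdout


# Demo with two throwaway scripts: one fast, one that never terminates.
with tempfile.TemporaryDirectory() as tmp:
    fast = os.path.join(tmp, "fast.py")
    with open(fast, "w") as f:
        f.write("print(int(input()) * 2)\n")
    slow = os.path.join(tmp, "slow.py")
    with open(slow, "w") as f:
        f.write("while True:\n    pass\n")
    fast_result = run_with_limit(fast, "21\n")
    slow_result = run_with_limit(slow, "", time_limit_s=0.5)

print(fast_result, slow_result)
```

Because the sandbox has only 0.5 CPUs, a solution that finishes close to 10 seconds on a faster machine may still time out during evaluation, so leaving headroom is prudent.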
License
The underlying dataset is listed as CC-BY-SA-4.0 in its citation entry (see Citations).
Tasks
There is one split: train (16,083 tasks). Each task presents a competitive programming problem. Problems are sourced from CodeContests (DeepMind) and Codeforces (Open-R1). Test cases per problem range from 1 to 430, with an average of ~59.
Reward Structure
This is a sparse-reward environment with continuous scoring: the only reward arrives at submission. The agent calls the submit tool once with a file path to its Python solution. The environment executes the code in a sandbox against hidden unit tests. The reward is the fraction of test cases passed:
reward = passed_tests / total_tests
Scores range from 0.0 to 1.0. We do not use LLM graders for this task. Verification is purely test-case-based.
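The fraction-of-tests-passed score can be sketched as follows. This is an illustrative helper, not the environment's implementation; the name `fraction_passed` and the handling of an empty result list are assumptions (the dataset reports at least one test per problem, so the empty case should not arise in practice).

```python
def fraction_passed(results):
    """Return the share of passed tests, given one boolean per test case."""
    if not results:
        # Assumption: treat "no tests" as zero reward rather than raising.
        return 0.0
    return sum(results) / len(results)


# Example: 3 of 4 hidden tests pass.
example_reward = fraction_passed([True, True, False, True])
print(example_reward)
```

Under this scoring, a solution that handles most cases but fails on a few edge cases or time limits still earns partial credit.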
Data
Problems are sourced from the Nemotron-RL-coding-competitive_coding dataset by NVIDIA, which is part of the NeMo Gym framework for reinforcement learning on LLMs. The original problems come from CodeContests and Codeforces. Data files are stored on the OpenReward platform.
Tools
CLI tools (inherited from CLIEnvironment):
- bash: Execute bash commands in the sandbox
- write: Write content to a file
- read: Read file contents
- edit: Perform exact string replacement in a file
- multi_edit: Perform multiple edits on a single file
- glob: Find files matching a glob pattern
- grep: Search for patterns in files
- ls: List files and directories
- todo_write: Manage a todo list for task planning
Evaluation tool:
- submit: Submit a Python solution file for evaluation against hidden test cases. Takes a file_path parameter pointing to the solution. The code is executed in the sandbox against hidden test cases. Returns the number of test cases passed and the reward. This tool can only be called once per task.
Time Horizon
Nemotron Competitive Coding is a multi-step environment. The agent receives a problem statement, develops and tests a solution using CLI tools, then submits the file path for evaluation.
Other Environment Requirements
Nemotron Competitive Coding requires an OpenReward API key (api_key secret) for sandbox execution. No OpenAI API key is needed.
Safety
Agents are asked to write Python solutions to competitive programming problems. Submitted code is executed in a sandboxed environment with network access blocked. The environment does not present direct safety risks.
Citations
@dataset{nvidia_nemotron_rl_coding,
author = {NVIDIA Corporation},
title = {Nemotron-RL-coding-competitive\_coding},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/datasets/nvidia/Nemotron-RL-coding-competitive_coding},
license = {CC-BY-SA-4.0}
}