Nemotron-Agentic-v1

API Endpoint
Leaderboard
Loading leaderboard...
README

Nemotron-Agentic

OpenReward Environment Hugging Face Dataset

Description

Nemotron-Agentic is an environment for evaluating agents on agentic tool-use decision-making. It is based on the Nemotron-Agentic-v1 dataset from NVIDIA, consisting of 335,122 multi-turn conversations with tool use. Each assistant turn in a conversation is extracted as a decision point: the agent sees the conversation history so far and the available tools, then must predict the correct next action -- either calling a specific function or responding with a message.

Capabilities

  • Deciding when to call a tool vs. respond with a message
  • Selecting the correct function from a set of available tools
  • Generating correct function arguments as JSON
  • Multi-turn conversation comprehension
  • Reasoning about tool capabilities relative to user requests

Compute Requirements

Nemotron-Agentic does not require a sandbox. It has minimal compute requirements.

License

CC-BY-4.0.

Tasks

There are two splits with a total of 1,197,894 tasks:

  • tool_calling (1,127,100 tasks): General-purpose tool-calling scenarios with simulated multi-turn conversations. 50.7% function call / 49.3% message.
  • interactive_agent (70,794 tasks): Synthetic multi-turn agentic trajectories for conversational tool use. 42.4% function call / 57.6% message.

Each task presents the conversation history up to a specific assistant turn and asks the agent to predict the correct next action.

Reward Structure

This is a sparse reward environment with continuous scoring. The agent makes a single submission per task:

  • Function call tasks: Reward = 0.5 * (name match) + 0.5 * (argument match). Name match is binary (0 or 1). Argument match is the fraction of key-value pairs that match between expected and submitted arguments.
  • Message tasks: Reward is computed via LLM grading (gpt-5-mini) if an OpenAI API key is provided, or via keyword overlap fallback otherwise. Scores range from 0.0 to 1.0.
  • Wrong action type: Calling a function when a message was expected (or vice versa) yields reward 0.0.

Data

Decision points are extracted from the Nemotron-Agentic-v1 dataset by NVIDIA. The original dataset contains 335,122 multi-turn conversations with tool use in JSONL format. The download_data.py script processes each conversation to extract every assistant turn as a separate decision point, yielding ~1.2M tasks. Each pivot captures the conversation context (all messages before the assistant turn) and the expected action (function call or message).

Tools

This environment uses task-specific tools. Each task dynamically exposes the actual tools from the dataset (e.g., get_weather, search_flights, calculate_tip) via list_task_tools(). The agent interacts with these tools through native function calling.

In addition, there is one shared tool:

  • submit_message: Submit a text message response. Use when no function call is appropriate and the agent should respond directly to the user.

Time Horizon

Nemotron-Agentic is a single-turn environment. The agent receives a conversation context and submits one action. Each task requires exactly one tool call.

Other Environment Requirements

Nemotron-Agentic optionally accepts an OpenAI API key (openai_api_key secret) for LLM-based grading of message responses. Without it, a simple keyword-overlap fallback grader is used for message tasks. Function call tasks do not require an API key.

Safety

Agents in Nemotron-Agentic are asked to predict the next action in a synthetic conversation. The environment does not present direct safety risks, as agents only submit predictions with no access to external systems, real tools, or the internet.

Citations

@dataset{nvidia_nemotron_agentic_v1,
  author    = {NVIDIA Corporation},
  title     = {Nemotron-Agentic-Tool-Use-v1},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/datasets/nvidia/Nemotron-Agentic-v1},
  license   = {CC-BY-4.0}
}
Implementations

No implementations linked yet. Add one to showcase related work.

NVIDIA/Nemotron-Agentic-v1 | OpenReward