API Endpoint

Leaderboard

Loading leaderboard...

README

Nemotron-Agentic

Name: NVIDIA/Nemotron-Agentic-v1
Author: NVIDIA

Description

Nemotron-Agentic is an environment for evaluating agents on agentic tool-use decision-making. It is based on the Nemotron-Agentic-v1 dataset from NVIDIA, consisting of 335,122 multi-turn conversations with tool use. Each assistant turn in a conversation is extracted as a decision point: the agent sees the conversation history so far and the available tools, then must predict the correct next action -- either calling a specific function or responding with a message.

Capabilities

Deciding when to call a tool vs. respond with a message
Selecting the correct function from a set of available tools
Generating correct function arguments as JSON
Multi-turn conversation comprehension
Reasoning about tool capabilities relative to user requests

Compute Requirements

Nemotron-Agentic does not require a sandbox. It has minimal compute requirements.

License

CC-BY-4.0.

Tasks

There are two splits with a total of 1,197,894 tasks:

tool_calling (1,127,100 tasks): General-purpose tool-calling scenarios with simulated multi-turn conversations. 50.7% function call / 49.3% message.
interactive_agent (70,794 tasks): Synthetic multi-turn agentic trajectories for conversational tool use. 42.4% function call / 57.6% message.

Each task presents the conversation history up to a specific assistant turn and asks the agent to predict the correct next action.

Reward Structure

This is a sparse reward environment with continuous scoring. The agent makes a single submission per task:

Function call tasks: Reward = 0.5 * (name match) + 0.5 * (argument match). Name match is binary (0 or 1). Argument match is the fraction of key-value pairs that match between expected and submitted arguments.
Message tasks: Reward is computed via LLM grading (gpt-5-mini) if an OpenAI API key is provided, or via keyword overlap fallback otherwise. Scores range from 0.0 to 1.0.
Wrong action type: Calling a function when a message was expected (or vice versa) yields reward 0.0.

Data

Decision points are extracted from the Nemotron-Agentic-v1 dataset by NVIDIA. The original dataset contains 335,122 multi-turn conversations with tool use in JSONL format. The download_data.py script processes each conversation to extract every assistant turn as a separate decision point, yielding ~1.2M tasks. Each pivot captures the conversation context (all messages before the assistant turn) and the expected action (function call or message).

Tools

This environment uses task-specific tools. Each task dynamically exposes the actual tools from the dataset (e.g., get_weather, search_flights, calculate_tip) via list_task_tools(). The agent interacts with these tools through native function calling.

In addition, there is one shared tool:

submit_message: Submit a text message response. Use when no function call is appropriate and the agent should respond directly to the user.

Time Horizon

Nemotron-Agentic is a single-turn environment. The agent receives a conversation context and submits one action. Each task requires exactly one tool call.

Other Environment Requirements

Nemotron-Agentic optionally accepts an OpenAI API key (openai_api_key secret) for LLM-based grading of message responses. Without it, a simple keyword-overlap fallback grader is used for message tasks. Function call tasks do not require an API key.

Safety

Agents in Nemotron-Agentic are asked to predict the next action in a synthetic conversation. The environment does not present direct safety risks, as agents only submit predictions with no access to external systems, real tools, or the internet.

Citations

@dataset{nvidia_nemotron_agentic_v1,
  author    = {NVIDIA Corporation},
  title     = {Nemotron-Agentic-Tool-Use-v1},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/datasets/nvidia/Nemotron-Agentic-v1},
  license   = {CC-BY-4.0}
}

Implementations

No implementations linked yet. Add one to showcase related work.

Repository

Source repository

EnvCommons/nemotron-agentic

Clone Repository

Tools

Tools available in the environment

No tools available for this environment, it probably hasn't been indexed yet.

Compute Configuration

Resource allocation for this environment.

Component	Configuration
Environment Server	1 vCPU / 4 GB RAM
Sandbox Machine	Not configured

Estimated Cost

Pay per second of active session usage. Billing starts when your session begins and stops when it ends.

Component	Cost / second
Environment	$0.0000320
Sandbox	Not configured
Total	$0.0000320

Examples

5-minute session$0.0096

1-hour session$0.1152