Nemotron-RL-Agentic-Conversational-Tool-Use-Pivot-v1

Name: NVIDIA/Nemotron-RL-Agentic-Conversational-Tool-Use-Pivot-v1
Author: NVIDIA

Description

An environment for evaluating agents on tool-use decision-making in customer service scenarios, based on NVIDIA's Nemotron-RL-Agentic-Conversational-Tool-Use-Pivot-v1 dataset. Each task presents a multi-turn conversation and a set of available tools, stopped at a specific decision point. The agent must predict the correct next action: either calling a specific function with the right arguments or responding with a message.

Capabilities

Deciding when to call a tool vs. respond with a message
Selecting the correct function from a set of available tools
Generating correct function arguments as JSON
Multi-turn conversation comprehension
Following domain-specific policies and constraints
Reasoning about tool capabilities relative to user requests

Compute Requirements

This environment does not require a sandbox. It has minimal compute requirements.

License

CC-BY-4.0.

Tasks

There is one split with 96,968 tasks:

train (96,968 tasks): Pre-extracted decision points from multi-turn customer service conversations. 67.6% function call / 32.4% message.

Domains include senior care services, sports merchandise ordering, test prep platforms, organic farming supplies, renewable energy installations, and more.

Reward Structure

This is a sparse reward environment with continuous scoring. The agent makes a single submission per task:

Function call tasks: Reward = 0.5 * (name match) + 0.5 * (argument match). Name match is binary (0 or 1). Argument match is the fraction of key-value pairs that match between the expected and submitted arguments, with values compared as strings by exact or substring match. Note: ~20% of function call tasks are transfer/escalate functions with a single free-text summary argument. String matching is effectively uninformative for these, so the reward acts as a binary signal on function name correctness (0.0 or 0.5).
Message tasks: Reward is computed via LLM grading (gpt-5-mini) if an OpenAI API key is provided, or via keyword overlap fallback otherwise. Scores range from 0.0 to 1.0.
Wrong action type: Calling a function when a message was expected (or vice versa) yields reward 0.0.
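
The function call formula above can be sketched as follows. This is a minimal illustration, not the environment's actual grader. It assumes that the fraction is taken over the expected key-value pairs, that string comparison is case-insensitive, and that a wrong function name does not zero out the argument score (the formula is applied literally). The helper names `values_match`, `argument_match`, and `function_call_reward` are hypothetical.

```python
from typing import Any, Mapping


def values_match(expected: Any, submitted: Any) -> bool:
    """Assumed rule: exact equality, or substring containment for strings."""
    if expected == submitted:
        return True
    if isinstance(expected, str) and isinstance(submitted, str):
        e, s = expected.strip().lower(), submitted.strip().lower()
        # Guard against empty strings, which are substrings of everything.
        return bool(e) and bool(s) and (e in s or s in e)
    return False


def argument_match(expected: Mapping[str, Any], submitted: Mapping[str, Any]) -> float:
    """Fraction of expected key-value pairs that the submission matches."""
    if not expected:
        # Assumption: with no expected arguments, only an empty submission matches.
        return 1.0 if not submitted else 0.0
    hits = sum(
        1 for key, value in expected.items()
        if key in submitted and values_match(value, submitted[key])
    )
    return hits / len(expected)


def function_call_reward(
    expected_name: str,
    expected_args: Mapping[str, Any],
    submitted_name: str,
    submitted_args: Mapping[str, Any],
) -> float:
    """Reward = 0.5 * (name match) + 0.5 * (argument match)."""
    name_match = 1.0 if expected_name == submitted_name else 0.0
    return 0.5 * name_match + 0.5 * argument_match(expected_args, submitted_args)


expected = {"client_id": "C123", "date": "2025-01-05"}
submitted = {"client_id": "C123", "date": "2025-01-06"}
print(function_call_reward("schedule_service", expected, "schedule_service", submitted))  # 0.75
```

In this example the name matches (0.5) and one of two arguments matches (0.5 * 0.5), giving 0.75.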

Data

Decision points are sourced from the Nemotron-RL-Agentic-Conversational-Tool-Use-Pivot-v1 dataset by NVIDIA. Each row is a pre-pivoted decision point containing the conversation context (all messages before the decision point), available tools, and the expected action (function call or message). The download_data.py script fetches the JSONL from HuggingFace and converts it to parquet format locally.

Tools

This environment uses task-specific tools. Each task dynamically exposes the actual tools from the dataset (e.g., authenticate_client, check_coverage, schedule_service) via list_task_tools(). The agent interacts with these tools through native function calling.

In addition, there is one shared tool:

submit_message: Submit a text message response. Use when no function call is appropriate and the agent should respond directly to the user.
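
As a rough sketch of how a single tool call maps onto the two action types, the snippet below normalizes a native function call into either a message or a function call record. The record layout, the `classify_action` helper, and the `message` parameter name for submit_message are assumptions made for illustration; the environment's real schemas come from list_task_tools().

```python
import json
from typing import Any, Dict, Iterable

SUBMIT_MESSAGE = "submit_message"  # shared tool named in this README


def classify_action(
    tool_name: str, raw_arguments: str, task_tool_names: Iterable[str]
) -> Dict[str, Any]:
    """Turn one native tool call into a normalized action record."""
    arguments = json.loads(raw_arguments)  # arguments are assumed to arrive as a JSON string
    if tool_name == SUBMIT_MESSAGE:
        # "message" is a hypothetical parameter name for the text response.
        return {"type": "message", "content": arguments.get("message", "")}
    if tool_name in set(task_tool_names):
        return {"type": "function_call", "name": tool_name, "arguments": arguments}
    raise ValueError(f"Unknown tool for this task: {tool_name}")


task_tools = ["authenticate_client", "check_coverage", "schedule_service"]
action = classify_action("check_coverage", '{"client_id": "C123"}', task_tools)
print(action["type"])  # function_call
```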

Time Horizon

This is a single-turn environment. The agent receives a conversation context and submits one action. Each task requires exactly one tool call: either one of the task-specific functions or submit_message.

Other Environment Requirements

This environment optionally accepts an OpenAI API key (openai_api_key secret) for LLM-based grading of message responses. Without it, a simple keyword-overlap fallback grader is used for message tasks. Function call tasks do not require an API key.
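
For intuition, a keyword-overlap grader of the kind mentioned above might look like the sketch below. The README does not specify the fallback's tokenization or normalization, so everything here (lowercased alphanumeric tokens, score as the fraction of expected words present in the submission, the name `keyword_overlap_score`) is an assumption.

```python
import re
from typing import Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and keep unique alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_overlap_score(expected: str, submitted: str) -> float:
    """Fraction of expected words that also appear in the submitted message."""
    expected_words = tokenize(expected)
    if not expected_words:
        return 0.0  # assumption: nothing to match against scores zero
    return len(expected_words & tokenize(submitted)) / len(expected_words)


score = keyword_overlap_score(
    "Your appointment is confirmed",
    "The appointment is confirmed for Monday",
)
print(score)  # 0.75
```

A score like this stays within the 0.0 to 1.0 range but ignores word order and paraphrase, which is one reason LLM-based grading is offered when an API key is available.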

Safety

Agents are asked to predict the next action in a synthetic conversation. The environment does not present direct safety risks, as agents only submit predictions with no access to external systems, real tools, or the internet.

Citations

@dataset{nvidia_nemotron_rl_agentic_pivot_v1,
  author    = {NVIDIA Corporation},
  title     = {Nemotron-RL-Agentic-Conversational-Tool-Use-Pivot-v1},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/datasets/nvidia/Nemotron-RL-Agentic-Conversational-Tool-Use-Pivot-v1},
  license   = {CC-BY-4.0}
}

Repository

Source repository

EnvCommons/Nemotron-RL-Agentic-Conversational-Tool-Use-Pivot-v1

Compute Configuration

Resource allocation for this environment.

Component	Configuration
Environment Server	1 vCPU / 4 GB RAM
Sandbox Machine	Not configured

Estimated Cost

Pay per second of active session usage. Billing starts when your session begins and stops when it ends.

Component	Cost / second
Environment	$0.0000320
Sandbox	Not configured
Total	$0.0000320

Examples

5-minute session: $0.0096

1-hour session: $0.1152
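
The example figures follow directly from the per-second rate. A small sketch of the arithmetic, assuming billing is simply the rate multiplied by the session length in seconds (the `session_cost` helper is hypothetical):

```python
RATE_PER_SECOND = 0.0000320  # total cost per second from the table above, in USD


def session_cost(seconds: float) -> float:
    """Cost of a session, rounded to four decimal places."""
    return round(RATE_PER_SECOND * seconds, 4)


print(session_cost(5 * 60))   # 0.0096
print(session_cost(60 * 60))  # 0.1152
```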