Nemotron-RL-Agentic-Function-Calling-Pivot-v1

Name: NVIDIA/Nemotron-RL-Agentic-Function-Calling-Pivot-v1
Author: NVIDIA

Description

Nemotron-RL-Agentic-Function-Calling-Pivot-v1 is an environment for evaluating agents on function-calling decision-making. It is based on the Nemotron-RL-Agentic-Function-Calling-Pivot-v1 dataset from NVIDIA, released as part of the NeMo Gym framework. The dataset poses each assistant step of an expert tool-use trajectory as a separate behavior cloning problem: the agent sees the conversation history and available tools, then must predict the correct next action -- either calling a specific function with the right arguments, or responding with a message.

Capabilities

Deciding when to call a tool vs. respond with a message
Selecting the correct function from a set of available tools
Generating correct function arguments as JSON
Multi-turn conversation comprehension
Reasoning about tool capabilities relative to user requests

Compute Requirements

Nemotron-FC-Pivot does not require a sandbox. It has minimal compute requirements.

License

CC-BY-4.0.

Tasks

There is one split with 9,620 tasks:

train (9,620 tasks): Function-calling pivot points extracted from expert tool-use trajectories. Each task presents a conversation context and asks the agent to predict the correct next action.

Reward Structure

This is a sparse, binary-reward environment that matches the NeMo Gym ground-truth verification. The agent makes a single submission per task, which is scored as follows:

Function call tasks: Binary reward (0.0 or 1.0). The function name must match exactly. Arguments are compared recursively: dict keys must match, list lengths must match, floats are compared with a tolerance of 1e-6, short strings require an exact match, and longer strings are compared by Jaccard word-count similarity (threshold 0.1). Every check must pass for a reward of 1.0.
Message tasks: Binary reward. Any chat message when a message was expected yields reward 1.0.
Wrong action type: Calling a function when a message was expected (or vice versa) yields reward 0.0.
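
The rules above can be sketched in Python. This is an illustrative reading of the description, not the NeMo Gym verifier itself. The action dictionary layout (`type`, `name`, `arguments`), the function names, and the `SHORT_STRING_MAX_WORDS` cutoff between "short" and "longer" strings are assumptions, since the description does not state them. "Jaccard word-count similarity" is interpreted here as a multiset Jaccard over whitespace-separated words.

```python
from collections import Counter
from typing import Any

FLOAT_TOL = 1e-6            # tolerance stated in the reward description
JACCARD_THRESHOLD = 0.1     # threshold stated in the reward description
SHORT_STRING_MAX_WORDS = 3  # ASSUMPTION: the real short/long cutoff is not documented


def word_jaccard(a: str, b: str) -> float:
    """Multiset Jaccard similarity over lowercased, whitespace-separated words."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    union = sum((ca | cb).values())
    if union == 0:
        return 1.0  # two empty strings are treated as identical
    return sum((ca & cb).values()) / union


def values_match(expected: Any, actual: Any) -> bool:
    """Recursively compare an expected argument value with a submitted one."""
    if isinstance(expected, dict):
        return (isinstance(actual, dict)
                and expected.keys() == actual.keys()
                and all(values_match(expected[k], actual[k]) for k in expected))
    if isinstance(expected, list):
        return (isinstance(actual, list)
                and len(expected) == len(actual)
                and all(values_match(e, a) for e, a in zip(expected, actual)))
    if isinstance(expected, bool) or isinstance(actual, bool):
        # Keep booleans out of the numeric branch below.
        return type(expected) is type(actual) and expected == actual
    if isinstance(expected, float) or isinstance(actual, float):
        numeric = (int, float)
        return (isinstance(expected, numeric) and isinstance(actual, numeric)
                and abs(expected - actual) <= FLOAT_TOL)
    if isinstance(expected, str):
        if not isinstance(actual, str):
            return False
        if len(expected.split()) <= SHORT_STRING_MAX_WORDS:
            return expected == actual
        return word_jaccard(expected, actual) >= JACCARD_THRESHOLD
    return expected == actual


def score(expected: dict, submitted: dict) -> float:
    """Return the binary reward for one submission (hypothetical action layout)."""
    if expected["type"] != submitted["type"]:
        return 0.0  # wrong action type
    if expected["type"] == "message":
        return 1.0  # any message is accepted when a message was expected
    ok = (expected["name"] == submitted["name"]
          and values_match(expected["arguments"], submitted["arguments"]))
    return 1.0 if ok else 0.0
```

For example, under these assumptions a call to `get_earnings` with the expected arguments scores 1.0, while the same arguments sent to `get_balance_sheet` score 0.0 because the function name differs.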

Data

Decision points are sourced from the Nemotron-RL-Agentic-Function-Calling-Pivot-v1 dataset by NVIDIA. The original dataset uses OpenAI Responses API format with expert trajectories. The download_data.py script downloads and normalises the data to a flat parquet format for efficient serving.

Tools

This environment uses task-specific tools. Each task dynamically exposes the actual tools from the dataset (e.g., get_balance_sheet, get_earnings, generateImageUrl) via list_task_tools(). The agent interacts with these tools through native function calling.

In addition, there is one shared tool:

submit_message: Submit a text message response. Use when no function call is appropriate and the agent should respond directly to the user.

Time Horizon

Nemotron-FC-Pivot is a single-turn environment. The agent receives a conversation context and submits one action. Each task requires exactly one tool call.

Other Environment Requirements

Nemotron-FC-Pivot does not require any API keys or secrets. All grading is rule-based.

Safety

Agents in Nemotron-FC-Pivot are asked to predict the next action in a synthetic conversation. The environment does not present direct safety risks, as agents only submit predictions with no access to external systems, real tools, or the internet.

Citations

@dataset{nvidia_nemotron_fc_pivot_v1,
  author    = {NVIDIA Corporation},
  title     = {Nemotron-RL-Agentic-Function-Calling-Pivot-v1},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/datasets/nvidia/Nemotron-RL-Agentic-Function-Calling-Pivot-v1},
  license   = {CC-BY-4.0}
}

Repository

Source repository

EnvCommons/Nemotron-RL-Agentic-Function-Calling-Pivot-v1

Compute Configuration

Resource allocation for this environment.

Component	Configuration
Environment Server	1 vCPU / 4 GB RAM
Sandbox Machine	Not configured

Estimated Cost

Pay per second of active session usage. Billing starts when your session begins and stops when it ends.

Component	Cost / second
Environment	$0.0000320
Sandbox	Not configured
Total	$0.0000320

Examples

5-minute session: $0.0096
1-hour session: $0.1152
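
The session examples above follow from multiplying the per-second rate by the session length in seconds. A minimal sketch of that arithmetic, where `session_cost` is a hypothetical helper name and the rate is the Total figure from the cost table:

```python
RATE_PER_SECOND = 0.0000320  # USD per second, the Total row of the cost table


def session_cost(seconds: float, rate_per_second: float = RATE_PER_SECOND) -> float:
    """Estimated cost in USD for a session of the given length, rounded to 4 decimals."""
    return round(seconds * rate_per_second, 4)


five_minutes = session_cost(5 * 60)   # 300 seconds
one_hour = session_cost(60 * 60)      # 3600 seconds
print(f"5-minute session: ${five_minutes:.4f}")
print(f"1-hour session: ${one_hour:.4f}")
```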