Nemotron-RL-Agentic-Function-Calling-Pivot-v1
Description
Nemotron-RL-Agentic-Function-Calling-Pivot-v1 is an environment for evaluating agents on function-calling decision-making. It is based on the Nemotron-RL-Agentic-Function-Calling-Pivot-v1 dataset from NVIDIA, released as part of the NeMo Gym framework. The dataset poses each assistant step of an expert tool-use trajectory as a separate behavior cloning problem: the agent sees the conversation history and available tools, then must predict the correct next action -- either calling a specific function with the right arguments, or responding with a message.
Capabilities
- Deciding when to call a tool vs. respond with a message
- Selecting the correct function from a set of available tools
- Generating correct function arguments as JSON
- Multi-turn conversation comprehension
- Reasoning about tool capabilities relative to user requests
Compute Requirements
Nemotron-FC-Pivot does not require a sandbox. It has minimal compute requirements.
License
Tasks
There is one split with 9,620 tasks:
- train (9,620 tasks): Function-calling pivot points extracted from expert tool-use trajectories. Each task presents a conversation context and asks the agent to predict the correct next action.
Reward Structure
This is a sparse, binary reward environment matching the NeMo Gym ground truth verification. The agent makes a single submission per task:
- Function call tasks: Binary reward (0 or 1). The function name must match exactly. Arguments are compared recursively: dict keys must match, list lengths must match, floats are compared with a tolerance of 1e-6, short strings require an exact match, and longer strings are compared by Jaccard word-count similarity (threshold 0.1). Every check must pass for a reward of 1.0.
- Message tasks: Binary reward. Any chat message when a message was expected yields reward 1.0.
- Wrong action type: Calling a function when a message was expected (or vice versa) yields reward 0.0.
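The argument comparison described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the NeMo Gym verifier itself: the function names, the word-count cutoff that separates "short" from "longer" strings, the lowercasing before word counting, and the reading of the threshold as "similarity of at least 0.1 passes" are all assumptions, since the description does not specify them.

```python
from collections import Counter

FLOAT_TOL = 1e-6            # tolerance stated in the description
JACCARD_THRESHOLD = 0.1     # threshold stated in the description
SHORT_STRING_MAX_WORDS = 3  # ASSUMPTION: the real short/long cutoff is not documented


def jaccard_word_count(a: str, b: str) -> float:
    """Multiset Jaccard similarity over lowercased, whitespace-split words."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    union = sum((ca | cb).values())
    if union == 0:
        return 1.0  # two empty strings are treated as identical
    return sum((ca & cb).values()) / union


def values_match(expected, actual) -> bool:
    """Recursively compare an expected argument value with a predicted one."""
    if isinstance(expected, dict):
        return (isinstance(actual, dict)
                and expected.keys() == actual.keys()
                and all(values_match(expected[k], actual[k]) for k in expected))
    if isinstance(expected, list):
        return (isinstance(actual, list)
                and len(expected) == len(actual)
                and all(values_match(e, a) for e, a in zip(expected, actual)))
    if isinstance(expected, bool) or isinstance(actual, bool):
        # Keep booleans apart from numbers (bool is a subclass of int in Python).
        return (isinstance(expected, bool) and isinstance(actual, bool)
                and expected == actual)
    if isinstance(expected, float) or isinstance(actual, float):
        numeric = (int, float)
        return (isinstance(expected, numeric) and isinstance(actual, numeric)
                and abs(expected - actual) <= FLOAT_TOL)
    if isinstance(expected, str):
        if not isinstance(actual, str):
            return False
        if len(expected.split()) <= SHORT_STRING_MAX_WORDS:
            return expected == actual  # short strings: exact match
        # ASSUMPTION: similarity at or above the threshold counts as a pass.
        return jaccard_word_count(expected, actual) >= JACCARD_THRESHOLD
    return expected == actual


def score_function_call(expected_name, expected_args, name, args) -> float:
    """Binary reward: exact function name and all argument checks must pass."""
    return 1.0 if name == expected_name and values_match(expected_args, args) else 0.0
```

With a threshold as low as 0.1, a long free-text argument passes as long as it shares a modest fraction of its words with the reference, so in this sketch the strict checks fall on the function name, the argument structure, and short identifier-like strings.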
Data
Decision points are sourced from the Nemotron-RL-Agentic-Function-Calling-Pivot-v1 dataset by NVIDIA. The original dataset uses OpenAI Responses API format with expert trajectories. The download_data.py script downloads and normalises the data to a flat parquet format for efficient serving.
Tools
This environment uses task-specific tools. Each task dynamically exposes the actual tools from the dataset (e.g., get_balance_sheet, get_earnings, generateImageUrl) via list_task_tools(). The agent interacts with these tools through native function calling.
In addition, there is one shared tool:
- submit_message: Submit a text message response. Use when no function call is appropriate and the agent should respond directly to the user.
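Because a message response is itself submitted through a tool, the action type can be read off the name of the tool the agent called. The sketch below illustrates that idea only; the `ToolCall` container, the `classify_action` helper, and the argument keys in the usage lines (`ticker`, `message`) are hypothetical and not taken from the environment's code.

```python
import json
from dataclasses import dataclass

SUBMIT_MESSAGE = "submit_message"  # the shared tool named in this section


@dataclass
class ToolCall:
    """Hypothetical container for the single tool call an agent emits."""
    name: str
    arguments: str  # JSON-encoded arguments, as in native function calling


def classify_action(call: ToolCall) -> str:
    """Return 'message' for submit_message, 'function_call' for any task tool."""
    return "message" if call.name == SUBMIT_MESSAGE else "function_call"


def parse_arguments(call: ToolCall) -> dict:
    """Decode the JSON argument string into a dict for comparison."""
    return json.loads(call.arguments)


# Usage with made-up argument keys (assumptions for illustration only).
fc = ToolCall(name="get_earnings", arguments='{"ticker": "AAPL"}')
msg = ToolCall(name=SUBMIT_MESSAGE, arguments='{"message": "Which company do you mean?"}')
print(classify_action(fc), parse_arguments(fc))
print(classify_action(msg))
```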
Time Horizon
Nemotron-FC-Pivot is a single-turn environment. The agent receives a conversation context and submits one action. Each task requires exactly one tool call.
Other Environment Requirements
Nemotron-FC-Pivot does not require any API keys or secrets. All grading is rule-based.
Safety
Agents in Nemotron-FC-Pivot are asked to predict the next action in a synthetic conversation. The environment does not present direct safety risks, as agents only submit predictions with no access to external systems, real tools, or the internet.
Citations
@dataset{nvidia_nemotron_fc_pivot_v1,
author = {NVIDIA Corporation},
title = {Nemotron-RL-Agentic-Function-Calling-Pivot-v1},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/datasets/nvidia/Nemotron-RL-Agentic-Function-Calling-Pivot-v1},
license = {CC-BY-4.0}
}