Nemotron-RL-agent-workplace_assistant

Nemotron-RL-Workplace-Assistant

Description

Nemotron-RL-Workplace-Assistant is an agentic environment that evaluates whether a model can correctly execute workplace tasks using simulated business tools. Each task presents a natural language request (e.g., "reply to Carlos's last email about the task update") and the agent must invoke the correct sequence of tool calls with the correct arguments to fulfill it. The environment covers five workplace domains: email, calendar, project management, customer relationship management (CRM), and web analytics.

The tools are backed by functional simulated backends built on pandas DataFrames loaded from CSV data files, so action tools actually modify backend state. When the agent calls finish(), the resulting database state across all five domain backends is compared against the state produced by executing the ground-truth tool calls. This reproduces the state-based grading used in NVIDIA's original NeMo Gym environment.
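The paragraph above says each tool is backed by a pandas DataFrame loaded from CSV. A minimal sketch of what one such backend might look like follows. The class name, CSV columns, ID format, and method signatures are all hypothetical; only the general idea (read tools query a DataFrame, action tools mutate it) comes from the description.

```python
import io

import pandas as pd

# Hypothetical CSV contents; the real environment loads its own CSV files
# from NeMo Gym, whose columns may differ.
EMAILS_CSV = """email_id,sender,recipient,subject,body
00000001,carlos@example.com,me@example.com,Task update,The task is now in review.
00000002,dana@example.com,me@example.com,Lunch,Are you free at noon?
"""


class EmailBackend:
    """Hypothetical email backend whose entire state is one DataFrame."""

    def __init__(self, csv_text: str):
        # dtype=str keeps IDs such as "00000001" from being parsed as integers.
        self.emails = pd.read_csv(io.StringIO(csv_text), dtype=str)

    def search_emails(self, query: str) -> pd.DataFrame:
        # Read-only: returns matching rows without modifying self.emails.
        mask = self.emails["subject"].str.contains(query, case=False)
        return self.emails[mask]

    def reply_email(self, email_id: str, body: str) -> str:
        # Action: appends a reply row, which changes the backend state.
        original = self.emails.loc[self.emails["email_id"] == email_id].iloc[0]
        new_id = f"{len(self.emails) + 1:08d}"
        reply = {
            "email_id": new_id,
            "sender": original["recipient"],
            "recipient": original["sender"],
            "subject": "Re: " + original["subject"],
            "body": body,
        }
        self.emails = pd.concat([self.emails, pd.DataFrame([reply])], ignore_index=True)
        return new_id


backend = EmailBackend(EMAILS_CSV)
hits = backend.search_emails("task update")
new_id = backend.reply_email(hits.iloc[0]["email_id"], "Thanks, I'll take a look.")
print(new_id, len(backend.emails))  # 00000003 3
```

Because the search leaves `backend.emails` untouched while the reply appends a row, only the reply would show up in a later state comparison.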

Capabilities

  • Multi-step agentic tool use across 27 workplace tools
  • Action planning: determining which tools to call and in what order
  • Argument accuracy: providing correct IDs, field names, values, and free-text content
  • Five workplace domains: email, calendar, project management, CRM, analytics

License

CC-BY-4.0.

Tasks

| Split | Tasks |
|---|---|
| train | 1,255 |
| validation | 545 |

Tasks are distributed across five categories:

| Category | Description |
|---|---|
| workplace_assistant_email | Send, reply, forward, delete, search emails |
| workplace_assistant_calendar | Create, update, delete, search calendar events |
| workplace_assistant_project_management | Create, update, delete, search project tasks |
| workplace_assistant_customer_relationship_manager | Add, update, delete, search CRM customers |
| workplace_assistant_analytics | Query visit counts, session durations, create plots |

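The number of ground-truth tool calls per task ranges from 0 to 8, and most tasks require a single call. For a task with zero ground-truth calls, the expected final state is the unmodified initial state.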

Reward Structure

Reward is binary (0.0 or 1.0), determined by state-based comparison:

  1. The agent's recorded action tool calls are executed against fresh tool backends (pandas DataFrames loaded from CSV).
  2. The ground truth tool calls are executed against separate fresh tool backends.
  3. The resulting DataFrame states (email, calendar, analytics plots, project tasks, CRM) are compared using DataFrame.equals() after case normalization.
  4. If all five domain states match, reward = 1.0. Otherwise, reward = 0.0.
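Steps 3 and 4 above can be sketched as follows. This assumes "case normalization" means lowercasing string cells, which is a guess about the environment's exact rule, and it uses only two hypothetical domain tables instead of all five to keep the example short; `normalize_case` and `state_reward` are illustrative names.

```python
import pandas as pd


def normalize_case(df: pd.DataFrame) -> pd.DataFrame:
    """Lowercase string cells; non-string cells are left unchanged.

    Assumed reading of "case normalization"; the real rule may differ.
    """
    out = df.copy()
    for col in out.columns:
        out[col] = out[col].map(lambda v: v.lower() if isinstance(v, str) else v)
    return out


def state_reward(agent_state: dict, expected_state: dict) -> float:
    """Return 1.0 only if every domain table matches after case normalization."""
    if agent_state.keys() != expected_state.keys():
        return 0.0
    for domain, expected in expected_state.items():
        if not normalize_case(agent_state[domain]).equals(normalize_case(expected)):
            return 0.0
    return 1.0


# Hypothetical final states after replaying ground-truth and agent calls.
expected = {
    "crm": pd.DataFrame({"customer_id": ["c1"], "status": ["Qualified"]}),
    "calendar": pd.DataFrame({"event_id": ["e1"], "name": ["Sync"]}),
}
agent = {
    "crm": pd.DataFrame({"customer_id": ["c1"], "status": ["qualified"]}),
    "calendar": pd.DataFrame({"event_id": ["e1"], "name": ["Sync"]}),
}
wrong = {**agent, "crm": pd.DataFrame({"customer_id": ["c1"], "status": ["Lost"]})}

print(state_reward(agent, expected), state_reward(wrong, expected))  # 1.0 0.0
```

Returning 0.0 on the first mismatching domain reflects the all-or-nothing rule: a single wrong field in any table is enough to lose the whole reward.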

Read-only (information-gathering) tool calls do not modify backend state and therefore do not affect grading, so the agent is free to explore before acting.
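One way to see why exploration is free: if only action calls are replayed, two trajectories that differ only in their read-only calls yield the same replay list. The sketch below takes the action tool names from the Tools table; representing a trajectory as `(name, arguments)` pairs and the argument keys shown are assumptions made for illustration.

```python
# Action tools from the Tools table; every other workplace tool is read-only,
# and finish is a control call.
ACTION_TOOLS = frozenset({
    "email_send_email",
    "email_delete_email",
    "email_forward_email",
    "email_reply_email",
    "calendar_create_event",
    "calendar_delete_event",
    "calendar_update_event",
    "analytics_create_plot",
    "project_management_create_task",
    "project_management_delete_task",
    "project_management_update_task",
    "customer_relationship_manager_update_customer",
    "customer_relationship_manager_add_customer",
    "customer_relationship_manager_delete_customer",
})


def calls_to_replay(trajectory):
    """Keep only state-changing calls, in their original order."""
    return [(name, args) for name, args in trajectory if name in ACTION_TOOLS]


# Hypothetical argument keys; the real tool schemas may differ.
direct = [
    ("email_reply_email", {"email_id": "00000001", "body": "Thanks!"}),
    ("finish", {}),
]
exploratory = [
    ("company_directory_find_email_address", {"name": "Carlos"}),
    ("email_search_emails", {"query": "task update"}),
    ("email_get_email_information_by_id", {"email_id": "00000001"}),
    ("email_reply_email", {"email_id": "00000001", "body": "Thanks!"}),
    ("finish", {}),
]

# Extra lookups do not change what gets replayed, so they cannot change the reward.
assert calls_to_replay(direct) == calls_to_replay(exploratory)
```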

Data

Data is sourced from nvidia/Nemotron-RL-agent-workplace_assistant on Hugging Face. The CSV data behind the tool backends comes from NVIDIA's NeMo Gym repository. The dataset is stored on the OpenReward platform.

Tools

| Tool | Type | Description |
|---|---|---|
| company_directory_find_email_address | Read | Find email addresses by name |
| email_get_email_information_by_id | Read | Get email details by ID |
| email_search_emails | Read | Search emails by query, date range |
| email_send_email | Action | Send a new email |
| email_delete_email | Action | Delete an email |
| email_forward_email | Action | Forward an email |
| email_reply_email | Action | Reply to an email |
| calendar_get_event_information_by_id | Read | Get calendar event details |
| calendar_search_events | Read | Search calendar events |
| calendar_create_event | Action | Create a calendar event |
| calendar_delete_event | Action | Delete a calendar event |
| calendar_update_event | Action | Update a calendar event field |
| analytics_get_visitor_information_by_id | Read | Get visitor analytics info |
| analytics_create_plot | Action | Create an analytics plot |
| analytics_total_visits_count | Read | Get total visits for date range |
| analytics_engaged_users_count | Read | Get engaged users for date range |
| analytics_traffic_source_count | Read | Get traffic source counts |
| analytics_get_average_session_duration | Read | Get average session duration |
| project_management_get_task_information_by_id | Read | Get project task details |
| project_management_search_tasks | Read | Search project tasks |
| project_management_create_task | Action | Create a project task |
| project_management_delete_task | Action | Delete a project task |
| project_management_update_task | Action | Update a project task field |
| customer_relationship_manager_search_customers | Read | Search CRM customers |
| customer_relationship_manager_update_customer | Action | Update a CRM customer field |
| customer_relationship_manager_add_customer | Action | Add a new CRM customer |
| customer_relationship_manager_delete_customer | Action | Delete a CRM customer |
| finish | Control | Signal task completion and trigger grading |
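Every tool name in the table starts with a domain prefix, so a dispatcher could route calls to the right backend by prefix. The sketch below is a hypothetical illustration of that naming convention, not the environment's actual dispatch code; the short domain labels such as "crm" and "control" are made up for the example.

```python
# Domain prefixes observed in the tool names above; the labels on the right
# are illustrative backend keys, not names used by the environment.
DOMAIN_PREFIXES = [
    ("customer_relationship_manager_", "crm"),
    ("project_management_", "project_management"),
    ("company_directory_", "company_directory"),
    ("analytics_", "analytics"),
    ("calendar_", "calendar"),
    ("email_", "email"),
]


def route(tool_name: str) -> tuple[str, str]:
    """Split a tool name into (domain, operation), e.g. 'email_reply_email' -> ('email', 'reply_email')."""
    if tool_name == "finish":
        return ("control", "finish")
    for prefix, domain in DOMAIN_PREFIXES:
        if tool_name.startswith(prefix):
            return (domain, tool_name[len(prefix):])
    raise ValueError(f"unknown tool: {tool_name}")


print(route("project_management_update_task"))  # ('project_management', 'update_task')
```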

Time Horizon

Multi-turn agentic environment. The agent may call information-gathering tools before taking actions, and then calls finish() to end the episode and trigger grading.
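The episode structure described above can be sketched as a simple loop: the policy proposes tool calls until it emits finish, at which point grading runs. The `policy`, `execute`, and `grade` callables are hypothetical stand-ins; the turn limit and the 0.0 reward for never calling finish are assumptions, since the description specifies neither.

```python
from typing import Any, Callable

ToolCall = tuple[str, dict]


def run_episode(
    policy: Callable[[list], ToolCall],
    execute: Callable[[str, dict], Any],
    grade: Callable[[list], float],
    max_turns: int = 20,
) -> float:
    """Call tools until the policy emits finish, then return the grade.

    max_turns and the 0.0 fallback are assumptions for this sketch.
    """
    history = []
    for _ in range(max_turns):
        name, args = policy(history)
        if name == "finish":
            return grade(history)
        history.append((name, args, execute(name, args)))
    return 0.0  # assumed: no finish() within the turn budget earns no reward


# Scripted policy: one lookup, one action, then finish.
script = iter([
    ("email_search_emails", {"query": "task update"}),
    ("email_reply_email", {"email_id": "00000001", "body": "Thanks!"}),
    ("finish", {}),
])
reward = run_episode(
    policy=lambda history: next(script),
    execute=lambda name, args: {"status": "ok"},
    # Toy grader standing in for the state comparison: reward any reply.
    grade=lambda history: float(any(name == "email_reply_email" for name, _, _ in history)),
)
print(reward)  # 1.0
```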

Safety

This environment uses simulated workplace tools that do not connect to real services. There are no direct safety risks.

Citations

@misc{nvidia_nemotron_rl_workplace_assistant,
  title={Nemotron-RL-agent-workplace_assistant},
  author={NVIDIA},
  year={2026},
  url={https://huggingface.co/datasets/nvidia/Nemotron-RL-agent-workplace_assistant}
}
