Nemotron-RL-agent-workplace_assistant
Nemotron-RL-Workplace-Assistant
Description
Nemotron-RL-Workplace-Assistant is an agentic environment that evaluates whether a model can correctly execute workplace tasks using simulated business tools. Each task presents a natural language request (e.g., "reply to Carlos's last email about the task update") and the agent must invoke the correct sequence of tool calls with the correct arguments to fulfill it. The environment covers five workplace domains: email, calendar, project management, customer relationship management (CRM), and web analytics.
The tools are backed by simulated backends implemented as pandas DataFrames loaded from CSV data files. When the agent calls finish(), the resulting database state across all five domain backends is compared against the state produced by executing the ground-truth tool calls. This faithfully reproduces the state-based grading of NVIDIA's original NeMo Gym environment.
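The sketch below makes the backend description concrete with one DataFrame-backed tool domain. It is a minimal example, and its names are hypothetical rather than taken from NeMo Gym: the class name, method names, CSV columns and return strings are all illustrative. It shows three things:

- read tools only query the table;
- action tools mutate the table;
- every fresh backend reloads its CSV, so no state leaks between runs.

```python
import io

import pandas as pd

# Hypothetical CSV contents; the real NeMo Gym files and column names may differ.
EMAILS_CSV = """email_id,sender,recipient,subject,body
00000001,carlos@atlas.com,me@atlas.com,Task update,Status of task 42?
00000002,dana@atlas.com,me@atlas.com,Lunch,Are you free Friday?
"""


class EmailBackend:
    """Minimal sketch of a DataFrame-backed email tool (hypothetical API)."""

    def __init__(self, csv_text: str) -> None:
        # Each fresh backend re-reads the CSV; dtype=str keeps leading zeros in IDs.
        self.emails = pd.read_csv(io.StringIO(csv_text), dtype=str)

    def search_emails(self, query: str) -> list[dict]:
        # Read tool: inspects the table but never mutates it.
        mask = self.emails["subject"].str.contains(query, case=False)
        return self.emails[mask].to_dict("records")

    def delete_email(self, email_id: str) -> str:
        # Action tool: mutates the table, so it changes the graded state.
        before = len(self.emails)
        keep = self.emails["email_id"] != email_id
        self.emails = self.emails[keep].reset_index(drop=True)
        return "Email deleted." if len(self.emails) < before else "Email not found."


backend = EmailBackend(EMAILS_CSV)
print(backend.search_emails("task"))
print(backend.delete_email("00000002"))
print(len(backend.emails))
```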
Capabilities
- Multi-step agentic tool use across 27 workplace tools
- Action planning: determining which tools to call and in what order
- Argument accuracy: providing correct IDs, field names, values, and free-text content
- Five workplace domains: email, calendar, project management, CRM, analytics
License
Tasks
| Split | Tasks |
|---|---|
| train | 1,255 |
| validation | 545 |
Tasks are distributed across five categories:
| Category | Description |
|---|---|
| workplace_assistant_email | Send, reply, forward, delete, search emails |
| workplace_assistant_calendar | Create, update, delete, search calendar events |
| workplace_assistant_project_management | Create, update, delete, search project tasks |
| workplace_assistant_customer_relationship_manager | Add, update, delete, search CRM customers |
| workplace_assistant_analytics | Query visit counts, session durations, create plots |
The number of ground-truth tool calls per task ranges from 0 to 8, and most tasks require a single call.
Reward Structure
Reward is binary (0.0 or 1.0), determined by state-based comparison:
- The agent's recorded action tool calls are executed against fresh tool backends (pandas DataFrames loaded from CSV).
- The ground-truth tool calls are executed against a separate set of fresh tool backends.
- The resulting DataFrame states (email, calendar, analytics plots, project tasks, CRM) are compared using DataFrame.equals() after case normalization.
- If all five domain states match, reward = 1.0. Otherwise, reward = 0.0.
Read-only (information-gathering) tool calls do not change the graded state, so the agent is free to explore before acting.
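The grading procedure above can be sketched as follows. This is a simplified illustration that makes four assumptions:

- a call format of {"name": ..., "arguments": ...};
- two toy domains in place of the five real ones;
- hypothetical argument names (customer_id, field, new_value);
- lower-casing of string columns as the case normalization.

None of the helper names come from the environment's source.

```python
import pandas as pd

# Hypothetical sketch: two toy domains stand in for the five real ones, and the
# call format, column names, and tool semantics are assumptions.


def fresh_state() -> dict[str, pd.DataFrame]:
    """Stand-in for reloading every domain backend from its CSV file."""
    return {
        "email": pd.DataFrame({"email_id": ["1", "2"], "subject": ["Task update", "Lunch"]}),
        "crm": pd.DataFrame({"customer_id": ["c1"], "status": ["Lead"]}),
    }


def apply_call(state: dict[str, pd.DataFrame], call: dict) -> None:
    """Execute one tool call against the state; read-only tools change nothing."""
    name, args = call["name"], call["arguments"]
    if name == "email_delete_email":
        df = state["email"]
        state["email"] = df[df["email_id"] != args["email_id"]].reset_index(drop=True)
    elif name == "customer_relationship_manager_update_customer":
        df = state["crm"].copy()
        df.loc[df["customer_id"] == args["customer_id"], args["field"]] = args["new_value"]
        state["crm"] = df
    # Any other tool (e.g. a search) falls through without mutating state.


def normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Assumed case normalization: lower-case every string column."""
    return df.apply(lambda col: col.str.lower() if pd.api.types.is_string_dtype(col) else col)


def grade(agent_calls: list[dict], ground_truth_calls: list[dict]) -> float:
    agent_state, truth_state = fresh_state(), fresh_state()
    for call in agent_calls:
        apply_call(agent_state, call)
    for call in ground_truth_calls:
        apply_call(truth_state, call)
    # Reward 1.0 only if every domain's table matches after normalization.
    match = all(normalize(agent_state[d]).equals(normalize(truth_state[d])) for d in truth_state)
    return 1.0 if match else 0.0


ground_truth = [{"name": "customer_relationship_manager_update_customer",
                 "arguments": {"customer_id": "c1", "field": "status", "new_value": "Qualified"}}]
agent = [{"name": "customer_relationship_manager_search_customers", "arguments": {"query": "c1"}},
         {"name": "customer_relationship_manager_update_customer",
          "arguments": {"customer_id": "c1", "field": "status", "new_value": "qualified"}}]
print(grade(agent, ground_truth))
```

In this example the agent's extra search call mutates nothing, and its lower-case "qualified" matches "Qualified" after normalization. The episode therefore earns full reward. This illustrates why exploration is not penalized under state-based grading.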
Data
Data is sourced from nvidia/Nemotron-RL-agent-workplace_assistant on Hugging Face. CSV tool backend data is sourced from NVIDIA's NeMo Gym repository. The dataset is stored on the OpenReward platform.
Tools
| Tool | Type | Description |
|---|---|---|
| company_directory_find_email_address | Read | Find email addresses by name |
| email_get_email_information_by_id | Read | Get email details by ID |
| email_search_emails | Read | Search emails by query and date range |
| email_send_email | Action | Send a new email |
| email_delete_email | Action | Delete an email |
| email_forward_email | Action | Forward an email |
| email_reply_email | Action | Reply to an email |
| calendar_get_event_information_by_id | Read | Get calendar event details |
| calendar_search_events | Read | Search calendar events |
| calendar_create_event | Action | Create a calendar event |
| calendar_delete_event | Action | Delete a calendar event |
| calendar_update_event | Action | Update a calendar event field |
| analytics_get_visitor_information_by_id | Read | Get visitor analytics info |
| analytics_create_plot | Action | Create an analytics plot |
| analytics_total_visits_count | Read | Get the total visit count for a date range |
| analytics_engaged_users_count | Read | Get the engaged-user count for a date range |
| analytics_traffic_source_count | Read | Get traffic source counts |
| analytics_get_average_session_duration | Read | Get average session duration |
| project_management_get_task_information_by_id | Read | Get project task details |
| project_management_search_tasks | Read | Search project tasks |
| project_management_create_task | Action | Create a project task |
| project_management_delete_task | Action | Delete a project task |
| project_management_update_task | Action | Update a project task field |
| customer_relationship_manager_search_customers | Read | Search CRM customers |
| customer_relationship_manager_update_customer | Action | Update a CRM customer field |
| customer_relationship_manager_add_customer | Action | Add a new CRM customer |
| customer_relationship_manager_delete_customer | Action | Delete a CRM customer |
| finish | Control | Signal task completion and trigger grading |
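The tools table splits the 27 workplace tools into 13 Read tools and 14 Action tools, plus the finish control tool. The sketch below encodes that split and filters an episode's calls down to the action calls, which, per the Reward Structure section, are the ones replayed for grading. The tool sets are transcribed from the table. The helper names (tool_type, action_calls) and the {"name": ..., "arguments": ...} call format are hypothetical.

```python
# Tool-type sets transcribed from the Tools table; the helper functions and the
# {"name": ..., "arguments": ...} call format are hypothetical.
READ_TOOLS = frozenset({
    "company_directory_find_email_address",
    "email_get_email_information_by_id",
    "email_search_emails",
    "calendar_get_event_information_by_id",
    "calendar_search_events",
    "analytics_get_visitor_information_by_id",
    "analytics_total_visits_count",
    "analytics_engaged_users_count",
    "analytics_traffic_source_count",
    "analytics_get_average_session_duration",
    "project_management_get_task_information_by_id",
    "project_management_search_tasks",
    "customer_relationship_manager_search_customers",
})
ACTION_TOOLS = frozenset({
    "email_send_email",
    "email_delete_email",
    "email_forward_email",
    "email_reply_email",
    "calendar_create_event",
    "calendar_delete_event",
    "calendar_update_event",
    "analytics_create_plot",
    "project_management_create_task",
    "project_management_delete_task",
    "project_management_update_task",
    "customer_relationship_manager_update_customer",
    "customer_relationship_manager_add_customer",
    "customer_relationship_manager_delete_customer",
})
CONTROL_TOOLS = frozenset({"finish"})


def tool_type(name: str) -> str:
    """Return "read", "action", or "control" for a tool name."""
    if name in READ_TOOLS:
        return "read"
    if name in ACTION_TOOLS:
        return "action"
    if name in CONTROL_TOOLS:
        return "control"
    raise ValueError(f"unknown tool: {name}")


def action_calls(trajectory: list[dict]) -> list[dict]:
    """Keep state-changing calls in order; stop defensively at the first finish."""
    kept = []
    for call in trajectory:
        kind = tool_type(call["name"])
        if kind == "control":
            break
        if kind == "action":
            kept.append(call)
    return kept
```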
Time Horizon
Multi-turn agentic environment. The agent may call information-gathering tools before taking actions, then calls finish to end the episode.
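A multi-turn episode of this kind can be sketched as a loop with three steps:

- the agent picks a tool call;
- the tool runs and its result is fed back to the agent;
- this repeats until the agent calls finish.

Everything in the sketch is hypothetical: the Episode class, run_episode function, scripted policy, stub tools, argument names and max_turns cap. The source does not state a turn limit; the cap exists only so the sketch always terminates.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

# Hypothetical episode loop: names and the max_turns cap are illustrative only.


@dataclass
class Episode:
    calls: list = field(default_factory=list)         # every tool call, in order
    observations: list = field(default_factory=list)  # tool results fed back to the agent
    finished: bool = False                            # True once finish was called


def run_episode(
    policy: Callable[[Optional[Any]], dict],
    tools: dict[str, Callable[..., Any]],
    max_turns: int = 20,
) -> Episode:
    episode = Episode()
    observation = None
    for _ in range(max_turns):
        call = policy(observation)  # agent picks the next tool call
        episode.calls.append(call)
        if call["name"] == "finish":
            episode.finished = True  # grading would be triggered at this point
            break
        observation = tools[call["name"]](**call["arguments"])
        episode.observations.append(observation)
    return episode


# Usage: a scripted "agent" that looks something up, acts, then finishes.
script = iter([
    {"name": "email_search_emails", "arguments": {"query": "task update"}},
    {"name": "email_reply_email", "arguments": {"email_id": "00000001", "body": "Done."}},
    {"name": "finish", "arguments": {}},
])
stub_tools = {
    "email_search_emails": lambda query: [{"email_id": "00000001"}],
    "email_reply_email": lambda email_id, body: "Email replied.",
}
episode = run_episode(lambda obs: next(script), stub_tools)
print(episode.finished, len(episode.calls))
```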
Safety
This environment uses simulated workplace tools that do not connect to real services. There are no direct safety risks.
Citations
@misc{nvidia_nemotron_rl_workplace_assistant,
title={Nemotron-RL-agent-workplace_assistant},
author={NVIDIA},
year={2026},
url={https://huggingface.co/datasets/nvidia/Nemotron-RL-agent-workplace_assistant}
}