apex-agents

Name: arXiv/apex-agents
Author: arXiv

Description

AI Productivity Index for Agents (APEX-Agents) is a benchmark for assessing whether AI agents can execute long-horizon, cross-application tasks created by investment banking analysts, management consultants, and corporate lawyers, requiring agents to navigate realistic work environments with files and tools. It comprises 480 open-sourced tasks (prompts, rubrics, gold outputs, files, and metadata) and uses the Archipelago infrastructure for agent execution and evaluation.

Implementations (1)

Environment	Stars	Last Updated
GeneralReasoning/APEX-Agents	0	3 months ago