apex-agents
Description
AI Productivity Index for Agents (APEX-Agents) is a benchmark for assessing whether AI agents can execute long-horizon, cross-application tasks created by investment banking analysts, management consultants, and corporate lawyers, requiring agents to navigate realistic work environments with files and tools. It comprises 480 open-sourced tasks (prompts, rubrics, gold outputs, files, and metadata) and uses the Archipelago infrastructure for agent execution and evaluation.
Leaderboard
Loading leaderboard...
Implementations (1)
| Environment | Stars | Last Updated | |
|---|---|---|---|
0 | 1 months ago |