apex-agents

Description

AI Productivity Index for Agents (APEX-Agents) is a benchmark that assesses whether AI agents can execute long-horizon, cross-application tasks. The tasks were created by investment banking analysts, management consultants, and corporate lawyers, and they require agents to navigate realistic work environments with files and tools. The benchmark comprises 480 open-sourced tasks (prompts, rubrics, gold outputs, files, and metadata) and uses the Archipelago infrastructure for agent execution and evaluation.
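As a rough illustration of the task components listed above, the sketch below models one task as a Python record. The class name `ApexTask`, its field names and types, the helper `missing_components`, and the sample values are all hypothetical; the actual schema of the released tasks may differ.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ApexTask:
    """Hypothetical container for one task; field names are assumptions."""
    prompt: str                     # instruction given to the agent
    rubric: list[str]               # criteria used to judge the output
    gold_output: str                # reference answer for the task
    files: list[str] = field(default_factory=list)          # paths to input files
    metadata: dict[str, Any] = field(default_factory=dict)  # e.g. profession


def missing_components(task: ApexTask) -> list[str]:
    """Return the names of required components that are empty."""
    required = {
        "prompt": task.prompt,
        "rubric": task.rubric,
        "gold_output": task.gold_output,
    }
    return [name for name, value in required.items() if not value]


# Made-up example values, for illustration only.
example = ApexTask(
    prompt="Summarize the attached contract.",
    rubric=["Identifies the governing law clause"],
    gold_output="A short summary of the contract.",
    files=["contract.docx"],
    metadata={"profession": "corporate lawyer"},
)
```

A loader could run a check such as `missing_components(example)` on each record before passing it to an agent, so that incomplete tasks are caught early.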

Implementations (1)
Environment                   Stars  Last Updated
GeneralReasoning/APEX-Agents  0      1 month ago