skillsbench
Description
SkillsBench is a benchmark of 86 tasks across 11 domains, paired with curated Skills and deterministic verifiers, for measuring whether Agent Skills improve LLM agents at inference time. Each task is evaluated under three conditions (no Skills, curated Skills, and self-generated Skills) across seven agent-model configurations, for 7,308 trajectories in total. The evaluation quantifies how pass rates change when Skills are provided and compares curated Skills with self-generated ones.
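As an illustration of the comparison described above, here is a minimal sketch of how pass-rate changes relative to the no-Skills baseline could be computed from verifier outcomes. It is not SkillsBench's own code. The `Trajectory` record, the condition labels, and the choice to pool all trajectories per (configuration, condition) pair are assumptions made for this example. The toy records are invented inputs, not benchmark results.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical labels for the three evaluation conditions (assumed, not from SkillsBench).
CONDITIONS = ("no_skills", "curated", "self_generated")


@dataclass(frozen=True)
class Trajectory:
    task_id: str
    config: str     # agent-model configuration (hypothetical label)
    condition: str  # one of CONDITIONS
    passed: bool    # outcome reported by the task's deterministic verifier


def pass_rates(trajs):
    """Pooled pass rate for each (config, condition) pair (pooling is an assumption)."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for t in trajs:
        key = (t.config, t.condition)
        totals[key] += 1
        passes[key] += int(t.passed)
    return {key: passes[key] / totals[key] for key in totals}


def skill_deltas(trajs):
    """Pass-rate change of each Skills condition versus the same config's no-Skills baseline."""
    rates = pass_rates(trajs)
    deltas = {}
    for (config, condition), rate in rates.items():
        if condition == "no_skills":
            continue
        baseline = rates.get((config, "no_skills"))
        if baseline is not None:  # skip configs that have no baseline runs
            deltas[(config, condition)] = rate - baseline
    return deltas


# Toy input for illustration only (not SkillsBench data).
toy = [
    Trajectory("t1", "cfg-a", "no_skills", False),
    Trajectory("t2", "cfg-a", "no_skills", True),
    Trajectory("t1", "cfg-a", "curated", True),
    Trajectory("t2", "cfg-a", "curated", True),
    Trajectory("t1", "cfg-a", "self_generated", False),
    Trajectory("t2", "cfg-a", "self_generated", True),
]
print(skill_deltas(toy))
# → {('cfg-a', 'curated'): 0.5, ('cfg-a', 'self_generated'): 0.0}
```

Comparing curated with self-generated Skills then amounts to subtracting the two deltas for the same configuration. A real analysis might instead average per task or per trial before pooling; that choice changes the numbers and is left open here.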
Implementations (1)
| Environment | Stars | Last Updated |
|---|---|---|
| | 13 | 1 month ago |