featurebench
Description
FeatureBench is a benchmark for evaluating agentic coding performance on end-to-end, feature-oriented software development, using an execution-based evaluation protocol. Tasks are derived automatically by a scalable, test-driven method: starting from unit tests, it traces along dependency graphs to extract feature-level tasks and builds executable environments that can span multiple commits and PRs. The benchmark contains 200 tasks and 3,825 environments drawn from 24 repositories.
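The description does not specify the graph format or the exact tracing rules. The sketch below only illustrates the general idea of "tracing from unit tests along dependency graphs": a breadth-first search from a test function over a call graph, keeping the repository functions it reaches. It assumes the call graph is a plain dictionary mapping each function name to the names it calls. `trace_feature`, `CallGraph`, `repo_symbols`, and every function name in the example are hypothetical and are not part of FeatureBench.

```python
from collections import deque

# Hypothetical call graph: each function name maps to the names it calls.
CallGraph = dict[str, set[str]]


def trace_feature(call_graph: CallGraph, test: str, repo_symbols: set[str]) -> set[str]:
    """Collect repository functions transitively reachable from a unit test.

    Breadth-first search over `call_graph`, starting at `test`. Only callees
    listed in `repo_symbols` are followed, so third-party and builtin calls
    (e.g. `open` and the hypothetical `assert_equal` below) are excluded.
    """
    seen = {test}
    queue = deque([test])
    while queue:
        node = queue.popleft()
        for callee in call_graph.get(node, ()):
            if callee in repo_symbols and callee not in seen:
                seen.add(callee)
                queue.append(callee)
    seen.discard(test)  # the test itself is not part of the feature
    return seen


# Toy example with hypothetical function names.
graph: CallGraph = {
    "test_export_csv": {"export_csv", "assert_equal"},
    "export_csv": {"format_row", "open"},
    "format_row": {"escape_field"},
    "escape_field": set(),
    "unrelated": {"format_row"},
}
repo = {"export_csv", "format_row", "escape_field", "unrelated"}

print(sorted(trace_feature(graph, "test_export_csv", repo)))
# ['escape_field', 'export_csv', 'format_row']
```

Restricting the traversal to repository symbols keeps library code out of the traced set, and `unrelated` is excluded because it calls into the feature rather than being called by the test. In a sketch like this, the reached functions approximate the code a feature-level task would cover; how FeatureBench actually selects that code and builds its environments is not described on this page.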
Implementations (1)
| Environment | Stars | Last Updated |
|---|---|---|
| | 0 | 2 months ago |