featurebench

Description

FeatureBench is a benchmark for evaluating agentic coding performance on end-to-end, feature-oriented software development, using an execution-based evaluation protocol. It uses a scalable, test-driven method that derives feature-level tasks automatically: starting from unit tests, it traces along dependency graphs to build executable environments that span multiple commits and PRs. The benchmark comprises 200 tasks and 3,825 environments from 24 repositories.
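
The idea of tracing from unit tests along a dependency graph can be illustrated with a minimal sketch. This is not FeatureBench's actual pipeline: the function `trace_feature`, the graph layout, and every node name below are hypothetical, and the sketch only assumes that a feature can be approximated as the set of code units reachable from its tests.

```python
from collections import deque


def trace_feature(dep_graph, test_nodes):
    """Return every node reachable from the given tests (breadth-first search).

    dep_graph maps a name to the names it depends on (hypothetical format).
    """
    seen = set(test_nodes)
    queue = deque(test_nodes)
    while queue:
        node = queue.popleft()
        for dep in dep_graph.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    # The tests themselves are the starting points, not part of the feature.
    return seen - set(test_nodes)


# Hypothetical toy graph: each key depends on the names in its list.
dep_graph = {
    "test_export_csv": ["export_csv"],
    "export_csv": ["format_row", "open_output"],
    "format_row": ["escape_field"],
    "test_parse": ["parse_line"],
}

feature = trace_feature(dep_graph, ["test_export_csv"])
print(sorted(feature))
# ['escape_field', 'export_csv', 'format_row', 'open_output']
```

In this toy setup, the traced set is what an agent would be asked to implement, and the originating tests are what would check the result by execution.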

Implementations (1)
Environment                    Stars  Last Updated
GeneralReasoning/FeatureBench  0      2 months ago