featurebench

Name: arXiv/featurebench
Author: arXiv

Description

FeatureBench is a benchmark for evaluating agentic coding performance on end-to-end, feature-oriented software development, using an execution-based evaluation protocol. It uses a scalable, test-driven pipeline that automatically derives feature-level tasks by tracing from unit tests along dependency graphs, producing executable environments that span multiple commits and PRs (200 tasks and 3,825 environments drawn from 24 repositories).
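The idea of "tracing from unit tests along dependency graphs" can be illustrated with a small breadth-first traversal. This is a minimal sketch, not FeatureBench's actual pipeline: it assumes a precomputed call/dependency graph stored as a plain dict, and the names `trace_feature`, `dep_graph`, and `repo_symbols` are hypothetical. Starting from the test functions, it collects every repository symbol the tests transitively depend on; those symbols approximate the feature implementation an agent would be asked to (re)build.

```python
from collections import deque


def trace_feature(
    dep_graph: dict[str, list[str]],
    tests: list[str],
    repo_symbols: set[str],
) -> set[str]:
    """Hypothetical helper: return repository symbols reachable from `tests`.

    dep_graph maps a symbol to the symbols it calls or imports (assumed format).
    Only symbols in repo_symbols are kept and traversed further, so third-party
    calls (e.g. "json.dumps") and test-side helpers are not followed.
    """
    reached: set[str] = set()
    queue = deque(tests)  # BFS frontier, seeded with the unit tests
    while queue:
        node = queue.popleft()
        for dep in dep_graph.get(node, []):
            # Skip symbols already visited (handles cycles) and non-repo symbols.
            if dep in repo_symbols and dep not in reached:
                reached.add(dep)
                queue.append(dep)
    return reached


# Hypothetical example graph: one test exercising a CSV-export feature.
graph = {
    "test_export_csv": ["export_csv", "make_fixture"],
    "export_csv": ["format_row", "json.dumps"],
    "format_row": ["escape_field"],
    "escape_field": [],
}
repo = {"export_csv", "format_row", "escape_field", "unrelated_util"}
print(sorted(trace_feature(graph, ["test_export_csv"], repo)))
```

Restricting traversal to `repo_symbols` is the key design choice in this sketch: it keeps the extracted "feature" inside the repository under test and excludes library code and unrelated modules such as `unrelated_util`.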

Implementations (1)

Environment	Stars	Last Updated
GeneralReasoning/FeatureBench	0	3 months ago