nl2repobench
Description
NL2Repo Bench (Natural Language to Repository Benchmark) evaluates how well coding agents generate a complete repository over a long horizon, measuring whether they can sustain coherent reasoning, planning, and execution across an extended interaction. Given only a single natural-language requirements document and an empty workspace, an agent must autonomously design the architecture, manage dependencies, implement multi-module logic, and produce a fully installable Python library.
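To make the "fully installable Python library" requirement concrete, here is a minimal sketch of a structural pre-check that a harness might run on an agent's workspace before attempting a real install. It is an illustration only and not part of NL2Repo Bench: the function name `looks_installable`, the `PACKAGING_FILES` list, and the rule "one packaging file plus at least one non-`setup.py` module" are all assumptions.

```python
from pathlib import Path

# Hypothetical list of files that mark a directory as a Python project.
# This is an assumption for illustration, not the benchmark's own criterion.
PACKAGING_FILES = ("pyproject.toml", "setup.py", "setup.cfg")


def looks_installable(workspace: Path) -> bool:
    """Cheap structural check: packaging metadata plus at least one module.

    This does not run an installer; it only inspects the directory layout.
    """
    # At least one recognised packaging file must sit at the workspace root.
    has_packaging = any((workspace / name).is_file() for name in PACKAGING_FILES)

    # At least one Python source file other than setup.py must exist somewhere
    # in the tree, otherwise there is nothing to install.
    has_source = any(
        path.name != "setup.py" for path in workspace.rglob("*.py")
    )

    return has_packaging and has_source
```

An empty workspace fails this check, and so does one containing only `setup.py`. A workspace with a `pyproject.toml` and a package directory holding an `__init__.py` passes. A real evaluation would still need to install the package and run tests, which this sketch deliberately leaves out.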
Implementations (1)
| Environment | Stars | Last Updated |
|---|---|---|
| | 0 | 2 months ago |