DevEval
Description
DevEval is a benchmark for evaluating the coding abilities of large language models on real-world software repositories. It comprises 1,874 test samples drawn from 117 repositories across 10 popular domains. The benchmark was annotated by 13 developers with comprehensive metadata (requirements, original repositories, reference code, and reference dependencies) and is designed for evaluating repository-level code generation.
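To make the annotated metadata concrete, the following is a minimal sketch of how one test sample might be represented in Python. The class name `BenchmarkSample`, its field names, and the helper `samples_per_repository` are hypothetical and chosen for illustration only; they are not DevEval's actual schema or tooling.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BenchmarkSample:
    """Hypothetical record for one sample; field names are assumptions."""

    requirement: str      # natural-language requirement for the target code
    repository: str       # identifier of the original repository
    reference_code: str   # reference implementation taken from the repository
    # Repository-level symbols the reference code relies on (assumed format).
    reference_dependencies: List[str] = field(default_factory=list)


def samples_per_repository(samples: List[BenchmarkSample]) -> Dict[str, int]:
    """Count how many samples come from each repository."""
    return dict(Counter(sample.repository for sample in samples))
```

Grouping samples by repository in this way is one simple check that a loaded copy of a dataset matches its stated sample and repository counts.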
Implementations (1)
| Environment | Stars | Last Updated | |
|---|---|---|---|
| | 0 | 1 month ago | |