SWE-Bench Pro
Description
SWE-Bench Pro is a contamination-resistant benchmark for evaluating autonomous software engineering agents on realistic, long-horizon, enterprise-level development tasks. It builds on SWE-Bench best practices and comprises 1,865 human-verified problems drawn from 41 actively maintained repositories, partitioned into public (11), held-out (12), and commercial (18) sets. Tasks call for multi-file patches that may take hours to days to complete, and the benchmark adds contextual augmentation to the problems along with a clustered failure-mode analysis.
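As a quick sanity check on the figures above, the following minimal sketch confirms that the three repository sets sum to 41 and shows each set's share of the repositories. The split labels (`public`, `held_out`, `commercial`) and the helper `split_shares` are hypothetical names chosen for illustration, not part of any official SWE-Bench Pro tooling. The description does not say how the 1,865 problems are distributed, so the per-repository average is only a rough figure.

```python
# Repository counts per set, taken from the description above.
# The dictionary keys are illustrative labels, not official identifiers.
REPOS_PER_SPLIT = {"public": 11, "held_out": 12, "commercial": 18}
TOTAL_REPOS = 41
TOTAL_PROBLEMS = 1865


def split_shares(repos_per_split):
    """Return each set's fraction of the total repository count."""
    total = sum(repos_per_split.values())
    return {name: count / total for name, count in repos_per_split.items()}


# The three sets should account for every repository.
assert sum(REPOS_PER_SPLIT.values()) == TOTAL_REPOS

shares = split_shares(REPOS_PER_SPLIT)
for name, share in shares.items():
    print(f"{name}: {share:.1%} of repositories")

# Rough average only; the actual per-repository distribution is not stated.
print(f"average problems per repository: {TOTAL_PROBLEMS / TOTAL_REPOS:.1f}")
```

The commercial set is the largest of the three by repository count, which is worth keeping in mind when comparing scores reported on different sets.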
Implementations (1)
| Environment | Stars | Last Updated |
|---|---|---|
| | 0 | 1 month ago |