SWE-BenchPro

Name: ScaleAI/SWE-BenchPro
Author: ScaleAI

Description

SWE-Bench Pro is a contamination-resistant benchmark for evaluating autonomous software engineering agents on realistic, long-horizon, enterprise-level development tasks. It builds on SWE-Bench best practices and comprises 1,865 human-verified problems drawn from 41 actively maintained repositories, partitioned into public (11 repositories), held-out (12) and commercial (18) sets. Tasks call for multi-file patches representing hours to days of work; the benchmark also provides contextual augmentation of the problems and a clustered failure-mode analysis.

arXiv

Implementations (1)

Environment	Stars	Last Updated
GeneralReasoning/SWE-Bench-Pro	0	3 months ago