swe-gym
Description
SWE-Gym is the first environment and benchmark for training and evaluating real-world software engineering (SWE) agents. It comprises 2,438 real-world Python task instances, each of which includes a codebase with an executable runtime environment, unit tests, and a natural-language task specification. It is designed for training and evaluating language-model-based SWE agents and for inference-time verification experiments. The dataset, models, and agent trajectories are released to support reproducible research.
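To make the shape of a task instance concrete, here is a minimal sketch in Python. The class name, field names, and the `is_resolved` helper are hypothetical and chosen for illustration only; they are not SWE-Gym's actual schema or API. The "all listed tests pass" success rule is likewise an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class TaskInstance:
    """Hypothetical container for one task; field names are illustrative only."""

    instance_id: str
    repo_path: str           # checkout of the codebase
    environment_image: str   # reference to the executable runtime environment
    task_description: str    # natural-language task specification
    test_ids: list[str] = field(default_factory=list)  # unit tests for the task


def is_resolved(instance: TaskInstance, test_results: dict[str, bool]) -> bool:
    """Assumed success rule: every listed unit test must have passed.

    A test that is missing from `test_results` counts as a failure, and an
    instance with no tests is never considered resolved.
    """
    return bool(instance.test_ids) and all(
        test_results.get(test_id, False) for test_id in instance.test_ids
    )
```

An agent's patch would be applied to the codebase, the tests run inside the runtime environment, and the resulting pass/fail map handed to a check like `is_resolved`.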
Implementations (1)
| Environment | Stars | Last Updated |
|---|---|---|
| | 3 | 2 months ago |