swe-gym

Description

SWE-Gym is the first environment and benchmark for training and evaluating real-world software engineering (SWE) agents. It comprises 2,438 real-world Python task instances, each consisting of a codebase with an executable runtime environment, unit tests, and a natural-language task specification. SWE-Gym is designed to train and evaluate language-model-based SWE agents and to enable inference-time verification experiments. It also supports reproducible research through the release of its dataset, models, and agent trajectories.
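To make the structure of a task instance more concrete, here is a minimal Python sketch. The field names (`instance_id`, `fail_to_pass`, `pass_to_pass`, and so on) and the `is_resolved` helper are hypothetical and loosely modeled on SWE-bench-style task records, not taken from the SWE-Gym release. The sketch assumes an instance counts as solved when every designated unit test passes after the agent's changes are applied to the codebase.

```python
from dataclasses import dataclass, field


@dataclass
class TaskInstance:
    """Hypothetical record for one task instance; field names are illustrative only."""
    instance_id: str
    repo: str                 # codebase the agent edits
    problem_statement: str    # natural-language task specification
    # Assumed split of the unit tests (SWE-bench-style convention, not confirmed for SWE-Gym):
    fail_to_pass: list[str] = field(default_factory=list)  # tests the fix should make pass
    pass_to_pass: list[str] = field(default_factory=list)  # tests that must keep passing


def is_resolved(task: TaskInstance, test_results: dict[str, bool]) -> bool:
    """Assumed success criterion: every listed test passes after the agent's patch.

    A test that is missing from test_results is counted as a failure.
    """
    required = task.fail_to_pass + task.pass_to_pass
    return all(test_results.get(name, False) for name in required)


# Usage with made-up test IDs:
task = TaskInstance(
    instance_id="example__repo-1",
    repo="example/repo",
    problem_statement="Fix the off-by-one error in paginate().",
    fail_to_pass=["tests/test_pages.py::test_last_page"],
    pass_to_pass=["tests/test_pages.py::test_first_page"],
)
print(is_resolved(task, {
    "tests/test_pages.py::test_last_page": True,
    "tests/test_pages.py::test_first_page": True,
}))  # True
```

Treating a missing result as a failure keeps the check conservative, which matters when the same pass/fail signal is reused for training rewards or for verification at inference time.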

Implementations (1)

Environment | Stars | Last Updated
jiayipan/SWE-Gym | 3 | 2 months ago