
SWE-Gym-Lite

OpenReward Environment

Description

SWE-Gym-Lite is an environment for evaluating agents on real-world software engineering tasks. Based on SWE-Gym, it presents agents with GitHub issues from popular Python repositories and requires them to modify the codebase to resolve each issue. Each task includes a sandboxed repository with pre-installed dependencies and executable test verification.

This OpenReward implementation is ported from the Harbor Framework version originally written by tangken333.

Capabilities

  • Resolving real-world GitHub issues
  • Understanding and navigating large codebases
  • Writing and modifying Python code
  • Debugging and test-driven development

Compute Requirements

Agents are given a sandboxed Docker environment. Default sandbox size is 1 CPU and 2 GB RAM.

License

MIT.

Tasks

There is one split in this environment:

  • Test: 185 GitHub issues across 10 Python repositories

Tasks span the following repositories:

  • mypy (41 tasks): Python static type checker
  • moto (36 tasks): AWS service mocking library
  • dvc (29 tasks): Data version control
  • monai (27 tasks): Medical imaging deep learning
  • pydantic (19 tasks): Data validation library
  • conan (11 tasks): C/C++ package manager
  • dask (10 tasks): Parallel computing library
  • hydra (9 tasks): Configuration framework
  • pandas (2 tasks): Data analysis library
  • bokeh (1 task): Interactive visualization library

Each task presents a GitHub issue with a bug report or feature request. The agent must understand the issue, locate the relevant code, and submit a patch that passes the repository's test suite.

Reward Structure

This is a multi-turn environment with binary reward:

  • 1.0 — All relevant tests pass after applying the agent's patch
  • 0.0 — Tests fail or patch cannot be applied

Verification follows the SWE-Bench evaluation protocol. The test harness applies the agent's solution patch, runs the repository's test suite on the affected tests, and checks that:

  1. Tests that were failing before the fix now pass (FAIL_TO_PASS)
  2. Tests that were passing before remain passing (PASS_TO_PASS)
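The two checks above can be sketched as a small function. This is an illustration of the verification contract, not the harness's actual API; `run_test` and the test-ID lists are hypothetical stand-ins:

```python
# Sketch of the SWE-Bench-style binary reward, assuming a run_test
# callable that returns True when a single named test passes.
# All names here are illustrative, not the harness's real interface.

def compute_reward(fail_to_pass, pass_to_pass, run_test):
    # FAIL_TO_PASS: tests that reproduced the bug must now pass.
    resolved = all(run_test(t) for t in fail_to_pass)
    # PASS_TO_PASS: previously passing tests must not regress.
    no_regressions = all(run_test(t) for t in pass_to_pass)
    return 1.0 if (resolved and no_regressions) else 0.0
```

Because the reward is binary, a patch that fixes the issue but breaks even one previously passing test still scores 0.0.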

Data

Data consists of 185 task directories, each containing an instruction file describing the GitHub issue, solution files for oracle verification, and a test harness. Tasks are derived from the SWE-Gym Lite split.
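A loader for such a task directory might look like the sketch below. The file names (`instruction.md`, `solution.patch`, `tests/`) are assumptions for illustration, not the dataset's documented layout:

```python
from pathlib import Path

# Hypothetical task-directory layout (file names are assumptions):
#   task_dir/
#     instruction.md   - GitHub issue text shown to the agent
#     solution.patch   - oracle solution used for verification
#     tests/           - test harness files

def load_task(task_dir: str) -> dict:
    """Read one task directory into a plain dict."""
    root = Path(task_dir)
    return {
        "instruction": (root / "instruction.md").read_text(),
        "solution": (root / "solution.patch").read_text(),
        "tests": sorted(p.name for p in (root / "tests").iterdir()),
    }
```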

Tools

  • bash: Run bash commands in the sandbox container.
  • str_replace: Replace a unique string in a file with another string.
  • view: View file contents or directory listings.
  • create_file: Create a new file with specified content.
  • submit_answer: Submit work for verification. Runs the test harness and returns the reward.
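The str_replace tool's key constraint is uniqueness: the old string must occur exactly once in the file, so the edit site is unambiguous. A minimal sketch of that contract (an illustration, not the tool's actual implementation):

```python
# Sketch of str_replace semantics: the old string must occur exactly
# once, otherwise the edit is rejected. Illustrative only.

def str_replace(text: str, old: str, new: str) -> str:
    count = text.count(old)
    if count != 1:
        raise ValueError(f"expected exactly one occurrence of {old!r}, found {count}")
    return text.replace(old, new)
```

In practice this pushes agents to quote enough surrounding context to make the target string unique before editing.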

Time Horizon

SWE-Gym-Lite is a multi-turn environment. Agents explore the repository, understand the bug, locate relevant code, implement a fix, and verify with tests before submitting.

Environment Difficulty

The SWE-Gym paper (ICML 2025) reports the following resolve rates on SWE-Bench Verified and SWE-Bench Lite:

  • SWE-Gym fine-tuned (32B): 32.0% Verified, 26.0% Lite
  • GPT-4o (OpenHands): 21.8% Verified, 18.4% Lite
  • Claude 3.5 Sonnet (OpenHands): 30.2% Verified, 24.8% Lite

Fine-tuning on SWE-Gym trajectories yields up to +14% absolute gains over base agent performance.

Other Environment Requirements

There are no external API key requirements; SWE-Gym-Lite works out of the box with the OpenReward endpoint.

Safety

Agents in SWE-Gym-Lite modify code within isolated Docker containers. The environment does not involve production systems or external network access beyond the sandbox.

Citations

@inproceedings{pan2025swegym,
  author    = {Jiayi Pan and Xingyao Wang and Graham Neubig and Navdeep Jaitly and Heng Ji and Alane Suhr and Yizhe Zhang},
  title     = {Training Software Engineering Agents and Verifiers with SWE-Gym},
  booktitle = {Proceedings of the International Conference on Machine Learning (ICML)},
  year      = {2025},
  url       = {https://arxiv.org/abs/2412.21139}
}