Endless Terminals

OpenReward Environment

Description

Endless Terminals is an environment for training and evaluating terminal agents on procedurally generated command-line tasks. The benchmark provides diverse terminal-use tasks spanning file operations, log management, data processing, scripting, and database operations. Tasks are designed for reinforcement learning with binary episode-level rewards.

This OpenReward implementation is based on the Endless Terminals repository. The original benchmark contains 3,255 tasks; this implementation includes 2,490 tasks.

Capabilities

  • File operations and management
  • Log analysis and processing
  • Data transformation and scripting
  • Database operations
  • System administration tasks
  • Multi-step terminal command execution

Compute Requirements

Each agent runs in a sandbox with 1 CPU and 2 GB of RAM. Each task runs in its own isolated Docker container, preloaded with task-specific files and tooling.

License

Apache 2.0

Tasks

There is one split in this environment:

  • train: 2,490 terminal-based tasks (subset of 3,255 in original benchmark)

Each task provides a containerized environment with specific files and objectives. Agents must execute terminal commands to transform the initial state into the expected final state.

Reward Structure

This is a sparse, verifiable reward environment. The reward is computed once per episode, when the agent submits its answer:

  • 1.0: All verification tests pass (final state matches expected)
  • 0.0: Any test fails

No LLM grader is used. Each task has pytest-based verification scripts (test_initial_state.py, test_final_state.py) that validate the container state.
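
The all-or-nothing rule above can be sketched in a few lines of Python. This is only an illustration, not the environment's actual grading code: the function names are hypothetical, and treating an empty result list as a failure is an assumption. The exit-code variant relies on pytest's documented behavior of exiting with code 0 only when all collected tests pass; whether tests/test.sh forwards that exit code is also an assumption.

```python
from typing import Iterable


def episode_reward(test_results: Iterable[bool]) -> float:
    """Return 1.0 only if every verification test passed, otherwise 0.0."""
    results = list(test_results)
    # Assumption: an episode with no test results is treated as a failure.
    if not results:
        return 0.0
    return 1.0 if all(results) else 0.0


def reward_from_exit_code(exit_code: int) -> float:
    """Map a pytest-style exit code to the binary reward.

    pytest exits with 0 only when all collected tests pass; any other
    code (failures, errors, no tests collected) maps to 0.0 here.
    """
    return 1.0 if exit_code == 0 else 0.0
```

For example, `episode_reward([True, True, False])` yields 0.0 because a single failing test zeroes the reward.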

Data

Task data is sourced from HuggingFace. Each task contains:

  • instruction.md: Task description and requirements
  • environment/Dockerfile: Task-specific container definition
  • environment/image_sha.txt: Docker image digest
  • tests/test_final_state.py: Pytest verification logic
  • tests/test.sh: Test execution wrapper
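
As a rough sanity check, a downloaded task directory can be compared against the file list above. The sketch below assumes each task lives in its own local directory with exactly these relative paths; the helper name `missing_task_files` is hypothetical and not part of the dataset tooling.

```python
from pathlib import Path

# Relative paths taken from the per-task file list in this README.
EXPECTED_FILES = (
    "instruction.md",
    "environment/Dockerfile",
    "environment/image_sha.txt",
    "tests/test_final_state.py",
    "tests/test.sh",
)


def missing_task_files(task_dir):
    """Return the expected files that are absent from task_dir, in list order."""
    root = Path(task_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

An empty return value means the directory contains every file named in the list.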

Tools

Agents have access to 5 tools:

  • bash: Execute bash commands in the container
  • view: View file contents or directory listings
  • str_replace: Replace unique strings in files
  • create_file: Create new files with specified content
  • submit_answer: Finalize task and run verification tests
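
To illustrate what "replace unique strings" might mean for str_replace, here is a minimal sketch that refuses to edit unless the target occurs exactly once. The exact-once requirement, the function name `replace_unique`, and the error behavior are assumptions for illustration; the environment's real tool may behave differently.

```python
def replace_unique(text: str, old: str, new: str) -> str:
    """Replace old with new only if old occurs exactly once in text."""
    if not old:
        # An empty target would match everywhere, so reject it outright.
        raise ValueError("target string must be non-empty")
    count = text.count(old)
    if count != 1:
        # Assumption: ambiguous (or missing) targets are rejected rather
        # than silently replacing the first match.
        raise ValueError(f"expected exactly one occurrence of {old!r}, found {count}")
    return text.replace(old, new, 1)
```

Rejecting ambiguous matches forces the caller to supply enough surrounding context to pin down a single edit location.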

Time Horizon

Endless Terminals is a multi-turn environment where agents iteratively execute commands, explore the file system, and modify state before submission.
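
The multi-turn structure can be pictured as a loop that keeps calling tools until the agent submits. The sketch below is not the OpenReward client API: `ToolCall`, `run_episode`, the single-string tool argument, and the `max_turns` cap are all hypothetical, and only the tool names come from the Tools section.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class ToolCall:
    name: str      # one of the tool names listed in this README
    argument: str  # hypothetical single-string payload for the tool


History = List[Tuple[ToolCall, str]]


def run_episode(
    policy: Callable[[History], ToolCall],
    tools: Dict[str, Callable[[str], str]],
    max_turns: int = 50,
) -> Tuple[History, bool]:
    """Run tool calls until the policy submits or the turn budget runs out.

    Returns the interaction history and whether submit_answer was called.
    """
    history: History = []
    for _ in range(max_turns):
        call = policy(history)
        if call.name == "submit_answer":
            # Submission ends the episode; verification would run here.
            return history, True
        observation = tools[call.name](call.argument)
        history.append((call, observation))
    return history, False
```

Because the reward arrives only at submission, every intermediate turn in this loop is unrewarded exploration or state modification.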

Environment Difficulty

Results from the original paper (dev set performance):

Model                    | Before RL | After RL
Llama-3.2-3B             | 4.0%      | 18.2%
Qwen2.5-7B               | 10.7%     | 53.3%
Qwen3-8B-openthinker-sft | 42.6%     | 59.0%

These RL gains are reported to transfer to human-curated benchmarks such as TerminalBench 2.0.
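
For a quick read of the results above, the absolute improvement per model is simply the after-RL score minus the before-RL score. The snippet below copies the three rows from this README; the names `RESULTS` and `absolute_gain` are illustrative only.

```python
# (before RL, after RL) dev-set scores in percent, copied from this README.
RESULTS = {
    "Llama-3.2-3B": (4.0, 18.2),
    "Qwen2.5-7B": (10.7, 53.3),
    "Qwen3-8B-openthinker-sft": (42.6, 59.0),
}


def absolute_gain(before: float, after: float) -> float:
    """Return the improvement in percentage points, rounded to one decimal."""
    return round(after - before, 1)


for model, (before, after) in RESULTS.items():
    print(f"{model}: +{absolute_gain(before, after)} points")
```

This gives gains of 14.2, 42.6, and 16.4 percentage points respectively, with Qwen2.5-7B showing the largest absolute gain of the three.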

Safety

Endless Terminals tasks are run in isolated Docker containers. Agents interact only with pre-defined task environments and cannot affect external systems or the host machine.

Citations

This environment implements the Endless Terminals benchmark. If you use this environment, please cite the original paper:

@article{gandhi2026endless,
  title     = {Endless Terminals: Scaling RL Environments for Terminal Agents},
  author    = {Gandhi, Kanishk and Garg, Shivam and Goodman, Noah D. and Papailiopoulos, Dimitris},
  journal   = {arXiv preprint arXiv:2601.16443},
  year      = {2026}
}