API Endpoint

Leaderboard

Loading leaderboard...

Implementation of

arXiv/usaco

README

USACO

Description

USACO is an environment for evaluating competitive programming ability on problems from the USA Computing Olympiad. Agents must read algorithmic problem statements, reason about the approach, write Python code, test it in a sandbox, and submit for evaluation against hidden test cases.

Capabilities

Algorithmic reasoning and problem solving
Code generation (Python)
Iterative debugging and testing via sandbox
Handling I/O-based problem formats (stdin/stdout)

Compute Requirements

Server-side evaluation runs Python subprocesses with 512MB memory limit and 30s per-test timeout. Agents are given a sandboxed Docker environment with a pre-built instance image for each task. Default sandbox size is 0.5 CPU and 1 GB RAM. No network access.

Tasks

Single test split with 520 problems across four difficulty tiers:

Bronze: Ad hoc reasoning, simulation, basic data structures
Silver: Binary search, DFS/BFS, greedy algorithms
Gold: Dynamic programming, shortest paths, advanced data structures
Platinum: Computational geometry, segment trees, advanced graph theory

Each problem includes a full problem statement with I/O format, constraints, and sample test cases. Hidden test suites have 11-24 test cases per problem.

Reward Structure

Binary reward: 1.0 if all test cases pass, 0.0 otherwise. This matches standard USACO all-or-nothing grading. No partial credit.

Data

Source: codegenning/usacobench_formatted on HuggingFace (~3.8GB). Preprocessed into per-problem test case files and a lightweight metadata JSONL.

Tools

bash — Execute commands in a Python sandbox for experimentation
submit — Submit Python code for server-side evaluation (terminal action, one attempt)

Time Horizon

Multi-turn. Agents typically use 5-20 bash tool calls to write and test code before submitting.

Environment Difficulty

Based on the USACOBench paper, GPT-4 achieves ~8.7% pass@1 zero-shot, improving to ~20.2% with self-reflection and retrieval augmentation.

Safety

Problems are purely algorithmic. Code execution is sandboxed with resource limits (memory, time). Network access is blocked in the sandbox.

Citations

@misc{shi2024language,
    title={Can Language Models Solve Olympiad Programming?},
    author={Quan Shi and Michael Tang and Karthik Narasimhan and Shunyu Yao},
    year={2024},
    eprint={2404.10952},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Repository

Source repository

EnvCommons/USACO

Clone Repository

Tools

Tools available in the environment

No tools available for this environment, it probably hasn't been indexed yet.

Compute Configuration

Resource allocation for this environment.

Component	Configuration
Environment Server	2 vCPUs / 8 GB RAM
Sandbox Machine	0.5 vCPUs / 1 GB RAM

Estimated Cost

Pay per second of active session usage. Billing starts when your session begins and stops when it ends.

Component	Cost / second
Environment	$0.0000640
Sandbox	$0.0000115
Total	$0.0000755

Examples

5-minute session$0.0226

1-hour session$0.2718

USACO

GeneralReasoning/USACO

USACO

Description

Capabilities

Compute Requirements

Tasks

Reward Structure

Data

Tools

Time Horizon

Environment Difficulty

Safety

Citations

Repository

Clone Repository

Tools

Compute Configuration

Estimated Cost

Examples