USACO

API Endpoint
Leaderboard
Loading leaderboard...
Implementation of
README

USACO

OpenReward Environment

Description

USACO is an environment for evaluating competitive programming ability on problems from the USA Computing Olympiad. Agents must read algorithmic problem statements, reason about the approach, write Python code, test it in a sandbox, and submit for evaluation against hidden test cases.

Capabilities

  • Algorithmic reasoning and problem solving
  • Code generation (Python)
  • Iterative debugging and testing via sandbox
  • Handling I/O-based problem formats (stdin/stdout)

Compute Requirements

Server-side evaluation runs Python subprocesses with 512MB memory limit and 30s per-test timeout. Agents are given a sandboxed Docker environment with a pre-built instance image for each task. Default sandbox size is 0.5 CPU and 1 GB RAM. No network access.

Tasks

Single test split with 520 problems across four difficulty tiers:

  • Bronze: Ad hoc reasoning, simulation, basic data structures
  • Silver: Binary search, DFS/BFS, greedy algorithms
  • Gold: Dynamic programming, shortest paths, advanced data structures
  • Platinum: Computational geometry, segment trees, advanced graph theory

Each problem includes a full problem statement with I/O format, constraints, and sample test cases. Hidden test suites have 11-24 test cases per problem.

Reward Structure

Binary reward: 1.0 if all test cases pass, 0.0 otherwise. This matches standard USACO all-or-nothing grading. No partial credit.

Data

Source: codegenning/usacobench_formatted on HuggingFace (~3.8GB). Preprocessed into per-problem test case files and a lightweight metadata JSONL.

Tools

  • bash — Execute commands in a Python sandbox for experimentation
  • submit — Submit Python code for server-side evaluation (terminal action, one attempt)

Time Horizon

Multi-turn. Agents typically use 5-20 bash tool calls to write and test code before submitting.

Environment Difficulty

Based on the USACOBench paper, GPT-4 achieves ~8.7% pass@1 zero-shot, improving to ~20.2% with self-reflection and retrieval augmentation.

Safety

Problems are purely algorithmic. Code execution is sandboxed with resource limits (memory, time). Network access is blocked in the sandbox.

Citations

@misc{shi2024language,
    title={Can Language Models Solve Olympiad Programming?},
    author={Quan Shi and Michael Tang and Karthik Narasimhan and Shunyu Yao},
    year={2024},
    eprint={2404.10952},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
GeneralReasoning/USACO | OpenReward