
Code Contests

OpenReward Environment

Description

Code Contests is an environment for evaluating AI agents on competitive programming problems. Based on DeepMind's CodeContests dataset, it requires agents to read problem statements, write Python solutions, and pass all test cases. Problems are sourced from platforms including Codeforces, AtCoder, CodeChef, and HackerEarth.

Capabilities

  • Algorithmic problem-solving
  • Data structure implementation
  • Mathematical reasoning and optimization
  • Input/output parsing and formatting
  • Code debugging and refinement
  • Competitive programming techniques

Compute Requirements

Agents are given a sandbox with 1 CPU and 2GB RAM. Each task runs in an isolated Docker container. Agent timeout is 900 seconds (15 minutes) and verifier timeout is 720 seconds (12 minutes).

License

MIT

Tasks

There is one split in this environment:

  • train: 9,644 competitive programming problems

Problems range from simple bracket matching to complex algorithmic challenges involving graph theory, dynamic programming, number theory, and combinatorics.

Reward Structure

This is a sparse, verifiable reward environment. Rewards are computed when the agent submits their answer:

  • 1.0: All test cases pass
  • 0.0: Any test case fails

No LLM grader is used. Solutions are validated against multiple input/output test cases using pytest. No partial credit is given.
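The verifier's actual harness is not reproduced here, but the pass/fail scheme can be sketched as follows. This is a minimal illustration, assuming test cases are a list of {"input", "output"} pairs; the real test_data.json schema and harness internals may differ.

```python
import subprocess
import sys

def check_solution(solution_path, cases):
    """Return 1.0 only if the candidate passes every case, else 0.0.

    Mirrors the sparse reward: any failing case (wrong output or crash)
    zeroes the reward, with no partial credit.
    """
    for case in cases:
        result = subprocess.run(
            [sys.executable, solution_path],
            input=case["input"],
            capture_output=True,
            text=True,
            timeout=30,  # illustrative per-case limit, not the environment's value
        )
        if result.returncode != 0 or result.stdout.strip() != case["output"].strip():
            return 0.0
    return 1.0
```

Trailing-whitespace-insensitive comparison is a common choice for competitive-programming judges, since problem statements rarely specify exact trailing newlines.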

Data

Each task contains:

  • instruction.md: Problem statement with constraints and examples
  • task.toml: Metadata (difficulty, tags, timeouts)
  • tests/test_data.json: Input/output test cases
  • tests/test_state.py: Pytest-based test harness
  • tests/test.sh: Test execution script

Agents must write their solution to /app/solution.py, which must read from stdin and write to stdout.
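To illustrate the stdin/stdout contract, here is what a solution.py might look like for a hypothetical bracket-matching problem (one of the simpler problem types in the dataset; the problem statement itself is invented for this example):

```python
import sys

def balanced(s: str) -> bool:
    """Check whether every bracket in s is closed in the correct order."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in s:
        if ch in "([{":
            stack.append(ch)
        elif ch in pairs:
            # A closer must match the most recent unmatched opener.
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack  # leftover openers mean the string is unbalanced

def main() -> None:
    s = sys.stdin.readline().strip()
    print("YES" if balanced(s) else "NO")

if __name__ == "__main__":
    main()
```

The key constraint is the I/O shape: read the input exactly as the problem specifies, print exactly the expected output, and nothing else, since the verifier compares stdout against the expected answer.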

Tools

Agents have access to 5 tools:

  • bash: Execute bash commands in the container
  • view: View file contents or directory listings
  • str_replace: Replace strings in files
  • create_file: Create new files with specified content
  • submit_answer: Submit solution and run test cases

Time Horizon

Code Contests is a multi-turn environment where agents read the problem, develop a solution iteratively, test with example cases, and submit for final evaluation.

[Statistics on average tool calls here]

Environment Difficulty

The original AlphaCode paper reports that, with up to a million samples per problem, AlphaCode solved 34.2% of problems on its validation set. On simulated Codeforces competitions, AlphaCode achieved an estimated average ranking within the top 54.3% of participants.

Safety

Code Contests tasks are run in isolated Docker containers. The environment focuses on algorithmic problem-solving and does not involve external network access or system-level operations.

Citations

This environment uses DeepMind's CodeContests dataset. If you use this environment, please cite the original paper:

@article{Li_2022,
  title={Competition-level code generation with AlphaCode},
  author={Li, Yujia and Choi, David and Chung, Junyoung and Kushman, Nate and Schrittwieser, Julian and Leblond, Rémi and Eccles, Tom and Keeling, James and Gimeno, Felix and Dal Lago, Agustin and Hubert, Thomas and Choy, Peter and de Masson d'Autume, Cyprien and Babuschkin, Igor and Chen, Xinyun and Huang, Po-Sen and Welbl, Johannes and Gowal, Sven and Cherepanov, Alexey and Molloy, James and Mankowitz, Daniel J. and Sutherland Robson, Esme and Kohli, Pushmeet and de Freitas, Nando and Kavukcuoglu, Koray and Vinyals, Oriol},
  journal={Science},
  publisher={American Association for the Advancement of Science (AAAS)},
  volume={378},
  number={6624},
  pages={1092--1097},
  year={2022},
  month=dec,
  ISSN={1095-9203},
  DOI={10.1126/science.abq1158},
  url={http://dx.doi.org/10.1126/science.abq1158}
}