code-contests
Code Contests
Description
Code Contests is an environment for evaluating AI agents on competitive programming problems. Built on DeepMind's CodeContests dataset, it requires agents to read problem statements, write Python solutions, and pass all test cases. Problems are sourced from platforms including Codeforces, AtCoder, CodeChef, and HackerEarth.
Capabilities
- Algorithmic problem-solving
- Data structure implementation
- Mathematical reasoning and optimization
- Input/output parsing and formatting
- Code debugging and refinement
- Competitive programming techniques
Compute Requirements
Agents are given a sandbox with 1 CPU and 2 GB of RAM. Each task runs in an isolated Docker container. The agent timeout is 900 seconds (15 minutes) and the verifier timeout is 720 seconds (12 minutes).
License
Tasks
There is one split in this environment:
- train: 9,644 competitive programming problems
Problems range from simple bracket matching to complex algorithmic challenges involving graph theory, dynamic programming, number theory, and combinatorics.
Reward Structure
This is a sparse, verifiable reward environment. Rewards are computed when the agent submits its answer:
- 1.0: All test cases pass
- 0.0: Any test case fails
No LLM grader is used. Solutions are validated against multiple input/output test cases using pytest. No partial credit is given.
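To make the verification flow concrete, the following is a minimal sketch of how such a pytest harness could check a solution. The actual tests/test_state.py shipped with each task may differ; the test_data.json field names ("inputs"/"outputs") and the per-case timeout are assumptions.

```python
# Illustrative sketch only; field names and per-case timeout are assumptions.
import json
import subprocess
from pathlib import Path

def test_solution_passes_all_cases():
    data = json.loads(Path("tests/test_data.json").read_text())
    for stdin_text, expected in zip(data["inputs"], data["outputs"]):
        result = subprocess.run(
            ["python3", "/app/solution.py"],
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=60,  # hypothetical per-case limit
        )
        # All-or-nothing reward: any mismatched case fails the whole test.
        assert result.stdout.strip() == expected.strip()
```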
Data
Each task contains:
- instruction.md: Problem statement with constraints and examples
- task.toml: Metadata (difficulty, tags, timeouts)
- tests/test_data.json: Input/output test cases
- tests/test_state.py: Pytest-based test harness
- tests/test.sh: Test execution script
Agents must write their solution to /app/solution.py, which reads from stdin and writes to stdout.
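As an illustration of this contract (not a problem drawn from the dataset), a minimal /app/solution.py for a simple bracket-matching task might look like:

```python
# Hypothetical example showing the stdin/stdout contract: read a bracket
# sequence from stdin and print whether it is balanced.
import sys

def main():
    s = sys.stdin.readline().strip()
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                break
    print("YES" if depth == 0 else "NO")

if __name__ == "__main__":
    main()
```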
Tools
Agents have access to 5 tools:
- bash: Execute bash commands in the container
- view: View file contents or directory listings
- str_replace: Replace strings in files
- create_file: Create new files with specified content
- submit_answer: Submit solution and run test cases
Time Horizon
Code Contests is a multi-turn environment where agents read the problem, develop a solution iteratively, test with example cases, and submit for final evaluation.
[Statistics on average tool calls here]
Environment Difficulty
The original AlphaCode paper reports that, with up to a million samples per problem, AlphaCode solved 34.2% of problems on its validation set. In Codeforces competitions, AlphaCode achieved an average ranking in the top 54.3% of participants.
Safety
Code Contests tasks are run in isolated Docker containers. The environment focuses on algorithmic problem-solving and does not involve external network access or system-level operations.
Citations
This environment uses DeepMind's CodeContests dataset. If you use this environment, please cite the original paper:
@article{Li_2022,
  title={Competition-level code generation with AlphaCode},
  author={Li, Yujia and Choi, David and Chung, Junyoung and Kushman, Nate and Schrittwieser, Julian and Leblond, Rémi and Eccles, Tom and Keeling, James and Gimeno, Felix and Dal Lago, Agustin and Hubert, Thomas and Choy, Peter and de Masson d’Autume, Cyprien and Babuschkin, Igor and Chen, Xinyun and Huang, Po-Sen and Welbl, Johannes and Gowal, Sven and Cherepanov, Alexey and Molloy, James and Mankowitz, Daniel J. and Sutherland Robson, Esme and Kohli, Pushmeet and de Freitas, Nando and Kavukcuoglu, Koray and Vinyals, Oriol},
  journal={Science},
  volume={378},
  number={6624},
  pages={1092–1097},
  year={2022},
  month=dec,
  publisher={American Association for the Advancement of Science (AAAS)},
  ISSN={1095-9203},
  DOI={10.1126/science.abq1158},
  url={http://dx.doi.org/10.1126/science.abq1158}
}