Aider-Polyglot

OpenReward Environment

Description

Aider-Polyglot is an environment for evaluating code generation and editing across multiple programming languages. It is based on the Exercism polyglot benchmark used in Aider evaluations: agents implement solutions to programming exercises in Python, Go, Rust, JavaScript, Java, and C++. Each task is evaluated by running the exercise's test suite.

Capabilities

  • Multi-language code generation (Python, Go, Rust, JavaScript, Java, C++)
  • Code editing with str_replace and insert tools
  • Test-driven development evaluation
  • File system operations via bash and view tools

Compute Requirements

Each agent is given a sandbox with 4 CPU cores and 8 GB of RAM, with language-specific toolchains pre-installed.

Tasks

There are seven splits in this environment:

  • all: All tasks across all languages
  • python: Python exercises
  • go: Go exercises
  • rust: Rust exercises
  • javascript: JavaScript exercises
  • java: Java exercises
  • cpp: C++ exercises

Each task provides instructions from Exercism and stub files to implement.
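
As a rough illustration of how the split names relate to each other, the sketch below filters a task list by split. The Task record, its field names, and the exercise names in the usage note are hypothetical and are not part of the environment's API; only the split names are taken from the list above.

```python
from dataclasses import dataclass

# Split names as listed above; "all" is handled separately.
LANGUAGE_SPLITS = ("python", "go", "rust", "javascript", "java", "cpp")


@dataclass(frozen=True)
class Task:
    """Hypothetical task record, for illustration only."""
    exercise: str
    language: str  # expected to be one of LANGUAGE_SPLITS


def select_split(tasks, split):
    """Return the tasks belonging to the named split."""
    if split == "all":
        return list(tasks)
    if split not in LANGUAGE_SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    return [task for task in tasks if task.language == split]
```

For example, given tasks = [Task("two-fer", "python"), Task("bowling", "rust")], select_split(tasks, "rust") returns only the Rust task, while select_split(tasks, "all") returns both.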

Reward Structure

This is a sparse, verifiable reward environment. The agent calls answer to run the test suite, and the reward depends on the outcome:

  • 1.0: All tests pass
  • 0.0: One or more tests fail

Agents get up to 2 attempts per task. Tests run with a 3-minute timeout.
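
The reward rule above can be sketched in a few lines. This is a minimal illustration, not the environment's implementation: the SuiteRun record is hypothetical, and it assumes that a timed-out run counts as a failure and that a task scores 1.0 if either of the allowed attempts passes every test.

```python
from dataclasses import dataclass

MAX_ATTEMPTS = 2            # "up to 2 attempts per task"
TEST_TIMEOUT_SECONDS = 180  # "3-minute timeout"


@dataclass
class SuiteRun:
    """Hypothetical summary of one call to answer."""
    passed: int
    failed: int
    timed_out: bool = False


def attempt_reward(run):
    """1.0 only if every test passed; a timeout is assumed to count as failure."""
    if run.timed_out or run.failed > 0 or run.passed == 0:
        return 0.0
    return 1.0


def task_reward(runs):
    """Assumed aggregation: 1.0 if any of the first MAX_ATTEMPTS runs is clean."""
    for run in runs[:MAX_ATTEMPTS]:
        if attempt_reward(run) == 1.0:
            return 1.0
    return 0.0
```

Under these assumptions there is no partial credit: a run that passes all but one test is scored the same as one that passes none.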

Data

Tasks are sourced from the Exercism polyglot benchmark. Exercise files include instructions, stub code, and test files. Task data is stored in the repository.

Tools

Tool         Description
answer       Run test suite and submit solution
bash         Execute bash commands in sandbox
view         View file contents with optional line range
str_replace  Replace text in files
insert       Insert content at a line number
create       Create new files
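
To make the two editing tools concrete, here is a minimal sketch of what str_replace and insert might do to a file's text. The exact semantics are assumptions, not documented behavior of this environment: the sketch requires the old text to match exactly once, and it treats the line number as "insert after this line", with 0 meaning the top of the file.

```python
def str_replace(text, old, new):
    """Replace `old` with `new`, assuming exactly one match is required."""
    count = text.count(old)
    if count != 1:
        raise ValueError(f"expected exactly one match, found {count}")
    return text.replace(old, new, 1)


def insert(text, line_number, content):
    """Insert `content` after line `line_number` (0 inserts at the top)."""
    lines = text.splitlines(keepends=True)
    if not 0 <= line_number <= len(lines):
        raise ValueError(f"line_number out of range: {line_number}")
    if not content.endswith("\n"):
        content += "\n"
    # Make sure the preceding line is newline-terminated before appending.
    if line_number > 0 and not lines[line_number - 1].endswith("\n"):
        lines[line_number - 1] += "\n"
    lines.insert(line_number, content)
    return "".join(lines)
```

Requiring a unique match is a common design choice for this kind of tool, since it keeps an edit from silently landing in the wrong place when the same snippet appears more than once in a stub file.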

Time Horizon

Multi-turn. Agents iterate on solutions using editing tools before calling answer to run tests.

Environment Difficulty

Exercism exercises range from beginner to advanced difficulty. Selected results from the Aider polyglot leaderboard:

Model                               Polyglot Score  Correct Edits
GPT-5 (high)                        88.0%           91.6%
GPT-5 (medium)                      86.7%           88.4%
o3-pro (high)                       84.9%           97.8%
Gemini 2.5 Pro Preview (32k think)  83.1%           99.6%
GPT-5 (low)                         81.3%           86.7%

Other Environment Requirements

There are no further environment requirements; Aider-Polyglot works out of the box with the OpenReward endpoint.

Safety

Agents in Aider-Polyglot write and execute code in an isolated sandbox environment. The environment does not present direct safety risks.

Citation

@misc{gauthier2024aider,
  title={Aider Polyglot Benchmark},
  author={Gauthier, Paul},
  year={2024},
  url={https://aider.chat/docs/leaderboards/}
}