Aider-Polyglot

Description

Aider-Polyglot is an environment for evaluating code generation and editing across multiple programming languages. It is based on the Exercism polyglot benchmark used in Aider evaluations: agents implement solutions to programming exercises in Python, Go, Rust, JavaScript, Java, and C++, and each task is evaluated by running the exercise's test suite.

Capabilities

Multi-language code generation (Python, Go, Rust, JavaScript, Java, C++)
Code editing with the `str_replace` and `insert` tools
Test-driven development evaluation
File system operations via the `bash` and `view` tools

Compute Requirements

Agents are given a sandbox with 4 CPU cores and 8 GB RAM, with language-specific toolchains pre-installed.

Tasks

There are seven splits in this environment:

all: All tasks across all languages
python: Python exercises
go: Go exercises
rust: Rust exercises
javascript: JavaScript exercises
java: Java exercises
cpp: C++ exercises

Each task provides instructions from Exercism and stub files to implement.

Reward Structure

This is a sparse, verifiable reward environment. The agent calls the `answer` tool to run the test suite, and the reward depends on the outcome:

1.0: All tests pass
0.0: One or more tests fail

Agents get up to 2 attempts per task. Tests run with a 3-minute timeout.
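The binary reward and the test timeout described above can be sketched in a few lines of Python. This is a minimal illustration, not the environment's actual implementation: the names `run_tests` and `reward` are hypothetical, and treating a timed-out run as a failure is an assumption.

```python
import subprocess

TEST_TIMEOUT_S = 180  # the 3-minute timeout stated above


def run_tests(cmd, timeout_s=TEST_TIMEOUT_S):
    """Return True only if the test command exits with status 0 within the timeout."""
    try:
        proc = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Assumption: a run that exceeds the timeout counts as a failure.
        return False
    return proc.returncode == 0


def reward(all_tests_passed):
    """Sparse binary reward: 1.0 if every test passed, otherwise 0.0."""
    return 1.0 if all_tests_passed else 0.0
```

For a Python exercise, a call might look like `reward(run_tests(["python", "-m", "pytest", "-q"]))`; the exact test command for each language is not specified here.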

Data

Tasks are sourced from the Exercism polyglot benchmark. Exercise files include instructions, stub code, and test files. Task data is stored in the repository.

Tools

Tool	Description
`answer`	Run test suite and submit solution
`bash`	Execute bash commands in sandbox
`view`	View file contents with optional line range
`str_replace`	Replace text in files
`insert`	Insert content at a line number
`create`	Create new files
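To make the two editing tools concrete, here is a rough local analogue that works on strings. The function names are hypothetical, and two behaviours are assumptions rather than documented facts about this environment: that `str_replace` expects exactly one match, and that `insert` places content after the given 1-based line number.

```python
def local_str_replace(text, old, new):
    """Replace `old` with `new`, refusing missing or ambiguous matches (assumed behaviour)."""
    count = text.count(old)
    if count != 1:
        raise ValueError(f"expected exactly one match for {old!r}, found {count}")
    return text.replace(old, new, 1)


def local_insert(text, line_number, content):
    """Insert `content` after the 1-based `line_number`; 0 inserts at the top (assumed behaviour)."""
    lines = text.splitlines(keepends=True)
    if not 0 <= line_number <= len(lines):
        raise ValueError(f"line_number must be between 0 and {len(lines)}")
    # Make sure the preceding line is terminated so the new content starts on its own line.
    if line_number > 0 and not lines[line_number - 1].endswith("\n"):
        lines[line_number - 1] += "\n"
    if not content.endswith("\n"):
        content += "\n"
    lines.insert(line_number, content)
    return "".join(lines)
```

Requiring a unique match is a common design choice for edit tools because it prevents an edit from silently landing in the wrong place.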

Time Horizon

Multi-turn. Agents iterate on solutions using the editing tools before calling `answer` to run the tests.
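The edit-then-submit cycle can be pictured as a small loop. In this sketch, `propose_edit` and `submit` are hypothetical callbacks standing in for the agent's editing turns and its `answer` call. It also assumes that a pass on any attempt within the budget of 2 earns the full reward, which the README does not state explicitly.

```python
def run_episode(propose_edit, submit, max_attempts=2):
    """Alternate editing and submitting until the tests pass or attempts run out.

    Returns (reward, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        propose_edit(attempt)  # one or more editing turns (hypothetical callback)
        if submit():           # stands in for calling `answer`; True means all tests passed
            return 1.0, attempt
    return 0.0, max_attempts
```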

Environment Difficulty

Exercism exercises range from beginner to advanced difficulty. Selected results from the Aider polyglot leaderboard:

Model	Polyglot Score	Correct Edit Format
GPT-5 (high)	88.0%	91.6%
GPT-5 (medium)	86.7%	88.4%
o3-pro (high)	84.9%	97.8%
Gemini 2.5 Pro Preview (32k think)	83.1%	99.6%
GPT-5 (low)	81.3%	86.7%

Other Environment Requirements

There are no further environment requirements; Aider-Polyglot works out of the box with the OpenReward endpoint.

Safety

Agents in Aider-Polyglot write and execute code in an isolated sandbox environment. The environment does not present direct safety risks.

Citation

@misc{gauthier2024aider,
  title={Aider Polyglot Benchmark},
  author={Gauthier, Paul},
  year={2024},
  url={https://aider.chat/docs/leaderboards/}
}

Compute Configuration

Resource allocation for this environment.

Component	Configuration
Environment Server	1 vCPU / 4 GB RAM
Sandbox Machine	Not configured

Estimated Cost

Pay per second of active session usage. Billing starts when your session begins and stops when it ends.

Component	Cost / second
Environment	$0.0000320
Sandbox	Not configured
Total	$0.0000320

Examples

5-minute session: $0.0096

1-hour session: $0.1152
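The example figures follow directly from the per-second rate in the table above. The snippet below reproduces them; `session_cost` is a hypothetical helper, and it assumes only the environment component is billed, since the sandbox is listed as not configured.

```python
from decimal import Decimal

# Environment rate in USD per second, taken from the cost table above.
ENVIRONMENT_RATE = Decimal("0.0000320")


def session_cost(seconds, rate=ENVIRONMENT_RATE):
    """Cost in USD of a session billed per second of active usage."""
    return rate * seconds


five_minutes = session_cost(5 * 60)  # 300 seconds
one_hour = session_cost(60 * 60)     # 3600 seconds
```

`Decimal` is used instead of floats so that the small per-second rate multiplies out without binary rounding noise.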
