AnthropicPerformance
Description
AnthropicPerformance is an environment based on Anthropic's original performance engineering takehome challenge. Agents optimize a VLIW SIMD kernel implementation to minimize clock cycles for a tree traversal workload. The challenge involves instruction scheduling, vectorization, and low-level optimization in a simulated machine architecture.
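The core idea behind VLIW scheduling can be illustrated with a toy example (this is not the actual machine model in problem.py, just a sketch of why instruction packing reduces cycles): independent operations packed into the same bundle issue in a single cycle, so fewer bundles means fewer cycles.

```python
# Toy illustration of VLIW bundling (hypothetical, not the problem.py model):
# four independent operations, issued one per cycle vs. two per bundle.
serial = [("add", "r0"), ("mul", "r1"), ("add", "r2"), ("mul", "r3")]

# One op per cycle: 4 cycles.
serial_cycles = len(serial)

# Packing two independent ops per bundle halves the cycle count: 2 cycles.
bundles = [serial[i:i + 2] for i in range(0, len(serial), 2)]
packed_cycles = len(bundles)

print(serial_cycles, packed_cycles)  # 4 2
```

The real challenge is finding such packings (plus vectorization) for a full tree-traversal kernel under the simulator's actual issue constraints.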
Capabilities
- VLIW SIMD instruction optimization
- Kernel code generation and scheduling
- Performance profiling and iterative improvement
- Low-level systems programming
Compute Requirements
Agents are given a sandbox with 2 CPU cores and 2GB RAM.
License
Tasks
There is one split in this environment:
- train: 1 task (perf-opt-v1)
The task starts from a baseline implementation at 18,532 cycles; agents optimize it toward the Claude model benchmarks listed under Environment Difficulty.
Reward Structure
This is a dense, verifiable reward environment. Rewards use linear interpolation between Claude model baselines:
- Baseline: 18,532 cycles (starting point)
- 0.0: Matched worst Claude baseline (2,164 cycles)
- 1.0: Matched best Claude baseline (1,363 cycles)
- >1.0: Superhuman performance
The reward for each submission is computed from its cycle count by linear interpolation between the two baselines:
reward = (current_cycles - 2164) / (1363 - 2164)
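The interpolation above can be sketched as a small helper (hypothetical client-side code; the real harness runs server-side):

```python
def reward(current_cycles: int,
           worst_baseline: int = 2164,
           best_baseline: int = 1363) -> float:
    """Linear interpolation between the Claude baselines.

    Returns 0.0 at the worst baseline (2,164 cycles), 1.0 at the best
    (1,363 cycles), and values above 1.0 for superhuman results.
    """
    return (current_cycles - worst_baseline) / (best_baseline - worst_baseline)

print(reward(2164))  # 0.0
print(reward(1363))  # 1.0
```

Note that the denominator is negative, so lower cycle counts yield higher rewards; the unoptimized 18,532-cycle baseline maps to a large negative value.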
Data
Source files are provided in the sandbox workspace:
- problem.py: Machine simulator (VLIW SIMD architecture)
- perf_takehome.py: Kernel builder (optimization target)
Test harness runs server-side and is never exposed to agents.
Tools
CLI Tools:
| Tool | Description |
|---|---|
| bash | Execute bash commands |
| read | Read file contents |
| write | Write files |
| edit | Edit files |
| glob | Find files by pattern |
| grep | Search file contents |
| ls | List directory contents |
Environment Tools:
| Tool | Description |
|---|---|
| submit_solution | Test current implementation (50 submission limit) |
| finish_challenge | Complete challenge with current best |
Time Horizon
Multi-turn. Agents analyze code, make optimizations, test with submit_solution, and iterate until satisfied or reaching the 50 submission limit.
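The iteration loop sketched above might look like the following (a hypothetical client-side wrapper; edit_kernel and submit_solution are stubs standing in for the real file edits and server-side harness):

```python
import random

def edit_kernel():
    """Stub: in the real environment the agent edits perf_takehome.py."""
    pass

def submit_solution() -> int:
    """Stub: the real test harness runs server-side and returns cycles."""
    return random.randint(1363, 18532)

SUBMISSION_LIMIT = 50  # per the environment tools table

best = None
for _ in range(SUBMISSION_LIMIT):
    edit_kernel()
    cycles = submit_solution()
    if best is None or cycles < best:
        best = cycles
# finish_challenge() would then lock in the best result
```

In practice an agent would stop early once it is satisfied with its cycle count rather than spend the full submission budget.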
Environment Difficulty
Benchmarks from Claude models:
| Model | Cycles | Notes |
|---|---|---|
| Claude Opus 4 (extended) | 2,164 | |
| Claude Opus 4.5 (casual) | 1,790 | ~human level |
| Claude Opus 4.5 (2hr) | 1,579 | |
| Claude Sonnet 4.5 | 1,548 | |
| Claude Opus 4.5 (11.5hr) | 1,487 | Hiring threshold |
| Claude Opus 4.5 (improved) | 1,363 | Current best |
Other Environment Requirements
There are no further environment requirements; AnthropicPerformance works out of the box with the OpenReward endpoint.
Safety
Agents in AnthropicPerformance optimize code in an isolated sandbox environment. The environment does not present direct safety risks.
Citation
@misc{anthropic2024perftakehome,
  title={Anthropic Original Performance TakeHome},
  author={Anthropic},
  year={2024},
  url={https://github.com/anthropics/original_performance_takehome}
}