ARC-AGI-1
Description
ARC-AGI-1 is an environment for evaluating abstract reasoning and pattern recognition capabilities. Agents are given training examples demonstrating a transformation pattern from input grids to output grids, then must apply the deduced rule to new test inputs. Each grid is a 2D array of integers (0-9) representing colors.
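The grid format described above can be illustrated with a small validity check. This is a minimal sketch rather than the environment's own code: the `Grid` alias and the `is_valid_grid` helper are hypothetical names introduced here, and the only constraints it enforces are the ones stated in the description (a 2D array whose cells are integers from 0 to 9), plus the assumption that grids are rectangular and non-empty.

```python
from typing import List

# Hypothetical alias: a grid is a list of rows, each row a list of color codes.
Grid = List[List[int]]


def is_valid_grid(grid: Grid) -> bool:
    """Return True if grid is a non-empty rectangular 2D list of ints in 0..9."""
    if not grid or not grid[0]:
        return False
    width = len(grid[0])
    for row in grid:
        # Assumption: every row has the same length (rectangular grid).
        if len(row) != width:
            return False
        for cell in row:
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(cell, bool) or not isinstance(cell, int):
                return False
            if not 0 <= cell <= 9:
                return False
    return True


# A 3x3 grid with a diagonal of color 3 on a background of color 0.
example_grid = [
    [0, 0, 3],
    [0, 3, 0],
    [3, 0, 0],
]
```

Running a check like this before submitting is one way to avoid spending an attempt on a malformed grid.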
Capabilities
- Abstract reasoning and pattern induction
- Visual transformation rule discovery
- Grid-based spatial reasoning
Compute Requirements
Agents are given a standard environment with no sandbox or file system access.
License
Tasks
This environment has two splits:
- training: 400 tasks
- evaluation: 400 tasks
Each task includes training examples showing input-output transformations and test inputs requiring predicted outputs.
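As a rough illustration of how a task might be consumed, the sketch below builds a toy task, checks a candidate rule against every training pair, and only then applies it to the test inputs. The key names `train`, `test`, `input`, and `output` follow the original ARC JSON layout and are an assumption about this environment's data. The toy task and the `mirror_left_right` rule are invented for the example and are not taken from the dataset.

```python
from typing import Callable, Dict, List

Grid = List[List[int]]

# Invented toy task: every output is the input mirrored left to right.
toy_task: Dict[str, list] = {
    "train": [
        {"input": [[1, 0], [2, 0]], "output": [[0, 1], [0, 2]]},
        {"input": [[0, 5, 5]], "output": [[5, 5, 0]]},
    ],
    "test": [{"input": [[3, 0, 0], [0, 4, 0]]}],
}


def mirror_left_right(grid: Grid) -> Grid:
    """Candidate rule: reverse each row."""
    return [list(reversed(row)) for row in grid]


def rule_fits_training(rule: Callable[[Grid], Grid], task: Dict[str, list]) -> bool:
    """Return True if the rule reproduces every training output exactly."""
    return all(rule(pair["input"]) == pair["output"] for pair in task["train"])


# Apply the rule to the test inputs only if it explains all training pairs.
if rule_fits_training(mirror_left_right, toy_task):
    predictions = [mirror_left_right(item["input"]) for item in toy_task["test"]]
else:
    predictions = []
```

Checking a hypothesis against all training pairs first is cheap and helps conserve the limited number of submission attempts.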
Reward Structure
Evaluation is multi-attempt with deterministic grading. The agent submits predicted output grids via the answer tool, with up to 3 attempts allowed per task. Each submission is compared against the ground truth by exact match. The reward is 1.0 if all outputs are correct and 0.0 otherwise. The episode ends on a correct answer or after 3 failed attempts.
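The grading rule can be sketched in a few lines. This is an illustrative reconstruction from the description above, not the environment's actual grader: `grade_attempt` and `run_episode` are hypothetical names, and the sketch assumes that exact match means the submitted list of grids equals the ground-truth list element by element.

```python
from typing import List, Tuple

Grid = List[List[int]]


def grade_attempt(predicted: List[Grid], expected: List[Grid]) -> float:
    """Exact match: 1.0 only if every predicted grid equals its ground truth."""
    return 1.0 if predicted == expected else 0.0


def run_episode(
    attempts: List[List[Grid]], expected: List[Grid], max_attempts: int = 3
) -> Tuple[float, int]:
    """Grade attempts in order and return (reward, attempts_used).

    The episode stops at the first correct attempt or once max_attempts
    submissions have been graded.
    """
    used = 0
    for predicted in attempts[:max_attempts]:
        used += 1
        if grade_attempt(predicted, expected) == 1.0:
            return 1.0, used
    return 0.0, used


expected = [[[1, 2], [3, 4]]]
wrong = [[[1, 2], [3, 5]]]  # differs from the ground truth in a single cell

reward, used = run_episode([wrong, expected], expected)
```

Under these assumptions there is no partial credit: a single wrong cell in any output grid yields a reward of 0.0 for that attempt.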
Data
The dataset is loaded from Hugging Face (lordspline/arc-agi). Tasks contain training examples and test inputs.
Tools
| Tool | Description |
|---|---|
| answer | Submit predicted output grids as list of objects with "output" keys. Up to 3 attempts. Ends the episode on success or final attempt. |
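The payload shape described for the answer tool (a list of objects, each with an "output" key) could be assembled as follows. This is a sketch under assumptions: `build_answer_payload` is a hypothetical helper, and how the list is passed to the tool (argument name, wrapping object) is not specified here, so only the list itself is shown.

```python
import json
from typing import Dict, List

Grid = List[List[int]]


def build_answer_payload(predicted_grids: List[Grid]) -> List[Dict[str, Grid]]:
    """Wrap each predicted grid in an object with an "output" key.

    One object per test input, in the same order as the test inputs.
    """
    return [{"output": grid} for grid in predicted_grids]


# One predicted grid for a task with a single test input.
payload = build_answer_payload([[[0, 0, 3], [0, 4, 0]]])
serialized = json.dumps(payload)
print(serialized)  # prints [{"output": [[0, 0, 3], [0, 4, 0]]}]
```

Keeping the predictions in the same order as the test inputs matters if, as assumed here, outputs are compared position by position.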
Time Horizon
Multi-attempt. The agent analyzes training examples, deduces the transformation rule, and submits outputs with up to 3 attempts.
Environment Difficulty
ARC-AGI-1 evaluates abstract reasoning capabilities. Accuracy for a selection of models:
| Model | Accuracy |
|---|---|
| o3-preview (low) | 75.7% |
| o3 (high) | 60.8% |
| o4-mini (high) | 58.7% |
| Claude Sonnet 4 (Thinking) | 40.0% |
| Claude Opus 4 (Thinking) | 35.7% |
| Gemini 2.5 Flash | 33.3% |
| Gemini 2.5 Pro | 33.0% |
| DeepSeek R1 | 21.2% |
ARC-AGI-1 is approaching saturation: the strongest entry in the table above reaches 75.7% accuracy.
Other Environment Requirements
There are no further environment requirements; ARC-AGI-1 works out of the box with the OpenReward endpoint without any external API keys.
Safety
Agents in ARC-AGI-1 solve abstract reasoning puzzles in a standard environment. The environment does not present direct safety risks.
Citation
@misc{chollet2019arc,
title={On the Measure of Intelligence},
author={Fran{\c{c}}ois Chollet},
year={2019},
eprint={1911.01547},
archivePrefix={arXiv}
}