FPL

Description

FPL is an ORS environment that simulates the Fantasy Premier League game. Agents manage a fantasy football team across a full 38-gameweek Premier League season (2024-25), developing and executing ML-based strategies for squad selection, transfers, captaincy, and formation decisions.

Capabilities

Developing machine learning models for player performance prediction
Squad selection and management within a £100M budget
Transfer strategy and timing optimization
Captain selection and starting XI formation decisions
Strategic chip usage (Wildcard, Bench Boost, Triple Captain)
Long-horizon multi-turn execution (38 gameweeks)

Compute Requirements

Agents in FPL are given a sandbox with 2 CPUs and 2GB RAM, network access enabled, and a Python 3.12 data science image.

Tasks

There is one training task in this environment:

2024-25 Season: The agent manages a fantasy team for the full 2024-25 Premier League season (38 gameweeks).

Each gameweek, the agent must:

Select a 15-player squad (2 GK, 5 DEF, 5 MID, 3 FWD) within a £100M budget, with a maximum of 3 players per club.
Choose a captain (2x points) and vice-captain (fallback if captain doesn't play).
Pick a starting XI with a valid formation (1 GK, 3-5 DEF, 2-5 MID, 1-3 FWD) and set 4 substitutes in priority order.
Optionally make transfers (1 free per gameweek, accumulating up to 5; additional transfers cost 4 points each).
Optionally play a chip (one per season: Wildcard, Bench Boost, or Triple Captain).
Advance to the next gameweek to receive results.
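
The squad, formation, and transfer rules above can be checked locally before calling the environment tools. The following is a minimal sketch, not the environment's own validator: the `Player` record, the function names, and the error strings are hypothetical, and the free-transfer helper assumes that unused free transfers carry over and one more is granted each gameweek, up to the stated cap of 5.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Player:
    # Hypothetical record; the environment's real player schema may differ.
    name: str
    position: str  # "GK", "DEF", "MID", or "FWD"
    club: str
    price: float   # in £M


SQUAD_QUOTA = {"GK": 2, "DEF": 5, "MID": 5, "FWD": 3}
XI_LIMITS = {"GK": (1, 1), "DEF": (3, 5), "MID": (2, 5), "FWD": (1, 3)}
BUDGET = 100.0
MAX_PER_CLUB = 3
MAX_FREE_TRANSFERS = 5
POINTS_PER_EXTRA_TRANSFER = 4


def squad_errors(squad):
    """Return a list of rule violations for a 15-player squad (empty if valid)."""
    errors = []
    if len(squad) != 15:
        errors.append(f"squad has {len(squad)} players, expected 15")
    by_pos = Counter(p.position for p in squad)
    for pos, need in SQUAD_QUOTA.items():
        if by_pos[pos] != need:
            errors.append(f"{pos}: have {by_pos[pos]}, need {need}")
    for club, n in Counter(p.club for p in squad).items():
        if n > MAX_PER_CLUB:
            errors.append(f"{club}: {n} players, max {MAX_PER_CLUB}")
    total = sum(p.price for p in squad)
    if total > BUDGET + 1e-9:
        errors.append(f"cost {total:.1f} exceeds budget {BUDGET:.1f}")
    return errors


def xi_errors(xi):
    """Return a list of formation violations for a starting XI (empty if valid)."""
    errors = []
    if len(xi) != 11:
        errors.append(f"XI has {len(xi)} players, expected 11")
    by_pos = Counter(p.position for p in xi)
    for pos, (lo, hi) in XI_LIMITS.items():
        if not lo <= by_pos[pos] <= hi:
            errors.append(f"{pos}: have {by_pos[pos]}, allowed {lo}-{hi}")
    return errors


def transfer_penalty(made, free):
    """Points deducted for transfers made beyond the free allowance."""
    return POINTS_PER_EXTRA_TRANSFER * max(0, made - free)


def free_transfers_next_week(banked, used):
    """Assumed banking rule: carry over unused transfers, add one, cap at 5."""
    return min(MAX_FREE_TRANSFERS, max(0, banked - used) + 1)
```

Returning a list of messages rather than a boolean lets an agent see every violated rule at once before it spends a tool call.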

Reward Structure

This is a dense, verifiable reward environment. The primary reward is returned after each gameweek via the next_gameweek tool. Two smaller rewards are also given: make_initial_transfers returns a reward of 1.0 when the initial squad selection is completed, and make_transfer immediately returns the transfer penalty as a negative reward. The gameweek reward is:

$\text{GW Points} = \sum_{i \in \text{Final XI}} \text{score}_i + \text{Bench Boost} - \text{Transfer Penalty}$

Where each player's score is their base gameweek points, except the captain whose score is multiplied by 2 (or 3 with the Triple Captain chip).

Bench Boost: If active, bench players' scores are added to the total.
Transfer Penalty: 4 points deducted per transfer beyond free transfers (waived with Wildcard chip).
Automatic Substitutions: If a starting player doesn't play (0 minutes), the highest-priority eligible substitute replaces them.

If the captain doesn't play, the vice-captain automatically receives the captain multiplier instead.
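
To make the formula concrete, here is a hedged sketch of the gameweek score. It is an illustration rather than the environment's scoring code: the `Entry` record and the chip strings (`"triple_captain"`, `"bench_boost"`, `"wildcard"`) are hypothetical names, and "eligible substitute" is assumed to mean a bench player who played and whose inclusion keeps the formation valid. The penalty is subtracted here as in the formula, although the environment also reports it through make_transfer.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    # Hypothetical per-gameweek record for one squad member.
    name: str
    position: str  # "GK", "DEF", "MID", or "FWD"
    points: int    # base gameweek points
    minutes: int   # minutes played this gameweek


XI_LIMITS = {"GK": (1, 1), "DEF": (3, 5), "MID": (2, 5), "FWD": (1, 3)}


def formation_ok(xi):
    """True if the XI has 11 players within the positional limits."""
    counts = {pos: 0 for pos in XI_LIMITS}
    for p in xi:
        counts[p.position] += 1
    return len(xi) == 11 and all(
        lo <= counts[pos] <= hi for pos, (lo, hi) in XI_LIMITS.items()
    )


def apply_auto_subs(xi, bench):
    """Replace non-playing starters with the first eligible bench player.

    `bench` is in priority order. Returns (final_xi, unused_bench).
    """
    final = list(xi)
    available = list(bench)
    for starter in xi:
        if starter.minutes > 0:
            continue
        for sub in available:
            if sub.minutes == 0:
                continue
            candidate = [sub if p is starter else p for p in final]
            if formation_ok(candidate):
                final = candidate
                available.remove(sub)
                break
    return final, available


def gameweek_points(xi, bench, captain, vice, chip=None, extra_transfers=0):
    """Score one gameweek following the GW Points formula."""
    final, unused_bench = apply_auto_subs(xi, bench)
    multiplier = 3 if chip == "triple_captain" else 2
    # The vice-captain takes the multiplier only if the captain did not play.
    if captain.minutes > 0:
        armband = captain
    elif vice.minutes > 0:
        armband = vice
    else:
        armband = None
    total = sum(p.points for p in final)
    if armband is not None and any(p is armband for p in final):
        total += (multiplier - 1) * armband.points
    if chip == "bench_boost":
        total += sum(p.points for p in unused_bench)
    penalty = 0 if chip == "wildcard" else 4 * extra_transfers
    return total - penalty
```

For example, with a starting XI scoring 32 base points and a captain on 5 points, the function returns 37 with no chip and 42 under the Triple Captain chip.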

We do not use LLM graders for this task.

Data

Agents are given access to historical player data for the 2024-25 season, including:

Player statistics (names, positions, clubs, values, total points)
Fixture schedules with kickoff times
Team information
Individual player gameweek data (points, minutes played, goals, assists, etc.)

After each gameweek, the latest data is downloaded to the sandbox for the agent to use in developing strategies. Agents also have access to pre-staged data files mounted at /tmp/gr-datasets.
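
Since the file names and formats under /tmp/gr-datasets are not documented here, a reasonable first step is simply to enumerate what is mounted. The sketch below uses only the standard library; the function name `list_data_files` is hypothetical, and nothing is assumed about the directory layout beyond the mount path given above.

```python
from pathlib import Path


def list_data_files(root="/tmp/gr-datasets"):
    """Return every regular file under `root`, sorted; empty if the mount is absent."""
    base = Path(root)
    if not base.is_dir():
        return []
    return sorted(p for p in base.rglob("*") if p.is_file())


if __name__ == "__main__":
    for path in list_data_files():
        # Print each file with its size so large tables are easy to spot.
        print(f"{path.stat().st_size:>12}  {path}")
```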

Tools

Agents are given access to CLI tools for creating, viewing, and searching a filesystem. They are also given environment-specific tools:

list_players: Browse players by position with pagination
make_initial_transfers: Select the initial 15-player squad
make_transfer: Transfer players in/out after gameweek 1
set_captain: Choose captain and vice-captain
pick_starting_xi: Set starting XI and substitute priority order
view_current_squad: View current squad status, captains, and formation
play_chip: Activate a chip (Wildcard, Bench Boost, or Triple Captain)
list_fixtures: View fixtures for a given gameweek
next_gameweek: Advance to the next gameweek and receive results

Time Horizon

FPL is an open-ended, long-horizon environment where agents play through an entire 38-gameweek Premier League season. Each gameweek requires multiple tool calls for squad management (transfers, captaincy, formation) before advancing.

[Statistics on average tool calls here]

Environment Difficulty

[Statistics on environment difficulty here]

Safety

Agents in FPL are told to maximize their total season points. The environment does not present direct safety risks, as agents only interact with a simulated Fantasy Premier League game with no real-world financial transactions. All decisions are made against historical data within a sandboxed environment.

We think it is unlikely that optimising for points in this game would promote unaligned behaviour through RL, as maximising this objective offers no opportunity to manipulate or harm others.

Citations

@dataset{GRFPL,
  author    = {General Reasoning Inc. Team},
  title     = {FPL},
  year      = {2026},
  publisher = {OpenReward},
  url       = {https://openreward.ai/GeneralReasoning/FPL}
}

Compute Configuration

Resource allocation for this environment.

Component	Configuration
Environment Server	2 vCPUs / 8 GB RAM
Sandbox Machine	2 vCPUs / 2 GB RAM

Estimated Cost

Pay per second of active session usage. Billing starts when your session begins and stops when it ends.

Component	Cost / second
Environment	$0.0000640
Sandbox	$0.0000370
Total	$0.0001010

Examples

5-minute session: $0.0303

1-hour session: $0.3636
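
The example figures follow from multiplying the combined per-second rate by the session length. A small sketch of that arithmetic, using the rates from the table above (the constant and function names are illustrative):

```python
ENV_RATE = 0.0000640      # environment server, USD per second
SANDBOX_RATE = 0.0000370  # sandbox machine, USD per second


def session_cost(seconds):
    """Estimated cost in USD of a session lasting `seconds` seconds."""
    return (ENV_RATE + SANDBOX_RATE) * seconds


if __name__ == "__main__":
    print(f"{session_cost(5 * 60):.4f}")   # 5-minute session
    print(f"{session_cost(60 * 60):.4f}")  # 1-hour session
```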