FPL

OpenReward Environment

Description

FPL is an ORS environment that simulates the Fantasy Premier League game. Agents manage a fantasy football team across a full 38-gameweek Premier League season (2024-25), developing and executing ML-based strategies for squad selection, transfers, captaincy, and formation decisions.

Capabilities

  • Developing machine learning models for player performance prediction
  • Squad selection and management within a £100M budget
  • Transfer strategy and timing optimization
  • Captain selection and starting XI formation decisions
  • Strategic chip usage (Wildcard, Bench Boost, Triple Captain)
  • Long-horizon multi-turn execution (38 gameweeks)

Compute Requirements

Agents in FPL are given a sandbox with 2 CPUs and 2 GB of RAM, network access enabled, and a Python 3.12 data science image.

Tasks

There is one training task in this environment:

  • 2024-25 Season: The agent manages a fantasy team for the full 2024-25 Premier League season (38 gameweeks).

Each gameweek, the agent must:

  1. Select a 15-player squad (2 GK, 5 DEF, 5 MID, 3 FWD) within a £100M budget, with a maximum of 3 players per club.
  2. Choose a captain (2x points) and vice-captain (fallback if captain doesn't play).
  3. Pick a starting XI with a valid formation (1 GK, 3-5 DEF, 2-5 MID, 1-3 FWD) and set 4 substitutes in priority order.
  4. Optionally make transfers (1 free per gameweek, accumulating up to 5; additional transfers cost 4 points each).
  5. Optionally play a chip (one per season: Wildcard, Bench Boost, or Triple Captain).
  6. Advance to the next gameweek to receive results.
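
The squad and formation rules in steps 1 and 3 can be checked locally before calling the environment tools. The following is a minimal sketch under stated assumptions: the `Player` record, the position labels, and the helper names `squad_errors` and `is_valid_formation` are hypothetical and are not part of the environment's API; only the numeric limits come from the rules above.

```python
from collections import Counter
from dataclasses import dataclass

# Limits taken from the task description above.
SQUAD_QUOTA = {"GK": 2, "DEF": 5, "MID": 5, "FWD": 3}
XI_RANGES = {"GK": (1, 1), "DEF": (3, 5), "MID": (2, 5), "FWD": (1, 3)}
BUDGET = 100.0  # in £M
MAX_PER_CLUB = 3


@dataclass(frozen=True)
class Player:
    # Hypothetical record; the environment's real data schema may differ.
    name: str
    position: str  # "GK", "DEF", "MID" or "FWD"
    club: str
    price: float  # in £M


def squad_errors(squad):
    """Return a list of rule violations for a squad (empty if valid)."""
    errors = []
    if len(squad) != 15:
        errors.append(f"squad has {len(squad)} players, expected 15")
    by_pos = Counter(p.position for p in squad)
    for pos, need in SQUAD_QUOTA.items():
        if by_pos[pos] != need:
            errors.append(f"{pos}: have {by_pos[pos]}, need {need}")
    for club, n in Counter(p.club for p in squad).items():
        if n > MAX_PER_CLUB:
            errors.append(f"{club}: {n} players, max {MAX_PER_CLUB}")
    total = sum(p.price for p in squad)
    if total > BUDGET + 1e-9:  # small tolerance for float rounding
        errors.append(f"cost £{total:.1f}M exceeds £{BUDGET:.0f}M")
    return errors


def is_valid_formation(xi):
    """Check that a starting XI has 11 players within the per-position ranges."""
    by_pos = Counter(p.position for p in xi)
    return len(xi) == 11 and all(
        lo <= by_pos[pos] <= hi for pos, (lo, hi) in XI_RANGES.items()
    )
```

Returning a list of violations rather than a single boolean lets an agent see every broken rule at once and repair the squad in one pass.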

Reward Structure

This is a dense, verifiable reward environment. The primary reward is returned after each gameweek by the next_gameweek tool. Two smaller rewards are also given: make_initial_transfers returns a reward of 1.0 when squad selection is completed, and make_transfer immediately returns any transfer penalty as a negative reward. The gameweek reward is:

$$\text{GW Points} = \sum_{i \in \text{Final XI}} \text{score}_i + \text{Bench Boost} - \text{Transfer Penalty}$$

Here each player's score is their base gameweek points, except for the captain, whose score is multiplied by 2 (or 3 with the Triple Captain chip).

  • Bench Boost: If active, bench players' scores are added to the total.
  • Transfer Penalty: 4 points deducted per transfer beyond free transfers (waived with Wildcard chip).
  • Automatic Substitutions: If a starting player doesn't play (0 minutes), the highest-priority eligible substitute replaces them.

If the captain doesn't play, the vice-captain automatically receives the captain multiplier instead.
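
The reward rules above can be combined into a single scoring function. This is a rough sketch, not the environment's actual implementation. The function name, the `stats` layout, and the chip labels (`"bench_boost"`, `"triple_captain"`, `"wildcard"`) are hypothetical. It also assumes that an "eligible" substitute is one who played and whose inclusion keeps the formation valid, and that the Bench Boost term is the sum of the scores of players left on the bench after substitutions.

```python
XI_RANGES = {"GK": (1, 1), "DEF": (3, 5), "MID": (2, 5), "FWD": (1, 3)}


def formation_ok(positions):
    """True if a list of 11 position labels forms a valid formation."""
    return len(positions) == 11 and all(
        lo <= positions.count(pos) <= hi for pos, (lo, hi) in XI_RANGES.items()
    )


def gameweek_points(xi, bench, stats, captain, vice, chip=None, extra_transfers=0):
    """Compute one gameweek's reward.

    xi, bench: lists of player ids (bench in substitute priority order).
    stats: id -> (position, minutes, points); layout is an assumption.
    extra_transfers: transfers made beyond the free allowance.
    """
    final_xi = list(xi)
    unused = list(bench)

    # Automatic substitutions: replace each non-playing starter with the
    # highest-priority substitute who played and keeps the formation valid.
    for i, pid in enumerate(final_xi):
        if stats[pid][1] > 0:
            continue
        for sub in unused:
            if stats[sub][1] == 0:
                continue
            trial = final_xi[:i] + [sub] + final_xi[i + 1:]
            if formation_ok([stats[p][0] for p in trial]):
                final_xi = trial
                unused.remove(sub)
                break  # safe: we stop iterating right after mutating `unused`

    # Captain multiplier, falling back to the vice-captain if the captain
    # did not play.
    mult = 3 if chip == "triple_captain" else 2
    armband = captain if stats[captain][1] > 0 else vice
    total = sum(
        stats[p][2] * (mult if p == armband else 1) for p in final_xi
    )

    # Bench Boost: add the scores of everyone not in the final XI.
    if chip == "bench_boost":
        total += sum(stats[p][2] for p in xi + bench if p not in final_xi)

    # Transfer penalty: 4 points per extra transfer, waived by the Wildcard.
    penalty = 0 if chip == "wildcard" else 4 * extra_transfers
    return total - penalty
```

For example, with a captain scoring 10 base points, ten other finishers contributing 24 points between them, no chip, and two transfers beyond the free allowance, the function returns 24 + 2 × 10 − 8 = 36.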

We do not use LLM graders for this task.

Data

Agents are given access to historical player data for the 2024-25 season, including:

  • Player statistics (names, positions, clubs, values, total points)
  • Fixture schedules with kickoff times
  • Team information
  • Individual player gameweek data (points, minutes played, goals, assists, etc.)

After each gameweek, the latest data is downloaded to the sandbox for the agent to use in developing strategies. Agents also have access to pre-staged data files mounted at /tmp/gr-datasets.

Tools

Agents are given access to CLI tools for creating, viewing, and searching a filesystem. They are also given environment-specific tools:

  • list_players: Browse players by position with pagination
  • make_initial_transfers: Select the initial 15-player squad
  • make_transfer: Transfer players in/out after gameweek 1
  • set_captain: Choose captain and vice-captain
  • pick_starting_xi: Set starting XI and substitute priority order
  • view_current_squad: View current squad status, captains, and formation
  • play_chip: Activate a chip (Wildcard, Bench Boost, or Triple Captain)
  • list_fixtures: View fixtures for a given gameweek
  • next_gameweek: Advance to the next gameweek and receive results

Time Horizon

FPL is an open-ended, long-horizon environment where agents play through an entire 38-gameweek Premier League season. Each gameweek requires multiple tool calls for squad management (transfers, captaincy, formation) before advancing.

[Statistics on average tool calls here]

Environment Difficulty

[Statistics on environment difficulty here]

Safety

Agents in FPL are told to maximize their total season points. The environment does not present direct safety risks, as agents only interact with a simulated Fantasy Premier League game with no real-world financial transactions. All decisions are made against historical data within a sandboxed environment.

We think it is unlikely that optimizing against the objective of maximizing points in this game would promote unaligned behaviour through RL, as pursuing this objective offers no opportunity to manipulate or hurt others.

Citations

@dataset{GRFPL,
  author    = {General Reasoning Inc. Team},
  title     = {FPL},
  year      = {2026},
  publisher = {OpenReward},
  url       = {https://openreward.ai/GeneralReasoning/FPL}
}