PortfolioBench

Description

Portfolio is an environment for evaluating agents on quantitative finance tasks. Agents are given a sandboxed environment with network access and must solve portfolio optimization, financial data retrieval, and volatility modeling problems. Tasks include computing Maximum Diversification Portfolio (MDP) and minimum variance portfolio weights, retrieving Federal Reserve economic data, fitting GARCH models, and optimizing EWMA parameters. An LLM grader (gpt-5-nano) evaluates answers against target values with task-specific tolerances.
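As a rough illustration of the portfolio tasks named above, the sketch below computes minimum variance and Maximum Diversification weights from a covariance matrix using the unconstrained closed-form solutions. The function names are hypothetical, and the actual tasks may impose constraints (such as long-only weights) that would call for a numerical optimizer instead.

```python
import numpy as np


def min_variance_weights(cov):
    """Unconstrained minimum variance weights: proportional to inv(cov) @ 1."""
    cov = np.asarray(cov, dtype=float)
    ones = np.ones(cov.shape[0])
    raw = np.linalg.solve(cov, ones)  # solves cov @ raw = 1 without forming the inverse
    return raw / raw.sum()            # normalize so the weights sum to 1


def max_diversification_weights(cov):
    """Unconstrained MDP weights: proportional to inv(cov) @ vols."""
    cov = np.asarray(cov, dtype=float)
    vols = np.sqrt(np.diag(cov))      # per-asset volatilities
    raw = np.linalg.solve(cov, vols)
    return raw / raw.sum()


def diversification_ratio(weights, cov):
    """Weighted average volatility divided by portfolio volatility."""
    weights = np.asarray(weights, dtype=float)
    cov = np.asarray(cov, dtype=float)
    vols = np.sqrt(np.diag(cov))
    return float(weights @ vols / np.sqrt(weights @ cov @ weights))


# Two uncorrelated assets with volatilities of 10% and 20% (made-up inputs).
example_cov = np.array([[0.01, 0.00],
                        [0.00, 0.04]])
w_minvar = min_variance_weights(example_cov)
w_mdp = max_diversification_weights(example_cov)
```

In this toy case the minimum variance portfolio holds 80% and 20%, while the MDP holds two thirds and one third, which is the inverse-volatility allocation expected when assets are uncorrelated.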

Capabilities

Portfolio optimization (Maximum Diversification, minimum variance)
Financial data retrieval and analysis (ETF prices, Federal Reserve series)
Volatility modeling (GARCH, EWMA covariance estimation)
Multi-step quantitative analysis with data downloading and computation
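To make the EWMA covariance capability concrete, here is a minimal sketch of the standard recursion S_t = lam * S_{t-1} + (1 - lam) * r_t r_t^T. It assumes zero-mean returns and seeds the recursion with the outer product of the first observation; both choices, the function name, and the default decay of 0.94 (the conventional RiskMetrics daily value) are assumptions for illustration, not the environment's prescribed method.

```python
import numpy as np


def ewma_covariance(returns, lam=0.94):
    """EWMA covariance of a (n_obs, n_assets) array of returns.

    Assumes zero-mean returns and initializes with the outer product
    of the first row (an illustrative choice, not the only one).
    """
    returns = np.asarray(returns, dtype=float)
    if returns.ndim != 2 or returns.shape[0] == 0:
        raise ValueError("returns must be a non-empty 2-D array")
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1)")

    cov = np.outer(returns[0], returns[0])
    for r in returns[1:]:
        # Older observations decay geometrically at rate lam.
        cov = lam * cov + (1.0 - lam) * np.outer(r, r)
    return cov


# Made-up return series for two assets, three observations.
sample_returns = np.array([[0.01, -0.02],
                           [0.00, 0.01],
                           [-0.01, 0.03]])
sample_cov = ewma_covariance(sample_returns, lam=0.94)
```

A task that asks for an optimized decay parameter would wrap a function like this in a search over lam against whatever criterion the task specifies.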

Compute Requirements

Agents in Portfolio are given a sandbox with 0.5 CPUs and 1 GB RAM. Network access is enabled for downloading financial data.

Tasks

There is one split: train (6 tasks). Each task presents a quantitative finance problem.

Reward Structure

Rewards are sparse and binary. The agent calls answer once with its response, and an LLM grader (gpt-5-nano) compares it against the target answer using task-specific tolerances (e.g., ±0.1% for portfolio weights, exact match for series IDs and dates).

Correct: Reward 1.0.
Incorrect: Reward 0.0.
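The grading itself is done by an LLM, so the following is only a sketch of the kind of check the tolerances describe, not the grader's implementation. It assumes weights are expressed as decimal fractions and reads the 0.1% tolerance as an absolute difference of 0.001; the function names and the example series ID are chosen for illustration.

```python
def grade_numeric(answer, target, tolerance=0.001):
    """Binary reward: 1.0 if answer is within an absolute tolerance of target."""
    return 1.0 if abs(answer - target) <= tolerance else 0.0


def grade_exact(answer, target):
    """Binary reward for exact-match fields such as series IDs and dates."""
    return 1.0 if answer.strip() == target.strip() else 0.0


# Illustrative values only; none of these come from an actual task.
close_weight = grade_numeric(0.2505, 0.25)       # within the assumed tolerance
far_weight = grade_numeric(0.26, 0.25)           # outside the assumed tolerance
matching_id = grade_exact("DGS10", "DGS10")
mismatched_date = grade_exact("2020-03-16", "2020-03-17")
```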

Data

Agents download financial data at runtime using Python libraries (e.g., yfinance, FRED API) within the sandbox. No pre-staged data files are provided. The sandbox image (python-ds:3.12-tools) includes standard data science libraries.

Tools

Agents are given 10 tools:

bash: Run a bash command in the sandbox.
glob: Find files matching a glob pattern.
grep: Search for patterns in files.
ls: List files and directories.
read: Read file contents.
write: Write content to a file.
edit: Perform string replacement in a file.
multi_edit: Perform multiple edits on a single file.
todo_write: Manage a todo list for task planning.
answer: Submit the final answer for grading. This tool can only be called once per task.

Time Horizon

Portfolio is a multi-turn environment. The agent iterates using CLI tools (bash, read, write, etc.) to download data, write analysis scripts, and compute results before submitting a final answer.

[How many average tool calls?]

Environment Difficulty

[Statistics on environment difficulty here]

Other Environment Requirements

Portfolio requires an OpenAI API key (OPENAI_API_KEY secret) for LLM-based grading of answers.

Safety

Agents in Portfolio solve quantitative finance problems in a sandboxed environment. The environment does not present direct safety risks, as agents only interact with public financial data APIs and have no access to real trading systems or financial accounts.

Citations

@dataset{GRPortfolio,
  author    = {Ran Achiron and Ross Taylor},
  title     = {PortfolioBench},
  year      = {2026},
  publisher = {OpenReward},
  url       = {https://openreward.ai/GeneralReasoning/Portfolio}
}

Compute Configuration

Resource allocation for this environment.

Component	Configuration
Environment Server	1 vCPU / 4 GB RAM
Sandbox Machine	Not configured

Estimated Cost

Pay per second of active session usage. Billing starts when your session begins and stops when it ends.

Component	Cost / second
Environment	$0.0000320
Sandbox	Not configured
Total	$0.0000320

Examples

5-minute session: $0.0096

1-hour session: $0.1152
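The example figures follow from multiplying the per-second rate by the session length. A small sketch of that arithmetic, using the environment rate from the cost table above and a hypothetical helper name:

```python
COST_PER_SECOND = 0.0000320  # environment rate in USD, taken from the cost table


def session_cost(seconds, rate=COST_PER_SECOND):
    """Cost in USD of a session billed per second of active use."""
    return seconds * rate


five_minutes = session_cost(5 * 60)
one_hour = session_cost(60 * 60)
print(f"5-minute session: ${five_minutes:.4f}")  # $0.0096
print(f"1-hour session: ${one_hour:.4f}")        # $0.1152
```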
