portfolio
PortfolioBench
Description
Portfolio is an environment for evaluating agents on quantitative finance tasks. Agents are given a sandboxed environment with network access and must solve portfolio optimization, financial data retrieval, and volatility modeling problems. Tasks include computing Maximum Diversification Portfolio (MDP) and minimum variance portfolio weights, retrieving Federal Reserve economic data, fitting GARCH models, and optimizing EWMA parameters. An LLM grader (gpt-5-nano) evaluates answers against target values with task-specific tolerances.
Capabilities
- Portfolio optimization (Maximum Diversification, minimum variance)
- Financial data retrieval and analysis (ETF prices, Federal Reserve series)
- Volatility modeling (GARCH, EWMA covariance estimation)
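The volatility and optimization capabilities above can be sketched in a few lines of NumPy. This is a minimal illustration, not the environment's reference solution: it assumes zero-mean returns, the commonly used decay factor 0.94 as a default, a sample-covariance seed for the EWMA recursion, and unconstrained (long-short) closed-form weights. Tasks that require long-only weights would need a constrained solver instead. All function names are hypothetical.

```python
import numpy as np


def ewma_cov(returns: np.ndarray, lam: float = 0.94) -> np.ndarray:
    """EWMA covariance via S_t = lam * S_{t-1} + (1 - lam) * r_t r_t^T.

    `returns` has shape (T, N) with the oldest observation first.
    Assumptions: returns are treated as zero-mean in the update, and the
    recursion is seeded with the full-sample covariance.
    """
    cov = np.cov(returns, rowvar=False)
    for r in returns:
        cov = lam * cov + (1.0 - lam) * np.outer(r, r)
    return cov


def min_variance_weights(cov: np.ndarray) -> np.ndarray:
    """Unconstrained minimum variance weights: w proportional to inv(cov) @ 1."""
    raw = np.linalg.solve(cov, np.ones(cov.shape[0]))
    return raw / raw.sum()


def max_diversification_weights(cov: np.ndarray) -> np.ndarray:
    """Unconstrained MDP weights: w proportional to inv(cov) @ vols.

    This maximizes the diversification ratio (w @ vols) / sqrt(w @ cov @ w)
    when short positions are allowed.
    """
    vols = np.sqrt(np.diag(cov))
    raw = np.linalg.solve(cov, vols)
    return raw / raw.sum()
```

As a sanity check, for two uncorrelated assets with variances 0.04 and 0.01, the minimum variance weights are 0.2 and 0.8 (inverse variance), while the MDP weights are 1/3 and 2/3 (inverse volatility).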
- Multi-step quantitative analysis with data downloading and computation
Compute Requirements
Agents in Portfolio are given a sandbox with 0.5 CPUs and 1 GB RAM. Network access is enabled for downloading financial data.
Tasks
There is one split: train (6 tasks). Each task presents a quantitative finance problem.
Reward Structure
This is a sparse reward environment with binary scoring. The agent calls answer once with its response, and the environment grades it using an LLM grader (gpt-5-nano) against the target answer with task-specific tolerances (e.g., +/-0.1% for portfolio weights, exact match for series IDs and dates).
- Correct: Reward 1.0.
- Incorrect: Reward 0.0.
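The actual grading is done by an LLM, so the following is only a deterministic sketch of the scoring rule described above. It assumes that the 0.1% tolerance is absolute and applies to weights expressed as fractions (0.001 per weight); the source does not say whether the tolerance is absolute or relative. The function names are hypothetical.

```python
def within_tolerance(submitted, target, abs_tol=0.001):
    """True if every submitted weight is within abs_tol of its target.

    Assumption: tolerance is absolute, with weights expressed as fractions.
    """
    if len(submitted) != len(target):
        return False
    return all(abs(s - t) <= abs_tol for s, t in zip(submitted, target))


def exact_match(submitted: str, target: str) -> bool:
    """Exact comparison for series IDs and dates, ignoring outer whitespace."""
    return submitted.strip() == target.strip()


def binary_reward(is_correct: bool) -> float:
    """Sparse binary reward: 1.0 for a correct answer, 0.0 otherwise."""
    return 1.0 if is_correct else 0.0
```

Because the answer tool can be called only once, there is no partial credit: a single weight outside the tolerance yields a reward of 0.0.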
Data
Agents download financial data at runtime using Python libraries (e.g., yfinance, FRED API) within the sandbox. No pre-staged data files are provided. The sandbox image (python-ds:3.12-tools) includes standard data science libraries.
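As one illustration of runtime data retrieval, the sketch below fetches a Federal Reserve series using only the Python standard library. It assumes the public CSV export on the FRED website (the `fredgraph.csv` URL below), which is not the keyed FRED API and is not confirmed by the environment itself. It also assumes a two-column layout (date, value) with missing observations marked by a single dot. The function names are hypothetical.

```python
import csv
import io
import urllib.request

# Assumed public CSV export endpoint; the environment does not prescribe it.
FRED_CSV_URL = "https://fred.stlouisfed.org/graph/fredgraph.csv?id={series_id}"


def parse_fred_csv(text: str) -> list[tuple[str, float]]:
    """Parse FRED-style CSV text into (date, value) pairs.

    The header row is skipped, and rows whose value is empty or "."
    (assumed missing-value marker) are dropped.
    """
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # skip the header row
    rows = []
    for row in reader:
        if len(row) < 2:
            continue
        date, value = row[0].strip(), row[1].strip()
        if value in ("", "."):
            continue
        rows.append((date, float(value)))
    return rows


def fetch_fred_series(series_id: str, timeout: float = 30.0) -> list[tuple[str, float]]:
    """Download one series and parse it; requires network access."""
    url = FRED_CSV_URL.format(series_id=series_id)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_fred_csv(resp.read().decode("utf-8"))
```

Keeping the parser separate from the download makes it easy to check the parsing logic offline before spending a network call inside the sandbox.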
Tools
Agents are given 10 tools:
- bash: Run a bash command in the sandbox.
- glob: Find files matching a glob pattern.
- grep: Search for patterns in files.
- ls: List files and directories.
- read: Read file contents.
- write: Write content to a file.
- edit: Perform string replacement in a file.
- multi_edit: Perform multiple edits on a single file.
- todo_write: Manage a todo list for task planning.
- answer: Submit the final answer for grading. This tool can only be called once per task.
Time Horizon
Portfolio is a multi-turn environment. The agent iterates using CLI tools (bash, read, write, etc.) to download data, write analysis scripts, and compute results before submitting a final answer.
Other Environment Requirements
Portfolio requires an OpenAI API key (OPENAI_API_KEY secret) for LLM-based grading of answers.
Safety
Agents in Portfolio solve quantitative finance problems in a sandboxed environment. The environment does not present direct safety risks, as agents only interact with public financial data APIs and have no access to real trading systems or financial accounts.
Citations
@dataset{GRPortfolio,
author = {Ran Achiron and Ross Taylor},
title = {PortfolioBench},
year = {2026},
publisher = {OpenReward},
url = {https://openreward.ai/GeneralReasoning/Portfolio}
}