TwoDollar

OpenReward Environment

Description

TwoDollar is an environment for evaluating agents on economic negotiation and game-theoretic reasoning. This environment wraps the TwoDollar implementation from TextArena, a framework for text-based game environments.

Capabilities

  • Economic negotiation and bargaining
  • Game-theoretic strategic reasoning
  • Multi-action decision making (propose, accept, reject)
  • Competitive value maximization

Compute Requirements

TwoDollar does not require a sandbox. It has minimal compute requirements.

License

MIT.

Tasks

There are two splits, train and test, with 150 tasks each. Each split contains 50 tasks for each of three variants:

  • TwoDollar-v0
  • TwoDollar-v0-train
  • TwoDollar-v0-raw

Each task is seeded for reproducibility.
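
The task layout above can be sketched as a small enumeration. This is only an illustration: the split names, variant names, and counts come from this README, but the dictionary fields and the seed scheme (one distinct integer per task) are assumptions, not the environment's actual task format.

```python
from itertools import product

SPLITS = ("train", "test")
VARIANTS = ("TwoDollar-v0", "TwoDollar-v0-train", "TwoDollar-v0-raw")
TASKS_PER_VARIANT = 50


def enumerate_tasks() -> list[dict]:
    """Build one hypothetical task spec per (split, variant, index)."""
    tasks = []
    for split, variant in product(SPLITS, VARIANTS):
        for index in range(TASKS_PER_VARIANT):
            tasks.append(
                {
                    "split": split,
                    "variant": variant,
                    "index": index,
                    # Assumed seed scheme: the task's position in the full
                    # list, so that no two tasks share a seed.
                    "seed": len(tasks),
                }
            )
    return tasks


tasks = enumerate_tasks()
per_split = {s: sum(t["split"] == s for t in tasks) for s in SPLITS}
print(len(tasks), per_split)
```

Under these assumptions the enumeration yields 300 tasks in total, 150 per split.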

Reward Structure

This is a sparse reward environment. Rewards are mapped from TextArena's native values {-1, 0, 1} to {0.0, 0.5, 1.0} via (raw + 1) / 2.
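
As a minimal sketch of the mapping above (the function name `normalize_reward` is hypothetical, and the input check is an added assumption rather than documented behavior):

```python
def normalize_reward(raw: int) -> float:
    """Map a TextArena reward in {-1, 0, 1} to {0.0, 0.5, 1.0}."""
    if raw not in (-1, 0, 1):
        # Assumption: anything outside the native set is treated as an error.
        raise ValueError(f"unexpected raw reward: {raw!r}")
    return (raw + 1) / 2


for raw in (-1, 0, 1):
    print(raw, "->", normalize_reward(raw))
```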

We do not use LLM graders for this environment; reward is determined programmatically.

Data

Game state is generated procedurally by the TextArena engine using seeded randomness. No external data files are required.

Tools

Agents are given three tools:

  • propose(amount, reasoning): Propose a split by specifying how much you want. The opponent gets $2.00 minus your amount.
  • accept(reasoning): Accept the opponent's proposal.
  • reject(reasoning): Reject the opponent's proposal.
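
The arithmetic behind propose can be sketched as follows. The helper `split_for` is hypothetical and not part of the environment's API; the rounding to cents and the 0.00 to 2.00 bounds check are assumptions made for the illustration.

```python
from decimal import Decimal

TOTAL = Decimal("2.00")  # the amount being divided


def split_for(amount: str) -> tuple[Decimal, Decimal]:
    """Return (proposer_share, opponent_share) for a requested amount."""
    mine = Decimal(amount).quantize(Decimal("0.01"))  # assumed cent precision
    if not Decimal("0.00") <= mine <= TOTAL:
        # Assumption: requests outside [0.00, 2.00] are invalid.
        raise ValueError(f"amount must be between 0.00 and {TOTAL}: {mine}")
    return mine, TOTAL - mine


mine, theirs = split_for("1.25")
print(f"proposer gets ${mine}, opponent gets ${theirs}")
```

Using `Decimal` rather than `float` keeps the two shares summing to exactly $2.00.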

Time Horizon

TwoDollar is a multi-turn environment.

Environment Difficulty

Medium - requires game-theoretic reasoning and negotiation strategy.

Other Environment Requirements

This environment requires an OpenAI API key (passed via secrets) to power the LLM opponent.
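
A minimal sketch of validating the key before starting a run is shown below. The shape of the secrets mapping and the key name `openai_api_key` are assumptions for illustration; check the OpenReward documentation for the actual secret name and how secrets are passed.

```python
def resolve_openai_key(secrets: dict[str, str], name: str = "openai_api_key") -> str:
    """Return the OpenAI API key from a secrets mapping, or fail early."""
    # Hypothetical secret name; the real one may differ.
    key = secrets.get(name, "").strip()
    if not key:
        raise KeyError(f"missing secret {name!r}: the LLM opponent needs an OpenAI API key")
    return key


key = resolve_openai_key({"openai_api_key": "sk-example"})
print("key loaded:", bool(key))
```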

Safety

Agents in TwoDollar interact with a negotiation game and have no access to external systems, the internet, or sensitive data. However, there is a risk that models trained on this environment learn manipulative behaviors to achieve their goals. We recommend that multi-environment training runs that include TwoDollar also use constitutional rubrics and/or other environments that promote closer alignment with human values.

Citations

@software{textarena2024,
  author    = {Guertler, Leon and Banting, Wilfried and Pignatelli, Eduardo},
  title     = {TextArena},
  year      = {2024},
  publisher = {GitHub},
  url       = {https://github.com/LeonGuertler/TextArena}
}