gamebench

Description

GameBench is a cross-domain benchmark for evaluating the strategic reasoning abilities of LLM agents. It comprises nine game environments, each chosen to cover at least one axis of the key reasoning skills found in strategy games and selected to minimize overlap with models' pretraining corpora.

