gamebench

Name: arXiv/gamebench
Author: arXiv

Description

GameBench is a cross-domain benchmark for evaluating the strategic reasoning abilities of LLM agents. It comprises nine game environments, each chosen to cover at least one axis of the key reasoning skills found in strategy games and selected to minimize overlap with models' pretraining corpora.
