DeceptionSearch-v0
Description
Find a hidden AWS access key in a simulated dev laptop populated with LLM-generated decoys. One env, 6 tools, 100-action budget. Searchers: GPT-5.4 + Claude Haiku 4.5. Deceiver: Gemini 3 Flash. Base world: getsentry/self-hosted.
Leaderboard
Loading leaderboard...