WhoDunit

OpenReward Environment

Description

WhoDunit is an environment for evaluating deductive reasoning on murder mystery puzzles. It contains 100 cases where agents must gather information about suspects, weapons, locations, and clues to deduce who committed the murder, with what weapon, and where it occurred.

Capabilities

  • Deductive reasoning and logical inference
  • Multi-step information gathering
  • Evidence synthesis and analysis
  • Murder mystery puzzle solving

Compute Requirements

Agents are given a standard environment with no sandbox or file system access.

Tasks

There is one split in this environment:

  • train: 100 tasks (75 at elementary difficulty and 25 at impossible difficulty)

Each case includes suspects with physical descriptions, potential weapons, locations, clues, and optionally motives and suspect statements.

Reward Structure

This is a multi-turn environment with partial credit scoring. Agents gather information using tools, then submit their answer via submit_answer.

3-Component Tasks: WHO (33.3%) + WHAT (33.3%) + WHERE (33.3%)

4-Component Tasks: WHO (25%) + WHAT (25%) + WHERE (25%) + WHY (25%)

Validation is a deterministic, case-insensitive exact match on each component. The reward ranges from 0.0 to 1.0 and is the sum of the weights of the correctly answered components.
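
The partial-credit rule above can be sketched in a few lines of Python. This is a minimal illustration, not the environment's actual grader: the function name `score_answer`, the dictionary keys (`who`, `what`, `where`, `why`), and the example case values are all assumptions made for the sketch.

```python
def score_answer(submitted: dict, truth: dict) -> float:
    """Return the equally weighted fraction of components that match.

    Matching is a case-insensitive exact comparison, as described above.
    The key names are hypothetical; the real grader's schema may differ.
    """
    if not truth:
        raise ValueError("ground truth must contain at least one component")
    correct = sum(
        1
        for key, expected in truth.items()
        if str(submitted.get(key, "")).lower() == str(expected).lower()
    )
    return correct / len(truth)


# 3-component task: WHO and WHERE match (ignoring case), WHAT does not.
three_part = score_answer(
    {"who": "ada finch", "what": "Rope", "where": "GREENHOUSE"},
    {"who": "Ada Finch", "what": "Wrench", "where": "Greenhouse"},
)
print(round(three_part, 3))  # 0.667

# 4-component task: everything but WHY is correct.
four_part = score_answer(
    {"who": "Ada Finch", "what": "Wrench", "where": "Greenhouse", "why": "Greed"},
    {"who": "Ada Finch", "what": "Wrench", "where": "Greenhouse", "why": "Revenge"},
)
print(four_part)  # 0.75
```

Deriving the weights from the number of ground-truth components means the same function covers both the 3-component and the 4-component tasks.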

Data

Data consists of JSON files (tasks_elementary.json, tasks_impossible.json, exhibits.json) containing murder mystery cases with suspects, weapons, locations, clues, and ground truth answers. Data is stored on the OpenReward platform.

Tools

Tool             Description
list_suspects    View all suspects with physical descriptions and features.
list_weapons     View all potential murder weapons with weight classifications.
list_locations   View all locations where the murder could have occurred.
list_clues       View all clues and evidence found at the crime scene.
list_motives     View potential motives (if available for the case).
list_statements  View suspect statements (murderer lies, others tell truth).
view_exhibits    View exhibits referenced in clues.
submit_answer    Submit who, what, where (and optionally why). Ends the episode.

Time Horizon

Multi-turn. Agents gather information using multiple tool calls before submitting their final answer.
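
The gather-then-submit flow can be sketched as follows. The tool names come from the Tools table, but everything else is an assumption for illustration: `call_tool` stands in for whatever client function actually invokes a tool, `decide` stands in for the agent's reasoning step, and the `submit_answer` argument names (`who`, `what`, `where`, `why`) are guessed from the tool description. `view_exhibits` is left out because its arguments are not documented here.

```python
from typing import Any, Callable

# Tool names as listed in the Tools table; the calling convention is assumed.
GATHERING_TOOLS = [
    "list_suspects",
    "list_weapons",
    "list_locations",
    "list_clues",
    "list_motives",
    "list_statements",
]


def run_episode(
    call_tool: Callable[[str, dict], Any],
    decide: Callable[[dict], dict],
) -> Any:
    """Call every listing tool once, then submit a single final answer.

    `call_tool(name, arguments)` is a hypothetical stand-in for the real
    tool-calling client; `decide(evidence)` is the agent's deduction step
    and should return a dict with who, what, where and optionally why.
    """
    evidence = {name: call_tool(name, {}) for name in GATHERING_TOOLS}
    answer = decide(evidence)
    # submit_answer ends the episode, so it is called exactly once, last.
    return call_tool("submit_answer", answer)
```

A real agent would usually interleave reasoning with tool calls rather than fetch everything up front. The fixed order here only makes the two phases explicit.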

Other Environment Requirements

There are no further environment requirements; WhoDunit works out of the box with the OpenReward endpoint without any external API keys.

Safety

Agents in WhoDunit solve fictional murder mystery puzzles in a standard environment. The environment does not present direct safety risks.
