WhoDunit

Description

WhoDunit is an environment for evaluating deductive reasoning on murder mystery puzzles. It contains 100 cases where agents must gather information about suspects, weapons, locations, and clues to deduce who committed the murder, with what weapon, and where it occurred.

Capabilities

Deductive reasoning and logical inference
Multi-step information gathering
Evidence synthesis and analysis
Murder mystery puzzle solving

Compute Requirements

Agents are given a standard environment with no sandbox or file system access.

Tasks

There is one split in this environment:

train: 100 tasks (75 at elementary difficulty and 25 at impossible difficulty)

Each case includes suspects with physical descriptions, potential weapons, locations, clues, and optionally motives and suspect statements.

Reward Structure

This is a multi-turn environment with partial credit scoring. Agents gather information using tools, then submit their answer via submit_answer.

3-Component Tasks: WHO (33.3%) + WHAT (33.3%) + WHERE (33.3%)

4-Component Tasks: WHO (25%) + WHAT (25%) + WHERE (25%) + WHY (25%)

Validation is a deterministic, case-insensitive exact match on each component. The reward ranges from 0.0 to 1.0 and equals the fraction of components answered correctly.
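The partial-credit rule above can be sketched as follows. This is a minimal illustration, not the environment's actual scorer: the function name `score_answer`, the dictionary keys (`who`, `what`, `where`, `why`), and the example values are assumptions. Whether the real validator also trims whitespace is not stated, so this sketch does not.

```python
from typing import Mapping


def score_answer(submitted: Mapping[str, str], truth: Mapping[str, str]) -> float:
    """Return the fraction of ground-truth components matched, ignoring case.

    Hypothetical helper: every component in `truth` carries equal weight,
    so a 3-component task scores in thirds and a 4-component task in quarters.
    """
    components = list(truth)  # e.g. ["who", "what", "where"], optionally "why"
    correct = sum(
        1
        for key in components
        # Case-insensitive exact match; a missing component counts as wrong.
        if submitted.get(key, "").casefold() == truth[key].casefold()
    )
    return correct / len(components)


# Illustrative values only (not taken from the dataset).
truth = {"who": "Alice", "what": "Rope", "where": "Library"}
guess = {"who": "alice", "what": "Knife", "where": "LIBRARY"}
print(score_answer(guess, truth))  # two of three components match
```

Under this reading, a 4-component task with only the suspect correct would score 0.25.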

Data

Data consists of JSON files (tasks_elementary.json, tasks_impossible.json, exhibits.json) containing murder mystery cases with suspects, weapons, locations, clues, and ground truth answers. Data is stored on the OpenReward platform.

Tools

Tool	Description
`list_suspects`	View all suspects with physical descriptions and features.
`list_weapons`	View all potential murder weapons with weight classifications.
`list_locations`	View all locations where the murder could have occurred.
`list_clues`	View all clues and evidence found at the crime scene.
`list_motives`	View potential motives (if available for the case).
`list_statements`	View suspect statements (the murderer lies; the other suspects tell the truth).
`view_exhibits`	View exhibits referenced in clues.
`submit_answer`	Submit who, what, where (and optionally why). Ends the episode.

Time Horizon

Multi-turn. Agents gather information using multiple tool calls before submitting their final answer.
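The gather-then-submit flow can be sketched as a simple loop. This is a hedged outline only: the tool names come from the table above, but `call_tool`, `decide`, the empty argument dictionaries, and the answer keys are assumptions, and a real agent would likely choose its tool calls adaptively rather than calling every tool once.

```python
from typing import Any, Callable, Dict

# Information-gathering tools listed in this README.
GATHER_TOOLS = [
    "list_suspects",
    "list_weapons",
    "list_locations",
    "list_clues",
    "list_motives",
    "list_statements",
    "view_exhibits",
]


def run_episode(
    call_tool: Callable[[str, Dict[str, Any]], Any],
    decide: Callable[[Dict[str, Any]], Dict[str, str]],
) -> Any:
    """Call each gathering tool once, then submit a single final answer.

    `call_tool` is a hypothetical client callback (tool name, arguments);
    `decide` is a hypothetical reasoning step that maps the collected
    evidence to an answer such as {"who": ..., "what": ..., "where": ...}.
    """
    evidence: Dict[str, Any] = {}
    for name in GATHER_TOOLS:
        # Assumption: the gathering tools accept no arguments.
        evidence[name] = call_tool(name, {})
    answer = decide(evidence)
    # submit_answer ends the episode, so it is called exactly once, last.
    return call_tool("submit_answer", answer)
```

Because `submit_answer` ends the episode, the sketch keeps all gathering calls ahead of it.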

Other Environment Requirements

There are no further environment requirements; WhoDunit works out of the box with the OpenReward endpoint without any external API keys.

Safety

Agents in WhoDunit solve fictional murder mystery puzzles in a standard environment. The environment does not present direct safety risks.

Repository

Source repository

EnvCommons/WhoDunnit

Compute Configuration

Resource allocation for this environment.

Component	Configuration
Environment Server	1 vCPU / 4 GB RAM
Sandbox Machine	Not configured

Estimated Cost

Pay per second of active session usage. Billing starts when your session begins and stops when it ends.

Component	Cost / second
Environment	$0.0000320
Sandbox	Not configured
Total	$0.0000320

Examples

5-minute session: $0.0096

1-hour session: $0.1152
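The example figures follow directly from the per-second rate. A small sketch of the arithmetic, assuming billing is linear in session length as described above (the function name `session_cost` is illustrative):

```python
RATE_PER_SECOND = 0.0000320  # total cost per second from the table above, in USD


def session_cost(seconds: float, rate: float = RATE_PER_SECOND) -> float:
    """Cost in USD of a session lasting `seconds`, billed per second."""
    return seconds * rate


print(f"5-minute session: ${session_cost(5 * 60):.4f}")  # $0.0096
print(f"1-hour session: ${session_cost(60 * 60):.4f}")   # $0.1152
```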

WhoDunit

GeneralReasoning/WhoDunit
