InverseIFEval
Description
InverseIFEval is an environment for evaluating an agent's ability to follow counterintuitive instructions. Based on the Inverse IFEval benchmark, it requires agents to override conventional training behaviors and follow unconventional or counterintuitive instructions across 8 instruction categories in both Chinese and English.
Capabilities
- Following counterintuitive instructions
- Overriding trained conventions
- Bilingual instruction following (Chinese and English)
- Handling 8 types of unconventional instruction patterns
Compute Requirements
Agents are given a standard environment with no sandbox or file system access.
Tasks
There is one split in this environment:
- test: 1,012 tasks (506 Chinese + 506 English)
Tasks span 8 instruction types:
| Instruction Type | Count |
|---|---|
| Instructional Induction | 154 |
| Mid-turn Instruction Modification | 108 |
| Counterfactual Answering | 108 |
| Code without Comments | 198 |
| Deliberately Incorrect Answers | 186 |
| Counter-Conventional Formatting | 82 |
| Question Correction | 90 |
| Intentional Textual Flaws | 86 |
Each task presents the agent with a counterintuitive instruction that deliberately deviates from standard conventions. The agent must read the instruction and submit a response that correctly follows the unconventional directive.
Reward Structure
Single-turn with LLM-graded rewards. The agent submits a response via the submit_response tool. Each task in the dataset includes a judge_prompt_template and judge_system_prompt that define grading criteria specific to that instruction type. The grader (gpt-5-mini) substitutes the agent's response and a reference answer into the template, then outputs a JSON verdict with an answer_score of 0 or 1.
The reward is binary:
- 1.0 if the response correctly follows the counterintuitive instruction
- 0.0 if it does not
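The grading flow described above can be sketched as follows. This is a minimal illustration, not the environment's actual grader: the template placeholder names and the reference_answer field name are assumptions, and a stub stands in for the gpt-5-mini call.

```python
import json

def grade_response(task: dict, agent_response: str, call_grader) -> float:
    """Substitute the agent's response and the reference answer into the
    task's judge template, ask the grader for a JSON verdict, and map
    answer_score (0 or 1) to a binary reward."""
    # Placeholder names {response}/{reference} are assumed, not the real schema.
    judge_prompt = task["judge_prompt_template"].format(
        response=agent_response,
        reference=task["reference_answer"],
    )
    verdict = json.loads(call_grader(task["judge_system_prompt"], judge_prompt))
    return 1.0 if verdict["answer_score"] == 1 else 0.0

# Stub standing in for the gpt-5-mini grader call.
def fake_grader(system_prompt: str, user_prompt: str) -> str:
    return '{"answer_score": 1}'

task = {
    "judge_prompt_template": "Response: {response}\nReference: {reference}\nGrade it.",
    "judge_system_prompt": "You are a strict grader. Output JSON.",
    "reference_answer": "an answer that deliberately breaks the usual convention",
}
print(grade_response(task, "my unconventional answer", fake_grader))  # 1.0
```

In the real environment the grader output is produced by gpt-5-mini against the per-task criteria; only the binary mapping from answer_score to reward is fixed.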
Data
The dataset consists of a single file:
inverse_ifeval.parquet (6.27 MB, 1,012 samples)
Sourced from the HuggingFace dataset m-a-p/Inverse_IFEval. Each sample contains a prompt, response reference, and judge template for LLM-based grading. Data is stored on the OpenReward platform.
Tools
Agents have access to a single tool:
submit_response: Submit a text response for LLM-based grading against the counterintuitive instruction. Accepts a response string parameter. The episode ends after calling this tool.
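The single tool could be represented in a standard function-calling schema along these lines. The schema below is illustrative only; the tool name and the single response string parameter come from the description above, while the description strings are assumptions.

```python
import json

# Illustrative tool schema; only the name "submit_response" and the
# required "response" string parameter are taken from the environment docs.
submit_response_tool = {
    "type": "function",
    "function": {
        "name": "submit_response",
        "description": "Submit a text response for LLM-based grading. "
                       "Calling this tool ends the episode.",
        "parameters": {
            "type": "object",
            "properties": {
                "response": {
                    "type": "string",
                    "description": "The answer to the counterintuitive instruction.",
                }
            },
            "required": ["response"],
        },
    },
}

print(json.dumps(submit_response_tool["function"]["parameters"]["required"]))
```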
Time Horizon
Single-turn. The agent reads the counterintuitive instruction and submits one response.
Environment Difficulty
The original paper evaluates frontier models on Inverse IFEval (Overall Score, English):
| Model | Score |
|---|---|
| o3-high | 75.7 |
| o3-mini | 74.7 |
| GPT-5-high | 73.7 |
| Claude-4-Opus-Thinking | 67.2 |
| Claude-4-Sonnet-Thinking | 64.0 |
| DeepSeek-R1 | 50.0 |
| DeepSeek-V3 | 39.6 |
Models show a ~30% performance drop versus conventional IFEval; thinking mechanisms improve scores by ~15% on average.
Other Environment Requirements
OpenAI API key required for LLM-based grading. Pass via secrets={"openai_api_key": "..."} when creating a session.
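As a minimal sketch, the secrets mapping might be assembled from an environment variable like this. Only the secrets={"openai_api_key": "..."} shape comes from the documentation above; the surrounding session keyword arguments are hypothetical.

```python
import os

# Only the secrets={"openai_api_key": ...} shape is documented; the rest
# of these session kwargs are hypothetical placeholders.
secrets = {"openai_api_key": os.environ.get("OPENAI_API_KEY", "...")}
session_kwargs = {
    "environment": "InverseIFEval",  # hypothetical parameter name
    "secrets": secrets,
}

print(sorted(session_kwargs["secrets"].keys()))
```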
Safety
Agents in InverseIFEval follow counterintuitive instructions in a standard environment. While some tasks ask agents to produce deliberately incorrect or unconventional outputs, this is done in a controlled evaluation context and does not present direct safety risks.
Citation
@article{inverse_ifeval_2024,
  title={Inverse IFEval: Evaluating LLMs' Ability to Follow Counterintuitive Instructions},
  author={MAP Team},
  year={2024},
  url={https://huggingface.co/datasets/m-a-p/Inverse_IFEval}
}