inverseifeval

Description

Inverse IFEval is a benchmark that measures whether models can override training-induced biases and comply with adversarial, unconventional, or counter-intuitive instructions. It contains 1,012 high-quality Chinese and English questions across 23 domains, covering eight challenge types (e.g., Question Correction, Intentional Textual Flaws, Code without Comments, Counterfactual Answering). The questions were built with a human-in-the-loop pipeline, and responses are evaluated using an LLM-as-a-Judge framework.
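As a rough illustration of how an LLM-as-a-Judge evaluation of this kind could be scored, the sketch below computes per-challenge-type accuracy. Everything in it is an assumption made for illustration: the `Item` fields, the `score` function, and the `toy_model` and `toy_judge` stand-ins are hypothetical and do not reflect the benchmark's actual data schema, judge prompts, or harness. In a real setup, the two callables would wrap calls to the model under test and to the judge model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass
class Item:
    # Hypothetical record layout; the real dataset schema may differ.
    prompt: str          # the counter-intuitive instruction
    challenge_type: str  # e.g. "Code without Comments"
    rubric: str          # what a compliant response must satisfy


def score(
    items: Iterable[Item],
    respond: Callable[[str], str],
    judge: Callable[[str, str, str], bool],
) -> Dict[str, float]:
    """Return accuracy per challenge type, as decided by the judge."""
    passed: Dict[str, int] = {}
    total: Dict[str, int] = {}
    for item in items:
        response = respond(item.prompt)
        ok = judge(item.prompt, response, item.rubric)
        total[item.challenge_type] = total.get(item.challenge_type, 0) + 1
        passed[item.challenge_type] = passed.get(item.challenge_type, 0) + int(ok)
    return {t: passed[t] / total[t] for t in total}


def toy_model(prompt: str) -> str:
    # Stand-in for the model under test: it ignores the "no comments"
    # instruction, mimicking a training-induced habit.
    if "comments" in prompt:
        return "def add(a, b):\n    return a + b  # sum the inputs"
    return "The sky is green."


def toy_judge(prompt: str, response: str, rubric: str) -> bool:
    # Stand-in for an LLM judge: a rubric is "require:<text>" or "forbid:<text>".
    kind, _, needle = rubric.partition(":")
    found = needle in response
    return found if kind == "require" else not found


items = [
    Item("Write a function that adds two numbers, with no comments.",
         "Code without Comments", "forbid:#"),
    Item("Assume the sky is green. What color is the sky?",
         "Counterfactual Answering", "require:green"),
]

results = score(items, toy_model, toy_judge)
print(results)
```

The string-matching judge is only a placeholder that keeps the example runnable offline; an actual judge would be a language model prompted with the question, the response, and the grading criteria.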

Implementations (1)
Environment                       Stars    Last Updated
GeneralReasoning/InverseIFEval    0        1 month ago