IneqMath
Description
IneqMath is an environment for evaluating an agent's ability to prove mathematical inequalities. Built on an expert-curated dataset of Olympiad-level inequality problems, it tests advanced reasoning skills including discovering tight bounds, applying theorems strategically, and constructing rigorous proofs. The benchmark recasts inequality proving as two verifiable subtasks: bound estimation and relation prediction.
This OpenReward implementation is ported from the Harbor Framework implementation originally written by 1171-jpg.
Capabilities
- Proving mathematical inequalities at the Olympiad level
- Discovering tight bounds for expressions
- Applying mathematical theorems strategically (AM-GM, Cauchy-Schwarz, etc.; see the worked example after this list)
- Constructing step-by-step rigorous proofs
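As a concrete illustration of the bound-estimation style of reasoning, consider a classic warm-up (illustrative only, not drawn from the dataset): find the largest constant C such that a/b + b/a ≥ C for all positive a, b.

```latex
% Illustrative bound-estimation problem (not from the IneqMath dataset):
% find the largest C with  a/b + b/a >= C  for all a, b > 0.
% AM-GM on the two terms gives
\[
  \frac{a}{b} + \frac{b}{a}
    \;\ge\; 2\sqrt{\frac{a}{b} \cdot \frac{b}{a}}
    \;=\; 2,
\]
% with equality iff a = b, so the tight constant is C = 2.
```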
Compute Requirements
Agents are given a sandboxed environment with bash access and file editing tools. Default sandbox size is 1 CPU and 2 GB RAM.
Tasks
There is one split in this environment:
- Test: 100 inequality-proving tasks
Problems are commissioned from IMO-level medalists to ensure novelty and minimize overlap with LLM training data.
Reward Structure
This is a multi-turn, sandbox-based environment: the agent writes its final answer to /app/answer.txt and calls submit_answer. Each task is one of two types:
- Bound estimation: Find the largest/smallest constant C. An LLM judge (gpt-4o-mini) verifies mathematical equivalence with the ground truth.
- Relation prediction: Identify the correct relation (≤, ≥, =, <, >) between expressions. Verified by exact match.
Reward is 1.0 for correct answers, 0.0 otherwise.
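The two verification paths can be summarized with a minimal sketch, assuming hypothetical helper names (the actual OpenReward grader is not part of this README):

```python
from pathlib import Path

ANSWER_PATH = Path("/app/answer.txt")  # where the agent writes its answer

def grade_relation(answer: str, ground_truth: str) -> float:
    """Relation prediction: exact match against the gold relation."""
    return 1.0 if answer.strip() == ground_truth.strip() else 0.0

def grade_bound(answer: str, ground_truth: str, judge) -> float:
    """Bound estimation: an LLM judge (gpt-4o-mini) checks mathematical
    equivalence, so e.g. '2*sqrt(2)' and 'sqrt(8)' should both pass.
    `judge` is a hypothetical callable wrapping the judge model."""
    verdict = judge(
        "Are these two constants mathematically equal? Reply YES or NO.\n"
        f"Submitted: {answer}\nReference: {ground_truth}"
    )
    return 1.0 if verdict.strip().upper().startswith("YES") else 0.0

def grade(task_type: str, ground_truth: str, judge=None) -> float:
    answer = ANSWER_PATH.read_text()
    if task_type == "relation":
        return grade_relation(answer, ground_truth)
    assert judge is not None, "bound estimation requires an LLM judge"
    return grade_bound(answer, ground_truth, judge)
```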
Data
Each task directory contains an instruction.md with the inequality problem and a tests/ directory with verification scripts. Task data is stored on the OpenReward platform.
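A typical task directory might look like this (file names per the description above; the task name and the exact contents of tests/ are hypothetical):

```
task_0001/               # hypothetical task name
├── instruction.md       # the inequality problem statement
└── tests/               # verification scripts run after submit_answer
```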
Source: AI4Math/IneqMath
Tools
| Tool | Description |
|---|---|
| bash | Execute shell commands in the sandbox. |
| str_replace | Replace a unique string in a file. |
| view | View file contents or list directory contents. |
| create_file | Create a new file with specified content. |
| submit_answer | Submit work for automated verification. |
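To make the workflow concrete, here is an illustrative solve-and-submit sequence. The `call` helper, argument names, and the `check_bound.py` script are hypothetical; the actual tool-invocation format depends on the agent harness.

```python
# Illustrative tool-call sequence; `call` is a hypothetical stand-in
# for the harness's real tool dispatcher.
def call(tool: str, **kwargs) -> None:
    print(f"-> {tool}({kwargs})")

call("view", path="instruction.md")                # read the problem
call("bash", command="python3 check_bound.py")     # numerically probe a candidate bound (hypothetical script)
call("create_file", path="/app/answer.txt", content="C = 2")  # write the final answer
call("submit_answer")                              # trigger automated verification
```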
Time Horizon
IneqMath is a multi-turn environment. Agents read the problem, develop a proof strategy, implement and test their solution, and submit for verification.
Environment Difficulty
IneqMath is challenging. The original paper evaluates 29 LLMs and finds a dramatic gap between answer accuracy and reasoning soundness:
| Model | Overall Accuracy | Answer-Only Accuracy |
|---|---|---|
| o3 | 21.0% | 93.5% |
| o4-mini | 15.5% | 62.0% |
| o3-mini | 9.5% | - |
| o1 | 8.0% | 62.5% |
The drop of up to 72.5 percentage points from answer-only to overall accuracy (o3) reveals that while LLMs often find correct final answers, their reasoning chains remain fragile under step-wise scrutiny.
Other Environment Requirements
There are no further environment requirements; IneqMath works out of the box with the OpenReward endpoint without any external API keys.
Safety
Agents in IneqMath solve mathematical inequality problems in a sandboxed environment. The environment does not present direct safety risks.
Citations
```bibtex
@inproceedings{sheng2025ineqmath,
  author    = {Jiayi Sheng and Luna Lyu and Jikai Jin and Tony Xia and Alex Gu and James Zou and Pan Lu},
  title     = {Solving Inequality Proofs with Large Language Models},
  booktitle = {NeurIPS 2025 Spotlight},
  year      = {2025},
  url       = {https://arxiv.org/abs/2506.07927}
}
```