POLARIS

Description

POLARIS is an environment for evaluating mathematical reasoning capabilities. It contains 53,291 math reasoning problems, each labeled with one of 8 difficulty levels (0/8 to 7/8) derived from estimated pass rates. Answers are verified by symbolic comparison, so mathematically equivalent expressions are accepted.

Capabilities

Mathematical reasoning across difficulty levels
Symbolic answer verification
Equivalent expression handling
Multi-format numerical answers

Compute Requirements

Agents are given a standard environment with no sandbox or file system access.

License

Apache 2.0.

Tasks

There is one split in this environment:

train: 53,291 tasks

Tasks span 8 difficulty levels from easy (0/8) to hard (7/8).
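The difficulty labels can be handled as plain "k/8" strings. The following is a minimal sketch of parsing those labels and counting tasks per level. The `Task` field names and the three sample rows (including their difficulty labels) are made up for illustration; the actual column names and contents of the dataset are not specified here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    # Hypothetical field names; the README does not list the column names.
    problem: str
    answer: str
    difficulty: str  # label such as "3/8"


def difficulty_level(label: str) -> int:
    """Parse a label like "3/8" into its integer level (0 through 7)."""
    numerator, denominator = label.strip().split("/")
    level, total = int(numerator), int(denominator)
    if total != 8 or not 0 <= level <= 7:
        raise ValueError(f"unexpected difficulty label: {label!r}")
    return level


def count_by_level(tasks):
    """Count how many tasks fall into each difficulty level."""
    return Counter(difficulty_level(t.difficulty) for t in tasks)


# Illustrative rows only; these are not taken from the dataset.
tasks = [
    Task("What is 2 + 2?", "4", "0/8"),
    Task("Simplify 6/8.", "3/4", "0/8"),
    Task("Find the remainder of 7^100 divided by 5.", "1", "5/8"),
]
counts = count_by_level(tasks)
```

Rejecting labels outside 0/8 to 7/8 keeps malformed rows from being silently assigned to a level.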

Reward Structure

This is a single-turn environment. The agent submits an answer via the answer tool. Verification uses symbolic comparison, handling equivalent expressions (e.g., "1/2" equals "0.5"). Reward is binary: 1.0 if correct, 0.0 if incorrect.
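To make the binary reward concrete, here is a simplified sketch that treats "1/2" and "0.5" as equal. It is not the environment's verifier: it only compares plain numeric strings exactly using the standard library's `Fraction`, and falls back to a whitespace-insensitive string match for anything else, whereas the environment uses full symbolic comparison. The function names `parse_number` and `reward` are hypothetical.

```python
from fractions import Fraction


def parse_number(text: str):
    """Return the exact rational value of a numeric string, or None."""
    try:
        return Fraction(text.strip())
    except (ValueError, ZeroDivisionError):
        return None


def reward(submitted: str, expected: str) -> float:
    """Binary reward: 1.0 when the answers match, 0.0 otherwise."""
    a, b = parse_number(submitted), parse_number(expected)
    if a is not None and b is not None:
        return 1.0 if a == b else 0.0
    # Assumption: non-numeric answers are compared as strings, ignoring whitespace.
    same_text = "".join(submitted.split()) == "".join(expected.split())
    return 1.0 if same_text else 0.0
```

Comparing exact rationals instead of floats avoids rounding issues, so "0.50" matches "1/2" while "0.33" does not match "1/3".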

Data

The data is a single Parquet file (polaris_tasks.parquet, ~11 MB) sourced from the Hugging Face dataset POLARIS-Project/Polaris-Dataset-53K. Each row contains a problem statement, an expected answer, and a difficulty level. The file is stored on the OpenReward platform.

Tools

Tool	Description
`answer`	Submit your final answer (number or expression). Ends the episode.

Time Horizon

Single-turn. The agent reads the math problem and submits one answer.

Environment Difficulty

No difficulty statistics have been provided for this environment yet.

Other Environment Requirements

No external API keys required.

Safety

Agents in POLARIS solve mathematical problems in a standard environment. The environment does not present direct safety risks.

Citation

@misc{Polaris2025,
  title={POLARIS: A Post-Training Recipe for Scaling Reinforcement Learning on Advanced Reasoning Models},
  url={https://hkunlp.github.io/blog/2025/Polaris},
  author={An, Chenxin and Xie, Zhihui and Li, Xiaonan and Li, Lei and Zhang, Jun and Gong, Shansan and Zhong, Ming and Xu, Jingjing and Qiu, Xipeng and Wang, Mingxuan and Kong, Lingpeng},
  year={2025}
}

Repository

Source repository

EnvCommons/POLARIS

Compute Configuration

Resource allocation for this environment.

Component	Configuration
Environment Server	1 vCPU / 4 GB RAM
Sandbox Machine	Not configured

Estimated Cost

Pay per second of active session usage. Billing starts when your session begins and stops when it ends.

Component	Cost / second
Environment	$0.0000320
Sandbox	Not configured
Total	$0.0000320

Examples

5-minute session: $0.0096

1-hour session: $0.1152
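The example figures follow directly from the per-second rate multiplied by the session length in seconds. A minimal sketch of that arithmetic, using `Decimal` to avoid floating-point rounding (the function name `session_cost` is hypothetical):

```python
from decimal import Decimal

# Per-second environment rate from the cost table; no sandbox is configured.
ENVIRONMENT_RATE = Decimal("0.0000320")


def session_cost(seconds: int) -> Decimal:
    """Cost in dollars for a session lasting the given number of seconds."""
    return ENVIRONMENT_RATE * seconds


print(session_cost(5 * 60))   # 5-minute session
print(session_cost(60 * 60))  # 1-hour session
```

Because billing is per second of active session time, any other duration can be estimated the same way.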

POLARIS-53K

GeneralReasoning/POLARIS-53K
