# reasoning-core OpenReward environment
This folder contains an OpenReward-compatible ORS environment for `reasoning-core`.
## Files
- `server.py`: OpenReward environment server implementation.
- `requirements.txt`: Python dependencies for local/dev builds.
- `Dockerfile`: Container spec used by OpenReward deployment.
## Local run
```bash
cd reasoning_core/openreward/reasoning_core_env
pip install -r requirements.txt
python server.py
```

## Quick local smoke test
```python
from openreward import OpenReward

or_client = OpenReward()
env = or_client.environments.get(name="reasoningcore", base_url="http://localhost:8080")
print(env.list_splits())
print(env.list_tools())
print(env.list_tasks("train")[:1])
```

## Configuration
Set optional environment variables before launch:
- `RC_NUM_TRAIN` (default `500`)
- `RC_NUM_TEST` (default `50`)
- `RC_SEED` (default `0`)
- `RC_PASS_THRESHOLD` (default `0.9`)
- `RC_HF_DATASET` (default `reasoning-core/symbolic-reasoning-env`)
- `RC_HF_CONFIG` (optional dataset config name)
- `RC_DISABLE_HF_FALLBACK=1` to disable Hugging Face loading and use the procedural fallback
The task order is deterministic for fixed values of these variables.
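As a sketch, the overrides can also be applied from a Python launcher process before starting the server. The specific values below are illustrative, not recommendations:

```python
import os

# Illustrative overrides (hypothetical values); these must be set in the
# environment before server.py starts reading its configuration.
overrides = {
    "RC_NUM_TRAIN": "100",
    "RC_NUM_TEST": "10",
    "RC_SEED": "42",
    "RC_PASS_THRESHOLD": "0.8",
}
os.environ.update(overrides)

# Then launch the server in this same environment, e.g.:
#   python server.py
print(os.environ["RC_SEED"])
```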
## Notes
- The environment exposes a single `answer` tool.
- The tool returns a human-readable result with a rounded reward (`reward=0.000` format).
- "Accepted" is intentionally lenient and defaults to `reward >= 0.9` (configurable via `RC_PASS_THRESHOLD`).
- The tool accepts either plain-text answers or XML-wrapped answers (`<answer>...</answer>`), matching common evaluator output formats.
- By default, tasks are loaded from the Hugging Face dataset `reasoning-core/symbolic-reasoning-env` using its native `train`/`test` splits.
- If Hugging Face loading fails, the environment falls back to deterministic procedural task generation.
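To illustrate the answer handling and accept threshold described above, here is a minimal sketch. The helper names are hypothetical and the server's actual logic may differ; only the `<answer>...</answer>` wrapping, the `reward=0.000` rounding, and the `RC_PASS_THRESHOLD` default come from this README:

```python
import re

RC_PASS_THRESHOLD = 0.9  # default; overridable via the RC_PASS_THRESHOLD env var

def unwrap_answer(text: str) -> str:
    """Sketch: accept plain text or an <answer>...</answer>-wrapped answer."""
    m = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return m.group(1).strip() if m else text.strip()

def format_result(reward: float, threshold: float = RC_PASS_THRESHOLD) -> str:
    """Sketch of a human-readable result line with the rounded reward."""
    status = "Accepted" if reward >= threshold else "Rejected"
    return f"{status} (reward={reward:.3f})"

print(unwrap_answer("<answer>p -> q</answer>"))  # -> p -> q
print(format_result(0.95))                       # -> Accepted (reward=0.950)
print(format_result(0.5))                        # -> Rejected (reward=0.500)
```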