OrganicChem1909

Description

OrganicChem1909 is an environment for evaluating agents on organic chemistry questions derived from "Practical Methods of Organic Chemistry" (1909). Questions test procedural understanding and conceptual comprehension of pre-modern organic chemistry practices. An LLM grader evaluates answers for conceptual correctness, accepting alternative nomenclature and awarding partial credit.

Capabilities

Answering organic chemistry questions requiring procedural understanding
Reasoning about chemical mechanisms and principles
Understanding historical chemistry nomenclature (1909 conventions)
Demonstrating conceptual comprehension rather than rote recall

Compute Requirements

OrganicChem1909 does not require a sandbox. Its compute requirements are minimal: the environment server runs on 1 vCPU with 4 GB RAM.

License

MIT.

Tasks

There are three splits: train, validation, and test. Questions are loaded from a parquet file (organicchem1909_questions.parquet) and span multiple categories and difficulty levels. Each question includes metadata: chapter, page reference, category, difficulty, and a context snippet from the source textbook.
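As a rough illustration of the per-question metadata described above, the record could be modeled as below. This is only a sketch: the field names, the `split` field, and the `select` helper are hypothetical and are not taken from the parquet schema, and the sample values in the usage note are placeholders rather than real dataset rows.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Question:
    """One question record. All field names here are hypothetical."""
    question: str
    chapter: str
    page_reference: str
    category: str
    difficulty: str
    context: str   # context snippet from the source textbook
    split: str     # "train", "validation", or "test"


def select(questions: Iterable[Question], split: str,
           difficulty: Optional[str] = None) -> List[Question]:
    """Return the questions in `split`, optionally limited to one difficulty."""
    return [
        q for q in questions
        if q.split == split and (difficulty is None or q.difficulty == difficulty)
    ]
```

For example, `select(questions, "test", difficulty="hard")` would keep only the test-split records whose (assumed) difficulty label is `"hard"`.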

Reward Structure

This is a sparse reward environment with continuous scoring. The agent calls the answer tool once with its response, and the environment grades it using an LLM grader (gpt-5-mini). The grader assigns a score from 0.0 to 1.0 and a grade:

CORRECT (score ≥ 0.85): The answer demonstrates full conceptual understanding. Reward: the grader's score (0.85 to 1.0).
PARTIALLY_CORRECT (0.7 ≤ score < 0.85): The answer shows partial but conceptually sound reasoning. Reward: the grader's score (0.7 up to, but not including, 0.85).
INCORRECT (score < 0.7): The answer is conceptually wrong or missing key information. Reward: the grader's score (0.0 up to, but not including, 0.7).
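The grade bands above can be sketched as a small mapping from score to grade. This is a minimal illustration, assuming the grade label follows directly from the score thresholds; `grade_for_score` is a hypothetical helper, not part of the environment's code, and in practice the LLM grader returns both values itself.

```python
def grade_for_score(score: float) -> str:
    """Map a grader score in [0.0, 1.0] to the grade band listed above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0.0 and 1.0, got {score}")
    if score >= 0.85:
        return "CORRECT"
    if score >= 0.7:
        return "PARTIALLY_CORRECT"
    return "INCORRECT"
```

In every band the reward is the score itself, so the grade is a label on top of the continuous reward rather than a separate signal.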

Grading rules:

IUPAC names, common names, and historical 1909 terminology are all accepted.
Evaluation focuses on conceptual correctness and understanding, not exact wording.
Partial credit is awarded for incomplete but conceptually sound answers.
Safety awareness is valued even if not in the reference answer.

Data

Questions are sourced from "Practical Methods of Organic Chemistry" (1909), a public domain textbook. The dataset is stored as a parquet file on the OpenReward platform.

Tools

Agents are given a single tool:

answer: Submit an answer to the chemistry question. The answer is graded by the LLM grader against the reference answer. Returns the grade, score, and feedback. This tool can only be called once per task.
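The once-per-task constraint and the shape of the tool's return value might look roughly like the sketch below. Everything here is an assumption for illustration: `AnswerResult`, `SingleUseAnswerTool`, and the choice to raise an error on a second call are hypothetical, and the real environment may report a repeated call differently.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AnswerResult:
    """Hypothetical container for what the answer tool returns."""
    grade: str
    score: float
    feedback: str


class SingleUseAnswerTool:
    """Wraps a grader callable and allows exactly one answer per task."""

    def __init__(self, grader: Callable[[str], AnswerResult]) -> None:
        self._grader = grader
        self._used = False

    def answer(self, text: str) -> AnswerResult:
        # Reject any call after the first one for this task.
        if self._used:
            raise RuntimeError("answer can only be called once per task")
        self._used = True
        return self._grader(text)
```

Because a second call is rejected, an agent should settle on its full response before calling the tool.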

Time Horizon

OrganicChem1909 is a single-turn environment. The agent receives a question and submits one answer. Each task requires exactly one tool call.

Environment Difficulty

[Statistics on environment difficulty here]

Other Environment Requirements

OrganicChem1909 requires an OpenAI API key (OPENAI_API_KEY secret) for LLM-based grading of answers.

export OPENAI_API_KEY=your_api_key_here

Pass the key via the secrets parameter when creating a session:

async with environment.session(task=task, secrets={"openai_api_key": OPENAI_API_KEY}) as session:
    ...  # interact with the session here
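One way to obtain the value passed in the secrets dictionary is to read it from the exported environment variable. The sketch below assumes only the variable name and the `openai_api_key` dictionary key shown above; `load_secrets` is a hypothetical helper and not part of any client library.

```python
import os
from typing import Dict, Mapping


def load_secrets(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build the secrets dict from OPENAI_API_KEY, failing early if it is unset."""
    key = env.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return {"openai_api_key": key}
```

The result can then be passed as `secrets=load_secrets()` when creating a session. Failing before the session starts avoids discovering a missing key only at grading time.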

Safety

Agents in OrganicChem1909 are asked to answer chemistry questions from a historical textbook. The environment does not present direct safety risks, as agents only provide text answers with no access to external systems, tools, or the internet.

Citations

@dataset{GROrganicChem1909,
  author    = {General Reasoning Inc. Team},
  title     = {OrganicChem1909},
  year      = {2026},
  publisher = {OpenReward},
  url       = {https://openreward.ai/GeneralReasoning/OrganicChem1909}
}

@book{gattermann1909practical,
  title={Practical Methods of Organic Chemistry},
  author={Gattermann, Ludwig},
  year={1909}
}

Repository

Source repository

EnvCommons/OrganicChem1909

Compute Configuration

Resource allocation for this environment.

Component	Configuration
Environment Server	1 vCPU / 4 GB RAM
Sandbox Machine	Not configured

Estimated Cost

Pay per second of active session usage. Billing starts when your session begins and stops when it ends.

Component	Cost / second
Environment	$0.0000320
Sandbox	Not configured
Total	$0.0000320

Examples

5-minute session: $0.0096

1-hour session: $0.1152
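The example figures follow from multiplying the per-second rate by the session length. A minimal sketch of that arithmetic, using the total rate listed above (`session_cost` is a hypothetical helper name):

```python
RATE_PER_SECOND = 0.0000320  # total cost per second of active session, in USD


def session_cost(seconds: float) -> float:
    """Estimated cost in USD for a session lasting `seconds` seconds."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    return RATE_PER_SECOND * seconds


five_minutes = session_cost(5 * 60)   # 300 s  * $0.0000320 ≈ $0.0096
one_hour = session_cost(60 * 60)      # 3600 s * $0.0000320 ≈ $0.1152
```

This covers only the environment server rate; any charges from the grader's OpenAI API usage are not part of this figure.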