OrganicChem1909

OpenReward Environment

Description

OrganicChem1909 is an environment for evaluating agents on organic chemistry questions derived from "Practical Methods of Organic Chemistry" (1909). Questions test procedural understanding and conceptual comprehension of pre-modern organic chemistry practices. An LLM grader evaluates answers for conceptual correctness, accepting alternative nomenclature and awarding partial credit.

Capabilities

  • Answering organic chemistry questions requiring procedural understanding
  • Reasoning about chemical mechanisms and principles
  • Understanding historical chemistry nomenclature (1909 conventions)
  • Demonstrating conceptual comprehension rather than rote recall

Compute Requirements

OrganicChem1909 does not require a sandbox. It has minimal compute requirements.

License

MIT.

Tasks

There are three splits: train, validation, and test. Questions are loaded from a parquet file (organicchem1909_questions.parquet) and span multiple categories and difficulty levels. Each question includes metadata: chapter, page reference, category, difficulty, and a context snippet from the source textbook.
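As a rough illustration of the per-question metadata described above, the sketch below models one record and filters a list by category and difficulty. The field names (question, chapter, page, category, difficulty, context), the placeholder rows, and the helper filter_questions are assumptions made for illustration; the actual column names in organicchem1909_questions.parquet may differ.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class QuestionRecord:
    """One question plus its metadata (field names are assumed, not confirmed)."""
    question: str
    chapter: str
    page: int
    category: str
    difficulty: str
    context: str


def filter_questions(
    records: Iterable[QuestionRecord],
    category: Optional[str] = None,
    difficulty: Optional[str] = None,
) -> List[QuestionRecord]:
    """Keep records matching the requested category and/or difficulty."""
    return [
        r for r in records
        if (category is None or r.category == category)
        and (difficulty is None or r.difficulty == difficulty)
    ]


# Placeholder rows, only to show the shape of the data (not real dataset content).
SAMPLE = [
    QuestionRecord("placeholder question 1", "placeholder chapter", 1,
                   "procedure", "easy", "placeholder context"),
    QuestionRecord("placeholder question 2", "placeholder chapter", 2,
                   "concept", "hard", "placeholder context"),
]

hard_questions = filter_questions(SAMPLE, difficulty="hard")
```

With the real parquet file, the same filter could be applied after converting each row into a QuestionRecord, once the actual column names are known.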

Reward Structure

This is a sparse reward environment with continuous scoring. The agent calls the answer tool once with its response, and the environment grades it using an LLM grader (gpt-5-mini). The grader assigns a score from 0.0 to 1.0 and a grade:

  • CORRECT (score ≥ 0.85): The answer demonstrates full conceptual understanding. Reward: the grader's score (0.85 to 1.0).
  • PARTIALLY_CORRECT (0.7 ≤ score < 0.85): The answer is incomplete but conceptually sound. Reward: the grader's score (0.7 up to, but not including, 0.85).
  • INCORRECT (score < 0.7): The answer is conceptually wrong or missing key information. Reward: the grader's score (0.0 up to, but not including, 0.7).
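The thresholds above can be sketched as a small mapping from score to grade. This is an illustration only: the function name grade_from_score is hypothetical, the lower bounds are assumed to be inclusive (reading "0.85+" and "0.7+" that way), and in the real environment the LLM grader produces both the score and the grade itself.

```python
def grade_from_score(score: float) -> str:
    """Map a grader score in [0.0, 1.0] to the grade label it falls under."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0.0 and 1.0, got {score}")
    if score >= 0.85:  # assumed inclusive lower bound
        return "CORRECT"
    if score >= 0.7:   # assumed inclusive lower bound
        return "PARTIALLY_CORRECT"
    return "INCORRECT"
```

In every case the reward is the score itself; the grade is only a label for the band the score lands in.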

Grading rules:

  • IUPAC names, common names, and historical 1909 terminology are all accepted.
  • Evaluation focuses on conceptual correctness and understanding, not exact wording.
  • Partial credit is awarded for incomplete but conceptually sound answers.
  • Safety awareness is valued even if not in the reference answer.

Data

Questions are sourced from "Practical Methods of Organic Chemistry" (1909), a public domain textbook. The dataset is stored as a parquet file on the OpenReward platform.

Tools

Agents are given a single tool:

  • answer: Submit an answer to the chemistry question. The answer is graded by the LLM grader against the reference answer. Returns the grade, score, and feedback. This tool can only be called once per task.
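The call-once behaviour of the answer tool might look roughly like the sketch below. The names GradeResult, SingleUseAnswerTool, and stub_grader are hypothetical, the stub stands in for the LLM grader, and raising an error on a second call is an assumption; the environment's actual response to a repeated call is not documented here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class GradeResult:
    """The three values the answer tool is described as returning."""
    grade: str
    score: float
    feedback: str


class SingleUseAnswerTool:
    """Wraps a grader so that answer() succeeds at most once per task."""

    def __init__(self, grader: Callable[[str], GradeResult]) -> None:
        self._grader = grader
        self._result: Optional[GradeResult] = None

    def answer(self, text: str) -> GradeResult:
        if self._result is not None:
            # Assumed behaviour: reject any call after the first.
            raise RuntimeError("answer can only be called once per task")
        self._result = self._grader(text)
        return self._result


def stub_grader(text: str) -> GradeResult:
    """Stand-in for the LLM grader; always returns the same fixed result."""
    return GradeResult("PARTIALLY_CORRECT", 0.75, "stub feedback")


tool = SingleUseAnswerTool(stub_grader)
first = tool.answer("placeholder answer")
```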

Time Horizon

OrganicChem1909 is a single-turn environment. The agent receives a question and submits one answer. Each task requires exactly one tool call.

Environment Difficulty

[Statistics on environment difficulty here]

Other Environment Requirements

OrganicChem1909 requires an OpenAI API key (OPENAI_API_KEY secret) for LLM-based grading of answers.

export OPENAI_API_KEY=your_api_key_here

Pass the key via the secrets parameter when creating a session:

async with environment.session(task=task, secrets={"openai_api_key": OPENAI_API_KEY}) as session:
    ...  # interact with the session here, e.g. submit one answer
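One way to obtain the value passed as secrets is to read it from the exported environment variable and fail early if it is missing. The helper name build_secrets is hypothetical; only the OPENAI_API_KEY variable and the "openai_api_key" dictionary key come from the text above.

```python
import os
from typing import Dict, Mapping


def build_secrets(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build the secrets dict from the environment, raising if the key is unset."""
    key = env.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; export it before creating a session")
    return {"openai_api_key": key}
```

The returned dictionary can then be passed as the secrets argument when creating a session.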

Safety

Agents in OrganicChem1909 are asked to answer chemistry questions from a historical textbook. The environment does not present direct safety risks: agents only submit text answers through the answer tool and have no access to external systems, other tools, or the internet.

Citations

@dataset{GROrganicChem1909,
  author    = {General Reasoning Inc. Team},
  title     = {OrganicChem1909},
  year      = {2026},
  publisher = {OpenReward},
  url       = {https://openreward.ai/GeneralReasoning/OrganicChem1909}
}
@book{gattermann1909practical,
  title     = {Practical Methods of Organic Chemistry},
  author    = {Gattermann, Ludwig},
  year      = {1909}
}