RetroSynth

Description

RetroSynth is an environment for evaluating agents on single-step retrosynthesis tasks. Given a target molecule's SMILES notation, agents must propose the reactants that would produce it in a single synthetic step. The dataset is derived from the USPTO-50k reaction dataset, filtered for commercially plausible reagents.

Capabilities

Proposing reactant sets for single-step retrosynthesis
Reasoning about organic synthesis reactions and reaction mechanisms
Understanding structure-reactivity relationships in organic chemistry
Working with SMILES notation for molecular representation

Compute Requirements

RetroSynth does not require a sandbox. It has minimal compute requirements.

License

CC0.

Tasks

There are two splits: train (1,000 tasks) and test (100 tasks), totaling 1,100 tasks. Each task presents a target molecule SMILES string and asks the agent to propose the reactants as dot-separated SMILES (e.g., CCO.CC(=O)Cl).
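The dot-separated format can be checked before submitting. This is a minimal sketch, not the environment's own parser: `split_reactants` is a hypothetical helper, and it checks only the string layout (no empty components, no embedded whitespace), not chemical validity. In SMILES the dot separates any disconnected components, so a salt written as two ions would count as two entries here.

```python
def split_reactants(submission: str) -> list[str]:
    """Split a dot-separated SMILES string into its components.

    Hypothetical helper: validates layout only, not chemistry.
    """
    text = submission.strip()
    if not text:
        raise ValueError("empty submission")
    if any(ch.isspace() for ch in text):
        raise ValueError("whitespace inside SMILES string")
    parts = text.split(".")
    if any(not part for part in parts):
        raise ValueError("empty component (leading, trailing, or doubled '.')")
    return parts


# The example from the task description yields two reactants.
reactants = split_reactants("CCO.CC(=O)Cl")
print(len(reactants), reactants)
```

Because the tool accepts only one call per task, a cheap layout check like this can catch formatting slips before the single submission is spent.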

Tasks are filtered from USPTO-50k for commercial plausibility:

Allowed elements: C, N, O, F, P, S, Cl, Br, I, B, Si, H
Reactant constraints: <= 30 heavy atoms, <= 3 rings per reactant
Product constraints: 5-100 heavy atoms
Reactant distribution: ~75% two-reactant, ~25% single-reactant reactions
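The filter criteria above can be sketched as a predicate over pre-computed molecule descriptors. This is an illustration under stated assumptions, not the pipeline used to build the dataset: `MolStats` and `passes_filter` are hypothetical names, the descriptors (element set, heavy-atom count, ring count) are assumed to come from a cheminformatics toolkit, and applying the element whitelist to both reactants and product is an assumption.

```python
from dataclasses import dataclass

ALLOWED_ELEMENTS = frozenset(
    {"C", "N", "O", "F", "P", "S", "Cl", "Br", "I", "B", "Si", "H"}
)


@dataclass(frozen=True)
class MolStats:
    """Pre-computed descriptors for one molecule (hypothetical container)."""
    elements: frozenset
    heavy_atoms: int
    rings: int


def passes_filter(reactants, product) -> bool:
    """Return True if a reaction meets the listed constraints."""
    # Assumption: the element whitelist applies to every molecule.
    for mol in [*reactants, product]:
        if not mol.elements <= ALLOWED_ELEMENTS:
            return False
    # Reactants: at most 30 heavy atoms and at most 3 rings each.
    if any(r.heavy_atoms > 30 or r.rings > 3 for r in reactants):
        return False
    # Product: between 5 and 100 heavy atoms, inclusive.
    return 5 <= product.heavy_atoms <= 100


# Illustrative values for ethanol + acetyl chloride giving ethyl acetate.
ethanol = MolStats(frozenset({"C", "O", "H"}), heavy_atoms=3, rings=0)
acetyl_chloride = MolStats(frozenset({"C", "O", "Cl", "H"}), heavy_atoms=4, rings=0)
ethyl_acetate = MolStats(frozenset({"C", "O", "H"}), heavy_atoms=6, rings=0)
print(passes_filter([ethanol, acetyl_chloride], ethyl_acetate))
```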

Reward Structure

This is a sparse, verifiable reward environment. The agent calls submit_reactants once with proposed reactants, and the reward is computed as follows:

Exact match: Reward 1.0 if the canonical SMILES of submitted reactants exactly match the ground truth.
Partial match: Reward is the Tanimoto similarity (Morgan fingerprints, radius 2, 2048 bits) between the combined fingerprints of submitted and ground truth reactants, capped at 0.95.
Invalid SMILES: Reward 0.0.
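The three rules can be sketched as a small scoring function. This is a sketch under assumptions, not the environment's grader: it takes canonical SMILES and fingerprint on-bits as inputs (computed elsewhere, for example with a toolkit such as RDKit), it treats the exact-match comparison as order-independent, and it leaves open how the per-reactant fingerprints are combined, since the description does not specify either point. The names `tanimoto` and `reward` are hypothetical.

```python
from typing import Optional, Sequence

CAP = 0.95  # maximum reward for a non-exact match


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def reward(
    submitted_smiles: Optional[Sequence[str]],
    truth_smiles: Sequence[str],
    submitted_bits: frozenset,
    truth_bits: frozenset,
) -> float:
    """Score a submission; `submitted_smiles` is None when parsing failed."""
    if submitted_smiles is None:
        return 0.0  # invalid SMILES
    # Assumption: reactant order does not matter for an exact match.
    if sorted(submitted_smiles) == sorted(truth_smiles):
        return 1.0
    return min(tanimoto(submitted_bits, truth_bits), CAP)


# Three of five distinct on-bits are shared, so the similarity is 3/5.
partial = reward(
    ["CCO", "CC(=O)Br"], ["CCO", "CC(=O)Cl"],
    frozenset({1, 2, 3, 4}), frozenset({2, 3, 4, 5}),
)
print(partial)
```

The cap keeps any near miss strictly below the exact-match reward, even when two different reactant sets happen to produce identical fingerprints.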

We do not use LLM graders for this task.

Data

Tasks are derived from the USPTO-50k reaction dataset (via Therapeutics Data Commons), filtered for commercially plausible reagents. Data files are stored on the OpenReward platform.

Tools

Agents are given a single tool:

submit_reactants: Submit proposed reactants as dot-separated SMILES (e.g., CCO.CC(=O)Cl). Returns the reward: 1.0 for an exact match, otherwise the capped fingerprint similarity to the ground truth, or 0.0 for invalid SMILES. This tool can be called only once per task.

Time Horizon

RetroSynth is a single-turn environment. The agent receives a target molecule and submits one set of proposed reactants. Each task requires exactly one tool call.

Environment Difficulty

Task difficulty varies with molecular complexity. Products in the dataset range from 8 to 49 heavy atoms, well inside the 5-100 filter bound, and larger molecules typically require more sophisticated retrosynthetic reasoning. Tasks include both single-reactant transformations (functional group interconversions, ~25%) and multi-reactant reactions (coupling, amide formation, etc., ~75%).

Other Environment Requirements

There are no further environment requirements; RetroSynth works out of the box with the OpenReward endpoint.

Safety

Agents in RetroSynth are asked to propose chemical reactants for synthesis tasks. The environment does not present direct safety risks, as agents only provide SMILES strings evaluated computationally, with no access to external systems or real chemical processes.

However, this is a dual-use domain: capabilities learned in this environment could be misused when combined with other agentic workflows.

Citations

@dataset{GRRetroSynth,
  author    = {General Reasoning Inc. Team},
  title     = {RetroSynth},
  year      = {2026},
  publisher = {OpenReward},
  url       = {https://openreward.ai/GeneralReasoning/RetroSynth}
}

@article{lowe2012extraction,
  title={Extraction of chemical structures and reactions from the literature},
  author={Lowe, Daniel Mark},
  year={2012},
  publisher={University of Cambridge}
}

Repository

Source repository

EnvCommons/RetroSynth

Compute Configuration

Resource allocation for this environment.

Component	Configuration
Environment Server	1 vCPU / 4 GB RAM
Sandbox Machine	Not configured

Estimated Cost

Pay per second of active session usage. Billing starts when your session begins and stops when it ends.

Component	Cost / second
Environment	$0.0000320
Sandbox	Not configured
Total	$0.0000320

Examples

Session length	Cost
5-minute session	$0.0096
1-hour session	$0.1152
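The example figures follow from multiplying the per-second rate by the session length. A minimal sketch of that arithmetic, using `Decimal` to avoid floating-point rounding; `session_cost` is a hypothetical helper, and the rate is the total listed in the table above.

```python
from decimal import Decimal

RATE_PER_SECOND = Decimal("0.0000320")  # total cost per second, in USD


def session_cost(seconds: int) -> Decimal:
    """Cost in USD of an active session lasting `seconds` seconds."""
    return RATE_PER_SECOND * seconds


five_minutes = session_cost(5 * 60)
one_hour = session_cost(60 * 60)
print(five_minutes, one_hour)
```

These two values reproduce the 5-minute and 1-hour figures listed above.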