Formula2SMILES

Description

Formula2SMILES is an environment for evaluating agents on molecular generation tasks. Given a molecular formula in Hill notation and optional functional group constraints, the agent must produce a valid SMILES string that matches the formula and satisfies all constraints. Verification uses RDKit for formula matching and exmol for functional group detection, following the ether0 approach. The dataset is derived from ZINC20 (via sagawa/ZINC-canonicalized on HuggingFace).

Capabilities

Generating valid SMILES strings from molecular formulas
Satisfying functional group constraints during molecular generation
Understanding molecular structure and Hill notation conventions
Reasoning about chemical validity (parsing, sanitization, fragment checks)

Compute Requirements

Formula2SMILES does not require a sandbox. The environment server alone (1 vCPU / 4 GB RAM) is sufficient.

License

Apache 2.0 (following the ZINC-canonicalized dataset license).

Tasks

There are two splits: train (1,000 tasks) and test (100 tasks), totaling 1,100 tasks. Each task provides a molecular formula in Hill notation and optionally a set of required functional groups. Approximately 60% of tasks include functional group constraints and 40% are formula-only. The dataset covers 856 unique molecular formulas.
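For readers unfamiliar with Hill notation, the ordering rule can be sketched in a few lines of Python. This is an illustrative helper, not part of the environment: the function name `hill_formula` and its dict-based input are assumptions made for the example, and charges are not handled.

```python
def hill_formula(counts: dict[str, int]) -> str:
    """Format element counts as a Hill-notation formula string.

    Hypothetical helper for illustration; the environment itself relies on
    RDKit's CalcMolFormula rather than on this function.
    """
    elements = [el for el, n in counts.items() if n > 0]
    if "C" in elements:
        # With carbon present: C first, then H, then the rest alphabetically.
        rest = sorted(el for el in elements if el not in ("C", "H"))
        ordered = ["C"] + (["H"] if "H" in elements else []) + rest
    else:
        # Without carbon: every element alphabetically, hydrogen included.
        ordered = sorted(elements)
    # A count of 1 is written without a number.
    return "".join(
        el + (str(counts[el]) if counts[el] > 1 else "") for el in ordered
    )


print(hill_formula({"O": 1, "H": 6, "C": 2}))  # ethanol
print(hill_formula({"H": 1, "Cl": 1}))         # hydrogen chloride
```

Because the formula check is an exact string match, element order matters: a formula written as `H6C2O` would not equal `C2H6O` even though the composition is the same.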

Reward Structure

This is a sparse, verifiable reward environment with binary scoring. The agent calls submit_answer once with a SMILES string. The molecule is validated through a 5-step pipeline:

1. SMILES parsing with RDKit
2. Molecule sanitization
3. Reasonableness checks (single fragment, ring size <= 12)
4. Formula match via CalcMolFormula (Hill notation exact match)
5. Functional group check via exmol (if constraints specified)

Correct (all checks pass): Reward 1.0.
Incorrect (any check fails): Reward 0.0.
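The all-or-nothing scoring described above can be sketched as a short-circuiting sequence of checks. This is a minimal sketch under stated assumptions: the `score` function, the stage names, and the toy checks are hypothetical stand-ins. The real environment runs RDKit and exmol checks, which are not reproduced here.

```python
from typing import Callable, Sequence

Check = Callable[[str], bool]


def score(smiles: str, stages: Sequence[tuple[str, Check]]) -> tuple[float, str]:
    """Run checks in order; return (reward, diagnostic message).

    The reward is 1.0 only if every stage passes. The first failing stage
    ends evaluation with reward 0.0, so there is no partial credit.
    """
    for name, check in stages:
        try:
            passed = check(smiles)
        except Exception:
            # A check that raises is treated the same as a failed check.
            passed = False
        if not passed:
            return 0.0, f"failed: {name}"
    return 1.0, "all checks passed"


# Toy stand-ins (assumptions for illustration only, not real chemistry checks).
toy_stages = [
    ("parse", lambda s: len(s) > 0),
    ("single fragment", lambda s: "." not in s),
]

print(score("CCO", toy_stages))
print(score("CCO.O", toy_stages))
```

Ordering the stages from cheapest to most specific means a malformed string is rejected before any formula or functional group comparison is attempted.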

We do not use LLM graders for this task.

Data

Tasks are derived from ZINC20 (via sagawa/ZINC-canonicalized on HuggingFace), stored as a parquet file. Data files are stored on the OpenReward platform.

Tools

Agents are given a single tool:

submit_answer: Submit a SMILES string as the answer. The molecule is validated against the required molecular formula and any functional group constraints. Returns whether the answer is correct with a diagnostic message. This tool can only be called once per task.
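The once-per-task restriction can be pictured with a small wrapper. This is a sketch only: the class name `SingleSubmission`, the shape of the returned dictionary, and the behavior on a second call are assumptions, since the source states only that the tool can be called once and returns correctness with a diagnostic message.

```python
from typing import Callable


class SingleSubmission:
    """Hypothetical wrapper that allows exactly one submit_answer call."""

    def __init__(self, grader: Callable[[str], bool]):
        self._grader = grader  # any callable mapping a SMILES string to bool
        self._used = False

    def submit_answer(self, smiles: str) -> dict:
        if self._used:
            # Assumed behavior: later calls are rejected and never graded.
            return {"accepted": False, "message": "submit_answer already called"}
        self._used = True
        correct = bool(self._grader(smiles))
        return {
            "accepted": True,
            "correct": correct,
            "reward": 1.0 if correct else 0.0,
            "message": "correct" if correct else "incorrect",
        }


# Toy grader for illustration: accepts only the literal string "CCO".
task = SingleSubmission(lambda s: s == "CCO")
print(task.submit_answer("CCO"))
print(task.submit_answer("OCC"))
```

Since there is no second attempt, an agent has to do all of its checking (atom counts, valences, required groups) before submitting.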

Time Horizon

Formula2SMILES is a single-turn environment. The agent receives a molecular formula (with optional constraints) and submits one SMILES string. Each task requires exactly one tool call.

Environment Difficulty

[Statistics on environment difficulty here]

Other Environment Requirements

There are no further environment requirements; Formula2SMILES works out of the box with the OpenReward endpoint without any secrets.

Safety

Agents in Formula2SMILES are asked to generate molecular representations as SMILES strings. The environment does not present direct safety risks, as agents only provide text answers validated by RDKit with no access to external systems.

However, this is a dual-use domain. Models trained for molecular generation capabilities could potentially be misused for designing harmful compounds in other contexts.

Citations

@dataset{GRFormula2SMILES,
  author    = {General Reasoning Inc. Team},
  title     = {Formula2SMILES},
  year      = {2026},
  publisher = {OpenReward},
  url       = {https://openreward.ai/GeneralReasoning/Formula2SMILES}
}

@article{irwin2020zinc20,
  title={ZINC20 -- A Free Ultralarge-Scale Chemical Database for Ligand Discovery},
  author={Irwin, John J and Tang, Khanh G and Young, Jennifer and Dandarchuluun, Chinzorig and Wong, Benjamin R and Khurelbaatar, Munkhzul and Moroz, Yurii S and Mayfield, John and Sayle, Roger A},
  journal={Journal of Chemical Information and Modeling},
  volume={60},
  number={12},
  pages={6065--6073},
  year={2020},
  publisher={ACS Publications}
}

Repository

Source repository

EnvCommons/Formula2SMILES

Compute Configuration

Resource allocation for this environment.

Component	Configuration
Environment Server	1 vCPU / 4 GB RAM
Sandbox Machine	Not configured

Estimated Cost

Pay per second of active session usage. Billing starts when your session begins and stops when it ends.

Component	Cost / second
Environment	$0.0000320
Sandbox	Not configured
Total	$0.0000320

Examples

5-minute session	$0.0096
1-hour session	$0.1152
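
The example figures follow directly from the per-second rate multiplied by session length. A minimal sketch of the arithmetic, where the function name `session_cost` is an assumption for illustration and the rate is the total listed in the table above:

```python
RATE_PER_SECOND = 0.0000320  # total rate from the cost table (no sandbox configured)


def session_cost(seconds: float, rate: float = RATE_PER_SECOND) -> float:
    """Return the cost in dollars of a session billed per second."""
    return seconds * rate


print(f"5-minute session: ${session_cost(5 * 60):.4f}")
print(f"1-hour session:   ${session_cost(60 * 60):.4f}")
```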