PatentQATrain
Description
PatentQATrain is an ORS training environment for patent question answering, based on the PatentQA task from LAB-Bench-2. Agents are given questions about specific details from patents across diverse technology domains and must use web search to find and verify answers from Google Patents.
Capabilities
- Researching patent details using web search
- Extracting specific information from patent claims and specifications
- Navigating patent documents to find numerical thresholds, compositions, method steps, and claim elements
- Cross-domain patent understanding spanning pharmaceutical, biotech, chemistry, electronics, software, mechanical, materials, energy, medical devices, and telecom
Compute Requirements
No sandbox or special compute requirements. Uses external web search (Tavily API) for patent retrieval.
License
Tasks
There are 994 training tasks distributed across 10 patent domains:
| Domain | Count | Description |
|---|---|---|
| pharmaceutical | 100 | Drug compositions, formulations, drug delivery |
| biotech | 100 | Genetic engineering, antibodies, cell therapy |
| chemistry | 99 | Chemical compounds, synthesis, catalysis |
| electronics | 100 | Semiconductors, circuits, displays, sensors |
| software | 100 | Algorithms, data processing, networking |
| mechanical | 100 | Engines, mechanisms, manufacturing |
| materials | 100 | Polymers, composites, coatings, nanomaterials |
| energy | 99 | Solar cells, batteries, fuel cells |
| medical_devices | 99 | Surgical instruments, imaging, prosthetics |
| telecom | 97 | Wireless protocols, signal processing |
Each task presents a question about a specific patent that requires distinctive specificity: the question includes the patent number and asks about a verifiable fact uniquely traceable to that patent.
Reward Structure
Sparse, binary reward:
- 1.0 for correct answers (as judged by LLM grader)
- 0.0 for incorrect or unsure answers
Grading uses semantic equivalence checking: answers that are numerically or semantically equivalent to the expected answer are accepted, even if phrased differently. The grader is based on LAB-Bench-2's structured evaluation prompt.
We do not use exact string matching. The LLM grader (gpt-5-mini) evaluates whether the submitted answer captures the core factual content of the expected answer.
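The binary mapping can be sketched as follows. This is a minimal illustration, assuming the LLM grader returns a verdict string; the grader call itself is stubbed out and the verdict labels are assumptions, not the platform's actual API.

```python
# Hypothetical sketch of the sparse binary reward. The verdict string is
# assumed to come from the LLM grader; label names are illustrative.
def reward_from_verdict(verdict: str) -> float:
    # 1.0 only for a confirmed-correct verdict; "incorrect" and "unsure"
    # both score 0.0, matching the sparse reward structure above.
    return 1.0 if verdict == "correct" else 0.0

print(reward_from_verdict("correct"))  # 1.0
print(reward_from_verdict("unsure"))   # 0.0
```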
Data
Ground-truth data consists of QA pairs derived from Google Patents documents. Each task includes:
- A question referencing a specific patent number
- A concise expected answer (under 200 characters)
- The source patent URL
- A key passage from the patent supporting the answer
Data is stored on the OpenReward platform.
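The four fields above can be modeled as a simple record. This is a hypothetical schema sketch; the field names, patent number, and values are illustrative assumptions, not the platform's actual storage format.

```python
from dataclasses import dataclass

# Hypothetical task record mirroring the four fields listed above.
@dataclass
class PatentQATask:
    question: str         # references a specific patent number
    expected_answer: str  # concise, under 200 characters
    patent_url: str       # source document on Google Patents
    key_passage: str      # excerpt from the patent supporting the answer

# Illustrative example with made-up content.
task = PatentQATask(
    question="In patent US1234567, what operating temperature range is claimed?",
    expected_answer="150 to 300 degrees C",
    patent_url="https://patents.google.com/patent/US1234567",
    key_passage="...the reactor is operated between 150 and 300 degrees C...",
)
assert len(task.expected_answer) < 200
```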
Tools
Agents have access to three tools:
| Tool | Description |
|---|---|
| web_search | Search the web using Tavily. Returns titles, URLs, and snippets. |
| fetch_url | Fetch full text content from a URL using Tavily extract. Supports pagination for long documents. |
| submit_answer | Submit a final answer with explanation. Triggers LLM grading and ends the episode. |
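A typical episode chains these tools together. The sketch below is a simplified single-pass loop under stated assumptions: the three tool functions are stand-ins passed in as callables (the real environment dispatches them through the agent API), and the extraction step is a placeholder.

```python
# Illustrative research loop; tool signatures and the extraction logic
# are assumptions, not the environment's actual interface.
def research_loop(question, web_search, fetch_url, submit_answer):
    """Search for the patent, fetch the first Google Patents hit, answer."""
    for hit in web_search(question):
        if "patents.google.com" in hit["url"]:
            text = fetch_url(hit["url"])
            # Placeholder extraction; a real agent would locate the
            # requested threshold, composition, or claim element in `text`.
            answer = text.split("Answer:")[-1].strip()
            return submit_answer(answer=answer,
                                 explanation="Found in patent full text")
    return submit_answer(answer="unsure",
                         explanation="No Google Patents page found")
```

In practice agents iterate: several searches and fetches, with pagination on long patent documents, before committing to `submit_answer`.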
Time Horizon
PatentQATrain is a multi-turn environment. Agents typically perform several web searches and URL fetches before submitting an answer.
Environment Difficulty
[Statistics on environment difficulty here]
Other Environment Requirements
This environment requires the following API keys passed via the secrets parameter:
- openai_api_key: For LLM-based answer grading
- tavily_api_key: For web search and URL content extraction
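A minimal secrets payload might look like the following. The dict shape is an assumption for illustration; only the two key names come from the list above, and the values are placeholders.

```python
# Hypothetical secrets payload; key names match the requirements above,
# values are placeholders and must be supplied by the operator.
secrets = {
    "openai_api_key": "sk-...",    # LLM-based answer grading
    "tavily_api_key": "tvly-...",  # web search and URL content extraction
}

# Sanity check that both required keys are present before launching.
required = {"openai_api_key", "tavily_api_key"}
assert required <= set(secrets)
```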
Safety
PatentQATrain focuses on factual information retrieval from publicly available patent records. The environment does not involve intellectual property creation, legal advice, or access to non-public information. All source patents are publicly accessible via Google Patents.
Citations
This environment is inspired by the PatentQA task from LAB-Bench-2:
@misc{labbench2,
author = {Laurent, Jon M. and Bou, Albert and Pieler, Michael and Igoe, Conor and Andonian, Alex and Narayanan, Siddharth and Braza, James and Vassopoulos, Alexandros Sanchez and Steenwyk, Jacob L. and Lash, Blake and White, Andrew D. and Rodriques, Samuel G.},
title = {LABBench2: An Improved Benchmark for AI Systems Performing Biology Research},
year = {2026},
url = {https://github.com/EdisonScientific/labbench2}
}