TrialQATrain
Description
TrialQATrain is an ORS training environment for clinical trial question answering, modeled on the TrialQA dataset from EdisonScientific/labbench2. Agents receive questions about specific details of clinical trials (eligibility criteria, endpoints, dosing, study design) and must use web search to find and verify the answers.
Capabilities
- Researching clinical trial details using web search
- Extracting specific information from trial eligibility criteria
- Understanding trial endpoints and outcome measures
- Identifying dosing regimens and study arm structures
- Multi-hop reasoning across trial documents
Compute Requirements
No special compute requirements. The environment uses external web search (Tavily API) and does not require a sandbox.
License
Tasks
The environment contains 1,000 training tasks spanning the following therapeutic domains:
- Oncology (180 questions)
- Cardiology (140 questions)
- Infectious Disease (120 questions)
- Neurology (110 questions)
- Metabolic/Endocrine (100 questions)
- Immunology/Autoimmune (90 questions)
- Respiratory (80 questions)
- Psychiatry (70 questions)
- Rare Diseases (60 questions)
- Other (50 questions)
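The per-domain counts above can be checked against the stated total with a short tally. This is only an illustrative sketch: the dictionary restates the list, and the variable names are not part of the environment.

```python
# Question counts per therapeutic domain, copied from the list above.
DOMAIN_COUNTS = {
    "Oncology": 180,
    "Cardiology": 140,
    "Infectious Disease": 120,
    "Neurology": 110,
    "Metabolic/Endocrine": 100,
    "Immunology/Autoimmune": 90,
    "Respiratory": 80,
    "Psychiatry": 70,
    "Rare Diseases": 60,
    "Other": 50,
}

# Total number of tasks and each domain's share of the training set.
total = sum(DOMAIN_COUNTS.values())
shares = {name: count / total for name, count in DOMAIN_COUNTS.items()}

print(f"total tasks: {total}")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

The counts sum to the 1,000 tasks stated above, with Oncology the largest domain at 18% of the set.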
Reward Structure
This is a sparse, verifiable reward environment. The reward is computed at task completion when the agent submits an answer:
- Reward 1.0: Agent's answer is semantically equivalent to the correct answer
- Reward 0.0: Agent's answer is incorrect or incomplete
Grading is performed by an LLM judge that compares the agent's answer against the reference answer and key passage from the trial.
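The binary reward described above can be sketched as follows. This is a minimal illustration, not the environment's grading code: `compute_reward`, the `Judge` signature, and `stub_judge` are hypothetical names, and the stub replaces the LLM judge with a plain case-insensitive comparison so the example runs offline. The sample answers are made up.

```python
from typing import Callable

# Hypothetical judge signature: (agent_answer, reference_answer, key_passage) -> bool.
Judge = Callable[[str, str, str], bool]


def compute_reward(agent_answer: str, reference_answer: str,
                   key_passage: str, judge: Judge) -> float:
    """Return 1.0 if the judge accepts the answer as equivalent, else 0.0."""
    if not agent_answer.strip():
        return 0.0  # an empty submission is treated as incomplete
    return 1.0 if judge(agent_answer, reference_answer, key_passage) else 0.0


def stub_judge(agent_answer: str, reference_answer: str, key_passage: str) -> bool:
    # Stand-in for the LLM judge: case-insensitive exact match only.
    # A real judge would also accept paraphrases and consult key_passage.
    return agent_answer.strip().lower() == reference_answer.strip().lower()


# Made-up example values, for illustration only.
reward_match = compute_reward("18 Years", "18 years", "Minimum Age: 18 Years", stub_judge)
reward_miss = compute_reward("21 years", "18 years", "Minimum Age: 18 Years", stub_judge)
print(reward_match, reward_miss)
```

Because the reward is issued once at submission, intermediate searches and fetches earn nothing on their own.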
Data
Ground truth data consists of question-answer pairs derived from ClinicalTrials.gov trial records.
Tools
Agents are given access to environment-specific tools for web search and answer submission. They can search the web for clinical trial information using web_search, fetch full content from URLs (including clinicaltrials.gov pages) using fetch_url, and submit their final answer using submit_answer.
The web_search and fetch_url tools depend on Tavily but are optional. To use a different search provider, exclude these two tools and supply external tools instead.
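One way to picture the choice between the built-in Tavily tools and an external provider is as a filter over tool names. The sketch below is hypothetical: `select_tools` and its `use_tavily` flag are not part of the environment's API, and the actual mechanism for excluding tools may differ.

```python
# Tool names as listed in this section.
ALL_TOOLS = ["web_search", "fetch_url", "submit_answer"]

# The two tools that depend on Tavily.
TAVILY_TOOLS = {"web_search", "fetch_url"}


def select_tools(use_tavily: bool) -> list:
    """Return the environment tools to expose to the agent.

    With use_tavily=False only submit_answer remains, and search must be
    provided by external tools (assumed to be configured elsewhere).
    """
    if use_tavily:
        return list(ALL_TOOLS)
    return [name for name in ALL_TOOLS if name not in TAVILY_TOOLS]


print(select_tools(use_tavily=True))
print(select_tools(use_tavily=False))
```

In either configuration submit_answer stays available, since it is the only way to end a task and receive a reward.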
Time Horizon
TrialQATrain is a multi-turn environment requiring web search and information retrieval. Agents typically need to search for clinical trials, fetch detailed trial pages from ClinicalTrials.gov, extract relevant information, and submit verified answers.
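The typical search, fetch, extract, and submit sequence can be sketched with stub tools. Everything here is an assumption made for illustration: the tool call signatures, the `extract` callable standing in for the model's reading step, the placeholder URL, and the sample page text are not taken from the environment.

```python
from typing import Callable, Dict, List


def run_episode(question: str, tools: Dict[str, Callable],
                extract: Callable[[str, str], str], max_pages: int = 3) -> str:
    """Search, read up to max_pages results, and submit the first answer found."""
    urls = tools["web_search"](question)
    for url in urls[:max_pages]:
        page = tools["fetch_url"](url)
        answer = extract(question, page)
        if answer:
            tools["submit_answer"](answer)
            return answer
    # Nothing found: this sketch still submits (an empty answer) to end the task.
    tools["submit_answer"]("")
    return ""


# Stub tools that record the order of calls; a real run would hit the web.
call_log: List[str] = []


def fake_search(query: str) -> List[str]:
    call_log.append("web_search")
    return ["https://clinicaltrials.gov/study/NCT00000000"]  # placeholder ID


def fake_fetch(url: str) -> str:
    call_log.append("fetch_url")
    return "Minimum Age: 18 Years"  # made-up page content


def fake_submit(answer: str) -> None:
    call_log.append("submit_answer")


def fake_extract(question: str, page: str) -> str:
    # Take the text after the first ": " as the answer, if present.
    return page.split(": ", 1)[1] if ": " in page else ""


stub_tools = {"web_search": fake_search, "fetch_url": fake_fetch,
              "submit_answer": fake_submit}
result = run_episode("What is the minimum age for enrollment?", stub_tools, fake_extract)
print(result, call_log)
```

Even this shortest path takes three tool calls; questions that need multi-hop reasoning would repeat the search and fetch steps before submitting.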
[Statistics on average tool calls here]
Environment Difficulty
[Statistics on environment difficulty here]
Other Environment Requirements
This environment requires two API keys:
- openai_api_key: For LLM-based answer grading
- tavily_api_key: For web search and URL fetching with the built-in web_search and fetch_url tools
Pass these via the secrets parameter when creating a session.
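A minimal sketch of assembling the secrets mapping before creating a session. The key names come from the list above; reading them from upper-case environment variables and the `build_secrets` helper are assumptions of this example, and the session-creation call itself is not shown because its signature is not described here.

```python
import os
from typing import Mapping, Optional

# Key names as listed above.
REQUIRED_SECRETS = ("openai_api_key", "tavily_api_key")


def build_secrets(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect the API keys into a dict suitable for a secrets parameter.

    Assumes each key is stored in an upper-case environment variable,
    e.g. OPENAI_API_KEY (a convention chosen for this sketch).
    """
    env = os.environ if env is None else env
    secrets = {}
    missing = []
    for key in REQUIRED_SECRETS:
        value = env.get(key.upper())
        if value:
            secrets[key] = value
        else:
            missing.append(key.upper())
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return secrets


# Example with an explicit mapping instead of the real process environment.
example = build_secrets({"OPENAI_API_KEY": "sk-example", "TAVILY_API_KEY": "tvly-example"})
print(sorted(example))
```

Failing early on a missing key avoids starting a session that cannot grade answers or run searches.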
Safety
TrialQATrain focuses on factual information retrieval from public clinical trial records. The environment does not involve medical decision-making or patient data. Agents are evaluated on accuracy of information extraction, not medical advice.
Citations
@article{Laurent2024LABBench,
title={LAB-Bench: Measuring Capabilities of Language Models for Biology Research},
author={Laurent, Jon M. and Janizek, Joseph D. and Ruzo, Michael and Hinks, Michaela M. and Hammerling, Michael J. and Narayanan, Siddharth and Ponnapati, Manvitha and White, Andrew D. and Rodriques, Samuel G.},
journal={arXiv preprint arXiv:2407.10362},
year={2024},
url={https://arxiv.org/abs/2407.10362}
}