TrialQATrain

OpenReward Environment

Description

TrialQATrain is an ORS training environment for clinical trial question answering, modeled on the TrialQA dataset from EdisonScientific/labbench2. Agents are asked questions about specific details of clinical trials (eligibility criteria, endpoints, dosing, study design) and must use web search to find and verify the answers.

Capabilities

  • Researching clinical trial details using web search
  • Extracting specific information from trial eligibility criteria
  • Understanding trial endpoints and outcome measures
  • Identifying dosing regimens and study arm structures
  • Multi-hop reasoning across trial documents

Compute Requirements

No special compute requirements. The environment uses external web search (Tavily API) and does not require a sandbox.

License

MIT

Tasks

The environment contains 1,000 training tasks covering:

Therapeutic Domains:

  • Oncology (180 questions)
  • Cardiology (140 questions)
  • Infectious Disease (120 questions)
  • Neurology (110 questions)
  • Metabolic/Endocrine (100 questions)
  • Immunology/Autoimmune (90 questions)
  • Respiratory (80 questions)
  • Psychiatry (70 questions)
  • Rare Diseases (60 questions)
  • Other (50 questions)

Reward Structure

This is a sparse, verifiable reward environment. The reward is computed at task completion when the agent submits an answer:

  • Reward 1.0: Agent's answer is semantically equivalent to the correct answer
  • Reward 0.0: Agent's answer is incorrect or incomplete

Grading is performed by an LLM judge that compares the agent's answer against the reference answer and key passage from the trial.
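The grading step can be sketched roughly as follows. This is a hypothetical illustration, not the environment's actual judge: the prompt wording, the `build_judge_prompt`/`reward_from_verdict` names, and the CORRECT/INCORRECT verdict format are all assumptions; only the reward mapping (1.0 for semantic equivalence, 0.0 otherwise) comes from this README.

```python
# Hypothetical sketch of the LLM-judge grading step. The real judge prompt,
# verdict format, and model are internal to the environment.

def build_judge_prompt(question, reference_answer, key_passage, agent_answer):
    """Assemble a grading prompt comparing the agent's answer to the reference."""
    return (
        "You are grading an answer to a clinical trial question.\n"
        f"Question: {question}\n"
        f"Reference answer: {reference_answer}\n"
        f"Key passage from the trial record: {key_passage}\n"
        f"Agent answer: {agent_answer}\n"
        "Reply with exactly CORRECT if the agent answer is semantically "
        "equivalent to the reference answer, otherwise INCORRECT."
    )

def reward_from_verdict(verdict):
    """Map the judge's verdict to the sparse reward: 1.0 or 0.0."""
    return 1.0 if verdict.strip().upper() == "CORRECT" else 0.0
```

The sparse, binary mapping means partial credit is never awarded; an incomplete answer scores the same as a wrong one.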

Data

Ground truth data consists of question-answer pairs derived from ClinicalTrials.gov trial records.
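ClinicalTrials.gov exposes trial records as JSON via its public v2 API, which is one way such question-answer pairs could be derived. A minimal sketch, assuming the standard v2 endpoint layout (the `study_record_url` helper name is ours, not part of the environment):

```python
# Sketch of fetching the source record for one trial from the public
# ClinicalTrials.gov v2 API. How the environment actually derived its
# QA pairs is not specified in this README.

def study_record_url(nct_id):
    """URL of the public JSON record for one trial (ClinicalTrials.gov API v2)."""
    return f"https://clinicaltrials.gov/api/v2/studies/{nct_id}"
```

Agents do not need this API directly; the `fetch_url` tool can retrieve the same pages.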

Tools

Agents are given access to environment-specific tools for web search and answer submission. They can search the web for clinical trial information using web_search, fetch full content from URLs (including clinicaltrials.gov pages) using fetch_url, and submit their final answer using submit_answer.

Note that the web_search and fetch_url tools require a Tavily API key but are optional: to use a different search provider, exclude these tools and supply your own external tools instead.
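The three tools can be pictured in a generic JSON-schema style. This is an illustrative sketch: the tool names come from this README, but the parameter names and descriptions below are assumptions, not the environment's actual schemas.

```python
# Hypothetical JSON-schema-style declarations of the three environment
# tools. Parameter names are assumed; only the tool names are documented.

TOOLS = [
    {
        "name": "web_search",
        "description": "Search the web for clinical trial information (Tavily-backed).",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "name": "fetch_url",
        "description": "Fetch full page content from a URL, e.g. a clinicaltrials.gov record.",
        "parameters": {
            "type": "object",
            "properties": {"url": {"type": "string"}},
            "required": ["url"],
        },
    },
    {
        "name": "submit_answer",
        "description": "Submit the final answer; ends the task and triggers grading.",
        "parameters": {
            "type": "object",
            "properties": {"answer": {"type": "string"}},
            "required": ["answer"],
        },
    },
]
```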

Time Horizon

TrialQATrain is a multi-turn environment requiring web search and information retrieval. Agents typically need to search for clinical trials, fetch detailed trial pages from ClinicalTrials.gov, extract relevant information, and submit verified answers.

[Statistics on average tool calls here]

Environment Difficulty

[Statistics on environment difficulty here]

Other Environment Requirements

This environment requires two API keys:

  • openai_api_key: For LLM-based answer grading
  • tavily_api_key: For web search and URL fetching

Pass these via the secrets parameter when creating a session.
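A small sketch of validating the secrets mapping before session creation. The secret key names come from this README; the `validate_secrets` helper and the shape of the session-creation call are assumptions, since the client API is not shown here.

```python
# Sketch of checking the two required secrets before creating a session.
# The helper name is ours; the key names are from the environment docs.

REQUIRED_SECRETS = ("openai_api_key", "tavily_api_key")

def validate_secrets(secrets):
    """Raise ValueError if any required secret is missing or empty."""
    missing = [k for k in REQUIRED_SECRETS if not secrets.get(k)]
    if missing:
        raise ValueError(f"Missing secrets: {missing}")
    return secrets
```

Reading the values from environment variables (rather than hard-coding them) keeps keys out of source control.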

Safety

TrialQATrain focuses on factual information retrieval from public clinical trial records. The environment does not involve medical decision-making or patient data. Agents are evaluated on accuracy of information extraction, not medical advice.

Citations

@article{Laurent2024LABBench,
  title={LAB-Bench: Measuring Capabilities of Language Models for Biology Research},
  author={Laurent, Jon M. and Janizek, Joseph D. and Ruzo, Michael and Hinks, Michaela M. and Hammerling, Michael J. and Narayanan, Siddharth and Ponnapati, Manvitha and White, Andrew D. and Rodriques, Samuel G.},
  journal={arXiv preprint arXiv:2407.10362},
  year={2024},
  url={https://arxiv.org/abs/2407.10362}
}