TrialQATrain

OpenReward Environment

Description

TrialQATrain is an ORS training environment for clinical trial question answering, modeled on the TrialQA dataset from EdisonScientific/labbench2. Agents are asked questions about specific details of clinical trials (eligibility criteria, endpoints, dosing, study design) and must use web search to find and verify the answers.

Capabilities

  • Researching clinical trial details using web search
  • Extracting specific information from trial eligibility criteria
  • Understanding trial endpoints and outcome measures
  • Identifying dosing regimens and study arm structures
  • Multi-hop reasoning across trial documents

Compute Requirements

No special compute requirements. The environment uses external web search (Tavily API) and does not require a sandbox.

License

MIT

Tasks

The environment contains 1,000 training tasks covering:

Therapeutic Domains:

  • Oncology (180 questions)
  • Cardiology (140 questions)
  • Infectious Disease (120 questions)
  • Neurology (110 questions)
  • Metabolic/Endocrine (100 questions)
  • Immunology/Autoimmune (90 questions)
  • Respiratory (80 questions)
  • Psychiatry (70 questions)
  • Rare Diseases (60 questions)
  • Other (50 questions)

Reward Structure

This is a sparse, verifiable reward environment. The reward is computed at task completion when the agent submits an answer:

  • Reward 1.0: Agent's answer is semantically equivalent to the correct answer
  • Reward 0.0: Agent's answer is incorrect or incomplete

Grading is performed by an LLM judge that compares the agent's answer against the reference answer and key passage from the trial.
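The grading step can be sketched roughly as follows. This is a hypothetical illustration, not the environment's actual judge: the prompt wording, the `build_judge_prompt`/`reward_from_verdict` names, and the CORRECT/INCORRECT verdict format are all assumptions; only the reward mapping (1.0 for semantic equivalence, 0.0 otherwise) comes from this README.

```python
# Hypothetical sketch of the LLM-judge grading step. The real judge prompt,
# verdict format, and model are internal to the environment.

def build_judge_prompt(question, reference_answer, key_passage, agent_answer):
    """Assemble a grading prompt comparing the agent's answer to the reference."""
    return (
        "You are grading an answer to a clinical trial question.\n"
        f"Question: {question}\n"
        f"Reference answer: {reference_answer}\n"
        f"Key passage from the trial record: {key_passage}\n"
        f"Agent answer: {agent_answer}\n"
        "Reply with exactly CORRECT if the agent answer is semantically "
        "equivalent to the reference answer, otherwise INCORRECT."
    )

def reward_from_verdict(verdict):
    """Map the judge's verdict to the sparse reward: 1.0 or 0.0."""
    return 1.0 if verdict.strip().upper() == "CORRECT" else 0.0
```

The sparse, binary mapping means partial credit is never awarded; an incomplete answer scores the same as a wrong one.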

Data

Ground truth data consists of question-answer pairs derived from ClinicalTrials.gov trial records.
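ClinicalTrials.gov exposes trial records as JSON via its public v2 API, which is one way such question-answer pairs could be derived. A minimal sketch, assuming the standard v2 endpoint layout (the `study_record_url` helper name is ours, not part of the environment):

```python
# Sketch of fetching the source record for one trial from the public
# ClinicalTrials.gov v2 API. How the environment actually derived its
# QA pairs is not specified in this README.

def study_record_url(nct_id):
    """URL of the public JSON record for one trial (ClinicalTrials.gov API v2)."""
    return f"https://clinicaltrials.gov/api/v2/studies/{nct_id}"
```

Agents do not need this API directly; the `fetch_url` tool can retrieve the same pages.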

Tools

Agents are given access to environment-specific tools for web search and answer submission. They can search the web for clinical trial information using web_search, fetch full content from URLs (including clinicaltrials.gov pages) using fetch_url, and submit their final answer using submit_answer.

Note that the web_search and fetch_url tools require a Tavily API key but are optional: to use a different search provider, exclude these tools and supply your own external tools instead.
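The three tools can be pictured in a generic JSON-schema style. This is an illustrative sketch: the tool names come from this README, but the parameter names and descriptions below are assumptions, not the environment's actual schemas.

```python
# Hypothetical JSON-schema-style declarations of the three environment
# tools. Parameter names are assumed; only the tool names are documented.

TOOLS = [
    {
        "name": "web_search",
        "description": "Search the web for clinical trial information (Tavily-backed).",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "name": "fetch_url",
        "description": "Fetch full page content from a URL, e.g. a clinicaltrials.gov record.",
        "parameters": {
            "type": "object",
            "properties": {"url": {"type": "string"}},
            "required": ["url"],
        },
    },
    {
        "name": "submit_answer",
        "description": "Submit the final answer; ends the task and triggers grading.",
        "parameters": {
            "type": "object",
            "properties": {"answer": {"type": "string"}},
            "required": ["answer"],
        },
    },
]
```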

Time Horizon

TrialQATrain is a multi-turn environment requiring web search and information retrieval. Agents typically need to search for clinical trials, fetch detailed trial pages from ClinicalTrials.gov, extract relevant information, and submit verified answers.

[Statistics on average tool calls here]

Environment Difficulty

[Statistics on environment difficulty here]

Other Environment Requirements

This environment requires two API keys:

  • openai_api_key: For LLM-based answer grading
  • tavily_api_key: For web search and URL fetching

Pass these via the secrets parameter when creating a session.
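A small sketch of validating the secrets mapping before session creation. The secret key names come from this README; the `validate_secrets` helper and the shape of the session-creation call are assumptions, since the client API is not shown here.

```python
# Sketch of checking the two required secrets before creating a session.
# The helper name is ours; the key names are from the environment docs.

REQUIRED_SECRETS = ("openai_api_key", "tavily_api_key")

def validate_secrets(secrets):
    """Raise ValueError if any required secret is missing or empty."""
    missing = [k for k in REQUIRED_SECRETS if not secrets.get(k)]
    if missing:
        raise ValueError(f"Missing secrets: {missing}")
    return secrets
```

Reading the values from environment variables (rather than hard-coding them) keeps keys out of source control.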

Safety

TrialQATrain focuses on factual information retrieval from public clinical trial records. The environment does not involve medical decision-making or patient data. Agents are evaluated on accuracy of information extraction, not medical advice.

Citations

@article{Laurent2024LABBench,
  title={LAB-Bench: Measuring Capabilities of Language Models for Biology Research},
  author={Laurent, Jon M. and Janizek, Joseph D. and Ruzo, Michael and Hinks, Michaela M. and Hammerling, Michael J. and Narayanan, Siddharth and Ponnapati, Manvitha and White, Andrew D. and Rodriques, Samuel G.},
  journal={arXiv preprint arXiv:2407.10362},
  year={2024},
  url={https://arxiv.org/abs/2407.10362}
}