Biomni-Eval1
Description
Biomni-Eval1 is an environment for evaluating AI agents on biomedical tasks spanning 10 categories, including GWAS variant prioritization, gene screening, rare disease diagnosis, and literature-based question answering. Based on the Biomni evaluation suite from Stanford, it tests general-purpose biomedical reasoning capabilities.
Capabilities
- GWAS variant and causal gene prioritization
- Gene screening and retrieval
- Rare disease diagnosis
- CRISPR delivery planning
- Biomedical literature question answering
Compute Requirements
This is a single-turn environment with no sandbox.
License
Tasks
There is one split in this environment:
- Test: 433 instances across 10 biomedical task categories
Each task presents a biomedical question requiring domain-specific reasoning.
Reward Structure
Each submitted answer receives a binary reward:
- 1.0 — Correct answer
- 0.0 — Incorrect answer
Evaluation uses the biomni package's task-specific evaluators with format-aware matching:
| Task Type | Answer Format | Evaluation Method |
|---|---|---|
| GWAS variant prioritization | Variant ID (e.g., "rs4253311") | Exact match |
| GWAS causal gene identification | Gene symbol (e.g., "BRCA1") | Case-insensitive match |
| Screen gene retrieval | Gene symbol | Case-insensitive match |
| Patient gene detection | Gene symbol | Case-insensitive match |
| Rare disease diagnosis | OMIM ID or disease name | Normalized match |
| CRISPR delivery method | Multiple choice letter | Single letter (A-E) |
| Lab bench Q&A | Multiple choice or free text | Task-specific |
No LLM grading is used; all evaluation is deterministic.
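The format-aware matching in the table above can be illustrated with a minimal Python sketch. This is not the biomni package's evaluator code: the function names (`normalize`, `extract_choice`, `score`), the task-type keys, and the exact normalization rules are all assumptions made for illustration.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, trim, and collapse whitespace (an assumed normalization)."""
    return re.sub(r"\s+", " ", text.strip().lower())


def extract_choice(text: str) -> str:
    """Return the letter if the text is a single A-E choice, else ''."""
    m = re.fullmatch(r"\s*([A-Ea-e])\s*", text)
    return m.group(1).upper() if m else ""


# Hypothetical task-type keys mapped to comparison functions.
MATCHERS = {
    # Exact match (surrounding whitespace ignored, which is an assumption).
    "gwas_variant": lambda pred, gold: pred.strip() == gold.strip(),
    # Case-insensitive match for gene symbols.
    "gene_symbol": lambda pred, gold: pred.strip().upper() == gold.strip().upper(),
    # Normalized match for OMIM IDs or disease names.
    "rare_disease": lambda pred, gold: normalize(pred) == normalize(gold),
    # Single multiple-choice letter, A through E.
    "multiple_choice": lambda pred, gold: (
        extract_choice(pred) != "" and extract_choice(pred) == extract_choice(gold)
    ),
}


def score(task_type: str, prediction: str, ground_truth: str) -> float:
    """Binary reward: 1.0 for a match, 0.0 otherwise."""
    matcher = MATCHERS[task_type]
    return 1.0 if matcher(prediction, ground_truth) else 0.0
```

Under these assumed rules, `score("gene_symbol", "brca1", "BRCA1")` yields 1.0, while `score("gwas_variant", "RS4253311", "rs4253311")` yields 0.0 because variant IDs are compared exactly.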
Data
The data consists of a single Parquet file (biomni_eval1.parquet) containing 433 biomedical evaluation instances across 10 task categories. Each instance includes a task name, a task-specific ID, the prompt text, and the ground truth answer.
Source: biomni/Eval1
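As a rough sketch of how the file might be inspected, the snippet below reads it with pandas and counts instances per task. The column names (`task_name`, `instance_id`, `prompt`, `answer`) and the `Instance` class are assumptions for illustration; the actual Parquet schema may use different names.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    # Field names are assumptions; the real Parquet column names may differ.
    task_name: str
    instance_id: str
    prompt: str
    answer: str


def load_instances(path: str = "biomni_eval1.parquet") -> list[Instance]:
    """Read the Parquet file into Instance records.

    Requires pandas plus a Parquet engine such as pyarrow.
    """
    import pandas as pd  # imported lazily so the rest of the sketch runs without it

    df = pd.read_parquet(path)
    return [
        Instance(
            str(row["task_name"]),
            str(row["instance_id"]),
            str(row["prompt"]),
            str(row["answer"]),
        )
        for row in df.to_dict(orient="records")
    ]


def count_by_task(instances: list[Instance]) -> Counter:
    """Count instances per task category."""
    return Counter(inst.task_name for inst in instances)
```

If the assumed column names match the real file, `count_by_task(load_instances())` should report 10 task names whose counts sum to 433.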
Tools
| Tool | Description |
|---|---|
| submit_answer | Submit your answer for the biomedical task. Evaluated using the biomni evaluator. |
Time Horizon
Biomni-Eval1 is a single-turn environment. The agent receives a biomedical question and submits one answer.
Other Environment Requirements
There are no further environment requirements; Biomni-Eval1 works out of the box with the OpenReward endpoint without any secrets.
Safety
This environment evaluates biomedical reasoning on research tasks and does not involve clinical patient data or real-world medical decision-making.
Citations
@article{huang2025biomni,
author = {Kexin Huang and Serena Zhang and Hanchen Wang and Yuanhao Qu and Yingzhou Lu and Yusuf Roohani and others},
title = {Biomni: A General-Purpose Biomedical AI Agent},
journal = {bioRxiv preprint},
year = {2025},
doi = {10.1101/2025.05.30.656746},
url = {https://www.biorxiv.org/content/10.1101/2025.05.30.656746}
}