Biomni-Eval1

Description

Biomni-Eval1 is an environment for evaluating AI agents on biomedical tasks spanning 10 categories, including GWAS variant prioritization, gene screening, rare disease diagnosis, and literature-based question answering. Based on the Biomni evaluation suite from Stanford, it tests general-purpose biomedical reasoning capabilities.

Capabilities

GWAS variant and causal gene prioritization
Gene screening and retrieval
Rare disease diagnosis
CRISPR delivery planning
Biomedical literature question answering

Compute Requirements

This is a single-turn environment with no sandbox.

License

Apache 2.0.

Tasks

There is one split in this environment:

Test: 433 instances across 10 biomedical task categories

Each task presents a biomedical question requiring domain-specific reasoning.

Reward Structure

This is a single-turn environment with binary reward:

1.0 — Correct answer
0.0 — Incorrect answer

Evaluation uses the biomni package's task-specific evaluators with format-aware matching:

Task Type	Answer Format	Evaluation Method
GWAS variant prioritization	Variant ID (e.g., "rs4253311")	Exact match
GWAS causal gene identification	Gene symbol (e.g., "BRCA1")	Case-insensitive match
Screen gene retrieval	Gene symbol	Case-insensitive match
Patient gene detection	Gene symbol	Case-insensitive match
Rare disease diagnosis	OMIM ID or disease name	Normalized match
CRISPR delivery method	Multiple choice letter	Single letter (A-E)
Lab bench Q&A	Multiple choice or free text	Task-specific

No LLM grading is used — all evaluation is deterministic.
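The format-aware matching rules in the table above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the biomni package's evaluator. The task-type keys (e.g. `gwas_variant_prioritization`), the case-and-whitespace normalization used for disease names, and the fallback used for lab bench Q&A are all hypothetical.

```python
import re


def _normalize_text(s: str) -> str:
    """Lowercase, trim, and collapse internal whitespace (assumed normalization)."""
    return re.sub(r"\s+", " ", s.strip().lower())


# Hypothetical task-type keys; the real biomni evaluators may differ.
GENE_SYMBOL_TASKS = {"gwas_causal_gene", "screen_gene_retrieval", "patient_gene_detection"}


def score_answer(task_type: str, prediction: str, ground_truth: str) -> float:
    """Return a binary reward (1.0 or 0.0) using a format-aware comparison."""
    pred = prediction.strip()
    truth = ground_truth.strip()
    if task_type == "gwas_variant_prioritization":
        # Variant IDs such as "rs4253311" are compared exactly.
        correct = pred == truth
    elif task_type in GENE_SYMBOL_TASKS:
        # Gene symbols such as "BRCA1" are compared case-insensitively.
        correct = pred.upper() == truth.upper()
    elif task_type == "rare_disease_diagnosis":
        # OMIM IDs or disease names are compared after normalization.
        correct = _normalize_text(pred) == _normalize_text(truth)
    elif task_type == "crispr_delivery":
        # Multiple choice: only a bare single letter A-E is accepted.
        correct = bool(re.fullmatch(r"[A-Ea-e]", pred)) and pred.upper() == truth.upper()
    else:
        # Fallback (e.g. lab bench Q&A): normalized exact match, an assumption.
        correct = _normalize_text(pred) == _normalize_text(truth)
    return 1.0 if correct else 0.0
```

For example, `score_answer("gwas_causal_gene", "brca1", "BRCA1")` returns 1.0, while a CRISPR answer written as "B) AAV" scores 0.0 because it is not a bare letter.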

Data

Data consists of a single Parquet file (biomni_eval1.parquet) containing 433 biomedical evaluation instances across 10 task categories. Each instance includes a task name, task-specific ID, prompt text, and ground-truth answer.

Source: biomni/Eval1
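A sketch of reading the file and checking how instances are distributed across task categories follows. The column names (`task_name`, `task_instance_id`, `prompt`, `answer`) are assumptions standing in for the four documented fields; check the actual Parquet schema before relying on them. Reading the file requires pandas with a Parquet engine such as pyarrow, so the import is deferred until `load_instances` is called.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class EvalInstance:
    # Hypothetical field names for the four documented fields.
    task_name: str
    task_instance_id: str
    prompt: str
    answer: str


FIELDS = ["task_name", "task_instance_id", "prompt", "answer"]


def load_instances(path: str = "biomni_eval1.parquet") -> List[EvalInstance]:
    """Read the Parquet file into EvalInstance records (requires pandas + pyarrow)."""
    import pandas as pd  # deferred so the rest of the sketch runs without pandas

    df = pd.read_parquet(path)
    return [EvalInstance(**row) for row in df[FIELDS].to_dict("records")]


def count_by_task(instances: List[EvalInstance]) -> Counter:
    """Count instances per task category; the full test split should total 433."""
    return Counter(inst.task_name for inst in instances)
```

With the real file, `sum(count_by_task(load_instances()).values())` should equal 433 and the counter should have 10 keys.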

Tools

Tool	Description
`submit_answer`	Submit your answer for the biomedical task. Evaluated using the biomni evaluator.

Time Horizon

Biomni-Eval1 is a single-turn environment. The agent receives a biomedical question and submits one answer.
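The single-turn protocol can be pictured as a tiny state machine: one prompt, one call to `submit_answer`, one binary reward. The class name `SingleTurnEpisode` and the `grader` callable are hypothetical, and how the actual environment treats a second `submit_answer` call is not documented here, so rejecting it with an error is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class SingleTurnEpisode:
    """Hypothetical model of one episode: one prompt, one submission, one reward."""
    prompt: str
    ground_truth: str
    grader: Callable[[str, str], float]  # maps (answer, ground_truth) to 1.0 or 0.0
    reward: Optional[float] = field(default=None, init=False)

    def submit_answer(self, answer: str) -> float:
        # Assumed behavior: the episode ends after the first submission.
        if self.reward is not None:
            raise RuntimeError("episode already finished: submit_answer was called")
        self.reward = self.grader(answer, self.ground_truth)
        return self.reward
```

Because there is no multi-turn interaction or sandbox, all of the agent's reasoning has to happen before the single `submit_answer` call.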

Other Environment Requirements

There are no further environment requirements; Biomni-Eval1 works out of the box with the OpenReward endpoint without any secrets.

Safety

This environment evaluates biomedical reasoning on research tasks and does not involve clinical patient data or real-world medical decision-making.

Citations

@article{huang2025biomni,
  author    = {Kexin Huang and Serena Zhang and Hanchen Wang and Yuanhao Qu and Yingzhou Lu and Yusuf Roohani and others},
  title     = {Biomni: A General-Purpose Biomedical AI Agent},
  journal   = {bioRxiv preprint},
  year      = {2025},
  doi       = {10.1101/2025.05.30.656746},
  url       = {https://www.biorxiv.org/content/10.1101/2025.05.30.656746}
}

Repository

Source repository

EnvCommons/Biomni-Eval1

Compute Configuration

Resource allocation for this environment.

Component	Configuration
Environment Server	1 vCPU / 4 GB RAM
Sandbox Machine	Not configured

Estimated Cost

Pay per second of active session usage. Billing starts when your session begins and stops when it ends.

Component	Cost / second
Environment	$0.0000320
Sandbox	Not configured
Total	$0.0000320

Examples

5-minute session: $0.0096

1-hour session: $0.1152
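The example costs follow from multiplying the per-second rate by the session length in seconds (300 s for 5 minutes, 3,600 s for 1 hour). A minimal sketch of that arithmetic, using `Decimal` to avoid floating-point rounding; the constant and function names are illustrative only.

```python
from decimal import Decimal

# Per-second environment rate from the cost table; no sandbox is configured.
ENVIRONMENT_RATE_PER_SECOND = Decimal("0.0000320")


def session_cost(seconds: int, rate: Decimal = ENVIRONMENT_RATE_PER_SECOND) -> Decimal:
    """Cost of an active session billed per second at a flat rate."""
    return rate * seconds
```

`session_cost(300)` gives 0.0096 and `session_cost(3600)` gives 0.1152, matching the examples.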

GeneralReasoning/Biomni-Eval1
