Biomni-Eval1

Description

Biomni-Eval1 is an environment for evaluating AI agents on biomedical tasks spanning 10 categories, including GWAS variant prioritization, gene screening, rare disease diagnosis, and literature-based question answering. Based on the Biomni evaluation suite from Stanford, it tests general-purpose biomedical reasoning capabilities.

Capabilities

  • GWAS variant and causal gene prioritization
  • Gene screening and retrieval
  • Rare disease diagnosis
  • CRISPR delivery planning
  • Biomedical literature question answering

Compute Requirements

This is a single-turn environment with no sandbox.

License

Apache 2.0.

Tasks

There is one split in this environment:

  • Test: 433 instances across 10 biomedical task categories

Each task presents a biomedical question requiring domain-specific reasoning.

Reward Structure

This is a single-turn environment with binary reward:

  • 1.0 — Correct answer
  • 0.0 — Incorrect answer

Evaluation uses the biomni package's task-specific evaluators with format-aware matching:

| Task Type | Answer Format | Evaluation Method |
|---|---|---|
| GWAS variant prioritization | Variant ID (e.g., "rs4253311") | Exact match |
| GWAS causal gene identification | Gene symbol (e.g., "BRCA1") | Case-insensitive match |
| Screen gene retrieval | Gene symbol | Case-insensitive match |
| Patient gene detection | Gene symbol | Case-insensitive match |
| Rare disease diagnosis | OMIM ID or disease name | Normalized match |
| CRISPR delivery method | Multiple choice letter | Single letter (A-E) |
| Lab bench Q&A | Multiple choice or free text | Task-specific |

No LLM grading is used; all evaluation is deterministic.
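
The matching rules in the table can be sketched in a few lines of Python. This is an illustration only, not the biomni package's evaluator: the function names, the task-type keys, and the normalization details (whitespace trimming, case folding, treating variant IDs as case-sensitive) are all assumptions.

```python
import re


def exact_match(prediction: str, truth: str) -> bool:
    """Exact string comparison after trimming surrounding whitespace."""
    return prediction.strip() == truth.strip()


def case_insensitive_match(prediction: str, truth: str) -> bool:
    """Comparison that ignores letter case, e.g. for gene symbols."""
    return prediction.strip().casefold() == truth.strip().casefold()


def choice_letter_match(prediction: str, truth: str) -> bool:
    """Accept only a single letter A-E, ignoring case."""
    candidate = prediction.strip().upper()
    if not re.fullmatch(r"[A-E]", candidate):
        return False
    return candidate == truth.strip().upper()


# Hypothetical task-type keys; the real suite may name its tasks differently.
MATCHERS = {
    "gwas_variant": exact_match,
    "gene_symbol": case_insensitive_match,
    "multiple_choice": choice_letter_match,
}


def binary_reward(task_type: str, prediction: str, truth: str) -> float:
    """Return 1.0 for a match and 0.0 otherwise."""
    return 1.0 if MATCHERS[task_type](prediction, truth) else 0.0
```

Because every matcher is a pure function of the two strings, the same answer always receives the same reward, which is what makes the evaluation deterministic.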

Data

The data consists of a single Parquet file (biomni_eval1.parquet) containing 433 biomedical evaluation instances across 10 task categories. Each instance includes a task name, a task-specific ID, the prompt text, and the ground-truth answer.

Source: biomni/Eval1
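
As a rough sketch of how the four fields described above might be handled in Python, the snippet below models one instance as a record and counts instances per task category. The field names (task_name, task_id, prompt, answer) are hypothetical, since the actual column names in the Parquet file are not stated here, and load_instances assumes pandas with a Parquet engine is installed.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    # Field names are hypothetical; the Parquet file's real columns may differ.
    task_name: str
    task_id: str
    prompt: str
    answer: str


def load_instances(path: str) -> list[Instance]:
    """Read a Parquet file into Instance records (needs pandas and a Parquet engine)."""
    import pandas as pd  # imported lazily so the rest of the sketch is stdlib-only

    frame = pd.read_parquet(path)
    return [
        Instance(str(r["task_name"]), str(r["task_id"]), str(r["prompt"]), str(r["answer"]))
        for r in frame.to_dict(orient="records")
    ]


def count_by_task(instances: list[Instance]) -> Counter:
    """Number of instances in each task category."""
    return Counter(inst.task_name for inst in instances)
```

Calling count_by_task(load_instances("biomni_eval1.parquet")) would then give a per-category breakdown of the test split, provided the assumed column names are adjusted to match the file.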

Tools

| Tool | Description |
|---|---|
| submit_answer | Submit your answer for the biomedical task. Evaluated using the biomni evaluator. |

Time Horizon

Biomni-Eval1 is a single-turn environment. The agent receives a biomedical question and submits one answer.
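
The single-turn flow can be pictured with a toy stand-in for one episode. This is not the OpenReward client API: the class, its attributes, the simplified case-insensitive check, and the choice to raise an error on a second submission are all assumptions made for illustration. The accuracy helper shows how binary rewards average into an overall score.

```python
class SingleTurnEpisode:
    """Toy stand-in for one episode; not the OpenReward client API."""

    def __init__(self, prompt: str, truth: str):
        self.prompt = prompt
        self._truth = truth
        self.done = False

    def submit_answer(self, answer: str) -> float:
        """Score the one allowed answer and end the episode."""
        if self.done:
            raise RuntimeError("episode already finished: only one answer is allowed")
        self.done = True
        # Simplified case-insensitive check; the real evaluator is task-specific.
        matched = answer.strip().casefold() == self._truth.strip().casefold()
        return 1.0 if matched else 0.0


def accuracy(rewards: list[float]) -> float:
    """Mean of binary rewards, i.e. the fraction of correct answers."""
    return sum(rewards) / len(rewards) if rewards else 0.0
```

With binary rewards, the mean over all episodes equals the fraction of questions answered correctly, so a per-task or overall accuracy follows directly from the collected reward values.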

Environment Difficulty

[Insert environment difficulty here]

Other Environment Requirements

There are no further environment requirements; Biomni-Eval1 works out of the box with the OpenReward endpoint without any secrets.

Safety

This environment evaluates biomedical reasoning on research tasks and does not involve clinical patient data or real-world medical decision-making.

Citations

@article{huang2025biomni,
  author    = {Kexin Huang and Serena Zhang and Hanchen Wang and Yuanhao Qu and Yingzhou Lu and Yusuf Roohani and others},
  title     = {Biomni: A General-Purpose Biomedical AI Agent},
  journal   = {bioRxiv preprint},
  year      = {2025},
  doi       = {10.1101/2025.05.30.656746},
  url       = {https://www.biorxiv.org/content/10.1101/2025.05.30.656746}
}