bioagentbench

Description

BioAgent Bench is a benchmark dataset and evaluation suite for measuring the performance and robustness of AI agents on end-to-end bioinformatics tasks. It contains three components: curated pipelines (e.g., RNA-seq, variant calling, metagenomics) whose prompts request concrete output artifacts so results can be scored automatically; stress-testing perturbations (corrupted inputs, decoy files, prompt bloat); and an LLM-based grader that evaluates both pipeline progress and the validity of the final outputs.


arXiv/bioagentbench | OpenReward