bioagentbench

Name: arXiv/bioagentbench
Author: arXiv

Description

BioAgent Bench is a benchmark dataset and evaluation suite for measuring the performance and robustness of AI agents on end-to-end bioinformatics tasks. It has three parts: curated pipelines (e.g., RNA-seq, variant calling, metagenomics) whose prompts request concrete output artifacts so that results can be scored automatically; stress-testing perturbations (corrupted inputs, decoy files, prompt bloat); and an LLM-based grader that evaluates pipeline progress and outcome validity.
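
To make the three perturbation types concrete, the following is a minimal sketch of how they might be applied to a task. It is not the benchmark's actual code or API: the `Task` record, the function names `corrupt_input`, `add_decoy`, and `bloat_prompt`, the file names, and the choice of truncation as the form of corruption are all assumptions made for illustration.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Task:
    """Hypothetical task record: a prompt plus in-memory input files."""
    prompt: str
    inputs: dict = field(default_factory=dict)  # file name -> contents (bytes)


def corrupt_input(task: Task, name: str, keep_fraction: float = 0.5) -> Task:
    """Corrupted input (assumed form): truncate one file to a fraction of its bytes."""
    data = task.inputs[name]
    cut = int(len(data) * keep_fraction)
    return replace(task, inputs={**task.inputs, name: data[:cut]})


def add_decoy(task: Task, name: str, data: bytes) -> Task:
    """Decoy file: add a plausible-looking file that the task does not need."""
    if name in task.inputs:
        raise ValueError(f"decoy name collides with a real input: {name}")
    return replace(task, inputs={**task.inputs, name: data})


def bloat_prompt(task: Task, filler: str, repeats: int = 3) -> Task:
    """Prompt bloat: append irrelevant text after the original instructions."""
    padding = "\n".join([filler] * repeats)
    return replace(task, prompt=task.prompt + "\n\n" + padding)


# Illustrative task; the prompt names a concrete output artifact to produce.
base = Task(
    prompt="Call variants and write results to variants.vcf.",
    inputs={"reads.fastq": b"@r1\nACGTACGT\n+\nIIIIIIII\n"},
)

# Perturbations compose; each returns a new Task and leaves `base` untouched.
perturbed = bloat_prompt(
    add_decoy(
        corrupt_input(base, "reads.fastq"),
        "old_reads.fastq",
        b"@x\nAAAA\n+\nIIII\n",
    ),
    filler="Note: the cluster will be down for maintenance next week.",
)
```

Because each function returns a new task rather than modifying its argument, an agent's run on `base` and its run on `perturbed` can be compared directly, which is one simple way to frame a robustness measurement.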

