bioagentbench
Description
BioAgent Bench is a benchmark dataset and evaluation suite for measuring the performance and robustness of AI agents on end-to-end bioinformatics tasks. It has three components: curated pipelines (e.g., RNA-seq, variant calling, metagenomics) whose prompts request concrete output artifacts so that results can be scored automatically; stress-testing perturbations such as corrupted inputs, decoy files, and prompt bloat; and an LLM-based grader that evaluates both pipeline progress and the validity of the final outcome.