bixbench

Description

BixBench (Bioinformatics Benchmark) evaluates LLM-based agents on practical biological data analysis. It comprises over 50 real-world scenarios with nearly 300 open-answer questions designed to test dataset exploration, long multi-step analytical trajectories, and nuanced interpretation of results. It aims to expose the current limitations of frontier models and to spur the development of agents capable of rigorous bioinformatic analysis.

Implementations (1)
Environment: EdisonScientific/BixBench
Stars: 1
Last Updated: 2 months ago