gso
Description
GSO is a benchmark for evaluating language models' ability to develop high-performance software. An automated pipeline mines repository commit histories to generate and execute performance tests, producing 102 challenging optimization tasks across 10 diverse codebases. Agents are scored on the runtime improvements they achieve relative to expert developer optimizations.
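To make the idea of comparing an agent's runtime improvement with an expert's concrete, here is a minimal sketch. It is not GSO's actual scoring code: the helper names (`best_runtime`, `speedup`, `fraction_of_expert`) and the choice of a simple speedup ratio are assumptions made for illustration only.

```python
import time
from typing import Callable


def best_runtime(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the fastest wall-clock time (seconds) of `fn` over `repeats` runs.

    Hypothetical helper: taking the minimum is one common way to reduce noise.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def speedup(baseline_s: float, candidate_s: float) -> float:
    """Speedup of a candidate over the unoptimized baseline (higher is better)."""
    if baseline_s <= 0 or candidate_s <= 0:
        raise ValueError("runtimes must be positive")
    return baseline_s / candidate_s


def fraction_of_expert(baseline_s: float, agent_s: float, expert_s: float) -> float:
    """Agent speedup divided by expert speedup (assumed metric, not GSO's own).

    A value of 1.0 means the agent matched the expert optimization.
    """
    return speedup(baseline_s, agent_s) / speedup(baseline_s, expert_s)


if __name__ == "__main__":
    # Illustrative numbers only: baseline 10 s, agent patch 4 s, expert commit 2 s.
    print(speedup(10.0, 4.0))
    print(fraction_of_expert(10.0, 4.0, 2.0))
```

With the illustrative numbers above, the agent reaches a 2.5x speedup, which is half of the expert's 5x. A real harness would also need to verify that the optimized code still passes correctness tests before timing it.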
Implementations (1)
| Environment | Stars | Last Updated | |
|---|---|---|---|
| | 0 | 1 month ago | |