perfcodeedit
Description
perfcodeedit is a gem5-based benchmark for adapting LLMs to high-level program optimization. It pairs a curated dataset of more than 77,000 human-written, performance-improving edits to C++ submissions (with unit tests) with a reproducible gem5 full-system simulation environment that reliably measures the impact of each optimization. It evaluates LLM adaptation strategies (retrieval-based and chain-of-thought prompting, performance-conditioned finetuning, and synthetic self-play) by measuring mean and best-of-N speedups against human submissions.
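The speedup metrics mentioned above can be illustrated with a minimal Python sketch. This is not the benchmark's own scoring code: the function names, the use of simulated cycle counts as the cost measure, and the rule that a candidate failing its unit tests counts as a speedup of 1.0 are all assumptions made for illustration.

```python
from statistics import mean


def speedup(baseline_cycles, candidate_cycles, passed_tests):
    """Speedup of a candidate program over its baseline.

    Assumption: a candidate that fails its unit tests (or reports a
    non-positive cost) is scored as 1.0, i.e. no improvement.
    """
    if not passed_tests or candidate_cycles <= 0:
        return 1.0
    return baseline_cycles / candidate_cycles


def best_of_n(baseline_cycles, samples):
    """Best speedup among N sampled candidates for one problem.

    `samples` is a list of (candidate_cycles, passed_tests) pairs.
    An empty list is scored as 1.0 (assumption).
    """
    return max(
        (speedup(baseline_cycles, cycles, ok) for cycles, ok in samples),
        default=1.0,
    )


def mean_best_of_n(problems):
    """Average the per-problem best-of-N speedup over all problems.

    `problems` is a list of (baseline_cycles, samples) pairs.
    """
    return mean(best_of_n(base, samples) for base, samples in problems)


# Hypothetical cycle counts, invented purely to exercise the functions.
problems = [
    (1000, [(500, True), (250, False), (800, True)]),
    (900, [(300, True), (100, False)]),
]
result = mean_best_of_n(problems)
```

In this sketch the fastest candidate in each problem fails its tests, so it is ignored and the best passing candidate determines the score. Taking the maximum per problem before averaging is what distinguishes best-of-N from a plain mean over all samples.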