BrowseComp
Description
BrowseComp is a benchmark for measuring agents' web-browsing ability. It comprises 1,266 questions that require persistent navigation to locate hard-to-find, entangled information. Each question has a short, easily verifiable answer, and the benchmark tests an agent's persistence and creativity in finding information. It is intended to play the role for browsing agents that programming competitions play for coding agents.
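
Because the answers are short, checking a prediction against a reference can be cheap. The following is a minimal sketch of normalized exact-match scoring, offered only as an illustration: it is not BrowseComp's actual grading procedure, and the function names (`normalize`, `accuracy`) and the question-ID keyed dictionaries are hypothetical.

```python
import re
import unicodedata


def normalize(answer: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    # Decompose accented characters, then drop the combining marks.
    text = unicodedata.normalize("NFKD", answer)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Replace punctuation with spaces so "paris." and "Paris" compare equal.
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def accuracy(predictions: dict[str, str], references: dict[str, str]) -> float:
    """Fraction of reference questions answered correctly after normalization.

    Both arguments map a (hypothetical) question ID to an answer string.
    A question with no prediction counts as incorrect.
    """
    if not references:
        return 0.0
    correct = sum(
        normalize(predictions.get(qid, "")) == normalize(ref)
        for qid, ref in references.items()
    )
    return correct / len(references)
```

Exact matching of this kind is strict: it marks a correct answer wrong if it is phrased differently from the reference, so a real evaluation may need a more lenient comparison.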
Implementations (1)
| Environment | Stars | Last Updated | |
|---|---|---|---|
| | 0 | 1 month ago | |