GDPval
Description
GDPval is a benchmark for evaluating AI model capabilities on real-world, economically valuable tasks. It covers the majority of U.S. Bureau of Labor Statistics Work Activities for 44 occupations across the top nine sectors contributing to U.S. GDP, with tasks constructed from the representative work of seasoned industry professionals. An open-source gold subset of 220 tasks is available, along with a public automated grading service.
Implementations (1)
| Environment | Stars | Last Updated | |
|---|---|---|---|
| | 0 | 1 month ago | |