beyondaime
Description
BeyondAIME is a benchmark for evaluating generalized STEM reasoning. It extends AIME-style problems to probe deep, stepwise mathematical problem-solving. Codeforces is a coding benchmark built from competitive programming problems to assess algorithmic and programming problem-solving abilities. Both benchmarks were developed internally and will be publicly released.
Implementations (1)
| Environment | Stars | Last Updated |
|---|---|---|
| | 0 | 1 month ago |