beyondaime

Description

BeyondAIME is a benchmark for evaluating generalized STEM reasoning. It extends AIME-style problems to probe deep, stepwise mathematical problem solving. Codeforces is a coding benchmark built from competitive programming problems that assesses algorithmic and programming ability. Both benchmarks were developed internally and will be publicly released.

Implementations (1)
Environment                  Stars  Last Updated
GeneralReasoning/BeyondAIME  0      1 month ago