gsm1k

Description

Grade School Math 1000 (GSM1k) is a benchmark of 1,000 grade-school math problems written to mirror the style and difficulty of GSM8k. It serves two purposes: evaluating elementary mathematical reasoning and probing for dataset contamination. GSM1k is designed to be comparable to GSM8k on metrics such as human solve rate, solution length, and answer magnitude, but its problems are newly written. A model that scores markedly higher on GSM8k than on GSM1k is therefore likely relying on memorization rather than generalization.
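The GSM8k-versus-GSM1k comparison can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the official GSM1k grader. It assumes each model response is scored by exact match on the last number it contains, and that the contamination signal is the accuracy on GSM8k minus the accuracy on GSM1k. The helper names `extract_final_number`, `accuracy`, and `contamination_gap` are hypothetical.

```python
import re
from typing import Optional, Sequence

# Assumption: the "final answer" is the last number in the response, commas allowed.
# This is a simple heuristic, not the benchmark's official answer extraction.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_final_number(text: str) -> Optional[float]:
    """Hypothetical helper: return the last number in `text`, or None if there is none."""
    matches = _NUMBER.findall(text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def accuracy(predictions: Sequence[str], answers: Sequence[float]) -> float:
    """Hypothetical helper: fraction of responses whose final number equals the reference."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        return 0.0
    correct = 0
    for pred, ans in zip(predictions, answers):
        value = extract_final_number(pred)
        # Exact match with a small tolerance for float conversion.
        if value is not None and abs(value - ans) < 1e-6:
            correct += 1
    return correct / len(answers)


def contamination_gap(acc_gsm8k: float, acc_gsm1k: float) -> float:
    """Hypothetical helper: a positive gap means the model does better on GSM8k than on GSM1k."""
    return acc_gsm8k - acc_gsm1k


if __name__ == "__main__":
    preds = ["She pays 3 + 4 = 7 dollars.", "The total is 1,250.", "I am not sure."]
    refs = [7, 1250, 42]
    print(accuracy(preds, refs))  # two of the three responses match
```

Taking the last number keeps the sketch independent of any particular answer format; a real harness would likely parse a fixed answer marker instead. Because GSM1k is matched to GSM8k in difficulty, a large positive gap is what would point toward memorization.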

Implementations (1)

Environment               Stars   Last Updated
GeneralReasoning/GSM1K    0       1 month ago
ScaleAI/gsm1k | OpenReward