satbench

Description

SATBench is a benchmark that evaluates the logical reasoning of large language models using puzzles derived from Boolean satisfiability (SAT) problems. It contains 2100 puzzles, generated automatically by translating SAT formulas into story contexts; difficulty is adjusted by varying the number of clauses. The puzzles are validated through LLM-assisted, solver-based, and partial human checks. The benchmark probes search-based reasoning on both satisfiable (SAT) and unsatisfiable (UNSAT) instances.
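
To make the underlying task concrete, here is a minimal sketch of what a SAT instance with an adjustable clause count looks like and how a SAT/UNSAT label can be checked. It is not the SATBench generation pipeline: the function names random_cnf and solve_brute_force, the DIMACS-style integer encoding, the fixed clause length of 3, and the exhaustive search are all assumptions chosen for illustration, and the story-translation step is not shown.

```python
import itertools
import random


def random_cnf(num_vars, num_clauses, clause_len=3, seed=0):
    """Build a random CNF formula (hypothetical helper, not from SATBench).

    Each clause is a list of nonzero ints in DIMACS style:
    k means "variable k is true", -k means "variable k is false".
    More clauses generally make the formula harder to satisfy.
    """
    rng = random.Random(seed)
    formula = []
    for _ in range(num_clauses):
        # Pick distinct variables, then negate each with probability 0.5.
        variables = rng.sample(range(1, num_vars + 1), clause_len)
        formula.append([v if rng.random() < 0.5 else -v for v in variables])
    return formula


def solve_brute_force(formula, num_vars):
    """Return a satisfying assignment as a dict, or None if the formula is UNSAT.

    Exhaustive search over all 2**num_vars assignments; only practical
    for small instances, but it makes the search nature of the task explicit.
    """
    for bits in itertools.product([False, True], repeat=num_vars):
        assignment = {i + 1: bits[i] for i in range(num_vars)}
        # A formula is satisfied when every clause has at least one true literal.
        if all(
            any(assignment[abs(lit)] == (lit > 0) for lit in clause)
            for clause in formula
        ):
            return assignment
    return None


if __name__ == "__main__":
    formula = random_cnf(num_vars=4, num_clauses=6)
    result = solve_brute_force(formula, num_vars=4)
    print("SAT" if result is not None else "UNSAT", result)
```

A SAT answer can be verified by checking a single assignment, whereas an UNSAT answer requires ruling out every assignment, which is why the two kinds of instance exercise different reasoning.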

Implementations (1)
Environment | Stars | Last Updated
anjianganjiang/SATBench | 5 | 3 weeks ago