satbench

Name: arXiv/satbench
Author: arXiv

Logical Reasoning via SAT-based Puzzle Solving

Description

SATBench is a benchmark for evaluating the logical reasoning capabilities of large language models using puzzles derived from Boolean satisfiability (SAT) problems. Its 2100 puzzles are generated automatically by translating SAT formulas into story contexts, with difficulty adjusted through the number of clauses. The puzzles are validated through LLM-assisted checks, solver-based checks, and partial human review, and they probe search-based reasoning on both satisfiable (SAT) and unsatisfiable (UNSAT) instances.
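
To make the SAT/UNSAT distinction concrete, here is a minimal sketch of what deciding satisfiability means for a small formula in conjunctive normal form. It is an illustration only, not SATBench's generation or validation code: the function name is_satisfiable, the DIMACS-style integer encoding of literals, and the two example formulas are all assumptions made for this example.

```python
from itertools import product

def is_satisfiable(clauses, num_vars):
    """Brute-force SAT check (hypothetical helper, not part of SATBench).

    clauses: list of clauses, each a list of nonzero ints.
             Literal k means variable k is true, -k means it is false.
    num_vars: number of Boolean variables, numbered 1..num_vars.
    """
    # Try every assignment; exponential in num_vars, fine for toy inputs.
    for bits in product([False, True], repeat=num_vars):
        # A clause holds if at least one of its literals is true.
        if all(
            any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
            for clause in clauses
        ):
            return True
    return False

# Three clauses over two variables: satisfied by x1 = True, x2 = True.
sat_example = [[1, 2], [-1, 2], [1, -2]]

# Adding a fourth clause rules out the last remaining assignment.
unsat_example = sat_example + [[-1, -2]]

print(is_satisfiable(sat_example, 2))
print(is_satisfiable(unsat_example, 2))
```

The second formula differs from the first by a single extra clause, which shows in miniature why clause count can serve as a difficulty knob: each added clause removes candidate assignments, and an UNSAT answer requires ruling out all of them.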

Implementations (1)

Environment	Stars	Last Updated
anjiang/SATBench	5	2 months ago