procbench
Description
procbench is a benchmark that directly evaluates the multi-step inference ability of LLMs. Each problem pairs explicit instructions with a corresponding question, and the instructions specify the full procedure needed to solve it, which minimizes path exploration and reliance on implicit knowledge. The benchmark consists of multiple distinct tasks whose problems require varying numbers of steps. Evaluation is done at the step level, using step-aware metrics and separately defined complexity measures to assess instruction following and multi-step reasoning.
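To make "step-aware metric" concrete, here is a minimal sketch of one way such a score could be computed. It is an illustrative assumption, not the benchmark's reference implementation: the function names `prefix_match_length`, `prefix_accuracy`, and `sequential_match` are hypothetical, and the representation of a solution as a list of intermediate states (one per step) is assumed for the example.

```python
from typing import Hashable, Sequence


def prefix_match_length(predicted: Sequence[Hashable], target: Sequence[Hashable]) -> int:
    """Count how many leading steps of `predicted` agree with `target`."""
    matched = 0
    for pred_step, target_step in zip(predicted, target):
        if pred_step != target_step:
            break
        matched += 1
    return matched


def prefix_accuracy(predicted: Sequence[Hashable], target: Sequence[Hashable]) -> float:
    """Fraction of correct leading steps, normalized by the longer sequence.

    Normalizing by the longer sequence (an assumption of this sketch)
    penalizes both missing steps and extra steps.
    """
    longest = max(len(predicted), len(target))
    if longest == 0:
        return 1.0  # two empty step sequences agree trivially
    return prefix_match_length(predicted, target) / longest


def sequential_match(predicted: Sequence[Hashable], target: Sequence[Hashable]) -> bool:
    """True only if every step matches and the sequences have equal length."""
    return list(predicted) == list(target)


# Hypothetical example: the model gets the first two of four steps right.
target_steps = ["a", "b", "c", "d"]
model_steps = ["a", "b", "x"]
score = prefix_accuracy(model_steps, target_steps)
exact = sequential_match(model_steps, target_steps)
```

A score of this kind rewards a model for following the procedure correctly up to the point where it first diverges, rather than judging only the final answer.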
Implementations (1)
| Environment | Stars | Last Updated |
|---|---|---|
| | 0 | 1 month ago |