visualtoolbench

Description

VisualToolBench is a visual tool-use reasoning benchmark that evaluates the ability of multimodal large language models (MLLMs) to perceive, transform, and reason over complex visual-textual tasks under the think-with-images paradigm. It comprises 1,204 challenging open-ended vision tasks (603 single-turn, 601 multi-turn) across five diverse domains. Each task is paired with detailed rubrics that systematically assess how well a model integrates image manipulations and general-purpose tools.

ScaleAI/visualtoolbench | OpenReward