VeriSciQA

Description

VeriSciQA is an environment for evaluating scientific visual question answering. It contains 20,351 multiple-choice questions paired with scientific figures from research papers, spanning 20 scientific domains (Biology, Physics, Chemistry, Computer Science, Mathematics, etc.) and 12 figure types (graphs, diagrams, charts, tables, etc.).

Capabilities

  • Scientific visual question answering
  • Understanding scientific figures from research papers
  • Multiple-choice evaluation across 20 domains and 12 figure types

Compute Requirements

Agents are given a standard environment with no sandbox or file system access.

License

CC BY-SA 4.0.

Tasks

There is one split in this environment:

  • train: 20,351 tasks

Questions span 20 scientific domains and 12 figure types including line plots, bar charts, scatter plots, diagrams, heatmaps, and composite figures.

Reward Structure

Single-turn evaluation with deterministic grading. The agent submits a single letter answer (A, B, C, or D) via the submit_answer tool. The submitted answer is compared via exact match against the ground truth. Reward is 1.0 if correct, 0.0 if incorrect.
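
The grading rule above can be sketched in a few lines of Python. This is a minimal illustration, not the environment's actual implementation: the function name `grade_answer` is hypothetical, and it assumes that "exact match" means a strict string comparison and that anything outside A to D scores 0.0.

```python
VALID_CHOICES = {"A", "B", "C", "D"}


def grade_answer(submitted: str, ground_truth: str) -> float:
    """Return 1.0 on an exact match with the ground truth, else 0.0.

    Hypothetical helper: no trimming or case folding is applied, on the
    assumption that "exact match" means a strict string comparison.
    """
    if submitted not in VALID_CHOICES:
        return 0.0  # assumption: malformed answers simply score zero
    return 1.0 if submitted == ground_truth else 0.0


# Three illustrative submissions against a ground truth of "B".
rewards = [grade_answer("B", "B"), grade_answer("C", "B"), grade_answer("b", "B")]
accuracy = sum(rewards) / len(rewards)  # with 0/1 rewards, mean reward is accuracy
```

Because each reward is either 1.0 or 0.0, averaging rewards over a set of tasks gives the accuracy on that set directly.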

Data

JSONL metadata plus images (~20,351 JPG files), sourced from the Hugging Face dataset datajuicer/VeriSciQA and stored on the OpenReward platform.
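
As a rough sketch of how JSONL metadata of this kind could be read, the snippet below parses one JSON object per line into a small record type. The field names (`question`, `choices`, `answer`, `image`) and the sample record are assumptions made up for illustration; the actual schema of the dataset is not described here and may differ.

```python
import json
from dataclasses import dataclass


@dataclass
class Task:
    question: str
    choices: list[str]
    answer: str
    image_path: str


def parse_tasks(jsonl_text: str) -> list[Task]:
    """Parse one JSON object per non-empty line into Task records."""
    tasks = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        # All four keys below are hypothetical field names.
        tasks.append(
            Task(
                question=record["question"],
                choices=record["choices"],
                answer=record["answer"],
                image_path=record["image"],
            )
        )
    return tasks


# Made-up sample record, for illustration only.
sample = (
    '{"question": "Which curve peaks first?", '
    '"choices": ["A) red", "B) blue", "C) green", "D) black"], '
    '"answer": "A", "image": "figures/0001.jpg"}\n'
)
tasks = parse_tasks(sample)
```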

Tools

| Tool | Description |
| --- | --- |
| submit_answer | Submit a single letter answer (A, B, C, or D). Deterministic evaluation via exact match. Ends the episode. |

Time Horizon

Single-turn. The agent reads the question and views the scientific figure, then submits one answer for a total of one tool call.
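
The single-turn flow can be pictured with a small stand-in class. This is a sketch under stated assumptions, not the OpenReward API: the class name `SingleTurnEpisode` is hypothetical, and raising an error on a second submission is one assumed way to model "ends the episode".

```python
class SingleTurnEpisode:
    """Hypothetical stand-in for one single-turn episode."""

    def __init__(self, ground_truth: str) -> None:
        self.ground_truth = ground_truth
        self.done = False

    def submit_answer(self, answer: str) -> float:
        """Grade the answer by exact match and end the episode."""
        if self.done:
            # Assumption: a second call is rejected rather than re-graded.
            raise RuntimeError("episode already ended; only one submission is allowed")
        self.done = True
        return 1.0 if answer == self.ground_truth else 0.0


episode = SingleTurnEpisode(ground_truth="C")
reward = episode.submit_answer("C")  # the one and only tool call
```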

Environment Difficulty

VeriSciQA evaluates scientific visual reasoning:

| Model Type | Accuracy |
| --- | --- |
| Best Proprietary Model | 82% |
| Open-Source Models | ~64% |

The gap of roughly 18 percentage points between the best proprietary model (82%) and open-source models (~64%) suggests that VeriSciQA remains a challenging benchmark for scientific visual reasoning, particularly for open-source models.

Other Environment Requirements

There are no further environment requirements; VeriSciQA works out of the box with the OpenReward endpoint without any external API keys.

Safety

Agents in VeriSciQA answer scientific visual questions in a standard environment. The environment does not present direct safety risks.

Citation

@article{verisciqa2025,
  title={VeriSciQA: An Auto-Verified Dataset for Scientific Visual Question Answering},
  author={DataJuicer Team},
  journal={arXiv preprint arXiv:2511.19899},
  year={2025}
}