VeriSciQA
Description
VeriSciQA is an environment for evaluating scientific visual question answering. It contains 20,351 multiple-choice questions paired with scientific figures from research papers, spanning 20 scientific domains (Biology, Physics, Chemistry, Computer Science, Mathematics, etc.) and 12 figure types (graphs, diagrams, charts, tables, etc.).
Capabilities
- Scientific visual question answering
- Understanding scientific figures from research papers
- Multiple-choice evaluation across 20 domains and 12 figure types
Compute Requirements
Agents are given a standard environment with no sandbox or file system access.
License
Tasks
There is one split in this environment:
- train: 20,351 tasks
Questions span 20 scientific domains and 12 figure types, including line plots, bar charts, scatter plots, diagrams, heatmaps, and composite figures.
Reward Structure
Single-turn evaluation with deterministic grading. The agent submits a single letter answer (A, B, C, or D) via the submit_answer tool. The submitted answer is compared via exact match against the ground truth. Reward is 1.0 if correct, 0.0 if incorrect.
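The grading rule above can be sketched in a few lines of Python. This is a minimal illustration, not the environment's actual implementation: the function name `grade` and the `VALID_CHOICES` constant are hypothetical, and the sketch assumes a strict, case-sensitive comparison with no normalization of the submitted letter.

```python
# Hypothetical sketch of exact-match grading; names are illustrative only.
VALID_CHOICES = ("A", "B", "C", "D")


def grade(submitted: str, ground_truth: str) -> float:
    """Return 1.0 if the submitted letter exactly matches the ground truth, else 0.0.

    Assumption: the comparison is strict, so "b" does not match "B" and
    answers outside VALID_CHOICES simply score 0.0.
    """
    if submitted not in VALID_CHOICES:
        return 0.0
    return 1.0 if submitted == ground_truth else 0.0


reward_correct = grade("B", "B")
reward_wrong = grade("A", "C")
reward_lowercase = grade("b", "B")
```

Because the comparison is deterministic, the same submission always yields the same reward for a given task.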
Data
The data consists of JSONL metadata and approximately 20,351 JPG images, sourced from the Hugging Face dataset datajuicer/VeriSciQA and stored on the OpenReward platform.
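For readers unfamiliar with JSONL, the sketch below shows one way to parse such metadata line by line. The field names (`image`, `question`, `answer`) and the sample records are assumptions made up for illustration; the actual schema of the VeriSciQA metadata is not described here and may differ.

```python
import json
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines between records
            yield json.loads(line)


# Hypothetical sample data; field names and values are NOT taken from the dataset.
sample_lines = [
    '{"image": "fig_0001.jpg", "question": "Which curve peaks first?", "answer": "B"}',
    "",
    '{"image": "fig_0002.jpg", "question": "Which bar is tallest?", "answer": "D"}',
]

records = list(iter_records(sample_lines))
```

The same function accepts an open file handle, since iterating over a text file yields its lines one at a time.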
Tools
| Tool | Description |
|---|---|
| submit_answer | Submit a single letter answer (A, B, C, or D). Deterministic evaluation via exact match. Ends the episode. |
Time Horizon
Single-turn. The agent reads the question and views the scientific figure, then submits one answer for a total of one tool call.
Environment Difficulty
VeriSciQA evaluates scientific visual reasoning. Accuracy by model type:
| Model Type | Accuracy |
|---|---|
| Best Proprietary Model | 82% |
| Open-Source Models | ~64% |
The gap of roughly 18 percentage points between the best proprietary model and open-source models suggests that VeriSciQA remains a challenging benchmark for scientific visual reasoning.
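Since each task yields a reward of 1.0 or 0.0, accuracy is the mean reward over the evaluated tasks. The sketch below illustrates that relationship and the gap arithmetic; the `accuracy` helper and the toy reward list are hypothetical, and only the 82% and ~64% figures come from the table above.

```python
from typing import Sequence


def accuracy(rewards: Sequence[float]) -> float:
    """Mean of per-task binary rewards (1.0 correct, 0.0 incorrect)."""
    if not rewards:
        raise ValueError("at least one reward is required")
    return sum(rewards) / len(rewards)


# Toy rewards for illustration only; these are not real evaluation results.
toy_rewards = [1.0, 0.0, 1.0, 1.0]
toy_accuracy = accuracy(toy_rewards)

# Gap between the table's figures, in percentage points.
# The open-source figure is approximate (~64%), so the gap is approximate too.
best_proprietary_pct = 82
open_source_pct = 64
gap_points = best_proprietary_pct - open_source_pct
```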
Other Environment Requirements
There are no further environment requirements; VeriSciQA works out of the box with the OpenReward endpoint without any external API keys.
Safety
Agents in VeriSciQA answer scientific visual questions in a standard environment. The environment does not present direct safety risks.
Citation
@article{verisciqa2025,
title={VeriSciQA: An Auto-Verified Dataset for Scientific Visual Question Answering},
author={DataJuicer Team},
journal={arXiv preprint arXiv:2511.19899},
year={2025}
}