Implementation of

arXiv/verisciqa

VeriSciQA

Description

VeriSciQA is an environment for evaluating scientific visual question answering. It contains 20,351 multiple-choice questions paired with scientific figures from research papers, spanning 20 scientific domains (Biology, Physics, Chemistry, Computer Science, Mathematics, etc.) and 12 figure types (graphs, diagrams, charts, tables, etc.).

Capabilities

Scientific visual question answering
Understanding scientific figures from research papers
Multiple-choice evaluation across 20 domains and 12 figure types

Compute Requirements

Agents are given a standard environment with no sandbox or file system access.

License

CC BY-SA 4.0.

Tasks

There is one split in this environment:

train: 20,351 tasks

Questions span 20 scientific domains and 12 figure types including line plots, bar charts, scatter plots, diagrams, heatmaps, and composite figures.

Reward Structure

Single-turn evaluation with deterministic grading. The agent submits a single letter answer (A, B, C, or D) via the submit_answer tool. The submitted answer is compared via exact match against the ground truth. Reward is 1.0 if correct, 0.0 if incorrect.
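The grading rule described above can be sketched in a few lines. This is a minimal illustration, not the environment's actual grader: the function name `grade` and the choice to reject anything outside A-D without normalizing case or whitespace are assumptions.

```python
VALID_CHOICES = ("A", "B", "C", "D")


def grade(submitted: str, ground_truth: str) -> float:
    """Return 1.0 when the submitted letter equals the ground truth, else 0.0."""
    # Assumption: no case folding or whitespace stripping is applied, because
    # the README does not say whether the environment normalizes the input.
    if submitted not in VALID_CHOICES:
        return 0.0
    return 1.0 if submitted == ground_truth else 0.0
```

Under this sketch, `grade("B", "B")` yields 1.0, while `grade("C", "B")` and a lowercase `grade("b", "B")` both yield 0.0.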

Data

The data consists of JSONL metadata and the accompanying images (~20,351 JPG files), sourced from the Hugging Face dataset datajuicer/VeriSciQA and stored on the OpenReward platform.
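As a rough sketch of how JSONL metadata like this could be read, the snippet below parses one JSON object per line. The field names (`image`, `question`, `options`, `answer`) and the sample record are hypothetical placeholders; the README does not document the actual schema.

```python
import json
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical sample: these field names are assumptions for illustration,
# not the dataset's documented schema.
sample = [
    '{"image": "figures/0001.jpg", "question": "Which curve peaks first?", '
    '"options": ["A", "B", "C", "D"], "answer": "B"}',
    "",  # blank lines are skipped
]

records = list(iter_records(sample))
```

The same function accepts an open file handle, since iterating over a text file yields its lines.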

Tools

Tool	Description
`submit_answer`	Submit a single letter answer (A, B, C, or D). Deterministic evaluation via exact match. Ends the episode.

Time Horizon

Single-turn. The agent reads the question and views the scientific figure, then submits one answer for a total of one tool call.

Environment Difficulty

VeriSciQA evaluates scientific visual reasoning:

Model Type	Accuracy
Best Proprietary Model	82%
Open-Source Models	~64%

The gap of roughly 18 percentage points between the best proprietary model (82%) and open-source models (~64%) suggests that VeriSciQA remains a challenging benchmark for scientific visual reasoning, particularly for open-source models.
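Because each task returns a reward of 1.0 or 0.0, accuracy is simply the mean reward, and the gap follows by subtraction. The sketch below shows that arithmetic; the helper names are illustrative and not part of the environment.

```python
def accuracy(rewards: list[float]) -> float:
    """Mean of 0/1 rewards, i.e. the fraction of questions answered correctly."""
    if not rewards:
        raise ValueError("no rewards to average")
    return sum(rewards) / len(rewards)


def gap_in_points(acc_a: float, acc_b: float) -> float:
    """Difference between two accuracies, expressed in percentage points."""
    return (acc_a - acc_b) * 100


# Reported figures from the table above: 82% and roughly 64%.
reported_gap = gap_in_points(0.82, 0.64)
```

Rounded to the nearest whole number, `reported_gap` is 18 points. Since the open-source figure is itself approximate, the gap should be read as approximate too.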

Other Environment Requirements

There are no further environment requirements; VeriSciQA works out of the box with the OpenReward endpoint without any external API keys.

Safety

Agents in VeriSciQA answer scientific visual questions in a standard environment. The environment does not present direct safety risks.

Citation

@article{verisciqa2025,
  title={VeriSciQA: An Auto-Verified Dataset for Scientific Visual Question Answering},
  author={DataJuicer Team},
  journal={arXiv preprint arXiv:2511.19899},
  year={2025}
}

Repository

Source repository

EnvCommons/VeriSciQA

Tools

Tools available in the environment

No tools are listed for this environment; it probably hasn't been indexed yet.

Compute Configuration

Resource allocation for this environment.

Component	Configuration
Environment Server	1 vCPU / 4 GB RAM
Sandbox Machine	Not configured

Estimated Cost

Usage is billed per second of active session time. Billing starts when your session begins and stops when it ends.

Component	Cost / second
Environment	$0.0000320
Sandbox	Not configured
Total	$0.0000320

Examples

5-minute session: $0.0096

1-hour session: $0.1152
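The example figures follow from multiplying the per-second rate by the session length. A minimal sketch of that calculation, assuming billing is strictly linear with no rounding, minimum charge, or sandbox cost (the function name is illustrative):

```python
from decimal import Decimal

# Per-second environment rate from the cost table; the sandbox is not configured.
ENVIRONMENT_RATE_USD_PER_SECOND = Decimal("0.0000320")


def session_cost(seconds: int) -> Decimal:
    """Cost in USD of a session that is active for the given number of seconds."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    return ENVIRONMENT_RATE_USD_PER_SECOND * seconds


five_minutes = session_cost(5 * 60)   # 300 seconds
one_hour = session_cost(60 * 60)      # 3600 seconds
```

`Decimal` is used rather than `float` so that small per-second rates multiply out without binary rounding error.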

VeriSciQA

GeneralReasoning/VeriSciQA
