YearGuessr

Description

YearGuessr is an environment for evaluating architectural age estimation from building facade images. Agents predict the construction year of buildings (1001-2024 CE) based on visual reasoning about architectural styles, construction materials, and historical design patterns. The dataset covers 157 countries with 113,480 buildings.

Capabilities

Architectural age estimation from facade images
Visual reasoning about construction materials and design patterns
Regression task with year prediction (not classification)

Compute Requirements

Agents are given a standard environment with no sandbox or file system access.

License

CC-BY-SA 4.0.

Tasks

Three splits in this environment:

train: 33,337 tasks
test: 11,100 tasks
valid: 11,100 tasks

The three splits total 55,537 tasks. The dataset as a whole is described as covering 113,480 buildings across 157 countries, spanning 1001-2024 CE.

Reward Structure

Single-turn evaluation with deterministic grading. The agent submits a year prediction via the guess tool. The reward is interval accuracy at ±20 years (architectural period): 1.0 if the prediction is within 20 years of the true construction year, 0.0 otherwise. Additional metrics (MAE, IA@5, IA@50, IA@100) are reported in the metadata.
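The grading rule above can be sketched in a few lines of Python. This is an illustrative sketch, not the environment's actual grader: the function names, the metadata keys, and the choice to treat a difference of exactly 20 years as a hit are all assumptions. For a single prediction, the MAE contribution is simply the absolute error.

```python
# Illustrative sketch of interval-accuracy grading (names are hypothetical).
REWARD_TOLERANCE = 20                    # years, from the reward description
EXTRA_TOLERANCES = (5, 50, 100)          # IA@5, IA@50, IA@100


def interval_accuracy(predicted: int, actual: int, tolerance: int) -> float:
    """Return 1.0 if the prediction is within `tolerance` years, else 0.0.

    Assumption: the boundary is inclusive (an error of exactly
    `tolerance` years still counts as correct).
    """
    return 1.0 if abs(predicted - actual) <= tolerance else 0.0


def grade(predicted: int, actual: int) -> dict:
    """Compute the reward plus the additional per-sample metrics."""
    metadata = {"abs_error": abs(predicted - actual)}
    for tol in EXTRA_TOLERANCES:
        metadata[f"ia@{tol}"] = interval_accuracy(predicted, actual, tol)
    return {
        "reward": interval_accuracy(predicted, actual, REWARD_TOLERANCE),
        "metadata": metadata,
    }


if __name__ == "__main__":
    # A guess of 1890 for a building from 1905 is 15 years off.
    print(grade(1890, 1905))
```

Averaging `abs_error` over many episodes gives the MAE, and averaging each `ia@k` value gives the corresponding interval accuracy.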

Data

Parquet files (~8 GB total) sourced from the Hugging Face dataset Morris0401/Year-Guessr-Dataset and stored on the OpenReward platform.

Tools

Tool	Description
`guess`	Submit a year prediction (integer 1001-2024). Deterministic evaluation. Ends the episode.
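Because `guess` ends the episode, an agent harness may want to check a prediction before submitting it. The sketch below is a hypothetical client-side check, assuming only the documented constraint that the year is an integer between 1001 and 2024; how the environment itself handles invalid input is not specified here.

```python
# Hypothetical pre-submission check for the `guess` tool argument.
MIN_YEAR = 1001
MAX_YEAR = 2024


def validate_guess(year: object) -> int:
    """Return `year` unchanged if it is a valid guess, otherwise raise."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(year, bool) or not isinstance(year, int):
        raise TypeError(f"year must be an integer, got {type(year).__name__}")
    if not MIN_YEAR <= year <= MAX_YEAR:
        raise ValueError(f"year must be in {MIN_YEAR}-{MAX_YEAR}, got {year}")
    return year


if __name__ == "__main__":
    print(validate_guess(1887))
```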

Time Horizon

Single-turn. The agent views the building facade image and submits one year prediction.

Environment Difficulty

YearGuessr evaluates visual reasoning about architectural history across more than 1,000 years (1001-2024 CE) and 157 countries. The task requires recognizing architectural styles, materials, and regional design conventions from the image alone, without explicit labels.

Other Environment Requirements

There are no further environment requirements; YearGuessr works out of the box with the OpenReward endpoint without any external API keys.

Safety

Agents in YearGuessr predict building construction years in a standard environment. The environment does not present direct safety risks.

Citation

@article{szutu2025beyondmemorization,
  title={Beyond Memorization: A Multi-Modal Ordinal Regression Benchmark to Expose Popularity Bias in Vision-Language Models},
  author={Szu-Tu, Li-Zhong and Wu, Ting-Lin and Chang, Chia-Jui and Syu, He and Liu, Yu-Lun},
  journal={arXiv preprint arXiv:2512.21337},
  year={2025}
}

Repository

Source repository

EnvCommons/YearGuessr

Compute Configuration

Resource allocation for this environment.

Component	Configuration
Environment Server	1 vCPU / 4 GB RAM
Sandbox Machine	Not configured

Estimated Cost

Pay per second of active session usage. Billing starts when your session begins and stops when it ends.

Component	Cost / second
Environment	$0.0000320
Sandbox	Not configured
Total	$0.0000320

Examples

Session length	Cost
5-minute session	$0.0096
1-hour session	$0.1152
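The example costs follow from multiplying the session length in seconds by the per-second rate. A minimal sketch of that arithmetic, with a hypothetical helper name and the rate copied from the cost table:

```python
# Per-second billing arithmetic; the helper name is illustrative only.
ENVIRONMENT_COST_PER_SECOND = 0.0000320  # USD, from the cost table


def session_cost(seconds: float, rate: float = ENVIRONMENT_COST_PER_SECOND) -> float:
    """Return the cost in USD of a session lasting `seconds` seconds."""
    if seconds < 0:
        raise ValueError("session length cannot be negative")
    return seconds * rate


if __name__ == "__main__":
    print(f"5-minute session: ${session_cost(5 * 60):.4f}")   # $0.0096
    print(f"1-hour session:   ${session_cost(60 * 60):.4f}")  # $0.1152
```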
