YearGuessr
Description
YearGuessr is an environment for evaluating architectural age estimation from building facade images. Agents predict the construction year of buildings (1001-2024 CE) based on visual reasoning about architectural styles, construction materials, and historical design patterns. The dataset covers 157 countries with 113,480 buildings.
Capabilities
- Architectural age estimation from facade images
- Visual reasoning about construction materials and design patterns
- Regression task with year prediction (not classification)
Compute Requirements
Agents are given a standard environment with no sandbox or file system access.
License
Tasks
The environment provides three splits:
- train: 33,337 tasks
- test: 11,100 tasks
- valid: 11,100 tasks
Total: 113,480 buildings across 157 countries, spanning 1001-2024 CE.
Reward Structure
Evaluation is single-turn with deterministic grading. The agent submits a year prediction via the guess tool. The reward is interval accuracy at ±20 years (roughly one architectural period): 1.0 if the prediction is within 20 years of the true construction year, 0.0 otherwise. Additional metrics (MAE, IA@5, IA@50, IA@100) are reported in metadata.
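The grading rule described above can be illustrated with a minimal sketch. This is not the environment's actual implementation: the function names `interval_accuracy` and `score_guess` and the metadata keys are hypothetical, and the sketch assumes the ±20 boundary is inclusive (an error of exactly 20 years still scores 1.0), which the description does not state explicitly.

```python
def interval_accuracy(predicted_year: int, true_year: int, tolerance: int) -> float:
    """Return 1.0 if the guess is within `tolerance` years of the truth, else 0.0.

    Assumption: the boundary is inclusive (error == tolerance counts as a hit).
    """
    return 1.0 if abs(predicted_year - true_year) <= tolerance else 0.0


def score_guess(predicted_year: int, true_year: int) -> dict:
    """Compute the reward (interval accuracy at ±20) plus the extra metrics.

    The key names below are hypothetical; only the metric definitions
    follow the text above.
    """
    return {
        "reward": interval_accuracy(predicted_year, true_year, 20),
        # Per-task absolute error; averaging it over tasks gives MAE.
        "absolute_error": float(abs(predicted_year - true_year)),
        "ia@5": interval_accuracy(predicted_year, true_year, 5),
        "ia@50": interval_accuracy(predicted_year, true_year, 50),
        "ia@100": interval_accuracy(predicted_year, true_year, 100),
    }


if __name__ == "__main__":
    # A guess of 1890 for a building completed in 1905 is off by 15 years:
    # inside the ±20 window, outside the ±5 window.
    print(score_guess(1890, 1905))
```

Because the reward is binary, a guess that is 21 years off scores the same as one that is 500 years off; the absolute error and the wider IA windows in metadata are what distinguish near misses from large errors.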
Data
Parquet files (~8 GB total) sourced from the Hugging Face dataset Morris0401/Year-Guessr-Dataset and stored on the OpenReward platform.
Tools
| Tool | Description |
|---|---|
| guess | Submit a year prediction (integer 1001-2024). Deterministic evaluation. Ends the episode. |
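
As a rough illustration of the input constraint on the guess tool, the sketch below checks that a prediction is an integer in the range 1001-2024 before it is submitted. The helper name `validate_guess` and its error behavior are assumptions for illustration; the source does not say how the environment handles out-of-range or non-integer values.

```python
MIN_YEAR = 1001  # earliest construction year covered by the dataset
MAX_YEAR = 2024  # latest construction year covered by the dataset


def validate_guess(year: object) -> int:
    """Return `year` unchanged if it is a valid guess, otherwise raise.

    Hypothetical client-side check; the environment's own handling of
    invalid input is not documented here.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(year, bool) or not isinstance(year, int):
        raise TypeError(f"year must be an integer, got {type(year).__name__}")
    if not MIN_YEAR <= year <= MAX_YEAR:
        raise ValueError(f"year must be between {MIN_YEAR} and {MAX_YEAR}, got {year}")
    return year


if __name__ == "__main__":
    print(validate_guess(1889))
```

Since the guess tool ends the episode, checking the value before submission avoids spending the single available turn on a malformed prediction.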
Time Horizon
Single-turn. The agent views the building facade image and submits one year prediction.
Environment Difficulty
YearGuessr evaluates visual reasoning about architectural history across more than 1,000 years and 157 countries. The task requires recognizing architectural styles, materials, and regional design conventions from the image alone, without explicit labels.
Other Environment Requirements
There are no further environment requirements; YearGuessr works out of the box with the OpenReward endpoint without any external API keys.
Safety
Agents in YearGuessr predict building construction years in a standard environment. The environment does not present direct safety risks.
Citation
@article{szutu2025beyondmemorization,
title={Beyond Memorization: A Multi-Modal Ordinal Regression Benchmark to Expose Popularity Bias in Vision-Language Models},
author={Szu-Tu, Li-Zhong and Wu, Ting-Lin and Chang, Chia-Jui and Syu, He and Liu, Yu-Lun},
journal={arXiv preprint arXiv:2512.21337},
year={2025}
}