YearGuessr

Description

YearGuessr is an environment for evaluating architectural age estimation from building facade images. Agents predict the construction year of buildings (1001-2024 CE) based on visual reasoning about architectural styles, construction materials, and historical design patterns. The dataset covers 157 countries with 113,480 buildings.

Capabilities

  • Architectural age estimation from facade images
  • Visual reasoning about construction materials and design patterns
  • Regression task with year prediction (not classification)

Compute Requirements

Agents are given a standard environment with no sandbox or file system access.

License

CC-BY-SA 4.0.

Tasks

Three splits in this environment:

  • train: 33,337 tasks
  • test: 11,100 tasks
  • valid: 11,100 tasks

Dataset total: 113,480 buildings across 157 countries, spanning 1001-2024 CE. The three splits listed above account for 55,537 tasks.

Reward Structure

Evaluation is single-turn with deterministic grading. The agent submits a year prediction via the guess tool. The reward is interval accuracy at ±20 years (architectural period): 1.0 if the prediction is within 20 years of the true construction year, 0.0 otherwise. Additional metrics (MAE, IA@5, IA@50, IA@100) are included in the metadata.
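
The grading rule can be sketched in a few lines of Python. This is an illustrative sketch, not the environment's actual grader: the function names, the metadata key names, and the inclusive bound (an error of exactly 20 years counting as a hit) are all assumptions. MAE is taken here to be the mean of per-episode absolute errors.

```python
from statistics import mean

# Tolerances (in years) named in this README: +/-20 drives the reward,
# the others are reported as metadata.
REWARD_TOLERANCE = 20
METADATA_TOLERANCES = (5, 50, 100)


def interval_hit(predicted: int, actual: int, tolerance: int) -> float:
    """Return 1.0 when the guess is within `tolerance` years, else 0.0.

    Assumption: the bound is inclusive.
    """
    return 1.0 if abs(predicted - actual) <= tolerance else 0.0


def grade(predicted: int, actual: int) -> dict:
    """Grade one episode. The key names are hypothetical."""
    metadata = {"abs_error": abs(predicted - actual)}
    for tol in METADATA_TOLERANCES:
        metadata[f"ia@{tol}"] = interval_hit(predicted, actual, tol)
    return {
        "reward": interval_hit(predicted, actual, REWARD_TOLERANCE),
        "metadata": metadata,
    }


def summarize(results: list[dict]) -> dict:
    """Aggregate per-episode grades into mean reward and MAE."""
    return {
        "mean_reward": mean(r["reward"] for r in results),
        "mae": mean(r["metadata"]["abs_error"] for r in results),
    }


# Two example episodes: one guess 12 years off, one 171 years off.
results = [grade(1890, 1902), grade(1750, 1921)]
print(results[0]["reward"], results[1]["reward"])
print(summarize(results))
```

Because the reward is binary, a guess that misses by 21 years scores the same as one that misses by 500. The MAE and wider IA thresholds in the metadata are what separate near misses from large errors.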

Data

Parquet files (~8 GB total) sourced from the Hugging Face dataset Morris0401/Year-Guessr-Dataset and stored on the OpenReward platform.

Tools

Tool    Description
guess   Submit a year prediction (integer 1001-2024). Deterministic evaluation. Ends the episode.
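
Since the guess tool ends the episode, an agent harness may want to check a prediction before submitting it. The sketch below is a hypothetical client-side check, not part of the environment. The README does not say how out-of-range or non-integer values are handled server-side, so the helper name and the choice to raise exceptions are assumptions.

```python
YEAR_MIN, YEAR_MAX = 1001, 2024  # range stated for the guess tool


def validate_guess(year: object) -> int:
    """Hypothetical pre-submission check for a guess tool argument."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(year, bool) or not isinstance(year, int):
        raise TypeError(f"year must be an integer, got {type(year).__name__}")
    if not YEAR_MIN <= year <= YEAR_MAX:
        raise ValueError(f"year must be in [{YEAR_MIN}, {YEAR_MAX}], got {year}")
    return year


print(validate_guess(1887))
```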

Time Horizon

Single-turn. The agent views the building facade image and submits one year prediction.

Environment Difficulty

YearGuessr evaluates visual reasoning about architectural history across 1000+ years and 157 countries. The task requires understanding architectural styles, materials, and regional design conventions without explicit labels.

Other Environment Requirements

There are no further environment requirements; YearGuessr works out of the box with the OpenReward endpoint without any external API keys.

Safety

Agents in YearGuessr predict building construction years in a standard environment. The environment does not present direct safety risks.

Citation

@article{szutu2025beyondmemorization,
  title={Beyond Memorization: A Multi-Modal Ordinal Regression Benchmark to Expose Popularity Bias in Vision-Language Models},
  author={Szu-Tu, Li-Zhong and Wu, Ting-Lin and Chang, Chia-Jui and Syu, He and Liu, Yu-Lun},
  journal={arXiv preprint arXiv:2512.21337},
  year={2025}
}