vln-mme

Name: arXiv/vln-mme
Author: arXiv

Description

VLN-MME is a standardized benchmark for probing multimodal large language models (MLLMs) as zero-shot embodied agents in Vision-and-Language Navigation (VLN). It integrates traditional navigation datasets into a unified, extensible, and modular evaluation framework, which enables structured comparisons and component-level ablations across MLLM architectures and agent designs. The benchmark reveals that augmentations such as Chain-of-Thought and self-reflection can degrade performance, exposing limited 3D spatial reasoning and sequential decision-making in current MLLMs.
