vln-ego

Description

VLN-Ego is a vision-language navigation benchmark built in the Habitat 3D simulator. It provides egocentric video streams and expert action demonstrations for training and evaluating end-to-end continuous navigation agents based on large vision-language models (LVLMs). The data consists of simulated egocentric trajectories paired with natural language instructions and action-level expert demonstrations. It also uses Long-Short Memory Sampling, a strategy that balances historical and current observations, to support supervised and reinforcement fine-tuning.
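The description does not spell out how Long-Short Memory Sampling selects frames, so the following is only a minimal sketch of one plausible reading: keep the most recent frames densely (short-term memory) and subsample the older history at an even stride (long-term memory). The function name `long_short_sample` and the parameters `short_window` and `long_budget` are hypothetical and are not taken from the benchmark.

```python
from typing import List


def long_short_sample(num_frames: int, short_window: int = 4, long_budget: int = 4) -> List[int]:
    """Pick frame indices from a trajectory of `num_frames` observations.

    Assumed scheme (not confirmed by the source):
      * short-term memory: the last `short_window` frames, kept densely;
      * long-term memory: at most `long_budget` frames, evenly spaced
        over everything older than the short window.
    Returns the indices in ascending order.
    """
    if num_frames <= 0:
        return []

    # Dense, most recent observations.
    short_start = max(0, num_frames - short_window)
    short = list(range(short_start, num_frames))

    # Everything before the short window is "history".
    history_len = short_start
    if history_len == 0 or long_budget <= 0:
        return short

    if history_len <= long_budget:
        # History is small enough to keep in full.
        long = list(range(history_len))
    else:
        # Evenly spaced picks over [0, history_len).
        stride = history_len / long_budget
        long = [int(i * stride) for i in range(long_budget)]

    return long + short


if __name__ == "__main__":
    print(long_short_sample(20, short_window=4, long_budget=4))
```

Under these assumptions the number of frames passed to the model stays bounded by `short_window + long_budget` however long the episode runs, while the current observations are never dropped.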

