Abstract: Goal-Oriented Vision-and-Language Navigation (VLN) aims to enable agents to navigate to specified locations and identify designated target objects by following natural language instructions.