Room-to-Room (R2R) Navigation


R2R is the first benchmark dataset for visually-grounded natural language navigation in real buildings. It requires autonomous agents to follow human-generated navigation instructions in previously unseen buildings. For training, each instruction is associated with a Matterport3D Simulator trajectory. The dataset contains 22k instructions with an average length of 29 words. We are currently setting up a test evaluation server for this dataset.
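To make the dataset structure concrete, here is a minimal sketch of working with an R2R-style annotation. The field names (`scan`, `path_id`, `heading`, `path`, `instructions`) follow the JSON files in the GitHub repository, but treat the exact schema as an assumption of this sketch; the IDs below are placeholders, not real Matterport3D identifiers.

```python
# Illustrative R2R-style entry: one trajectory with its natural language
# instructions. Field names are assumed from the repository's JSON files.
sample = {
    "scan": "SCAN_ID",        # Matterport3D building ID (placeholder)
    "path_id": 0,
    "heading": 4.2,           # initial agent heading in radians
    "path": ["vp_start", "vp_mid", "vp_goal"],  # viewpoint IDs along the route
    "instructions": [
        "Walk past the couch and stop at the top of the stairs.",
    ],
}

def average_instruction_length(entries):
    """Mean instruction length in words, across all instructions of all entries."""
    lengths = [len(instr.split()) for e in entries for instr in e["instructions"]]
    return sum(lengths) / len(lengths)

print(average_instruction_length([sample]))
```

Each trajectory in the released data carries several independently written instructions, so statistics like the 29-word average are computed over instructions, not trajectories.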

Matterport3D Simulator

The Matterport3D Simulator enables the development of AI agents that interact with real 3D environments using visual information (RGB-D images). It is primarily intended for research in deep reinforcement learning, at the intersection of computer vision, natural language processing and robotics. Visual imagery for the simulator comes from the Matterport3D dataset, containing comprehensive panoramic imagery and other data from 90 large-scale buildings.
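The interaction loop with the simulator can be sketched as follows. This is a hedged sketch based on the MatterSim Python bindings in the GitHub repository (which require the compiled C++ simulator and the Matterport3D dataset); method names and argument conventions may differ by version, and the scan/viewpoint IDs are placeholders.

```python
import math

def take_actions(sim, actions):
    """Step a simulator through (nav_index, heading_delta, elevation_delta)
    tuples, returning the viewpoint ID observed after each step. Works with
    any object exposing makeAction() and getState() in the assumed style."""
    visited = []
    for index, heading, elevation in actions:
        sim.makeAction(index, heading, elevation)
        visited.append(sim.getState().location.viewpointId)
    return visited

def main():
    # Requires the compiled MatterSim bindings and the Matterport3D dataset.
    import MatterSim
    sim = MatterSim.Simulator()
    sim.setCameraResolution(640, 480)
    sim.setCameraVFOV(math.radians(60))
    sim.init()
    # Placeholder IDs -- substitute a real scan and viewpoint from the dataset.
    sim.newEpisode("SCAN_ID", "VIEWPOINT_ID", 0.0, 0.0)
    # Turn right 30 degrees in place, then move to the first navigable location.
    print(take_actions(sim, [(0, math.radians(30), 0.0), (1, 0.0, 0.0)]))
```

Keeping the stepping logic in a helper like `take_actions` makes agent code easy to unit-test against a stub simulator before running in the real environment.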


The Matterport3D Simulator and the Room-to-Room dataset are available on GitHub.


If you use the simulator or dataset, please cite our paper:


@article{anderson2017vision,
  title={Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments},
  author={Peter Anderson and Qi Wu and Damien Teney and Jake Bruce and Mark Johnson and Niko S{\"u}nderhauf and Ian Reid and Stephen Gould and Anton van den Hengel},
  journal={arXiv preprint arXiv:1711.07280},
  year={2017}
}

This work was presented at the NIPS 2017 Visually-Grounded Interaction and Language (ViGIL) workshop.


Peter Anderson
Australian National University
Qi Wu
University of Adelaide
Damien Teney
University of Adelaide
Jake Bruce
Queensland University of Technology
Mark Johnson
Macquarie University
Niko Sünderhauf
Queensland University of Technology
Ian Reid
University of Adelaide
Stephen Gould
Australian National University
Anton van den Hengel
University of Adelaide

Future Work

To drive research in new and more challenging directions, we plan to release several related datasets at the intersection of computer vision, natural language processing and robotics.


We would like to thank Matterport for allowing the Matterport3D dataset to be used by the academic community. This project is supported by a Facebook ParlAI Research Award, and by the Australian Centre for Robotic Vision.