Bring Me A Spoon

NEW! Our Vision-and-Language Navigation test server and leaderboard are up on EvalAI.

Demo

Room-to-Room (R2R) is the first benchmark dataset for visually-grounded natural language navigation in real buildings. The dataset requires autonomous agents to follow human-generated navigation instructions in previously unseen buildings, as illustrated in the demo above. For training, each instruction is associated with a Matterport3D Simulator trajectory. The dataset contains 22k instructions with an average length of 29 words. A test evaluation server for this dataset is available on EvalAI.
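As a rough illustration of the pairing described above (one instruction, one trajectory), the sketch below defines a record type and computes the average instruction length in words. The record type, its field names (`instruction`, `trajectory`), the viewpoint identifiers and the sample sentences are all hypothetical; this is not the actual R2R file format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NavigationExample:
    """Hypothetical record: one instruction paired with one trajectory."""
    instruction: str       # human-written navigation instruction
    trajectory: List[str]  # ordered viewpoint identifiers (made-up values)


def average_instruction_length(examples: List[NavigationExample]) -> float:
    """Return the mean number of whitespace-separated words per instruction."""
    if not examples:
        return 0.0
    total_words = sum(len(ex.instruction.split()) for ex in examples)
    return total_words / len(examples)


# Made-up examples for illustration only; not taken from the dataset.
examples = [
    NavigationExample("Walk past the table and stop at the door.", ["v1", "v2", "v3"]),
    NavigationExample("Turn left and go up the stairs.", ["v3", "v4"]),
]
print(average_instruction_length(examples))
```

Splitting on whitespace is only one way to count words; a different tokenizer would give slightly different averages.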

Matterport3D Simulator

The Matterport3D Simulator enables the development of AI agents that interact with real 3D environments using visual information (RGB-D images). It is primarily intended for research in deep reinforcement learning, at the intersection of computer vision, natural language processing and robotics. Visual imagery for the simulator comes from the Matterport3D dataset, containing comprehensive panoramic imagery and other data from 90 large-scale buildings. The Matterport3D Simulator is available on GitHub.

CVPR Paper

This work has been selected for a spotlight oral presentation at CVPR 2018. Initial results were also presented at the NIPS 2017 ViGIL workshop. If you use the Matterport3D Simulator or the R2R dataset, please cite our paper:

@inproceedings{mattersim,
  title={Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments},
  author={Peter Anderson and Qi Wu and Damien Teney and Jake Bruce and Mark Johnson and Niko S{\"u}nderhauf and Ian Reid and Stephen Gould and Anton van den Hengel},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2018}
}

We also ask that you acknowledge the underlying Matterport3D dataset by citing its paper:

@article{Matterport3D,
  title={Matterport3D: Learning from {RGB-D} Data in Indoor Environments},
  author={Chang, Angel and Dai, Angela and Funkhouser, Thomas and Halber, Maciej and Niessner, Matthias and Savva, Manolis and Song, Shuran and Zeng, Andy and Zhang, Yinda},
  journal={International Conference on 3D Vision (3DV)},
  year={2017}
}

People


Peter Anderson, Australian National University
Qi Wu, University of Adelaide
Damien Teney, University of Adelaide
Jake Bruce, Queensland University of Technology
Mark Johnson, Macquarie University
Niko Sünderhauf, Queensland University of Technology
Ian Reid, University of Adelaide
Stephen Gould, Australian National University
Anton van den Hengel, University of Adelaide

Future Work

To drive research in new and more challenging directions, we plan to release several related datasets at the intersection of computer vision, natural language processing and robotics.

Acknowledgements

We would like to thank Matterport for allowing the Matterport3D dataset to be used by the academic community. This project is supported by a Facebook ParlAI Research Award, and by the Australian Centre for Robotic Vision.

Natural language interaction with robots

I can ask a 5-year-old to bring me a spoon, and it's likely that a spoon will appear. We want a robot to do the same. This is what we've done so far.

Room-to-Room (R2R) Navigation
