This repository is the official implementation of MemoNav.
- Upload MemoNav training code
- Upload evaluation code
The source code was developed and tested in the following environment:
- Python 3.7
- pytorch 1.11.0+cu102
- habitat-sim 0.2.1
- habitat 0.2.1
Please refer to habitat-sim and habitat-lab for installation instructions.
To install requirements:
pip install -r requirements.txt
The scene datasets and task datasets used for training should be organized in the habitat-lab directory as follows:
habitat-api (or habitat-lab)
  └── data
      └── datasets
      │   └── pointnav
      │       └── gibson
      │           └── v1
      │               └── train
      │               └── val
      └── scene_datasets
          └── gibson_habitat
              └── *.glb, *.navmesh
The single and multi-goal train/val/test datasets should be organized as follows:
This repo
  └── image-goal-nav-dataset
      │
      └── gibson
      │   └── train
      │   └── multi_goal_val
      │       └── 1goal
      │       │   └── *.json.gz
      │       └── 2goal
      │       │   └── *.json.gz
      │       └── 3goal
      │       │   └── *.json.gz
      │       └── 4goal
      │           └── *.json.gz
      │
      └── mp3d/test
          └── 1goal
          └── 2goal
          └── 3goal
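Before launching training or evaluation, it can help to confirm that the dataset folders are where the code expects them. The following is a minimal sketch, not part of the repository: the helper name `missing_dataset_dirs` is hypothetical, and the directory list assumes the nesting shown in the tree above (`gibson/train`, `gibson/multi_goal_val/{1..4}goal`, and `mp3d/test/{1..3}goal`).

```python
from pathlib import Path

# Directories taken from the tree above, relative to the repository root.
# The exact nesting is an assumption based on that tree.
EXPECTED_DIRS = [
    "image-goal-nav-dataset/gibson/train",
    "image-goal-nav-dataset/gibson/multi_goal_val/1goal",
    "image-goal-nav-dataset/gibson/multi_goal_val/2goal",
    "image-goal-nav-dataset/gibson/multi_goal_val/3goal",
    "image-goal-nav-dataset/gibson/multi_goal_val/4goal",
    "image-goal-nav-dataset/mp3d/test/1goal",
    "image-goal-nav-dataset/mp3d/test/2goal",
    "image-goal-nav-dataset/mp3d/test/3goal",
]


def missing_dataset_dirs(repo_root="."):
    """Return the expected dataset directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs(".")
    if missing:
        print("Missing dataset directories:")
        for d in missing:
            print("  " + d)
    else:
        print("All expected dataset directories are present.")
```

Run it from the repository root; an empty result means every folder in the list exists. It does not check the contents of the `*.json.gz` files.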
We employ three types of navigation memory. The node features on a map are stored in the short-term memory (STM), as these features are dynamically updated. A forgetting module then retains the informative STM fraction to increase efficiency. We also introduce long-term memory (LTM) to learn global scene representations by progressively aggregating STM features. Subsequently, a graph attention module encodes the retained STM and the LTM to generate working memory (WM) which contains the scene features essential for efficient navigation.
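The data flow described above (STM, forgetting, LTM aggregation, then working memory) can be illustrated with a toy sketch in plain Python. This is not the model's implementation: the real modules are learned networks, whereas here the forgetting score is assumed to be a dot-product attention weight against a goal vector, LTM aggregation is replaced by a running mean, and the graph attention module is replaced by a single attention-weighted sum. All function names and parameters (`forget`, `update_ltm`, `working_memory`, `keep_fraction`, `rate`) are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a non-empty list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_weights(query, nodes):
    """Scaled dot-product attention weights of `query` over `nodes`."""
    scale = math.sqrt(len(query))
    return softmax([dot(query, n) / scale for n in nodes])


def forget(stm, goal, keep_fraction=0.8):
    """Keep the fraction of STM nodes with the highest goal-attention weights.

    Assumption: 'informative' is approximated by attention to the goal.
    """
    weights = attention_weights(goal, stm)
    n_keep = max(1, math.ceil(keep_fraction * len(stm)))
    ranked = sorted(range(len(stm)), key=lambda i: weights[i], reverse=True)
    kept = sorted(ranked[:n_keep])  # restore the original node order
    return [stm[i] for i in kept]


def update_ltm(ltm, stm, rate=0.5):
    """Move the LTM vector towards the mean STM feature (stand-in for learned aggregation)."""
    mean = [sum(n[d] for n in stm) / len(stm) for d in range(len(ltm))]
    return [(1 - rate) * l + rate * m for l, m in zip(ltm, mean)]


def working_memory(retained_stm, ltm, goal):
    """Attention-weighted sum over the retained STM nodes plus the LTM node."""
    nodes = retained_stm + [ltm]
    weights = attention_weights(goal, nodes)
    return [sum(w * n[d] for w, n in zip(weights, nodes)) for d in range(len(goal))]


if __name__ == "__main__":
    # Made-up 2-D node features, for illustration only.
    stm = [[1.0, 0.0], [0.5, 0.0], [-1.0, 0.0], [0.0, -1.0]]
    goal = [1.0, 0.0]
    ltm = [0.0, 0.0]

    retained = forget(stm, goal, keep_fraction=0.5)
    ltm = update_ltm(ltm, stm)
    wm = working_memory(retained, ltm, goal)
    print("retained STM:", retained)
    print("working memory:", wm)
```

The point of the sketch is the ordering of the steps: forgetting prunes the STM before attention is computed, the LTM is updated from STM features at each step, and the working memory is produced only from the retained STM nodes and the LTM.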
Our model achieves the following performance on the single-goal and multi-goal test datasets.
Following the experimental settings in VGM, our MemoNav model was tested on 1007 samples of this dataset. We report the performance of our model and the baselines in the table below. (Note: we re-evaluated the VGM pretrained model and report the new results.)
Model name | SR | SPL |
---|---|---|
VGM | 70.0 | 55.4 |
MemoNav (ours) | 74.7 | 57.9 |
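For readers unfamiliar with the metrics, the sketch below computes SR and SPL under their standard definitions, which this README is assumed to follow: SR is the percentage of successful episodes, and SPL averages, over all episodes, the success indicator multiplied by the ratio of the shortest path length to the longer of the shortest and the actual path length. The function names, the dictionary keys, and the episode values are made up for illustration; this is not the repository's evaluation code.

```python
def success_rate(episodes):
    """Percentage of episodes in which the agent reached the goal."""
    return 100.0 * sum(1 for e in episodes if e["success"]) / len(episodes)


def spl(episodes):
    """Success weighted by Path Length, as a percentage.

    Each episode contributes shortest / max(taken, shortest) if it succeeded
    and 0 otherwise.
    """
    total = 0.0
    for e in episodes:
        if e["success"]:
            total += e["shortest"] / max(e["taken"], e["shortest"])
    return 100.0 * total / len(episodes)


if __name__ == "__main__":
    # Hypothetical episodes: path lengths in metres.
    episodes = [
        {"success": True, "shortest": 5.0, "taken": 10.0},
        {"success": False, "shortest": 4.0, "taken": 8.0},
    ]
    print("SR:", success_rate(episodes))
    print("SPL:", spl(episodes))
```

Because a failed episode contributes zero and a successful one contributes at most one, SPL can never exceed SR, which matches the pattern in the table.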
We collected multi-goal test datasets in the Gibson scenes by randomly sampling trajectories according to the rules specified in our paper.
Model name | 2goal PR | 2goal PPL | 3goal PR | 3goal PPL | 4goal PR | 4goal PPL |
---|---|---|---|---|---|---|
VGM | 42.9 | 17.1 | 29.5 | 7.0 | 21.5 | 4.1 |
MemoNav (ours) | 50.8 | 20.1 | 38.0 | 9.0 | 28.9 | 5.1 |
We collected multi-goal test datasets in the MP3D scenes by converting the Multi-ON dataset.
Model name | 1goal SR | 1goal SPL | 2goal PR | 2goal PPL | 3goal PR | 3goal PPL |
---|---|---|---|---|---|---|
VGM | 25.1 | 16.6 | 16.7 | 5.0 | 11.8 | 2.5 |
MemoNav (ours) | 26.1 | 16.3 | 19.5 | 5.6 | 13.6 | 2.9 |
Demo videos:
- 0096_Denmark_success.1.0_spl.0.2_step.218.0.mp4
- waypoint_map_0096_Denmark_success.1.0_spl.0.2.mp4
- 0579_Scioto_success.1.0_spl.0.5_step.137.0.mp4
- waypoint_map_0579_Scioto_success.1.0_spl.0.5.mp4
If you use the MemoNav agent or dataset, please cite our paper:
@inproceedings{li2024memonav,
title={MemoNav: Working Memory Model for Visual Navigation},
author={Li, Hongxin and Wang, Zeyu and Yang, Xu and Yang, Yuran and Mei, Shuqi and Zhang, Zhaoxiang},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2024}
}