
Commit d3df00f

ADD README
1 parent acee425 commit d3df00f

File tree

3 files changed: 31 additions & 4 deletions


README.md

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
# Distillation Improves Visual Place Recognition for Low-Quality Queries
[Anbang Yang](https://www.linkedin.com/in/anbang-yang/), [Yao Wang](https://engineering.nyu.edu/faculty/yao-wang), [John-Ross Rizzo](https://med.nyu.edu/faculty/johnross-rizzo), [Chen Feng](https://scholar.google.com/citations?user=YeG8ZM0AAAAJ)
<p align="center"><img src='static/images/Figure1.png' align="center" height="500px"> </p>
## Abstract
The shift to online computing for real-time visual localization often requires streaming query images or videos to a server for visual place recognition (VPR), where fast video transmission may result in reduced resolution or increased quantization. This compromises the quality of global image descriptors, leading to decreased VPR performance. To improve the low recall rate for low-quality query images, we present a simple yet effective method that uses high-quality queries only during training to distill better feature representations for deep-learning-based VPR, such as NetVLAD. Specifically, we use a mean squared error (MSE) loss between the global descriptors of queries of different qualities, and an inter-channel correlation knowledge distillation (ICKD) loss over their corresponding intermediate features. We validate our approach on both the Pittsburgh 250k dataset and our own indoor dataset with varying quantization levels. By fine-tuning NetVLAD parameters with our distillation-augmented losses, we achieve notable VPR recall-rate improvements over low-quality queries, as demonstrated in our extensive experiments. We believe this work not only pushes forward VPR research but also provides valuable insights for applications that need dependable place recognition under resource-limited conditions.
## Method
Our approach to Visual Place Recognition (VPR) leverages knowledge distillation to make a student network mimic a stronger teacher network. Built on the NetVLAD algorithm, we employ a dual-branch distillation model comprising a student branch and a teacher branch, both using the VGG-16 backbone. The student branch processes low-quality images, while the teacher processes their high-quality counterparts. To train the student, we consider three loss functions: an Inter-Channel Correlation Knowledge Distillation (ICKD) loss, a Mean Squared Error (MSE) loss, and a weakly supervised triplet ranking loss. Our experiments indicate that a composite loss excluding the triplet term yields the best results, so the final loss function is a weighted sum of the ICKD and MSE losses.
<p align="center"><img src='static/images/Method.png' align="center" height="500px"> </p>
## Quantitative Results
We validated our distillation model on the Pitts250k dataset, a cornerstone of VPR research, and on a custom dataset collected on the 6th floor of the Lighthouse Guild. The Pitts250k dataset was split into training, database, and validation sets and downsampled to several distinct resolutions. Our custom dataset emphasizes the effects of video bitrate and resolution and involved extensive video processing, including downsampling and quantization. After processing, frames were extracted, and ground-truth locations were obtained using a combination of OpenVSLAM and the Aligner GUI. The following figure shows that our method outperforms the NetVLAD baseline, as well as the other loss configurations, on low-quality image retrieval.
<p align="center"><img src='static/images/Figure4.JPEG' align="center" height="500px"> </p>
## Qualitative Results
To understand why the distillation model trained on Pitts250k with loss setting 4 (MSE + ICKD) surpasses the other loss configurations, we visualized the attention heatmaps of the feature encoder, as shown in the figure below. We compared the attention heatmap from loss setting 4 against those from models trained with two other loss settings: 1 (MSE) and 3 (Triplet). Under loss setting 4, the model concentrates predominantly on critical regions while ignoring areas prone to visual aliasing, which may explain the superior performance of this loss setting.
<p align="center"><img src='static/images/Figure6.png' align="center" height="500px"> </p>

index.html

Lines changed: 4 additions & 4 deletions
@@ -56,14 +56,14 @@ <h1 class="title is-1 publication-title">Distillation Improves Visual Place Reco
 <div class="is-size-5 publication-authors">
 <!-- Paper authors -->
 <span class="author-block">
-<a href="FIRST AUTHOR PERSONAL LINK" target="_blank">Anbang Yang</a>
+<a href="https://www.linkedin.com/in/anbang-yang/" target="_blank">Anbang Yang</a>
 <span class="author-block">
-<a href="SECOND AUTHOR PERSONAL LINK" target="_blank">Yao Wang</a>
+<a href="https://engineering.nyu.edu/faculty/yao-wang" target="_blank">Yao Wang</a>
 <span class="author-block">
-<a href="THIRD AUTHOR PERSONAL LINK" target="_blank">John-Ross Rizzo</a>
+<a href="https://med.nyu.edu/faculty/johnross-rizzo" target="_blank">John-Ross Rizzo</a>
 </span>
 <span class="author-block">
-<a href="THIRD AUTHOR PERSONAL LINK" target="_blank">Chen Feng</a>
+<a href="https://engineering.nyu.edu/faculty/chen-feng" target="_blank">Chen Feng</a>
 </span>
 </div>

static/images/Figure1.png

803 KB
