Visual Odometry using a Recurrent Convolutional Neural Network in PyTorch
This is a PyTorch implementation of the paper DeepVO: Towards End-to-End Visual Odometry with Deep Recurrent Convolutional Neural Networks (link).
The goal is to predict the current pose of the vehicle from a sequence of camera images, given its previous poses, using an end-to-end deep learning approach.
The model is a recurrent convolutional neural network: the CNN part extracts useful features from the images and the LSTM performs sequential modeling. Thus this method:
- bypasses the complete geometric pipeline (no camera calibration is required)
- takes just a sequence of RGB images as input
The architecture is a 9-layer CNN followed by a 2-layer LSTM; the CNN architecture is inspired by FlowNet (link).
CNN Details:
- There are 9 Conv layers with Leaky ReLU activation function
- Dropout and batch normalization employed to avoid overfitting
- Size of the kernels gradually reduced from 7x7 to 3x3
- Number of channels increases from 64 to 1024.
- Output of the last Conv layer is a 20x6x1024 tensor, which is flattened before being fed to the LSTM
LSTM Details:
- 2-layer LSTM, each layer with a hidden state size of 1000
- The LSTM is unrolled over 5 to 7 timesteps per training sample, matching the image sub-sequence length (see the sketch below)
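A minimal sketch of how these pieces might fit together in PyTorch. `DeepVOSketch` and `conv_block` are hypothetical names, and the individual strides, padding, and dropout rates are assumptions; this is not the repository's exact model definition:

```python
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch, kernel, stride, dropout=0.2):
    """Conv -> BatchNorm -> LeakyReLU -> Dropout, as listed in the CNN details above."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=kernel, stride=stride, padding=kernel // 2),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.1, inplace=True),
        nn.Dropout2d(dropout),
    )


class DeepVOSketch(nn.Module):
    """Illustrative 9-layer CNN + 2-layer LSTM predicting a 6-D pose per timestep."""

    def __init__(self, img_h=184, img_w=608, rnn_hidden_size=1000):
        super().__init__()
        # FlowNet-style encoder: kernel size shrinks 7x7 -> 5x5 -> 3x3,
        # channel count grows 64 -> 1024 (strides here are illustrative).
        self.encoder = nn.Sequential(
            conv_block(6, 64, 7, 2),      # input: a stacked pair of RGB frames (6 channels)
            conv_block(64, 128, 5, 2),
            conv_block(128, 256, 5, 2),
            conv_block(256, 256, 3, 1),
            conv_block(256, 512, 3, 2),
            conv_block(512, 512, 3, 1),
            conv_block(512, 512, 3, 2),
            conv_block(512, 512, 3, 1),
            conv_block(512, 1024, 3, 2),
        )
        # Infer the flattened CNN feature size from the input resolution.
        with torch.no_grad():
            feat_dim = self.encoder(torch.zeros(1, 6, img_h, img_w)).numel()
        self.rnn = nn.LSTM(input_size=feat_dim, hidden_size=rnn_hidden_size,
                           num_layers=2, batch_first=True)
        self.fc = nn.Linear(rnn_hidden_size, 6)   # 3 Euler angles + 3 translation components

    def forward(self, x):
        # x: (batch, seq_len, 6, H, W) -- a sub-sequence of stacked image pairs
        b, s, c, h, w = x.shape
        feats = self.encoder(x.reshape(b * s, c, h, w)).reshape(b, s, -1)
        out, _ = self.rnn(feats)                   # LSTM unrolled over the sub-sequence
        return self.fc(out)                        # one 6-D pose estimate per timestep
```

Because the flattened feature size is inferred at construction time, the sketch works for any input resolution, including the 608x184 images used in this project.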
The codebase is implemented in Python 3.7. To install the necessary requirements, run the following commands:
If you use Python's built-in virtual environments:
python3 -m venv <your_env_name>
source <your_env_name>/bin/activate
pip3 install -r requirements.txt
If you use conda:
conda create -n <your_env_name>
conda activate <your_env_name>
conda install --yes --file requirements.txt
If some packages cannot be installed in one go, install them one at a time instead:
while read requirement; do conda install --yes $requirement; done < requirements.txt
The network is trained and tested on the KITTI Vision Benchmark Suite (link), a very popular dataset for odometry and SLAM. Sequences 00, 01, 02, 05, and 09 were used for training, and sequences 04, 06, 07, and 10 were used for inference.
Download the pretrained model from: https://drive.google.com/open?id=1FfBokYsSSfMGV-FeTskNHAYaKXAZWAx_
The following pre-processing was performed on the data (a code sketch follows the list):
- The input to the network is a sequence of monocular images taken from an on-board camera, from the KITTI Vision Benchmark Suite
- The color images were downsampled to 608x184 to suit the available computational resources.
- Each image is then normalized by subtracting the mean and dividing by the standard deviation.
- The image sequence was sub-sequenced into smaller sequences of 5-7 images.
- For each subsequence, two consecutive images were stacked to form a tensor of dimension (608,184,6)
- Each such tensor is an input to Conv1 layer of the CNN.
- The number of timesteps for the RNN was randomly chosen between 5 and 7, since larger values exhausted the GPU memory.
- The dataset provides a 12-dimensional ground-truth pose vector for each image.
- Internally, it consists of a rotation matrix (9 values) and a translation vector (3 values).
- The rotation matrix is converted to Euler angles using Rodrigues' formula and the translation vector is used as is; the resulting 6-dimensional pose is saved in a numpy file.
- The output of the network is a 6-dimensional vector: 3 Euler angles for orientation and 3 values (x, y, z) for translation.
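A minimal sketch of these pre-processing steps, assuming the hypothetical helpers `stack_image_pair` and `pose12_to_pose6` (the repository's own data loader may differ, and the Euler-angle convention shown here is one common choice rather than necessarily the one used in this code base):

```python
import numpy as np
import torch
from PIL import Image
from torchvision import transforms

# Normalisation statistics taken from the parameter list in this README.
IMG_MEANS = (0.19007764876619865, 0.15170388157131237, 0.10659445665650864)
IMG_STDS = (0.2610784009469139, 0.25729316928935814, 0.25163823815039915)

preprocess = transforms.Compose([
    transforms.Resize((184, 608)),        # downsample to (height, width) = (184, 608)
    transforms.ToTensor(),                # HWC uint8 -> CHW float in [0, 1]
    transforms.Normalize(IMG_MEANS, IMG_STDS),
])


def stack_image_pair(path_t, path_t1):
    """Stack two consecutive frames into one (6, 184, 608) tensor, the input to Conv1."""
    img_t = preprocess(Image.open(path_t).convert("RGB"))
    img_t1 = preprocess(Image.open(path_t1).convert("RGB"))
    return torch.cat([img_t, img_t1], dim=0)


def pose12_to_pose6(pose12):
    """Convert a KITTI 12-value ground-truth row (a flattened 3x4 [R | t] matrix)
    into a 6-D pose: 3 Euler angles followed by the translation vector."""
    mat = np.asarray(pose12, dtype=np.float64).reshape(3, 4)
    R, t = mat[:, :3], mat[:, 3]
    # Standard rotation-matrix -> Euler-angle decomposition (one common convention).
    sy = np.sqrt(R[0, 0] ** 2 + R[1, 0] ** 2)
    if sy > 1e-6:
        roll = np.arctan2(R[2, 1], R[2, 2])
        pitch = np.arctan2(-R[2, 0], sy)
        yaw = np.arctan2(R[1, 0], R[0, 0])
    else:  # gimbal-lock fallback
        roll = np.arctan2(-R[1, 2], R[1, 1])
        pitch = np.arctan2(-R[2, 0], sy)
        yaw = 0.0
    return np.concatenate(([roll, pitch, yaw], t))
```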
Some of the important parameters used for the network are as follows; a sketch of how they might plug into training follows the list:
--img_means (0.19007764876619865, 0.15170388157131237, 0.10659445665650864)
--img_stds (0.2610784009469139, 0.25729316928935814, 0.25163823815039915)
--img_size (608,184)
--rnn_hidden_size 1000
--epochs 250
--optim Adagrad
--learning_rate 0.0005
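As an illustration of how these hyperparameters could be wired into a training loop. This is a sketch, not the repository's actual main.py: the dummy `train_loader` stands in for the real KITTI sub-sequence loader, `DeepVOSketch` is the model sketch from earlier in this README, and the plain MSE loss is an assumption (the DeepVO paper weights the orientation error more heavily than the translation error):

```python
import torch
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

# Hyperparameters taken from the list above.
EPOCHS = 250
LEARNING_RATE = 0.0005

# Dummy data standing in for real KITTI sub-sequences of 5 stacked image pairs.
dummy_images = torch.randn(4, 5, 6, 184, 608)
dummy_poses = torch.randn(4, 5, 6)
train_loader = DataLoader(TensorDataset(dummy_images, dummy_poses), batch_size=2)

model = DeepVOSketch(rnn_hidden_size=1000)
optimizer = optim.Adagrad(model.parameters(), lr=LEARNING_RATE)
criterion = torch.nn.MSELoss()                 # simple pose-regression loss (assumption)

for epoch in range(EPOCHS):
    for images, poses in train_loader:
        optimizer.zero_grad()
        predictions = model(images)            # (batch, seq_len, 6) predicted poses
        loss = criterion(predictions, poses)
        loss.backward()
        optimizer.step()
```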
Training the model
python main.py
Testing using the trained model
python test.py
Akshay Iyer – @akshay_iyerr – [email protected]