To achieve results more closely aligned with the paper #16

Open
uiseoklee opened this issue Apr 26, 2024 · 2 comments

uiseoklee commented Apr 26, 2024

Hi. A few days ago, I encountered an error while attempting to run the pretrained_ViT model, which I managed to resolve through another issue. The reason I attempted to run the pretrained_ViT model in the first place was that the results of the non-pretrained model were inconsistent with the results reported in the paper for this GitHub repository. Therefore, after resolving the pretrained issue, I trained the model with pretrained_ViT set to True and obtained the following results for sequences 01, 03, 04, 05, 06, 07, and 10:

[Predicted trajectory plots: pred_traj_01, pred_traj_03, pred_traj_04, pred_traj_05, pred_traj_06, pred_traj_07, pred_traj_10]

Here are the settings in train.py:

args = {
    "data_dir": "data",
    "bsize": 4,  # batch size
    "val_split": 0.1,  # percentage to use as validation data
    "window_size": 2,  # number of frames in window
    "overlap": 1,  # number of frames overlapped between windows
    "optimizer": "Adam",  # optimizer [Adam, SGD, Adagrad, RAdam]
    "lr": 1e-5,  # learning rate
    "momentum": 0.9,  # SGD momentum
    "weight_decay": 1e-4,  # weight decay
    "epoch": 100,  # number of training epochs
    "weighted_loss": None,  # float to weight angles in loss function
    "pretrained_ViT": True,  # load weights from pre-trained ViT
    "checkpoint_path": "checkpoints/Exp_vit_base_2",  # path to save checkpoint
    "checkpoint": None,  # checkpoint
}

# tiny  - patch_size=16, embed_dim=192, depth=12, num_heads=3
# small - patch_size=16, embed_dim=384, depth=12, num_heads=6
# base  - patch_size=16, embed_dim=768, depth=12, num_heads=12
model_params = {
    "dim": 768,
    "image_size": (192, 640),
    "patch_size": 16,
    "attention_type": 'divided_space_time',  # ['divided_space_time', 'space_only', 'joint_space_time', 'time_only']
    "num_frames": args["window_size"],
    "num_classes": 6 * (args["window_size"] - 1),  # 6 DoF per frame-to-frame motion
    "depth": 12,
    "heads": 12,
    "dim_head": 64,
    "attn_dropout": 0.1,
    "ff_dropout": 0.1,
    "time_only": False,
}

These results are similar to what I obtain with the pretrained models provided on GitHub, namely Model1, Model2, and Model3.

It seems like I might have made a mistake somewhere. Could you kindly advise on what I should correct?

aofrancani (Owner) commented:

From my experiments, I got better results training from scratch than loading pretrained_ViT (curiously), so my results are without the pretrained ViT. If you read the "args.pkl" files, you will see the exact configuration I used to train the models (https://github.com/aofrancani/TSformer-VO?tab=readme-ov-file#2-pre-trained-models).

They are .pkl files, but they can be easily read with:

import pandas as pd
args = pd.read_pickle("args.pkl")  # args.pkl is a pickled dict holding the training configuration

From your images, it looks like you are not applying the 7DoF alignment mentioned in the paper (other good works also do this, so I had to apply it to make a fair comparison with them)... Without it the results look really poor, since we deal with the monocular case and we don't have scale information. For this alignment, I used https://github.com/Huangying-Zhan/kitti-odom-eval. It already saves the plots and other analyses for you... I just modified their code to save the data I needed and made the plots my own way...
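In short, the 7DoF alignment fits a similarity transform (3 rotation + 3 translation + 1 scale parameters) between the predicted and ground-truth trajectories before computing the errors, which removes the scale that monocular VO cannot observe. Just to illustrate the idea, here is a minimal sketch of the standard Umeyama alignment (this is only the underlying method, not the exact code from kitti-odom-eval):

import numpy as np

def umeyama_alignment(x, y, with_scale=True):
    """Least-squares similarity transform (Umeyama, 1991).
    x, y: (3, N) arrays of corresponding positions (prediction and ground truth).
    Returns R, t, c such that  y ≈ c * R @ x + t.
    """
    n = x.shape[1]
    mu_x = x.mean(axis=1, keepdims=True)
    mu_y = y.mean(axis=1, keepdims=True)
    var_x = ((x - mu_x) ** 2).sum() / n

    # cross-covariance between the centered point sets
    cov = (y - mu_y) @ (x - mu_x).T / n
    U, d, Vt = np.linalg.svd(cov)

    # handle a possible reflection so R is a proper rotation
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0

    R = U @ S @ Vt
    c = np.trace(np.diag(d) @ S) / var_x if with_scale else 1.0
    t = mu_y - c * R @ mu_x
    return R, t, c

# usage sketch: align predicted positions (shape (3, N)) to ground truth before evaluating
# pred_aligned = c * R @ pred_xyz + t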

I also observed that there is high variance in the results depending on the epoch you take for evaluation... I believe this is due to the poor results in the angle estimation (the angles have small values in the Euler representation). One thing that might help is to weight the loss for the angles, as in the DeepVO paper. I already implemented it: it is used when you set a value for "weighted_loss" (maybe 10, 20, 50, 100, I don't know), but in the end I had no resources to test and evaluate it :(
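Roughly, the idea (following DeepVO) is to penalize the rotation part of the 6-DoF output more than the translation part. A minimal sketch, assuming the 6-DoF vectors are ordered [translation, Euler angles] (the ordering, names, and weight value here are assumptions, not the repo's exact code):

import torch
import torch.nn.functional as F

def weighted_pose_loss(pred, target, angle_weight=100.0):
    # reshape so each row is one 6-DoF motion [tx, ty, tz, rx, ry, rz] (ordering assumed)
    pred = pred.view(-1, 6)
    target = target.view(-1, 6)
    trans_loss = F.mse_loss(pred[:, :3], target[:, :3])
    rot_loss = F.mse_loss(pred[:, 3:], target[:, 3:])
    # rotation gets a larger weight because Euler angles have small magnitudes
    return trans_loss + angle_weight * rot_loss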

uiseoklee (Author) commented:

Thank you for your kind response. After understanding that applying 7DoF alignment is crucial for achieving results consistent with the paper, I conducted evaluations using the https://github.com/Huangying-Zhan/kitti-odom-eval repository, as per your guidance. The results are as follows:

For each sequence, the first graph depicts the raw inference results from the pretrained model shared in this GitHub repository (TSformer-VO-3), and the second graph shows the same results after applying 7DoF alignment.

[Plots: pred_traj_01 (raw), sequence_01_1 (7DoF-aligned), pred_traj_04 (raw), sequence_04_1 (7DoF-aligned)]
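For reference, the evaluation was run roughly like the following (the result path is illustrative, and the exact flags should be double-checked against the kitti-odom-eval README):

python eval_odom.py --result result/TSformer-VO-3 --align 7dof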

Given the dramatic change in results depending on whether 7DoF alignment is applied or not, I can't help but raise the following questions:

Q1) What is 7DoF alignment, and could you provide a recommended resource for studying it?
Q2) Is 7DoF alignment generally applicable? In other words, if applying 7DoF alignment improves results even when the model's estimated trajectories are poor, it seems logical to include the alignment code in your predict_poses.py. Is there a specific reason you chose not to include it?

I apologize for asking many questions. I'm in a position where I need to utilize visual odometry, even though I'm not very familiar with it. I appreciate your kind and helpful responses as always.
