To achieve results more closely aligned with the paper #16

Open
uiseoklee opened this issue Apr 26, 2024 · 2 comments

uiseoklee commented Apr 26, 2024

Hi. A few days ago, I encountered an error while attempting to run the pretrained_ViT model, which I managed to resolve through another issue. The reason I attempted to run the pretrained_ViT model in the first place was that the results of the non-pretrained model were inconsistent with the results reported in the paper for this GitHub repository. Therefore, after resolving the pretrained issue, I trained the model with pretrained_ViT set to True and obtained the following results for sequences 01, 03, 04, 05, 06, 07, and 10:

[Predicted trajectory plots: pred_traj_01, pred_traj_03, pred_traj_04, pred_traj_05, pred_traj_06, pred_traj_07, pred_traj_10]

Here are the settings in train.py:

args = {
    "data_dir": "data",
    "bsize": 4,  # batch size
    "val_split": 0.1,  # percentage to use as validation data
    "window_size": 2,  # number of frames in window
    "overlap": 1,  # number of frames overlapped between windows
    "optimizer": "Adam",  # optimizer [Adam, SGD, Adagrad, RAdam]
    "lr": 1e-5,  # learning rate
    "momentum": 0.9,  # SGD momentum
    "weight_decay": 1e-4,  # weight decay
    "epoch": 100,  # number of training epochs
    "weighted_loss": None,  # float to weight angles in loss function
    "pretrained_ViT": True,  # load weights from pre-trained ViT
    "checkpoint_path": "checkpoints/Exp_vit_base_2",  # path to save checkpoint
    "checkpoint": None,  # checkpoint
}

# tiny  - patch_size=16, embed_dim=192, depth=12, num_heads=3
# small - patch_size=16, embed_dim=384, depth=12, num_heads=6
# base  - patch_size=16, embed_dim=768, depth=12, num_heads=12
model_params = {
    "dim": 768,
    "image_size": (192, 640),
    "patch_size": 16,
    "attention_type": 'divided_space_time',  # ['divided_space_time', 'space_only', 'joint_space_time', 'time_only']
    "num_frames": args["window_size"],
    "num_classes": 6 * (args["window_size"] - 1),  # 6 DoF per frame-to-frame motion
    "depth": 12,
    "heads": 12,
    "dim_head": 64,
    "attn_dropout": 0.1,
    "ff_dropout": 0.1,
    "time_only": False,
}

These results are similar to what I obtain with the pretrained models provided on GitHub, namely Model1, Model2, and Model3.

It seems like I might have made a mistake somewhere. Could you kindly advise on what I should correct?

aofrancani (Owner) commented:

From my experiments, I got better results training from scratch than loading pretrained_ViT (curiously), so my results are without the pretrained ViT. If you read the "args.pkl" files, you will see the exact configuration I used to train the models (https://github.com/aofrancani/TSformer-VO?tab=readme-ov-file#2-pre-trained-models).

They are .pkl files, but they can be easily read with:

import pandas as pd
args = pd.read_pickle("args.pkl")  # args.pkl is a pickled dict holding the training configuration

From your images, it looks like you are not applying the 7DoF alignment mentioned in the paper (other good works also do this, so I had to apply it to make a fair comparison with them)... Without it the results look really poor, since we deal with the monocular case and we don't have scale information. For this alignment, I used https://github.com/Huangying-Zhan/kitti-odom-eval. It already saves the plots and other analyses for you... I just modified their code to save the data I needed and made the plots my own way...
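In short, the 7DoF alignment fits a similarity transform (3 rotation + 3 translation + 1 scale parameters) between the predicted and ground-truth trajectories before computing the errors, which removes the scale that monocular VO cannot observe. Just to illustrate the idea, here is a minimal sketch of the standard Umeyama alignment (this is only the underlying method, not the exact code from kitti-odom-eval):

import numpy as np

def umeyama_alignment(x, y, with_scale=True):
    """Least-squares similarity transform (Umeyama, 1991).
    x, y: (3, N) arrays of corresponding positions (prediction and ground truth).
    Returns R, t, c such that  y ≈ c * R @ x + t.
    """
    n = x.shape[1]
    mu_x = x.mean(axis=1, keepdims=True)
    mu_y = y.mean(axis=1, keepdims=True)
    var_x = ((x - mu_x) ** 2).sum() / n

    # cross-covariance between the centered point sets
    cov = (y - mu_y) @ (x - mu_x).T / n
    U, d, Vt = np.linalg.svd(cov)

    # handle a possible reflection so R is a proper rotation
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0

    R = U @ S @ Vt
    c = np.trace(np.diag(d) @ S) / var_x if with_scale else 1.0
    t = mu_y - c * R @ mu_x
    return R, t, c

# usage sketch: align predicted positions (shape (3, N)) to ground truth before evaluating
# pred_aligned = c * R @ pred_xyz + t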

I also observed that there is high variance in the results depending on the epoch you take for evaluation... I believe this is due to the poor results in the angle estimation (the angles have small values in the Euler representation). One thing that might help is to weight the loss for the angles, as in the DeepVO paper. I already implemented it: it is used when you set a value for "weighted_loss" (maybe 10, 20, 50, 100, I don't know), but in the end I had no resources to test and evaluate it :(
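Roughly, the idea (following DeepVO) is to penalize the rotation part of the 6-DoF output more than the translation part. A minimal sketch, assuming the 6-DoF vectors are ordered [translation, Euler angles] (the ordering, names, and weight value here are assumptions, not the repo's exact code):

import torch
import torch.nn.functional as F

def weighted_pose_loss(pred, target, angle_weight=100.0):
    # reshape so each row is one 6-DoF motion [tx, ty, tz, rx, ry, rz] (ordering assumed)
    pred = pred.view(-1, 6)
    target = target.view(-1, 6)
    trans_loss = F.mse_loss(pred[:, :3], target[:, :3])
    rot_loss = F.mse_loss(pred[:, 3:], target[:, 3:])
    # rotation gets a larger weight because Euler angles have small magnitudes
    return trans_loss + angle_weight * rot_loss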

uiseoklee (Author) commented:

Thank you for your kind response. After understanding that applying 7DoF alignment is crucial for achieving results consistent with the paper, I conducted evaluations using the https://github.com/Huangying-Zhan/kitti-odom-eval repository, as per your guidance. The results are as follows:

For each sequence, the first graph depicts the raw inference results from the pretrained model shared in this GitHub repository (TSformer-VO-3), and the second graph shows the same results after applying 7DoF alignment.

[Plots: pred_traj_01 (raw), sequence_01_1 (7DoF-aligned), pred_traj_04 (raw), sequence_04_1 (7DoF-aligned)]
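For reference, the evaluation was run roughly like the following (the result path is illustrative, and the exact flags should be double-checked against the kitti-odom-eval README):

python eval_odom.py --result result/TSformer-VO-3 --align 7dof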

Given the dramatic change in results depending on whether 7DoF alignment is applied or not, I can't help but raise the following questions:

Q1) What is 7DoF alignment, and could you provide a recommended resource for studying it?
Q2) Is 7DoF alignment generally applicable? In other words, if applying 7DoF alignment improves results even when the model's estimated trajectories are poor, it seems logical to include the alignment code in your predict_poses.py. Is there a specific reason you chose not to include it?

I apologize for asking many questions. I'm in a position where I need to utilize visual odometry, even though I'm not very familiar with it. I appreciate your kind and helpful responses as always.
