
Question regarding ch. 14 Semantic SLAM #28

Open
samehmohamed88 opened this issue Feb 26, 2021 · 3 comments

@samehmohamed88

Firstly, I would like to extend my gratitude for your excellent work, dedication, and effort. This book is truly a great resource for people of many different backgrounds.

I have just finished the first reading of the VO chapters, after a deep-dive on the Lie Algebra and Linear Optimization chapters.

My first instinct regarding ORB (Oriented FAST) corners was to ask why not use object detection, similar to YOLO v4 or v5.

I skimmed ahead to Chapter 14 and found the section on Semantic SLAM, so my instincts were not far off. Now my questions are as follows:

  1. Since YOLO object detection is now quite fast even on mobile devices, while also being accurate, do you think the center of a bounding-box detection could serve as a replacement for an Oriented FAST feature point? Perhaps the descriptor could even be smaller than a 128-bit binary value, since it carries some semantic meaning?

  2. I was looking at the ORB_SLAM2 paper and GitHub repository and noticed that the latest commit was 4 years ago, and that the more recent OpenVSLAM was discontinued yesterday (Feb 25th, 2021). This leads me to believe that the robotics industry (vacuum robots, drones) is not necessarily using these techniques. Can you offer some insight on this? Do you believe the industry trend has already shifted toward deep learning? Or are companies perhaps using in-house systems?

  3. Your book was written in 2016, and since then several papers have been published on unsupervised learning of depth and ego-motion, such as SfMLearner and Monodepth2. Do you know of any SLAM systems that use such techniques for the VO step? In your view, has the industry adopted this, or is it not yet mature enough?
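(As a toy sketch of the idea in question 1: even if the descriptor shrinks to a few "semantic" bits per detection, matching could still work exactly like ORB's brute-force Hamming-distance matching. This is only an illustration of the matching mechanics; the function names are hypothetical and not from any SLAM library.)

```python
def hamming(a, b):
    """Hamming distance between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")

def match(query, train, max_dist=16):
    """Brute-force nearest-neighbour matching, as done for ORB's binary
    descriptors -- here shrunk to toy integer descriptors that could
    encode, e.g., a class id plus a few quantized attributes."""
    matches = []
    for qi, q in enumerate(query):
        # Find the train descriptor with the smallest Hamming distance.
        ti, d = min(((i, hamming(q, t)) for i, t in enumerate(train)),
                    key=lambda x: x[1])
        if d <= max_dist:
            matches.append((qi, ti, d))
    return matches
```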

Finally, please allow me to contribute to your work in some way, either financially or with time and effort. I don't see a link for financial contributions, and since I can't read Chinese I would not be able to purchase that version. Can you please advise how I can support your work, either with my time or some minor funding?

Thanks and keep up the good work!

@samehmohamed88
Author

I may have answered part of my question with a Google search for "deep learning for local features", but I would still be very glad if you could comment on my questions based on your experience.

@dimaxano

Off topic: I would also donate to the author for such a great book!

@gaoxiang12
Owner

gaoxiang12 commented Mar 1, 2021

Hi @aboarya, here are just my opinions on your questions:

  1. The common output of a detection network is a bounding box, and most bounding boxes are not accurate at the pixel level. Detection nets are trained and evaluated with overlap-based criteria (IoU, mAP), which measure the overlap area between the detected box and the annotation; it is also difficult to annotate boxes at the pixel level. So, if you are interested in using boxes to estimate camera pose, take a look at QuadricSLAM and CubeSLAM, where the enclosed object is modeled as an ellipsoid/cuboid. On the other hand, if you are interested in using deep-learning-based features for SLAM, SuperPoint SLAM may help you.
  2. ORB-SLAM is still being developed. The most recent version is ORB_SLAM3, which adds IMU integration to VSLAM. DSO from TUM also has many variants (stereo DSO, LDSO, DSO for rolling shutter), but some versions are not open-sourced for commercial reasons. I think it's the same for many other VSLAM systems in industry, such as the VSLAM used in many cellphones (ARCore, ARKit, etc.).
  3. Deep learning has been used in SLAM in many ways. You can train an end-to-end VO like DeepVO, learn the depth map from monocular images, or use learning-based features for VO matching, point-cloud registration, and many other tasks. Some collections on this topic are available on GitHub, e.g. deep-learning-localization-mapping. But most of this work is still experimental and not stable enough for industrial usage. The networks rely heavily on image data, and a VO trained in indoor environments is probably not suitable for self-driving cars. They are not as general as traditional approaches like ICP or PnP solvers.
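(To make the last point concrete: the "traditional" solvers mentioned above are geometric and often closed-form, which is what makes them so general. For instance, the core step of one ICP iteration, aligning two point sets once correspondences are fixed, has a closed-form solution. Below is a minimal 2D sketch of that step; it is illustrative only, not code from any of the systems mentioned.)

```python
import math

def align_2d(src, dst):
    """Closed-form 2D rigid alignment (rotation + translation) given known
    point correspondences -- the inner step of one 2D ICP iteration."""
    n = len(src)
    # Centroids of both point sets.
    csx = sum(p[0] for p in src) / n
    csy = sum(p[1] for p in src) / n
    cdx = sum(q[0] for q in dst) / n
    cdy = sum(q[1] for q in dst) / n
    # Optimal rotation angle from the centered correspondences
    # (2D special case of the Kabsch/SVD solution).
    s_cross = s_dot = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        px, py = px - csx, py - csy
        qx, qy = qx - cdx, qy - cdy
        s_cross += px * qy - py * qx
        s_dot += px * qx + py * qy
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation maps the rotated source centroid onto the destination centroid.
    tx = cdx - (c * csx - s * csy)
    ty = cdy - (s * csx + c * csy)
    return theta, (tx, ty)
```

In a full ICP loop one would re-estimate correspondences (nearest neighbours), call this solver, apply the transform, and repeat until convergence.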

About the donation: I think it is fine as it is. I'm glad to know you love this book. I have had some income from the Chinese version in recent years (about 30,000 copies, I think). Springer will publish the English version someday, but I'm not sure how long that will take. Maybe you can buy it on Amazon in the future, but for now only the Chinese version exists in paper form.

Also, if you find some interesting books on SLAM/robotics, let me know and I'm willing to do some translation work. Thanks for your support!
