Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change of bin loss computation to avoid learning from empty annotations. #1011

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

johanneskbl
Copy link

TLDR: This fix leads to better performance in rotation prediction.

First of all, thank you very much for publicating your work. It was super helpful for my work as well. In my work I found a slight hickup in the loss function for the roation bin classification. It is minor in code but it has a rather big impact on the convergence and overall performance of the network for predicting rotation of objects and therefore, at inference, also in predicting velocities.

I was using the nuScenes dataset so I can't (without high effort) be sure how it looks like for the KITTI dataset. Although, in the GenericDataset class there is the parameter max_objs for both datasets. For the nuScenes dataset this parameter states that there can be at most max_objs number of annotations per image. If there are less than this number of annotations present in the image the rest is simply filled up by default labeling (basically zeros for every parameter). For simplicity, let's name this rest as "placeholder annotations". This entire concept is completely fine if those "placeholder annotations" are not used for anything. Except of course, to not (!) predict anything because it would not make sense for the network to always predict the maximum number of objects. Although, this principle of not predicting objects is already trained for in the heatmap head.

I propose removing these placeholder annotations, or to be more concise, removing the indices from the output and target tensors where the mask tensor is zero. This would have the same effect as masking the output with zeros in all other loss functions but for the entropy loss it is different. When we mask the output of the rotation bin classification we basically say we are 0% certain that the angle is either in bin 1 or not in bin 1. But since the target by default is 0 (which means not in bin 1) the backward pass optimizes the parameters for classifying not being in bin 1. Thus we do not (!) ignore the placeholder annotations when we mask the output. In most other loss functions, for example L1Loss
Loss = |pred - target| = 0 - 0 = 0,
masking the output has the effect of ignoring the placeholder annotations, but not so in the entropy loss since
e^0 != 0.

Also, in other loss functions that share this problem as for example the WeightedBCELoss you masked the unreduced loss instead of the output which has the same effect as removing the indices.

…his leads to better performance in rotation prediction.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant