
Loss gets low but cannot detect anything in inference, even on training set #148

Open
ttocs167 opened this issue Jan 8, 2020 · 19 comments

Comments

@ttocs167

ttocs167 commented Jan 8, 2020

I am training on a custom dataset using the darknet transfer option, and the loss drops very low and then plateaus after a few epochs:

loss: 22.6205 - yolo_output_0_loss: 1.1167 - yolo_output_1_loss: 7.4397 - yolo_output_2_loss: 0.0091 - val_loss: 16.5518 - val_yolo_output_0_loss: 0.146 - val_yolo_output_1_loss: 2.7821 - val_yolo_output_2_loss: 0.0090

but I cannot detect anything even when using images from the training set...

I have tried all of the different transfer modes, but "fine_tune" and "no_output" both give errors on startup, so "none" and "darknet" are all I can use.

I also get the "Unresolved object in checkpoint" warning even though I am using the --weights_num_classes option correctly. I see that you force-suppress that message in detect.py using .expect_partial() when loading weights, but I am suspicious that this is still causing issues by failing to load the model correctly.
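
For context, this is roughly the loading pattern I'm talking about, a minimal sketch using the repo's default path and class count rather than my actual setup:

from yolov3_tf2.models import YoloV3

# load_weights returns a checkpoint status object; expect_partial() silences
# the "Unresolved object in checkpoint" warnings for variables that have no
# counterpart in the checkpoint (e.g. optimizer slots).
yolo = YoloV3(classes=80)
yolo.load_weights('./checkpoints/yolov3.tf').expect_partial()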

@TheClassyPenguin

TheClassyPenguin commented Jan 8, 2020

I have the same problem here, was about to post this.

I can confirm in my case that the problem isn't with the dataset. The loading weights problem seems interesting, I'll look into it. Please let me know if you find anything interesting.

Edit: This problem could be related to #126 and #20. I'll try re-training in eager mode, as I was using fit.

@ttocs167
Author

ttocs167 commented Jan 8, 2020

I have tried training with eager_fit and I get the same results; unfortunately it doesn't seem to make a difference for me.

Because the loss gets so low I'm confident the training has gone well, but I don't know how else to check that. Surely if the loss is that small, the model should at least perform on the training set once trained.

I know there is no problem with the data labels, because the "visualize_dataset.py" script in the repo shows them correctly.

Edit: It seems the warnings on loading/saving weights are not an issue, according to #108. Now I'm really unsure why I can't detect anything.

@TheClassyPenguin

TheClassyPenguin commented Jan 8, 2020

Just to be thorough:

Could you double-check that you are using the right ./checkpoints/yolov3_train_X.tf weights for inference, instead of the base YOLO model ./checkpoints/yolov3.tf?

Using the base model on the new data also returns nothing for me, as expected since I'm using a custom class, but it behaves the same way.

@ttocs167
Author

ttocs167 commented Jan 8, 2020

Yep, I'm definitely using the models that are saved during training. I've trained up to 50 epochs with quite a few different setting combinations, and I've tried loading checkpoints from different epochs in case it was overfitting and the later epochs were bad. I can't get any of them to detect anything.

I can get the pretrained model to work on the example images though.

@TheClassyPenguin

TheClassyPenguin commented Jan 8, 2020

I'm not seeing anything particularly weird when I explore the weights. The last layers change among epochs as they should. So that's probably not it.

@ttocs167
Author

ttocs167 commented Jan 8, 2020

Is there any way to tune the confidence threshold? It's possible that what I consider a low loss is not actually low and the training never converged. Maybe at inference time it's just not confident enough to pick anything out, even though it was trained on those images.

@TheClassyPenguin

TheClassyPenguin commented Jan 8, 2020

The threshold can be changed by modifying this flag definition in the model definition file. Let me know if it helps!

flags.DEFINE_float('yolo_score_threshold', 0.5, 'score threshold')
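
If you'd rather not edit the file, the same flag should be overridable at runtime since it's a regular absl flag. A rough sketch (the threshold value is just an example):

import sys
from absl import flags
from yolov3_tf2 import models  # importing models registers the yolo_* flags

FLAGS = flags.FLAGS
FLAGS(sys.argv[:1])  # mark flags as parsed when not going through absl.app.run

# Example value: keep detections scoring above 0.3 instead of the default 0.5.
FLAGS.yolo_score_threshold = 0.3

Passing --yolo_score_threshold 0.3 on the detect.py command line should also work, since absl exposes flags defined in imported modules.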

@TheClassyPenguin

TheClassyPenguin commented Jan 8, 2020

@ttocs167 Scott, it works! It looks like the max confidence is 0.5 for some reason? My classes were being predicted with a confidence of 0.47.

@Robin2091

Robin2091 commented Jan 8, 2020

@TheClassyPenguin If you are training with 1 class, check #70. It explains how to change the code so the confidence scores go up with one class.

@zzh8829
Owner

zzh8829 commented Jan 8, 2020

I just saw in the documentation of the loss function https://www.tensorflow.org/api_docs/python/tf/keras/losses/SparseCategoricalCrossentropy?version=stable
that it says "Use this crossentropy loss function when there are two or more label classes", so I guess we need to modify the loss function for the single-class case 😢
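
Here is a quick standalone check (not this repo's loss code) of why that matters: with a single class there is only one probability column, so the loss is zero no matter what the network predicts, and the class branch gets no training signal.

import tensorflow as tf

# One-class case: the softmax over a single column is always 1.0, so the
# cross-entropy is 0 regardless of the predicted value.
y_true = tf.constant([0])      # index of the only class
y_pred = tf.constant([[0.3]])  # single-class prediction (e.g. a sigmoid output)
print(tf.keras.losses.sparse_categorical_crossentropy(y_true, y_pred).numpy())  # ~[0.]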

@zzh8829
Owner

zzh8829 commented Jan 8, 2020

It's weird, because there is no class loss when you only have one class, so I think the solution provided by @nuitvolgit

if classes > 1:
  scores = confidence * class_probs
else:
  scores = confidence

seems reasonable to me. I will add this to the output function later today.
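
Something along these lines is what I have in mind for the output function (the helper name is just for illustration; the real change will go where the scores are computed):

def combine_scores(confidence, class_probs, classes):
    """Sketch: combine objectness and class probabilities into detection scores.

    With a single class the class branch carries no training signal (its loss
    is always zero), so multiplying by class_probs only drags the score down;
    in that case just use the raw objectness confidence.
    """
    if classes > 1:
        return confidence * class_probs
    return confidence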

@ttocs167
Author

ttocs167 commented Jan 9, 2020

@TheClassyPenguin That's great! I'm glad you've figured it out. I'm going to retrain on my custom set and try lowering the confidence threshold. I'm a little worried, though, because my custom set has 5 classes, so it shouldn't be affected by this loss issue.

@ttocs167
Author

ttocs167 commented Jan 9, 2020

So I have discovered that lowering the confidence threshold certainly does make detections show up; however, they are basically nonsense and all very low confidence, despite the primary loss metric dropping to ~5 by the end of training...

I'm going to train again without the image augmentation I added to the dataset using the tf.image random functions, just in case it is messing up my training. But it makes no sense to me how the loss can get so low and yet the predictions on those same images are so poor.

@Robin2091

@zzh8829 I trained with 1 class using sparse_categorical_crossentropy. Would this make a huge difference? I thought binary is just a special case of sparse categorical, so for 1 class it would be mathematically the same.

@ttocs167
Author

ttocs167 commented Jan 10, 2020

So it turns out my training works reasonably well if I take out the image augmentation I had. I'm getting confidence values around 0.5 to 0.7 on the training set, which is not great, but it at least means it's working. I'm starting to think that the image augmentation does not adjust the labels in the same way and thus ruins the training. No idea why the loss values still drop really low if it's giving completely garbage results, though.

I was using tf.image functions for augmentation; is there a way to augment the data procedurally in the input pipeline? I had put my functions in like so:

[screenshot: tf.image augmentation calls added to the dataset pipeline]

I'm not really sure how these functions interact with the bounding box data; the contrast changes obviously shouldn't have any effect, but training was not successful with that line included. I'm assuming I implemented them wrong and they ruined the data in a way where the loss still looked good but the outputs were bad.

Edit: With contrast adjustments only, the accuracy shoots up to ~0.8. It finally seems to be working as intended. It looks like my data augmentation was somehow mismatching the image/label pairs, so the network was unable to learn anything. I'm going to keep trying to get my other augmentation steps working so that I can make it even more robust.

@zzh8829
Owner

zzh8829 commented Jan 12, 2020

@ttocs167 In your data augmentation, the transformations are only applied to the image. Any transformation that changes the image geometry would need to be applied to the labels as well. Contrast change works because it doesn't shift the bounding boxes, but others like flipping do.

I don't think TensorFlow natively provides augmentation for both images and bounding boxes. The built-in image augmenters were designed with classification tasks and generative models in mind. Defining a custom augmenter is not trivial; you can check out this documentation from imgaug: https://imgaug.readthedocs.io/en/latest/source/examples_bounding_boxes.html
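
If you want to stay with plain tf.image, a rough sketch of a map function that keeps the boxes consistent could look like this (assuming labels are normalized (xmin, ymin, xmax, ymax, class) rows, possibly zero-padded; adjust if your format differs):

import tensorflow as tf

def augment(image, labels):
    # Photometric changes: safe, the boxes are untouched.
    image = tf.image.random_contrast(image, 0.8, 1.2)
    image = tf.image.random_brightness(image, 0.1)

    # Geometric change: flip the image and the boxes together, half the time.
    def flip(img, lab):
        img = tf.image.flip_left_right(img)
        xmin, ymin, xmax, ymax, cls = tf.split(lab, 5, axis=-1)
        flipped = tf.concat([1.0 - xmax, ymin, 1.0 - xmin, ymax, cls], axis=-1)
        # Leave zero-padded rows (no box) untouched.
        valid = tf.reduce_any(tf.not_equal(lab, 0), axis=-1, keepdims=True)
        return img, tf.where(valid, flipped, lab)

    image, labels = tf.cond(tf.random.uniform([]) < 0.5,
                            lambda: flip(image, labels),
                            lambda: (image, labels))
    return image, labels

# dataset = dataset.map(augment)  # before batching / target transformation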

@ttocs167
Author

@zzh8829 I suspected as much, but when I read up on the tf.image functions I found this Stack Overflow thread stating that the bounding boxes should be augmented in the same way. Looking at my code now, there is no way the labels are even passed into the function, so it obviously isn't working. I plan on looking into this and still using tf.image if it's as easy as passing the labels into the function as well.

@guangmingdexin

So how should this problem be solved? Is it enough to adjust the threshold? Sorry, this is very important to me.

@aashay96

aashay96 commented May 6, 2020

I am having the same issue. The loss gets low, but no detections are happening. Can anyone here help?
I have also lowered the score thresholds.
