Loss gets low but cannot detect anything in inference, even on training set #148
Comments
I have the same problem here, I was about to post this. I can confirm that in my case the problem isn't with the dataset. The weight-loading problem seems interesting; I'll look into it. Please let me know if you find anything interesting. Edit: This problem could be related to #126 and #20. I will try re-training in eager mode, as I was using fit.
I have tried to train with eager_fit and I get the same results; unfortunately it doesn't seem to make a difference for me. Because the loss gets so low I'm confident that the training has gone well, but I don't know how else to check that. Surely if the loss is so small then it would at least be able to perform on the training set once trained. I know there is no problem with the data labels, from using the "visualize_dataset.py" script in the repo. Edit: It seems the warnings on loading/saving weights are not an issue, according to the info in #108. Now I'm really unsure as to why I can't detect anything.
Just to be thorough: could you double check you are using the right weights, i.e. the checkpoints saved during training? Using the base model on the new data also expectedly returns nothing for me, since I'm using a custom class, but it behaves similarly.
Yep, I'm definitely using the models that are saved during training. I've trained up to 50 epochs with quite a few different setting combinations, and I've tried using a load of different checkpoints in case it was overfitting or something and the later epochs were bad. I can't seem to get it to detect anything with any of them. I can get the pretrained model to work on the example images, though.
I'm not seeing anything particularly weird when I explore the weights. The last layers change between epochs as they should, so that's probably not it.
Is there any way to tune the confidence values? It's possible that what I consider to be a low loss level is not low at all and the training never converged. When I'm testing, maybe it's just not confident enough to pick anything out, even though it was trained on those images.
The threshold can be changed by modifying this definition in the model code: yolov3-tf2/yolov3_tf2/models.py, line 28 at 7c3ede5. Let me know if it helps!
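For reference, a minimal sketch of the kind of change meant here, assuming the thresholds are module-level values in yolov3_tf2/models.py that feed the score filtering before NMS (the names, value, and exact line differ between commits):

```python
# yolov3_tf2/models.py (sketch; values are illustrative)
yolo_iou_threshold = 0.5     # NMS IOU threshold, usually left as-is
yolo_score_threshold = 0.1   # lowered from 0.5 for debugging; boxes scoring
                             # below this are dropped from the final output
```

Lowering the score threshold only changes what gets reported at inference time; it has no effect on training.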
@ttocs167 Scott, it works! It looks like the max confidence is 0.5 for some reason? I had my classes being predicted with a confidence of 0.47.
@TheClassyPenguin check #70 if you are training with 1 class. It explains how to change the code so that the confidence scores go up with a single class.
I just saw this in the documentation of the loss function: https://www.tensorflow.org/api_docs/python/tf/keras/losses/SparseCategoricalCrossentropy?version=stable
It's weird, because there is no class loss when you only have one class, so the solution provided by @nuitvolgit seems reasonable to me. I will add this to the output function later today.
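A sketch of the kind of output-function change being discussed, assuming the NMS step computes its scores as objectness multiplied by the class probability (the helper and its arguments are illustrative, not the repo's actual code):

```python
def combine_scores(objectness, class_probs, num_classes):
    """Hypothetical helper for the score fed into NMS.

    With a single class, sparse categorical crossentropy produces no class
    loss, so the sigmoid class probability never trains and stays near 0.5,
    capping objectness * class_probs at roughly 0.5. Using the objectness
    alone removes that cap.
    """
    if num_classes == 1:
        return objectness
    return objectness * class_probs
```

In other words, with a single class the objectness score already carries all the information, so multiplying by the untrained class probability only halves it.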
@TheClassyPenguin That's great! I'm glad you've figured it out. I'm going to retrain my custom set again and try lowering the confidence threshold. I'm a little worried, though, because my custom set has 5 classes, so it shouldn't be affected by this loss issue.
So I have discovered that reducing the confidence threshold certainly does make detections show up, however they are basically nonsense and all very low confidence, despite the fact that the primary loss metric drops down to ~5 towards the end of training... I'm going to train again without some image augmentation I added to the dataset using tf.image.random functions, just in case they are messing up my training. But it makes no sense to me how the loss can get so low and yet the predictions on those same images are so poor.
@zzh8829 I trained with 1 class using sparse_categorical_crossentropy. Would this make a huge difference? I thought binary is just a special case of sparse, so for 1 class it is mathematically the same.
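To see why one class behaves as a special case rather than being mathematically equivalent, here is a small standalone check (not the repo's loss code): sparse categorical crossentropy renormalizes a one-class prediction to probability 1 and always returns zero, while binary crossentropy still produces a useful gradient.

```python
import tensorflow as tf

labels = tf.constant([0])        # the only class index
pred = tf.constant([[0.5]])      # sigmoid output for that single class

# Sparse categorical crossentropy over a 1-class prediction is always ~0,
# so the class branch of the network receives no gradient during training.
print(tf.keras.losses.sparse_categorical_crossentropy(labels, pred).numpy())    # ~0.0

# Binary crossentropy on the same prediction is -log(0.5) ~ 0.69, which does
# push the predicted probability towards 1 for positive boxes.
print(tf.keras.losses.binary_crossentropy(tf.constant([[1.0]]), pred).numpy())  # ~0.69
```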
So it turns out my training works reasonably well if I take off the image augmentation I had. I'm getting confidence values around 0.5 to 0.7 on the training set, which is not great, but it at least means it's working. I'm starting to think that the image augmentation does not adjust the labels in the same way and thus ruins the training. No idea why the loss values still drop really low if it's giving completely garbage results, though. I was using tf.image functions for augmentation; is there a way to augment the data procedurally in the input pipeline? I had my functions mapped into the pipeline roughly as in the sketch below. I'm not really sure how these functions interact with the bounding box data. The contrast changes should obviously not have any effect, but training was not successful with that line included. I'm assuming I implemented them wrong and they ruined the data in a way where the loss was still good but the outputs are bad.

Edit: With contrast adjustments only, the confidence shoots up to ~0.8. It seems to be finally working as intended. It seems my data augmentation was killing the data/label pairs somehow, so that they were mismatched and the network was unable to learn anything. I'm going to keep trying to get my other augmentation steps working so that I can make it even more robust.
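A hypothetical reconstruction of that kind of augmentation mapping (the exact ops and the dataset variable are assumptions): the tf.image calls receive only the image, so the labels pass through untouched and any geometric change silently mismatches the image/label pairs.

```python
import tensorflow as tf

def augment_image_only(x, y):
    # Photometric change: safe, because the boxes in y do not move.
    x = tf.image.random_contrast(x, 0.8, 1.2)
    # Geometric change: NOT safe here, because the boxes in y are not flipped
    # along with the image, so the loss is computed against stale boxes.
    x = tf.image.random_flip_left_right(x)
    return x, y

# Assumed usage, mapping over (image, labels) pairs in the input pipeline:
# train_dataset = train_dataset.map(augment_image_only)
```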
@ttocs167 In your data augmentation, the transformations are only applied to the image. Any transformation that changes the image geometry would need to be applied to the labels as well. Contrast change works because it doesn't shift the bounding boxes, but others like flipping would. I don't think TensorFlow natively provides augmentation for both images and bounding boxes; the built-in image augmenters were designed with classification tasks and generative models in mind. Defining a custom augmenter is not trivial; it's worth reading up on the documentation for this.
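Under those constraints, a box-aware flip might look something like the sketch below (assuming labels are zero-padded [N, 5] rows of normalized x1, y1, x2, y2, class, which is roughly what the TFRecord pipeline here produces):

```python
import tensorflow as tf

def random_flip_with_boxes(image, labels):
    """Hypothetical augmenter that flips the image and its boxes together."""
    def flip():
        flipped = tf.image.flip_left_right(image)
        x1, y1, x2, y2, cls = tf.split(labels, 5, axis=-1)
        # Mirror the x range (and swap x1/x2 so x1 < x2 still holds), but keep
        # zero-padded rows as all zeros so they are still treated as padding.
        is_box = tf.cast(x2 > x1, tf.float32)
        flipped_labels = tf.concat(
            [(1.0 - x2) * is_box, y1, (1.0 - x1) * is_box, y2, cls], axis=-1)
        return flipped, flipped_labels
    return tf.cond(tf.random.uniform([]) > 0.5, flip, lambda: (image, labels))

# Applied before the targets are converted to the YOLO grid format:
# train_dataset = train_dataset.map(random_flip_with_boxes)
```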
@zzh8829 I suspected as much, but when I read up on the tf.image functions I found a Stack Overflow thread stating that the bounding boxes should be augmented in the same way. Looking at my code now, there is no way the labels are even passed into the function, so it obviously isn't working. I plan on looking into this and still using tf.image, if it's as easy as passing the labels into the function as well.
So how should this problem be solved? Is it enough to adjust the threshold? Sorry, this is very important to me.
I am having the same issue. Loss gets low, but no detections are happening. Can anyone here help?
I am training on a custom dataset using the darknet transfer option, and the loss values drop very low and stop improving after a few epochs:
loss: 22.6205 - yolo_output_0_loss: 1.1167 - yolo_output_1_loss: 7.4397 - yolo_output_2_loss: 0.0091 - val_loss: 16.5518 - val_yolo_output_0_loss: 0.146 - val_yolo_output_1_loss: 2.7821 - val_yolo_output_2_loss: 0.0090
but I cannot detect anything even when using images from the training set...
I have tried all of the different transfer modes, but using "fine_tune" and "no_output" both give errors on startup, so "none" and "darknet" are all I can use.
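For comparison, a typical darknet-transfer run looks roughly like this (flag names as in train.py; the paths, class count, and hyperparameters here are illustrative):

```
python train.py \
  --dataset ./data/custom_train.tfrecord \
  --val_dataset ./data/custom_val.tfrecord \
  --classes ./data/custom.names \
  --num_classes 5 \
  --mode fit \
  --transfer darknet \
  --weights ./checkpoints/yolov3.tf \
  --weights_num_classes 80 \
  --epochs 50 \
  --batch_size 8
```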
I also have the "Unresolved object in checkpoint:" issue, even though I am using the --weights_num_classes option correctly. I see that you force-suppress that message in detect.py using .expect_partial() when loading weights, but I am suspicious that this is still causing issues by failing to load the model correctly.
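For what it's worth, a minimal sketch of how the inference side restores a training checkpoint (the class count and path are illustrative): .expect_partial() only silences warnings about checkpoint objects that inference never restores, such as optimizer slots saved during training; it does not change which layer weights are loaded.

```python
from yolov3_tf2.models import YoloV3

yolo = YoloV3(classes=5)  # 5 classes, matching the custom dataset above
yolo.load_weights('./checkpoints/yolov3_train_10.tf').expect_partial()
```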