
End of sequence #138

Open
paapu88 opened this issue Dec 29, 2019 · 7 comments

Comments


paapu88 commented Dec 29, 2019

When following https://github.com/zzh8829/yolov3-tf2/blob/master/docs/training_voc.md

after

python3 train.py --dataset ./data/voc2012_train.tfrecord --val_dataset ./data/voc2012_val.tfrecord --classes ./data/voc2012.names --num_classes 20 --mode fit --transfer darknet --batch_size 2 --epochs 10 --weights ./checkpoints/yolov3.tf --weights_num_classes 80

I get:

yolo_output_1_loss: 12.8483 - yolo_output_2_loss: 28.8119
2019-12-29 18:46:01.303069: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
[[{{node IteratorGetNext}}]]

This problem is discussed here:
tensorflow/tensorflow#31509

Any suggestions? I have tried both tensorflow-gpu==2.0.0 and tensorflow-gpu==2.1.0rc1 (installed with pip3 install --user tensorflow-gpu==2.1.0rc1).
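
For context, this warning comes from a tf.data iterator running out of elements before Keras expects it to. A commonly suggested mitigation for this class of error is to make the dataset repeat and bound each epoch explicitly; a minimal sketch, where train_dataset, model, num_samples, and batch_size are hypothetical placeholders, not names from train.py:

    # `train_dataset`, `model`, `num_samples`, and `batch_size` are
    # placeholders for what the training script actually builds.
    train_dataset = train_dataset.repeat()  # iterator is never exhausted
    model.fit(train_dataset,
              epochs=10,
              steps_per_epoch=num_samples // batch_size)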


paapu88 commented Dec 29, 2019

I also get the same problem when using conda (which has tensorflow-gpu==2.1.0rc1). Maybe this has something to do with not having much GPU memory?
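
If limited GPU memory is the suspect, one thing worth trying in TF2 is enabling memory growth so TensorFlow allocates VRAM on demand instead of reserving it all up front; a minimal sketch (not something this thread shows train.py doing by itself):

    import tensorflow as tf

    # Allocate GPU memory on demand rather than all at once;
    # run this before any ops touch the GPU.
    for gpu in tf.config.experimental.list_physical_devices('GPU'):
        tf.config.experimental.set_memory_growth(gpu, True)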


paapu88 commented Dec 29, 2019

OK,

Training from random weights (NOT RECOMMENDED)

seems to work, so I'm happy with that.

paapu88 closed this as completed Dec 29, 2019

krxat commented Mar 5, 2020

@paapu88 Hi, how did you solve the problem?


paapu88 commented Mar 5, 2020

It still crashes every now and then. I restart from the checkpoint with the lowest validation loss.
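
For anyone wanting to automate keeping that best checkpoint around, a Keras ModelCheckpoint callback can do it; a minimal sketch (the checkpoint path is just an example, not the script's actual path):

    from tensorflow.keras.callbacks import ModelCheckpoint

    # Save weights after each epoch, keeping only checkpoints that
    # improve on the lowest validation loss seen so far.
    checkpoint = ModelCheckpoint('checkpoints/yolov3_train_{epoch}.tf',
                                 monitor='val_loss',
                                 save_best_only=True,
                                 save_weights_only=True,
                                 verbose=1)
    # Pass via: model.fit(..., callbacks=[checkpoint])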


paapu88 commented Mar 9, 2020

Also: to load the old weights, one must edit train.py. My yolov3-tf2/train.py has been edited to:

    # Configure the model for transfer learning
    if FLAGS.transfer == 'none':
        try:
            model.load_weights(FLAGS.weights)
            print("LOADING OLD WEIGHTS FROM:", FLAGS.weights)
        except Exception:
            # Fall back to random initialization if the checkpoint is missing
            print("no weights loaded, starting from scratch")

I restart with
python3 train.py --dataset ./data/hurricane_train.tfrecord --val_dataset ./data/hurricane_test.tfrecord --classes ./data/hurricane7.names --num_classes 1 --mode fit --transfer none --batch_size 1 --epochs 20 --size 416 --weights ./checkpoints/yolov3_train_1.tf

paapu88 reopened this Mar 9, 2020
kindasweetgal commented

This is my first contact with object detection, and following your method solved this problem for me. But how do I tell when training is finished?


paapu88 commented Apr 2, 2020

I just let it run and take the weights with the lowest validation error. It is not the most elegant solution, but it worked for me.
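
If you would rather have training stop on its own, Keras's EarlyStopping callback halts once val_loss stops improving; a minimal sketch (the patience value is an arbitrary choice, not taken from train.py):

    from tensorflow.keras.callbacks import EarlyStopping

    # Stop once val_loss has not improved for `patience` consecutive
    # epochs, instead of watching the run by hand.
    early_stop = EarlyStopping(monitor='val_loss', patience=3, verbose=1)
    # Pass via: model.fit(..., callbacks=[early_stop])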
