tf.data.Dataset vs. tf.keras.Sequence #13
Comments
@b-nils I have been using this repository to train COCO weights (though I have not succeeded yet), and I found that the tf.data.Dataset API trains faster than tf.keras.utils.Sequence: in my runs, training with tf.data.Dataset was more than twice as fast. However, I have not used the combination you mentioned (TF1, multiprocessing, and tf.keras.Sequence); instead I installed TF 2.4.3 and measured the difference in training time between the tf.data.Dataset and tf.keras.utils.Sequence APIs.
@VeeranjaneyuluToka thanks for the insights! Do you think the performance could be improved further by applying preprocessing after the tf.data.Dataset has been created and/or by calling tf.data.Dataset.prefetch()?
@b-nils It might improve, but I ran into an issue when I tried to implement a data pipeline with the tf.data.Dataset API myself, before finding this repository, and I ended up solving it in the same way. If you look at the implementation carefully, repeat() is called on the dataset at the point where it is passed to model.fit(); I tried calling it before fit() and got an error (I am not sure why). So I doubt that simply calling prefetch() will work out of the box, but I think it is worth experimenting to see whether it improves training time further. Let us know if you have any success with that experiment.
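For anyone who wants to try this, here is a minimal sketch (not the repository's actual pipeline; make_dataset, the image size, and the path/label inputs are all assumptions) of calling repeat() and prefetch() on a tf.data.Dataset just before handing it to model.fit():

```python
import tensorflow as tf

def make_dataset(image_paths, labels, batch_size=2):
    # Hypothetical helper: build a dataset from file paths and labels.
    ds = tf.data.Dataset.from_tensor_slices((image_paths, labels))

    def _load(path, label):
        img = tf.io.decode_jpeg(tf.io.read_file(path), channels=3)
        img = tf.image.resize(img, (512, 512)) / 255.0
        return img, label

    ds = ds.map(_load, num_parallel_calls=tf.data.AUTOTUNE)
    ds = ds.batch(batch_size)
    # repeat() keeps the iterator alive across epochs (so steps_per_epoch must
    # be set in fit()); prefetch() overlaps input preparation with training.
    return ds.repeat().prefetch(tf.data.AUTOTUNE)

# model.fit(make_dataset(train_paths, train_labels),
#           steps_per_epoch=steps_per_epoch, epochs=10)
```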
Hi @b-nils, thanks. I can also agree with @VeeranjaneyuluToka that data processing is faster with tf.data.Dataset. However, with Sequence you can simply increase the queue size for the same purpose. I added the prefetch option. Actually, it seems like a good idea to implement data processing with pure tf.data.Dataset, without any additional queues. Then it is probably worth adding a simple generator to read images and moving all further processing into tf.data.Dataset (a rough sketch of this idea follows below). I will label the issue as a possible enhancement.
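A rough sketch of that idea, assuming a generic image-reading generator (raw_image_generator and build_pipeline are made-up names, not code from this repository):

```python
import tensorflow as tf

def raw_image_generator(image_paths):
    # Hypothetical minimal reader: only loads raw images from disk;
    # masks/boxes for Mask R-CNN would be yielded alongside in the same way.
    for path in image_paths:
        yield tf.io.decode_image(tf.io.read_file(path), channels=3).numpy()

def build_pipeline(image_paths, batch_size=2):
    ds = tf.data.Dataset.from_generator(
        lambda: raw_image_generator(image_paths),
        output_signature=tf.TensorSpec(shape=(None, None, 3), dtype=tf.uint8),
    )
    # All further processing (resize, normalization, augmentation) runs as
    # tf.data graph ops instead of inside a Python Sequence.
    ds = ds.map(
        lambda img: tf.image.resize(tf.cast(img, tf.float32) / 255.0, (512, 512)),
        num_parallel_calls=tf.data.AUTOTUNE,
    )
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)
```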
Hi @alexander-pv, just to understand a bit more: why did you make prefetch() an option in the config? Why can it not be the default behaviour?
Hi @VeeranjaneyuluToka, for now it seems to me that each task warrants its own setting for the Sequence queue size and the prefetch buffer size, as illustrated below.
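To make the two knobs concrete, here is an illustrative snippet (the constant names and values are assumptions, not this repository's config): the queue size applies to a keras Sequence through model.fit(max_queue_size=...), while the buffer size goes to tf.data.Dataset.prefetch():

```python
# Assumed, task-specific values; there is no single default that fits all setups.
QUEUE_SIZE = 10        # batches that Sequence workers keep prepared ahead of training
PREFETCH_BUFFER = 4    # batches tf.data keeps ready ahead of each training step

# Sequence path:
# model.fit(train_sequence, max_queue_size=QUEUE_SIZE, workers=4)

# tf.data path:
# dataset = dataset.prefetch(PREFETCH_BUFFER)
# model.fit(dataset, steps_per_epoch=steps_per_epoch, epochs=epochs)
```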
Great hack to make use of the tf.data.Dataset object:
maskrcnn_tf2/src/training.py, line 46 (commit 48f89c2)
I am curious whether you have noticed any performance loss or gain (in terms of training duration) in comparison to using TF1, multiprocessing, and tf.keras.Sequence?
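(For readers without the repository open: the exact code at that line is not reproduced here. One common pattern for this kind of wrap, sketched below with assumed names and a generic (inputs, targets) Sequence, is to feed the Sequence through tf.data.Dataset.from_generator and call repeat() when passing the resulting dataset to model.fit().)

```python
import tensorflow as tf

def sequence_to_dataset(sequence):
    """Wrap a keras.utils.Sequence-like object (assumed interface) into tf.data."""
    def gen():
        for i in range(len(sequence)):
            yield sequence[i]          # each item: one (inputs, targets) batch

    inputs, targets = sequence[0]      # peek one batch to infer shapes and dtypes
    return tf.data.Dataset.from_generator(
        gen,
        output_signature=(
            tf.TensorSpec(shape=(None,) + inputs.shape[1:], dtype=inputs.dtype),
            tf.TensorSpec(shape=(None,) + targets.shape[1:], dtype=targets.dtype),
        ),
    )

# ds = sequence_to_dataset(train_sequence)
# model.fit(ds.repeat(), steps_per_epoch=len(train_sequence), epochs=epochs)
```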