tf.data.Dataset vs. tf.keras.Sequence #13

b-nils · 2021-12-20T10:37:04Z

Great hack to make use of the tf.data.Dataset object:

Line 46 in 48f89c2

def make_gen_callable(_gen):

Have you noticed any performance loss or gain (in terms of training duration) compared with using TF1, multiprocessing, and tf.keras.Sequence?
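The body of `make_gen_callable` is not quoted in this thread, so the following is only a hypothetical pure-Python sketch of the general idea, not the repository's implementation. `tf.data.Dataset.from_generator` takes a callable that returns a generator, not a generator object. Wrapping a Sequence-like object in such a factory therefore lets each new pass over the dataset (for example, each epoch) start from the beginning. `ToySequence` is a made-up stand-in for a `tf.keras.utils.Sequence`.

```python
import math


class ToySequence:
    """Hypothetical stand-in for a tf.keras.utils.Sequence: len() batches, indexable."""

    def __init__(self, n_samples, batch_size):
        self.n_samples = n_samples
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches, counting a final partial batch
        return math.ceil(self.n_samples / self.batch_size)

    def __getitem__(self, idx):
        start = idx * self.batch_size
        stop = min(start + self.batch_size, self.n_samples)
        xs = list(range(start, stop))
        ys = [2 * x for x in xs]  # toy targets
        return xs, ys


def make_gen_callable(seq):
    """Hypothetical: wrap an indexable batch source in a zero-argument generator factory.

    Each call to the returned function starts a fresh pass over `seq`,
    which is the kind of callable tf.data.Dataset.from_generator expects.
    """

    def gen():
        for i in range(len(seq)):
            yield seq[i]

    return gen


gen_fn = make_gen_callable(ToySequence(n_samples=10, batch_size=4))
first_pass = list(gen_fn())
second_pass = list(gen_fn())  # a new generator, so all batches are produced again

# By contrast, a single generator object is exhausted after one pass
one_shot = gen_fn()
list(one_shot)
leftover = list(one_shot)  # nothing left
```

The factory indirection is the point here: if a generator object were captured once, a second epoch would see an empty stream.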

VeeranjaneyuluToka · 2021-12-20T13:37:42Z

@b-nils I have been using this repository to train COCO weights (without success so far), but I found that training with the tf.data.Dataset API is faster than with tf.keras.utils.Sequence: in my runs it was more than twice as fast. However, I have not tried the combination you mentioned (TF1, multiprocessing, and tf.keras.Sequence). Instead, I installed TF 2.4.3 and measured the difference in training time between the tf.data.Dataset and tf.keras.utils.Sequence APIs.

b-nils · 2021-12-21T08:44:05Z

@VeeranjaneyuluToka thanks for the insights! Do you think the performance might be further improved by applying preprocessing after tf.data.Dataset has been created and/or calling tf.data.Dataset.prefetch()?

VeeranjaneyuluToka · 2021-12-21T14:25:41Z

@b-nils It might. Before finding this repository, I tried to implement a data pipeline with the tf.data.Dataset API myself, ran into an issue, and solved it the same way this repository does. If you look at the implementation carefully, repeat() is called inline on the dataset when it is passed as an argument to model.fit(). I tried calling it before calling fit() instead, but that gave an error (I am not sure why). So I doubt that simply adding prefetch() will work out of the box, but I think it is worth experimenting to see whether it improves training time further. Let us know if that experiment succeeds.

alexander-pv · 2021-12-29T15:51:00Z

Hi, @b-nils, thanks,

I agree with @VeeranjaneyuluToka that data processing is faster with tf.data.Dataset. However, with a Sequence you can get a similar effect by simply increasing the queue size. I added a prefetch option to config.py for testing. Writing dataset.repeat().prefetch(buffer_size) is enough to make it work.
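A minimal sketch of the `repeat().prefetch()` pattern, assuming TF 2.4 or later (for `output_signature` and `tf.data.AUTOTUNE`). `ToySequence` and `make_gen_callable` are hypothetical stand-ins written for illustration, not this repository's code, and the tiny model exists only so `fit()` has something to train. Because `repeat()` with no argument makes the dataset endless, the sketch passes `steps_per_epoch` so Keras knows where each epoch ends.

```python
import numpy as np
import tensorflow as tf


class ToySequence:
    """Hypothetical Sequence-like loader: len() batches of (features, targets)."""

    def __init__(self, n_batches=8, batch_size=16, seed=0):
        rng = np.random.default_rng(seed)
        self.x = rng.normal(size=(n_batches * batch_size, 4)).astype("float32")
        self.y = self.x.sum(axis=1, keepdims=True).astype("float32")
        self.batch_size = batch_size

    def __len__(self):
        return len(self.x) // self.batch_size

    def __getitem__(self, idx):
        s = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        return self.x[s], self.y[s]


def make_gen_callable(seq):
    """Hypothetical helper: a zero-argument callable yielding each batch once."""

    def gen():
        for i in range(len(seq)):
            yield seq[i]

    return gen


seq = ToySequence()
dataset = tf.data.Dataset.from_generator(
    make_gen_callable(seq),
    output_signature=(
        tf.TensorSpec(shape=(None, 4), dtype=tf.float32),
        tf.TensorSpec(shape=(None, 1), dtype=tf.float32),
    ),
)
# Endless stream of batches; prefetch overlaps batch production with training steps
dataset = dataset.repeat().prefetch(tf.data.AUTOTUNE)

model = tf.keras.Sequential([tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(1)])
model.compile(optimizer="sgd", loss="mse")
# steps_per_epoch marks epoch boundaries on the repeated dataset
history = model.fit(dataset, epochs=2, steps_per_epoch=len(seq), verbose=0)
```

Using `tf.data.AUTOTUNE` as the buffer size lets the runtime pick it. A fixed integer works too if you want to control memory use explicitly.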

Actually, implementing data processing purely with tf.data.Dataset, without any additional queues, seems like a good option. In that case, it is probably worth having a simple generator that only reads images and moving all further processing into tf.data.Dataset. I will label this issue as a possible enhancement.
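A minimal sketch of that split, assuming TF 2.4 or later. A plain Python generator produces only raw images. Random `uint8` arrays of varying size stand in for decoded files here, since the repository's image reader is not shown. All preprocessing (scaling to [0, 1] and resizing) then runs in `Dataset.map` with `num_parallel_calls`, followed by batching and prefetching. `read_raw_images` and `preprocess` are hypothetical names, and the 32x32 target size is arbitrary.

```python
import numpy as np
import tensorflow as tf


def read_raw_images(n_images=6, seed=0):
    """Hypothetical reader: yields (uint8 image, label) pairs of varying size,
    standing in for a generator that only loads and decodes files."""
    rng = np.random.default_rng(seed)
    for i in range(n_images):
        h, w = rng.integers(20, 40, size=2)
        image = rng.integers(0, 256, size=(h, w, 3), dtype=np.uint8)
        yield image, i % 2


def preprocess(image, label):
    # Runs inside the tf.data pipeline, where map() can parallelize it
    image = tf.image.convert_image_dtype(image, tf.float32)  # uint8 -> [0, 1]
    image = tf.image.resize(image, (32, 32))
    return image, label


dataset = (
    tf.data.Dataset.from_generator(
        read_raw_images,
        output_signature=(
            tf.TensorSpec(shape=(None, None, 3), dtype=tf.uint8),
            tf.TensorSpec(shape=(), dtype=tf.int32),
        ),
    )
    .map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(4)
    .prefetch(tf.data.AUTOTUNE)
)

batches = list(dataset.as_numpy_iterator())
```

Resizing before `batch()` is what makes batching possible here, since the raw images have different heights and widths.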

VeeranjaneyuluToka · 2021-12-30T18:01:51Z

Hi @alexander-pv,

Just to understand a bit more: why did you make prefetch() an option in the config? Why can't it be the default behaviour?

alexander-pv · 2021-12-30T18:53:43Z

Hi, @VeeranjaneyuluToka,

For now, it seems to me that it is worth tuning the Sequence queue size and the prefetch buffer size for each specific task.
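A small sketch of keeping prefetching configurable per task. The `CONFIG` keys and `maybe_prefetch` are hypothetical, because the actual option names in the repository's config.py are not shown in this thread. On the Sequence side, the TF 2.x `Model.fit` accepted a `max_queue_size` argument for Sequence inputs, which plays the analogous role.

```python
import tensorflow as tf

# Hypothetical config values; the repository's real config.py keys may differ
CONFIG = {"prefetch": True, "prefetch_buffer": tf.data.AUTOTUNE}


def maybe_prefetch(dataset, config):
    """Apply prefetch only when enabled, with a per-task buffer size."""
    if not config.get("prefetch", False):
        return dataset  # leave the pipeline untouched
    return dataset.prefetch(config.get("prefetch_buffer", tf.data.AUTOTUNE))


ds = tf.data.Dataset.range(5)
prefetched = maybe_prefetch(ds, CONFIG)
```

Prefetching does not change the elements or their order, only how far ahead they are produced, so turning it on or off is safe to experiment with.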

alexander-pv added the question label (Further information is requested) on Dec 29, 2021

alexander-pv added the enhancement label (New feature or request) on Dec 29, 2021
