The COCO dataset #35

Open
bai-24 opened this issue Apr 5, 2023 · 10 comments

Comments

@bai-24

bai-24 commented Apr 5, 2023

Dear Author,
The COCO training set contains 82783 images, but the number of training samples in the code is 566435, which greatly increases training time. Why is that?

@davidnvq
Owner

davidnvq commented Apr 5, 2023

Thanks for asking. Where did you get the number 566435 in the code? By the way, every image in COCO has 5 captions, so I guess 566435 / 5 = 113287 images, which are the (train + trainval) images.
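The arithmetic in this reply can be checked with a short sketch. The constant names below are hypothetical and do not come from the repository; the figure of exactly 5 captions per image is the assumption stated in the reply, and the other two numbers are the ones quoted in this thread.

```python
# Figures quoted in this thread; the names are illustrative only.
NUM_PAIRS = 566435         # (image, caption) pairs reported during training
CAPTIONS_PER_IMAGE = 5     # assumption taken from the reply above
COCO_TRAIN_IMAGES = 82783  # image count quoted in the original question

# If every image contributes exactly 5 captions, the pair count divides evenly.
num_images, remainder = divmod(NUM_PAIRS, CAPTIONS_PER_IMAGE)

# Images beyond the 82783 training images, i.e. those from the additional split.
extra_images = num_images - COCO_TRAIN_IMAGES

print(num_images, remainder)  # 113287 0
print(extra_images)           # 30504
```

The zero remainder is consistent with the guess that 566435 counts caption pairs rather than images.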

@bai-24
Author

bai-24 commented Apr 18, 2023

Dear Author,
Uploading image.png…
The screenshot of the problem is shown in the figure above. The training data size is 566435, but I don't know what it represents.

@davidnvq
Owner

Thanks for reporting. Could you upload the screenshot again? I can't see it (it takes a few seconds for a picture to finish uploading on GitHub).
image

@bai-24
Author

bai-24 commented Apr 18, 2023

picture

@davidnvq
Owner

I still have no idea why it has 566435 iterations. Could you give me more information: the config.yaml, how many GPUs you used, the batch size, etc.?

@davidnvq
Owner

If you use batch_size = 1, then it may be correct, as there are 566435 (image, caption) pairs.

@bai-24
Author

bai-24 commented Apr 18, 2023

The number of GPUs I used is 1, and batch_size is 4.

@davidnvq
Owner

davidnvq commented Apr 18, 2023

Then could you check the dataloader yourself, or hard-code batch_size = 4 in your dataloader? I believe that if batch_size = 4, you will have 566435 / 4 iterations (about 141609). Something may be wrong here. If possible, could you send me your fork/code? I will check it tomorrow after I finish my work.
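The relationship between batch size and iteration count in this reply can be sketched as follows. This is a minimal illustration and not code from the repository: `iterations_per_epoch` is a hypothetical helper, and it assumes a single GPU, one batch per iteration, and a final partial batch that is kept unless `drop_last` is set.

```python
def iterations_per_epoch(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Return how many batches one pass over num_samples yields."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if drop_last:
        # The incomplete final batch is discarded.
        return num_samples // batch_size
    # Ceiling division: an incomplete final batch still counts as one iteration.
    return (num_samples + batch_size - 1) // batch_size


NUM_PAIRS = 566435  # (image, caption) pairs quoted in this thread

print(iterations_per_epoch(NUM_PAIRS, 1))                  # 566435
print(iterations_per_epoch(NUM_PAIRS, 4))                  # 141609
print(iterations_per_epoch(NUM_PAIRS, 4, drop_last=True))  # 141608
```

A progress bar that shows 566435 iterations therefore matches batch_size = 1, not batch_size = 4.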

@bai-24
Author

bai-24 commented Apr 18, 2023

I think I found the reason: the batch_size I set in the code is 1. Thank you very much for your help.

@Wangdanchunbufuz

The number of GPUs I used is 1, batch_size is 4

How long does training take with your settings?
