Some questions about the Version of DataComp1B used by Recap-DataComp1B #14

Open
XavierHeart opened this issue Jul 10, 2024 · 0 comments

Comments

@XavierHeart

Hello, thanks for the great work. I've noticed that the org_caption in the Recap-DataComp1B dataset sometimes differs slightly from the original DataComp1B text, although the meaning is similar. For example, for https://d2xkh9za9v22a1.cloudfront.net/1551975881_17272285_ADMIN_UPLOAD-list.jpg, the original text is "Piggly Wiggly_CLR® Mold & Mildew Stain Remover_coupon_44993", but the org_caption in Recap-DataComp1B is "Shurfine_CLR\xc2\xae Mold & Mildew Stain Remover_coupon_44993". There are many other such cases, which I have not listed here. I'm wondering whether Recap-DataComp1B uses a different version of DataComp1B, or whether the raw data was processed further.
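
To separate the encoding difference from the actual text difference in the example above, here is a minimal Python sketch. It assumes the org_caption is stored with literal `\xc2\xae` escapes (the UTF-8 bytes of "®"), which may not be how the dataset actually stores it; `unescape_utf8_bytes` is a hypothetical helper, not part of either dataset's tooling.

```python
def unescape_utf8_bytes(text: str) -> str:
    """Turn literal '\\xNN' escapes that spell UTF-8 bytes back into characters.

    Assumption: the input is otherwise plain ASCII/Latin-1 text.
    """
    # Interpret the escapes as single code points, then reinterpret those
    # code points as raw bytes and decode them as UTF-8.
    as_bytes = text.encode("latin-1").decode("unicode_escape").encode("latin-1")
    return as_bytes.decode("utf-8")


recap_caption = r"Shurfine_CLR\xc2\xae Mold & Mildew Stain Remover_coupon_44993"
raw_caption = "Piggly Wiggly_CLR® Mold & Mildew Stain Remover_coupon_44993"

decoded = unescape_utf8_bytes(recap_caption)

# Compare the part before and after the first underscore separately.
same_prefix = decoded.split("_", 1)[0] == raw_caption.split("_", 1)[0]
same_suffix = decoded.split("_", 1)[1] == raw_caption.split("_", 1)[1]
print(decoded, same_prefix, same_suffix)
```

Under that assumption, the "®" mismatch is only an encoding artifact, and the remaining difference is the store name at the start of the caption.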

Also, for Recap-Datacomp1B the url in the dataset is different from the original url, but points to the same image. This is what puzzles me, for example for this image: https://cf-img.autorevo.com/2004-porsche-911-flowery-branch-ga-6697751/476x476/2330022-6-revo.jpg?1576091095, https://cf-img.autorevo.com/2004-porsche-911-flowery-branch-ga-6697751/476x476/2330022-6-revo.jpg?1576092120, which are from recap and raw respectively datacomp1B of the same item, corresponding to the same sha256 index.
