We are looking to test our submission on a 100M-scale filtered dataset, and would be happy to integrate it into datasets.py if the embeddings and metadata were added to the domain the dataset currently downloads from. We would prepare them ourselves, but the dataset-preparation file refers to an external script for generating the metadata, and we do not have the full set of CLIP descriptors.

@mdouze could you make the full 100M vector dataset available where the 10M subset is hosted, at https://dl.fbaipublicfiles.com/billion-scale-ann-benchmarks/yfcc100M/?
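To make the ask concrete, here is a rough sketch of the kind of entry we would add. The class name, file names, and download helper are placeholders rather than the actual datasets.py API, and the u8bin layout assumed here (two 32-bit header fields, then uint8 vector data) is our reading of the existing 10M files:

```python
import os
import urllib.request

import numpy as np

# Base URL where the existing 10M subset is hosted; the 100M files are
# assumed to be published alongside it.
BASE_URL = "https://dl.fbaipublicfiles.com/billion-scale-ann-benchmarks/yfcc100M/"


def download(fn, dest_dir="data/yfcc100M"):
    """Fetch a file from BASE_URL into dest_dir unless it is already present."""
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, fn)
    if not os.path.exists(dest):
        urllib.request.urlretrieve(BASE_URL + fn, dest)
    return dest


class YFCC100MFilteredDataset:
    """Illustrative 100M-scale filtered dataset entry (not the real datasets.py class)."""

    def prepare(self):
        # Hypothetical file names modelled on the 10M subset; they would only
        # exist once the full CLIP descriptors and metadata are published.
        self.base_fn = download("base.100M.u8bin")
        self.meta_fn = download("base.metadata.100M.spmat")

    def get_dataset(self):
        # Assumed u8bin layout: two int32 header fields (number of vectors,
        # dimension) followed by n*d uint8 values; memory-map the data rather
        # than reading everything into RAM at this scale.
        with open(self.base_fn, "rb") as f:
            n, d = (int(x) for x in np.fromfile(f, dtype=np.int32, count=2))
        return np.memmap(self.base_fn, dtype=np.uint8, mode="r",
                         offset=8, shape=(n, d))
```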
I could probably do it, but it would still be smaller than 100M (around 90M) because of missing images, videos, etc.
LMK if this is of interest to you.
If we're adding other sub-100M filter datasets, do you think we should try to standardize on a round number like 50M or just subset as needed for comparison?
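Either way, cutting a round-number subset out of the base file would be cheap once the full descriptors are available. A rough sketch, assuming the usual u8bin layout; file names are placeholders, and the per-vector metadata would need the same slicing:

```python
import numpy as np


def subset_u8bin(src, dst, n_keep, chunk=1_000_000):
    """Copy the first n_keep vectors of a u8bin file (two int32 header fields,
    then uint8 vector data) into dst, streaming in chunks to bound memory use."""
    with open(src, "rb") as fin:
        n, d = (int(x) for x in np.fromfile(fin, dtype=np.int32, count=2))
        n_keep = min(n_keep, n)
        with open(dst, "wb") as fout:
            np.array([n_keep, d], dtype=np.int32).tofile(fout)
            remaining = n_keep
            while remaining > 0:
                rows = min(chunk, remaining)
                np.fromfile(fin, dtype=np.uint8, count=rows * d).tofile(fout)
                remaining -= rows


# e.g. cut a (hypothetical) 90M base file down to a round 50M subset:
# subset_u8bin("base.90M.u8bin", "base.50M.u8bin", 50_000_000)
```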