[interleave_dataset] sample batches from a single source at a time #7122

Open
memray opened this issue Aug 23, 2024 · 0 comments
Labels
enhancement New feature or request


memray commented Aug 23, 2024

Feature request

interleave_datasets and RandomlyCyclingMultiSourcesExamplesIterable let us sample individual examples from different sources. Could we also sample batches in a similar manner, so that each batch contains data from only one source?
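A minimal sketch of the requested behavior, in plain Python rather than the library's internals. The helper name `iter_single_source_batches` and its parameters are hypothetical and not part of the `datasets` API; it also assumes that a source's final batch may be shorter than `batch_size` and that iteration stops once every source is exhausted, which may not match the stopping strategy a real implementation would want.

```python
import random
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Optional, Sequence


def iter_single_source_batches(
    sources: Dict[str, Iterable[dict]],
    batch_size: int,
    probabilities: Optional[Sequence[float]] = None,
    seed: int = 0,
) -> Iterator[List[dict]]:
    """Yield batches whose examples all come from one randomly chosen source.

    Hypothetical helper for illustration only, not a `datasets` function.
    """
    rng = random.Random(seed)
    names = list(sources)
    # Uniform sampling over sources unless explicit weights are given.
    weights = list(probabilities) if probabilities is not None else [1.0] * len(names)
    iterators = {name: iter(src) for name, src in sources.items()}

    while names:
        # Pick the source for this batch, then draw the whole batch from it.
        idx = rng.choices(range(len(names)), weights=weights, k=1)[0]
        name = names[idx]
        batch = list(islice(iterators[name], batch_size))
        if not batch:
            # This source is exhausted: stop sampling from it.
            del names[idx]
            del weights[idx]
            continue
        yield batch


if __name__ == "__main__":
    demo_sources = {
        "a": [{"source": "a", "id": i} for i in range(5)],
        "b": [{"source": "b", "id": i} for i in range(3)],
    }
    for batch in iter_single_source_batches(demo_sources, batch_size=2):
        print([(ex["source"], ex["id"]) for ex in batch])
```

The key difference from example-level interleaving is that the source is chosen once per batch rather than once per example.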

Motivation

Some recent research [1, 2] shows that source-homogeneous batching can be helpful for contrastive learning. Could we add a RandomlyCyclingMultiSourcesBatchesIterable to support this?

Your contribution

I can contribute a PR, but I am not sure what the best way is to test its correctness and robustness.
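One possible starting point for such a test is a property check over the produced batches: no batch mixes sources, and no example is lost or duplicated. The sketch below is only an assumption about how that could look. The function name `check_single_source_batches` is hypothetical, and it assumes every example carries a `source` field (for instance, a column added to each dataset before interleaving).

```python
from collections import Counter
from typing import Dict, Iterable, List


def check_single_source_batches(
    batches: Iterable[List[dict]],
    expected_counts: Dict[str, int],
    source_key: str = "source",
) -> Counter:
    """Assert that every batch is source-homogeneous and that per-source totals match.

    Hypothetical test helper for illustration; not part of `datasets`.
    """
    seen: Counter = Counter()
    for i, batch in enumerate(batches):
        batch_sources = {example[source_key] for example in batch}
        # An empty batch has zero sources and fails this check as well.
        assert len(batch_sources) == 1, (
            f"batch {i} does not come from exactly one source: {sorted(batch_sources)}"
        )
        seen.update(example[source_key] for example in batch)
    # Every example should appear exactly once across all batches.
    assert dict(seen) == dict(expected_counts), (
        f"per-source counts {dict(seen)} differ from expected {dict(expected_counts)}"
    )
    return seen


if __name__ == "__main__":
    good = [[{"source": "a"}, {"source": "a"}], [{"source": "b"}]]
    print(check_single_source_batches(good, {"a": 2, "b": 1}))
```

Running the same check across several seeds and batch sizes, including a batch size larger than the smallest source, would also exercise the exhaustion edge cases.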

memray added the enhancement label Aug 23, 2024