[FEATURE] Sparse Models Pretraining Code & Data #423

Open
contrebande-labs opened this issue Nov 4, 2024 · 6 comments
Labels
enhancement New feature or request

Comments

@contrebande-labs

Is your feature request related to a problem?
I could not find any information on the sparse neural search models hosted on Hugging Face. What is their architecture? Are they multilingual? And most importantly, how were they pre-trained? We would like to pretrain our own models on our own data, with the architectures that best suit our needs.

What solution would you like?
I would like to have access to the pretraining code and data.

contrebande-labs added the enhancement and untriaged labels on Nov 4, 2024
@mingshl
Contributor

mingshl commented Nov 5, 2024

@xinyual can you please help take a look at this issue?

mingshl removed the untriaged label on Nov 5, 2024
@dhrubo-os
Collaborator

@zhichao-aws take a look?

@contrebande-labs
Author

@dhrubo-os if you trained the models, can you publish the training code and data? thanks!

@zhichao-aws
Member

Hi @contrebande-labs, the paper for the sparse model is now public: https://arxiv.org/abs/2411.04403

The training code and data are still under review.

@zhichao-aws
Member

> @dhrubo-os if you trained the models, can you publish the training code and data? thanks!

Hi @contrebande-labs, we have published the code for fine-tuning the model (repo link). It can also be used to train a sparse model from scratch. You can reproduce the results by following the training-data generation process described in the paper.

We also aim to release the training data we generated, but we are not sure whether doing so complies with the licenses of all the datasets used, and it is still under review.

@contrebande-labs
Author

Hi @zhichao-aws! Thanks so much. We are currently evaluating and benchmarking many sparse and late-interaction models on our own data, and we will look into it in the next couple of days. Please leave this issue open until the data is published or there are new models trained on open data.
