
Which pre-trained model should we use for fine-tuning? #36

Open
Aniruddha-JU opened this issue Aug 25, 2022 · 2 comments

Comments

@Aniruddha-JU

I have pre-trained the IndicBART model on new monolingual data, and two models are saved in the model path: 1) IndicBART and 2) IndicBART_puremodel. Which one should we use during fine-tuning?

@Aniruddha-JU
Author

The IndicBART checkpoint is 2.4 GB and the pure_model checkpoint is 932 MB.

@prajdabre
Owner

Either.

Use the pure model with the flag --pretrained_model.

Use the larger model with the flag --pretrained_model and the additional flag --no_reload_optimizer_ctr_and_scheduler.

The larger checkpoint contains the optimizer and scheduler states so that you can resume pretraining in case of a crash. During fine-tuning, resetting the optimizer is more common.
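
If you want to verify this yourself, loading both files and printing their top-level structure makes the difference visible. The sketch below is not this repo's own loading code; it assumes the common PyTorch convention of saving the full checkpoint as a dict of state dicts and the pure model as bare weights, and the key names shown are guesses based on the flag name:

```python
# Minimal sketch (not this repo's code) to inspect the two checkpoints.
# Assumes the usual PyTorch convention: the full checkpoint is a dict of
# state dicts, the pure model is just the weights. Key names are guesses.
import torch

full = torch.load("IndicBART", map_location="cpu")
pure = torch.load("IndicBART_puremodel", map_location="cpu")

# For the full checkpoint, expect something like
# ['model', 'optimizer', 'scheduler', 'ctr'] (names assumed).
print(list(full.keys()) if isinstance(full, dict) else type(full))

# The pure model should be only the weights (a state_dict of tensors).
print(type(pure))

# Size sanity check: Adam keeps two extra tensors (exp_avg, exp_avg_sq)
# per parameter, so weights + optimizer state can approach 3x the weights
# alone, which is roughly in line with 932 MB vs 2.4 GB here.
```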
