Many thanks for releasing the code and models for MosaicBERT! I really appreciate the effort you put into modernizing the BERT architecture.
I am interested in pretraining MosaicBERT myself, so I have some questions :)
Specifically, I am interested in the pretraining configuration for the model with 512 sequence length. Do you have hardware recommendations and an approximate pretraining time for MosaicBERT at 512 sequence length? Did you use the phase 1 + phase 2 "trick", i.e. pretraining at 128 sequence length first and then training for fewer steps at 512? In that case, the MosaicBERT checkpoint trained at 128 sequence length could be "recycled"; a rough sketch of what I mean is below.
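To make the question concrete, here is a minimal sketch of the two-phase idea using plain Hugging Face `transformers` with a vanilla BERT (not the MosaicBERT training code); the checkpoint path and the elided training loops are placeholders:

```python
# Sketch of the phase 1 / phase 2 "trick" with a plain Hugging Face BERT.
# Phase 1 trains on inputs truncated to 128 tokens; phase 2 recycles the
# same weights and continues for fewer steps with 512-token inputs.
from transformers import BertConfig, BertForMaskedLM

# Phase 1: the config already allows 512 positions (as in the original BERT
# recipe), so inputs are simply truncated to 128 tokens during this phase
# and no embedding resizing is needed later.
config = BertConfig(max_position_embeddings=512)
model = BertForMaskedLM(config)
# ... run MLM pretraining with 128-token sequences, then save the checkpoint ...
model.save_pretrained("phase1-seq128")  # placeholder path

# Phase 2: warm-start from the phase-1 checkpoint and continue with
# 512-token sequences for the remaining (fewer) steps.
model = BertForMaskedLM.from_pretrained("phase1-seq128")
# ... run the remaining MLM steps with 512-token sequences ...
```

Is this roughly how you would set it up for MosaicBERT, or do you train directly at 512 from scratch?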
I'm also interested in which implementation is recommended, e.g. a specific tagged commit or the upcoming Modernize MosaicBERT PR (#440).