Hi,
did you sample each dataset (Wikipedia, Common Crawl, Subtitles etc.) equally during German-BERT training?
OpenAI uses unequal sampling, which may lead to better results, as stated in the GPT-3 paper:
Note that during training, datasets are not sampled in proportion to their size, but rather datasets we view as higher-quality are sampled more frequently, such that CommonCrawl and Books2 datasets are sampled less than once during training, but the other datasets are sampled 2-3 times. This essentially accepts a small amount of overfitting in exchange for higher quality training data.
If yes, which parameters did you use?
I didn't use a specific sampling method (so all parts are sampled equally). But I think this could be interesting for future work, e.g. to see the effects on downstream tasks :)
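For reference, here is a minimal sketch of what such a weighted corpus mixture could look like. The corpus names, document lists and weights are purely illustrative and are not the values used for German-BERT or GPT-3:

```python
import random

# Hypothetical mixture weights: "higher-quality" corpora get a larger
# sampling probability than their share of the total document count.
# Names and numbers below are illustrative only.
corpora = {
    "wikipedia":    {"docs": [f"wiki_doc_{i}" for i in range(5)],  "weight": 3.0},
    "subtitles":    {"docs": [f"sub_doc_{i}" for i in range(5)],   "weight": 2.0},
    "common_crawl": {"docs": [f"cc_doc_{i}" for i in range(20)],   "weight": 0.5},
}

def sample_batch(corpora, batch_size, rng=random):
    """Draw a batch by first picking a corpus according to its weight,
    then picking a document uniformly from that corpus."""
    names = list(corpora)
    weights = [corpora[name]["weight"] for name in names]
    batch = []
    for _ in range(batch_size):
        name = rng.choices(names, weights=weights, k=1)[0]
        batch.append(rng.choice(corpora[name]["docs"]))
    return batch

print(sample_batch(corpora, batch_size=8))
```

With weights like these, the smaller high-quality corpora are revisited several times per epoch while the large web-crawl corpus is seen less than once, which matches the trade-off described in the GPT-3 quote above.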