Describe the bug
This is not a bug but a question. What is the difference between data_download.sh and create_dataset_from scratch.sh? README.md suggests create_dataset_from scratch.sh as the way to download and preprocess the data, and doesn't mention data_download.sh at all.
As I understand it, in addition to downloading Wikipedia, data_download.sh also downloads BookCorpus for pre-training. So what is the reason for not using data_download.sh to prepare the pre-training data?
Related to BERT/PyTorch