The models are not yet publicly available because of the double-blind review process. For the final version, this repository will link to the models on Hugging Face.
'BERTPreTraining.py' expects the data path to point to a directory that contains the documents in .parquet format. This code can be changed to suit one's requirements.
All other scripts expect the data path to point to a single CSV file with the respective data.
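
The two kinds of data path described above can be illustrated with a minimal sketch. It is not taken from the repository: the function names are hypothetical, and the use of pandas (with a parquet engine such as pyarrow installed) is an assumption about how the data might be read.

```python
from pathlib import Path


def find_parquet_files(data_dir):
    """Return the sorted .parquet files located directly inside data_dir.

    Hypothetical helper, not part of the repository.
    """
    root = Path(data_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"{root} is not a directory")
    files = sorted(root.glob("*.parquet"))
    if not files:
        raise FileNotFoundError(f"no .parquet files found in {root}")
    return files


def load_pretraining_documents(data_dir):
    """Read every .parquet file in data_dir into one DataFrame.

    Assumes pandas plus a parquet engine (e.g. pyarrow) is installed.
    """
    import pandas as pd  # imported lazily so the path check works without pandas

    frames = [pd.read_parquet(path) for path in find_parquet_files(data_dir)]
    return pd.concat(frames, ignore_index=True)


def load_csv_data(csv_path):
    """Read the single CSV file expected by the other scripts (assumed layout)."""
    import pandas as pd

    path = Path(csv_path)
    if path.suffix.lower() != ".csv":
        raise ValueError(f"expected a .csv file, got {path.name}")
    return pd.read_csv(path)
```

Under these assumptions, a directory path would go to `load_pretraining_documents` and a file path to `load_csv_data`; the column names inside the files depend on the actual data and are not assumed here.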
The Python environment can be set up using the requirements.txt file.