apaladugu3/594-Humor
1. Go to the BERT repository and download the model you want to use for generating embeddings; there is a choice of 768- or 1024-dimensional embeddings. If you use 768, you may have to change the embedding dimensions in the CNN. Many file paths are hard-coded across these files, so download everything first and update the paths to make things easier to work with. Also check that you have the vocab file, which is easy to miss and is available online alongside the model.
2. After downloading BERT, use the filtered input and filtered inputn files to generate BERT vectors by running:

```
python3 extract_features.py --input_file=filteredinput.txt --output_file=output.json --vocab_file=vocab.txt --bert_config_file=bert_config.json --init_checkpoint=bert_model.ckpt --layers=-1 --max_seq_length=128 --batch_size=8
```

Make sure that `--layers` in the above command is always `-1`.

3. Create a new folder where you want to store the data, matching the paths you chose.
4. Run final_cleansing.py and final_positive.py, specifying as inputs whether the data is positive or negative, followed by the name of the file with the BERT vector embeddings.
5. Remove positiveid, negativeid, and vocabcnn, then run vocab_generator.py.
6. Run train.py to run the CNN module on the positive id file and the negative id file. The CNN needs the vocabulary file generated by vocab_generator. train.py uses text_cnn.py and datahelpers.py to run.

If you want some backtracking capability, such as converting a tensor back into sentences, look at Bert_tokens.py. For any other questions, please contact me at [email protected].
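As a rough illustration of what the scripts in step 4 consume, here is a minimal sketch of reading one line of `extract_features.py` output and averaging the layer `-1` token vectors into a single sentence embedding. The JSON layout below assumes the standard google-research/bert output format (one JSON object per input line, with per-token `layers` entries); the sample record is synthetic, not real model output.

```python
import json

# Synthetic stand-in for one line of output.json (toy 3-dim vectors;
# real BERT vectors would be 768- or 1024-dimensional).
sample_line = json.dumps({
    "linex_index": 0,
    "features": [
        {"token": "[CLS]", "layers": [{"index": -1, "values": [0.1, 0.2, 0.3]}]},
        {"token": "funny", "layers": [{"index": -1, "values": [0.4, 0.5, 0.6]}]},
        {"token": "[SEP]", "layers": [{"index": -1, "values": [0.7, 0.8, 0.9]}]},
    ],
})

def sentence_embedding(json_line):
    """Average the layer -1 vectors of all tokens into one sentence vector."""
    record = json.loads(json_line)
    # Since extract_features.py was run with --layers=-1, each token has
    # exactly one entry in its "layers" list.
    vectors = [tok["layers"][0]["values"] for tok in record["features"]]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

emb = sentence_embedding(sample_line)
print(len(emb))  # dimensionality of the sentence vector (3 in this toy example)
```

The per-token vectors could equally be kept separate and padded to `max_seq_length` for the CNN; averaging is just one simple way to collapse them into a fixed-size sentence representation.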