
apaladugu3/594-Humor


Go to the BERT repository and download the model you want to use to generate embeddings. There is a choice of hidden sizes: 768 (BERT-Base) or 1024 (BERT-Large). Depending on which you pick, you may have to change the embedding dimensions in the CNN to match.
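A minimal sketch of the point above: the CNN's embedding dimension must equal the hidden size of the BERT model you downloaded. The variable name `embedding_dim` is an assumption; adjust whatever the corresponding hyperparameter is actually called in the CNN code.

```python
# Hidden size of the downloaded BERT model:
# 768 for BERT-Base, 1024 for BERT-Large.
BERT_HIDDEN_SIZE = 768

# Hypothetical name for the CNN's embedding-dimension hyperparameter;
# it must match the BERT hidden size or the tensor shapes will not line up.
embedding_dim = BERT_HIDDEN_SIZE

assert embedding_dim in (768, 1024)
```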

Many file paths are hard-coded across these files, so after downloading everything, update the paths first to make the rest of the workflow easier. You may have missed downloading a vocab file, which is also available online, so make sure you have one.
After downloading BERT, use the filteredinput and filteredinputn files to generate BERT vectors by running the following command:

python3 extract_features.py \
  --input_file=filteredinput.txt \
  --output_file=output.json \
  --vocab_file=vocab.txt \
  --bert_config_file=bert_config.json \
  --init_checkpoint=bert_model.ckpt \
  --layers=-1 \
  --max_seq_length=128 \
  --batch_size=8

Make sure that --layers is always set to -1 in the command above, so that only the final layer's embeddings are extracted.
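For reference, extract_features.py writes one JSON object per input line, where each token carries one layer entry (index -1) holding its final-layer vector. The sketch below parses a tiny synthetic record in that shape (real vectors have 768 or 1024 values, not 3):

```python
import json

# Synthetic example of one line of output.json from extract_features.py,
# run with --layers=-1 (one layer entry per token).
record = json.loads(
    '{"linex_index": 0, "features": ['
    '{"token": "[CLS]", "layers": [{"index": -1, "values": [0.1, -0.2, 0.3]}]},'
    '{"token": "hello", "layers": [{"index": -1, "values": [0.4, 0.5, -0.6]}]}'
    ']}'
)

# Collect the final-layer vector for every token in the sentence.
vectors = {f["token"]: f["layers"][0]["values"] for f in record["features"]}
print(vectors["hello"])  # -> [0.4, 0.5, -0.6]
```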

Create a new folder to store the data, matching the paths you chose earlier.

Run final_cleansing.py and final_positive.py, specifying as inputs whether the data is positive or negative, followed by the name of the file containing the BERT vector embeddings.

Remove positiveid, negativeid, and vocabcnn, then run vocab_generator.py to regenerate them.
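The repository does not document what vocab_generator.py does internally, but a vocabulary generator typically assigns each distinct token an integer id. The sketch below is only an illustration of that general idea, not the actual script:

```python
from collections import Counter

# Toy corpus standing in for the filtered input sentences.
sentences = ["this joke is funny", "this joke is not funny"]

# Count tokens so frequent words get the smallest ids.
counts = Counter(tok for s in sentences for tok in s.split())

# Ids 0 and 1 are commonly reserved for padding and unknown tokens.
vocab = {"<pad>": 0, "<unk>": 1}
for tok, _ in counts.most_common():
    vocab[tok] = len(vocab)

# Map a new sentence to ids; unseen words fall back to <unk>.
ids = [vocab.get(t, vocab["<unk>"]) for t in "this joke is great".split()]
```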

Then run train.py to train the CNN on the positive id and negative id files. The CNN needs the vocabulary file generated by vocab_generator.py.
train.py depends on text_cnn.py and datahelpers.py to run.

If you want some backtracking capability, such as converting a tensor back into sentences, look at Bert_tokens.py.
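A hedged sketch of that kind of backtracking, under the assumption that Bert_tokens.py inverts the vocab and rejoins WordPiece pieces (BERT marks continuation pieces with a "##" prefix); the tiny vocab below is illustrative, not the real vocab.txt:

```python
# Toy id->token mapping; a real vocab.txt has ~30k entries.
vocab = {"[CLS]": 101, "hum": 1, "##or": 2, "is": 3, "fun": 4, "[SEP]": 102}
inv = {i: t for t, i in vocab.items()}

def ids_to_sentence(ids):
    """Convert token ids back to text, merging WordPiece continuations."""
    words = []
    for tok in (inv[i] for i in ids):
        if tok in ("[CLS]", "[SEP]"):
            continue  # drop BERT's special markers
        if tok.startswith("##") and words:
            words[-1] += tok[2:]  # attach continuation to previous piece
        else:
            words.append(tok)
    return " ".join(words)

print(ids_to_sentence([101, 1, 2, 3, 4, 102]))  # -> "humor is fun"
```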

For any other questions, please contact me at [email protected]
