This work was done as a part of CS685.
Author: Daivik Swarup
Download data from here
Split data into train, test, val splits:
python preprocess.py
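The internals of preprocess.py are not described here. As a rough illustration of a three-way split, the sketch below shuffles a list of items with a fixed seed and cuts it into train, val, and test parts. The function name `split_items`, the 80/10/10 fractions, and the seed are assumptions made for illustration, not values taken from the script.

```python
import random


def split_items(items, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle items deterministically and cut them into train/val/test lists.

    The fractions and seed are illustrative assumptions, not the values
    used by preprocess.py.
    """
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split reproducible
    n_train = int(len(items) * train_frac)
    n_val = int(len(items) * val_frac)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # whatever remains goes to test
    return train, val, test
```

Fixing the seed means repeated runs produce the same splits, which keeps later evaluation numbers comparable.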
For classification, create thresholded text files:
python preprocess_threshold.py <PATH-TO-TRAIN-DIR> train_80_20.txt
python preprocess_threshold.py <PATH-TO-VAL-DIR> val_80_20.txt
python preprocess_threshold.py <PATH-TO-TEST-DIR> test_80_20.txt
Then train the binary classifier:
python binary_classification.py <VECTORIZER> output.pkl
<VECTORIZER> can be one of {'tfidf', 'count', 'tfidf_length', 'count_length', 'bert'}
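To give a sense of what the simpler vectorizer options might compute, here is a minimal pure-Python sketch. It assumes that 'count' means bag-of-words counts, that 'tfidf' means counts weighted by inverse document frequency, and that the '_length' variants append the document length as an extra feature. These readings are inferred from the option names only; the helper names below are hypothetical, and the scripts may use a library implementation with different tokenization and smoothing.

```python
import math
from collections import Counter


def count_vectors(docs):
    """Return (vocab, rows) where each row holds per-token counts for one doc."""
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    rows = []
    for doc in docs:
        counts = Counter(doc.split())
        rows.append([counts[tok] for tok in vocab])
    return vocab, rows


def tfidf_vectors(docs):
    """Weight raw counts by the unsmoothed idf, log(N / document frequency)."""
    vocab, rows = count_vectors(docs)
    n_docs = len(docs)
    doc_freq = [sum(1 for row in rows if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / df) for df in doc_freq]
    return vocab, [[c * w for c, w in zip(row, idf)] for row in rows]


def with_length(docs, rows):
    """Append the whitespace-token count of each doc as one extra feature."""
    return [row + [len(doc.split())] for doc, row in zip(docs, rows)]
```

With the unsmoothed idf used here, a token that appears in every document gets weight zero; library implementations often smooth the idf term, so their values will differ.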
For the LSTM model:
python train_lstm.py
To train the RankNet model:
python train_ranknet.py <VECTORIZER> model.pt
<VECTORIZER> can be one of {'tfidf', 'count', 'tfidf_length', 'count_length', 'bert'}
For the LSTM model:
python train_ranknet_lstm.py
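For readers unfamiliar with RankNet, the standard pairwise objective can be sketched as follows. For two items where the first should rank above the second, the model's scores s_hi and s_lo give a probability sigmoid(s_hi - s_lo), and the loss is its negative log. This is the textbook formulation written in plain Python; the function name is hypothetical, and the training scripts may differ in details such as batching or how pairs are sampled.

```python
import math


def ranknet_pair_loss(score_hi, score_lo):
    """Pairwise RankNet loss: -log(sigmoid(score_hi - score_lo)).

    score_hi is the model score of the item that should rank higher.
    Computed as softplus(-diff) in a numerically stable form.
    """
    diff = score_hi - score_lo
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    # For negative diff, rewrite to avoid overflow in exp(-diff).
    return -diff + math.log1p(math.exp(diff))
```

The loss is log(2) when both scores are equal, approaches zero as the preferred item's score pulls ahead, and grows roughly linearly when the pair is ordered the wrong way.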
Scripts in the misc directory are self-explanatory.