News Classifier

Extract a dataset (article text, section/tags) from The Guardian API and build a vocabulary used to load the data.
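The extraction step can be sketched as follows. This is a minimal sketch, not the project's own code. It assumes the public Guardian Content API search endpoint (`https://content.guardianapis.com/search`) and its `show-fields=bodyText` and `show-tags=keyword` options. The helper names `build_search_url`, `parse_search_response` and `fetch_page` are hypothetical, as is the choice of tag `id` values as labels.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the public Guardian Content API search endpoint.
GUARDIAN_SEARCH_URL = "https://content.guardianapis.com/search"


def build_search_url(api_key, page=1, page_size=50):
    """Build a search URL that requests plain body text and keyword tags."""
    params = {
        "api-key": api_key,
        "show-fields": "bodyText",  # plain-text article body
        "show-tags": "keyword",     # keyword tags attached to each article
        "page": page,
        "page-size": page_size,
    }
    return f"{GUARDIAN_SEARCH_URL}?{urlencode(params)}"


def parse_search_response(payload):
    """Turn a decoded search response into (text, section, tags) records."""
    records = []
    for item in payload["response"]["results"]:
        text = item.get("fields", {}).get("bodyText", "")
        if not text:
            continue  # skip results that have no body text
        tags = [tag["id"] for tag in item.get("tags", [])]
        records.append({"text": text, "section": item["sectionId"], "tags": tags})
    return records


def fetch_page(api_key, page=1):
    """Hypothetical helper: download one results page and parse it (needs network access)."""
    with urlopen(build_search_url(api_key, page=page)) as resp:
        return parse_search_response(json.load(resp))
```

Keeping the URL building and JSON parsing separate from the network call makes the parsing easy to check against a saved response.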
Train an EmbeddingBag classifier with a linear output layer (PyTorch).
TODO: output test-set results for visualisation and evaluation in Tableau.
TODO: create a validation set to use during training for hyperparameter tuning.
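To show what an EmbeddingBag classifier with a linear output layer computes, here is a plain-Python sketch of the forward pass. It is an illustration, not the project's model, and it does no training. It assumes mean pooling, which is PyTorch's default `mode="mean"` for `nn.EmbeddingBag`. Like PyTorch, it takes a flat list of token ids plus an `offsets` list marking where each document starts. The class name, random initialisation and sizes are hypothetical.

```python
import random


class BagOfEmbeddingsClassifier:
    """Plain-Python sketch of EmbeddingBag(mode="mean") followed by a linear layer."""

    def __init__(self, vocab_size, embed_dim, num_classes, seed=0):
        rng = random.Random(seed)
        self.embed_dim = embed_dim
        # Embedding table: one embed_dim vector per vocabulary id.
        self.embeddings = [[rng.uniform(-0.5, 0.5) for _ in range(embed_dim)]
                           for _ in range(vocab_size)]
        # Linear output layer: one weight row and one bias per class.
        self.weights = [[rng.uniform(-0.5, 0.5) for _ in range(embed_dim)]
                        for _ in range(num_classes)]
        self.bias = [0.0] * num_classes

    def bag_means(self, token_ids, offsets):
        """Mean-pool each bag; offsets[i] is where document i starts in token_ids."""
        ends = list(offsets[1:]) + [len(token_ids)]
        pooled = []
        for start, end in zip(offsets, ends):
            bag = [self.embeddings[t] for t in token_ids[start:end]]
            if not bag:
                pooled.append([0.0] * self.embed_dim)  # empty bag -> zero vector
                continue
            pooled.append([sum(vec[d] for vec in bag) / len(bag)
                           for d in range(self.embed_dim)])
        return pooled

    def forward(self, token_ids, offsets):
        """Return one row of class logits per document."""
        return [[sum(w * x for w, x in zip(row, doc)) + b
                 for row, b in zip(self.weights, self.bias)]
                for doc in self.bag_means(token_ids, offsets)]

    def predict(self, token_ids, offsets):
        """Index of the highest-scoring class for each document."""
        return [max(range(len(logits)), key=logits.__getitem__)
                for logits in self.forward(token_ids, offsets)]
```

In PyTorch the same structure is `nn.EmbeddingBag(vocab_size, embed_dim, mode="mean")` followed by `nn.Linear(embed_dim, num_classes)`, usually trained with `nn.CrossEntropyLoss`. Pooling the whole document into one vector means variable-length articles need no padding.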

If keeping stop tokens: keep only a defined list of punctuation and delete all other punctuation. Punctuation is tokenised so that contractions are split into separate tokens (e.g. "weren ' t").
If deleting stop tokens: also delete all punctuation tokens.
Delete any words with a vocabulary count below 1 when processing the train and test data.
Replace words in the test data that were unseen in the training data with an UNK token.
Changed from an FFNN (feed-forward neural network) to an EmbeddingBag model to enrich the feature representation, which improved accuracy.
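The preprocessing rules above can be sketched in plain Python. This is a minimal sketch, not the project's implementation. The stop-word list, the kept punctuation and the lowercasing are hypothetical choices. `min_count` stands in for the vocabulary-count threshold. It reads the count and UNK rules together: a word seen in training but below `min_count` is deleted, and a word never seen in training becomes `<unk>`.

```python
import re
from collections import Counter

# Hypothetical lists for illustration; the project's own lists are not given here.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is"}
KEPT_PUNCTUATION = {".", ",", "'", "?", "!"}
UNK = "<unk>"

# A word is a run of word characters; any other non-space character becomes
# its own token, so "weren't" is split into ["weren", "'", "t"].
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def is_punctuation(token):
    return re.fullmatch(r"\w+", token) is None


def tokenise(text, keep_stop_tokens=True):
    tokens = TOKEN_RE.findall(text.lower())  # lowercasing is an assumption
    if keep_stop_tokens:
        # Keep stop words; keep only the defined punctuation and drop the rest.
        return [t for t in tokens if not is_punctuation(t) or t in KEPT_PUNCTUATION]
    # Otherwise drop stop words and every punctuation token.
    return [t for t in tokens if not is_punctuation(t) and t not in STOP_WORDS]


def count_tokens(train_docs):
    """Vocabulary counts taken from the tokenised training documents only."""
    return Counter(tok for doc in train_docs for tok in doc)


def encode(doc, train_counts, min_count=1):
    """Delete rare training words and map words unseen in training to UNK."""
    out = []
    for tok in doc:
        count = train_counts.get(tok, 0)
        if count == 0:
            out.append(UNK)   # never seen in the training data
        elif count >= min_count:
            out.append(tok)   # frequent enough to keep
        # else: seen in training but below min_count, so it is deleted
    return out
```

Counting only over the training documents keeps the test set from leaking into the vocabulary.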

