Location-finder / Twitter Geocoder Project

Introduction

A classifier that uses training data from https://data.world/crowdflower/gender-classifier-data to assign tweets to specific locations.

This project aims to classify tweets using solely the content of the tweets (discounting information about the users involved). The project uses an RNN built on a multilayer bidirectional LSTM with 100-dimensional GloVe embeddings. It is implemented primarily with Torch and TensorFlow, with a few additional libraries for small implementation details.
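
The architecture described above can be illustrated with a minimal PyTorch sketch. This is not the project's actual model code: the class name, hidden size, layer count, dropout rate, and vocabulary size are all assumptions chosen for illustration, and the embedding layer here is randomly initialised rather than loaded from GloVe.

```python
import torch
import torch.nn as nn


class TweetLocationClassifier(nn.Module):
    """Hypothetical multilayer bidirectional LSTM classifier (illustrative only)."""

    def __init__(self, vocab_size, num_classes, embed_dim=100,
                 hidden_dim=128, num_layers=2, dropout=0.3, pad_idx=0):
        super().__init__()
        # embed_dim=100 matches the 100-dimensional GloVe vectors; a real run
        # would copy the pretrained vectors into this layer's weights.
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers,
                            bidirectional=True, batch_first=True, dropout=dropout)
        self.dropout = nn.Dropout(dropout)
        # Forward and backward final states are concatenated: 2 * hidden_dim.
        self.fc = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)   # (batch, seq_len, embed_dim)
        _, (hidden, _) = self.lstm(embedded)   # (num_layers * 2, batch, hidden_dim)
        # hidden[-2] is the last layer's forward state, hidden[-1] its backward state.
        final = torch.cat((hidden[-2], hidden[-1]), dim=1)
        return self.fc(self.dropout(final))    # (batch, num_classes) logits


# Assumed sizes for a quick shape check: 1000-word vocabulary, 5 location classes.
model = TweetLocationClassifier(vocab_size=1000, num_classes=5)
batch = torch.randint(1, 1000, (4, 12))  # 4 fake tweets, 12 token ids each
logits = model(batch)
```

Concatenating the final forward and backward hidden states is one common way to summarise a sequence for classification; the project may pool the LSTM outputs differently.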

Dependencies

numpy==1.20.1
dill==0.3.3
matplotlib==3.3.4
torchtext==0.9.1
torch==1.8.1

Note: GloVe embeddings also need to be downloaded, although this should happen automatically on the first run of the code.

Code Structure

All tools that a user needs are available through the main directory, while each subdirectory contains helper functions. Almost all of the necessary tuning can be done through the config file (config.json), and all important functions can be called through interface.py.
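
As a rough illustration of config-driven tuning, the sketch below loads a JSON file over a set of defaults. The key names (`embedding_dim`, `hidden_dim`, `num_layers`, `epochs`) and the `load_config` helper are hypothetical; the actual contents of config.json are not documented here and may differ.

```python
import json

# Hypothetical defaults; the real keys in config.json may be different.
DEFAULTS = {"embedding_dim": 100, "hidden_dim": 128, "num_layers": 2, "epochs": 10}


def load_config(path="config.json"):
    """Return the defaults overridden by any values found in the JSON file."""
    config = dict(DEFAULTS)
    try:
        with open(path, encoding="utf-8") as handle:
            config.update(json.load(handle))
    except FileNotFoundError:
        # No config file present: fall back to the defaults unchanged.
        pass
    return config


config = load_config()
print(config["embedding_dim"])
```

Keeping defaults in code means a partially filled config file still yields a complete set of settings.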

The classifier subdirectory contains all of the machine learning implementation, including the dataloader, training, and prediction/testing integrations.
The dataset_dump subdirectory contains any extra data generated by the preprocessing or saved by the machine learning model.
The raw_data subdirectory contains any data provided by users prior to any preprocessing.
The tweet_preprocessor subdirectory is an open source library that we use to help with tokenization.
The preprocess subdirectory includes all of our implementations of text preprocessing.
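
To give a feel for what tweet tokenization involves, here is a small standard-library sketch. It does not use or reproduce the tweet_preprocessor API; the `tokenize_tweet` function and its cleaning rules (dropping URLs and @mentions, keeping hashtag words) are assumptions made for illustration.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize_tweet(text):
    """Lowercase a tweet, strip URLs and mentions, and split it into word tokens."""
    text = text.lower()
    text = URL_RE.sub(" ", text)      # links carry little location signal as raw text
    text = MENTION_RE.sub(" ", text)  # assumed: user handles are discarded
    text = text.replace("#", " ")     # keep the hashtag word, drop the '#'
    return TOKEN_RE.findall(text)


tokens = tokenize_tweet("Sunny day in #Seattle with @alice https://t.co/xyz")
print(tokens)
```

The resulting tokens are what would later be mapped to embedding indices before reaching the model.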

Usage

All necessary functions may be called using interface.py. The command-line argument determines what is run.

generate - generates preprocessed data from the raw CSV file.
train - trains the model on the given dataset. Prints metric feedback to standard output and generates a plot. Please close the plot to terminate the program. Training can sometimes take several minutes.
test - runs the trained model's predictions on the contents of 'test_input.txt'. Always train before testing.

You can modify test_input.txt to see the associated classification for every line in the file.

Examples of usage are:
python3 interface.py generate
python3 interface.py train
python3 interface.py test
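
A command-line entry point of this shape could be sketched with `argparse` as follows. This is not the contents of interface.py: the handler functions are placeholders that only return a description, and the `COMMANDS` table and `main` function are hypothetical names.

```python
import argparse


# Placeholder handlers; the real ones would call into preprocess/ and classifier/.
def generate():
    return "generate: preprocess the raw CSV file"


def train():
    return "train: fit the model and plot metrics"


def test():
    return "test: classify each line of test_input.txt"


COMMANDS = {"generate": generate, "train": train, "test": test}


def main(argv=None):
    """Parse a single positional command and run the matching handler."""
    parser = argparse.ArgumentParser(description="Tweet location classifier")
    parser.add_argument("command", choices=sorted(COMMANDS))
    args = parser.parse_args(argv)
    return COMMANDS[args.command]()


# Equivalent to running the script with the single argument "train".
print(main(["train"]))
```

Restricting `command` to a fixed set of choices makes argparse reject unknown commands with a usage message instead of failing later.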

Note: Granularity management is a pending feature
