
Project for CS 685 to detect if StackOverflow Question will be closed


AlolikaGon/CS685SOQuestionClosed

 
 


CS685SOQuestionClosed

We explore the task of predicting whether a StackOverflow question is likely to be closed, using only information that is available at the time of submission, i.e., the title of the question, the body of the question, and the question tags. Our project addresses this as a multi-class classification task on a set of imbalanced labels. The labels are the reasons for question closure.
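Because the labels are imbalanced, one common option is to weight each class inversely to its frequency when training a classifier. The sketch below is only an illustration of that idea, not necessarily the weighting used in this project; the function name `inverse_frequency_weights` and the label names in the example are hypothetical. It uses the formula `n_samples / (n_classes * count)`, which is the same heuristic scikit-learn applies for `class_weight="balanced"`.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {label: weight}, where weight = n_samples / (n_classes * count).

    Rare classes receive larger weights, frequent classes smaller ones.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Hypothetical label names and counts, for illustration only.
example_labels = ["not closed"] * 6 + ["duplicate"] * 3 + ["off topic"] * 1
weights = inverse_frequency_weights(example_labels)

for label, weight in sorted(weights.items(), key=lambda item: item[1]):
    print(f"{label}: {weight:.3f}")
```

The resulting dictionary could then be passed to a loss function or classifier that accepts per-class weights.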

Dataset

Dataset Link: use the file so_dataset_cleaned.csv. The title, body, and tags can be tokenized by splitting on \t.
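A minimal sketch of reading the file and splitting the text fields on tabs is shown below. It assumes the CSV has a header row with columns named `title`, `body`, and `tags`; those column names and the helper `read_rows` are assumptions for illustration, so adjust them to match the actual header of so_dataset_cleaned.csv.

```python
import csv
import io


def read_rows(handle, fields=("title", "body", "tags")):
    """Yield each CSV row with the given fields split into tab-separated tokens.

    The field names are assumed, not taken from the dataset itself.
    """
    reader = csv.DictReader(handle)
    for row in reader:
        for field in fields:
            value = row.get(field) or ""
            # Split on tabs and drop empty tokens left by repeated tabs.
            row[field] = [token for token in value.split("\t") if token]
        yield row


# Tiny in-memory stand-in for the real file, for illustration only.
sample = io.StringIO(
    'title,body,tags\n"how\tto\tsort","use\tsorted","python\tlist"\n'
)
rows = list(read_rows(sample))
print(rows[0]["title"])
```

To read the real dataset, replace the in-memory sample with `open("so_dataset_cleaned.csv", newline="", encoding="utf-8")`.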

Analysis data: 50 data points taken from the test set for qualitative analysis.

Colab Notebooks:

Colab ML

Colab BERT

Colab BERT with losses and data augmentation

Word Vectors

ELMo files: download and unzip elmo_vectors.zip.

SO Word2Vec: a possible alternative, Word2Vec embeddings for the software engineering domain. The file is very large (~1.5 GB).
