NLP on QA Forum

Predicts the best answer to each question using data scraped from a QA forum, including the post texts.
*The dataset is too large (283 MB) to upload to the repository, so only the code is included.

Data Preparation

  • Time data treatment
  • Label encoding
  • Embedding using Word2Vec
  • Down / Upsampling for the imbalanced target variable
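
The preparation steps listed above can be sketched roughly as follows. This is a minimal, standard-library illustration and not the repository's actual code: the function names, the ISO timestamp format, and the choice to average word vectors per post are all assumptions, and the `vectors` dictionary stands in for the lookup table of a trained Word2Vec model.

```python
import random
from datetime import datetime


def time_features(timestamp: str) -> dict:
    """Split an ISO-8601 timestamp into simple numeric features (assumed format)."""
    dt = datetime.fromisoformat(timestamp)
    return {"hour": dt.hour, "weekday": dt.weekday(), "month": dt.month}


def label_encode(values: list) -> tuple:
    """Map each distinct category to an integer, assigned in sorted order."""
    mapping = {v: i for i, v in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def mean_embedding(tokens: list, vectors: dict, dim: int) -> list:
    """Average the vectors of known tokens; return zeros if none are known.

    `vectors` is a hypothetical token -> vector lookup standing in for
    a trained Word2Vec model.
    """
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def resample(rows: list, labels: list, mode: str = "up", seed: int = 0) -> tuple:
    """Balance classes by upsampling minorities ("up") or downsampling majorities ("down")."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    sizes = [len(group) for group in by_class.values()]
    target = max(sizes) if mode == "up" else min(sizes)
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        if len(group) >= target:
            # Enough rows already: draw `target` of them without replacement.
            chosen = rng.sample(group, target)
        else:
            # Too few rows: keep all and add random duplicates up to `target`.
            chosen = group + rng.choices(group, k=target - len(group))
        out_rows.extend(chosen)
        out_labels.extend([label] * target)
    return out_rows, out_labels
```

Resampling should normally be applied to the training split only, so that duplicated rows do not leak into the evaluation data.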

Modelling

  • XGBoost: ROC-AUC 84%
  • Neural Network (linear stack of layers): Accuracy 92%, MSE 0.08
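
For reference, the three metrics reported above can be computed from true labels and predicted probabilities as in the sketch below. It is a plain-Python illustration under assumed conventions (binary labels 0/1, a 0.5 decision threshold), not the evaluation code used to produce the numbers in this README.

```python
def roc_auc(y_true: list, scores: list) -> float:
    """Chance that a random positive scores above a random negative (ties count 0.5).

    Compares every positive/negative pair, so it is quadratic in the
    number of samples; fine for a sketch, slow for large datasets.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one sample of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def accuracy(y_true: list, probs: list, threshold: float = 0.5) -> float:
    """Share of samples whose thresholded prediction matches the label."""
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


def mse(y_true: list, probs: list) -> float:
    """Mean squared difference between predicted probability and label."""
    return sum((p - y) ** 2 for p, y in zip(probs, y_true)) / len(y_true)


# Toy usage with made-up values
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y, p), accuracy(y, p), mse(y, p))
```

With an imbalanced target, accuracy can look high even for a model that mostly predicts the majority class, so ROC-AUC is usually the more informative figure when comparing the two models.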