Predicting the best answer to each question using scraped QA forum data, including the post texts.
*The data is too large (283MB) to upload to the Git repository, so only the code is included.
- Time data treatment
- Label encoding
- Embedding using Word2Vec
- Down/upsampling for the imbalanced target variable
- XGBoost: ROC-AUC 84%
- Neural Network (linear stack of layers): Accuracy 92%, MSE 0.08
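
The down/upsampling step in the list above can be illustrated with a minimal sketch. It is not the repository's actual code: the function name `resample_balanced`, its parameters, and the toy data are hypothetical, and it uses only the Python standard library so that it runs without the original dataset.

```python
import random
from collections import Counter


def resample_balanced(rows, labels, mode="up", seed=0):
    """Return (rows, labels) with every class resampled to the same size.

    Hypothetical helper for illustration only.
    mode="up":   draw from smaller classes with replacement until they
                 match the largest class.
    mode="down": draw from larger classes without replacement until they
                 match the smallest class.
    """
    if mode not in ("up", "down"):
        raise ValueError("mode must be 'up' or 'down'")
    rng = random.Random(seed)

    # Group the rows by their target label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    sizes = [len(group) for group in by_class.values()]
    target = max(sizes) if mode == "up" else min(sizes)

    out = []
    for label, group in by_class.items():
        if len(group) < target:
            # Upsample: keep all originals and add random duplicates.
            group = group + rng.choices(group, k=target - len(group))
        elif len(group) > target:
            # Downsample: keep a random subset of the originals.
            group = rng.sample(group, target)
        out.extend((row, label) for row in group)

    # Shuffle so the classes are not stored in contiguous blocks.
    rng.shuffle(out)
    return [row for row, _ in out], [label for _, label in out]


# Toy data: 2 positive ("best answer") rows against 8 negative rows.
rows = [[i] for i in range(10)]
labels = [1, 1] + [0] * 8

up_rows, up_labels = resample_balanced(rows, labels, mode="up")
down_rows, down_labels = resample_balanced(rows, labels, mode="down")
print(Counter(up_labels), Counter(down_labels))
```

Resampling is usually applied to the training split only, so that duplicated rows do not leak into the evaluation data.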