Based on scikit-learn and Python 3. Only binary classification is supported in this version.
-
Store the article data used for training and classification in the data directory. The classifier assumes that it contains 3 subdirectories:
data/classifier_train_data/<class_name>/<article>.txt # ML algorithm training data
data/classifier_test_data/<class_name>/<article>.txt # Classified test data
data/classifier_x_val_data/<class_name>/<article>.txt # All available article data for classifier cross-validation
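Before training, it can help to confirm that the layout is in place. The following is a minimal sketch, not part of this project: the helper names `count_articles` and `check_binary` are hypothetical, and it assumes every one of the three subdirectories is organized into exactly two class folders, since only binary classification is supported.

```python
from pathlib import Path

# Subdirectory names taken from the layout described above.
EXPECTED_SUBDIRS = (
    "classifier_train_data",
    "classifier_test_data",
    "classifier_x_val_data",
)


def count_articles(data_dir="data"):
    """Return {subdir: {class_name: number of .txt articles}}.

    Hypothetical helper: raises FileNotFoundError if an expected
    subdirectory is missing.
    """
    root = Path(data_dir)
    summary = {}
    for subdir in EXPECTED_SUBDIRS:
        base = root / subdir
        if not base.is_dir():
            raise FileNotFoundError(f"missing directory: {base}")
        # Each folder directly under the subdirectory is treated as a class.
        summary[subdir] = {
            class_dir.name: sum(1 for _ in class_dir.glob("*.txt"))
            for class_dir in sorted(base.iterdir())
            if class_dir.is_dir()
        }
    return summary


def check_binary(summary):
    """Raise ValueError unless every subdirectory holds exactly two classes.

    Assumption: all three subdirectories are split by class.
    """
    for subdir, classes in summary.items():
        if len(classes) != 2:
            raise ValueError(
                f"{subdir}: expected 2 classes, found {sorted(classes)}"
            )


if __name__ == "__main__":
    layout = count_articles("data")
    check_binary(layout)
    for subdir, classes in layout.items():
        print(subdir, classes)
```

Running the script from the project root prints the per-class article counts for each subdirectory, or fails with an error naming the directory that is missing or does not contain two classes.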
-
Install NumPy and SciPy. The NumPy authors highly recommend installing the binaries system-wide, so we won't use virtualenv.
sudo apt-get install python3-numpy python3-scipy
-
Install the required Python packages
pip3 install -r requirements.txt
-
Running
python3 text_classifier.py