silvuple/Machine-Learning-Titanic-Kaggle

Machine-Learning-Titanic-Kaggle

Predict survival on the Titanic and get familiar with ML basics.

Link to the competition page

Basic requirements:

  • Python 3
  • pandas
  • scikit-learn (sklearn)

Additional libraries/modules may be required, such as:

  • matplotlib
  • numpy

Each script uses one of the following scikit-learn (sklearn) classifiers:

  1. RandomForestClassifier
  2. LogisticRegression
  3. KNeighborsClassifier
  4. SVC (Support Vector Classification)
  5. DecisionTreeClassifier
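A minimal sketch of how the five classifiers could be compared side by side with cross_val_score. It is not taken from the repository's scripts: the data come from a hypothetical helper, make_toy_titanic, that builds a small synthetic table, and the column names (Pclass, Sex, Age, Fare, Survived) are assumed from Kaggle's train.csv. The hyperparameters (n_estimators=100, n_neighbors=5, max_depth=4) are illustrative choices only.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def make_toy_titanic(n=300, seed=0):
    """Hypothetical helper: synthetic Titanic-like data, NOT real passengers."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "Pclass": rng.integers(1, 4, n),  # 1, 2 or 3
        "Sex": rng.choice(["male", "female"], n),
        "Age": rng.uniform(1, 70, n).round(),
        "Fare": rng.uniform(5, 100, n).round(2),
    })
    # Toy rule: women and higher classes survive more often, plus Gaussian noise.
    score = 1.5 * (df["Sex"] == "female") - 0.8 * (df["Pclass"] - 2)
    df["Survived"] = (score + rng.normal(0, 1, n) > 0.5).astype(int)
    return df


df = make_toy_titanic()
# Encode Sex as 0/1 so every feature column is numeric.
X = df[["Pclass", "Sex", "Age", "Fare"]].assign(
    Sex=lambda d: (d["Sex"] == "female").astype(int)
)
y = df["Survived"]

classifiers = {
    "RandomForestClassifier": RandomForestClassifier(n_estimators=100, random_state=0),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "KNeighborsClassifier": KNeighborsClassifier(n_neighbors=5),
    "SVC": SVC(),
    "DecisionTreeClassifier": DecisionTreeClassifier(max_depth=4, random_state=0),
}

mean_scores = {}
for name, clf in classifiers.items():
    # 5-fold cross-validated accuracy (folds are stratified by default for classifiers).
    scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
    mean_scores[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.3f} (std {scores.std():.3f})")
```

To run it on the real competition data, replace make_toy_titanic() with pd.read_csv("train.csv") plus your own cleaning of missing values. KNeighborsClassifier and SVC are sensitive to feature scale, so on unscaled Age and Fare their scores are probably understated.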

The scripts also use the following sklearn techniques:

  • RFE/RFECV (Feature ranking with recursive feature elimination and cross-validated selection of the best number of features)
  • GridSearchCV (Exhaustive search over specified parameter values for an estimator)
  • LabelEncoder (Encode labels as integer values between 0 and n_classes-1, transforming non-numerical labels into numerical ones)
  • cross_val_score (Evaluate a classifier's score by cross-validation, which estimates accuracy on unseen data more reliably and helps detect overfitting)
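The four techniques above can be chained roughly as follows. This is a sketch on synthetic data, not the repository's code: the DataFrame, the column names (assumed from Kaggle's train.csv), the choice of RandomForestClassifier as the base estimator and the parameter grid are all illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.preprocessing import LabelEncoder

# Synthetic Titanic-like data (hypothetical values; column names assumed from train.csv).
rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({
    "Pclass": rng.integers(1, 4, n),
    "Sex": rng.choice(["male", "female"], n),
    "Embarked": rng.choice(["C", "Q", "S"], n),
    "Age": rng.uniform(1, 70, n).round(),
    "Fare": rng.uniform(5, 100, n).round(2),
})
# Toy label: survival follows Sex, with about 20% of labels flipped at random.
df["Survived"] = ((df["Sex"] == "female").to_numpy() ^ (rng.random(n) < 0.2)).astype(int)

# LabelEncoder: map each string category to an integer 0..n_classes-1 (sorted order).
for col in ["Sex", "Embarked"]:
    df[col] = LabelEncoder().fit_transform(df[col])

X = df.drop(columns="Survived")
y = df["Survived"]
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# RFECV: repeatedly drop the least important feature and keep the
# number of features that gives the best cross-validated accuracy.
selector = RFECV(
    RandomForestClassifier(n_estimators=100, random_state=0),
    step=1, cv=cv, scoring="accuracy",
)
selector.fit(X, y)
selected = list(X.columns[selector.support_])
print("Selected features:", selected)

# GridSearchCV: try every parameter combination and keep the best by CV accuracy.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, 5, None]},
    cv=cv, scoring="accuracy",
)
grid.fit(X[selected], y)
print("Best parameters:", grid.best_params_)

# cross_val_score: accuracy of the tuned model on each fold.
scores = cross_val_score(grid.best_estimator_, X[selected], y, cv=cv, scoring="accuracy")
print(f"Mean CV accuracy: {scores.mean():.3f}")
```

Reusing the same folds for feature selection, tuning and the final cross_val_score makes the reported accuracy somewhat optimistic; nested cross-validation would give a less biased estimate. Note also that LabelEncoder is intended for target labels: on a nominal feature such as Embarked it imposes an arbitrary order (C < Q < S).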

Future tasks:

  • Work on feature engineering
  • Try data normalization and OneHotEncoder
  • Work on feature selection techniques
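One possible way to try the normalization and OneHotEncoder items is a ColumnTransformer inside a Pipeline, so that scaling and encoding are fitted only on each training fold. This is a sketch on synthetic data; the numeric/categorical column split (names assumed from Kaggle's train.csv), the use of StandardScaler for normalization and KNeighborsClassifier(n_neighbors=7) are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic Titanic-like data (hypothetical values; column names assumed from train.csv).
rng = np.random.default_rng(2)
n = 300
df = pd.DataFrame({
    "Pclass": rng.integers(1, 4, n),
    "Sex": rng.choice(["male", "female"], n),
    "Embarked": rng.choice(["C", "Q", "S"], n),
    "Age": rng.uniform(1, 70, n).round(),
    "Fare": rng.uniform(5, 100, n).round(2),
})
y = ((df["Sex"] == "female").to_numpy() ^ (rng.random(n) < 0.2)).astype(int)

numeric = ["Age", "Fare"]                    # standardized to mean 0, std 1
categorical = ["Pclass", "Sex", "Embarked"]  # one binary column per category

preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])
X = df[numeric + categorical]

# 2 scaled numeric columns + 3 + 2 + 3 one-hot columns = 10 features.
print(preprocess.fit_transform(X).shape)  # (300, 10)

# Inside a Pipeline, the scaler and encoder are refit on each training fold only.
model = Pipeline([
    ("prep", preprocess),
    ("knn", KNeighborsClassifier(n_neighbors=7)),
])
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(f"Mean CV accuracy: {scores.mean():.3f}")
```

Unlike LabelEncoder, OneHotEncoder does not impose an order on nominal categories, and standardizing Age and Fare keeps them from dominating the distance computations in KNeighborsClassifier and SVC.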

Current mean accuracy score ~0.77

About

Predicting survival on the Titanic using the dataset from the Kaggle competition. Feature analysis with Python.
