Using Shrinkage methods, tree-based methods, SVMs and NN for a binary classification task

manuelrech/predicting-survivals-with-ML

Predicting-survivals-with-ML

Overview

This project is a data science exploration of the Spaceship Titanic dataset, a variant of the famous Titanic competition on Kaggle. The goal of this project is to predict the survival of passengers in a spaceship that is about to collide with a 'spacetime anomaly' using machine learning techniques. The full project report can be found in the html file or the Rmd file.

Feature engineering

The original features were:

[Image: table of the original features]

where transported was the variable to be predicted.

The original dataset included features such as cabin and passenger_id that contain hidden information. For example, by splitting the cabin column every time we encountered a '/', we were able to create 3 new columns. Similarly, by splitting the passenger_id column on the '_' symbol, we were able to create 2 new columns.
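The two splits described above can be sketched as follows. This is a minimal Python illustration, not the code from the project's Rmd report. The names given to the resulting parts (deck, num, side, group, number) and the sample values are assumptions made for the example.

```python
def split_cabin(cabin):
    """Split a cabin string on '/' into three parts.

    The part names (deck, num, side) are assumed for illustration.
    Missing cabins yield three None values.
    """
    if not cabin:
        return (None, None, None)
    deck, num, side = cabin.split("/")
    return (deck, int(num), side)


def split_passenger_id(passenger_id):
    """Split a passenger id on '_' into two parts.

    The part names (group, number) are assumed for illustration.
    Both parts are kept as strings to preserve leading zeros.
    """
    group, number = passenger_id.split("_")
    return (group, number)


# Illustrative values only
print(split_cabin("B/0/P"))
print(split_passenger_id("0001_01"))
```

Keeping the id parts as strings avoids losing leading zeros. The group part can then be used, for example, to count how many passengers share a group.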

Models used

This project is a binary classification task. We tried several machine learning models, including:

  • Logistic regression
  • Shrinkage methods (Ridge, Lasso, Elastic net)
  • Tree methods (Single tree, Random Forest, Bagging, Boosting)
  • Support Vector Machines (Linear and non-linear)
  • Neural networks (NN)

Results

Our analysis showed that the best model was Boosting, with an Area Under the Curve (AUC) of 0.8824744.

[Image: results chart]
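For readers unfamiliar with the metric, the Area Under the (ROC) Curve equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition. It is a plain Python illustration with made-up labels and scores, not the routine used in the project report.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 class labels
    scores: iterable of predicted scores (higher means more likely positive)
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up example: 3 of the 4 positive/negative pairs are ranked correctly
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration. Library implementations typically use a rank-based formula instead.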

Relevant columns + charts

Spa, VRDeck and CryoSleep turned out to be the most relevant features in terms of Mean Decrease in Gini Index for the tree-based methods. Let's look at the charts we made ex ante to see whether this was visually intuitive:

[Image: exploratory chart]

CryoSleep was very clear. For the numerical columns, however, the evidence is harder to see because some extremely high values skew the data to the right.

[Image: exploratory chart]
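To make the Mean Decrease in Gini measure more concrete, the sketch below computes the Gini impurity decrease produced by a single binary split. In a random forest, these decreases are accumulated over every split that uses a given feature and averaged over the trees. The code is a plain Python illustration on made-up labels, not the importance routine used in the report.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels: 2 * p * (1 - p)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def gini_decrease(labels, goes_left):
    """Impurity decrease from splitting `labels` into two child nodes.

    goes_left: list of booleans, True if the example is sent to the left child.
    Returns parent impurity minus the size-weighted impurity of the children.
    """
    left = [y for y, g in zip(labels, goes_left) if g]
    right = [y for y, g in zip(labels, goes_left) if not g]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted


labels = [1, 1, 0, 0]
# A split that separates the classes perfectly removes all impurity
print(gini_decrease(labels, [True, True, False, False]))  # 0.5
# A split that leaves both children mixed removes none
print(gini_decrease(labels, [True, False, True, False]))  # 0.0
```

A feature that repeatedly produces large decreases, as CryoSleep is reported to do here, ends up with a high importance score.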

Comments

This was an exciting and enlightening project that allowed us to dive deep into the dataset and uncover hidden insights. We were able to experiment with a variety of different models and understand their parameters and appropriate use cases.
