This notebook walks through a typical workflow for solving data science competitions on sites like Kaggle:

- Data import
- Data visualization and analysis
- Machine learning
The workflow shows the general order in which these stages follow one another.

We will import the dataset together with the libraries we need to work with and manipulate the data.
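A minimal sketch of the import step with pandas, assuming the data follows the Kaggle Titanic layout (the survival goal discussed below points that way). In the notebook the rows would come from files such as `train.csv` and `test.csv` (names assumed); here a few made-up rows are read from an in-memory string so the example runs on its own.

```python
import io

import pandas as pd

# Made-up rows in the Titanic column layout; in the notebook this would be
# something like pd.read_csv("train.csv") (file name assumed)
raw_csv = """PassengerId,Survived,Pclass,Sex,Age,SibSp,Parch,Fare,Embarked
1,0,3,male,22,1,0,7.25,S
2,1,1,female,38,1,0,71.28,C
3,1,3,female,26,0,0,7.93,S
4,0,3,male,,0,0,8.05,
"""

train_df = pd.read_csv(io.StringIO(raw_csv))

print(train_df.head())        # first rows
print(train_df.dtypes)        # column types (numeric vs. text)
print(train_df.isna().sum())  # missing values per column
```

The last three calls give a quick first look at the shape, types, and gaps in the data before any visualization.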

We will visualize the patterns in the data and the correlations within it.
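One way to quantify the correlations before plotting them, sketched under the assumption that the relevant columns are already numeric. The rows are made up; `DataFrame.corr()` computes pairwise Pearson correlations, which a heatmap would then display.

```python
import pandas as pd

# Made-up, already-numeric Titanic-style rows
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0],
    "Pclass":   [3, 1, 3, 3, 2, 1],
    "Age":      [22, 38, 26, 35, 27, 54],
    "Fare":     [7.25, 71.28, 7.93, 8.05, 13.0, 51.86],
})

# Pairwise Pearson correlation between all numeric columns
corr = df.corr()

# Correlation of each feature with the target, strongest (by magnitude) first
print(corr["Survived"].drop("Survived").sort_values(key=abs, ascending=False))
```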

We will use the preprocessed data to predict the test set with different models.
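A sketch of fitting several models to the same preprocessed training data and predicting the test set. It assumes scikit-learn is installed; the three classifiers and the made-up rows are illustrative choices, not necessarily the models this notebook goes on to use.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Made-up, already-preprocessed rows: all features numeric, no missing values
train_df = pd.DataFrame({
    "Pclass":   [1, 3, 2, 3, 1, 3, 2, 3],
    "Sex":      [1, 0, 1, 0, 0, 1, 0, 0],  # assumed encoding: female=1, male=0
    "Age":      [38, 22, 26, 35, 54, 4, 27, 20],
    "Fare":     [71.3, 7.3, 13.0, 8.1, 51.9, 16.7, 13.0, 8.1],
    "Survived": [1, 0, 1, 0, 0, 1, 1, 0],
})
# Test rows have the same feature columns but no target
test_df = pd.DataFrame({
    "Pclass": [3, 1, 2],
    "Sex":    [0, 1, 1],
    "Age":    [34, 47, 62],
    "Fare":   [7.8, 7.0, 9.7],
})

X_train = train_df.drop(columns="Survived")
y_train = train_df["Survived"]

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "knn": KNeighborsClassifier(n_neighbors=3),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    train_acc = model.score(X_train, y_train)  # accuracy on the training rows
    predictions = model.predict(test_df)       # predicted Survived for each test row
    results[name] = (train_acc, predictions)
    print(f"{name}: train accuracy={train_acc:.2f}, predictions={predictions.tolist()}")
```

Training accuracy is optimistic (the decision tree memorizes these eight rows), so comparing models with cross-validation, for example `sklearn.model_selection.cross_val_score`, gives a fairer picture.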

We may want to classify or categorize our samples. We may also want to understand the implications of different classes for our solution goal, and how strongly each class correlates with it.
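Because survival is a 0/1 target, the mean of `Survived` within each group is that group's survival rate. A minimal sketch on made-up rows, grouping by the assumed Titanic columns `Pclass` and `Sex`:

```python
import pandas as pd

# Made-up Titanic-style rows
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0, 1, 0],
    "Pclass":   [3, 1, 3, 3, 2, 1, 1, 2],
    "Sex":      ["male", "female", "female", "male",
                 "female", "male", "female", "male"],
})

# Mean of a 0/1 target per group = fraction of that group that survived
by_class = df.groupby("Pclass")["Survived"].mean()
by_sex = df.groupby("Sex")["Survived"].mean()

print(by_class)
print(by_sex)
```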

We may also want to determine correlations among features other than survival for later goals and workflow stages. Correlating certain features may help in creating, completing, or correcting features.
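Correlations between non-target features can be checked the same way. In this sketch (made-up rows, assumed column names) the median age per class hints at how missing ages might be completed, and `Series.corr` gives the Pearson correlation between class and fare:

```python
import pandas as pd

# Made-up Titanic-style rows
df = pd.DataFrame({
    "Pclass": [1, 1, 2, 2, 3, 3, 3],
    "Age":    [38, 54, 27, 34, 22, 26, 20],
    "Fare":   [71.28, 51.86, 13.0, 26.0, 7.25, 7.93, 8.05],
})

# Does age vary by class? If so, class can guide how missing ages are filled in
median_age_by_class = df.groupby("Pclass")["Age"].median()

# Pearson correlation between two features, neither of which is the target
pclass_fare_corr = df["Pclass"].corr(df["Fare"])

print(median_age_by_class)
print(pclass_fare_corr)
```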

Depending on the choice of model algorithm, all features may need to be converted to numerical equivalents.
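A sketch of two common conversions in pandas, on made-up rows with assumed Titanic column names: a two-valued category mapped to 0/1, and a multi-valued category expanded into indicator columns with `pd.get_dummies`.

```python
import pandas as pd

df = pd.DataFrame({
    "Sex":      ["male", "female", "female", "male"],
    "Embarked": ["S", "C", "Q", "S"],
})

# Binary category -> 0/1 with an explicit mapping (mapping chosen for illustration)
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})

# Multi-valued category -> one 0/1 indicator column per value
df = pd.get_dummies(df, columns=["Embarked"], prefix="Embarked", dtype=int)

print(df)
```

One-hot encoding `Embarked` avoids implying an order (such as C < Q < S) that a plain integer code would suggest to models like logistic regression.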

Data preparation may also require us to estimate missing values within a feature.
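A minimal sketch of completing missing values, assuming median imputation for a numeric column and mode imputation for a categorical one; these are common choices, not the only ones. Age is filled per `Pclass` group, which puts a correlation between features to use as described earlier. The rows are made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Pclass":   [1, 1, 1, 3, 3, 3],
    "Age":      [40.0, np.nan, 50.0, 20.0, 24.0, np.nan],
    "Embarked": ["S", "C", "S", None, "S", "Q"],
})

# Numeric feature: fill each missing age with the median age of the same class
df["Age"] = df["Age"].fillna(df.groupby("Pclass")["Age"].transform("median"))

# Categorical feature: fill with the most frequent value
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

print(df)
```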

We may also analyze the training dataset for errors or possibly inaccurate values within features, and try to correct these values or exclude the samples that contain them.
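A sketch of both options for handling erroneous values. The plausibility rules (age between 0 and 100, non-negative fare) and the rows are hypothetical, chosen only to illustrate the mechanics:

```python
import pandas as pd

df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5],
    "Age":         [22.0, -1.0, 38.0, 250.0, 35.0],
    "Fare":        [7.25, 8.05, -5.0, 13.0, 53.1],
})

# Hypothetical plausibility rules; real thresholds depend on the dataset
invalid_age = ~df["Age"].between(0, 100)  # between() is inclusive on both ends
invalid_fare = df["Fare"] < 0

# Option 1: exclude samples that contain any error
cleaned = df[~(invalid_age | invalid_fare)]

# Option 2: keep the samples but mark bad values as missing, to be completed later
corrected = df.copy()
corrected.loc[invalid_age, "Age"] = float("nan")
corrected.loc[invalid_fare, "Fare"] = float("nan")
```

Option 2 keeps the rest of each sample's information and hands the bad values to the same imputation step used for genuinely missing data.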

Can we create new features from an existing feature or a set of features, such that the new feature serves the correlation, conversion, and completeness goals?
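A sketch of feature creation on made-up rows, assuming Titanic-style columns: `FamilySize` combines the sibling/spouse and parent/child counts, `IsAlone` is derived from it, and a title is extracted from the name with a regular expression that assumes the `Surname, Title. Given names` format.

```python
import pandas as pd

# Made-up rows; names follow the assumed "Surname, Title. Given names" format
df = pd.DataFrame({
    "SibSp": [1, 0, 3, 0],
    "Parch": [0, 0, 2, 1],
    "Name":  ["Smith, Mr. John", "Brown, Miss. Anna",
              "Taylor, Master. Tom", "Jones, Mrs. Mary"],
})

# Combine two related counts into one feature (+1 counts the passenger)
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1
df["IsAlone"] = (df["FamilySize"] == 1).astype(int)

# Title = text between the comma and the next period
df["Title"] = df["Name"].str.extract(r",\s*([^.]+)\.", expand=False)

print(df)
```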

How do we select the right visualization plots and charts given the nature of the data and the solution goals?
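A rough rule of thumb, sketched with matplotlib on made-up rows: histograms for continuous features such as age, split by outcome, and bar charts of the survival rate for categorical or ordinal features such as class. The column names and the rule itself are assumptions, not a prescription from this notebook.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0, 0, 1],
    "Age":      [22, 38, 26, 35, 4, 54, 20, 27],
    "Pclass":   [3, 1, 3, 3, 2, 1, 3, 2],
})

fig, (ax_age, ax_class) = plt.subplots(1, 2, figsize=(10, 4))

# Continuous feature: overlaid histograms show how the distribution differs by outcome
for outcome, group in df.groupby("Survived"):
    ax_age.hist(group["Age"], bins=5, alpha=0.5, label=f"Survived={outcome}")
ax_age.set_xlabel("Age")
ax_age.legend()

# Categorical/ordinal feature: one bar per category showing its survival rate
survival_rate = df.groupby("Pclass")["Survived"].mean()
ax_class.bar(survival_rate.index.astype(str), survival_rate.values)
ax_class.set_xlabel("Pclass")
ax_class.set_ylabel("Survival rate")

fig.tight_layout()
```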