This project implements a machine learning model to predict the survival of passengers aboard the Titanic using the Titanic dataset. The model utilizes various preprocessing techniques and logistic regression to make predictions based on passenger characteristics.
The Titanic dataset is a well-known dataset in the data science and machine learning communities. It contains information about passengers, including their class, age, gender, fare, and other attributes. The goal of this project is to predict whether a passenger survived based on this information. The main features of the project are:
- Data Preprocessing: Handles missing values, removes outliers, and engineers features (e.g., extracting titles from names and computing family size).
- Machine Learning Pipeline: Utilizes `scikit-learn` to build a pipeline that combines data preprocessing and model training.
- Logistic Regression: Implements logistic regression for binary classification of survival.
- Model Evaluation: Uses accuracy score and cross-validation to assess model performance.
- Model Serialization: Saves the trained model with `pickle` for future predictions; a rough end-to-end sketch of these steps follows this list.
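Taken together, these features describe a fairly standard scikit-learn workflow. The block below is a minimal, illustrative sketch of that workflow rather than the exact notebook code: the column names follow Kaggle's `train.csv`, and the `model.pkl` file name is an assumption.

```python
# Illustrative sketch only; the actual notebook in model_training_code/ may differ.
import pickle

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.read_csv("train.csv")  # Kaggle Titanic training data (assumed file name)

# Feature engineering: title from the name, family size from SibSp + Parch.
df["Title"] = df["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1

numeric = ["Age", "Fare", "FamilySize"]
categorical = ["Pclass", "Sex", "Embarked", "Title"]

# Preprocessing: impute missing values, scale numeric columns, one-hot encode
# categorical columns, then feed everything into logistic regression.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]), categorical),
])
model = Pipeline([("preprocess", preprocess),
                  ("clf", LogisticRegression(max_iter=1000))])

X, y = df[numeric + categorical], df["Survived"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Evaluation: hold-out accuracy plus 5-fold cross-validation.
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
print("CV accuracy:", cross_val_score(model, X, y, cv=5).mean())

# Serialization: persist the fitted pipeline for later predictions.
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)
```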
The project relies on the following tools and libraries:
- Python
- pandas
- NumPy
- scikit-learn
- Matplotlib
- Seaborn
- Flask (for the web form, if applicable)
To run this project locally, follow these steps:
- Clone the repository:
```bash
git clone https://github.com/mohitkumhar/titanic_survival_prediction/
```
- Navigate to the project directory:
```bash
cd titanic_survival_prediction
```
- Install the required packages:
```bash
pip install -r ./requirements.txt
```
- Run the model training notebook to create and save the prediction model:
```bash
jupyter notebook model_training_code/main.ipynb
```
- To use the web input method, run the Flask app:
```bash
python app.py
```
- Open your web browser and navigate to `http://127.0.0.1:5000` to access the prediction form; a minimal sketch of how such an app can load the saved model follows these steps.
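For reference, the usual pattern here is a small Flask app that loads the pickled pipeline and scores the submitted form values. The sketch below is an assumption about how that can look: the form field names, the `index.html` template, and the `model.pkl` path are illustrative and may not match the actual `app.py`.

```python
# Hypothetical sketch of a Flask app serving the prediction form; field names,
# template, and model path are assumptions, not the project's actual code.
import pickle

import pandas as pd
from flask import Flask, render_template, request

app = Flask(__name__)

with open("model.pkl", "rb") as f:  # assumed path to the saved pipeline
    model = pickle.load(f)

@app.route("/", methods=["GET", "POST"])
def predict():
    prediction = None
    if request.method == "POST":
        # Build a one-row DataFrame with the same columns the pipeline was trained on.
        row = pd.DataFrame([{
            "Pclass": int(request.form["pclass"]),
            "Sex": request.form["sex"],
            "Age": float(request.form["age"]),
            "Fare": float(request.form["fare"]),
            "Embarked": request.form["embarked"],
            "Title": request.form["title"],
            "FamilySize": int(request.form["family_size"]),
        }])
        prediction = "Survived" if model.predict(row)[0] == 1 else "Did not survive"
    return render_template("index.html", prediction=prediction)

if __name__ == "__main__":
    app.run(debug=True)  # serves on http://127.0.0.1:5000 by default
```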
Contributions are welcome! If you would like to contribute to this project, please fork the repository, create a new branch, and submit a pull request. Any improvements, suggestions, or bug fixes are appreciated!
- The Titanic dataset is provided by Kaggle.
- Special thanks to the data science community for their contributions to tutorials and resources.