This repository contains coursework for the Data Mining course in the MS ABA program at Boston University.

Team Members: Shimony Agrawal, Gerardo Bastidas, Alberto Calderon, Benjamin Flavin, Oscar Villarreal Rojas

The project analyses Airbnb listings for Copacabana, Brazil, to identify ways to improve their performance. It has four key parts:
- Data Exploration and Preparation
- Prediction
- Classification
- Clustering
Across these parts, we applied supervised and unsupervised machine learning methods, including Multiple Linear Regression, K-Nearest Neighbours, Naive Bayes, CART and cluster analysis, to predict prices, instant bookability and cancellation policies, to assess the impact of the cleaning fee on bookings, and to group the rentals into clusters.
We first performed data wrangling on 33,715 records to remove N/A and missing values before further analysis. We then visualized the data to identify outliers. Using the training set, we built five models in RStudio: Multiple Linear Regression to predict price, K-Nearest Neighbours to predict cancellation policy, Naive Bayes to predict the instant bookability of the rental, a Classification and Regression Tree to assess the cleaning fee and, lastly, a clustering model, for which we performed feature engineering, to group the rentals. Airbnb can use the results and analysis to further improve its listings.
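
The first step described above, removing records with missing values and then holding out part of the data so that models are fit on a training set, can be sketched roughly as follows. This is an illustrative Python sketch, not the project's R code: the column names, the sample rows, the 60/40 split and the random seed are all assumptions made for the example.

```python
import csv
import io
import random

# Hypothetical sample rows for illustration only; the real data set
# has 33,715 records and its own column names.
SAMPLE_CSV = """id,price,cleaning_fee,instant_bookable
1,250,80,t
2,,50,f
3,400,N/A,t
4,180,60,f
5,320,90,t
6,150,,f
7,275,70,t
"""

# Markers treated as missing (an assumption; adjust to the actual data).
MISSING = {"", "NA", "N/A"}


def drop_missing(rows):
    """Keep only rows in which every field holds a non-missing value."""
    return [
        row
        for row in rows
        if all(v is not None and v.strip() not in MISSING for v in row.values())
    ]


def train_valid_split(rows, train_frac=0.6, seed=42):
    """Shuffle a copy of the rows and split it into training and validation sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
clean = drop_missing(rows)
train, valid = train_valid_split(clean)
print(len(rows), len(clean), len(train), len(valid))
```

Dropping incomplete records is the simplest way to handle missing values; whether it is appropriate depends on how many records are lost and whether the missingness is related to the outcome being modelled.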