Edit Friday, June 14th 2019
Our files:
- DataCleaning.ipynb: creates the cleaned dataset, performs quality assurance, and modifies the data frame. Exports the clean data as:
  - cleaned_data.parquet (long format, transaction table)
  - customer_data.csv (wide format, customer table)
- TDataVisualization.ipynb: creates graphs to show relationships/trends not related to the model. Uses the transaction table.
- CDataVisualization.ipynb: creates graphs to show relationships/trends not related to the model. Uses the customer table.
- Kmeans.ipynb: the segmentation model that uses k-means to find our clusters.
- MeanShift.ipynb: the segmentation model that uses mean shift to find our clusters.
- DBSCAN.ipynb: the segmentation model that uses DBSCAN to find our clusters.
- GowerDistance.ipynb: calculates Gower's distance for the customer table.
- PCA.ipynb: runs the cleaned customer table through principal component analysis (PCA), reducing the dimensions of the data so it can be processed more smoothly.
- UMAPS.ipynb: runs the data through UMAP (Uniform Manifold Approximation and Projection).
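
For reference, a minimal sketch of how a long-format transaction table can be rolled up into a wide-format customer table, as DataCleaning.ipynb does. The column names (`customer_id`, `amount`) and the aggregated features are assumptions made for illustration; the actual notebook may use different columns and features.

```python
import pandas as pd

# Hypothetical transaction table (long format): one row per transaction.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "amount": [20.0, 35.0, 12.5, 8.0, 40.0, 2.0],
})

# Customer table (wide format): one row per customer, one column per feature.
customers = (
    transactions.groupby("customer_id")
    .agg(
        n_transactions=("amount", "size"),
        total_spend=("amount", "sum"),
        mean_spend=("amount", "mean"),
    )
    .reset_index()
)

print(customers)

# The exports would then look like this (to_parquet needs pyarrow or fastparquet):
# transactions.to_parquet("cleaned_data.parquet")
# customers.to_csv("customer_data.csv", index=False)
```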
Over the course of the project, we will add notes here about each file and what we are doing.
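
As a note on GowerDistance.ipynb: Gower's distance averages a per-feature dissimilarity, using the range-scaled absolute difference for numeric features and a simple match/mismatch for categorical ones, which is why it suits a customer table with mixed column types. Below is a minimal pure-Python sketch with equal feature weights. The example rows and the function names are hypothetical and are not taken from the notebook.

```python
def feature_ranges(rows):
    """Return max - min for each numeric column, or None for a categorical one."""
    ranges = []
    for column in zip(*rows):
        if all(isinstance(value, (int, float)) for value in column):
            ranges.append(max(column) - min(column))
        else:
            ranges.append(None)
    return ranges


def gower_distance(a, b, ranges):
    """Gower's distance between two rows, with equal weight per feature."""
    total = 0.0
    for x, y, value_range in zip(a, b, ranges):
        if value_range is None:
            # Categorical feature: 0 if the values match, 1 otherwise.
            total += 0.0 if x == y else 1.0
        elif value_range > 0:
            # Numeric feature: absolute difference scaled by the column range.
            total += abs(x - y) / value_range
        # A constant numeric column (range 0) contributes nothing.
    return total / len(ranges)


def gower_matrix(rows):
    """Pairwise Gower distances for all rows."""
    ranges = feature_ranges(rows)
    return [[gower_distance(a, b, ranges) for b in rows] for a in rows]


# Hypothetical customer rows: (age, total_spend, region).
example_customers = [
    (25, 100.0, "north"),
    (45, 300.0, "south"),
    (35, 100.0, "north"),
]
matrix = gower_matrix(example_customers)
for row in matrix:
    print([round(value, 3) for value in row])
```

The resulting matrix can be passed to clustering methods that accept precomputed distances, such as DBSCAN in scikit-learn with `metric="precomputed"`.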