Our goal was to use unsupervised machine learning to analyze a large dataset of cryptocurrencies and group them into classes. We visualized the data with both 2D and 3D scatter plots, and finally produced a report that classified the cryptocurrencies according to their features.
Data Source: crypto_data.csv
Software/Code Language(s): Python 3.7.11, Conda 4.11.0, Jupyter Notebook, Visual Studio Code 1.65.0
Using K-Means with the elbow method (inertia plotted against the number of clusters), we identify how many clusters are useful before we start seeing diminishing returns:
Our elbow curve shows that the reduction in inertia drops off sharply after about 4 clusters, so that's how many we use in our K-Means model.
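The elbow search described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook code: the data is a synthetic stand-in generated with `make_blobs`, and the helper name `inertia_by_k` and the range of 1 to 10 clusters are assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the cleaned crypto features (assumption: NOT the real data).
X, _ = make_blobs(n_samples=300, centers=4, n_features=3, random_state=0)


def inertia_by_k(data, k_values, random_state=0):
    """Fit K-Means once per candidate k and collect the inertia of each fit."""
    inertias = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        model.fit(data)
        # inertia_ is the sum of squared distances to the nearest centroid.
        inertias.append(model.inertia_)
    return inertias


k_values = list(range(1, 11))  # hypothetical search range
inertias = inertia_by_k(X, k_values)

# Plotting inertia against k gives the elbow curve; here we just print the values.
for k, inertia in zip(k_values, inertias):
    print(f"k={k:2d}  inertia={inertia:.1f}")
```

The "elbow" is the value of k after which each additional cluster buys only a small further reduction in inertia.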
Using the PCA algorithm to reduce our dataset to 3 principal components allows us to visualize the class clusters in three dimensions. We can see that most of the cryptocurrencies are fairly tightly clustered in classes 0, 2 and 3, with BitTorrent showing as a bit of an outlier in class 1 (detailed pic below).
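A rough sketch of the reduce-then-cluster step might look like the following. It assumes the features are standardized before PCA (the write-up only says the data was "transformed"), uses random numbers in place of the real dataset, and the column names `PC 1` to `PC 3` and `Class` are illustrative choices.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Random stand-in for the numeric/encoded crypto features (assumption: NOT the real data).
rng = np.random.default_rng(0)
features = pd.DataFrame(
    rng.normal(size=(100, 6)),
    columns=[f"feature_{i}" for i in range(6)],
)

# Standardize so no single feature dominates the principal components (assumed step).
scaled = StandardScaler().fit_transform(features)

# Reduce to 3 principal components, which is what makes a 3D scatter plot possible.
pca = PCA(n_components=3, random_state=0)
pcs = pd.DataFrame(
    pca.fit_transform(scaled),
    columns=["PC 1", "PC 2", "PC 3"],
    index=features.index,
)

# Cluster in the reduced space using the k chosen from the elbow curve.
model = KMeans(n_clusters=4, n_init=10, random_state=0)
pcs["Class"] = model.fit_predict(pcs[["PC 1", "PC 2", "PC 3"]])

print(pcs.head())
print(pca.explained_variance_ratio_)
```

Checking `explained_variance_ratio_` is a quick way to see how much of the original variance the three components retain.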
When we visualize the four classes in two dimensions, plotted by Total Coin Supply vs. Total Coins Mined, we can confirm that BitTorrent is indeed an outlier. We also see a bit of divergent behavior in TurtleCoin, in class 2 (detailed pic below).
A new table was generated, pairing the important features of each cryptocurrency with its corresponding class. The above screenshot shows the first 10 rows of the table.
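One way such a table could be assembled is by joining the feature columns and the predicted classes on a shared index. The sketch below uses placeholder coins and made-up values, and the column names (`CoinName`, `TotalCoinsMined`, `TotalCoinSupply`, `Class`) are assumptions rather than the confirmed schema of `crypto_data.csv`.

```python
import pandas as pd

# Placeholder rows (assumption: illustrative values only, not taken from the dataset).
features = pd.DataFrame(
    {
        "TotalCoinsMined": [10.0, 250.0, 40.0],
        "TotalCoinSupply": [21.0, 1000.0, 84.0],
    },
    index=["coin_a", "coin_b", "coin_c"],
)
names = pd.DataFrame(
    {"CoinName": ["Coin A", "Coin B", "Coin C"]},
    index=features.index,
)
# Stand-in for the labels a fitted K-Means model would return.
classes = pd.Series([0, 1, 0], index=features.index, name="Class")

# Join on the shared index so each coin's features sit next to its class.
clustered = features.join(names).join(classes)

print(clustered.head(10))
# Per-class counts give a quick view of how the coins are distributed.
print(clustered["Class"].value_counts().sort_index())
```

Joining on the index rather than on position keeps each class label attached to the right coin even if rows were dropped during cleaning.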
After cleaning and transforming the data, we've successfully classified 532 different cryptocurrencies into 4 groups based on similarity between features. We've also identified a couple of obvious outliers. Each group will need to be analyzed for metrics that our client will consider favorable (performance/volatility).