The purpose of this project is to conduct an analysis for Accountability Accounting clients who are preparing to enter the cryptocurrency market. The deliverable is a report that identifies which cryptocurrencies are on the trading market and how they can be grouped to create a classification system for this new investment. Because the groups are not known in advance, we use unsupervised learning.
We opened the crypto_data.csv file, which was retrieved from the CryptoCompare website. The file contains 6 columns and 1252 rows.
We filtered the crypto_df DataFrame so that it kept only the rows for cryptocurrencies that are being traded, that have no null values, and whose coins have been mined. The result was a table containing 532 rows.
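The filtering above can be sketched as follows. This is a minimal illustration on a tiny made-up table, not the notebook's actual code; the IsTrading column name and the "mined" test (TotalCoinsMined greater than zero) are assumptions.

```python
import pandas as pd

# Tiny made-up sample standing in for crypto_data.csv.
# "IsTrading" is an assumed column name for the traded/not-traded flag.
crypto_df = pd.DataFrame(
    {
        "CoinName": ["CoinA", "CoinB", "CoinC", "CoinD"],
        "Algorithm": ["Scrypt", "SHA-256", "Scrypt", "X11"],
        "IsTrading": [True, False, True, True],
        "ProofType": ["PoW", "PoW", "PoS", "PoW/PoS"],
        "TotalCoinsMined": [41.0, 1000.0, None, 0.0],
        "TotalCoinSupply": ["42", "2000", "500", "100"],
    }
)

# 1. Keep only the coins that are being traded.
traded = crypto_df[crypto_df["IsTrading"]]

# 2. Drop every row that has at least one null value.
no_nulls = traded.dropna()

# 3. Keep only the coins that have actually been mined (assumed to mean > 0).
filtered_df = no_nulls[no_nulls["TotalCoinsMined"] > 0]

print(filtered_df["CoinName"].tolist())
```

In this sample only CoinA survives all three filters: CoinB is not traded, CoinC has a null value, and CoinD has no mined coins.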
In this step we also used the get_dummies() method to create dummy variables for the two text features, Algorithm and ProofType, and the StandardScaler fit_transform() method to standardize the features.
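A minimal sketch of the encoding and scaling step, again on made-up values. The dtype=float argument is our own choice here, so that the dummy columns are numeric rather than boolean; the notebook may not use it.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up feature table with the two text features named above.
features = pd.DataFrame(
    {
        "Algorithm": ["Scrypt", "SHA-256", "Scrypt", "X11"],
        "ProofType": ["PoW", "PoW", "PoS", "PoW/PoS"],
        "TotalCoinsMined": [41.0, 1000.0, 250.0, 90.0],
        "TotalCoinSupply": [42.0, 2000.0, 500.0, 100.0],
    }
)

# One 0/1 column per distinct Algorithm and ProofType value.
X = pd.get_dummies(features, columns=["Algorithm", "ProofType"], dtype=float)

# Rescale every column to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)

print(X.columns.tolist())
print(X_scaled.shape)
```

Standardizing matters here because the coin counts are orders of magnitude larger than the 0/1 dummy columns and would otherwise dominate the distance calculations in the later steps.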
We used the Principal Component Analysis (PCA) algorithm to reduce the dimensions of the DataFrame to three principal components: PC 1, PC 2, and PC 3.
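The dimensionality reduction can be sketched like this. The input is random stand-in data rather than the scaled crypto features, and the names X_scaled, coin_index, and pcs_df are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Random stand-in for the scaled feature matrix (20 coins, 6 features).
rng = np.random.default_rng(0)
X_scaled = rng.normal(size=(20, 6))
coin_index = [f"coin_{i}" for i in range(20)]  # hypothetical row labels

# Project the features onto the three directions of greatest variance.
pca = PCA(n_components=3)
components = pca.fit_transform(X_scaled)

pcs_df = pd.DataFrame(
    components, columns=["PC 1", "PC 2", "PC 3"], index=coin_index
)

# Share of the total variance retained by each component.
print(pca.explained_variance_ratio_)
```

Checking explained_variance_ratio_ is a quick way to see how much information is lost by keeping only three components.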
We ran the K-means algorithm over a range of K values and plotted an elbow curve with hvPlot to find the best value for K.
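One common way to build the data behind an elbow curve is to record the K-means inertia for each candidate K. The sketch below assumes that approach and a range of K from 1 to 10, and uses synthetic points in place of the real principal components; the plotting call is left as a comment so the block runs without hvPlot installed.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic stand-in for the PCA output: three well-separated blobs.
rng = np.random.default_rng(1)
centers = np.array([[0, 0, 0], [5, 5, 5], [-5, 5, 0]])
points = np.vstack(
    [c + rng.normal(scale=0.5, size=(15, 3)) for c in centers]
)
pcs_df = pd.DataFrame(points, columns=["PC 1", "PC 2", "PC 3"])

# Fit K-means for each candidate K and record the inertia
# (sum of squared distances from each point to its cluster center).
k_values = list(range(1, 11))
inertia = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    model.fit(pcs_df)
    inertia.append(model.inertia_)

elbow_df = pd.DataFrame({"k": k_values, "inertia": inertia})
print(elbow_df)

# With hvPlot installed, the curve could be drawn with:
#   import hvplot.pandas
#   elbow_df.hvplot.line(x="k", y="inertia", title="Elbow Curve")
```

The best K is read off the curve at the point where adding another cluster stops reducing the inertia substantially.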
Then, we ran the K-means algorithm with K=4 to predict the clusters for the cryptocurrency data. We created a new DataFrame that stores each coin's predicted cluster in a new column named "Class".
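A minimal sketch of the clustering step with K=4, using synthetic points rather than the real principal components. The names pcs_df and clustered_df are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic stand-in for the PCA output: four well-separated blobs.
rng = np.random.default_rng(2)
centers = np.array([[0, 0, 0], [6, 6, 6], [-6, 6, 0], [6, -6, 0]])
points = np.vstack(
    [c + rng.normal(scale=0.5, size=(10, 3)) for c in centers]
)
pcs_df = pd.DataFrame(points, columns=["PC 1", "PC 2", "PC 3"])

# Fit K-means with four clusters and get one label per row.
model = KMeans(n_clusters=4, n_init=10, random_state=0)
predictions = model.fit_predict(pcs_df)

# Store the predicted cluster alongside the components.
clustered_df = pcs_df.copy()
clustered_df["Class"] = predictions

print(clustered_df["Class"].value_counts())
```

The cluster labels 0 to 3 are arbitrary identifiers; they carry no ordering or ranking.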
In this last step, we used scatter plots with Plotly Express to visualize the distinct groups along the three principal components we created in Part 2.
We then created a table of all currently tradable cryptocurrencies using the hvplot.table() function, which showed a total of 532 tradable cryptocurrencies.
Finally, we created an hvPlot scatter plot with x="TotalCoinsMined", y="TotalCoinSupply", and by="Class" that shows the CoinName when hovering over a data point.
All tables and graphs with commented code are available in the crypto_clustering.ipynb file.