- Crypto Clustering Overview
- Data Preprocessing
- Reducing Data Dimensions Using PCA
- Clustering Cryptocurrencies Using K-Means
- Visualizing Results
- Optional Challenge
Files: Clustering Crypto, Optional Challenge
-
In this assignment, I run the K-Means algorithm and Principal Component Analysis (PCA) to cluster cryptocurrencies.
-
I am assuming the role of a Senior Manager on the Advisory Services team at a Big Four firm.
-
One of my most important clients, a prominent investment bank, is interested in offering a new cryptocurrency investment portfolio for its customers; however, they are lost in the immense universe of cryptocurrencies.
-
They ask me to help them make sense of it all by generating a report of what cryptocurrencies are available on the trading market and how they can be grouped using classification.
-
I will put my new unsupervised learning and Amazon SageMaker skills into action by clustering cryptocurrencies and creating plots to present my results.
-
I am asked to accomplish the following main tasks:
- Data Preprocessing: Prepare data for dimension reduction with PCA and clustering using K-Means.
- Reducing Data Dimensions Using PCA: Reduce the data dimensions using the `PCA` algorithm from `sklearn`.
- Clustering Cryptocurrencies Using K-Means: Predict clusters from the cryptocurrencies data using the `KMeans` algorithm from `sklearn`.
- Visualizing Results: Create some plots and data tables to present my results.
- Optional Challenge: Deploy my notebook to Amazon SageMaker.
-
-
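The code snippets throughout this README assume a handful of imports made up front in the notebook. A minimal setup sketch; the exact import list is my assumption, inferred from the calls used below:

```python
# Minimal imports assumed by the snippets in this README
import requests
import pandas as pd
import hvplot.pandas  # registers the .hvplot accessor on DataFrames
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
```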
Using the `requests` library, retrieve the necessary data from the following CryptoCompare API endpoint: https://min-api.cryptocompare.com/data/all/coinlist. HINT: I will need to use the 'Data' key from the json response, then transpose the DataFrame. Name my DataFrame `crypto_df`.

# Use the following endpoint to fetch json data
url = "https://min-api.cryptocompare.com/data/all/coinlist"
response = requests.get(url).json()

# Create a DataFrame
crypto_df = pd.DataFrame(response["Data"]).T
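With the raw response loaded, a quick look at the shape and first rows helps confirm that transposing on the 'Data' key produced one row per coin; this sanity check is my own addition, not part of the assignment.

```python
# Optional sanity check: rows are coin symbols, columns are CryptoCompare metadata fields
print(crypto_df.shape)
crypto_df.head()
```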
-
With the data loaded into a Pandas DataFrame, continue with the following data preprocessing tasks.
- Keep only the necessary columns: 'CoinName', 'Algorithm', 'IsTrading', 'ProofType', 'TotalCoinsMined', 'CirculatingSupply'.

# Keep only necessary columns
crypto_df = crypto_df[['CoinName', 'Algorithm', 'IsTrading', 'ProofType', 'TotalCoinsMined', 'CirculatingSupply']]
- Keep only the cryptocurrencies that are trading.
# Keep only cryptocurrencies that are trading
crypto_df = crypto_df[crypto_df["IsTrading"] == True]
- Keep only the cryptocurrencies with a working algorithm.
crypto_df = crypto_df[crypto_df["Algorithm"] != "N/A"]
- Remove the `IsTrading` column.

crypto_df = crypto_df.drop(columns=["IsTrading"])
- Remove all cryptocurrencies with at least one null value.
crypto_df = crypto_df.dropna()
- Remove all cryptocurrencies that have no coins mined.
crypto_df = crypto_df[crypto_df["TotalCoinsMined"] > 0]
- Drop all rows where there are 'N/A' text values.
crypto_df = crypto_df[crypto_df != "N/A"].dropna()
- Store the names of all cryptocurrencies in a DataFrame named `coins_name`; use `crypto_df.index` as the index for this new DataFrame.

coins_name = pd.DataFrame(crypto_df["CoinName"], index=crypto_df.index)
- Remove the `CoinName` column.

crypto_df = crypto_df.drop("CoinName", axis=1)
- Create dummy variables for all the text features, and store the resulting data in a DataFrame named `X`.

X = pd.get_dummies(data=crypto_df, columns=["Algorithm", "ProofType"])
- Use the `StandardScaler` from `sklearn` to standardize all the data of the `X` DataFrame. Remember, this is important prior to using the PCA and K-Means algorithms.

X = StandardScaler().fit_transform(X)
- Use the `PCA` algorithm from `sklearn` to reduce the dimensions of the `X` DataFrame down to three principal components.

pca = PCA(n_components=3)
crypto_pca = pca.fit_transform(X)
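The assignment does not require it, but checking how much of the original variance the three principal components retain is a quick way to judge the reduction; a small optional check:

```python
# Share of the total variance captured by each of the three principal components
print(pca.explained_variance_ratio_)
print(f"Total explained variance: {pca.explained_variance_ratio_.sum():.2%}")
```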
- Once I have reduced the data dimensions, create a DataFrame named `pcs_df` using "PC 1", "PC 2", and "PC 3" as the column names; use `crypto_df.index` as the index for this new DataFrame.

pcs_df = pd.DataFrame(
    crypto_pca,
    columns=["PC 1", "PC 2", "PC 3"],
    index=crypto_df.index
)
pcs_df.head(10)
-
- Create an Elbow Curve to find the best value for `k` using the `pcs_df` DataFrame.
inertia = []
k = list(range(1, 11))
# Calculate the inertia for the range of k values
for i in k:
k_model = KMeans(n_clusters=i, random_state=1)
k_model.fit(pcs_df)
inertia.append(k_model.inertia_)
# Create the Elbow Curve using hvPlot
elbow_data = {"k": k, "inertia": inertia}
df_elbow = pd.DataFrame(elbow_data)
# Create Elbow plot
df_elbow.hvplot.line(
x="k",
y="inertia",
title="Elbow Curve",
xticks=k
)
- Once I define the best value for `k`, run the `KMeans` algorithm to predict the `k` clusters for the cryptocurrencies data. Use the `pcs_df` DataFrame to run the `KMeans` algorithm.
# Initialize the K-Means model
model = KMeans(n_clusters = 10, random_state=0)
# Fit the model
model.fit(pcs_df)
# Predict clusters
k_10 = model.predict(pcs_df)
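Before assembling the final DataFrame, it can be useful to see how the coins are distributed across the predicted clusters; this check is my own addition, not part of the assignment:

```python
# Count how many cryptocurrencies were assigned to each cluster label
pd.Series(k_10).value_counts().sort_index()
```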
- Create a new DataFrame named `clustered_df` that includes the following columns: "Algorithm", "ProofType", "TotalCoinsMined", "CirculatingSupply", "PC 1", "PC 2", "PC 3", "CoinName", "Class". I should maintain the index of the `crypto_df` DataFrame, as shown below.
clustered_df = pd.concat([crypto_df, pcs_df], axis=1)
clustered_df["Class"] = k_10
clustered_df["CoinName"] = coins_name
clustered_df.head(20)
- In this section, I will create some data visualizations to present the final results.
- Create a scatter plot using `hvplot.scatter` to present the clustered data about cryptocurrencies, with `x="TotalCoinsMined"` and `y="CirculatingSupply"` to contrast the circulating supply against the number of mined coins. Use the `hover_cols=["CoinName"]` parameter to include the cryptocurrency name on each data point.
# Plot Scatter plot
clustered_df.hvplot.scatter(
x= "TotalCoinsMined",
y= "CirculatingSupply",
hover_cols=["CoinName"]
)
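Since the goal of the plot is to present the clustered data, the same scatter can also be colored by cluster. hvPlot's `by` keyword groups the points by a column and adds a legend; this variation is my own styling choice, not something the assignment asks for.

```python
# Same scatter, grouped and colored by the predicted cluster label
clustered_df.hvplot.scatter(
    x="TotalCoinsMined",
    y="CirculatingSupply",
    by="Class",
    hover_cols=["CoinName"]
)
```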
- Use `hvplot.table` to create a data table with all the currently tradable cryptocurrencies. The table should have the following columns: "CoinName", "Algorithm", "ProofType", "CirculatingSupply", "TotalCoinsMined", "Class".

clustered_df.hvplot.table(
    columns=["CoinName", "Algorithm", "ProofType", "CirculatingSupply", "TotalCoinsMined", "Class"],
    sortable=True,
    selectable=True
)
-
For the challenge section, I have to upload my Jupyter notebook to Amazon SageMaker and deploy it.
- The `hvplot` library is not included in the built-in Anaconda environments, so for this challenge section, I should use the `altair` library instead.
- Upload my Jupyter notebook and rename it as `crypto_clustering_sm.ipynb`.
- Select the `conda_python3` environment.
- Install the `altair` library by running the following code before the initial imports:

!pip install -U altair
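The altair snippets below also assume the library has been imported under its conventional alias (matching the `alt.Chart` calls that follow); add this alongside the notebook's other imports:

```python
# Import altair under its conventional alias
import altair as alt
```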
- Use `altair` to create the Elbow Curve.
inertia = []
k = list(range(1, 11))
# Calculate the inertia for the range of k values
for i in k:
k_model = KMeans(n_clusters=i, random_state=1)
k_model.fit(pcs_df)
inertia.append(k_model.inertia_)
# Create the Elbow Curve using altair
elbow_data = {"k": k, "inertia": inertia}
df_elbow = pd.DataFrame(elbow_data)
# Create Elbow plot
alt.Chart(df_elbow).mark_line().encode(
x="k",
y="inertia"
)
- Use the `altair` scatter plot to visualize the clusters. Since this is a 2D scatter, use `x="PC 1"` and `y="PC 2"` for the axes, and add the following columns as tooltips: "CoinName", "Algorithm", "TotalCoinsMined", "CirculatingSupply".
# Plot the scatter with x="PC 1" and y="PC 2"
# Plot the clusters
alt.Chart(clustered_df).mark_circle(size=60).encode(
x="PC 1",
y="PC 2",
color='Class',
tooltip=['CoinName', 'Algorithm', 'TotalCoinsMined', 'CirculatingSupply']
).interactive()
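One small caveat with the chart above: the Class column holds integer cluster labels, so altair infers a quantitative scale and shades the points along a continuous gradient. Appending `:N` to the color encoding treats the labels as nominal, giving each cluster a distinct color; this tweak is my suggestion, not part of the assignment.

```python
# Same chart, with cluster labels encoded as nominal (categorical) values
alt.Chart(clustered_df).mark_circle(size=60).encode(
    x="PC 1",
    y="PC 2",
    color="Class:N",
    tooltip=["CoinName", "Algorithm", "TotalCoinsMined", "CirculatingSupply"]
).interactive()
```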