This repository contains implementations of three unsupervised clustering algorithms: DBSCAN (Density-Based Spatial Clustering of Applications with Noise), K-Means, and K-Means EM (Expectation-Maximization). These algorithms are used to group similar data points together without any prior knowledge of the labels or categories.
Clustering is a fundamental task in unsupervised machine learning, where the goal is to group similar objects in such a way that objects in the same group (cluster) are more similar to each other than to those in other groups. This repository provides implementations of three clustering techniques:
- DBSCAN: A density-based clustering algorithm that can find arbitrarily shaped clusters and is robust to outliers.
- K-Means: A popular partitioning method that divides data into `k` clusters by minimizing the variance within each cluster.
- K-Means EM: An extension of K-Means that uses the Expectation-Maximization (EM) algorithm to estimate the parameters of a mixture of Gaussian distributions.
## DBSCAN

- Description: DBSCAN groups points that are closely packed together and marks isolated points as outliers. It works well with clusters of varying shapes and sizes.
- Parameters:
  - `eps`: The maximum distance between two points for them to be considered neighbors.
  - `min_samples`: The minimum number of points required to form a dense region (i.e., a cluster).
- Advantages:
  - Can find arbitrarily shaped clusters.
  - Does not require specifying the number of clusters.
  - Robust to noise and outliers.
- Disadvantages:
  - Not suitable for datasets with widely varying densities, since a single `eps` cannot fit them all.
  - Performance depends on the choice of `eps` and `min_samples`.
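As a rough illustration of how `eps` and `min_samples` drive the clustering, here is a minimal NumPy sketch of DBSCAN (illustrative only, not this repository's implementation; it builds a full O(n²) distance matrix for simplicity):

```python
import numpy as np

def dbscan(X, eps=0.5, min_samples=5):
    """Minimal DBSCAN sketch: returns cluster ids (0, 1, ...), -1 for noise."""
    n = len(X)
    # Full pairwise distance matrix (O(n^2) memory; fine for small data)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    labels = np.full(n, -1)            # -1 marks noise until proven otherwise
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for i in range(n):
        # Only unvisited core points (>= min_samples neighbors) seed clusters
        if visited[i] or len(neighbors[i]) < min_samples:
            continue
        stack, visited[i] = [i], True
        while stack:                   # flood-fill the density-connected region
            j = stack.pop()
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:  # expand only from core points
                for q in neighbors[j]:
                    if not visited[q]:
                        visited[q] = True
                        stack.append(q)
        cluster += 1
    return labels
```

With two tight groups and one isolated point, the isolated point keeps the label `-1` (noise) while each group gets its own cluster id.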
## K-Means

- Description: K-Means is a centroid-based algorithm that partitions the dataset into `k` clusters, where each data point belongs to the cluster with the nearest mean (centroid).
- Parameters:
  - `k`: The number of clusters to form.
  - `max_iter`: Maximum number of iterations for the algorithm.
  - `tol`: Tolerance to declare convergence.
- Advantages:
  - Simple and fast.
  - Works well for spherical clusters.
- Disadvantages:
  - Requires the number of clusters (`k`) to be specified in advance.
  - Sensitive to the initial placement of centroids.
  - Assumes clusters are spherical and evenly sized.
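The assign-then-update loop behind `k`, `max_iter`, and `tol` can be sketched with NumPy (a simplified stand-in, not this repository's implementation; initializing centroids by sampling data points is an assumption of the sketch):

```python
import numpy as np

def kmeans(X, k, max_iter=100, tol=1e-4, seed=0):
    """Minimal Lloyd's K-Means sketch: returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids as k distinct data points (one common heuristic)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        # Assignment step: each point joins the nearest centroid
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dist.argmin(axis=1)
        # Update step: move each centroid to the mean of its members
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.linalg.norm(new - centroids) < tol:  # tol declares convergence
            centroids = new
            break
        centroids = new
    return centroids, labels
```

Because the result depends on the initial centroids (a disadvantage noted above), implementations often run several restarts and keep the lowest-variance solution.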
## K-Means EM

- Description: K-Means EM is a variant of K-Means that incorporates the Expectation-Maximization algorithm. It models the data as a mixture of Gaussian distributions and iteratively refines the cluster assignments and parameters.
- Parameters:
  - `k`: The number of clusters to form.
  - `max_iter`: Maximum number of iterations for the EM algorithm.
  - `tol`: Tolerance to declare convergence.
- Advantages:
  - Handles overlapping clusters.
  - Provides a probabilistic cluster membership.
- Disadvantages:
  - Requires the number of clusters (`k`) to be specified in advance.
  - Computationally more expensive than standard K-Means.
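A minimal EM loop for a Gaussian mixture makes the E-step/M-step refinement concrete. This is a sketch under simplifying assumptions (spherical covariances, unit starting variance, farthest-point initialization), not necessarily what this repository implements:

```python
import numpy as np

def gmm_em(X, k, max_iter=100, tol=1e-6, seed=0):
    """Minimal EM sketch for a spherical Gaussian mixture.
    Returns (means, responsibilities)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Farthest-point initialization keeps the starting means apart (assumption)
    idx = [int(rng.integers(n))]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(-1).min(axis=1)
        idx.append(int(d2.argmax()))
    means = X[idx].astype(float)
    variances = np.ones(k)             # unit starting variance (assumption)
    weights = np.full(k, 1.0 / k)
    prev_ll = -np.inf
    for _ in range(max_iter):
        # E-step: responsibility of each component for each point (log domain)
        sq = ((X[:, None, :] - means[None, :, :]) ** 2).sum(-1)
        log_p = (np.log(weights) - 0.5 * d * np.log(2 * np.pi * variances)
                 - sq / (2 * variances))
        log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)
        resp = np.exp(log_p - log_norm)
        # M-step: re-estimate weights, means, and per-component variance
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        sq = ((X[:, None, :] - means[None, :, :]) ** 2).sum(-1)
        variances = (resp * sq).sum(axis=0) / (d * nk)
        ll = log_norm.sum()            # total log-likelihood
        if ll - prev_ll < tol:         # tol declares convergence
            break
        prev_ll = ll
    return means, resp
```

Here `resp[i, j]` is the probability that point `i` was generated by component `j`, which is the probabilistic membership mentioned above; `resp.argmax(axis=1)` recovers hard cluster labels.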
To use the clustering algorithms, clone the repository and install the required dependencies:
```bash
git clone https://github.com/yourusername/unsupervised-ML_Clustering.git
cd unsupervised-ML_Clustering
pip install -r requirements.txt
```