brunogaldos/unsupervised-ML_Clustering
Unsupervised Clustering Algorithms

This repository contains implementations of three unsupervised clustering algorithms: DBSCAN (Density-Based Spatial Clustering of Applications with Noise), K-Means, and K-Means EM (Expectation-Maximization). These algorithms are used to group similar data points together without any prior knowledge of the labels or categories.

Table of Contents

  1. Introduction
  2. Algorithms
  3. Installation

Clustering is a fundamental task in unsupervised machine learning, where the goal is to group similar objects in such a way that objects in the same group (cluster) are more similar to each other than to those in other groups. This repository provides implementations of three clustering techniques:

  1. DBSCAN: A density-based clustering algorithm that can find arbitrarily shaped clusters and is robust to outliers.
  2. K-Means: A popular partitioning method that divides data into k clusters by minimizing the variance within each cluster.
  3. K-Means EM: An extension of K-Means that uses the Expectation-Maximization (EM) algorithm to estimate the parameters of a mixture of Gaussian distributions.

Algorithms

DBSCAN

  • Description: DBSCAN is a clustering algorithm that groups points that are closely packed, marking points that lie alone in low-density regions as outliers. It works well with clusters of varying shapes and sizes.
  • Parameters:
    • eps: The maximum distance between two points for one to be considered a neighbor of the other.
    • min_samples: The minimum number of points within eps of a point for that point to qualify as a core point of a dense region.
  • Advantages:
    • Can find arbitrarily shaped clusters.
    • Does not require specifying the number of clusters.
    • Robust to noise and outliers.
  • Disadvantages:
    • Struggles with datasets whose clusters have widely varying densities, since a single eps value cannot suit them all.
    • Performance depends on the choice of eps and min_samples.
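The core-point test and cluster expansion described above can be sketched in a few lines of pure Python. This is an illustrative toy, not this repository's code: the function name, the brute-force neighbor search, and the label convention (cluster ids from 0, -1 for noise) are all assumptions.

```python
import math

def dbscan(points, eps, min_samples):
    """Label each point with a cluster id, or -1 for noise."""
    labels = [None] * len(points)
    cluster = -1

    def neighbors(i):
        # Brute-force O(n^2) neighborhood query; real implementations
        # would use a spatial index such as a k-d tree.
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_samples:
            labels[i] = -1               # provisionally noise
            continue
        cluster += 1                     # i is a core point: new cluster
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster      # noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_samples:
                queue.extend(j_nbrs)     # j is also core: keep expanding
    return labels

labels = dbscan([(0, 0), (0, 0.1), (0.1, 0),
                 (5, 5), (5, 5.1), (5.1, 5),
                 (10, 10)], eps=0.5, min_samples=3)
```

With eps=0.5 and min_samples=3, the two tight groups of three points each form their own cluster, and the isolated point at (10, 10) is labeled noise.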

K-Means

  • Description: K-Means is a centroid-based algorithm that partitions the dataset into k clusters, where each data point belongs to the cluster with the nearest mean (centroid).
  • Parameters:
    • k: The number of clusters to form.
    • max_iter: Maximum number of iterations for the algorithm.
    • tol: Tolerance to declare convergence.
  • Advantages:
    • Simple and fast.
    • Works well for spherical clusters.
  • Disadvantages:
    • Requires the number of clusters (k) to be specified.
    • Sensitive to the initial placement of centroids.
    • Assumes clusters are spherical and evenly sized.
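The assign-then-update loop (Lloyd's algorithm) behind K-Means can be sketched as follows. This is a minimal illustration under assumptions, not this repository's implementation: the random sampling of initial centroids and the empty-cluster handling (keep the old centroid) are choices made here for brevity.

```python
import math
import random

def kmeans(points, k, max_iter=100, tol=1e-4, seed=0):
    """Lloyd's algorithm: returns final centroids and cluster members."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)    # naive random initialization
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[i].append(p)
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        new = [tuple(sum(coord) / len(c) for coord in zip(*c)) if c
               else centroids[i] for i, c in enumerate(clusters)]
        shift = max(math.dist(a, b) for a, b in zip(centroids, new))
        centroids = new
        if shift < tol:                  # centroids stopped moving
            break
    return centroids, clusters

data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, clusters = kmeans(data, k=2)
```

On these two well-separated squares the algorithm converges to centroids (0.5, 0.5) and (10.5, 10.5) regardless of which points are sampled as the initial centroids; with less separated data the result depends on initialization, which is why schemes like k-means++ exist.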

K-Means EM

  • Description: K-Means EM is a variant of K-Means that incorporates the Expectation-Maximization algorithm. It models the data as a mixture of Gaussian distributions and iteratively refines the cluster assignments and parameters.
  • Parameters:
    • k: The number of clusters to form.
    • max_iter: Maximum number of iterations for the EM algorithm.
    • tol: Tolerance to declare convergence.
  • Advantages:
    • Handles overlapping clusters.
    • Provides probabilistic (soft) cluster memberships.
  • Disadvantages:
    • Requires the number of clusters (k) to be specified.
    • Computationally more expensive than standard K-Means.
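The E-step/M-step cycle described above can be sketched for one-dimensional data as a minimal Gaussian-mixture EM. This is an illustration only, not this repository's code: the naive initialization (first k points as means, unit variances), the variance floor of 1e-6, and the log-likelihood stopping rule are all assumptions made here.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def gmm_em(xs, k, max_iter=200, tol=1e-6):
    """Fit a 1-D Gaussian mixture with EM; returns (weights, means, variances)."""
    means = list(xs[:k])                 # naive init: first k points
    variances = [1.0] * k
    weights = [1.0 / k] * k
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E-step: responsibility of each component for each point.
        resp, ll = [], 0.0
        for x in xs:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            ll += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances from
        # the responsibility-weighted data.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(sum(r[j] * (x - means[j]) ** 2
                                   for r, x in zip(resp, xs)) / nj, 1e-6)
        if ll - prev_ll < tol:           # log-likelihood stopped improving
            break
        prev_ll = ll
    return weights, means, variances

weights, means, variances = gmm_em([0.0, 1.0, 2.0, 10.0, 11.0, 12.0], k=2)
```

Unlike the hard assignments of standard K-Means, each point here carries a soft responsibility for every component; on this toy data the two component means converge to roughly 1 and 11.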

Installation

To use the clustering algorithms, clone the repository and install the required dependencies:

git clone https://github.com/brunogaldos/unsupervised-ML_Clustering.git
cd unsupervised-ML_Clustering
pip install -r requirements.txt
