Skip to content

Spring 2017 Large Scale Matrix Computation and Machine Learning project investigating spectral clustering on big data

Notifications You must be signed in to change notification settings

mpoegel/spectral-clustering

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

17 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Spectral Clustering

Project for CSCI 4971 Large Scale Matrix Computation and Machine Learning exploring spectral clustering algorithms on very large data sets. Specifically, I examine the algorithm presented in "Time and Space Efficient Spectral Clustering via Column Sampling" by Li et al.

Get started by running make data to download the data sets used in this experiment.

You can then run the benchmarking python script, src\benchmark.py:

$ python src\benchmark.py -h
usage: Benchmark spectral clustering algorithms [-h]
                                                [--subset [SUBSET [SUBSET ...]]]
                                                [--iterations ITERATIONS]
                                                [--columns COLUMNS]
                                                [--gamma GAMMA]
                                                d a [a ...]

positional arguments:
  d                     data set to use in benchmarking
  a                     algorithms to run

optional arguments:
  -h, --help            show this help message and exit
  --subset [SUBSET [SUBSET ...]], -s [SUBSET [SUBSET ...]]
                        use only a subset of classes from the data set
  --iterations ITERATIONS, -i ITERATIONS
                        number of iterations to average over
  --columns COLUMNS, -m COLUMNS
                        number of columns to sample
  --gamma GAMMA, -g GAMMA
                        KASP data reduction ratio

About

Spring 2017 Large Scale Matrix Computation and Machine Learning project investigating spectral clustering on big data

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published