Spectral Clustering

Project for CSCI 4971 Large Scale Matrix Computation and Machine Learning exploring spectral clustering algorithms on very large data sets. Specifically, I examine the algorithm presented in "Time and Space Efficient Spectral Clustering via Column Sampling" by Li et al.

Get started by running make data to download the data sets used in this experiment.

You can then run the benchmarking Python script, src/benchmark.py:

$ python src/benchmark.py -h
usage: Benchmark spectral clustering algorithms [-h]
                                                [--subset [SUBSET [SUBSET ...]]]
                                                [--iterations ITERATIONS]
                                                [--columns COLUMNS]
                                                [--gamma GAMMA]
                                                d a [a ...]

positional arguments:
  d                     data set to use in benchmarking
  a                     algorithms to run

optional arguments:
  -h, --help            show this help message and exit
  --subset [SUBSET [SUBSET ...]], -s [SUBSET [SUBSET ...]]
                        use only a subset of classes from the data set
  --iterations ITERATIONS, -i ITERATIONS
                        number of iterations to average over
  --columns COLUMNS, -m COLUMNS
                        number of columns to sample
  --gamma GAMMA, -g GAMMA
                        KASP data reduction ratio
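
For readers who want to see how a command line like the one above can be declared, here is a minimal sketch of an argparse parser that yields the same options. It is reconstructed from the help text only and is not the code in the benchmark script. The function name build_parser, the int/float types, and the data set and algorithm names in the example call are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Sketch of a parser matching the help text; not the project's own code."""
    # The help output prints the description on the "usage:" line, which is
    # what argparse does when that string is given as `prog`.
    parser = argparse.ArgumentParser(prog="Benchmark spectral clustering algorithms")
    parser.add_argument("d", help="data set to use in benchmarking")
    parser.add_argument("a", nargs="+", help="algorithms to run")
    # nargs="*" accepts zero or more class labels; they are left as strings
    # here because the help text does not say how they are interpreted.
    parser.add_argument("--subset", "-s", nargs="*",
                        help="use only a subset of classes from the data set")
    # The numeric types below are assumptions inferred from the descriptions.
    parser.add_argument("--iterations", "-i", type=int,
                        help="number of iterations to average over")
    parser.add_argument("--columns", "-m", type=int,
                        help="number of columns to sample")
    parser.add_argument("--gamma", "-g", type=float,
                        help="KASP data reduction ratio")
    return parser


# "some_dataset", "alg_one" and "alg_two" are placeholders, not names the
# benchmark script is known to accept.
example = build_parser().parse_args(
    ["some_dataset", "alg_one", "alg_two", "-s", "0", "1", "-i", "5", "-m", "100", "-g", "8"]
)
print(example)
```

The positional arguments come first in the example call so that the zero-or-more --subset values cannot absorb the data set or algorithm names.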