Spectral Clustering

Project for CSCI 4971 (Large Scale Matrix Computation and Machine Learning) exploring spectral clustering algorithms on very large data sets. Specifically, I examine the algorithm presented in "Time and Space Efficient Spectral Clustering via Column Sampling" by Li et al.

Get started by running make data to download the data sets used in this experiment.

You can then run the benchmarking Python script, src/benchmark.py:

$ python src/benchmark.py -h
usage: Benchmark spectral clustering algorithms [-h]
                                                [--subset [SUBSET [SUBSET ...]]]
                                                [--iterations ITERATIONS]
                                                [--columns COLUMNS]
                                                [--gamma GAMMA]
                                                d a [a ...]

positional arguments:
  d                     data set to use in benchmarking
  a                     algorithms to run

optional arguments:
  -h, --help            show this help message and exit
  --subset [SUBSET [SUBSET ...]], -s [SUBSET [SUBSET ...]]
                        use only a subset of classes from the data set
  --iterations ITERATIONS, -i ITERATIONS
                        number of iterations to average over
  --columns COLUMNS, -m COLUMNS
                        number of columns to sample
  --gamma GAMMA, -g GAMMA
                        KASP data reduction ratio
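
For reference, the interface in the help text above could be declared with argparse roughly as follows. This is a minimal sketch reconstructed from that output only, not the contents of src/benchmark.py: the function name build_parser, the int/float types for --iterations, --columns, and --gamma, and the placeholder values DATASET, ALGO_A, and ALGO_B are all assumptions, since the README does not list the valid data set or algorithm names.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # prog string copied from the "usage:" line of the help output.
    parser = argparse.ArgumentParser(prog="Benchmark spectral clustering algorithms")

    # Positional arguments: one data set, then one or more algorithms.
    parser.add_argument("d", help="data set to use in benchmarking")
    parser.add_argument("a", nargs="+", help="algorithms to run")

    # Optional arguments. nargs="*" matches the [SUBSET [SUBSET ...]] usage.
    parser.add_argument("--subset", "-s", nargs="*",
                        help="use only a subset of classes from the data set")
    # The numeric types below are assumptions; the help text does not state them.
    parser.add_argument("--iterations", "-i", type=int,
                        help="number of iterations to average over")
    parser.add_argument("--columns", "-m", type=int,
                        help="number of columns to sample")
    parser.add_argument("--gamma", "-g", type=float,
                        help="KASP data reduction ratio")
    return parser


# Placeholder values only: substitute a real data set and real algorithm names.
args = build_parser().parse_args(["DATASET", "ALGO_A", "ALGO_B", "-i", "5", "-m", "100"])
print(args.d, args.a, args.iterations, args.columns)
```

Options that are not passed (here --subset and --gamma) come back as None in this sketch; the real script may define its own defaults.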
