k-meanz

k-means clustering of image data, pixel by pixel.

Implemented in various combinations of:

+ visualize clustering in action!** (Safari-friendly, but not Firefox/Chrome/Iceweasel)

** HTML rendered from raw IPython notebook

Usage

Clustering with TensorFlow...

$ python k_means_tf.py <path/to/input> [-k K] [-r ROUNDS] [-o OUTDIR] [-s SCALE] [-g GENERATE_ALL] [-d DATA_SAVING]

Clustering with numpy...

$ python k_means_np_vanilla.py <path/to/input> [-k K] [-r ROUNDS] [-o OUTDIR] [-s SCALE] [-g GENERATE_ALL]

positional arguments:

  • path to input image (.jpg or .jpeg)

optional arguments:

  • -h, --help      help message
  • -k, --k      number of centroids (default: 50)
  • -r, --rounds      number of rounds of clustering (default: 5)
  • -o, --outdir      path/to/output/directory (default: .)
  • -s, --scale      scale pixel locations to the same range as the RGB values? [True/False] (default: T)
  • -g, --generate_all      generate image after each round? (slower) [True/False] (default: F)
  • -d, --data_saving      save clustering data as .txt? (centroids, cluster sizes, dimensions) [True/False] (default: F)
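As an illustration of what the `--scale` option describes, here is a minimal numpy sketch that builds one `[x, y, R, G, B]` row per pixel and optionally rescales the coordinates into the 0..255 range of the color channels. The function name `pixel_features` and the exact scaling rule (longer image side mapped onto 0..255) are assumptions made for this example, not necessarily what the scripts do.

```python
import numpy as np

def pixel_features(image, scale=True):
    """Return an (H*W, 5) float array of [x, y, R, G, B] rows.

    `image` is an H x W x 3 array of RGB values in 0..255.
    Hypothetical helper; the scaling rule below is an assumption.
    """
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]  # per-pixel row (y) and column (x) indices
    xy = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(np.float64)
    if scale:
        # Map the longer side onto 0..255 so that location and color
        # contribute on a comparable scale to the Euclidean distance.
        xy *= 255.0 / max(h - 1, w - 1, 1)
    rgb = image.reshape(-1, 3).astype(np.float64)
    return np.hstack([xy, rgb])
```

Without scaling, a large image has coordinates that range well beyond 255, so location dominates the distance; for a small image the opposite holds and color dominates.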

e.g.

$ python k_means_tf.py ~/Downloads/erykah_badu.jpg -k10 -r5

[image: 536211-78101_160103_0516_k10_0]

$ python k_means_tf.py ~/Downloads/erykah_badu.jpg -k10 -r5 -s False

[image: 536211-78101_160119_0002_k10_4]

$ python k_means_tf.py ~/Downloads/erykah_badu.jpg -k50 -r5 -s False

[image: 536211-78101_160118_1728_k50_4]

$ python k_means_tf.py ~/Downloads/erykah_badu.jpg -k1000 -r3

[image: 536211-78101_kmeanz_3_k1000]

"Go on..."

k-means clustering is a data-mining method that requires no prior knowledge of the data distribution, only an explicit number of groups ("clusters"). In each round, every pixel is assigned to its best-matching cluster, i.e. the centroid at the smallest Euclidean distance along 5 dimensions: location (x,y) and color (R,G,B). Each centroid is then updated to the average of the pixels assigned to it. To generate the clustered/segmented image, each pixel takes the color value of its corresponding centroid.
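The round described above can be sketched in a few lines of numpy. This is a minimal illustration, not the code from `k_means_np_vanilla.py`: the names `kmeans_round`, `cluster`, and `recolor` are hypothetical, the random initialization from existing points is an assumption, and the all-pairs distance array is only practical for small inputs.

```python
import numpy as np

def kmeans_round(points, centroids):
    """One round: assign each point to its nearest centroid, then re-average."""
    # Squared Euclidean distances, shape (N, K). Memory grows with N * K * D.
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    new_centroids = centroids.copy()
    for j in range(len(centroids)):
        members = points[labels == j]
        if len(members):  # an empty cluster keeps its previous centroid
            new_centroids[j] = members.mean(axis=0)
    return labels, new_centroids

def cluster(points, k, rounds, seed=0):
    """Run a fixed number of rounds from k randomly chosen starting points."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(points), size=k, replace=False)
    centroids = points[start].astype(np.float64)
    for _ in range(rounds):
        labels, centroids = kmeans_round(points, centroids)
    return labels, centroids

def recolor(labels, centroids, shape):
    """Paint every pixel with the (R, G, B) part of its centroid."""
    return centroids[labels, 2:5].round().astype(np.uint8).reshape(shape)
```

Here `points` is an (N, 5) float array of `[x, y, R, G, B]` rows, and `shape` is the `(height, width, 3)` shape of the original image.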

N.B. k_means_tf.py is the most efficient implementation, but it is also memory-intensive.