A quick program implementing the KMeans clustering algorithm from machine learning.
This C++ implementation is a very "naive", non-optimized solution.
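As a rough illustration of what a naive KMeans iteration looks like, here is a minimal sketch of the two steps the algorithm alternates between: assigning each point to its nearest centroid, then moving each centroid to the mean of its points. The `Point2D` type and the function names `assignClusters` and `updateCentroids` are hypothetical and chosen for this example; they are not taken from this project's sources, and the 2D layout is assumed from the gnuplot commands used in this README.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical 2D point type (assumption: the project's own types may differ).
struct Point2D {
    double x;
    double y;
};

// Squared Euclidean distance between two points.
double squaredDistance(const Point2D& a, const Point2D& b) {
    const double dx = a.x - b.x;
    const double dy = a.y - b.y;
    return dx * dx + dy * dy;
}

// Assignment step: for each point, the index of its closest centroid.
std::vector<std::size_t> assignClusters(const std::vector<Point2D>& points,
                                        const std::vector<Point2D>& centroids) {
    assert(!centroids.empty());
    std::vector<std::size_t> labels(points.size(), 0);
    for (std::size_t i = 0; i < points.size(); ++i) {
        double best = std::numeric_limits<double>::max();
        for (std::size_t c = 0; c < centroids.size(); ++c) {
            const double d = squaredDistance(points[i], centroids[c]);
            if (d < best) {
                best = d;
                labels[i] = c;
            }
        }
    }
    return labels;
}

// Update step: move each centroid to the mean of its assigned points.
// A centroid with no assigned point is left where it was.
std::vector<Point2D> updateCentroids(const std::vector<Point2D>& points,
                                     const std::vector<std::size_t>& labels,
                                     const std::vector<Point2D>& centroids) {
    assert(points.size() == labels.size());
    std::vector<Point2D> sums(centroids.size(), Point2D{0.0, 0.0});
    std::vector<std::size_t> counts(centroids.size(), 0);
    for (std::size_t i = 0; i < points.size(); ++i) {
        sums[labels[i]].x += points[i].x;
        sums[labels[i]].y += points[i].y;
        ++counts[labels[i]];
    }
    std::vector<Point2D> updated = centroids;
    for (std::size_t c = 0; c < centroids.size(); ++c) {
        if (counts[c] > 0) {
            const double n = static_cast<double>(counts[c]);
            updated[c] = Point2D{sums[c].x / n, sums[c].y / n};
        }
    }
    return updated;
}
```

Calling `assignClusters` and `updateCentroids` in a loop until the labels stop changing (or a maximum number of iterations is reached) gives the basic algorithm.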
For compilation:
- mkdir build
- cd build
- cmake <project_directory>
- make
Three files are created from this dataset:
- File "elbow", which contains the inertia value for each number of clusters evaluated:
Visualization with gnuplot:
set datafile sep ','
plot 'elbow' u 1:2 w l
The elbow curve is quite "bumpy" because of the very naive initialization of the centroids (a totally random point is chosen to initialize each centroid).
- File "point", which contains each point of the dataset with the index of its associated cluster centroid:
Visualization with gnuplot:
set datafile sep ','
plot 'point' u 2:3:4 palette ps 5 pt 7
- File "centroid", which contains the coordinates of the centroids:
Visualization with gnuplot for both data and centroids:
set datafile sep ','
plot 'point' u 2:3:4 palette ps 5 pt 7, 'centroid' u 2:3 ps 5 pt 9
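To make the "elbow" values and the naive initialization more concrete, here is a minimal sketch of both. It assumes that inertia is the sum of squared distances from each point to its assigned centroid (the usual definition, as in sklearn), and that a "totally random point" means a dataset point picked uniformly at random, which is only one possible reading. `Point2D`, `randomCentroids` and `inertia` are hypothetical names for this example, not this project's API.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical 2D point type (assumption: the project's own types may differ).
struct Point2D {
    double x;
    double y;
};

// Naive initialization (assumed reading): each centroid is a copy of a
// dataset point picked uniformly at random. The same point may be picked
// more than once, which is one reason results vary from run to run.
std::vector<Point2D> randomCentroids(const std::vector<Point2D>& points,
                                     std::size_t k,
                                     std::mt19937& rng) {
    assert(!points.empty());
    std::uniform_int_distribution<std::size_t> pick(0, points.size() - 1);
    std::vector<Point2D> centroids;
    centroids.reserve(k);
    for (std::size_t c = 0; c < k; ++c) {
        centroids.push_back(points[pick(rng)]);
    }
    return centroids;
}

// Inertia: sum of squared distances from each point to its assigned centroid.
double inertia(const std::vector<Point2D>& points,
               const std::vector<std::size_t>& labels,
               const std::vector<Point2D>& centroids) {
    assert(points.size() == labels.size());
    double total = 0.0;
    for (std::size_t i = 0; i < points.size(); ++i) {
        assert(labels[i] < centroids.size());
        const double dx = points[i].x - centroids[labels[i]].x;
        const double dy = points[i].y - centroids[labels[i]].y;
        total += dx * dx + dy * dy;
    }
    return total;
}
```

Because the starting centroids are random, the inertia reached for a given number of clusters can differ between runs, which is consistent with the bumpy elbow curve. Running the clustering several times per cluster count and keeping the lowest inertia is a common way to smooth it.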
More information on the dataset used in this example, and on how it is processed with sklearn, here: