Implementation of K Means Algorithm in Python
pandas
numpy
matplotlib
Takes 3 arguments: the number of sample data points (number_of_datapoints), the number of clusters (number_of_clusters), and the number of iterations to perform (number_of_iterations).
python main.py number_of_datapoints number_of_clusters number_of_iterations
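As an illustration of how the three positional arguments could be read, here is a minimal sketch using the standard-library argparse module. The function name parse_args and the validation check are assumptions made for this example; the actual main.py may parse its arguments differently (for instance via sys.argv).

```python
import argparse


def parse_args(argv=None):
    """Parse the three positional integers described above (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Run k-means on randomly generated data points."
    )
    parser.add_argument("number_of_datapoints", type=int,
                        help="how many random data points to generate")
    parser.add_argument("number_of_clusters", type=int,
                        help="how many clusters (k) to fit")
    parser.add_argument("number_of_iterations", type=int,
                        help="how many k-means iterations to run")
    args = parser.parse_args(argv)
    # Assumed sanity check: k cannot exceed the number of points.
    if args.number_of_clusters > args.number_of_datapoints:
        parser.error("number_of_clusters must not exceed number_of_datapoints")
    return args
```

Calling parse_args(["100", "4", "20"]) mirrors running python main.py 100 4 20.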
The file contains the KMeans function, def KMeans(data,k,iters), and a sample usage:
data=np.random.rand(100,2)
labels,centroids=KMeans(data,4,20)
- Folder: code-at-1-hour-mark -> contains the code at the 1 hour mark
- File: simplekmeans.py -> contains a function that fits data for a specified number of clusters and iterations
- File: main.py -> use this file to run the program; specify the number of data points to randomly generate, the number of clusters, and the number of iterations
- File: algorithm/mlearn.py -> contains the KMeans class used by main.py
- Folder: output -> contains the output of main.py, i.e. plots and data files
- Generate data points
- Calculate k-means
  - select k random points from the data
  - initialize the centroids with the points selected in the step above
  - repeat for the given number of iterations
    - assign each data point to its nearest centroid
    - recompute each centroid as the mean of the data points assigned to it
  - return labels, centroids
- Make the corresponding plots and save the data to the output dir
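The k-means steps listed above can be sketched roughly as follows. This is a minimal illustration, not the code in simplekmeans.py or algorithm/mlearn.py: the name kmeans_sketch, the seed parameter, the use of Euclidean distance, and the handling of empty clusters are all assumptions made for the example.

```python
import numpy as np


def kmeans_sketch(data, k, iters, seed=None):
    """Illustrative k-means: returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Select k distinct random points from the data as the initial centroids.
    centroids = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    labels = np.zeros(len(data), dtype=int)
    for _ in range(iters):
        # Distance from every point to every centroid, shape (n_points, k).
        distances = np.linalg.norm(
            data[:, None, :] - centroids[None, :, :], axis=2
        )
        # Assign each point to its nearest centroid.
        labels = distances.argmin(axis=1)
        # Move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = data[labels == j]
            if len(members) > 0:  # assumed: leave an empty cluster's centroid unchanged
                centroids[j] = members.mean(axis=0)
    return labels, centroids


# Usage mirroring the sample call earlier in this README.
data = np.random.rand(100, 2)
labels, centroids = kmeans_sketch(data, 4, 20, seed=0)
```

Here labels holds one cluster index per data point and centroids holds one row per cluster. A fixed iteration count is used, matching the steps above, rather than a convergence test.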