S27_MachineLearning_Hard
Machine learning is now used across a wide range of fields. In essence, machine learning algorithms use statistical and mathematical models to "learn" from data so that they can make inferences about new data. In supervised learning, each data point has a known label, usually a class membership label (for classification problems) or a numerical value (for regression problems). Unsupervised learning (such as clustering) uses unlabeled data to find patterns and relationships among features.

Given a dataset, implement k-means clustering to analyze the data. The data consists of 2 numerical features. Given this dataset and a number of clusters, k, the solution should converge on a clustering of the data points. After clustering, the program should print how many data points are in each cluster.
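One possible way to approach the task is sketched below using Lloyd's algorithm, the standard iterative form of k-means: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until the assignments stop changing. This is an illustration, not a reference solution. The helper names (`load_points`, `squared_distance`, `cluster_points`) and the `max_iters` and `seed` parameters are assumptions made for this sketch, as is the file layout: two numeric columns per row, with any row that does not parse (such as a header) skipped.

```python
import csv
import random


def load_points(path):
    """Read (x, y) pairs from a CSV file, skipping rows that are not two numbers."""
    points = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            try:
                points.append((float(row[0]), float(row[1])))
            except (ValueError, IndexError):
                continue  # assumed to be a header or blank row
    return points


def squared_distance(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def cluster_points(points, k, max_iters=100, seed=None):
    """Run Lloyd's algorithm and return (labels, centroids)."""
    rng = random.Random(seed)
    # Initialize centroids from k randomly chosen data points (needs k <= len(points)).
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels, centroids


def kmeans(path, k, seed=None):
    """Cluster the points in a CSV file and print the size of each cluster."""
    points = load_points(path)
    labels, _ = cluster_points(points, k, seed=seed)
    counts = [labels.count(j) for j in range(k)]
    for j, n in enumerate(counts, start=1):
        print(f"cluster{j}: {n} data points")
    return counts
```

Initializing from randomly sampled data points is only one choice; other schemes such as k-means++ tend to give more stable results, and running several seeds and keeping the best clustering is a common way to reduce sensitivity to the starting centroids.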
Example:
kmeans("datasetfilepath.csv", 4)
Here, the last argument is the number of clusters (k = 4). Expected output:
cluster1: 20 data points
cluster2: 30 data points
cluster3: 20 data points
cluster4: 50 data points
Note: The order of the clusters is not important, as long as the result has two clusters with 20 data points, one with 30, and one with 50. Slight variation is not unusual with random initialization (e.g. 20, 22, 28, 50 instead of 20, 20, 30, 50 for some initializations).
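Because cluster numbering is arbitrary, a result is easiest to check by comparing sorted cluster sizes rather than matching cluster1 to cluster1. The helper below is a small sketch of that idea; the name `sizes_match` and the `tolerance` parameter are hypothetical and are not part of the task statement.

```python
def sizes_match(actual, expected, tolerance=0):
    """Compare two lists of cluster sizes while ignoring cluster numbering.

    Both lists are sorted and compared pairwise; `tolerance` is the largest
    difference allowed for any single cluster.
    """
    if len(actual) != len(expected):
        return False
    return all(
        abs(a - e) <= tolerance
        for a, e in zip(sorted(actual), sorted(expected))
    )


# Same sizes in a different order count as a match.
print(sizes_match([50, 20, 30, 20], [20, 20, 30, 50]))
# The slightly different split from the note matches only with some tolerance.
print(sizes_match([20, 22, 28, 50], [20, 20, 30, 50], tolerance=2))
```

How much tolerance is reasonable depends on the dataset and on how well separated its clusters are, so the value used here is only an example.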