K-means clustering
1. K-means clustering is an unsupervised learning technique: it splits the data into K groups, each represented by its centroid (the mean of its points).
The algorithm for k-means clustering:
1. Randomly pick K centroids (the "means" in k-means).
2. Assign each data point to the centroid it is closest to.
3. Recompute the centroids based on the average position of each centroid's points.
4. Repeat steps 2 and 3 until no point changes the centroid it is assigned to.
5. Predict the cluster of a new point by assigning it to the nearest centroid.
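The five steps above can be sketched in plain NumPy. This is only a minimal illustration, not how sklearn implements it; the names `kmeans` and `predict`, the `max_iter` cap, and the choice to seed the centroids from K random data points are assumptions made for this sketch.

```python
import numpy as np

def kmeans(points, k, max_iter=100, seed=0):
    """Minimal k-means sketch (hypothetical helper, not the sklearn code)."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 1: pick K distinct data points as the initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.full(len(points), -1)
    for _ in range(max_iter):
        # Step 2: assign each point to the centroid it is closest to.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = distances.argmin(axis=1)
        # Step 4: stop once no point changes its assignment.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 3: move each centroid to the average position of its points
        # (a centroid with no points keeps its previous position).
        for c in range(k):
            members = points[labels == c]
            if len(members) > 0:
                centroids[c] = members.mean(axis=0)
    return centroids, labels

def predict(centroids, new_points):
    # Step 5: a new point gets the label of its nearest centroid.
    new_points = np.asarray(new_points, dtype=float)
    distances = np.linalg.norm(new_points[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)
```

Calling `kmeans(points, k)` returns the final centroids and one label per point; `predict(centroids, new_points)` then labels unseen points without refitting.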
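The limitations of K-means clustering:
1. Choosing K: you have to pick the value of K yourself. The main approach is to start low and keep increasing K, depending on how many groups you want; in practice you stop once a larger K no longer reduces the squared error much.
2. Avoiding local minima: the result depends on the random initial centroids, so it helps to run the algorithm several times with different starting points.
3. Labeling the clusters: k-means only groups the points. It does not tell you what each group means, so you have to inspect the data to name them.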
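The first two limitations can be explored with a short sketch like the one below. It assumes sklearn's `KMeans`, whose `inertia_` attribute is the sum of squared distances from each point to its centroid and whose `n_init` parameter reruns the algorithm from several random starts and keeps the best result. The helper `inertia_by_k` and the three synthetic blobs are made up for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

def inertia_by_k(points, k_values, n_init=10, seed=0):
    """Fit KMeans once per K and return {K: inertia} (hypothetical helper)."""
    result = {}
    for k in k_values:
        # n_init > 1 reduces the chance of reporting a poor local minimum.
        model = KMeans(n_clusters=k, n_init=n_init, random_state=seed).fit(points)
        result[k] = model.inertia_
    return result

# Synthetic data: three well-separated blobs, so the "right" K is 3.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
points = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

inertias = inertia_by_k(points, range(1, 7))
for k, inertia in inertias.items():
    print(k, round(inertia, 1))
```

On data like this, the printed inertia should drop sharply up to K = 3 and only slowly afterwards; that bend is where increasing K stops paying off.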
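2. A practical example, using the KMeans class in sklearn.
This example generates some made-up income and age figures for a group of people and then clusters them.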

In [1]: from numpy import random, array

In [2]: def createClusteredData(N, k):
   ...:     random.seed(10)
   ...:     pointsPerCluster = float(N) / k
   ...:     x = []
   ...:     for i in range(k):
   ...:         incomeCentroid = random.uniform(20000.0, 200000.0)
   ...:         ageCentroid = random.uniform(20.0, 70.0)
   ...:         for j in range(int(pointsPerCluster)):
   ...:             x.append([random.normal(incomeCentroid, 10000.0),
   ...:                       random.normal(ageCentroid, 2.0)])
   ...:     x = array(x)
   ...:     return x
   ...:

In [3]: from sklearn.cluster import KMeans

In [4]: import matplotlib.pyplot as plt

In [5]: from sklearn.preprocessing import scale

In [6]: from numpy import random  # numpy.float was removed in NumPy 1.24; the built-in float is used below

In [7]: data = createClusteredData(100, 5)

In [8]: model = KMeans(n_clusters=5)

In [9]: model = model.fit(scale(data))

In [10]: print(model.labels_)
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 3 3 3 3 3 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2]

In [11]: plt.figure(figsize=(8, 6))
Out[11]: <matplotlib.figure.Figure at 0x7f6377123a58>

In [13]: plt.scatter(data[:, 0], data[:, 1], c=model.labels_.astype(float))
Out[13]: <matplotlib.collections.PathCollection at 0x7f6375a1a908>

In [14]: plt.show()
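The session above stops after fitting, so step 5 (predicting the cluster of new points) and the job of labeling the clusters are not shown. One possible way to do both is sketched below. It swaps `scale` for `StandardScaler`, which applies the same standardization but remembers the training mean and standard deviation so that new points can be scaled consistently. The two-group stand-in data and the `new_people` values are invented for this sketch.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Stand-in for the data array built above: columns are (income, age).
rng = np.random.default_rng(10)
data = np.vstack([
    rng.normal([40000.0, 25.0], [10000.0, 2.0], size=(50, 2)),
    rng.normal([150000.0, 60.0], [10000.0, 2.0], size=(50, 2)),
])

# Fit the scaler once, then cluster the scaled data.
scaler = StandardScaler().fit(data)
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(scaler.transform(data))

# Step 5: scale new points with the training statistics, then predict.
new_people = np.array([[45000.0, 27.0], [145000.0, 58.0]])
new_labels = model.predict(scaler.transform(new_people))

# Centroids converted back to income/age units make it easier for a
# person to decide what each cluster should be called.
centroids = scaler.inverse_transform(model.cluster_centers_)
print(new_labels)
print(centroids)
```

The cluster numbers themselves are arbitrary; reading the centroids in their original units is what lets you attach a name such as "young, lower income" to each one.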