About the author

Linuxer, ex IBMer. GNU https://hmchzb19.github.io/


Category: Python/Ruby

2018-12-11 09:12:58

K-means clustering

1.
K-means clustering is unsupervised learning: it splits the data into K groups.
The k-means algorithm:
1. Randomly pick K centroids (the "means" in k-means).
2. Assign each data point to the centroid it is closest to.
3. Recompute each centroid as the average position of the points assigned to it.
4. Iterate until points stop changing their centroid assignments.
5. Predict the cluster for new points by finding their nearest centroid.
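The steps above can be sketched in plain NumPy. This is a minimal from-scratch illustration, not the sklearn implementation used later in this post; the function name `kmeans`, the iteration cap, and the stopping rule are my own choices:

```python
import numpy as np

def kmeans(points, k, n_iter=20, seed=0):
    """Minimal K-means sketch: returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Step 1: randomly pick K of the points as initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign each point to its closest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its assigned points
        # (keep the old centroid if a cluster ended up empty).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```

To predict the cluster of a new point (step 5), compute its distance to each returned centroid and take the `argmin`.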

The limitations of K-means clustering:
1. Choosing K: there is no single right value of K; a common approach is to start low and keep increasing K until the clustering stops improving, or until it matches the number of groups you expect.
2. Avoiding local minima: the random initial centroids can lead to a poor local solution, so the algorithm is usually run several times with different starts.
3. Labeling the clusters: K-means only numbers the groups; interpreting what each cluster means is up to you.
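For the problem of choosing K, one common heuristic is the elbow method: fit sklearn's `KMeans` for a range of K values and look at where `inertia_` (the within-cluster sum of squared distances) stops dropping sharply. A sketch on synthetic data; the three-blob dataset and the K range here are assumptions for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic data: three well-separated blobs of 2-D points.
data = np.vstack([rng.normal(loc, 1.0, size=(50, 2)) for loc in (0.0, 10.0, 20.0)])

# Fit K-means for each K and record the inertia; n_init=10 reruns the
# algorithm with different random starts, which also helps avoid local minima.
inertias = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(data).inertia_
            for k in range(1, 7)}
for k, inertia in inertias.items():
    print(k, round(inertia, 1))
```

For this data the inertia falls steeply up to K=3 (the true number of blobs) and flattens afterwards, so the "elbow" suggests K=3.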

2. A practical example, using the KMeans class from sklearn.
This example generates fake income and age figures for a group of people, then clusters them.


In [1]: from numpy import random, array

In [2]: def createClusteredData(N, k):
   ...:     """Generate N fake (income, age) points around k random centroids."""
   ...:     random.seed(10)
   ...:     pointsPerCluster = float(N) / k
   ...:     x = []
   ...:     for i in range(k):
   ...:         incomeCentroid = random.uniform(20000.0, 200000.0)
   ...:         ageCentroid = random.uniform(20.0, 70.0)
   ...:         for j in range(int(pointsPerCluster)):
   ...:             x.append([random.normal(incomeCentroid, 10000.0),
   ...:                       random.normal(ageCentroid, 2.0)])
   ...:     x = array(x)
   ...:     return x
   ...:

In [3]: from sklearn.cluster import KMeans

In [4]: import matplotlib.pyplot as plt

In [5]: from sklearn.preprocessing import scale

In [6]: data = createClusteredData(100, 5)

In [7]: model = KMeans(n_clusters=5)

In [8]: model = model.fit(scale(data))  # scale features so income doesn't dominate age

In [9]: print(model.labels_)
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 3 3 3 3 3 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2]

In [10]: plt.figure(figsize=(8, 6))
Out[10]: <matplotlib.figure.Figure at 0x7f6377123a58>

In [11]: plt.scatter(data[:, 0], data[:, 1], c=model.labels_.astype(float))
Out[11]: <matplotlib.collections.PathCollection at 0x7f6375a1a908>

In [12]: plt.show()

