Python_Data_Science_Lesson 11 - hmchzb19 - ChinaUnix Blog



Category: Python/Ruby

2018-12-11 09:12:58

K-means clustering

1. K-Means clustering is an unsupervised learning method: it splits the data into K groups (clusters).
The algorithm for k-means clustering:
1. Randomly pick K centroids (the "k means").
2. Assign each data point to the centroid it is closest to.
3. Recompute each centroid as the average position of the points assigned to it.
4. Iterate until points stop changing their centroid assignment.
5. Predict the cluster for a new point by assigning it to the nearest centroid.
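The five steps above can be sketched directly in NumPy. This is a minimal illustration, not the sklearn implementation; the function names `kmeans` and `predict` are hypothetical, and step 1 is assumed to pick K distinct data points as the starting centroids (one common choice of random initialization).

```python
import numpy as np


def kmeans(points, k, max_iter=100, seed=0):
    """Minimal k-means sketch: returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points at random as the initial centroids
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 4: stop once no point changes its assignment
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of the points assigned to it
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster becomes empty
                centroids[j] = members.mean(axis=0)
    return centroids, labels


def predict(centroids, new_points):
    # Step 5: a new point belongs to the cluster of its nearest centroid
    dists = np.linalg.norm(new_points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


if __name__ == "__main__":
    pts = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
    centroids, labels = kmeans(pts, k=2)
    print(labels)
    print(predict(centroids, np.array([[1.0, 0.5], [9.0, 10.0]])))
```

The cluster numbers themselves are arbitrary: which group is called 0 and which is called 1 depends on the random starting centroids.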

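The limitations of K-means clustering:
1. Choosing K: there is no automatic way to pick the right K. The usual approach is to start with a low value and keep increasing K until the within-cluster squared error stops dropping significantly, or until you have as many groups as you need.
2. Avoiding local minima: the initial centroids are chosen randomly, so a single run can converge to a poor local minimum. Running the algorithm several times from different starting centroids and keeping the best result reduces this risk.
3. Labeling the clusters: K-means only finds the groups; it does not tell you what each group means, so you have to interpret and name them yourself.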
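For the first two limitations, one common approach (not from the original post) is to fit sklearn's `KMeans` for a range of K values and compare `inertia_`, the sum of squared distances from each point to its centroid, looking for the "elbow" where adding clusters stops helping much. Setting `n_init` makes sklearn run several random starts and keep the lowest-inertia result. The `make_blobs` data with 4 groups is an illustrative assumption.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 4 true groups (an assumption for illustration)
X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

inertias = {}
for k in range(1, 9):
    # n_init=10: run k-means from 10 different random starts and keep the best,
    # which lowers the chance of ending in a poor local minimum
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia_: sum of squared distances of points to their assigned centroid
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 1))
```

Inertia keeps shrinking as K grows, so the minimum is not the answer; the point where the curve flattens is the usual candidate for K.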

2. A practical example, using the KMeans class from sklearn.
This example generates synthetic income and age data for a set of people, then clusters it.


from numpy import random, array
from sklearn.cluster import KMeans
from sklearn.preprocessing import scale
import matplotlib.pyplot as plt


# Generate N fake (income, age) points spread across k clusters
def createClusteredData(N, k):
    random.seed(10)  # fixed seed so the data is reproducible
    pointsPerCluster = float(N) / k
    x = []
    for i in range(k):
        # pick a random centre (income, age) for this cluster
        incomeCentroid = random.uniform(20000.0, 200000.0)
        ageCentroid = random.uniform(20.0, 70.0)
        for j in range(int(pointsPerCluster)):
            # scatter points normally around the centre
            x.append([random.normal(incomeCentroid, 10000.0),
                      random.normal(ageCentroid, 2.0)])
    x = array(x)
    return x


data = createClusteredData(100, 5)

model = KMeans(n_clusters=5)
# scale() standardizes each column (zero mean, unit variance) so that
# income, with its much larger range, does not dominate age in the distances
model = model.fit(scale(data))

# Cluster label assigned to each point
print(model.labels_)
# Output from the original run (the label numbering can change between runs):
# [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#  1 1 1 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3
#  3 3 3 3 3 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2]

# Plot income vs. age, coloured by cluster label
plt.figure(figsize=(8, 6))
plt.scatter(data[:, 0], data[:, 1], c=model.labels_.astype(float))
plt.show()
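The example above covers fitting but not step 5, predicting clusters for new points. Because `scale()` does not remember the mean and standard deviation it used, a new point cannot be scaled the same way afterwards. A sketch that swaps in `StandardScaler` (a substitution, not the original code) handles this, and `inverse_transform` maps the centroids back to income and age, which helps with labeling the clusters. The generator `make_income_age_data` is a hypothetical stand-in for `createClusteredData` using NumPy's newer random API, so its data differs from the original run; the two "new people" are made-up inputs.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def make_income_age_data(n_per_cluster=20, k=5, seed=10):
    # Hypothetical stand-in for createClusteredData: random (income, age)
    # centres with normally distributed points around each one
    rng = np.random.default_rng(seed)
    groups = []
    for _ in range(k):
        income_c = rng.uniform(20000.0, 200000.0)
        age_c = rng.uniform(20.0, 70.0)
        groups.append(np.column_stack([rng.normal(income_c, 10000.0, n_per_cluster),
                                       rng.normal(age_c, 2.0, n_per_cluster)]))
    return np.vstack(groups)


data = make_income_age_data()

# StandardScaler stores the learned mean and std, so new points can be
# transformed exactly like the training data
scaler = StandardScaler().fit(data)
model = KMeans(n_clusters=5, n_init=10, random_state=0).fit(scaler.transform(data))

# Step 5: predict clusters for new people (made-up income, age values)
new_people = np.array([[50000.0, 30.0], [150000.0, 60.0]])
print(model.predict(scaler.transform(new_people)))

# Centroids in original units (income, age), useful when naming each cluster
print(scaler.inverse_transform(model.cluster_centers_))
```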
