About the author

Linuxer, ex IBMer. GNU https://hmchzb19.github.io/

Category: Python/Ruby

2018-05-23 10:49:53

0. All of the code was typed into ipython3, so the prerequisites are as follows:


  ipython3
  In [1]: import numpy as np
  In [2]: import matplotlib.pyplot as plt

Today's topic is covariance and correlation.

1. Covariance and correlation give us a means of measuring just how tightly two attributes are related.
covariance: measures how two variables vary in tandem from their means.
correlation: the covariance divided by the standard deviations of both variables, so it always lies between -1 and 1. -1 is a perfect negative (inverse) correlation: as one value increases, the other decreases, and vice versa. 0 means no linear correlation. 1 is a perfect positive correlation: the two attributes move in exactly the same way as you look at different data points.
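
A small sketch of the three reference values, using made-up five-point arrays and `np.corrcoef`. The data and the helper name `corr` are illustrative assumptions, not part of the original post:

```python
import numpy as np

def corr(x, y):
    # off-diagonal entry of the 2x2 correlation matrix returned by np.corrcoef
    return np.corrcoef(x, y)[0, 1]

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

r_pos = corr(x, 2 * x + 1)    # y rises with x along a straight line
r_neg = corr(x, -3 * x + 5)   # y falls as x rises along a straight line
r_zero = corr(x, x ** 2)      # symmetric curve: no linear trend at all

print(r_pos, r_neg, r_zero)
```

The last case is worth noting: y is completely determined by x, yet the correlation is 0, because correlation only captures linear relationships.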

2. The covariance and correlation can be computed with the following self-written functions, which follow these steps:
1. Think of the data sets for the two variables as high-dimensional vectors.
2. Convert these to vectors of deviations from the mean.
3. Take the dot product of the two vectors (it is proportional to the cosine of the angle between them).
4. Divide by the sample size minus one (n-1); this gives the sample covariance, the same divisor numpy.cov uses by default.
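
The steps above can be traced by hand on a tiny data set. The four-point arrays here are made up for illustration, and the n-1 divisor is chosen to match the covariance function in this post and the default of `np.cov`. The sketch also shows where the cosine comes in: the cosine of the angle between the two deviation vectors is the correlation coefficient.

```python
import numpy as np

# Illustrative data (an assumption, not from the original post)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 9.0])

# Steps 1-2: treat each data set as a vector and subtract its mean
dx = x - x.mean()   # [-1.5, -0.5, 0.5, 1.5]
dy = y - y.mean()   # [-3.25, -1.25, 0.75, 3.75]

# Step 3: dot product of the two deviation vectors
dot = np.dot(dx, dy)   # 11.5

# Step 4: divide by n - 1 to get the sample covariance
cov = dot / (len(x) - 1)

# Dividing the dot product by both vector lengths gives the cosine of the angle
cosine = dot / (np.linalg.norm(dx) * np.linalg.norm(dy))

print(cov, np.cov(x, y)[0, 1])            # the two values agree
print(cosine, np.corrcoef(x, y)[0, 1])    # the two values agree
```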


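  def de_mean(x):
      # deviation of each value from the mean of x
      xmean = np.mean(x)
      return [xi - xmean for xi in x]

  def covariance(x, y):
      # sample covariance: dot product of the deviations, divided by n-1
      n = len(x)
      return np.dot(de_mean(x), de_mean(y)) / (n - 1)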


  # compute correlation
  def correlation(x, y):
      # sample standard deviations (ddof=1), matching the n-1 divisor in covariance()
      stddevx = np.std(x, ddof=1)
      stddevy = np.std(y, ddof=1)

      # note: this divides by 0 if either input is constant (zero standard deviation)
      return covariance(x, y) / stddevx / stddevy
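
One detail worth checking when an n-1 covariance is combined with `np.std`: `np.std` divides by n by default (`ddof=0`), while `np.std(x, ddof=1)` divides by n-1. A minimal sketch on made-up, perfectly linear data (the arrays and variable names are illustrative assumptions) shows the effect of mixing the two conventions:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 100 - 3 * x          # perfectly linear, so the correlation should be -1
n = len(x)

# sample covariance with the n-1 divisor
cov = np.dot(x - x.mean(), y - y.mean()) / (n - 1)

r_mixed = cov / np.std(x) / np.std(y)                     # population std (ddof=0)
r_matched = cov / np.std(x, ddof=1) / np.std(y, ddof=1)   # sample std (ddof=1)

print(r_mixed)     # about -1.25, i.e. -n/(n-1), outside the range [-1, 1]
print(r_matched)   # about -1.0
```

With 1000 samples the mismatch shrinks to a factor of 1000/999, which is easy to overlook but still pushes a perfect correlation slightly past -1 or 1.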



  # Fabricate data: page speeds (how quickly a page renders on a website) and how much people spend.
  pagespeeds = np.random.normal(3.0, 1.0, 1000)
  # drawn independently from a normal distribution, so there is no real relationship between the two attributes
  purchaseAmount = np.random.normal(50.0, 10.0, 1000)
  plt.scatter(pagespeeds, purchaseAmount)
  covariance(pagespeeds, purchaseAmount)
  np.cov(pagespeeds, purchaseAmount)
  plt.show()

  # Fabricate data with a relationship: the purchase amount now depends on the page speed.
  pagespeeds2 = np.random.normal(3.0, 1.0, 1000)
  purchaseAmount2 = np.random.normal(50.0, 10.0, 1000) / pagespeeds2
  plt.scatter(pagespeeds2, purchaseAmount2)
  covariance(pagespeeds2, purchaseAmount2)
  np.cov(pagespeeds2, purchaseAmount2)
  plt.show()

  # close to 0: no relationship
  correlation(pagespeeds, purchaseAmount)
  np.corrcoef(pagespeeds, purchaseAmount)

  # a value away from 0 indicates a relationship; here it is negative,
  # because the purchase amount falls as the page speed value rises
  correlation(pagespeeds2, purchaseAmount2)
  np.corrcoef(pagespeeds2, purchaseAmount2)

  # a perfect linear relationship, so the correlation is close to -1
  pagespeeds3 = np.random.normal(3.0, 1.0, 1000)
  purchaseAmount3 = 100 - pagespeeds3 * 3
  plt.scatter(pagespeeds3, purchaseAmount3)
  correlation(pagespeeds3, purchaseAmount3)
  plt.show()
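
Because the data is random, the numbers change on every run. To compare a hand-written calculation with NumPy reproducibly, one option is a seeded generator. The sketch below is a variant of the experiment under stated assumptions: `np.random.default_rng` with an arbitrary seed of 0 stands in for `np.random.normal`, and `cov_manual` and `corr_manual` are hypothetical compact helpers using sample statistics, not the exact functions from this post.

```python
import numpy as np

def cov_manual(x, y):
    # sample covariance: dot product of the deviations, divided by n-1
    dx, dy = x - np.mean(x), y - np.mean(y)
    return np.dot(dx, dy) / (len(x) - 1)

def corr_manual(x, y):
    # covariance normalized by the sample standard deviations
    return cov_manual(x, y) / np.std(x, ddof=1) / np.std(y, ddof=1)

rng = np.random.default_rng(0)            # seed chosen arbitrarily
speeds = rng.normal(3.0, 1.0, 1000)
amounts = rng.normal(50.0, 10.0, 1000)    # independent of speeds
linear = 100 - speeds * 3                 # exact linear dependence on speeds

# the hand-written results should match the off-diagonal NumPy entries
print(np.isclose(cov_manual(speeds, amounts), np.cov(speeds, amounts)[0, 1]))
print(np.isclose(corr_manual(speeds, amounts), np.corrcoef(speeds, amounts)[0, 1]))
print(corr_manual(speeds, linear))        # approximately -1
```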

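3. NumPy provides numpy.cov, which returns the covariance matrix of the arrays passed in; for two arrays, the covariance between them is the off-diagonal entry, and it uses the n-1 divisor by default.
numpy.corrcoef returns a matrix of correlation coefficients between every combination of the arrays passed in.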
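
A short sketch of how to read those matrices. The three small arrays are made up for illustration:

```python
import numpy as np

# Illustrative arrays (assumptions, not from the post)
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.0, 4.0, 6.0, 9.0])
c = np.array([4.0, 3.0, 2.0, 1.0])

# 2x2 matrix: [[var(a), cov(a, b)], [cov(b, a), var(b)]]
cov_ab = np.cov(a, b)

# 3x3 matrix: one row and one column per input array
corr_abc = np.corrcoef([a, b, c])

print(cov_ab.shape, corr_abc.shape)
print(cov_ab[0, 1])     # covariance between a and b
print(corr_abc[0, 2])   # correlation between a and c (c decreases linearly as a increases)
```

The diagonal of the covariance matrix holds each array's own variance, and the diagonal of the correlation matrix is always 1, so the off-diagonal entries are the ones of interest.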

