About the author

Linuxer, ex IBMer. GNU https://hmchzb19.github.io/

Category: LINUX

2019-02-19 11:31:44

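While working with numpy and pandas I noticed something interesting: their std functions have different default arguments, so they return different standard deviations for the same data.
Here is the code: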


    In [1]: import numpy as np

    In [2]: import pandas as pd

    In [3]: df5 = pd.DataFrame(np.arange(9).reshape(3, 3), columns=['a', 'b', 'c'])

    In [4]: df5
    Out[4]:
       a  b  c
    0  0  1  2
    1  3  4  5
    2  6  7  8

    In [5]: df5.apply(np.std, axis=1)
    Out[5]:
    0    0.816497
    1    0.816497
    2    0.816497
    dtype: float64

    In [6]: df5.std(axis=1)
    Out[6]:
    0    1.0
    1    1.0
    2    1.0
    dtype: float64

Why do np.std and the pandas std method give different results?
A look at help() explains it. In numpy, ddof defaults to 0.


    std(a, axis=None, dtype=None, out=None, ddof=0, keepdims=<no value>)
        Compute the standard deviation along the specified axis.

        Returns the standard deviation, a measure of the spread of a distribution,
        of the array elements. The standard deviation is computed for the
        flattened array by default, otherwise over the specified axis.

        Parameters
        ----------
        a : array_like
            Calculate the standard deviation of these values.
        axis : None or int or tuple of ints, optional
            Axis or axes along which the standard deviation is computed. The
            default is to compute the standard deviation of the flattened array.

            .. versionadded:: 1.7.0

            If this is a tuple of ints, a standard deviation is performed over
            multiple axes, instead of a single axis or all the axes as before.
        dtype : dtype, optional
            Type to use in computing the standard deviation. For arrays of
            integer type the default is float64, for arrays of float types it is
            the same as the array type.
        out : ndarray, optional
            Alternative output array in which to place the result. It must have
            the same shape as the expected output but the type (of the calculated
            values) will be cast if necessary.
        ddof : int, optional
            Means Delta Degrees of Freedom. The divisor used in calculations
            is ``N - ddof``, where ``N`` represents the number of elements.
            By default `ddof` is zero.

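To see the numpy default on its own, here is a minimal sketch that calls np.std directly on the first row of df5 (the values 0, 1, 2), once with the default and once with ddof=1. The variable names are only for illustration.

```python
import numpy as np

# First row of df5 from the session above.
row = np.array([0, 1, 2])

# Default ddof=0: the sum of squared deviations is divided by N.
population_std = np.std(row)

# ddof=1: the divisor becomes N - 1.
sample_std = np.std(row, ddof=1)

print(round(float(population_std), 6))  # 0.816497
print(float(sample_std))                # 1.0
```

These are the same two numbers that showed up in Out[5] and Out[6].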
Now the help text in pandas, where ddof defaults to 1.


    Help on method std in module pandas.core.frame:

    std(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs) method of pandas.core.frame.DataFrame instance
        Return sample standard deviation over requested axis.

        Normalized by N-1 by default. This can be changed using the ddof argument

        Parameters
        ----------
        axis : {index (0), columns (1)}
        skipna : boolean, default True
            Exclude NA/null values. If an entire row/column is NA, the result
            will be NA
        level : int or level name, default None
            If the axis is a MultiIndex (hierarchical), count along a
            particular level, collapsing into a Series
        ddof : int, default 1
            Delta Degrees of Freedom. The divisor used in calculations is N - ddof,
            where N represents the number of elements.

ddof: Delta Degrees of Freedom.
"Degrees of freedom" is a term I know, but what "Delta Degrees of Freedom" means I can't quite figure out.
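Going only by the two help texts above, ddof is the number subtracted from N in the divisor: ddof=0 divides by N (the population standard deviation) and ddof=1 divides by N - 1 (the sample standard deviation). A minimal pure-Python sketch of that formula follows; `std_with_ddof` is a made-up helper name for illustration, not a function from numpy or pandas.

```python
import math


def std_with_ddof(values, ddof=0):
    """Standard deviation with divisor N - ddof (hypothetical helper)."""
    n = len(values)
    mean = sum(values) / n
    # Sum of squared deviations from the mean.
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_deviations / (n - ddof))


row = [0, 1, 2]  # first row of df5
print(round(std_with_ddof(row, ddof=0), 6))  # 0.816497
print(std_with_ddof(row, ddof=1))            # 1.0
```

For this row the sum of squared deviations is 2, so the two results are sqrt(2/3) and sqrt(2/2).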


Still, the cause is clear now: pass ddof=1 when calling np.std and the two results match.



    In [12]: df5.apply(np.std, axis=1, ddof=1)
    Out[12]:
    0    1.0
    1    1.0
    2    1.0
    dtype: float64

    In [13]: df5.std(axis=1, ddof=1)
    Out[13]:
    0    1.0
    1    1.0
    2    1.0
    dtype: float64
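The alignment works in either direction. The sketch below is not from the original session: it rebuilds df5 and, as an assumption made for illustration, runs np.std on the underlying array via to_numpy() instead of going through apply, then checks that the two libraries agree once ddof is set to the same value.

```python
import numpy as np
import pandas as pd

df5 = pd.DataFrame(np.arange(9).reshape(3, 3), columns=['a', 'b', 'c'])

# pandas with ddof=0 matches numpy's default.
pandas_population = df5.std(axis=1, ddof=0)
numpy_population = np.std(df5.to_numpy(), axis=1)

# numpy with ddof=1 matches pandas' default.
numpy_sample = np.std(df5.to_numpy(), axis=1, ddof=1)
pandas_sample = df5.std(axis=1)

print(np.allclose(pandas_population, numpy_population))  # True
print(np.allclose(pandas_sample, numpy_sample))          # True
```

Passing ddof explicitly on both sides avoids depending on either library's default.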

