在看numpy 和pandas的时候,发现了一个有趣的现象。由于std函数的默认参数不同所以求出来的standard deviation 是不同的。
看代码
-
In [1]: import numpy as np
-
-
In [2]: import pandas as pd
-
-
In [3]: df5=pd.DataFrame(np.arange(9).reshape(3,3), columns=['a','b','c'])
-
-
In [4]: df5
-
Out[4]:
-
a b c
-
0 0 1 2
-
1 3 4 5
-
2 6 7 8
-
-
In [5]: df5.apply(np.std, axis=1)
-
Out[5]:
-
0 0.816497
-
1 0.816497
-
2 0.816497
-
dtype: float64
-
-
In [6]: df5.std(axis=1)
-
Out[6]:
-
0 1.0
-
1 1.0
-
2 1.0
-
dtype: float64
为什么使用np.std 和pandas的std函数,得出的结果却不一样呢?
help一下。在numpy中ddof默认值是0.
-
std(a, axis=None, dtype=None, out=None, ddof=0, keepdims=<no value>)
-
Compute the standard deviation along the specified axis.
-
-
Returns the standard deviation, a measure of the spread of a distribution,
-
of the array elements. The standard deviation is computed for the
-
flattened array by default, otherwise over the specified axis.
-
-
Parameters
-
----------
-
a : array_like
-
Calculate the standard deviation of these values.
-
axis : None or int or tuple of ints, optional
-
Axis or axes along which the standard deviation is computed. The
-
default is to compute the standard deviation of the flattened array.
-
-
.. versionadded:: 1.7.0
-
-
If this is a tuple of ints, a standard deviation is performed over
-
multiple axes, instead of a single axis or all the axes as before.
-
dtype : dtype, optional
-
Type to use in computing the standard deviation. For arrays of
-
integer type the default is float64, for arrays of float types it is
-
the same as the array type.
-
out : ndarray, optional
-
Alternative output array in which to place the result. It must have
-
the same shape as the expected output but the type (of the calculated
-
values) will be cast if necessary.
-
ddof : int, optional
-
Means Delta Degrees of Freedom. The divisor used in calculations
-
is ``N - ddof``, where ``N`` represents the number of elements.
-
By default `ddof` is zero.
再看pandas中的help信息。ddof的默认值是1。
-
Help on method std in module pandas.core.frame:
-
-
std(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs) method of pandas.core.frame.DataFrame instance
-
Return sample standard deviation over requested axis.
-
-
Normalized by N-1 by default. This can be changed using the ddof argument
-
-
Parameters
-
----------
-
axis : {index (0), columns (1)}
-
skipna : boolean, default True
-
Exclude NA/null values. If an entire row/column is NA, the result
-
will be NA
-
level : int or level name, default None
-
If the axis is a MultiIndex (hierarchical), count along a
-
particular level, collapsing into a Series
-
ddof : int, default 1
-
Delta Degrees of Freedom. The divisor used in calculations is N - ddof,
-
where N represents the number of elements.
ddof: Delta Degrees of Freedom.
Degrees of Freedom 是自由度,Delta Degrees of Freedom是什么我是搞不清楚.
但是找到了原因,在调用np.std传入ddof的值就可以了。
-
In [12]: df5.apply(np.std, axis=1, ddof=1)
-
Out[12]:
-
0 1.0
-
1 1.0
-
2 1.0
-
dtype: float64
-
-
In [13]: df5.std(axis=1, ddof=1)
-
Out[13]:
-
0 1.0
-
1 1.0
-
2 1.0
-
dtype: float64
阅读(5446) | 评论(0) | 转发(0) |