An interesting quirk caused by the different default arguments of std in np and pd - hmchzb19 - ChinaUnix blog

An interesting quirk caused by the different default arguments of std in np and pd

Category: LINUX

2019-02-19 11:31:44

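While working with numpy and pandas I noticed something interesting: because the two std functions have different default arguments, they return different standard deviations for the same data.
Here is the code: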

In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: df5=pd.DataFrame(np.arange(9).reshape(3,3), columns=['a','b','c'])
In [4]: df5
Out[4]:
   a  b  c
0  0  1  2
1  3  4  5
2  6  7  8
In [5]: df5.apply(np.std, axis=1)
Out[5]:
0 0.816497
1 0.816497
2 0.816497
dtype: float64
In [6]: df5.std(axis=1)
Out[6]:
0 1.0
1 1.0
2 1.0
dtype: float64

Why do np.std and the pandas std method give different results for the same rows?
A look at help() explains it. In numpy, ddof defaults to 0.

std(a, axis=None, dtype=None, out=None, ddof=0, keepdims=<no value>)
Compute the standard deviation along the specified axis.
Returns the standard deviation, a measure of the spread of a distribution,
of the array elements. The standard deviation is computed for the
flattened array by default, otherwise over the specified axis.
Parameters
----------
a : array_like
Calculate the standard deviation of these values.
axis : None or int or tuple of ints, optional
Axis or axes along which the standard deviation is computed. The
default is to compute the standard deviation of the flattened array.
.. versionadded:: 1.7.0
If this is a tuple of ints, a standard deviation is performed over
multiple axes, instead of a single axis or all the axes as before.
dtype : dtype, optional
Type to use in computing the standard deviation. For arrays of
integer type the default is float64, for arrays of float types it is
the same as the array type.
out : ndarray, optional
Alternative output array in which to place the result. It must have
the same shape as the expected output but the type (of the calculated
values) will be cast if necessary.
ddof : int, optional
Means Delta Degrees of Freedom. The divisor used in calculations
is ``N - ddof``, where ``N`` represents the number of elements.
By default `ddof` is zero.

Now the help text from pandas, where ddof defaults to 1.

Help on method std in module pandas.core.frame:
std(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs) method of pandas.core.frame.DataFrame instance
Return sample standard deviation over requested axis.
Normalized by N-1 by default. This can be changed using the ddof argument
Parameters
----------
axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result
will be NA
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a
particular level, collapsing into a Series
ddof : int, default 1
Delta Degrees of Freedom. The divisor used in calculations is N - ddof,
where N represents the number of elements.

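ddof: Delta Degrees of Freedom.
Degrees of freedom is a familiar term, but I was not sure what "Delta" Degrees of Freedom means. Going by the help text, it is simply the amount subtracted from N in the divisor: ddof=0 divides by N (population standard deviation), while ddof=1 divides by N - 1 (sample standard deviation).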
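
To make the role of the divisor concrete, here is a minimal sketch that computes the standard deviation of the first row of df5 by hand and compares it with np.std. The helper name std_with_ddof is hypothetical and exists only for illustration; it is not part of numpy or pandas.

```python
import math

import numpy as np


def std_with_ddof(values, ddof=0):
    """Standard deviation with divisor N - ddof (hypothetical helper for illustration)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - ddof))


row = [0, 1, 2]  # first row of df5

population_std = std_with_ddof(row, ddof=0)  # divisor N = 3
sample_std = std_with_ddof(row, ddof=1)      # divisor N - 1 = 2

print(population_std, np.std(row))       # both about 0.816497
print(sample_std, np.std(row, ddof=1))   # both 1.0
```

The sum of squared deviations is 2 in both cases; dividing by 3 gives about 0.816497 after the square root, and dividing by 2 gives exactly 1.0, which matches the two outputs seen earlier.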

Still, the cause is found: passing a ddof value when calling np.std is enough to make the results agree.

In [12]: df5.apply(np.std, axis=1, ddof=1)
Out[12]:
0 1.0
1 1.0
2 1.0
dtype: float64
In [13]: df5.std(axis=1, ddof=1)
Out[13]:
0 1.0
1 1.0
2 1.0
dtype: float64
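
The alignment also works in the other direction. The sketch below assumes you would rather match numpy's default, so it passes ddof=0 to the pandas method. Calling np.std on the array returned by to_numpy() is my own choice here, not the approach used in the session above; it just keeps the comparison independent of how apply forwards its arguments.

```python
import numpy as np
import pandas as pd

df5 = pd.DataFrame(np.arange(9).reshape(3, 3), columns=["a", "b", "c"])

# pandas with ddof=0 reproduces NumPy's default (divide by N).
pandas_population = df5.std(axis=1, ddof=0)
numpy_population = np.std(df5.to_numpy(), axis=1)

# NumPy with ddof=1 reproduces pandas' default (divide by N - 1).
pandas_sample = df5.std(axis=1)
numpy_sample = np.std(df5.to_numpy(), axis=1, ddof=1)

print(np.allclose(pandas_population, numpy_population))  # True
print(np.allclose(pandas_sample, numpy_sample))          # True
```

Whichever convention you pick, stating ddof explicitly in both libraries avoids depending on either default.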
