Rename index of a pandas DataFrame - laoliulaoliu - ChinaUnix blog


Question

I have a pandas dataframe whose indices look like:

df.index
['a_1', 'b_2', 'c_3', ... ]

I want to rename these indices to:

['a', 'b', 'c', ... ]

How do I do this without specifying a dictionary with explicit keys for each index value?
I tried:

df.rename( index = lambda x: x.split( '_' )[0] )

but this throws up an error:

AssertionError: New axis must be unique to rename


Best Answer

That's the error you'd get if your function produced duplicate index values:

>>> df = pd.DataFrame(np.random.random((4,3)),index="a_1 b_2 c_3 c_4".split())
>>> df
            0         1         2
a_1  0.854839  0.830317  0.046283
b_2  0.433805  0.629118  0.702179
c_3  0.390390  0.374232  0.040998
c_4  0.667013  0.368870  0.637276
>>> df.rename(index=lambda x: x.split("_")[0])
[...]
AssertionError: New axis must be unique to rename

If you really want duplicate labels, I'd use a list comprehension:

>>> df.index = [x.split("_")[0] for x in df.index]
>>> df
          0         1         2
a  0.854839  0.830317  0.046283
b  0.433805  0.629118  0.702179
c  0.390390  0.374232  0.040998
c  0.667013  0.368870  0.637276

but I'd think about whether that's really the right direction.
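One way to weigh that is to check whether the stripped labels collide before assigning them. The following is a minimal sketch, assuming the same four-row frame as above (with fixed values instead of random ones). It swaps the list comprehension for the vectorized `Index.str` accessor, and the name `new_index` is purely illustrative.

```python
import numpy as np
import pandas as pd

# Same labels as the example above; fixed values so the result is reproducible
df = pd.DataFrame(np.arange(12).reshape(4, 3),
                  index=["a_1", "b_2", "c_3", "c_4"])

# Vectorized equivalent of [x.split("_")[0] for x in df.index]
new_index = df.index.str.split("_").str[0]

# Inspect collisions before committing to the new labels
print(new_index.is_unique)                         # False
print(new_index[new_index.duplicated()].tolist())  # ['c']

# Assign only once duplicate labels are known to be acceptable
df.index = new_index
```

If `is_unique` comes back `False`, label-based lookups such as `df.loc['c']` return several rows instead of one, which may or may not be what the rest of the code expects.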

那是你会如果你的函数产生重复的索引值误差：

>>> df = pd.DataFrame(np.random.random((4,3)),index="a_1 b_2 c_3 c_4".split())
>>> df
            0         1         2
a_1  0.854839  0.830317  0.046283
b_2  0.433805  0.629118  0.702179
c_3  0.390390  0.374232  0.040998
c_4  0.667013  0.368870  0.637276
>>> df.rename(index=lambda x: x.split("_")[0])
[...]
AssertionError: New axis must be unique to rename

如果你真的想，我会使用列表比较：

>>> df.index = [x.split("_")[0] for x in df.index]
>>> df
          0         1         2
a  0.854839  0.830317  0.046283
b  0.433805  0.629118  0.702179
c  0.390390  0.374232  0.040998
c  0.667013  0.368870  0.637276

但我想这是否真的是正确的方向。

Answer 2

Perhaps you could get the best of both worlds by using a MultiIndex:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(8).reshape(4,2), index=['a_1', 'b_2', 'c_3', 'c_4'])
print(df)
#      0  1
# a_1  0  1
# b_2  2  3
# c_3  4  5
# c_4  6  7

index = pd.MultiIndex.from_tuples([item.split('_') for item in df.index])
df.index = index
print(df)
#      0  1
# a 1  0  1
# b 2  2  3
# c 3  4  5
#   4  6  7

This way, you can access things according to first level of the index:

In [30]: df.loc['c']
Out[30]: 
   0  1
3  4  5
4  6  7

or according to both levels of the index:

In [31]: df.loc[('c','3')]
Out[31]: 
0    4
1    5
Name: (c, 3)

Moreover, all the DataFrame methods are built to work with DataFrames with MultiIndices, so you lose nothing.
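As one illustration of that point, rows can be aggregated per first-level label without flattening the index. This is a minimal sketch, assuming the same frame as above. The level names `letter` and `number` are introduced here for readability and are not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(8).reshape(4, 2),
                  index=["a_1", "b_2", "c_3", "c_4"])

# Build the two-level index; the level names are hypothetical
df.index = pd.MultiIndex.from_tuples(
    [tuple(item.split("_")) for item in df.index],
    names=["letter", "number"],
)

# Sum the rows that share a first-level label
totals = df.groupby(level="letter").sum()
print(totals)
```

Here the two `c` rows are combined into one, while the original frame still keeps both of them distinguishable by the second level.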

However, if you really want to drop the second level of the index, you could do this:

df.reset_index(level=1, drop=True, inplace=True)
print(df)
#    0  1
# a  0  1
# b  2  3
# c  4  5
# c  6  7
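The same round trip can also be written without `inplace=True`. The sketch below is an alternative, not the answer's own code: it uses `Index.str.split(..., expand=True)` to build the MultiIndex and `DataFrame.droplevel` to discard the second level, so it assumes a reasonably recent pandas release. The name `flat` is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(8).reshape(4, 2),
                  index=["a_1", "b_2", "c_3", "c_4"])

# expand=True on an Index yields a MultiIndex, one level per split piece
df.index = df.index.str.split("_", expand=True)

# droplevel returns a new frame and leaves df (and its MultiIndex) untouched
flat = df.droplevel(1)
print(flat.index.tolist())  # ['a', 'b', 'c', 'c']
```

Keeping `df` intact means the two-level version is still available if the duplicate `c` labels in `flat` turn out to be a problem later.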

