Category: IT Career

2015-09-14 20:36:42

Original post: />

Question

I have a pandas DataFrame whose index looks like this:

df.index
['a_1', 'b_2', 'c_3', ... ] 

I want to rename these index labels to:

['a', 'b', 'c', ... ] 

How do I do this without specifying a dictionary with explicit keys for each index value?
I tried:

df.rename(index=lambda x: x.split('_')[0])

but this raises an error:

AssertionError: New axis must be unique to rename 

Best Answer

That's the error you'd get if your function produced duplicate index labels (here, both 'c_3' and 'c_4' map to 'c'):

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.random((4, 3)), index="a_1 b_2 c_3 c_4".split())
>>> df
            0         1         2
a_1  0.854839  0.830317  0.046283
b_2  0.433805  0.629118  0.702179
c_3  0.390390  0.374232  0.040998
c_4  0.667013  0.368870  0.637276
>>> df.rename(index=lambda x: x.split("_")[0])
[...]
AssertionError: New axis must be unique to rename 
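To confirm that duplicates are the cause, you can compute the candidate labels first and check them before assigning anything. This is a minimal sketch, assuming the same four labels as above; the deterministic values and the name new_labels are only illustrative.

```python
import numpy as np
import pandas as pd

# Same labels as in the example above; the values are arbitrary.
df = pd.DataFrame(np.arange(12).reshape(4, 3), index=["a_1", "b_2", "c_3", "c_4"])

# Compute the would-be labels without modifying df.
new_labels = df.index.map(lambda x: x.split("_")[0])

print(list(new_labels))                           # ['a', 'b', 'c', 'c']
print(new_labels.is_unique)                       # False: 'c' appears twice
print(list(new_labels[new_labels.duplicated()]))  # ['c'], the repeated label
```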

If you really want duplicate labels, I'd assign them directly with a list comprehension:

>>> df.index = [x.split("_")[0] for x in df.index]
>>> df
          0         1         2
a  0.854839  0.830317  0.046283
b  0.433805  0.629118  0.702179
c  0.390390  0.374232  0.040998
c  0.667013  0.368870  0.637276 

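but I'd think about whether that's really the right direction, because with a non-unique index a label lookup such as df.loc['c'] returns several rows instead of one.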
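The list comprehension can also be written with pandas' vectorized string methods on the index. This sketch assumes every label is a string that contains an underscore. The single-column frame is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4]}, index=["a_1", "b_2", "c_3", "c_4"])

# Index.str.split gives an Index of lists; .str[0] keeps the first piece.
df.index = df.index.str.split("_").str[0]
print(df)
```

The result is the same non-unique index as the list comprehension produces ('a', 'b', 'c', 'c'), so the same caveat applies.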

那是你会如果你的函数产生重复的索引值误差:

>>> df = pd.DataFrame(np.random.random((4,3)),index="a_1 b_2 c_3 c_4".split())
>>> df
            0         1         2
a_1  0.854839  0.830317  0.046283
b_2  0.433805  0.629118  0.702179
c_3  0.390390  0.374232  0.040998
c_4  0.667013  0.368870  0.637276
>>> df.rename(index=lambda x: x.split("_")[0])
[...]
AssertionError: New axis must be unique to rename 

如果你真的想,我会使用列表比较:

>>> df.index = [x.split("_")[0] for x in df.index]
>>> df
          0         1         2
a  0.854839  0.830317  0.046283
b  0.433805  0.629118  0.702179
c  0.390390  0.374232  0.040998
c  0.667013  0.368870  0.637276 

但我想这是否真的是正确的方向。

Answer 2

Perhaps you could get the best of both worlds by using a MultiIndex:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(8).reshape(4,2), index=['a_1', 'b_2', 'c_3', 'c_4'])
print(df)
#      0  1
# a_1  0  1
# b_2  2  3
# c_3  4  5
# c_4  6  7

index = pd.MultiIndex.from_tuples([item.split('_') for item in df.index])
df.index = index
print(df)
#      0  1
# a 1  0  1
# b 2  2  3
# c 3  4  5
#   4  6  7 
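The MultiIndex can also be built in one step. On an Index, str.split with expand=True returns a MultiIndex rather than an Index of lists. This is a sketch that assumes the same frame as above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(8).reshape(4, 2), index=["a_1", "b_2", "c_3", "c_4"])

# expand=True on an Index yields a MultiIndex with one level per split piece.
df.index = df.index.str.split("_", expand=True)
print(df.index.nlevels)  # 2
print(df)
```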

This way, you can select rows by the first level of the index:

In [30]: df.loc['c']
Out[30]: 
   0  1
3  4  5
4  6  7 

or by both levels of the index:

In [31]: df.loc[('c','3')]
Out[31]: 
0    4
1    5
Name: (c, 3) 
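Here are the same selections as a self-contained script, plus xs() for selecting on the second level alone. This sketch rebuilds the frame from above. The label '4' passed to xs() is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(8).reshape(4, 2), index=["a_1", "b_2", "c_3", "c_4"])
df.index = pd.MultiIndex.from_tuples([item.split("_") for item in df.index])

# First level only: the rows under 'c', with the outer level dropped.
by_first = df.loc["c"]
# Both levels: a single row, returned as a Series.
by_both = df.loc[("c", "3")]
# Second level only: cross-section where the inner label is '4'.
by_second = df.xs("4", level=1)

print(by_first)
print(by_both)
print(by_second)
```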

Moreover, DataFrame methods are designed to work with MultiIndexed DataFrames, so you lose very little.
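One example of a MultiIndex-aware method is groupby, which can aggregate over a single index level. This sketch uses the frame from above. Summing is an arbitrary choice of aggregation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(8).reshape(4, 2), index=["a_1", "b_2", "c_3", "c_4"])
df.index = pd.MultiIndex.from_tuples([item.split("_") for item in df.index])

# Group on the outer level only; the two 'c' rows are summed together.
totals = df.groupby(level=0).sum()
print(totals)
```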

However, if you really want to drop the second level of the index, you could do this:

df.reset_index(level=1, drop=True, inplace=True)
print(df)
#    0  1
# a  0  1
# b  2  3
# c  4  5
# c  6  7 
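Another way to drop the level is DataFrame.droplevel. It removes an index level without the reset_index/drop round trip. This sketch assumes the MultiIndexed frame built above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(8).reshape(4, 2), index=["a_1", "b_2", "c_3", "c_4"])
df.index = pd.MultiIndex.from_tuples([item.split("_") for item in df.index])

# droplevel returns a new DataFrame, so assign the result back.
df = df.droplevel(1)
print(df)
```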
