
Linuxer, ex IBMer. GNU https://hmchzb19.github.io/

Category: Python/Ruby

2019-05-10 20:58:35

This example comes from the book pandas-for-everyone.
It uses the following CSV file:

    In [80]: !cat ./gapminder/other_csv/scientists.csv
    Name,Born,Died,Age,Occupation
    Rosaline Franklin,1920-07-25,1958-04-16,37,Chemist
    William Gosset,1876-06-13,1937-10-16,61,Statistician
    Florence Nightingale,1820-05-12,1910-08-13,90,Nurse
    Marie Curie,1867-11-07,1934-07-04,66,Chemist
    Rachel Carson,1907-05-27,1964-04-14,56,Biologist
    John Snow,1813-03-15,1858-06-16,45,Physician
    Alan Turing,1912-06-23,1954-06-07,41,Computer Scientist
    Johann Gauss,1777-04-30,1855-02-23,77,Mathematician

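Read the CSV file, assign the Age column to ages, and then shuffle that column in place with random.shuffle. In the pandas version used here, ages changes along with the DataFrame. The reason is that scientists['Age'] returns a Series that shares its data with the DataFrame, so ages is only a reference to that column and not a copy. pandas flags this with the SettingWithCopyWarning shown at the end of the session.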

    In [81]: import numpy as np

    In [82]: import pandas as pd

    In [83]: import matplotlib.pyplot as plt

    In [84]: scientists=pd.read_csv("./gapminder/other_csv/scientists.csv")

    In [85]: scientists.shape
    Out[85]: (8, 5)

    In [86]: scientists.columns
    Out[86]: Index(['Name', 'Born', 'Died', 'Age', 'Occupation'], dtype='object')

    In [87]: scientists.dtypes
    Out[87]:
    Name          object
    Born          object
    Died          object
    Age            int64
    Occupation    object
    dtype: object

    In [88]: ages=scientists['Age']

    In [89]: ages
    Out[89]:
    0    37
    1    61
    2    90
    3    66
    4    56
    5    45
    6    41
    7    77
    Name: Age, dtype: int64

    In [90]: import random

    In [91]: random.seed(42)

    In [92]: random.shuffle(scientists['Age'])
    /usr/lib/python3.7/random.py:278: SettingWithCopyWarning:
    A value is trying to be set on a copy of a slice from a DataFrame

    See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
      x[i], x[j] = x[j], x[i]
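
Whether ages follows the shuffle depends on the pandas version. Under Copy-on-Write (opt-in through pd.options.mode.copy_on_write = True in pandas 2.x, the default in pandas 3.0), changing the column through one reference no longer affects the other. The sketch below is a version-independent alternative and does not come from the book. It takes an explicit .copy() when an independent Series is wanted, and it shuffles with Series.sample instead of random.shuffle. The CSV rows are inlined through io.StringIO in place of the file on disk.

```python
import io

import pandas as pd

# Rows of scientists.csv, inlined so the sketch runs without the file.
CSV = """Name,Born,Died,Age,Occupation
Rosaline Franklin,1920-07-25,1958-04-16,37,Chemist
William Gosset,1876-06-13,1937-10-16,61,Statistician
Florence Nightingale,1820-05-12,1910-08-13,90,Nurse
Marie Curie,1867-11-07,1934-07-04,66,Chemist
Rachel Carson,1907-05-27,1964-04-14,56,Biologist
John Snow,1813-03-15,1858-06-16,45,Physician
Alan Turing,1912-06-23,1954-06-07,41,Computer Scientist
Johann Gauss,1777-04-30,1855-02-23,77,Mathematician
"""

scientists = pd.read_csv(io.StringIO(CSV))

# An explicit copy never shares data with the DataFrame,
# regardless of pandas version or Copy-on-Write settings.
ages = scientists['Age'].copy()

# sample(frac=1) returns every row in random order; random_state makes it repeatable.
# .to_numpy() drops the index so the assignment below is positional.
scientists['Age'] = scientists['Age'].sample(frac=1, random_state=42).to_numpy()

print(ages.tolist())               # [37, 61, 90, 66, 56, 45, 41, 77]
print(scientists['Age'].tolist())  # the same eight values in sampled order
```

The .to_numpy() call matters. If the sampled Series were assigned directly, pandas would align it on the index labels, and every age would land back in its original row.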

Next, create two new columns from the Born and Died columns, with dtype datetime64.

    In [93]: born_datetime=pd.to_datetime(scientists['Born'], format="%Y-%m-%d")

    In [94]: died_datetime=pd.to_datetime(scientists['Died'], format='%Y-%m-%d')

    In [95]: scientists['Born_dt'], scientists['Died_dt']=(born_datetime, died_datetime)

    In [96]: scientists.dtypes
    Out[96]:
    Name                  object
    Born                  object
    Died                  object
    Age                    int64
    Occupation            object
    Born_dt       datetime64[ns]
    Died_dt       datetime64[ns]
    dtype: object

    In [97]: scientists.shape
    Out[97]: (8, 7)
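
Calling pd.to_datetime once per column is not the only option. The dates can be parsed while the file is loaded, or both columns can be converted in a single call. Below is a minimal sketch of both options. It uses the first three rows of the CSV, inlined through io.StringIO, and the Born_dt and Died_dt names from the session.

```python
import io

import pandas as pd

# First three rows of scientists.csv, inlined for a self-contained example.
CSV = """Name,Born,Died,Age,Occupation
Rosaline Franklin,1920-07-25,1958-04-16,37,Chemist
William Gosset,1876-06-13,1937-10-16,61,Statistician
Florence Nightingale,1820-05-12,1910-08-13,90,Nurse
"""

# Option 1: parse the ISO dates while reading the file.
parsed = pd.read_csv(io.StringIO(CSV), parse_dates=['Born', 'Died'])
print(parsed.dtypes)

# Option 2: convert both text columns in one call; DataFrame.apply
# passes the extra format keyword through to pd.to_datetime.
scientists = pd.read_csv(io.StringIO(CSV))
dates = scientists[['Born', 'Died']].apply(pd.to_datetime, format='%Y-%m-%d')
scientists['Born_dt'] = dates['Born']
scientists['Died_dt'] = dates['Died']
print(scientists.dtypes)
```

parse_dates replaces Born and Died with datetime columns. The apply version instead keeps the original strings and adds separate Born_dt and Died_dt columns.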

Finally, subtract the two datetime64 columns to get a timedelta64 column (age_days_dt). Then convert it to an integer number of years.

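    # Lifespan as a timedelta64 column
    scientists['age_days_dt'] = scientists['Died_dt'] - scientists['Born_dt']

    # Either of the following two approaches works. .dt.days stands in for
    # astype('timedelta64[D]'), which pandas 2.x no longer accepts.
    scientists['age_years_dt'] = scientists['age_days_dt'].apply(lambda td: td.days // 365)
    scientists['age_years_dt'] = scientists['age_days_dt'].dt.days // 365

    In [102]: scientists.dtypes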
    Out[102]:
    Name                     object
    Born                     object
    Died                     object
    Age                       int64
    Occupation               object
    Born_dt          datetime64[ns]
    Died_dt          datetime64[ns]
    age_days_dt     timedelta64[ns]
    age_years_dt              int64
    dtype: object

    In [103]: scientists['age_years_dt']
    Out[103]:
    0    37
    1    61
    2    90
    3    66
    4    56
    5    45
    6    41
    7    77
    Name: age_years_dt, dtype: int64
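
Integer division by 365 treats every year as 365 days. The accumulated leap days can therefore push the result one year too high when a death falls a few days before a birthday, although for these eight scientists it happens to match the Age column. A calendar-based alternative counts completed years directly: subtract the birth year from the death year, then take one off where the death date falls before that year's birthday. This is a sketch, not taken from the book. age_in_years is a hypothetical helper name, and the CSV rows are inlined through io.StringIO.

```python
import io

import pandas as pd

# Rows of scientists.csv, inlined so the sketch runs without the file.
CSV = """Name,Born,Died,Age,Occupation
Rosaline Franklin,1920-07-25,1958-04-16,37,Chemist
William Gosset,1876-06-13,1937-10-16,61,Statistician
Florence Nightingale,1820-05-12,1910-08-13,90,Nurse
Marie Curie,1867-11-07,1934-07-04,66,Chemist
Rachel Carson,1907-05-27,1964-04-14,56,Biologist
John Snow,1813-03-15,1858-06-16,45,Physician
Alan Turing,1912-06-23,1954-06-07,41,Computer Scientist
Johann Gauss,1777-04-30,1855-02-23,77,Mathematician
"""

scientists = pd.read_csv(io.StringIO(CSV), parse_dates=['Born', 'Died'])


def age_in_years(born: pd.Series, died: pd.Series) -> pd.Series:
    """Completed years between two datetime64 Series (hypothetical helper)."""
    years = died.dt.year - born.dt.year
    # True where the death date comes before that year's birthday
    before_birthday = (died.dt.month < born.dt.month) | (
        (died.dt.month == born.dt.month) & (died.dt.day < born.dt.day)
    )
    return years - before_birthday.astype(int)


# Day-count approach from the session above, for comparison
by_days = (scientists['Died'] - scientists['Born']).dt.days // 365
by_calendar = age_in_years(scientists['Born'], scientists['Died'])

print(by_calendar.tolist())  # [37, 61, 90, 66, 56, 45, 41, 77]
```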
