dummy_variable - hmchzb19 - ChinaUnix Blog


dummy_variable

Category: Big Data

2020-03-30 13:58:42

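Sometimes categorical data has to be converted to numbers, which is where a dummy variable comes in. I found there are many ways to do this: you can use pd.Series.map, you can use pd.DataFrame.applymap, or you can use one of the encoders provided in sklearn.preprocessing, e.g.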


from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder
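
To compare the options mentioned above side by side, here is a minimal sketch on a toy column. The DataFrame `df` and its values are made up for illustration and are not the data used in this post. pd.get_dummies is an extra pandas option not mentioned above. Note that recent pandas releases (2.1 and later) deprecate DataFrame.applymap in favor of DataFrame.map, so the sketch sticks to Series.map.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder

# Toy data, made up for illustration (not the CSV used in this post)
df = pd.DataFrame({"Attendance": ["Yes", "No", "No", "Yes"]})

# 1) pd.Series.map: you decide explicitly which label becomes 1
mapped = df["Attendance"].map({"Yes": 1, "No": 0})

# 2) LabelEncoder: classes are sorted, so "No" -> 0 and "Yes" -> 1
label_encoded = LabelEncoder().fit_transform(df["Attendance"])

# 3) OrdinalEncoder: same idea, but it expects 2-D input (one or more columns)
ordinal_encoded = OrdinalEncoder().fit_transform(df[["Attendance"]])

# 4) OneHotEncoder: one column per category; drop="first" drops "No",
#    leaving a single 0/1 dummy column for "Yes"
one_hot = OneHotEncoder(drop="first").fit_transform(df[["Attendance"]]).toarray()

# 5) pd.get_dummies: the pandas equivalent of one-hot encoding
dummies = pd.get_dummies(df["Attendance"], drop_first=True).astype(int)

print(mapped.tolist(), label_encoded.tolist(), dummies["Yes"].tolist())
```

For a two-level category like this, all five give the same 0/1 column. The difference shows up with three or more levels, where LabelEncoder and OrdinalEncoder produce a single integer column that implies an ordering, while OneHotEncoder and pd.get_dummies produce one dummy column per level.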

The full code is as follows:


#import package
import statsmodels.api as sm
import statsmodels.formula.api as smf
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder
sns.set()
data=pd.read_csv('data-analysis/python-jupyter/1.03. Dummies.csv')
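print(data.shape)
new_data = data.copy()
# Option 1: map the labels to 0/1 explicitly
# new_data['Attendance'] = new_data['Attendance'].map({'Yes': 1, 'No': 0})
# Option 2: let LabelEncoder assign the integers (classes are sorted, so 'No' -> 0, 'Yes' -> 1)
label_encoder = LabelEncoder()
new_data['Attendance'] = label_encoder.fit_transform(new_data['Attendance'])
# Select y and X1 after encoding, so that X1 holds numbers rather than the original labels
y = new_data['GPA']
X1 = new_data[['SAT', 'Attendance']]
# Fit with the formula API
results = smf.ols(formula='GPA ~ SAT + Attendance', data=new_data).fit()
# Or fit with the array API instead:
# x = sm.add_constant(X1)
# results = sm.OLS(y, x).fit()
print(results.summary())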
# Scatter plot of the raw data
plt.scatter(new_data['SAT'], y)
# Regression lines for the two attendance groups (coefficients are hard-coded here):
# same slope, different intercepts
yhat_no = 0.6439 + 0.0014 * new_data['SAT']
yhat_yes = 0.8665 + 0.0014 * new_data['SAT']
fig = plt.plot(new_data['SAT'], yhat_no, lw=2, c='#006837')
fig2 = plt.plot(new_data['SAT'], yhat_yes, lw=2, c='#050026')
plt.xlabel('SAT', fontsize=20)
plt.ylabel('GPA', fontsize=20)
plt.show()
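
The two lines in the plot share the slope 0.0014 and differ only in their intercepts. Assuming yhat_yes corresponds to Attendance = 1, the gap 0.8665 - 0.6439 = 0.2226 is the coefficient on the dummy variable: the dummy shifts the line up without changing its slope. The sketch below illustrates this on synthetic, noise-free data generated from those same numbers. It uses plain NumPy least squares instead of statsmodels to stay dependency-light, and the arrays `sat` and `attendance` are hypothetical values, not the post's dataset.

```python
import numpy as np

# Synthetic, noise-free data built from the coefficients used in the plot above
# (hypothetical SAT scores and attendance flags, not the real CSV)
sat = np.array([1400.0, 1500.0, 1600.0, 1700.0, 1800.0, 1900.0])
attendance = np.array([0, 1, 0, 1, 0, 1])
gpa = 0.6439 + 0.0014 * sat + 0.2226 * attendance

# Design matrix: constant, SAT, and the 0/1 dummy
X = np.column_stack([np.ones_like(sat), sat, attendance])
coef, *_ = np.linalg.lstsq(X, gpa, rcond=None)
b0, b_sat, b_att = coef

# The dummy only moves the intercept; the slope is shared by both groups
intercept_no = b0            # Attendance = 0
intercept_yes = b0 + b_att   # Attendance = 1

print(f"no:  GPA = {intercept_no:.4f} + {b_sat:.4f} * SAT")
print(f"yes: GPA = {intercept_yes:.4f} + {b_sat:.4f} * SAT")
```

With a fitted statsmodels result, the same three numbers are available from results.params, so the plot lines could be built from them instead of being typed in by hand.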
