Linear_reg_sklearn - hmchzb19 - ChinaUnix Blog

Linear_reg_sklearn

Category: Big Data

2020-03-31 11:44:29

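This post covers sklearn's linear regression, which works differently from the statsmodels approach I used a couple of days ago. With statsmodels.api, both fit and predict require an explicit const column added with sm.add_constant; with statsmodels.formula.api, add_constant is not needed, you just pass an R-style formula string. sklearn's LinearRegression fits the intercept itself, so you can call fit and predict directly. The one thing to watch is the shape of the arguments: X must be 2D.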
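A minimal sketch of the shape issue, using a tiny synthetic DataFrame in place of the SAT/GPA CSV (the numbers are made up for illustration). scikit-learn expects X with shape (n_samples, n_features), so a 1D Series has to become a single-column matrix, either with `reshape(-1, 1)` or by selecting the column with double brackets.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the SAT/GPA data (made-up values, not the original CSV)
df = pd.DataFrame({"SAT": [1700, 1750, 1800, 1850, 1900],
                   "GPA": [2.9, 3.0, 3.2, 3.3, 3.5]})

x_1d = df["SAT"]                             # Series, shape (5,)
x_2d = df["SAT"].to_numpy().reshape(-1, 1)   # ndarray, shape (5, 1)
x_df = df[["SAT"]]                           # DataFrame, shape (5, 1)

reg = LinearRegression()
try:
    reg.fit(x_1d, df["GPA"])                 # 1D input is rejected by sklearn
    fit_1d_failed = False
except ValueError as err:
    fit_1d_failed = True
    print("1D input:", str(err).splitlines()[0])

reg.fit(x_2d, df["GPA"])                     # works: one column per feature
print(x_1d.shape, x_2d.shape, x_df.shape)
print(reg.coef_, reg.intercept_)
```

Using `df[["SAT"]]` instead of the reshaped array also keeps the column name, which sklearn then checks again at predict time.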

# coding: utf-8
import numpy as np
import statsmodels.api as sm
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression
sns.set()
data=pd.read_csv('data-analysis/python-jupyter/1.01. Simple linear regression.csv')
y=data['GPA']
X=data['SAT']
print(X.shape)
print(y.shape)
reg=LinearRegression()
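# Calling reg.fit(X, y) directly fails with:
#   ValueError: Expected 2D array, got 1D array instead
# because X is a pandas Series (1D), while sklearn expects X of shape (n_samples, n_features).
# Reshape it into a single-column matrix of shape (n, 1):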
X_matrix=X.values.reshape(-1,1)
print(X_matrix.shape)
reg.fit(X_matrix, y)
# reg.score(X, y): R-squared of the fit
# reg.coef_: coefficient(s), here the slope for SAT
# reg.intercept_: intercept
print(reg.score(X_matrix, y))
print(reg.coef_)
print(reg.intercept_)
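#make prediction
gen_data=np.linspace(1700,1800, num=10, dtype=int)
new_data=pd.DataFrame(data=gen_data, columns=['SAT'])
# the model was fitted on a plain array, so predict on a plain (n, 1) array as well;
# passing the DataFrame itself triggers sklearn's "X has feature names" warning
new_data['Predicted_GPA']=reg.predict(new_data[['SAT']].to_numpy())
print(new_data)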
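To see what `reg.coef_`, `reg.intercept_` and `reg.score` actually report, here is a small sketch that computes the closed-form simple-regression slope and intercept and the R-squared value by hand and compares them with sklearn. The data is synthetic (made up for illustration), not the SAT/GPA file.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (made up): roughly y = 0.5 + 0.002 * x plus a small fixed perturbation
x = np.array([1600.0, 1650.0, 1700.0, 1750.0, 1800.0, 1850.0])
y = 0.5 + 0.002 * x + np.array([0.05, -0.03, 0.02, -0.04, 0.01, -0.01])

# Closed-form ordinary least squares for one feature
x_mean, y_mean = x.mean(), y.mean()
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean

# R^2 = 1 - SS_res / SS_tot
y_hat = intercept + slope * x
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y_mean) ** 2)

# The same fit with sklearn (X reshaped to 2D)
reg = LinearRegression().fit(x.reshape(-1, 1), y)
print(slope, reg.coef_[0])
print(intercept, reg.intercept_)
print(r2, reg.score(x.reshape(-1, 1), y))
```

Each pair of printed numbers should agree up to floating-point rounding.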

For comparison, here is the predict part of the statsmodels.api version from a couple of days ago:

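#predict
import numpy as np
import pandas as pd
import statsmodels.api as sm
# assumed setup (results comes from the earlier statsmodels post):
# an OLS fit of GPA on SAT with an explicit constant column
data=pd.read_csv('data-analysis/python-jupyter/1.01. Simple linear regression.csv')
results=sm.OLS(data['GPA'], sm.add_constant(data['SAT'])).fit()
gen_data=np.linspace(1700,1800, num=10, dtype=int)
new_data=pd.DataFrame(data=gen_data, columns=['SAT'])
# statsmodels.api needs the const column at predict time too
new_x=sm.add_constant(new_data)
predicted_y=results.predict(new_x)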
new_x['Predicted_GPA']=predicted_y
#drop const-column
new_x=new_x.drop(['const'], axis=1)
print(new_x)
