Practicing multiple linear regression with sklearn. sklearn has no built-in way to compute p-values or adjusted R-squared, and no summary table like statsmodels, so we build one by hand.
α: level of significance, commonly 0.05 or 0.01

(1 - α): confidence level

With α = 0.05, we are 95% confident that a feature whose p-value falls below α is significant.

The goal: every feature's p-value should be less than α.
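The decision rule above can be sketched in a few lines. The p-values here are placeholders (0.676 for "Rand 1,2,3" is the value found later in this post):

```python
# Decision rule: a feature is significant when its p-value is below alpha.
alpha = 0.05
p_values = {'SAT': 0.000, 'Rand 1,2,3': 0.676}  # illustrative values
significant = {name: p < alpha for name, p in p_values.items()}
print(significant)  # SAT passes, 'Rand 1,2,3' does not
```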
# coding: utf-8
import statsmodels.api as sm
import statsmodels.formula.api as smf
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

sns.set()
data = pd.read_csv('data-analysis/python-jupyter/1.02. Multiple linear regression.csv')
# print(data.describe())

X = data[['SAT', 'Rand 1,2,3']]
y = data['GPA']

from sklearn.linear_model import LinearRegression
reg = LinearRegression()
reg.fit(X, y)
# compute R-squared and adjusted R-squared
r2 = reg.score(X, y)
n, p = X.shape  # n samples, p features
adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
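The adjusted-R² formula can be wrapped in a small helper and sanity-checked on synthetic data (the data below is made up for illustration; only the formula comes from the script above):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def adjusted_r2(r2, n, p):
    # same formula as above: penalize R² for using p predictors over n samples
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(80, 2))                # 2 features, only the first matters
y_demo = 3 * X_demo[:, 0] + rng.normal(size=80)
model = LinearRegression().fit(X_demo, y_demo)
r2_demo = model.score(X_demo, y_demo)
adj_demo = adjusted_r2(r2_demo, *X_demo.shape)
# adjusted R² is always <= R² (for p >= 1), and close to it when n >> p
```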
from sklearn.feature_selection import f_regression
'''
f_regression() returns 2 arrays:
1st array: F, shape (n_features,) -- the F-statistic of each feature
2nd array: p-values, shape (n_features,) -- the p-values of those F-scores
Note: these are univariate tests (each feature tested against y on its own),
not the t-test p-values of the full model that a statsmodels summary shows.
We always want the p-value to be less than 0.05.
'''

# get the p-values for these 2 features
p_values = f_regression(X, y)[1]
print(p_values.round(3))
# reg_summary = pd.DataFrame(data=['SAT', 'Rand 1,2,3'], columns=['features'])
reg_summary = pd.DataFrame(data=X.columns, columns=['features'])
reg_summary['coefficients'] = reg.coef_
reg_summary['p-values'] = p_values.round(3)
print(reg_summary)

'''
The p-value for feature "Rand 1,2,3" is 0.676, much bigger than 0.05,
so it is not a significant predictor of GPA.
'''