Linuxer, ex IBMer. GNU https://hmchzb19.github.io/


Category: Cloud computing

2020-04-03 11:12:35

A multiple linear regression exercise using sklearn. sklearn has no built-in method for computing p-values or adjusted R-squared, and no summary like the one statsmodels provides, so you have to build one by hand.
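
Adjusted R-squared corrects R-squared for the number of features in the model. A minimal sketch of the calculation, assuming a model fitted with an intercept; the helper name `adjusted_r2_score` is hypothetical and not part of sklearn, and the input numbers are made up for illustration:

```python
def adjusted_r2_score(r2: float, n: int, p: int) -> float:
    """Adjusted R-squared for n observations and p features (intercept fitted)."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than features plus one")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Same R-squared, one extra feature: the adjusted value goes down.
two_features = adjusted_r2_score(0.5, n=11, p=2)
three_features = adjusted_r2_score(0.5, n=11, p=3)
print(two_features, three_features)
```

If adding a feature does not raise R-squared enough to offset the penalty, the adjusted value falls, which is a hint that the feature adds little.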


    α: level of significance, commonly 0.05 or 0.01
    (1-α): confidence level
    With α = 0.05 the confidence level is 95%: a feature is called significant when its p-value is below 0.05.
    The goal is for every feature kept in the model to have a p-value less than α.
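
The decision rule above can be sketched in a few lines. The p-values here are hypothetical, chosen only to illustrate the comparison against α:

```python
import numpy as np

alpha = 0.05                   # level of significance
confidence_level = 1 - alpha   # 0.95

# Hypothetical p-values for three features (illustration only)
p_values = np.array([0.001, 0.049, 0.676])

# A feature counts as significant when its p-value is below alpha
significant = p_values < alpha
print(significant)
```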


    # coding: utf-8
    import statsmodels.api as sm
    import statsmodels.formula.api as smf
    import seaborn as sns
    import matplotlib.pyplot as plt
    import pandas as pd
    import numpy as np
    from sklearn.feature_selection import f_regression
    from sklearn.linear_model import LinearRegression

    sns.set()

    # Load the data: two features (SAT and Rand 1,2,3) and the target (GPA)
    data = pd.read_csv('data-analysis/python-jupyter/1.02. Multiple linear regression.csv')
    # print(data.describe())
    X = data[['SAT', 'Rand 1,2,3']]
    y = data['GPA']

    reg = LinearRegression()
    reg.fit(X, y)

    # Compute R-squared and adjusted R-squared
    r2 = reg.score(X, y)
    n, p = X.shape  # n observations, p features
    adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

    '''
    f_regression() returns two arrays:
    1st array: F, shape=(n_features,), the F-statistic of each feature
    2nd array: p-values, shape=(n_features,), the p-values of the F-statistics
    Each feature is tested on its own (a univariate regression against y).
    We want the p-value to be less than 0.05.
    '''

    # Get the p-values for the two features
    p_values = f_regression(X, y)[1]

    # Build a small summary table: feature name, coefficient, p-value
    reg_summary = pd.DataFrame(data=X.columns, columns=['features'])
    reg_summary['coefficients'] = reg.coef_
    reg_summary['p-values'] = p_values.round(3)
    print(reg_summary)

    '''
    The p-value for feature "Rand 1,2,3" is 0.676, much larger than 0.05,
    so this feature is not significant at α = 0.05.
    '''
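
Note that `f_regression` tests each feature separately, so its p-values can differ from the ones a statsmodels summary reports for the joint model. As a rough sketch of how the joint-model p-values could be computed by hand, the following applies the standard OLS t-test to a fitted `LinearRegression`. The function name `coef_p_values` is hypothetical, the data is synthetic (a stand-in for the CSV, which is not reproduced here), and the model is assumed to be fitted with an intercept:

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression


def coef_p_values(reg, X, y):
    """Two-sided t-test p-values for the coefficients of a fitted LinearRegression.

    Assumes reg was fitted with an intercept on the same X and y.
    Returns one p-value per feature (the intercept is left out).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    dof = n - p - 1                          # residual degrees of freedom
    residuals = y - reg.predict(X)
    sigma2 = residuals @ residuals / dof     # unbiased residual variance
    X1 = np.column_stack([np.ones(n), X])    # design matrix with intercept column
    cov = sigma2 * np.linalg.inv(X1.T @ X1)  # covariance of the estimates
    se = np.sqrt(np.diag(cov))[1:]           # standard errors, intercept dropped
    t_stats = reg.coef_ / se
    return 2 * stats.t.sf(np.abs(t_stats), dof)


# Synthetic stand-in data: one informative feature, one random feature in {1, 2, 3}
rng = np.random.default_rng(0)
n_obs = 200
signal = rng.normal(size=n_obs)
noise_feature = rng.integers(1, 4, size=n_obs).astype(float)
X = np.column_stack([signal, noise_feature])
y = 2.0 * signal + rng.normal(scale=0.5, size=n_obs)

reg = LinearRegression().fit(X, y)
p_values = coef_p_values(reg, X, y)
print(p_values.round(3))
```

With a single feature this reduces to the usual simple-regression slope test; with several features each p-value accounts for the other features in the model.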

