Simple_linear_reg - hmchzb19 - ChinaUnix Blog

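I have had some free time lately, so I have been going through tutorials on Udemy, including some on data science. I am taking notes here; today's topic is linear regression. According to Lazyprogrammer's course, all of this so-called data science should be thought of as geometry.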

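The code is below. You need to install statsmodels, seaborn, pandas and matplotlib beforehand, and you can run it in ipython3 or Jupyter.
The CSV file is one I found on GitHub. After searching around Kaggle and Google without much luck, I ended up simply cloning someone else's repo: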

git clone https://github.com/timurista/data-analysis
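
Before running the script, it may help to confirm that the four packages are importable and that the CSV sits where the script expects it. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is made up for illustration, and the CSV path is the one used in the script, relative to the directory where the repo was cloned.

```python
import importlib.util
import os


def missing_packages(names):
    """Return the names that Python's import system cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Packages the post says must be installed beforehand
required = ["statsmodels", "seaborn", "pandas", "matplotlib"]
missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")

# Path used by the script, relative to where the repo was cloned
csv_path = "data-analysis/python-jupyter/1.01. Simple linear regression.csv"
print("CSV found:", os.path.exists(csv_path))
```

Anything reported as missing can usually be installed with pip, for example `pip install statsmodels seaborn pandas matplotlib`.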

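import statsmodels.api as sm
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

sns.set()  # apply seaborn's default plot styling

# Load the data set from the cloned repo
data = pd.read_csv('data-analysis/python-jupyter/1.01. Simple linear regression.csv')
data.describe()  # summary statistics (displayed automatically in ipython/jupyter)

y = data['GPA']  # dependent variable
X = data['SAT']  # independent variable

# Scatter plot of the raw data
plt.scatter(X, y)
plt.xlabel('SAT', fontsize=20)
plt.ylabel('GPA', fontsize=20)
plt.show()

# Add a constant column so the model has an intercept, then fit by ordinary least squares
x1 = sm.add_constant(X)
results = sm.OLS(y, x1).fit()
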
'''
Notes on the regression summary:
intercept = 0.275
slope = 0.0017

coef: coefficient
t: t-statistic
P>|t|: p-value; a value below 0.05 means the variable is significant, so the closer to 0.000 the better
'''

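print(results.summary())

# Plot the data again, this time with the fitted regression line on top
plt.scatter(X, y)
yhat = 0.0017*X + 0.275  # slope and intercept as noted above
plt.plot(X, yhat, lw=4, c='orange', label='regression line')
plt.xlabel('SAT', fontsize=20)
plt.ylabel('GPA', fontsize=20)
plt.legend()  # display the 'regression line' label
plt.show()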
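
To see where numbers like the slope 0.0017 and the intercept 0.275 come from, here is a rough sketch of the closed-form least-squares fit for a single predictor, written in plain Python. It is not how statsmodels is implemented internally; it only illustrates the textbook formulas slope = S_xy / S_xx and intercept = mean(y) - slope * mean(x). The function name `fit_line` and the toy points are made up for illustration and are not the SAT/GPA data.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Hypothetical toy points lying exactly on y = 1 + 2x (not the SAT/GPA data)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
intercept, slope = fit_line(xs, ys)
print(intercept, slope)  # prints 1.0 2.0
```

In the statsmodels script itself, the fitted values should also be available as `results.params` (indexed by `const` and `SAT` here), so the regression line could be drawn from those instead of typing the rounded numbers by hand.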