polynomial regression
Not all relationships are linear.
Linear formula: y = mx + b
This is a "first order" or "first degree" polynomial, as the power of x is 1.
Second order polynomial: y = ax**2 + bx + c
Third order polynomial: y = ax**3 + bx**2 + cx + d
Higher-order polynomials produce more complex curves.
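To make the orders above concrete, here is a minimal sketch using numpy's poly1d. The coefficient values are made up for illustration; they are not from the data used later.

```python
import numpy as np

# np.poly1d takes coefficients from the highest power down to the constant.
# All coefficient values here are illustrative assumptions.
line = np.poly1d([2.0, 1.0])               # y = 2x + 1         (first order)
quadratic = np.poly1d([1.0, -3.0, 2.0])    # y = x**2 - 3x + 2  (second order)
cubic = np.poly1d([1.0, 0.0, -1.0, 0.0])   # y = x**3 - x       (third order)

for name, poly in [("line", line), ("quadratic", quadratic), ("cubic", cubic)]:
    # poly.order is the highest power of x; poly(2.0) evaluates the curve at x = 2
    print(name, "order:", poly.order, "value at x=2:", poly(2.0))
```

Each extra order adds one more coefficient, which is what lets the curve bend one more time.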
#beware overfitting
Don't use more degrees than you need.
Visualize your data first to see how complex a curve there might really be.
Visualize the fit - is your curve going out of its way to accommodate outliers?
A high r-squared simply means your curve fits your training data well, but it may not be a good predictor.
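code:
#fabricate data
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(2)
#page speeds: normal distribution, mean 3.0, standard deviation 1.0, 1000 samples
pageSpeeds = np.random.normal(3.0, 1.0, 1000)
#purchase amounts are divided by page speed, so the relationship is not linear
purchaseAmount = np.random.normal(50.0, 10.0, 1000) / pageSpeeds
plt.scatter(pageSpeeds, purchaseAmount)
plt.show()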
#numpy has a handy polyfit function that constructs an nth-degree polynomial model of our data, minimizing squared error. Let's try it with a 4th-degree polynomial.
x = np.array(pageSpeeds)
y = np.array(purchaseAmount)
#polyfit returns the coefficients; poly1d wraps them in a callable polynomial
p4 = np.poly1d(np.polyfit(x, y, 4))
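As a sanity check on what polyfit does, the sketch below fits data generated from a known polynomial with no noise, so the least-squares fit should recover the original coefficients. The names x_demo, y_demo, coeffs and model are hypothetical and exist only for this illustration.

```python
import numpy as np

# Hypothetical, noise-free data generated from y = 2x**2 - 3x + 1.
x_demo = np.linspace(-2.0, 2.0, 20)
y_demo = 2.0 * x_demo**2 - 3.0 * x_demo + 1.0

# Fit a 2nd-degree polynomial; coefficients come back highest power first.
coeffs = np.polyfit(x_demo, y_demo, 2)
model = np.poly1d(coeffs)

print(np.round(coeffs, 6))  # close to the generating coefficients 2, -3, 1
print(model(3.0))           # close to 2*9 - 3*3 + 1 = 10
```

With noisy data, as in the page speed example, the coefficients will not match any "true" curve exactly; polyfit just picks the ones with the smallest squared error.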
#visualize
xp = np.linspace(0, 7, 100)
plt.scatter(x, y)
plt.plot(xp, p4(xp), c='r')
plt.show()
#measure the r-squared score: 0 is bad, and 1 is good.
from sklearn.metrics import r2_score
r2 = r2_score(y, p4(x))
print(r2)
#the output is pretty good:
0.82937663963
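For reference, r-squared can be computed by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses a hypothetical helper named r_squared and a tiny made-up array; it is meant to show the formula, not to replace r2_score.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """r-squared = 1 - SS_res / SS_tot (hypothetical helper for illustration)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # error left after the fit
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # spread around the mean
    return 1.0 - ss_res / ss_tot

y_obs = np.array([1.0, 2.0, 3.0, 4.0])              # made-up observations
print(r_squared(y_obs, y_obs))                      # perfect predictions give 1.0
print(r_squared(y_obs, np.full(4, y_obs.mean())))   # always predicting the mean gives 0.0
```

A model that does worse than always predicting the mean gets a negative score, so 0 is not a hard floor.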
#change the order to 8
p8 = np.poly1d(np.polyfit(x, y, 8))
xp = np.linspace(0, 7, 100)
plt.scatter(x, y)
plt.plot(xp, p8(xp), c='r')
plt.show()
r2 = r2_score(y, p8(x))
print(r2)
#higher r-squared than order 4, but only on the training data; this may be overfitting
0.881439566368
#change the order to 1; this is plain linear regression.
p1 = np.poly1d(np.polyfit(x, y, 1))
xp = np.linspace(0, 7, 100)
plt.scatter(x, y)
plt.plot(xp, p1(xp), c='r')
plt.show()
r2 = r2_score(y, p1(x))
print(r2)
#r-squared is only 0.50
0.502494130455
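The overfitting warning earlier can be checked by scoring each fit on data it was not trained on. The sketch below is one possible way to do that, not part of the original walkthrough: the 80/20 split and the helper r_squared are assumptions made for illustration.

```python
import numpy as np

# Same fabricated data as above.
np.random.seed(2)
pageSpeeds = np.random.normal(3.0, 1.0, 1000)
purchaseAmount = np.random.normal(50.0, 10.0, 1000) / pageSpeeds

# Assumed split: first 800 points for training, last 200 held out for testing.
train_x, test_x = pageSpeeds[:800], pageSpeeds[800:]
train_y, test_y = purchaseAmount[:800], purchaseAmount[800:]

def r_squared(y_true, y_pred):
    """Hypothetical helper: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

scores = {}
for degree in (1, 4, 8):
    # Fit on the training points only, then score on both sets.
    model = np.poly1d(np.polyfit(train_x, train_y, degree))
    scores[degree] = (r_squared(train_y, model(train_x)),
                      r_squared(test_y, model(test_x)))
    print("degree", degree, "train r2:", scores[degree][0],
          "test r2:", scores[degree][1])
```

The training score can only go up as the degree rises, so it says little by itself. If the test score stalls or drops while the training score keeps climbing, the extra degrees are fitting noise.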