在sklearn里面有个数据集叫load_breast_cancer,569个sample,每个sample有30个feature.
我打算用svm.SVC来试试看.主要尝试的参数有kernel和gamma,kernel分别设置为rbf和sigmoid. gamma则设置为auto和scale.
经过实验我的感觉如下 sigmoid的score略低。而rbf稍微好一点.
‘gamma’设置为auto和scale最后的score都一样.
我很奇怪,sigmoid的train score比test score还少了0.03, 3%为什么呢?
-
(569, 30)
-
(569,)
-
['malignant' 'benign']
-
X data shape:(569, 30); no. positive:357; no. negative:212
-
svm with kernel "rbf" train score and test score
-
train score: 0.984251968503937, test score: 0.9680851063829787
-
-
svm with kernel "sigmoid" train score and test score
-
train score: 0.931758530183727, test score: 0.9627659574468085
代码如下:
-
from sklearn.datasets import load_breast_cancer
-
from sklearn.model_selection import train_test_split
-
from sklearn import svm
-
from sklearn.preprocessing import StandardScaler
-
from sklearn.metrics import classification_report, accuracy_score
-
-
cancer = load_breast_cancer()
-
X, y = cancer.data, cancer.target
-
print(X.shape)
-
print(y.shape)
-
print(cancer.target_names)
-
print('X data shape:{0}; no. positive:{1}; no. negative:{2}'.format(X.shape, y[y==1].shape[0], y[y==0].shape[0]))
-
-
X_train,X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
-
-
scaler=StandardScaler()
-
X_train = scaler.fit_transform(X_train)
-
X_test = scaler.transform(X_test)
-
-
svm_cf = svm.SVC(C=1, kernel='rbf',gamma='scale', tol=.001, random_state=42)
-
svm_cf.fit(X_train, y_train)
-
y_train_pred = svm_cf.predict(X_train)
-
y_test_pred = svm_cf.predict(X_test)
-
train_score = accuracy_score(y_train, y_train_pred)
-
test_score = accuracy_score(y_test, y_test_pred)
-
-
print('svm with kernel "rbf" train score and test score')
-
print('train score: {}, test score: {}'.format(train_score, test_score))
-
-
svm_sig_cf = svm.SVC(C=1, kernel='sigmoid', tol=.001, gamma='auto',random_state=42)
-
svm_sig_cf.fit(X_train, y_train)
-
y_train_pred = svm_sig_cf.predict(X_train)
-
y_test_pred = svm_sig_cf.predict(X_test)
-
train_score = accuracy_score(y_train, y_train_pred)
-
test_score = accuracy_score(y_test, y_test_pred)
-
-
print('\nsvm with kernel "sigmoid" train score and test score')
-
print('train score: {}, test score: {}'.format(train_score, test_score))
-
阅读(1195) | 评论(0) | 转发(0) |