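A placeholder post for now. The mitmatlab toolbox includes ID3, so in principle it can be called directly. However, the help text is too sparse, the meaning of the function parameters is unclear, and so the ready-made function cannot be called as it stands. I searched online and found nothing useful, so I will work through it slowly myself.
First, the ID3 parameter description I found online: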
function D = ID3(train_features, train_targets, params, region)
% Classify using Quinlan's ID3 algorithm
% Inputs:
% features - Train features
% targets - Train targets
% params - [Number of bins for the data, Percentage of incorrectly assigned samples at a node]
% region - Decision region vector: [-x x -y y number_of_points]
%
% Outputs
% D - Decision surface
%NOTE: In this implementation it is assumed that a pattern vector with fewer than 10 unique values (the parameter Nu)
%is discrete, and will be treated as such. Other vectors will be treated as continuous
For comparison, here is how the decision tree function treefit is used:
TREEFIT Fit a tree-based model for classification or regression.
T = TREEFIT(X,Y) creates a decision tree T for predicting response Y
as a function of predictors X. X is an N-by-M matrix of predictor
values. Y is either a vector of N response values (for regression),
or a character array or cell array of strings containing N class
names (for classification). Either way, T is a binary tree where each
non-terminal node is split based on the values of a column of X. NaN
values in X or Y are taken to be missing values, and observations with
any missing values are not used in the fit.
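Y is modeled as a function of X. X is an N-by-M matrix, which should be the attribute values of the samples (one sample per row), and Y holds the class of each sample. I inferred this from meas and species in the example.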
Example: Create classification tree for Fisher's iris data.
load fisheriris;
t = treefit(meas, species);
treedisp(t,'names',{'SL' 'SW' 'PL' 'PW'});
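treedisp (tree display) is the function that displays the tree. The four attribute names SL, SW, PL, PW stand for sepal length, sepal width, petal length and petal width.
Result: [figure: the classification tree displayed by treedisp]
Following the same pattern:
The first two arguments of ID3 can be worked out by analogy, but the last two are still unclear. The format of the last one in particular is described in an odd way.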
ID3(meas, species,[Nbins, inc_node],[-x x -y y number_of_points])
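The mitmatlab data mining toolbox comes with a user manual, User guide.pdf.
According to the manual, the toolbox has a graphical interface. Running classifier brings it up,
but the buttons on the interface may not respond when clicked, and this error is shown: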
??? Error: File: classifier_commands.m Line: 95 Column: 9
A BREAK statement appeared outside of a loop. Use RETURN instead.
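I simply commented out the break and added a return in its place. I do not know why break would be used inside an if, which is odd. classifier_commands('Init'); has the same problem, and the same fix works there.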
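The fix above can be sketched with a small stand-alone function. guard_demo and its logic are hypothetical and are not taken from classifier_commands.m. They only show the pattern of leaving a function early from inside an if, which is what the error message itself suggests doing with return.

```matlab
function out = guard_demo(x)
% Hypothetical example (not toolbox code): early exit from inside an if.
out = [];
if isempty(x)
    % break;   % original style: rejected because there is no enclosing for/while loop
    return;    % leaves the function immediately; out stays empty
end
out = x * 2;   % only reached when x is not empty
end
```

Saved as guard_demo.m, calling guard_demo([]) returns an empty result, while guard_demo(3) runs through to the last line.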
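Following the section "A comprehensive example", running the k-means method gives the same result as the manual. The data is the clouds data set from mitmatlab.
The graphical version of the mitmatlab data mining toolbox is designed for two-class two-dimensional problems. Some algorithms support more dimensions, but feature selection has to be done first.
ID3 cannot use the iris data as is, because the meaning of rows and columns is exactly the opposite.
The first argument is the attribute values of the samples. Each sample is a column vector, so the data is an M-by-N matrix; M is usually 2, and when it is greater than 2 feature selection is needed.
The second argument is the class each sample belongs to, a 1-by-N matrix.
The third argument defaults to [5,1].
The fourth argument can be computed with calculate_region(features, region).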
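The argument layout described above can be sketched as follows. This is only a sketch under several assumptions. The four rows are hand-typed in the same layout as meas, and keeping columns 3 and 4 is a crude stand-in for real feature selection. The 0/1 coding of the targets, the reading of the region vector as [xmin xmax ymin ymax number_of_points], and the value 100 are my guesses rather than anything the toolbox documentation confirms. The ID3 call is left commented out because it needs the toolbox on the path.

```matlab
% Data in the treefit layout: one sample per row (N-by-M), class names in a cell array.
X = [5.1 3.5 1.4 0.2;
     4.9 3.0 1.4 0.2;
     7.0 3.2 4.7 1.4;
     6.4 3.2 4.5 1.5];
Y = {'setosa'; 'setosa'; 'versicolor'; 'versicolor'};

% Crude stand-in for feature selection: keep two columns so that M = 2.
keep = [3 4];

% Transpose so that each sample becomes a column: M-by-N.
train_features = X(:, keep)';

% Map class names to integers. idx(k) is the position of Y{k} in the sorted name list.
[class_names, first_idx, idx] = unique(Y);
train_targets = idx(:)' - 1;      % 1-by-N row vector; the 0/1 coding is an assumption

% Default parameters mentioned in the post.
params = [5, 1];

% Bounding box of the data plus a grid resolution (assumed meaning of the region vector).
number_of_points = 100;           % arbitrary illustrative value
region = [min(train_features(1, :)), max(train_features(1, :)), ...
          min(train_features(2, :)), max(train_features(2, :)), ...
          number_of_points];

% With the toolbox on the path, the call would then look like this:
% D = ID3(train_features, train_targets, params, region);
```

The transpose is the important step: treefit takes meas as N-by-M, while ID3 wants one sample per column.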
ID3 has the same break error. Replace break with return and it runs normally.
CART also runs without problems.
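C4_5 still has problems for now; I will keep trying. The error says that the fifth input argument, Nu, is undefined, so I set:
Nu=10
It then fails again, this time with an index out of range error: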
Index exceeds matrix dimensions.
Error in ==> C4_5>use_tree at 63
in = indices(find(patterns(dim, indices) <= tree.split_loc));
Error in ==> C4_5 at 40
test_targets = use_tree(test_patterns, 1:size(test_patterns,2), tree, discrete_dim, unique(train_targets));
Error in ==> start_classify at 62
D = feval(Classification_algorithm, train_features, train_targets, AlgorithmParameters, region);
Error in ==> classifier_commands at 204
[D, test_err, train_err] = start_classify(features, targets, error_method, redraws, percent, preprocessing, PreprocessingParameters, ...
??? Error while evaluating uicontrol Callback.
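One way to narrow this down is to look at the expression on line 63 of use_tree. In patterns(dim, indices), the only subscripts that can go out of range are dim (against the number of rows) and indices (against the number of columns). The sketch below checks both. The variable values are made up for illustration and are not the ones from the failing run; in practice the same checks would be placed just before that line in use_tree.

```matlab
% Hypothetical stand-ins for the variables inside use_tree.
patterns = rand(2, 5);      % 2 features, 5 samples
dim      = 3;               % a split dimension that does not exist in patterns
indices  = 1:5;

% Row subscript must lie within the number of features.
dim_ok = dim >= 1 && dim <= size(patterns, 1);
% Column subscripts must lie within the number of samples.
idx_ok = all(indices >= 1 & indices <= size(patterns, 2));

if ~dim_ok
    fprintf('dim = %d, but patterns has only %d rows\n', dim, size(patterns, 1));
end
if ~idx_ok
    fprintf('some indices exceed the %d columns of patterns\n', size(patterns, 2));
end
```

Whichever check fails shows whether the tree was built with a different number of features than the test patterns have, or whether the sample indices are wrong.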