2014-04-18 17:23:46


A brief history of AdaBoost

Boosting, the predecessor of AdaBoost

Boosting is a method for improving the accuracy of any given learning algorithm. Its idea originates from the PAC (Probably Approximately Correct) learning model proposed by Valiant. Kearns and Valiant introduced the concepts of weak and strong learning: a learning algorithm whose error rate is below 1/2, i.e. whose accuracy is only slightly better than random guessing, is called a weak learning algorithm; an algorithm with high accuracy that runs in polynomial time is called a strong learning algorithm. Kearns and Valiant also first posed the question of the equivalence of weak and strong learning in the PAC model: can any given weak learning algorithm that is only slightly better than random guessing be boosted into a strong learning algorithm? If the two are equivalent, then one need only find a weak learning algorithm slightly better than random guessing and boost it into a strong one, instead of searching directly for strong learning algorithms, which are hard to obtain.

In 1990, Schapire was the first to construct a polynomial-time algorithm answering this question in the affirmative; this was the original Boosting algorithm. A year later, Freund proposed a more efficient Boosting algorithm. Both algorithms, however, share a practical drawback: they require knowing in advance a lower bound on the accuracy of the weak learning algorithm.

In 1995, Freund and Schapire improved the Boosting algorithm and proposed AdaBoost (Adaptive Boosting). Its efficiency is almost the same as that of Freund's 1991 Boosting algorithm, but it needs no prior knowledge about the weak learners whatsoever, which makes it much easier to apply to real problems. Freund and Schapire later proposed AdaBoost.M1, AdaBoost.M2, and other variants that change how Boosting weights the votes; these received great attention in the machine learning community.

AdaBoost in detail

AdaBoost is an iterative algorithm. Its core idea is to train different classifiers (weak classifiers) on the same training set, and then assemble these weak classifiers into a stronger final classifier (the strong classifier).

The algorithm works by changing the data distribution: in each round it sets the weight of every sample based on whether that sample was classified correctly in the previous round and on the previous round's overall accuracy. The reweighted data set is then handed to the next weak classifier for training, and the classifiers obtained in each round are finally fused into the decision classifier. Using an AdaBoost classifier can filter out some unnecessary features of the training data and concentrate on the key training samples.
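
To make the reweighting concrete, below is a minimal sketch of one boosting round in C++. The decision-stump weak classifier and the data layout are illustrative assumptions; only the weight-update rule is the standard discrete AdaBoost step.

#include <algorithm>
#include <cmath>
#include <vector>

// A weak classifier: a decision stump that thresholds one feature and
// outputs a vote in {-1, +1}.
struct WeakClassifier {
    int feature;      // index of the feature being thresholded
    double threshold; // decision threshold of this weak classifier
    int polarity;     // +1 or -1, direction of the inequality
    int predict(const std::vector<double>& x) const {
        return polarity * (x[feature] - threshold) > 0 ? 1 : -1;
    }
};

// One boosting round: given sample weights w (a distribution) and a
// weak classifier h trained on that distribution, update the weights.
// Labels y are in {-1, +1}. Returns alpha, the voting weight of h.
double boostingRound(const WeakClassifier& h,
                     const std::vector<std::vector<double>>& X,
                     const std::vector<int>& y,
                     std::vector<double>& w) {
    double err = 0.0; // weighted error of h on the current distribution
    for (size_t i = 0; i < X.size(); ++i)
        if (h.predict(X[i]) != y[i]) err += w[i];
    err = std::max(err, 1e-10); // guard against a perfect weak learner

    // alpha = 0.5 * ln((1 - err) / err): lower error -> larger vote.
    double alpha = 0.5 * std::log((1.0 - err) / err);

    // Raise the weights of misclassified samples, lower the others,
    // then renormalize so the weights stay a distribution.
    double Z = 0.0;
    for (size_t i = 0; i < X.size(); ++i) {
        w[i] *= std::exp(-alpha * y[i] * h.predict(X[i]));
        Z += w[i];
    }
    for (double& wi : w) wi /= Z;
    return alpha;
}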

Online training stage flowchart:

(flowchart image not preserved)
The resulting structure should look like this:

N stages of weak classifiers, each with its own threshold; the relative weights of the N classifiers; and a global threshold for the whole strong classifier (see the sketch below).
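
A minimal sketch of that structure in C++ (the type and member names are illustrative assumptions):

#include <functional>
#include <vector>

// The trained strong classifier described above: N weak classifiers
// (each carrying its own threshold), one voting weight per classifier,
// and a global threshold for the whole ensemble.
struct StrongClassifier {
    // Each stage maps a feature vector to a vote in {-1, +1}.
    std::vector<std::function<int(const std::vector<double>&)>> stages;
    std::vector<double> alphas; // voting weight of each weak classifier
    double globalThreshold;     // threshold on the weighted vote sum

    int predict(const std::vector<double>& x) const {
        double score = 0.0;
        for (size_t t = 0; t < stages.size(); ++t)
            score += alphas[t] * stages[t](x);
        return score >= globalThreshold ? 1 : -1;
    }
};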

Offline detection stage flowchart:

(flowchart image not preserved)


Toolkits

With the algorithm's flow understood, implementing it entirely by yourself is still quite difficult. Fortunately, others have already done this work.

1. If you are training on Haar features, OpenCV's cvHaarTraining is enough; you do not need to care about AdaBoost, cascades, or anything else. Just prepare the files in the required format and wait for the result.

2. OpenCV's CvBoost, whose main functions are:

bool CvBoost::train(...);

void CvBoost::load(...);

float CvBoost::predict(...);

The OpenCV samples include a multi-class classification example worth consulting, though the file-reading part is badly written... (a minimal usage sketch follows below)

(The above is adapted from: http://shijuanfeng.blogbus.com/logs/100675208.html)
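
For reference, here is a minimal sketch of the CvBoost call sequence under the legacy OpenCV 2.x C++ API. The random data and all parameter values are illustrative assumptions, not the sample's actual code.

#include <opencv2/core/core.hpp>
#include <opencv2/ml/ml.hpp>

int main() {
    // 100 samples with 5 features each; binary labels in {0, 1}.
    // Random data stands in for a real training set.
    cv::Mat samples(100, 5, CV_32F);
    cv::Mat labels(100, 1, CV_32S);
    cv::randu(samples, 0.0, 1.0);
    cv::randu(labels, 0, 2);

    // Mark all features as numerical and the response as categorical,
    // since CvBoost is a classifier.
    cv::Mat varType(6, 1, CV_8U, cv::Scalar(CV_VAR_NUMERICAL));
    varType.at<uchar>(5) = CV_VAR_CATEGORICAL;

    // Gentle AdaBoost with 50 depth-1 trees (decision stumps).
    CvBoostParams params(CvBoost::GENTLE, 50, 0.95, 1, false, 0);

    CvBoost boost;
    boost.train(samples, CV_ROW_SAMPLE, labels,
                cv::Mat(), cv::Mat(), varType, cv::Mat(), params);

    // Save, reload, and classify one sample.
    boost.save("boost_model.xml");
    boost.load("boost_model.xml");
    float response = boost.predict(samples.row(0));
    return static_cast<int>(response);
}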


Download the GML AdaBoost Matlab Toolbox source code:


It comes with ready-made MATLAB code and examples that you can consult first. Without further ado, here is the English introduction:
GML AdaBoost Matlab Toolbox is a set of MATLAB functions and classes implementing a family of classification algorithms known as Boosting.
Implemented algorithms
So far we have implemented 3 different boosting schemes: Real AdaBoost, Gentle AdaBoost and Modest AdaBoost.

Real AdaBoost (see [2] for a full description) is the generalization of the basic AdaBoost algorithm first introduced by Freund and Schapire [1]. Real AdaBoost should be treated as the basic "hardcore" boosting algorithm.
Gentle AdaBoost is a more robust and stable version of Real AdaBoost (see [3] for a full description). So far it has been the most practically efficient boosting algorithm, used, for example, in the Viola-Jones object detector [4]. Our experiments show that Gentle AdaBoost performs slightly better than Real AdaBoost on regular data, is considerably better on noisy data, and is much more resistant to outliers.
Modest AdaBoost (see [5] for a full description) is a regularized tradeoff of AdaBoost, aimed mostly at better generalization capability and resistance to overfitting. Our experiments show that in terms of test error and overfitting this algorithm outperforms both Real and Gentle AdaBoost.
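
Mechanically (following Friedman et al. [3]), these variants share the same exponential reweighting rule; what differs is the real-valued weak hypothesis fm fitted in each round (Modest AdaBoost additionally damps fm using statistics computed under an inverted weight distribution [5]). The C++ sketch below illustrates that shared update under this reading; it is not the toolbox's actual code.

#include <cmath>
#include <vector>

// Shared per-round reweighting, with fm the real-valued weak hypothesis:
//   Real AdaBoost:   fm(x) = 0.5 * log(p(x) / (1 - p(x))), where p(x) is
//                    a weighted class-probability estimate (unbounded votes).
//   Gentle AdaBoost: fm(x) is a weighted least-squares regression fit of
//                    y on x, so fm(x) lies in [-1, 1] (bounded, "gentler"
//                    steps, hence the robustness to outliers noted above).
template <class WeakHypothesis> // any const-callable: double fm(sample)
void updateWeights(const WeakHypothesis& fm,
                   const std::vector<std::vector<double>>& X,
                   const std::vector<int>& y, // labels in {-1, +1}
                   std::vector<double>& w) {
    double Z = 0.0;
    for (size_t i = 0; i < X.size(); ++i) {
        w[i] *= std::exp(-y[i] * fm(X[i]));
        Z += w[i];
    }
    for (double& wi : w) wi /= Z; // keep the weights a distribution
}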

Available weak learners
We have implemented a classification tree as a weak learner.

Additional functionalities
Alongside the 3 Boosting algorithms we also provide a class that should give you an easy way to run a cross-validation test.

Using trained classifiers in C++ applications
In version 0.3 of the toolbox you can save a constructed classifier to a file and load it in your C++ application. C++ code for loading and using a saved classifier is provided.

Authors
This toolbox was developed and implemented by Alexander Vezhnevets, an undergraduate student at Moscow State University. If you have any questions or suggestions, please e-mail me: avezhnevets@graphics.cs.msu.ru


Reference
[1] Y. Freund and R. E. Schapire. Game theory, on-line prediction and boosting. In Proceedings of the Ninth Annual Conference on Computational Learning Theory, pages 325–332, 1996.


[2] R. E. Schapire and Y. Singer. Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37(3):297–336, December 1999.


[3] Jerome Friedman, Trevor Hastie, and Robert Tibshirani. Additive logistic regression: A statistical view of boosting. The Annals of Statistics, 28(2):337–407, April 2000.


[4] P. Viola and M. Jones. Robust Real-time Object Detection. In Proc. 2nd Int'l Workshop on Statistical and Computational Theories of Vision -- Modeling, Learning, Computing and Sampling, Vancouver, Canada, July 2001.


[5] Alexander Vezhnevets and Vladimir Vezhnevets. 'Modest AdaBoost' – Teaching AdaBoost to Generalize Better. GraphiCon-2005, Novosibirsk Akademgorodok, Russia, 2005.
[6] Newman, D.J. & Hettich, S. & Blake, C.L. & Merz, C.J. (1998). UCI Repository of machine learning databases [~mlearn/MLRepository.html]. Irvine, CA: University of California, Department of Information and Computer Science. 