svm


2010-05-06 13:32:05

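Building on years of research in statistical learning theory, Vapnik and colleagues proposed a different optimality criterion for designing linear classifiers. The principle starts from the linearly separable case, is then extended to the linearly non-separable case, and finally to nonlinear decision functions. The resulting classifier is called the support vector machine (SVM). The SVM rests on a deep theoretical foundation and is a relatively recent method.
  The main ideas of the SVM can be summarized in two points: (1) The analysis starts from the linearly separable case. For the linearly non-separable case, a nonlinear mapping transforms samples that are not linearly separable in the low-dimensional input space into a high-dimensional feature space where they become linearly separable, which makes it possible to apply a linear algorithm in the feature space to analyze the nonlinear characteristics of the samples. (2) Based on the structural risk minimization principle, it constructs the optimal separating hyperplane in the feature space, so that the learner reaches a global optimum and the expected risk over the whole sample space satisfies a certain upper bound with some probability.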
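To make point (1) concrete, here is a minimal sketch using XOR-style data, which no straight line separates in the 2-D input space. The feature map `phi`, the weight vector `W_FEATURE`, and the bias `B_FEATURE` are hand-picked assumptions for illustration, not something an SVM learned.

```python
# XOR-style data: the two classes cannot be split by a line in the 2-D input space.
samples = [
    ((-1.0, -1.0), -1),
    ((1.0, 1.0), -1),
    ((-1.0, 1.0), 1),
    ((1.0, -1.0), 1),
]

def phi(x):
    """Hypothetical nonlinear map: append the product x1*x2 as a third coordinate."""
    x1, x2 = x
    return (x1, x2, x1 * x2)

def linear_decision(w, b, z):
    """Sign of the linear function w.z + b evaluated in feature space."""
    score = sum(wi * zi for wi, zi in zip(w, z)) + b
    return 1 if score >= 0 else -1

# Hand-picked hyperplane in the 3-D feature space (an assumption, not a trained model).
W_FEATURE = (0.0, 0.0, -1.0)
B_FEATURE = 0.0

predictions = [linear_decision(W_FEATURE, B_FEATURE, phi(x)) for x, _ in samples]
labels = [y for _, y in samples]
print(predictions == labels)
```

After the mapping, a single plane in the 3-D feature space separates the two classes, which is the effect the nonlinear mapping is meant to achieve.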
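  When studying this method, first understand how it frames the problem, which means starting with the simplest case, linear separability. Do not rush on to more complex cases such as linear non-separability before the principle is clear. Designing an SVM requires solving a constrained extremum problem, and therefore Lagrange multiplier theory. Most people have learned, or commonly use, the form in which the constraints are equalities, whereas here the conditions to be satisfied are inequalities. For this purpose it is enough to know the relevant results of Lagrange theory.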
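As a sketch of the results needed, consider the standard hard-margin formulation for the linearly separable case. The notation is an assumption introduced here: training samples $(x_i, y_i)$ with $y_i \in \{-1, +1\}$, $i = 1, \dots, n$, weight vector $w$, bias $b$, and multipliers $\alpha_i$.

```latex
% Primal problem: maximize the margin subject to inequality constraints
\min_{w,\,b} \ \tfrac{1}{2}\lVert w \rVert^2
\quad \text{s.t.} \quad y_i\,(w^\top x_i + b) \ge 1, \qquad i = 1, \dots, n

% Lagrangian with one multiplier \alpha_i \ge 0 per inequality constraint
L(w, b, \alpha) = \tfrac{1}{2}\lVert w \rVert^2
  - \sum_{i=1}^{n} \alpha_i \bigl[\, y_i\,(w^\top x_i + b) - 1 \,\bigr]

% Stationarity conditions
\frac{\partial L}{\partial w} = 0 \ \Rightarrow\ w = \sum_{i=1}^{n} \alpha_i y_i x_i,
\qquad
\frac{\partial L}{\partial b} = 0 \ \Rightarrow\ \sum_{i=1}^{n} \alpha_i y_i = 0

% Complementary slackness (what inequality constraints add over equality constraints)
\alpha_i \bigl[\, y_i\,(w^\top x_i + b) - 1 \,\bigr] = 0, \qquad \alpha_i \ge 0

% Dual problem obtained by substituting the stationarity conditions into L
\max_{\alpha} \ \sum_{i=1}^{n} \alpha_i
  - \tfrac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j\, y_i y_j\, x_i^\top x_j
\quad \text{s.t.} \quad \alpha_i \ge 0, \quad \sum_{i=1}^{n} \alpha_i y_i = 0
```

Complementary slackness means that $\alpha_i > 0$ only for samples lying exactly on the margin; these samples are the support vectors.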
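  General characteristics of the SVM: (1) The SVM learning problem can be formulated as a convex optimization problem, so known efficient algorithms can be used to find the global minimum of the objective function. Other classification methods (such as rule-based classifiers and artificial neural networks) use a greedy strategy to search the hypothesis space, which generally finds only a locally optimal solution.
  (2) The SVM controls model capacity by maximizing the margin of the decision boundary. Nevertheless, the user must still supply other parameters, such as the type of kernel function and the cost of introducing slack variables.
  (3) By introducing a dummy variable for each categorical attribute value in the data, the SVM can be applied to categorical data.
  (4) The SVM can be used not only for two-class problems but also handles multi-class problems well.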
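The dummy-variable idea in point (3) can be sketched as follows. The helper `one_hot_encode` and the toy rows are hypothetical; the sketch only shows how a categorical column becomes several 0/1 columns that a numeric learner such as an SVM can consume.

```python
def one_hot_encode(rows, categorical_columns):
    """Replace each categorical column with one 0/1 dummy variable per observed value."""
    # Collect the sorted set of values seen in each categorical column.
    categories = {
        col: sorted({row[col] for row in rows}) for col in categorical_columns
    }
    encoded = []
    for row in rows:
        vector = []
        for col, value in enumerate(row):
            if col in categories:
                # One dummy variable per category value; exactly one of them is 1.
                vector.extend(1.0 if value == cat else 0.0 for cat in categories[col])
            else:
                vector.append(float(value))
        encoded.append(vector)
    return encoded, categories

# Toy data: column 0 is categorical, column 1 is numeric.
rows = [("red", 1.5), ("green", 0.2), ("blue", 3.0), ("red", 2.0)]
encoded, categories = one_hot_encode(rows, categorical_columns=[0])
for original, vector in zip(rows, encoded):
    print(original, "->", vector)
```

The dummy columns follow the sorted order of the observed values, so the same `categories` mapping should be reused when encoding new samples.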
  Main software packages:
  Lush -- a Lisp-like interpreted/compiled language with C/C++/Fortran interfaces that has packages to interface to a number of different SVM implementations. Interfaces to LASVM, LIBSVM, mySVM, SVQP, SVQP2 (SVQP3 in future) are available. Leverage these against Lush's other interfaces to machine learning, hidden Markov models, numerical libraries (LAPACK, BLAS, GSL), and built-in vector/matrix/tensor engine. 
  SVMlight -- a popular implementation of the SVM algorithm by Thorsten Joachims; it can be used to solve classification, regression and ranking problems. 
  LIBSVM -- A Library for Support Vector Machines, Chih-Chung Chang and Chih-Jen Lin 
  YALE -- a powerful machine learning toolbox containing wrappers for SVMLight, LibSVM, and MySVM in addition to many evaluation and preprocessing methods. 
  LS-SVMLab - Matlab/C SVM toolbox - well-documented, many features 
  Gist -- implementation of the SVM algorithm with feature selection. 
  Weka -- a machine learning toolkit that includes an implementation of an SVM classifier; Weka can be used either interactively through a graphical interface or as a software library. (One of them is called "SMO". In the GUI Weka explorer, it is under the "classify" tab if you "Choose" an algorithm.) 
  OSU SVM - Matlab implementation based on LIBSVM 
  Torch - C++ machine learning library with SVM 
  Shogun - Large Scale Machine Learning Toolbox with interfaces to Octave, Matlab, Python, R 
  Spider - Machine learning library for Matlab 
  kernlab - Kernel-based Machine Learning library for R 
  e1071 - Machine learning library for R 
  SimpleSVM - SimpleSVM toolbox for Matlab 
  SVM and Kernel Methods Matlab Toolbox 
  PCP -- C program for supervised pattern classification. Includes LIBSVM wrapper. 

  TinySVM -- a small SVM implementation, written in C++


==========================================================

Common software

SVMlight 

SVMlight, by Joachims, is one of the most widely used SVM classification and regression packages. It has a fast optimization algorithm, can be applied to very large datasets, and has a very efficient implementation of the leave-one-out cross-validation. Distributed as C++ source and binaries for Linux, Windows, Cygwin, and Solaris. Kernels: polynomial, radial basis function, and neural (tanh).
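The kernel families named here recur in many entries of this list. As a rough sketch, they can be written as plain functions. The parameter names `degree`, `gamma`, and `coef0` and their default values are assumptions chosen for illustration, not the options of SVMlight or any other package.

```python
import math

def dot(x, z):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, z))

def linear_kernel(x, z):
    return dot(x, z)

def polynomial_kernel(x, z, degree=3, gamma=1.0, coef0=1.0):
    # (gamma * <x, z> + coef0) ^ degree
    return (gamma * dot(x, z) + coef0) ** degree

def rbf_kernel(x, z, gamma=0.5):
    # exp(-gamma * ||x - z||^2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def tanh_kernel(x, z, gamma=0.1, coef0=0.0):
    # "Neural" kernel: tanh(gamma * <x, z> + coef0)
    return math.tanh(gamma * dot(x, z) + coef0)

x, z = (1.0, 2.0), (3.0, 4.0)
print(linear_kernel(x, z))
print(polynomial_kernel(x, z, degree=2))
print(rbf_kernel(x, z))
print(tanh_kernel(x, z))
```

Each function replaces the inner product of two samples, so a linear algorithm written in terms of inner products works implicitly in the corresponding feature space.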


SVMstruct
svm_struct.html
SVMstruct, by Joachims, is an SVM implementation that can model complex (multivariate) output data y, such as trees, sequences, or sets. These complex output SVM models can be applied to natural language parsing, sequence alignment in protein homology detection, and Markov models for part-of-speech tagging. Several implementations exist: SVMmulticlass, for multi-class classification; SVMcfg, learns a weighted context free grammar from examples; SVMalign, learns to align protein sequences from training alignments; SVMhmm, learns a Markov model from examples. These modules have straightforward applications in bioinformatics, but one can imagine significant implementations for cheminformatics, when the chemical structure is represented as trees or sequences.


mySVM

mySVM, by Stefan Rüping, is a C++ implementation of SVM classification and regression. Available as C++ source code and Windows binaries. Kernels: linear, polynomial, radial basis function, neural (tanh), anova.


JmySVM

JmySVM, a Java version of mySVM is part of the YaLE (Yet Another Learning Environment) learning environment.


mySVM/db

mySVM/db is an efficient extension of mySVM which is designed to run directly inside a relational database using an internal JAVA engine. It was tested with an Oracle database, but with small modifications it should also run on any database offering a JDBC interface. It is especially useful for large datasets available as relational databases.


LIBSVM
~cjlin/libsvm/
LIBSVM (Library for Support Vector Machines) is developed by Chang and Lin and contains C-classification, ν-classification, ε-regression, and ν-regression. Developed in C++ and Java, it also supports multi-class classification, weighted SVM for unbalanced data, cross-validation and automatic model selection. It has interfaces for Python, R, Splus, MATLAB, Perl, Ruby, and LabVIEW. Kernels: linear, polynomial, radial basis function, and neural (tanh).


looms
~cjlin/looms/
looms, by Lee and Lin, is a very efficient leave-one-out model selection for SVM two-class classification. While LOO cross-validation is usually too time consuming to be performed for large datasets, looms implements numerical procedures that make LOO accessible. Given a range of parameters, looms automatically returns the parameter and model with the best LOO statistics. Available as C source code and Windows binaries.
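For context, the naive leave-one-out procedure that makes LOO expensive can be sketched as below: the model is retrained once per sample. This is not looms' numerical procedure, and the nearest-centroid learner is a deliberately simple stand-in for an SVM. All function names are hypothetical.

```python
def nearest_centroid_fit(X, y):
    """Stand-in learner (NOT an SVM): store the mean vector of each class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for k, v in enumerate(x):
            acc[k] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def nearest_centroid_predict(model, x):
    """Return the label whose class mean is closest to x."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))

def leave_one_out_accuracy(X, y, fit, predict):
    """Train on all samples but one, test on the held-out sample, repeat for every sample."""
    correct = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        model = fit(X_train, y_train)
        if predict(model, X[i]) == y[i]:
            correct += 1
    return correct / len(X)

X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
y = [-1, -1, -1, 1, 1, 1]
loo_accuracy = leave_one_out_accuracy(X, y, nearest_centroid_fit, nearest_centroid_predict)
print(loo_accuracy)
```

With n samples the loop trains n models, which is why a direct implementation becomes impractical for large datasets when each training run is an SVM optimization.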


BSVM
~cjlin/bsvm/
BSVM, authored by Hsu and Lin, provides two implementations of multi-class classification, together with SVM regression. Available as source code for UNIX/Linux and as binaries for Windows.


SVMTorch

SVMTorch, by Collobert and Bengio, is part of the Torch machine learning library and implements SVM classification and regression. Distributed as C++ source code or binaries for Linux and Solaris.


Weka

Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from Java code. Contains an SVM implementation.


SVM in R

This SVM implementation in R contains C-classification, ν-classification, ε-regression, and ν-regression. Kernels: linear, polynomial, radial basis, neural (tanh).


M-SVM
~guermeur/
Multi-class SVM implementation in C by Guermeur.


Gist

Gist is a C implementation of support vector machine classification and kernel principal components analysis. The SVM part of Gist is available as an interactive web server, which is a very convenient option for users who want to experiment with small datasets (several hundred patterns). Kernels: linear, polynomial, radial.


MATLAB SVM Toolbox

This SVM MATLAB toolbox, by Gunn, implements SVM classification and regression with various kernels: linear, polynomial, Gaussian radial basis function, exponential radial basis function, neural (tanh), Fourier series, spline, and B spline.


TinySVM
~taku/software/TinySVM/
TinySVM is a C++ implementation of C-classification and C-regression which uses sparse vector representation and can handle tens of thousands of training examples and hundreds of thousands of feature dimensions. Distributed as binary/source for Linux and binary for Windows.
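A sparse vector representation stores only the non-zero features, which is what makes hundreds of thousands of dimensions practical. The sketch below uses a Python dict of `{feature_index: value}` as an assumed layout for illustration. It does not reflect TinySVM's internal data structures.

```python
def sparse_dot(u, v):
    """Dot product of two sparse vectors stored as {feature_index: value} dicts."""
    # Iterate over the shorter vector so the cost depends on the non-zero count only.
    if len(u) > len(v):
        u, v = v, u
    return sum(value * v[index] for index, value in u.items() if index in v)

# Two vectors in a space with hundreds of thousands of dimensions, three non-zeros each.
a = {3: 2.0, 100000: 1.0, 250000: -4.0}
b = {3: 0.5, 7: 9.0, 250000: 0.5}
result = sparse_dot(a, b)
print(result)
```

Only indices present in both vectors contribute, so the work is proportional to the number of stored entries and not to the nominal dimensionality.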


SmartLab

SmartLab provides several support vector machine implementations: cSVM, Windows and Linux implementation of two-class classification; mcSVM, Windows and Linux implementation of multi-class classification; rSVM, Windows and Linux implementation of regression; javaSVM1 and javaSVM2, Java applets for SVM classification.


Gini-SVM

Gini-SVM, by Chakrabartty and Cauwenberghs, is a multi-class probability regression engine that generates conditional probability distribution as a solution. Available as source code.


GPDT

GPDT, by Serafini, Zanni, and Zanghirati, is a C++ implementation for large-scale SVM classification in both scalar and distributed memory parallel environments. Available as C++ source code and Windows binaries.


HeroSvm
~people/jdong/HeroSvm.html
HeroSvm, by Dong, is developed in C++, implements SVM classification, and is distributed as a dynamic link library for Windows. Kernels: linear, polynomial, radial basis function.


Spider

Spider is an object orientated environment for machine learning in MATLAB, for unsupervised, supervised or semi-supervised machine learning problems, and includes training, testing, model selection, cross-validation, and statistical tests. Implements SVM multi-class classification and regression.


Java applets

These SVM classification and SVM regression Java applets were developed by members of Royal Holloway, University of London and AT&T Speech and Image Processing Services Research Lab.


LEARNSC

MATLAB scripts for the book Learning and Soft Computing by Kecman, implementing SVM classification and regression.


Tree Kernels

Tree Kernels, by Moschitti, is an extension of SVMlight, obtained by encoding tree kernels. Available as binaries for Windows, Linux, Mac OS X, and Solaris. Tree kernels are suitable for encoding chemical structures, and thus this package brings significant capabilities for cheminformatics applications.


LS-SVMlab

LS-SVMlab, by Suykens, is a MATLAB implementation of least squares support vector machines (LS-SVM), which reformulate the standard SVM so that training reduces to solving linear KKT systems. LS-SVM-like primal-dual formulations have been given for kernel PCA, kernel CCA and kernel PLS, thereby extending the class of primal-dual kernel machines. Links between kernel versions of classical pattern recognition algorithms such as kernel Fisher discriminant analysis and extensions to unsupervised learning, recurrent networks and control are available.


MATLAB SVM Toolbox

This is a MATLAB SVM classification implementation which can handle 1-norm and 2-norm SVM (linear or quadratic loss functions).
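Assuming "1-norm" and "2-norm" refer to a linear versus a quadratic penalty on margin violations, the difference can be sketched with the hinge loss and its square. The function names and the placement of the constant `C` are assumptions (some texts use C/2 for the quadratic case), and nothing here is taken from the toolbox itself.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def hinge_loss(y, score):
    """Linear penalty on a margin violation (1-norm soft margin)."""
    return max(0.0, 1.0 - y * score)

def squared_hinge_loss(y, score):
    """Quadratic penalty on a margin violation (2-norm soft margin)."""
    return hinge_loss(y, score) ** 2

def soft_margin_objective(w, b, X, y, C, loss):
    """0.5 * ||w||^2 + C * sum of per-sample losses."""
    regularizer = 0.5 * dot(w, w)
    penalty = sum(loss(yi, dot(w, xi) + b) for xi, yi in zip(X, y))
    return regularizer + C * penalty

# Hand-picked linear model and toy data (hypothetical values).
w, b = (1.0, 0.0), 0.0
X = [(2.0, 0.0), (0.5, 0.0), (-0.5, 0.0)]
y = [1, 1, -1]

objective_1norm = soft_margin_objective(w, b, X, y, C=1.0, loss=hinge_loss)
objective_2norm = soft_margin_objective(w, b, X, y, C=1.0, loss=squared_hinge_loss)
print(objective_1norm, objective_2norm)
```

The quadratic penalty punishes large violations more heavily and small ones more lightly than the linear penalty, which changes which samples dominate the solution.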


SVM/LOO

SVM/LOO, by Cauwenberghs, has a very efficient MATLAB implementation of the leave-one-out cross-validation.


SVMsequel
~hdaume/SVMsequel/
SVMsequel, by Daume III, is an SVM multi-class classification package, distributed as C source or binaries for Linux or Solaris. Kernels: linear, polynomial, radial basis function, sigmoid, string, tree, information diffusion on discrete manifolds.


LSVM

LSVM (Lagrangian Support Vector Machine) is a very fast SVM implementation in MATLAB by Mangasarian and Musicant. It can classify datasets with several million patterns.


ASVM

ASVM (Active Support Vector Machine) is a very fast linear SVM script for MATLAB, by Musicant and Mangasarian, developed for large datasets.


PSVM

PSVM (Proximal Support Vector Machine) is a MATLAB script by Fung and Mangasarian which classifies patterns by assigning them to the closest of two parallel planes.
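The classification rule described here can be sketched as follows, assuming the two parallel planes are written as w.x - gamma = +1 and w.x - gamma = -1. The values of `w` and `gamma` below are hand-picked for illustration, not the result of PSVM training, and the function name is hypothetical.

```python
import math

def closest_plane_label(x, w, gamma):
    """Assign x to the class whose proximal plane is nearer."""
    score = sum(wi * xi for wi, xi in zip(w, x)) - gamma
    norm = math.sqrt(sum(wi * wi for wi in w))
    dist_to_positive_plane = abs(score - 1.0) / norm  # plane w.x - gamma = +1
    dist_to_negative_plane = abs(score + 1.0) / norm  # plane w.x - gamma = -1
    return 1 if dist_to_positive_plane <= dist_to_negative_plane else -1

# Hypothetical plane parameters.
w, gamma = (1.0, 1.0), 0.0
print(closest_plane_label((2.0, 1.0), w, gamma))
print(closest_plane_label((-1.0, -0.5), w, gamma))
```

Because the planes are parallel, choosing the closer plane gives the same label as the sign of w.x - gamma, with points exactly midway assigned to the positive class in this sketch.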


OSU SVM Classifier Matlab Toolbox
~maj/osu_svm/
This MATLAB toolbox is based on LIBSVM.


SimpleSVM Toolbox
~gloosli/simpleSVM.html
SimpleSVM Toolbox is a MATLAB implementation of the SimpleSVM algorithm.


SVM Toolbox
%7Earakotom/toolbox/index
A fairly complex MATLAB toolbox, containing many algorithms: classification using linear and quadratic penalization, multi-class classification, ε-regression, ν-regression, wavelet kernel, SVM feature selection.


MATLAB SVM Toolbox
~gcc/svm/toolbox/
Developed by Cawley, has standard SVM features, together with multi-class classification and leave-one-out cross-validation.


R-SVM
~xzhang/R-SVM/R-SVM.html
R-SVM, by Zhang and Wong, is based on SVMTorch and is specially designed for the classification of microarray gene expression data. R-SVM uses SVM for classification and for selecting a subset of relevant genes according to their relative contribution to the classification. This process is applied recursively, so that a series of gene subsets and classification models is obtained at different levels of gene selection. The performance of the classification can be evaluated either on an independent test data set or by cross-validation on the same data set. Distributed as Linux binary.


jSVM
~hwawen/research/projects/jsvm/doc/manual/index.html
jSVM is a Java wrapper for SVMlight.


SvmFu

SvmFu, by Rifkin, is a C++ package for SVM classification. Kernels: linear, polynomial, and Gaussian radial basis function.


PyML

PyML is an interactive object oriented framework for machine learning in Python. It contains a wrapper for LIBSVM, and procedures for optimizing a classifier: multi-class methods, descriptor selection, model selection, jury of classifiers, cross-validation, ROC curves.

BioJava

BioJava is an open-source project dedicated to providing a Java framework for processing biological data. It includes objects for manipulating sequences, file parsers, DAS client and server support, access to BioSQL and Ensembl databases, and powerful analysis and statistical routines including a dynamic programming toolkit. The package org.biojava.stats.svm contains SVM classification and regression.
