Differences Between GDA and Logistic Regression, with Python Code - niao5929 - ChinaUnix Blog

Category: Big Data

2014-04-17 13:23:03

Original post: Differences Between GDA and Logistic Regression, with Python Code. Author: bl竹子

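The main difference between GDA (Gaussian discriminant analysis) and logistic regression lies in their modeling assumptions. GDA assumes that p(x|y) follows a multivariate Gaussian distribution and that the input features are continuous. Logistic regression makes no such strong assumptions: it requires neither that p(x|y) be multivariate Gaussian nor that the input features be continuous. Logistic regression therefore applies to a wider range of problems than GDA. For example, if the input features follow a Poisson distribution, logistic regression gives more accurate results than GDA. If the input features do satisfy the GDA assumptions, either method can be used, but in that case GDA gives somewhat more accurate results than logistic regression. Brief descriptions of GDA and logistic regression follow, and the corresponding Python code is given at the end.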
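GDA is a generative learning method. It uses Bayes' rule to obtain the posterior distribution p(y|x) from p(x|y) and p(y), and then classifies an input by choosing the class with the largest posterior probability. Put simply, given a feature vector, the data point is assigned to whichever class is more probable for that feature vector. The advantage of GDA: because it builds in the Gaussian assumption, if that assumption really matches the data, a good model can be obtained from only a small number of samples.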
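To make the maximum-posterior rule concrete, here is a minimal sketch that assumes the class prior, the two class means, and the shared covariance are already known. The helper names `gaussian_pdf` and `posterior_y1` and all numeric values are illustrative assumptions, not taken from the original post.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Density of a multivariate Gaussian N(mean, cov) at point x."""
    n = len(mean)                                  # number of features
    diff = x - mean
    norm = (2 * np.pi) ** (-n / 2.0) * np.linalg.det(cov) ** (-0.5)
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff)

def posterior_y1(x, q, mean_0, mean_1, cov):
    """p(y=1 | x) from Bayes' rule, with prior p(y=1) = q."""
    joint_0 = gaussian_pdf(x, mean_0, cov) * (1 - q)   # p(x|y=0) p(y=0)
    joint_1 = gaussian_pdf(x, mean_1, cov) * q         # p(x|y=1) p(y=1)
    return joint_1 / (joint_0 + joint_1)

# Hypothetical parameters, chosen only for illustration.
q = 0.5
mean_0 = np.array([0.0, 0.0])
mean_1 = np.array([2.0, 2.0])
cov = np.eye(2)

p_near_1 = posterior_y1(np.array([1.8, 2.1]), q, mean_0, mean_1, cov)
p_middle = posterior_y1(np.array([1.0, 1.0]), q, mean_0, mean_1, cov)

# Assign the class with the larger posterior.
label = 1 if p_near_1 > 0.5 else 0
```

With equal priors and a shared covariance, a point halfway between the two means is equally likely to belong to either class, so `p_middle` sits on the decision boundary.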
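Logistic regression is a discriminative learning method. A discriminative method learns p(y|x) directly by building a mapping from the input data to the output, which differs from a generative method, where p(x|y) and p(y) must be determined first. Logistic regression uses the sigmoid function to decide how an input is classified. The advantage of logistic regression: it is more robust and is insensitive to the distribution of the data (unlike GDA, it does not need the features to be Gaussian).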
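One way to see why the GDA assumptions are the stronger ones: under Gaussian class-conditionals with a shared covariance, the Bayes posterior p(y=1|x) is itself a sigmoid of a linear function of x. The sketch below checks this numerically. The parameter values and the names `theta` and `theta_0` are assumptions made for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gaussian_pdf(x, mean, cov):
    """Density of a multivariate Gaussian N(mean, cov) at point x."""
    n = len(mean)
    diff = x - mean
    norm = (2 * np.pi) ** (-n / 2.0) * np.linalg.det(cov) ** (-0.5)
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff)

# Hypothetical GDA parameters.
q = 0.3                                   # p(y=1)
mean_0 = np.array([0.0, 1.0])
mean_1 = np.array([2.0, -1.0])
cov = np.array([[1.0, 0.3],
                [0.3, 2.0]])              # shared by both classes
cov_inv = np.linalg.inv(cov)

# Linear coefficients implied by the GDA parameters (from the log-odds).
theta = cov_inv @ (mean_1 - mean_0)
theta_0 = (-0.5 * mean_1 @ cov_inv @ mean_1
           + 0.5 * mean_0 @ cov_inv @ mean_0
           + np.log(q / (1 - q)))

x = np.array([0.7, 0.2])

# Posterior from Bayes' rule ...
joint_1 = gaussian_pdf(x, mean_1, cov) * q
joint_0 = gaussian_pdf(x, mean_0, cov) * (1 - q)
bayes = joint_1 / (joint_0 + joint_1)

# ... and the same quantity written in logistic form.
logistic = sigmoid(theta @ x + theta_0)
```

The two values agree up to floating-point error. The converse does not hold: a sigmoid-shaped p(y|x) does not imply Gaussian p(x|y), which is why logistic regression covers more cases.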
The Python code follows:
1. Python code for the GDA model:

    import numpy as np

    def GDA(dataIn, classLabel):
        """Estimate the GDA parameters from labeled data (labels are 0 or 1)."""
        X = np.asarray(dataIn, dtype=float)
        y = np.asarray(classLabel, dtype=float)
        m = len(y)
        sum_1 = y.sum()                    # number of samples with label 1
        q = sum_1 / m                      # estimate of p(y=1)
        notLabel = 1 - y                   # indicator of label 0
        # Per-class feature sums, kept in two separate arrays.
        mean_0 = notLabel.dot(X) / (m - sum_1)
        mean_1 = y.dot(X) / sum_1
        # Shared covariance: average outer product of each sample's
        # deviation from the mean of its own class.
        centered = X - np.where(y[:, None] == 1, mean_1, mean_0)
        correlation = centered.T.dot(centered) / m
        return q, mean_0, mean_1, correlation

    def gaussian_density(x, mean, correlation):
        """Multivariate Gaussian density with the given mean and covariance."""
        n = len(mean)                      # number of features, not samples
        diff = np.asarray(x, dtype=float) - mean
        norm = (2 * np.pi) ** (-n / 2.0) * np.linalg.det(correlation) ** (-0.5)
        return norm * np.exp(-0.5 * diff.dot(np.linalg.inv(correlation)).dot(diff))

    def calculate_pxy0(x, mean_0, correlation):
        # p(x | y=0)
        return gaussian_density(x, mean_0, correlation)

    def calculate_pxy1(x, mean_1, correlation):
        # p(x | y=1)
        return gaussian_density(x, mean_1, correlation)

    def GDAClass(testPoint, dataIn, classLabel):
        """Classify testPoint by comparing p(x|y)p(y) for the two classes."""
        q, mean_0, mean_1, correlation = GDA(dataIn, classLabel)
        py0 = 1 - q
        py1 = q
        pxy0 = calculate_pxy0(testPoint, mean_0, correlation)
        pxy1 = calculate_pxy1(testPoint, mean_1, correlation)
        if pxy0 * py0 > pxy1 * py1:
            return 0
        return 1
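As a usage illustration, the following self-contained sketch fits the GDA parameters on a tiny hand-made dataset and classifies two test points. It is a compact reimplementation so that it runs on its own. The names `fit_gda` and `classify` and the data are assumptions for illustration, and the comparison is done on log scores, which is valid here because both classes share the same covariance and therefore the same normalizing constant.

```python
import numpy as np

def fit_gda(X, y):
    """Return (q, mean_0, mean_1, cov) estimated by maximum likelihood."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    q = y.mean()                              # fraction of label-1 samples
    mean_0 = X[y == 0].mean(axis=0)
    mean_1 = X[y == 1].mean(axis=0)
    centered = X - np.where(y[:, None] == 1, mean_1, mean_0)
    cov = centered.T @ centered / len(y)      # shared covariance
    return q, mean_0, mean_1, cov

def classify(x, q, mean_0, mean_1, cov):
    """Pick the class with the larger log p(x|y) + log p(y)."""
    cov_inv = np.linalg.inv(cov)
    d0 = x - mean_0
    d1 = x - mean_1
    score_0 = -0.5 * d0 @ cov_inv @ d0 + np.log(1 - q)
    score_1 = -0.5 * d1 @ cov_inv @ d1 + np.log(q)
    return 1 if score_1 > score_0 else 0

# Hypothetical training data: two well-separated clusters.
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1],
              [3, 3], [4, 3], [3, 4], [4, 4]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

q, mean_0, mean_1, cov = fit_gda(X, y)
label_a = classify(np.array([0.8, 0.4]), q, mean_0, mean_1, cov)
label_b = classify(np.array([3.2, 3.9]), q, mean_0, mean_1, cov)
```

Working with log scores also avoids underflow when the densities are very small, which the direct product p(x|y)p(y) can suffer from in higher dimensions.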

2. Python code for the logistic regression model:

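    import numpy as np

    def sigmoid(w, x):
        # x has shape (m, col) and w has shape (col, 1), so the product is x.dot(w).
        return 1.0 / (1.0 + np.exp(-x.dot(w)))

    def logisticRegression(xMat, yMat, maxCycles=500):
        '''
        Fit the weights by batch gradient ascent on the log-likelihood.
        np.ones((m, n)) creates an m-by-n array filled with ones.
        '''
        xMat = np.asarray(xMat, dtype=float)
        yMat = np.asarray(yMat, dtype=float).reshape(-1, 1)   # labels as a column
        col = xMat.shape[1]
        weight = np.ones((col, 1))
        alpha = 0.001                      # learning rate
        for j in range(maxCycles):
            h = sigmoid(weight, xMat)      # predicted p(y=1|x) for every sample
            err = yMat - h
            weight += alpha * xMat.T.dot(err)
        return weight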
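The gradient-ascent update can be tried on a small dataset with the self-contained sketch below. It is a compact reimplementation rather than the author's exact code. The function name `train_logistic`, the data, the added intercept column, and the larger learning rate of 0.1 are all assumptions chosen so that the tiny example converges within 500 cycles.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, alpha=0.1, max_cycles=500):
    """Batch gradient ascent on the logistic log-likelihood."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    weight = np.ones((X.shape[1], 1))        # same all-ones start as the post
    for _ in range(max_cycles):
        h = sigmoid(X @ weight)              # predicted p(y=1|x)
        weight += alpha * X.T @ (y - h)      # gradient of the log-likelihood
    return weight

# Hypothetical one-feature data, separable at x = 0.
x_raw = np.array([-2.0, -1.5, -1.0, 1.0, 1.5, 2.0])
y = np.array([0, 0, 0, 1, 1, 1])

# Prepend a column of ones so the model can learn an intercept.
X = np.column_stack([np.ones_like(x_raw), x_raw])

weight = train_logistic(X, y)
predicted = (sigmoid(X @ weight) > 0.5).astype(int).ravel()
```

Because this toy data is perfectly separable, the weight on the feature keeps growing with more cycles. On real data a fixed cycle count or a regularization term keeps the weights bounded.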

Source of this article: http://blog.chinaunix.net/uid-28311809-id-4211362.html
