Simple CAPTCHA recognition with Python and its toolkits - laoliulaoliu - ChinaUnix blog


Category: Python/Ruby

2014-08-04 23:15:52

Source: http://blog.csdn.net/nwpulei/article/details/8457738

Without further ado, let's get started.

Original image

Step 1: Open the image.

    from PIL import Image  # the original post used the legacy form `import Image`

    im = Image.open('temp1.jpg')

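Step 2: Convert the color image to grayscale. There are many ways to do this. The idea here is to go from RGB to the HSI color space and keep the I (intensity) component. Strictly speaking, convert('L') computes a weighted luma, L = R * 299/1000 + G * 587/1000 + B * 114/1000 (the ITU-R 601-2 transform), rather than the plain HSI average (R + G + B) / 3, but for this purpose the two serve the same role.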

    imgry = im.convert('L')

The grayscale image looks like this:
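To make the two grayscale definitions concrete, here is a small sketch in plain Python. The helper names `luma` and `hsi_intensity` are made up for this illustration and are not part of PIL. The weights are the ITU-R 601-2 ones documented for convert('L'), and the library's own rounding may differ from this integer arithmetic by one level.

```python
def luma(r, g, b):
    """Weighted luma with the ITU-R 601-2 weights (integer arithmetic)."""
    return (r * 299 + g * 587 + b * 114) // 1000


def hsi_intensity(r, g, b):
    """The I component of HSI: a plain average of the three channels."""
    return (r + g + b) // 3


# Pure green looks bright to the eye, and the two formulas disagree about it.
green = (0, 255, 0)
print(luma(*green), hsi_intensity(*green))
```

For pure green the luma is 149 while the plain average is 85, so with a threshold of 140 the same pixel would land on different sides depending on which definition is used.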

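Step 3: Remove the noise from the image. The images here are simple, so plain thresholding is enough. Pixels at or above the threshold are set to 1 and all others to 0. To do that, we first build a lookup table and let a library function do the actual mapping.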

    threshold = 140
    table = []
    for i in range(256):
        if i < threshold:
            table.append(0)   # darker than the threshold: black
        else:
            table.append(1)   # at or above the threshold: white

Why is the threshold 140? It was found by trial and error; you can also look at the histogram.
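If you would rather derive the threshold from the histogram than guess it, one common option is Otsu's method, which picks the split that best separates the dark and bright groups. This is not what the post does (140 was chosen by hand); it is only a sketch. The function name `otsu_threshold` and the toy histogram are made up for illustration. In the real script the 256-entry list would come from imgry.histogram().

```python
def otsu_threshold(hist):
    """Pick a threshold from a 256-bin grayscale histogram (Otsu's method).

    Returns a value T such that levels < T form the dark class, which is
    the same convention as the lookup table in Step 3.
    """
    total = sum(hist)
    weighted_total = sum(level * count for level, count in enumerate(hist))

    best_split, best_variance = 0, -1.0
    dark_count = 0   # number of pixels with level <= split
    dark_sum = 0     # sum of the levels of those pixels
    for split, count in enumerate(hist):
        dark_count += count
        dark_sum += split * count
        bright_count = total - dark_count
        if dark_count == 0 or bright_count == 0:
            continue  # one class is empty, nothing to separate
        dark_mean = dark_sum / float(dark_count)
        bright_mean = (weighted_total - dark_sum) / float(bright_count)
        # Between-class variance (up to a constant factor)
        variance = dark_count * bright_count * (dark_mean - bright_mean) ** 2
        if variance > best_variance:
            best_split, best_variance = split, variance
    return best_split + 1


# Toy histogram: dark strokes at level 50, light background at level 200.
hist = [0] * 256
hist[50] = 100
hist[200] = 300
threshold = otsu_threshold(hist)
print(threshold)
```

On a histogram with two clean peaks like this one, any value between the peaks separates them equally well, so the result is best treated as a starting point to check against the binarized image.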

The mapping step is:

    out = imgry.point(table, '1')

At this point the image looks like this:
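Conceptually, the table lookup replaces each gray level with the table entry at that index. The sketch below mimics that on a tiny hand-written grid in plain Python. The helper `apply_table` and the pixel values are invented for illustration; the actual work in the post is done by the library's point() call.

```python
threshold = 140
table = [0 if level < threshold else 1 for level in range(256)]


def apply_table(rows, table):
    """Replace every gray level with its table entry, row by row."""
    return [[table[level] for level in row] for row in rows]


# A tiny 3x4 "image": a dark stroke on a light background.
gray = [
    [230, 40, 225, 231],
    [228, 20, 139, 140],
    [229, 60, 227, 226],
]
binary = apply_table(gray, table)
for row in binary:
    print(row)
```

Note the boundary case in the middle row: level 139 maps to 0 and level 140 maps to 1, matching the `i < threshold` test used to build the table.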

Step 4: Convert the characters in the image to text, using the image_to_string function from pytesser.

    from pytesser import image_to_string

    text = image_to_string(out)

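Step 5: Refine the result. From observation, the CAPTCHAs contain only digits, and the OCR step above often reads 8 as S. So we apply a few replacements to the recognized text.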

    # The CAPTCHAs contain digits only, so any letters in the result
    # are corrected with this table.
    rep = {'O': '0',
           'I': '1', 'L': '1',
           'Z': '2',
           'S': '8'}

    for r in rep:
        text = text.replace(r, rep[r])
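The cleanup can be wrapped in a small function so it is easy to try on sample strings without running OCR. This is a sketch: `clean_digits` and `looks_valid` are hypothetical helpers, and the default length of 4 is an assumption based on the sample results shown in this post.

```python
REP = {'O': '0', 'I': '1', 'L': '1', 'Z': '2', 'S': '8'}


def clean_digits(raw, rep=REP):
    """Strip whitespace, upper-case, then swap look-alike letters for digits."""
    text = raw.strip().upper()
    for letter, digit in rep.items():
        text = text.replace(letter, digit)
    return text


def looks_valid(text, length=4):
    """True when the cleaned text is exactly `length` digits (assumed format)."""
    return len(text) == length and text.isdigit()


cleaned = clean_digits(' 7o2S\n')
print(cleaned, looks_valid(cleaned))
```

A validity check like this gives the caller a cheap way to notice a bad read, for example to retry with a different threshold, instead of passing a wrong code along silently.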

That's it: text now holds the final result.

7025
0195
7039
6716

The program requires the PIL and pytesser libraries.

Finally, the whole program looks like this:

    from PIL import Image  # the original post used the legacy form `import Image`
    from pytesser import image_to_string

    # Binarization lookup table
    threshold = 140
    table = []
    for i in range(256):
        if i < threshold:
            table.append(0)
        else:
            table.append(1)

    # The CAPTCHAs contain digits only, so any letters in the result
    # are corrected with this table.
    rep = {'O': '0',
           'I': '1', 'L': '1',
           'Z': '2',
           'S': '8'}

    def getverify1(name):
        # Open the image
        im = Image.open(name)
        # Convert to grayscale
        imgry = im.convert('L')
        imgry.save('g' + name)
        # Binarize
        out = imgry.point(table, '1')
        out.save('b' + name)
        # Recognize
        text = image_to_string(out)
        # Clean up the recognized text
        text = text.strip()
        text = text.upper()
        for r in rep:
            text = text.replace(r, rep[r])
        # out.save(text + '.jpg')
        print(text)
        return text

    getverify1('v1.jpg')
    getverify1('v2.jpg')
    getverify1('v3.jpg')
    getverify1('v4.jpg')
