Simple CAPTCHA Recognition with Python and Its Toolkits

July 19, 2014

This post walks through simple CAPTCHA recognition using Python and a couple of libraries. Let's get straight to it.

Original image

Step 1: Open the image.

im = Image.open('temp1.jpg')

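Step 2: Convert the color image to grayscale. There are many ways to do this. The idea here is to go from RGB to the HSI color space and keep the I (intensity) component. Strictly speaking, PIL's convert('L') computes a weighted luma, L = 0.299R + 0.587G + 0.114B, rather than the HSI intensity (R + G + B) / 3, but for this purpose either one gives a usable grayscale image.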

imgry = im.convert('L')
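Grayscale conversions differ in how they weight the three channels, and a tiny pure-Python sketch makes that visible. It assumes the luma weights documented for PIL's convert('L') (299/1000, 587/1000, 114/1000). The helper names luma and hsi_intensity are made up for illustration, and the library's internal rounding may differ by one gray level.

```python
def luma(r, g, b):
    """Weighted luma, the quantity convert('L') is documented to compute."""
    return (299 * r + 587 * g + 114 * b) // 1000


def hsi_intensity(r, g, b):
    """I component of HSI: the plain average of the three channels."""
    return (r + g + b) // 3


pure_green = (0, 255, 0)
print(luma(*pure_green))           # 149: green carries the largest weight
print(hsi_intensity(*pure_green))  # 85: all channels count equally
```

For CAPTCHAs with dark digits on a light background, either definition usually separates text from background well enough for the thresholding step.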

The grayscale image looks like this:

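Step 3: Remove the noise from the image. The images here are simple, so plain thresholding is enough. Pixels at or above the threshold are set to 1, and all others are set to 0. To do this, we first build a lookup table and let the library function do the mapping for us.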

threshold = 140
table = []
for i in range(256):
    if i < threshold:
        table.append(0)
    else:
        table.append(1)

Why is the threshold 140? It was found by trial and error; you can also pick one by looking at the histogram.
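If you would rather derive a threshold from the histogram than guess, one common option is Otsu's method. This is a swapped-in technique, not what was used above (140 was picked by hand). The sketch below is pure Python and assumes a 256-bin histogram such as the list returned by imgry.histogram() for a mode 'L' image. The function name otsu_threshold and the synthetic histogram are illustrative only.

```python
def otsu_threshold(hist):
    """Return the threshold t that maximizes the between-class variance.

    hist is a list of 256 pixel counts. Pixels with value < t form the
    dark class, matching the `i < threshold` test used for the table.
    """
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_dark = 0     # number of pixels below the candidate threshold
    sum_dark = 0   # sum of gray values below the candidate threshold
    for t in range(1, 256):
        w_dark += hist[t - 1]
        sum_dark += (t - 1) * hist[t - 1]
        w_light = total - w_dark
        if w_dark == 0 or w_light == 0:
            continue  # one class is empty, so there is nothing to separate
        mean_dark = float(sum_dark) / w_dark
        mean_light = float(sum_all - sum_dark) / w_light
        var_between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


# Synthetic bimodal histogram: dark digits near 50, light background near 200.
hist = [0] * 256
hist[50] = 300
hist[200] = 700
print(otsu_threshold(hist))  # 51
```

On real CAPTCHAs the histogram is rarely this clean, so treat the computed value as a starting point and still check the binarized image by eye.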

The mapping itself is:

out = imgry.point(table, '1')
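To see what the lookup table does to individual pixels, here is a pure-Python sketch of the same mapping on a plain list of gray values. The helpers build_table and apply_table and the sample row are hypothetical; the real work in the article is done by the library's point call.

```python
def build_table(threshold):
    """256-entry lookup table: 0 below the threshold, 1 at or above it."""
    return [0 if i < threshold else 1 for i in range(256)]


def apply_table(pixels, table):
    """Map every gray value through the table, as a point operation does."""
    return [table[p] for p in pixels]


table = build_table(140)
row = [12, 139, 140, 255]       # one made-up row of gray values
print(apply_table(row, table))  # [0, 0, 1, 1]
```

Because the table is built once and only indexed afterwards, the per-pixel cost is a single list lookup regardless of how the threshold rule is written.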

The image now looks like this:

Step 4: Convert the characters in the image to text, using the image_to_string function from pytesser.

text = image_to_string(out)
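pytesser is an older wrapper around Tesseract. If it is not available in your environment, a roughly equivalent call can be sketched with pytesseract, which is an alternative library and not what this article uses. The sketch assumes pytesseract and the tesseract binary are installed. The helper names are made up, and whether the character whitelist is honored depends on the Tesseract version and engine.

```python
def build_tesseract_config(whitelist="0123456789", psm=7):
    """Build a Tesseract option string: single text line, restricted characters."""
    return "--psm {} -c tessedit_char_whitelist={}".format(psm, whitelist)


def recognize_digits(image):
    """OCR a PIL image with pytesseract.

    The import is done lazily so that build_tesseract_config can be used
    even when pytesseract is not installed (an assumption of this sketch).
    """
    import pytesseract
    text = pytesseract.image_to_string(image, config=build_tesseract_config())
    return text.strip()


print(build_tesseract_config())
```

With the binarized image from Step 3, the call would be recognize_digits(out). Restricting the character set up front can reduce, but not necessarily eliminate, the need for the corrections in the next step.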

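Step 5: Refine the result. From observation, the CAPTCHAs contain only digits, and the OCR step above often misreads 8 as S. So we apply a few substitutions to the recognized text.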

# The CAPTCHAs contain only digits,
# so letters in the OCR result are corrected with this table.
rep = {'O': '0',
       'I': '1', 'L': '1',
       'Z': '2',
       'S': '8'}

for r in rep:
    text = text.replace(r, rep[r])
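The substitution step can also report whether the corrected text is plausible, which helps when deciding whether to retry a CAPTCHA. The sketch below reuses the same correction table. The function name fix_digits and the expected length of 4 are assumptions for illustration, based only on the four-digit sample results shown in this post.

```python
# Same corrections as the rep table in the article.
REP = {'O': '0', 'I': '1', 'L': '1', 'Z': '2', 'S': '8'}


def fix_digits(text, expected_len=4):
    """Apply the substitutions, then report whether the result looks valid."""
    cleaned = text.strip().upper()
    for wrong, right in REP.items():
        cleaned = cleaned.replace(wrong, right)
    ok = cleaned.isdigit() and len(cleaned) == expected_len
    return cleaned, ok


print(fix_digits(' 7o2s\n'))  # ('7028', True)
print(fix_digits('70?5'))     # ('70?5', False)
```

A False flag means the text still contains something other than digits or has the wrong length, so the caller can fall back to another threshold or request a new image.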

That's it. text now holds the final result.

7025
0195
7039
6716

The program requires the PIL and pytesser libraries.

Finally, the whole program looks like this:

from PIL import Image
from pytesser import *

# Binarization lookup table
threshold = 140
table = []
for i in range(256):
    if i < threshold:
        table.append(0)
    else:
        table.append(1)

# The CAPTCHAs contain only digits,
# so letters in the OCR result are corrected with this table.
rep = {'O': '0',
       'I': '1', 'L': '1',
       'Z': '2',
       'S': '8'}

def getverify1(name):
    # Open the image
    im = Image.open(name)
    # Convert to luminance (grayscale) and save a copy for inspection
    imgry = im.convert('L')
    imgry.save('g' + name)
    # Binarize and save a copy for inspection
    out = imgry.point(table, '1')
    out.save('b' + name)
    # Recognize
    text = image_to_string(out)
    # Normalize the recognized text before correcting it
    text = text.strip()
    text = text.upper()

    for r in rep:
        text = text.replace(r, rep[r])

    # out.save(text + '.jpg')
    print(text)
    return text

getverify1('v1.jpg')
getverify1('v2.jpg')
getverify1('v3.jpg')
getverify1('v4.jpg')

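The program and test data can be downloaded from the No. 1 FTP server of Linux公社 (LinuxIDC):

FTP address: ftp://www.linuxidc.com

Username: www.linuxidc.com

Password: www.muu.cc

The files are under 2013年LinuxIDC.com, in the 1月 folder, under 使用Python以及工具包进行简单的验证码识别.

For download instructions, see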

About The Author

bjmayor

Programmer and coder (PHP, Python, iOS, Android, Go), product manager, entrepreneur.
