Removing control characters from a string & the difference between match and search in the re module - zhoubaozhou - ChinaUnix Blog

Category: Python/Ruby

2011-12-29 17:43:25

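1) Removing control characters from a string

While extracting links from web pages, I found that some sites embed control characters in their link text, such as carriage returns and tabs.

Removing them is simple: just use the re module.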

import re

# Strip ASCII control characters in the range \x01-\x1f (tab, CR, LF, ...).
text = re.sub(r'[\x01-\x1f]', '', text)

Note: this only covers ASCII control characters; text in other encodings has to be handled case by case.
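As a rough sketch of what handling non-ASCII text might look like, one option is to work on decoded Unicode text and drop every character whose Unicode category is Cc (control). The helper names strip_ascii_control and strip_control_chars, and the keep parameter, are hypothetical and not from the original post. The ASCII pattern here also covers \x00 and \x7f, which is an assumption about the intent rather than what the one-liner above does.

```python
import re
import unicodedata

# ASCII-only variant; unlike the post's [\x01-\x1f], this also
# removes NUL (\x00) and DEL (\x7f). That extension is an assumption.
ASCII_CONTROL = re.compile(r'[\x00-\x1f\x7f]')

def strip_ascii_control(text):
    """Remove ASCII control characters, including tab, CR and LF."""
    return ASCII_CONTROL.sub('', text)

def strip_control_chars(text, keep=''):
    """Remove every character in Unicode category Cc, except those in keep."""
    return ''.join(
        ch for ch in text
        if ch in keep or unicodedata.category(ch) != 'Cc'
    )

# \x85 (NEL) is a control character outside the ASCII range.
link_text = 'Home\r\n\tpage\x85'
print(repr(strip_ascii_control(link_text)))
print(repr(strip_control_chars(link_text)))
print(repr(strip_control_chars(link_text, keep='\n')))
```

The regex version is enough when the input is known to be ASCII; the category-based version is slower but does not depend on listing code point ranges by hand.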

2) re.match vs. re.search

A colleague said that a certain regular expression failed to extract anything:

re.match(r'(?P<num>\d+)', 'xxxx123xxxx')  # group name "num" is a placeholder

Can you spot the problem? If not, a look at the documentation makes it clear.

match(pattern, string, flags=0)
    Try to apply the pattern at the start of the string, returning
    a match object, or None if no match was found.

search(pattern, string, flags=0)
    Scan through string looking for a match to the pattern, returning
    a match object, or None if no match was found.
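The two docstrings can be checked against the sample string directly. This is a minimal sketch; the group name num is a placeholder chosen for the example.

```python
import re

pattern = re.compile(r'(?P<num>\d+)')  # "num" is a placeholder group name
text = 'xxxx123xxxx'

# match() only tries the pattern at position 0, where the string has 'x'.
m = pattern.match(text)
print(m)  # None

# search() scans forward until the pattern matches somewhere.
s = pattern.search(text)
print(s.group('num'))  # 123

# match() does succeed when the digits sit at the very start.
print(pattern.match('123xxxx').group('num'))  # 123
```

If the pattern must stay anchored at the start, match is the right call; for "find it anywhere in the string", search is.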

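re.match only tries the pattern at the start of the string, and 'xxxx123xxxx' starts with 'x', so it returns None; re.search was the call that was needed.

I've come to realize that reading piles of articles is no substitute for sitting down with primary documentation such as RFCs and API references.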
