Category: Python/Ruby

2012-12-23 00:09:49

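This code uses keyword matching on the User-Agent (UA) string to identify the mainstream browsers used in China, together with the corresponding rendering engine, operating system, and hardware model (the hardware part mainly targets Android phones).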
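
To make the keyword-matching idea concrete, here is a minimal sketch of the tokenizing step on its own. The shortened `KEYWORDS` list and the helper name `find_tokens` are assumptions made for illustration and are not part of the original code. Each keyword is captured together with the run of digits, dots, slashes, and spaces that follows it, which is treated as the version.

```python
import re

# Hypothetical, shortened keyword list; the full parser uses a much larger one.
KEYWORDS = ['Maxthon', 'Chrome', 'Safari', 'AppleWebKit', 'Windows NT', 'UCWEB', 'UC']


def find_tokens(useragent):
    """Return (keyword, version) pairs in the order they appear in the UA."""
    # Longest keywords first, so that 'UCWEB' is not matched as 'UC'.
    ordered = sorted(KEYWORDS, key=len, reverse=True)
    pattern = re.compile('(%s)([0-9/. ]*)' % '|'.join(re.escape(k) for k in ordered))
    return [(m.group(1), m.group(2).strip(' /')) for m in pattern.finditer(useragent)]


ua = ('Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.12 (KHTML, like Gecko) '
      'Maxthon/3.4.1.1000 Chrome/18.0.966.0 Safari/535.12')
for keyword, version in find_tokens(ua):
    print(keyword, version)
```

Escaping the keywords with `re.escape` keeps characters such as the dot in a keyword like `SE 2.X` from being read as regex syntax.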

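Note: many browsers try to stay compatible with other browsers, so they write many other browsers' tokens into their UA. The code therefore checks for this case and picks the outermost browser. For example, QQBrowser uses the IE engine (so its UA also carries an MSIE token), Maxthon is compatible with Chrome, and Chrome is compatible with Safari. The same applies to rendering engines.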
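
The "pick the outermost browser" rule can be sketched as a priority lookup. The table `BROWSER_PRIORITY` and the function `outermost_browser` below are hypothetical names with made-up priority values, chosen only to mirror the examples in the note: a smaller number means a more specific, outer browser, and the smallest number found wins.

```python
# Hypothetical priority table: a smaller number means "more outer" and wins.
BROWSER_PRIORITY = {'QQBrowser': 1, 'Maxthon': 2, 'Chrome': 3, 'Safari': 4, 'MSIE': 5}


def outermost_browser(tokens):
    """Pick the token with the best (smallest) priority; None if nothing is known."""
    known = [t for t in tokens if t in BROWSER_PRIORITY]
    return min(known, key=BROWSER_PRIORITY.get) if known else None


print(outermost_browser(['Maxthon', 'Chrome', 'Safari']))  # Maxthon
print(outermost_browser(['Chrome', 'Safari']))             # Chrome
print(outermost_browser(['MSIE', 'QQBrowser']))            # QQBrowser
```

The position of a token inside the UA does not matter here, only its priority, which is why a shell browser must always be ranked ahead of the browsers whose tokens it embeds.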


import re


class UAParse:
    """Keyword-based User-Agent parser: browser, engine, OS and device model."""

    def __init__(self):
        # Keyword -> priority. Within each range the smallest number wins, so a
        # shell that embeds another browser's tokens must rank ahead of it.
        self.keywordsmap = {
            # browsers (1-99): desktop shells
            'TencentTraveler': 1, 'QQBrowser': 2, 'Maxthon': 3, 'BIDUBrowser': 4,
            '360SE': 5, 'TheWorld': 6, 'qihu theworld': 7, 'SE 2.X': 8,
            # mobile shells, whose UAs also carry "Mobile Safari" tokens
            'BdMobile': 9, 'MQQBrowser': 10, 'UCWEB': 11, 'NokiaBrowser': 12, 'UC': 13,
            # base browsers; Chrome ranks ahead of Safari because Chrome UAs contain "Safari"
            'Firefox': 14, 'Chrome': 15, 'Safari': 16, 'MSIE': 17, 'Opera': 18,
            'Iceweasel': 19, 'Mobile': 20, 'K-MeleonCCFME': 21,
            # rendering engines (101-199)
            'Trident': 101, 'AppleWebKit': 102, 'Presto': 103, 'Gecko': 104, 'KHTML': 105,
            # operating systems (201-299)
            'SymbianOS': 201, 'Mac OS X': 202, 'Android': 203, 'Windows NT': 204, 'Linux': 205,
        }
        # Escape every keyword and try longer ones first, so 'UCWEB' is not cut short by 'UC'.
        keys = sorted(self.keywordsmap, key=len, reverse=True)
        self.pattern = '(%s)([0-9/. ]*)' % '|'.join(re.escape(k) for k in keys)
        self.cpat = re.compile(self.pattern)

    def uaparse(self, useragent):
        """Return a dict with browser, engine and OS (name and version) plus device model."""
        midx = [100, 200, 300]  # best priority seen so far: browser, engine, OS
        info = {'brw': '', 'brwv': '', 'knl': '', 'knlv': '', 'os': '', 'osv': '', 'hard': ''}
        for m in self.cpat.finditer(useragent):
            cidx = self.keywordsmap[m.group(1)]
            if cidx < 100:
                if midx[0] > cidx:
                    midx[0] = cidx
                    info['brw'] = m.group(1)
                    info['brwv'] = m.group(2)
            elif cidx < 200:
                if midx[1] > cidx:
                    midx[1] = cidx
                    info['knl'] = m.group(1)
                    info['knlv'] = m.group(2)
            elif cidx < 300:
                if midx[2] > cidx:
                    midx[2] = cidx
                    info['os'] = m.group(1)
                    info['osv'] = m.group(2)
                    if info['os'] == 'Android':
                        # The device model sits in the last ';'-separated field.
                        tail = useragent[useragent.rfind(';'):]
                        pos = useragent[m.end(2):].find(' Build')
                        if pos > 0:
                            # e.g. "; GT-N7000 Build/GINGERBREAD)"
                            hard = re.search(r'([0-9a-zA-Z_-]+)( Build/)', tail)
                        else:
                            # e.g. "; model)" or "; model/version)"; the optional
                            # group keeps the last character of a slash-less model.
                            hard = re.search(r'; ([0-9a-zA-Z_ -]+)(/[0-9a-zA-Z_ /-]*)?\)', tail)
                        if hard:
                            info['hard'] = hard.group(1)
                    elif info['os'] == 'Mac OS X':
                        # e.g. "(iPad; U; CPU OS 4_3_5 like Mac OS X": device and
                        # version both appear before the keyword.
                        hard = re.search(r'\( *([a-zA-Z]+) *;', useragent[:m.start(1)])
                        if hard:
                            info['hard'] = hard.group(1)
                        version = re.search(r'OS ([0-9_]+)', useragent[:m.start(1)])
                        if version:
                            info['osv'] = version.group(1)
                    elif info['os'] == 'Linux':
                        # e.g. "Linux mips64": the word after "Linux" is stored as osv.
                        version = re.search(r'(Linux) ([0-9a-zA-Z_-]+)', useragent[m.start(1):])
                        if version:
                            info['osv'] = version.group(2)

        # Drop the separators captured together with the version numbers.
        info['brwv'] = info['brwv'].strip(' /')
        info['knlv'] = info['knlv'].strip(' /')
        info['osv'] = info['osv'].strip(' /')
        return info


def test():
    obj = UAParse()
    samples = [
        'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.12 (KHTML, like Gecko) Maxthon/3.4.1.1000 Chrome/18.0.966.0 Safari/535.12',
        'MQQBrowser/3.7/Mozilla/5.0 (Linux; U; Android 2.3.5; zh-cn; GT-N7000 Build/GINGERBREAD) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1',
        'MQQBrowser/2.7 Mozilla/5.0 (iPad; U; CPU OS 4_3_5 like Mac OS X; zh-cn) AppleWebKit/533.17.9 (KHTML, like Gecko) Mobile/8L1 Safari/7534.48.3',
        'Mozilla/5.0 (X11; U; Linux mips64; zh-CN; rv:1.9.0.11) Gecko/2009061212 Iceweasel/3.0.6 (Debian-3.0.6-1)',
    ]
    for ua in samples:
        print(ua)
        info = obj.uaparse(ua)
        print('%s=>%s\t%s=>%s\t%s=>%s\t%s' % (info['brw'], info['brwv'], info['knl'],
                                              info['knlv'], info['os'], info['osv'], info['hard']))


if __name__ == '__main__':
    test()
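
The Android hardware step can also be looked at in isolation. The sketch below is an alternative formulation, not the method used in the class above: it assumes the common `...; <model> Build/<build id>)` layout and takes whatever sits between the preceding semicolon and ` Build/` as the model. The function name `android_model` is hypothetical.

```python
import re


def android_model(useragent):
    """Return the device model that precedes ' Build/' in an Android UA, or ''."""
    # Assumes the common "...; <model> Build/<build id>)" layout.
    m = re.search(r';\s*([^;()]+?)\s+Build/', useragent)
    return m.group(1) if m else ''


ua = ('MQQBrowser/3.7/Mozilla/5.0 (Linux; U; Android 2.3.5; zh-cn; GT-N7000 '
      'Build/GINGERBREAD) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 '
      'Mobile Safari/533.1')
print(android_model(ua))  # GT-N7000
```

Because the character class excludes `;`, the match cannot start in an earlier field such as the locale, and model names containing spaces are kept whole. UAs without a `Build/` token yield an empty string and would need a separate fallback.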
