
Category: Python/Ruby

2010-05-24 17:28:36

'''
Function: get_links(url)
  parameter: url
  urlparse.urlparse: parses a URL into six components, returning a 6-tuple (scheme, netloc, path, params, query, fragment); see the urlparse module documentation for more info.
  HTTPConnection.request(method, url[, body[, headers]])
'''
import urllib, urllister
import urlparse
import httplib
import time
import urllib2

def get_links(url):
    usock=urllib.urlopen(url)
    parser=urllister.URLLister() #Create an instance
    parser.feed(usock.read()) #Feed the HTML into the parser, which collects the href of every <a> tag
    usock.close()
    parser.close()
    for link in parser.urls:
        print link
        #Resolve relative links (no scheme or host) against the page URL
        up=urlparse.urlparse(urlparse.urljoin(url,link))

        if up.scheme!="http":  #httplib.HTTPConnection only speaks plain HTTP (skips mailto:, https:, ...)
            continue

        path=up.path or "/"  #An empty path means the site root
        if up.params:
            path=path+";"+up.params
        if up.query:  #The fragment is never sent to the server, so it is left out
            path=path+"?"+up.query

        conn=httplib.HTTPConnection(up.netloc)  #Connect to the host named in the resolved link
        conn.request("GET",path)
        res=conn.getresponse()
        status=res.status
        reason=res.reason
        #data=res.read()
        conn.close()

        print link,status,reason


if __name__ == '__main__':
    url=raw_input("Please enter the url you want to check:\n")
    get_links(url)

This program checks the status of the links on a given web page. For each link it prints the HTTP status code and reason returned by the server.
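
The script above is Python 2 (urlparse, httplib, print statements). As a rough sketch of the same idea in Python 3, the block below shows how a link can be resolved against the page URL, split into its six components, and turned into a host plus request target. The helper names request_target and check_link, the timeout value, and the choice to skip anything that is not plain http are assumptions made for illustration; they are not part of the original script.

```python
from http.client import HTTPConnection
from urllib.parse import urljoin, urlparse


def request_target(page_url, link):
    """Return (host, target) for a GET of `link`, or None if it is not plain HTTP.

    Hypothetical helper, not part of the original script.
    """
    # urljoin resolves relative links; urlparse yields the 6-tuple
    # (scheme, netloc, path, params, query, fragment).
    parts = urlparse(urljoin(page_url, link))
    if parts.scheme != "http" or not parts.netloc:
        return None  # e.g. mailto: or https: links are skipped in this sketch
    target = parts.path or "/"
    if parts.params:
        target += ";" + parts.params
    if parts.query:
        target += "?" + parts.query
    # parts.fragment is dropped on purpose: it is never sent to the server.
    return parts.netloc, target


def check_link(page_url, link, timeout=10):
    """Return (status, reason) for `link`, or None if the link was skipped."""
    resolved = request_target(page_url, link)
    if resolved is None:
        return None
    host, target = resolved
    conn = HTTPConnection(host, timeout=timeout)
    try:
        conn.request("GET", target)
        res = conn.getresponse()
        return res.status, res.reason
    finally:
        conn.close()
```

Only check_link touches the network; request_target is pure string handling, so it can be tried on a few sample links before running the checker against a real site.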
 
 
urllister.py
 

from sgmllib import SGMLParser

class URLLister(SGMLParser):
    def reset(self):
        SGMLParser.reset(self)
        self.urls=[]

    def start_a(self,attrs):
        href=[v for k,v in attrs if k=='href']
        if href:
            self.urls.extend(href)
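
sgmllib exists only in Python 2; it was removed in Python 3. A minimal sketch of an equivalent link collector for Python 3, assuming html.parser.HTMLParser as the replacement base class (a substitution, not the original author's code), could look like this. The sample HTML string is made up for the example.

```python
from html.parser import HTMLParser


class URLLister(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser calls this for every start tag; attrs is a list of
        # (name, value) pairs with names already lowercased.
        if tag == "a":
            self.urls.extend(v for k, v in attrs if k == "href" and v)


# Made-up sample input, for illustration only
parser = URLLister()
parser.feed('<p><a href="/about">About</a> <a name="x">no link</a> '
            '<a href="http://example.com/">Home</a></p>')
parser.close()
print(parser.urls)  # ['/about', 'http://example.com/']
```

Unlike SGMLParser, HTMLParser has no per-tag start_a hook, so the tag name is checked inside handle_starttag instead.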

