Category: Python/Ruby

2015-12-06 18:32:44

1. Install the selenium package:

    sudo pip install -U selenium

If pip is not available, install pip first (run this from the unpacked pip source):

    sudo python setup.py install

2. Import the selenium package and create a webdriver object:

    from selenium import webdriver

    sel = webdriver.Chrome()

At this step you may get a chrome path error. Driving the Chrome browser requires the ChromeDriver helper; download it from:

Download the version matching your Chrome, and extract it into:

    /usr/bin
3. Open the target URL and wait for the response:

    import time

    loginurl = ''
    # open the login page
    sel.get(loginurl)
    time.sleep(10)
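The fixed `time.sleep(10)` above is fragile: it always waits the full ten seconds even when the page is ready sooner. What the wait really amounts to is polling a condition until a timeout, which is what selenium's `WebDriverWait` does internally. A minimal, browser-free sketch of that loop (`wait_until` is a hypothetical helper, not part of selenium):

```python
import time

def wait_until(condition, timeout=10, interval=0.5):
    """Poll `condition` until it returns a truthy value or `timeout`
    seconds elapse. Returns the truthy value, or raises TimeoutError.
    """
    deadline = time.time() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.time() >= deadline:
            raise TimeoutError('condition not met within %ss' % timeout)
        time.sleep(interval)

# Stub demo: the condition becomes true on the third poll.
calls = {'n': 0}
def page_loaded():
    calls['n'] += 1
    return calls['n'] >= 3

print(wait_until(page_loaded, timeout=5, interval=0.01))  # True
```

In real selenium code you would pass a condition that checks `sel.current_url` or the presence of an element instead of sleeping a fixed interval.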
4. Locate the login form fields via XPath, fill in the account name and password, and simulate a click on the login button:

    # fill in the username
    try:
        sel.find_element_by_xpath("//div[@id='pl_login_form']/div/div[2]/div[1]/div[1]/input").send_keys('yourusername')
        print 'user success!'
    except:
        print 'user error!'
    time.sleep(1)
    # fill in the password
    try:
        sel.find_element_by_xpath("//div[@id='pl_login_form']/div/div[2]/div[2]/div[1]/input").send_keys('yourPW')
        print 'pw success!'
    except:
        print 'pw error!'
    time.sleep(1)
    # click the login button
    try:
        sel.find_element_by_xpath("//div[@id='pl_login_form']/div/div[2]/div[6]/a").click()
        print 'click success!'
    except:
        print 'click error!'
    time.sleep(3)
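The three try/except blocks above repeat one pattern: attempt an action, report success or failure, then pause. That pattern can be factored into a small helper (`attempt` is a hypothetical name, not from the original code); the demo below exercises it with stub callables so no browser is needed:

```python
import time

def attempt(label, action, pause=1):
    """Run `action` (a zero-argument callable, e.g.
    lambda: sel.find_element_by_xpath(xpath).send_keys('yourusername')),
    print '<label> success!' or '<label> error!', pause, and
    return True on success, False if the action raised."""
    try:
        action()
        print('%s success!' % label)
        ok = True
    except Exception:
        print('%s error!' % label)
        ok = False
    time.sleep(pause)
    return ok

# Stub demo (no browser): one action that works, one that raises.
print(attempt('user', lambda: None, pause=0))   # user success! / True
print(attempt('pw', lambda: 1 / 0, pause=0))    # pw error! / False
```

Catching `Exception` rather than a bare `except:` also keeps Ctrl-C (`KeyboardInterrupt`) working while the script runs.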
5. Verify the login: if current_url has changed, treat the login as successful; otherwise keep prompting for a verification code:

    import sys

    curpage_url = sel.current_url
    print curpage_url
    while curpage_url == loginurl:
        print 'please input the verify code:'
        verifycode = sys.stdin.readline().strip()
        sel.find_element_by_xpath("//div[@id='pl_login_form']/div/div[2]/div[3]/div[1]/input").send_keys(verifycode)
        try:
            sel.find_element_by_xpath("//div[@id='pl_login_form']/div/div[2]/div[6]/a").click()
            print 'click success!'
        except:
            print 'click error!'
        time.sleep(3)
        curpage_url = sel.current_url
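The control flow of that loop — keep submitting verification codes while the URL stays at the login page — can be sketched independently of selenium by injecting the browser interactions as callables (all names here are hypothetical stand-ins for `sel.current_url`, the stdin prompt, and the XPath click):

```python
def login_until_redirect(get_url, prompt_code, submit_code, loginurl, max_tries=3):
    """Keep submitting verification codes while get_url() still equals
    loginurl. Returns the final (changed) URL, or None if max_tries runs out."""
    for _ in range(max_tries):
        if get_url() != loginurl:
            return get_url()
        submit_code(prompt_code())
    return None if get_url() == loginurl else get_url()

# Stub demo: the second submitted code "redirects" away from the login page.
state = {'url': 'http://login', 'codes': iter(['bad', 'good'])}
def get_url():
    return state['url']
def prompt_code():
    return next(state['codes'])
def submit_code(code):
    if code == 'good':
        state['url'] = 'http://home'

print(login_until_redirect(get_url, prompt_code, submit_code, 'http://login'))
```

Unlike the original `while`, this version bounds the retries, so a persistently failing captcha cannot loop forever.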
6. Retrieve the session cookies of the current site through the webdriver object:

    # get the session cookies
    cookie = [item["name"] + "=" + item["value"] for item in sel.get_cookies()]
    #print cookie

    cookiestr = ';'.join(cookie)
    print cookiestr
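The cookie-joining step is plain Python and can be exercised with mock dicts shaped like `sel.get_cookies()` output (a list of dicts with at least `name` and `value` keys); the cookie names below are made up for illustration:

```python
def build_cookie_header(cookies):
    """Join selenium-style cookie dicts into a Cookie header value."""
    return ';'.join(item['name'] + '=' + item['value'] for item in cookies)

# Mock of what sel.get_cookies() returns:
mock = [{'name': 'SUB', 'value': 'abc123'},
        {'name': 'SUBP', 'value': 'xyz789'}]
print(build_cookie_header(mock))  # SUB=abc123;SUBP=xyz789
```

The original joins with a bare `';'`; the conventional Cookie header separator is `'; '`, but servers generally accept both.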

7. With the cookie string you can visit the site through urllib2 and do page crawling and similar work:

    import urllib2

    print '%%%using the urllib2 !!'
    homeurl = sel.current_url
    print 'homeurl: %s' % homeurl
    headers = {'cookie': cookiestr}
    req = urllib2.Request(homeurl, headers=headers)
    try:
        response = urllib2.urlopen(req)
        text = response.read()
        fd = open('homepage', 'w')
        fd.write(text)
        fd.close()
        print '###get home page html success!!'
    except:
        print '### get home page html error!!'
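Note that `urllib2` is Python 2 only; on Python 3 the same module lives at `urllib.request`. The sketch below builds the cookie-carrying request in a version-neutral way and inspects the header without touching the network (`homeurl` and `cookiestr` here are placeholder values, not the real ones from the session):

```python
try:
    from urllib import request as urllib_request  # Python 3
except ImportError:
    import urllib2 as urllib_request              # Python 2

homeurl = 'http://example.com/home'               # placeholder URL
cookiestr = 'SUB=abc123;SUBP=xyz789'              # placeholder cookie string

req = urllib_request.Request(homeurl, headers={'cookie': cookiestr})
# urllib normalizes header names, so the cookie rides along as 'Cookie':
print(req.get_header('Cookie'))  # SUB=abc123;SUBP=xyz789
# response = urllib_request.urlopen(req)  # would perform the actual fetch
```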

Reference:

http://www.testwo.com/blog/6931

