1. Install the selenium package:
sudo pip install -U selenium
If pip is not available, install it first (run from the unpacked pip source directory):
sudo python setup.py install
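Before moving on, it can help to confirm that the installation actually worked. The snippet below is a minimal sketch for Python 3 that uses only the standard library; `is_installed` is a hypothetical helper name, not part of pip or selenium.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the named top-level package can be imported."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # Report on the two tools this walkthrough relies on.
    for name in ("pip", "selenium"):
        status = "found" if is_installed(name) else "missing"
        print("%s: %s" % (name, status))
```

If selenium is reported as missing, rerun the pip command above with the same interpreter you use to run your scripts.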
2. Import the selenium package and create a webdriver object:
import sys
import time

from selenium import webdriver

# only the name webdriver is imported, so call it directly
sel = webdriver.Chrome()
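This step may fail with an error about the chrome path. The reason is that controlling the Chrome browser requires the ChromeDriver driver as a helper. Download the ChromeDriver build that matches your Chrome version from its download page and unpack it into a directory.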
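One way to diagnose the chrome path error is to check whether the driver binary can be found at all. The following is a rough Python 3 sketch using only the standard library; `find_chromedriver` and its `extra_dirs` parameter are hypothetical names introduced here, and the binary name `chromedriver` is an assumption about how the unpacked file is called on your system.

```python
import os
import shutil


def find_chromedriver(extra_dirs=(), name="chromedriver"):
    """Return the full path of the driver binary, or None if it is not found.

    extra_dirs is a hypothetical list of directories (for example the one
    the archive was unpacked into) that are searched before the regular PATH.
    """
    search_path = os.pathsep.join(list(extra_dirs) + [os.environ.get("PATH", "")])
    return shutil.which(name, path=search_path)
```

If the function returns None, the directory you unpacked the driver into is probably not on your PATH, which is a common cause of the error described above.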
3. Open the target URL and wait for the response:
loginurl = ''
# open the login page
sel.get(loginurl)
# give the page time to finish loading
time.sleep(10)
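A fixed ten-second sleep either wastes time or is too short on a slow connection. As an alternative, here is a small polling helper for Python 3; `wait_until` is a hypothetical name and the default timeout and interval are arbitrary assumptions, not values from the original post.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Poll condition() until it returns a truthy value or the timeout elapses.

    Returns the truthy value, or None if the timeout is reached first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

With the driver from step 2, a call such as `wait_until(lambda: sel.title)` would return as soon as the page has a title. Selenium also ships its own `WebDriverWait` class for the same purpose.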
4. Locate the login form through xpath, fill in the account name and password, and simulate a click on the login button:
# fill in the username
try:
    sel.find_element_by_xpath("//div[@id='pl_login_form']/div/div[2]/div[1]/div[1]/input").send_keys('yourusername')
    print('user success!')
except Exception:
    print('user error!')
time.sleep(1)
# fill in the password
try:
    sel.find_element_by_xpath("//div[@id='pl_login_form']/div/div[2]/div[2]/div[1]/input").send_keys('yourPW')
    print('pw success!')
except Exception:
    print('pw error!')
time.sleep(1)
# click to log in
try:
    sel.find_element_by_xpath("//div[@id='pl_login_form']/div/div[2]/div[6]/a").click()
    print('click success!')
except Exception:
    print('click error!')
time.sleep(3)
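The three try/except blocks in this step follow the same pattern, so they could be folded into one helper. This is only a sketch: `attempt` is a hypothetical name, and it takes any callable, so it makes no assumption about the Selenium API itself.

```python
def attempt(label, action):
    """Run action(); print '<label> success!' or '<label> error!' and return a bool."""
    try:
        action()
    except Exception as exc:
        # Include the exception text so a wrong xpath is easier to spot.
        print("%s error! (%s)" % (label, exc))
        return False
    print("%s success!" % label)
    return True
```

For example, the username step would become `attempt('user', lambda: sel.find_element_by_xpath(user_xpath).send_keys('yourusername'))`, where `user_xpath` is a hypothetical variable holding the xpath string used above.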
5. Check whether the login succeeded. If the current URL has changed, the login is considered successful:
curpage_url = sel.current_url
print(curpage_url)
# still on the login page, so a verification code is required
while curpage_url == loginurl:
    print('please input the verify code:')
    # strip the trailing newline so it is not typed into the field
    verifycode = sys.stdin.readline().strip()
    sel.find_element_by_xpath("//div[@id='pl_login_form']/div/div[2]/div[3]/div[1]/input").send_keys(verifycode)
    try:
        sel.find_element_by_xpath("//div[@id='pl_login_form']/div/div[2]/div[6]/a").click()
        print('click success!')
    except Exception:
        print('click error!')
    time.sleep(3)
    curpage_url = sel.current_url
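The loop in this step runs forever if the login never succeeds. A bounded variant might look like the sketch below. `retry_login` and all of its parameters are hypothetical: the browser and keyboard interactions are passed in as plain callables so the control flow can be read (and tested) without a real browser, and the limit of three attempts is an arbitrary assumption.

```python
def retry_login(get_current_url, submit_code, read_code, login_url, max_attempts=3):
    """Ask for a verification code until the URL changes or attempts run out.

    The callables are stand-ins supplied by the caller:
    get_current_url() -> str, read_code() -> str, submit_code(code) -> None.
    Returns True once the current URL differs from login_url.
    """
    for _ in range(max_attempts):
        if get_current_url() != login_url:
            return True
        # Strip the newline that reading from stdin leaves at the end.
        submit_code(read_code().strip())
    return get_current_url() != login_url
```

In the script above, `get_current_url` would wrap `sel.current_url`, `read_code` would be `sys.stdin.readline`, and `submit_code` would type the code into the form and click the login link.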
6. Use the driver object's methods to get the session cookie of the site currently being visited:
# get the session cookie
cookie = [item["name"] + "=" + item["value"] for item in sel.get_cookies()]
# print(cookie)
cookiestr = ';'.join(cookie)
print(cookiestr)
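To make the cookie conversion easier to follow, here it is as a standalone function. `cookies_to_header` is a hypothetical name. The only assumption about the input is the one the code above already makes: each cookie is a dict with `name` and `value` keys. Note that this sketch joins with `"; "` (semicolon plus space), the conventional separator in a Cookie header, whereas the original joins with a bare `;`.

```python
def cookies_to_header(cookies):
    """Join cookie dicts (each with 'name' and 'value' keys) into a Cookie header value."""
    return "; ".join("%s=%s" % (c["name"], c["value"]) for c in cookies)


if __name__ == "__main__":
    # Invented sample data in the shape of the dicts used above.
    sample = [
        {"name": "sid", "value": "abc123", "domain": "example.com"},
        {"name": "lang", "value": "en"},
    ]
    print(cookies_to_header(sample))
```

Extra keys such as `domain` are ignored, since only the name and value belong in the request header.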
7. Once you have the cookie, you can access the site through urllib2 and carry out tasks such as web scraping:
import urllib2

print('%%%using the urllib2 !!')
homeurl = sel.current_url
print('homeurl: %s' % homeurl)
# send the browser session cookie with the request
headers = {'cookie': cookiestr}
req = urllib2.Request(homeurl, headers=headers)
try:
    response = urllib2.urlopen(req)
    text = response.read()
    # save the raw page to a local file
    fd = open('homepage', 'w')
    fd.write(text)
    fd.close()
    print('###get home page html success!!')
except Exception:
    print('### get home page html error!!')
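urllib2 exists only in Python 2. On Python 3 the same idea can be expressed with `urllib.request`, roughly as sketched below. `build_request` and `save_page` are hypothetical helper names, and the 30-second timeout is an assumption rather than something taken from the original script.

```python
import urllib.request


def build_request(url, cookie_header):
    """Create a Request that carries the browser session cookies."""
    return urllib.request.Request(url, headers={"Cookie": cookie_header})


def save_page(url, cookie_header, path):
    """Fetch url with the session cookies and write the raw bytes to path."""
    req = build_request(url, cookie_header)
    with urllib.request.urlopen(req, timeout=30) as response:
        body = response.read()
    # The body is bytes in Python 3, so the file is opened in binary mode.
    with open(path, "wb") as fd:
        fd.write(body)
    return len(body)
```

With the values from the previous steps this would be called as `save_page(homeurl, cookiestr, 'homepage')`. Whether the site accepts the request may also depend on other headers, such as the user agent, which this sketch does not set.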
Reference:
http://www.testwo.com/blog/6931