Notes on How a Simple Web Crawler Works - 1021eee - ChinaUnix Blog


Notes on How a Simple Web Crawler Works

Category: Python/Ruby

2019-08-13 12:09:43

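The notes below are taken from the book 《PYTHON网络爬虫入门到实践》 (Python Web Crawling: From Beginner to Practice).
This post is only meant as a place to keep notes and record practice exercises.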
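#!/usr/bin/python
# coding: UTF-8
# Step 1: fetch the page

# Import the modules we need
import requests
from bs4 import BeautifulSoup

# URL of the page to crawl (left empty in these notes; set it before running)
link = ""

# Make the request look like it comes from a browser
headers = {'User-Agent' : 'Mozilla/5.0 (Windows; U; Windows NT6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'}

# r is the Response object returned by requests; it holds what was fetched from the link
# r.text is the HTML source of the page
r = requests.get(link, headers=headers)
# Uncomment to print the fetched content
#print(r.text)

soup = BeautifulSoup(r.text, "lxml")  # parse the HTML with BeautifulSoup
title = soup.find("h1", class_="post-title").a.text.strip()  # find the first post title
title1 = soup.find("h1", class_="post-title").a.text.strip()  # same lookup again, so title1 == title
print(title)  # print the title
print(title1)

# Store the data
# A relative path like this is created in the current working directory
# (the script's own folder when the script is run from there)
with open('title.txt', "a+", encoding="utf-8") as f:
    f.write(title)  # the with block closes the file automatically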
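
The extract and store steps can also be tried offline, without a real link. The sketch below is only an illustration and is not from the book: it swaps BeautifulSoup for the standard library's html.parser so it runs with no extra installs, and SAMPLE_HTML, PostTitleParser, extract_title and append_title are all made-up names. It assumes the page marks post titles as `<h1 class="post-title"><a>...</a></h1>`, the same structure the script above looks for.

```python
from html.parser import HTMLParser

# Hypothetical sample page; a real page would come from r.text
SAMPLE_HTML = """
<html><body>
  <h1 class="post-title"><a href="#">  First post  </a></h1>
  <h1 class="post-title"><a href="#">Second post</a></h1>
</body></html>
"""


class PostTitleParser(HTMLParser):
    """Collects the text of the first <a> inside an <h1 class="post-title">."""

    def __init__(self):
        super().__init__()
        self.in_h1 = False    # currently inside a matching <h1>
        self.in_link = False  # currently inside the <a> within that <h1>
        self.done = False     # the first title has been captured
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h1" and "post-title" in classes:
            self.in_h1 = True
        elif tag == "a" and self.in_h1:
            self.in_link = True

    def handle_endtag(self, tag):
        if tag == "a" and self.in_link:
            self.in_link = False
            self.done = True
        elif tag == "h1":
            self.in_h1 = False

    def handle_data(self, data):
        if self.in_link:
            self.parts.append(data)


def extract_title(html):
    """Return the first post title, or None when the page has no such heading."""
    parser = PostTitleParser()
    parser.feed(html)
    parser.close()
    title = "".join(parser.parts).strip()
    return title or None


def append_title(path, title):
    """Append one title per line, so repeated runs do not glue titles together."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(title + "\n")


if __name__ == "__main__":
    print(extract_title(SAMPLE_HTML))
```

Two things differ from the script above on purpose. extract_title returns None when no matching heading exists, whereas `soup.find(...)` returning None would make `.a` raise an AttributeError. append_title also writes a trailing newline, because opening the file in append mode means every run adds to the same file.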
