show	version	enable_checker
step	1.0	true

导入 request 包

新的开始

上安装并启动了
- nginx 服务器

状态码	状态
200	Ok
304	Not modified
304	Not found

使用浏览器访问了服务器上的网页
- 浏览器发送了请求
  - request
- 服务器回应了响应
  - response

到此为止我们还没有爬虫
爬虫在哪儿呢？🤣

过程

浏览的过程是
- 在客户机上浏览器
  - 发请求 request
- 服务器接收请求
  - 返回响应 response
- 浏览器收到 response 并渲染成页面

爬虫就假装自己是一个浏览器
- 用代码发请求
- 反正都是 0101 的字节流
那我应该如何假装自己是个浏览器呢？
- 先要有一个 requests包

请求 requests

https://requests.readthedocs.io/en/latest/

先从 requests 开始
- 导入这个包

下载包

如果本地没有的话
- 要先下载

pip3 install requests

导入包

查看帮助

根据这个帮助

尝试复制

我们来试试
- 照猫画虎

照猫画虎

import requests
response = requests.get("http://localhost/")
response

找到response

好像返回了 200 的状态码
这个 response 到底是什么类型呢？

Response 响应

response
type(response)

Response 对象
- 包括了一个 http 请求的返回结果
- requests.models.Response

只读对象

help(response)

response 这个返回对象
- 里面有很多属性
- content 就是可以读出来的内容
- 返回的是字节序列 bytes

如果我们
- 只想要纯文本
- 怎么办呢？

文本

来看看 r.text

r.content 和 r.text
- 各是什么类型的呢？

对比

content 和 text
- content 是字节序列
- text 是字符串序列

明确

通过变量名前缀
- 明确变量类型

import requests
response = requests.get("http://localhost/")
b_html = response.content
s_html = response.text
print("b_html:", type(b_html), b_html)
print("s_html:", type(s_html), s_html)

结果

那这 bytes 和str之间
- 可以相互转化么？

转化

字节序列和字符串的转化
- 就是字符串的编码和解码

import requests
response = requests.get("http://localhost/")
b_html = response.content
s_html = response.text
print(b_html.decode()==s_html)
print(s_html.encode()==b_html)

结果

字节序列
- 解码 decode 之后
  - 得到字符串
字符串
- encode 编码之后
  - 得到字节序列

编码和解码
- 互为逆方法

总结

导入了 requests 模块
- requests 帮我们发请求
这样我们就可以
- 假装自己是一个浏览器
- 完成了 http get 的过程
  - 发出了 request
  - 得到了 response
  - 状态码 200
但是读到的内容是
- resposne.content 字节序列
- resposne.text 字符串序列
如何解析 html 语言呢？🤔
下次再说

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

552-192576-requests_模块.sy.md

552-192576-requests_模块.sy.md

导入 request 包

新的开始

过程

请求 requests

下载包

导入包

尝试复制

照猫画虎

Response 响应

只读对象

文本

对比

明确

转化

总结

Files

552-192576-requests_模块.sy.md

Latest commit

History

552-192576-requests_模块.sy.md

File metadata and controls

导入 request 包

新的开始

过程

请求 requests

下载包

导入包

尝试复制

照猫画虎

Response 响应

只读对象

文本

对比

明确

转化

总结