Merge branch 'master' of https://github.com/zhk0603/WebCrawler
zhk0603 committed Dec 6, 2017
2 parents 149f96a + 79119ec commit 8a1bd8f
Showing 3 changed files with 5 additions and 14 deletions.
19 changes: 5 additions & 14 deletions README.md
@@ -13,23 +13,14 @@ WebCrawler uses a multi-pipeline, multi-scheduler design and processing model,
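To make maintenance easier and keep the code structure simple, we can write an independent pipeline for each operation (*each pipeline should have as narrow a responsibility as possible and very low coupling*). Multiple pipelines then work together to complete the crawl of a single page. When writing a crawler, developers only need to focus on the business logic; the framework handles everything else internally.
In WebCrawler, a Pipeline can run in one of two modes: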

**Pipeline chain mode:**
```
graph LR
PipelineA --> PipelineB
PipelineB --> PipelineC
PipelineC --> PipelineB
PipelineB --> PipelineA
```
**Pipeline chain mode:**
![chain mode](chain.png)

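Chain mode is like building with blocks: multiple pipelines are joined and assembled, one pipeline connected to the next, forming a closed processing chain. We recommend this mode when writing crawlers whose tasks are sequential.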
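
As a rough illustration of chain mode, the sketch below links three pipelines so that control passes A → B → C and then returns back through B to A, as in the diagram this commit replaces. It is written in Python purely for illustration; `Pipeline`, `handle`, `build_chain`, and the `trace` list are hypothetical names and are not WebCrawler's actual API.

```python
from typing import List, Optional


class Pipeline:
    """Hypothetical single-purpose step in a chain (not WebCrawler's API)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.next: Optional["Pipeline"] = None  # following pipeline, if any

    def handle(self, context: dict) -> None:
        # Do this pipeline's work on the way in.
        context["trace"].append(f"enter {self.name}")
        # Hand the shared context to the next pipeline in the chain.
        if self.next is not None:
            self.next.handle(context)
        # Control comes back here once the downstream pipelines finish.
        context["trace"].append(f"leave {self.name}")


def build_chain(pipelines: List[Pipeline]) -> Pipeline:
    """Connect each pipeline to the one after it and return the head."""
    for current, following in zip(pipelines, pipelines[1:]):
        current.next = following
    return pipelines[0]


def run_chain_demo() -> List[str]:
    """Run a three-pipeline chain and return the order of the steps."""
    head = build_chain([Pipeline("A"), Pipeline("B"), Pipeline("C")])
    context = {"trace": []}
    head.handle(context)
    return context["trace"]
```

Because each pipeline only knows about its successor, a step can be added, removed, or reordered by changing the list passed to `build_chain`, which is the "building blocks" property described above.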

**Parallel mode:**
```
graph TB
PipelineA --> PipelineA
PipelineB --> PipelineB
PipelineN --> PipelineN
```
![chain mode](parallel.png)

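In parallel mode, as the name suggests, N pipelines run at the same time with no chain relationship between them; they cooperate through the scheduler.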
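
One way to picture parallel mode is a set of independent workers that never call each other and only exchange work through a shared scheduler. The Python sketch below assumes the scheduler behaves like a thread-safe queue; `producer_pipeline`, `worker_pipeline`, `run_parallel_demo`, and the `STOP` sentinel are hypothetical names for illustration and are not WebCrawler's actual API.

```python
import queue
import threading
from typing import List

STOP = None  # hypothetical sentinel telling a worker pipeline to shut down


def producer_pipeline(urls: List[str], scheduler: queue.Queue) -> None:
    """Push work into the scheduler; knows nothing about the consumers."""
    for url in urls:
        scheduler.put(url)


def worker_pipeline(scheduler: queue.Queue, results: List[str],
                    lock: threading.Lock) -> None:
    """Pull work from the scheduler until the stop sentinel arrives."""
    while True:
        url = scheduler.get()
        if url is STOP:
            break
        with lock:  # protect the shared result list
            results.append(f"fetched {url}")


def run_parallel_demo(urls: List[str], worker_count: int = 3) -> List[str]:
    """Run one producer and several workers that share one scheduler."""
    scheduler: queue.Queue = queue.Queue()  # assumed stand-in for a scheduler
    results: List[str] = []
    lock = threading.Lock()

    workers = [
        threading.Thread(target=worker_pipeline, args=(scheduler, results, lock))
        for _ in range(worker_count)
    ]
    for worker in workers:
        worker.start()

    producer = threading.Thread(target=producer_pipeline, args=(urls, scheduler))
    producer.start()
    producer.join()

    # One sentinel per worker, queued after all real work (FIFO order).
    for _ in workers:
        scheduler.put(STOP)
    for worker in workers:
        worker.join()

    return sorted(results)  # completion order is nondeterministic
```

The pipelines here share no references to one another, so the number of workers can change without touching the producer; the queue is the only point of coordination.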

### Example
Binary file added chain.png
Binary file added parallel.png
