Merge pull request #25 from holmes1412/master
Update English version README.md
Barenboim committed Aug 16, 2020
2 parents 2611326 + 3bd34a5 commit 7eadd33
Showing 2 changed files with 90 additions and 69 deletions.
2 changes: 2 additions & 0 deletions README.md
@@ -1,3 +1,5 @@
[English version](README_en.md)

## Sogou C++ Workflow
[![license MIT](https://img.shields.io/badge/License-Apache-yellow.svg)](https://git.sogou-inc.com/wujiaxu/Filter/blob/master/LICENSE)
[![C++](https://img.shields.io/badge/language-c++-red.svg)](https://en.cppreference.com/)
157 changes: 88 additions & 69 deletions README_en.md
@@ -1,76 +1,52 @@
[中文版](README.md)

## Sogou C++ Workflow
[![license MIT](https://img.shields.io/badge/License-Apache-yellow.svg)](https://git.sogou-inc.com/wujiaxu/Filter/blob/master/LICENSE)
[![C++](https://img.shields.io/badge/language-c++-red.svg)](https://en.cppreference.com/)
[![platform](https://img.shields.io/badge/platform-linux%20%7C%20macos-lightgrey.svg)](#%E9%A1%B9%E7%9B%AE%E7%9A%84%E4%B8%80%E4%BA%9B%E8%AE%BE%E8%AE%A1%E7%89%B9%E7%82%B9)

# Sogou C++ Workflow
#### As the backend C++ programming standard in Sogou, Workflow is an industrial-grade programming engine.
#### Main functions and features:
* An **asynchronous engine** based on **C++11** ``std::function`` which aims to solve all the **serial, parallel and asynchronous** problems.
* As a network framework, it is completely **protocol-agnostic** and faces applications directly.
* It can either be used as a Redis **client** or an Http **server**.
* Convenient to **customize protocols**, so you can quickly build your own RPC systems.
* Sogou RPC is developed based on Sogou Workflow and is open source as an independent project. The project supports srpc, brpc and thrift protocols ([benchmark](https://github.com/holmes1412/sogou-rpc-benchmark)).
* Supports **SSL** (depends on OpenSSL). Supports **TCP, UDP, SCTP** and other common transport-layer protocols, including SSL over **SCTP**. UDP servers are not supported.
* Natively contains a variety of **common Internet protocol** implementations which are used in a unified way.
* Currently supports the **http, redis, mysql** and **kafka** protocols. You can directly access these resources or build **servers** for these protocols.
* Very likely the only full-featured C++ **asynchronous mysql client** on the market.
* The **DNS** protocol is under development; currently the system library is used to resolve DNS.
* Powerful feature for **scheduling computing tasks**
* Computing tasks, like communication tasks, can be added to the task flow; each kind is scheduled by its own scheduler.
* You can use it as a parallel programming engine **without** the network features.
* Our biggest goal is to **maximize the performance** of every node when the calculation and communication environment is very complex.
* Some **common algorithm** implementations are provided, such as parallel sorting and MapReduce.
* In fact, all asynchronous processes (such as disk IO, GPU tasks, timers, etc.) **can be scheduled in coordination**.
* On Linux, disk IO tasks are implemented with the native Linux aio interface, which is extremely efficient.
* Supports any task flow with a **DAG** structure. However, in most cases, users only need a **series-parallel** structure.
* Built-in **load balancing** and powerful **service governance** features.
* Easily used in conjunction with other asynchronous engines.
* **Streaming** communication engine is being developed.
* When working as a server, it supports **multi-processes** mode and supports precise **graceful restart**.

#### Building
* Supports **Linux, macOS, FreeBSD, Windows** and other systems so far. **cmake** must be installed.
* The Windows version is temporarily released as an independent [branch](https://github.com/sogou/workflow/tree/windows), which uses **iocp** as the basis for asynchronous communication while keeping the **same external interface**.
* As it is written in C/C++, users need to be proficient in C++ programming. It **does not** rely on boost or asio, so it compiles very quickly.
* It uses a few C++11 features, so users should be able to use ``std::function`` and ``std::move``.
* Theoretically supports all CPU architectures and can be compiled and run on **32-bit** or **64-bit arm processors**. Big-endian CPUs are not tested.
* **OpenSSL** is required. If users expect high SSL performance, OpenSSL 1.1 or higher is strongly recommended.
* **No other dependencies**. Several compression libraries such as snappy and lz4 are included as unmodified source (required by the Kafka protocol).

#### Some design features
* The basic usage is very simple and handy. Some features are designed to greatly reduce the difficulty of programming with general C++ projects.
* To spare users from **deriving** classes as much as possible, all user behaviors are wrapped with ``std::function``, for example:
* the **callback** after every task ends
* the **algorithms** in computing tasks
* one server corresponds to one ``std::function``
* To avoid complicated memory management, all tasks and frameworks are generated by **factory** classes, and their memory is **recycled automatically**. This means:
* Every task is **automatically deleted after its callback**.
* If users want to keep any data from the task (such as a network reply packet or the result of an algorithm), they need to use ``std::move`` to move it out.
* We treat memory recycling as a strict and naturally logical mechanism, so we **don't** use ``std::shared_ptr``.
* Avoid using complicated parameter configuration.
* Actually we have a lot of **configurable parameters**, though you can use our system **without** feeling the existence of parameters.
* If you have specific requirements for program behavior and resource ratio, you can find the corresponding configuration parameters to maximize the performance of your program.
* The project adopts a fully asynchronous design and is not transparent to users, which means users need to know that they are writing asynchronous programs.
* Thanks to the convenience brought by ``std::function`` and the automatic memory recycling mechanism, we have carefully designed **the simplest possible usage of asynchrony** for users.
* **No** user-mode threads concepts. On the one hand, performance is considered. On the other hand, we have the concept of computing tasks (threaded tasks) scheduling.
* In our design, **computing** is one kind of **asynchronous task**, which has no differences from communication.
* Computing tasks are scheduled by **independent thread groups** according to specific algorithms. Note that they **may not** be executed **immediately**.
* As we have such computing tasks, user-mode threads become meaningless, and therefore users must understand asynchrony.
* Because of the full asynchrony, almost all core calls are **short** and **non-blocking** operations.
* That's why we **don't** recommend that users **block** in callbacks or perform complex calculations there. However, this is acceptable if the logic is quite simple.
* Brief summary of the usage:
* The users can build the program just like building a **series-parallel** circuit. The circuit can be generated **at the beginning** or **dynamically generated during the program running**.
* **We provide various electronic components** for users. For instance, one http request, one GPU matrix multiplication, and one parallel sort can each be understood as an electronic component.
* Every electronic component has its **standard input and output**. At the same time, every electronic component can itself be a **complicated circuit**, which users do not need to be aware of.
* For example, an http request may go through **multiple asynchronous processes** such as DNS, redirect, and retry, but the entire process is just one **component** from the user's perspective.
* Users can easily **define their own** components, including algorithms and some kind of communication.
* Implementing **stateless protocols** is extremely simple. It may be a little more complicated when the protocol includes login, database selection, etc.; in that case, you can refer to the redis implementation.
* Through the powerful Upstream system, complex **service governance** can be realized, such as communication node selection, load balancing, circuit breaker and recovery, master and slave, etc.
* **In conclusion, this is an enterprise-level, elegantly designed asynchronous framework which can cover almost all high-performance back-end service requirements.**

#### Tutorials:
As **Sogou's C++ server engine**, Workflow supports almost all **back-end C++ online services** of Sogou, including all search services, cloud input method, online advertisements, etc., handling more than **10 billion** requests every day. It is a light and elegantly designed **enterprise-level programming engine** that can satisfy most C++ back-end development requirements.

#### You can use it:
* To quickly build an **Http server**:
~~~cpp
#include <stdio.h>
#include "workflow/WFHttpServer.h"

int main()
{
    WFHttpServer server([](WFHttpTask *task) {
        task->get_resp()->append_output_body("<html>Hello World!</html>");
    });

    if (server.start(8888) == 0) { // start server on port 8888
        getchar(); // press "Enter" to end.
        server.stop();
    }

    return 0;
}
~~~
* As a **powerful asynchronous client**. Currently supports ``http``, ``redis``, ``mysql`` and ``kafka`` protocols.
* To realize **user-defined protocol client/server** and build your own **RPC system**.
* Sogou RPC is based on it and is open-sourced as an independent project that supports the srpc, brpc and thrift protocols ([benchmark](https://github.com/holmes1412/sogou-rpc-benchmark)).
* To build **asynchronous task flows**, supporting common **series** and **parallel** structures as well as more complex **DAG** structures.
* As a **parallel programming tool**. In addition to **network tasks**, we also include **the scheduling of computing tasks**. All types of tasks can be put into **the same** task flow.
* As an **asynchronous file IO tool** on ``Linux``, with performance higher than that of ordinary system calls. Disk IO is also a task.
* To realize any **high-performance** and **high-concurrency** back-end service with a very complex relationship between computing and communication.
* To build a **service mesh** system.
* The project has built-in **service governance** and **load balancing** features.

#### Compile and run environment

* This project supports ``Linux``, ``macOS``, ``Windows`` and other operating systems.
* The ``Windows`` version is temporarily released as an independent branch, using ``iocp`` to implement asynchronous networking. All user interfaces are consistent with the ``Linux`` version.
* Supports all CPU platforms, including 32-bit or 64-bit ``x86`` processors and big-endian or little-endian ``arm`` processors.
* Relies on ``OpenSSL``; ``OpenSSL 1.1`` or above is recommended.
* Uses the ``C++11`` standard and therefore, needs to be compiled with a compiler which supports ``C++11``. Does not rely on ``boost`` or ``asio``.
* No other dependencies. However, it contains the unmodified source code of several compression libraries such as ``lz4``, ``zstd`` and ``snappy`` (required by the ``Kafka`` protocol).

# Try it!
* Client
* [Create your first task:wget](docs/tutorial-01-wget.md)
* [Implement redis set and get:redis_cli](docs/tutorial-02-redis_cli.md)
@@ -83,6 +59,7 @@
* Important topics
* [About error](docs/about-error.md)
* [About timeout](docs/about-timeout.md)
* [About global configuration](docs/about-config.md)
* [About DNS](docs/about-dns.md)
* [About exit](docs/about-exit.md)
* Computing tasks
@@ -103,10 +80,52 @@
* Built-in protocols
* [Asynchronous MySQL client:mysql_cli](docs/tutorial-12-mysql_cli.md)

#### System design features

We believe that a typical back-end program consists of the following three parts, which should be developed completely independently.
* Protocol
* In most cases, users use built-in common network protocols, such as http, redis or various rpc.
* Users can also easily define their own network protocol; they only need to provide serialization and deserialization functions to define their own client/server.
* Algorithm
* In our design, the algorithm is a concept symmetrical to the protocol.
* If a protocol call is an rpc, then an algorithm call is an apc (Async Procedure Call).
* We have provided some general algorithms, such as sort, merge, psort, reduce, which can be used directly.
* Compared with user-defined protocols, user-defined algorithms are much more common. Any complex calculation with clear boundaries should be packaged into an algorithm.
* Task flow
* The task flow is the actual business logic, which puts the protocols and algorithms into a flow graph for use.
* The typical task flow is a closed series-parallel graph. Complex business logic may be a non-closed DAG.
* The task flow graph can be constructed directly or dynamically generated based on the results of each step. All tasks are executed asynchronously.
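
The "serialization and deserialization functions" mentioned for user-defined protocols can be pictured with a small standalone sketch. It does not use the Workflow API: the wire format (a 4-byte big-endian length followed by the payload) and the names ``encode_message`` and ``decode_message`` are assumptions made only for illustration.

```cpp
#include <cstdint>
#include <string>

// Hypothetical wire format: 4-byte big-endian payload length, then the payload.
std::string encode_message(const std::string& body)
{
    const std::uint32_t n = static_cast<std::uint32_t>(body.size());
    std::string out;
    out.push_back(static_cast<char>((n >> 24) & 0xFF));
    out.push_back(static_cast<char>((n >> 16) & 0xFF));
    out.push_back(static_cast<char>((n >> 8) & 0xFF));
    out.push_back(static_cast<char>(n & 0xFF));
    out += body;
    return out;
}

// Tries to extract one complete message from the front of `buffer`.
// Returns true and fills `body` when a whole frame is available (the frame
// is then removed from `buffer`); returns false when more bytes are needed.
bool decode_message(std::string& buffer, std::string& body)
{
    if (buffer.size() < 4)
        return false;

    std::uint32_t n = 0;
    for (int i = 0; i < 4; i++)
        n = (n << 8) | static_cast<unsigned char>(buffer[i]);

    if (buffer.size() - 4 < n)
        return false;

    body = buffer.substr(4, n);
    buffer.erase(0, 4 + static_cast<std::size_t>(n));
    return true;
}
```

The decoder is written to tolerate partial input because an asynchronous framework typically hands over received bytes in arbitrary chunks; a real protocol would also cap the accepted length.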

Basic task, task factory and complex task
* Our system contains six basic tasks: communication, file IO, CPU, GPU, timer, and counter.
* All tasks are generated by the task factory and automatically recycled after callback.
* A server task is a special kind of communication task, generated by the framework (which calls the task factory) and handed over to the user through the process function.
* In most cases, the task generated by the user through the task factory is a complex task, but the user does not need to be aware of this.
* For example, an Http request may include many asynchronous processes (DNS, redirection), but for the user, it is just a communication task.
* File sorting seems to be an algorithm, but it actually includes many complex interaction processes between file IO and CPU calculation.
* If you think of business logic as building circuits with well-designed electronic components, then each electronic component may be a complex circuit.
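
The circuit analogy can be sketched in plain C++: a component built from other components is used exactly like a simple one. This is only an illustration under stated assumptions. ``Component`` and ``make_series`` are hypothetical names, not Workflow types, and everything runs synchronously here, whereas the real engine executes tasks asynchronously.

```cpp
#include <functional>
#include <string>
#include <vector>

// A component takes an input and produces an output.
using Component = std::function<std::string(const std::string&)>;

// Wires components in series: each output feeds the next input.
// The result is itself a Component, so callers cannot tell whether
// they hold a single part or a whole sub-circuit.
Component make_series(std::vector<Component> parts)
{
    return [parts](const std::string& input) -> std::string {
        std::string value = input;
        for (const Component& part : parts)
            value = part(value);
        return value;
    };
}
```

For example, a hypothetical "http request" component could be assembled as ``make_series({resolve, connect, request})`` and then placed into a larger series as if it were a single step.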

Asynchrony and encapsulation based on ``C++11 std::function``

* Not based on user-mode coroutines. Users need to know that they are writing asynchronous programs.
* All calls are executed asynchronously, and almost no operation makes a thread wait.
* Although we also provide some convenient semi-synchronous interfaces, they are not core features.
* Please avoid derivation. Try to encapsulate user behavior with ``std::function`` instead, including:
* The callback of any task.
* Any server process. This conforms to the ``FaaS`` (Function as a Service) idea.
* The implementation of an algorithm is simply a ``std::function``, although an algorithm can also be implemented by derivation.
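
As a rough illustration of "one server corresponds to one ``std::function``", the sketch below shows a class that takes its behavior as a callable instead of asking users to derive from it. ``MiniServer``, ``Request`` and ``Response`` are hypothetical types invented for this example; they are not part of Workflow, and no networking is involved.

```cpp
#include <functional>
#include <string>
#include <utility>

struct Request { std::string path; };
struct Response { std::string body; };

// The user's logic is injected as a std::function, so no subclass is needed.
class MiniServer {
public:
    using Process = std::function<void(const Request&, Response&)>;

    explicit MiniServer(Process process) : process_(std::move(process)) {}

    // Stands in for the framework receiving a request and calling user code.
    Response handle(const Request& req) const
    {
        Response resp;
        process_(req, resp);
        return resp;
    }

private:
    Process process_;
};
```

A lambda, a free function or a bound member function can all be passed as the process, which is the flexibility the design aims for.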

Memory reclamation mechanism
* Every task is automatically reclaimed after its callback. If a task has been created but the user does not want to run it, the user needs to release it through the dismiss method.
* Any data in the task, such as the response of a network request, is recycled together with the task. If the data is still needed, the user can use ``std::move()`` to move it out in the callback.
* SeriesWork and ParallelWork are two kinds of framework objects, which are also recycled after their callbacks.
* This project doesn’t use ``std::shared_ptr`` to manage memory.
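
The lifetime rules above can be modelled with a small self-contained class. This is a sketch, not Workflow code: ``MiniTask`` and its members are hypothetical, and ``start()`` invokes the callback synchronously for brevity. It shows a factory-created object that deletes itself after its callback, data being moved out inside the callback, and ``dismiss()`` for a task that is never run.

```cpp
#include <functional>
#include <string>
#include <utility>

class MiniTask {
public:
    using Callback = std::function<void(MiniTask*)>;

    // Tasks can only be created through the factory function.
    static MiniTask* create(std::string response, Callback cb)
    {
        return new MiniTask(std::move(response), std::move(cb));
    }

    std::string& response() { return response_; }

    // Runs the callback, then reclaims the task. The pointer must not be
    // used after start() returns.
    void start()
    {
        if (callback_)
            callback_(this);
        delete this;
    }

    // Releases a task that was created but will never be started.
    void dismiss() { delete this; }

    // Number of tasks currently alive, exposed only to make the demo checkable.
    static int live_count() { return live_; }

private:
    MiniTask(std::string response, Callback cb)
        : response_(std::move(response)), callback_(std::move(cb)) { ++live_; }
    ~MiniTask() { --live_; }

    std::string response_;
    Callback callback_;
    static int live_;
};

int MiniTask::live_ = 0;
```

A callback such as ``[&kept](MiniTask* t) { kept = std::move(t->response()); }`` keeps the response alive after the task itself has been reclaimed, without any reference counting.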

#### More design documents
To be continued...

## Authors

* **Xie Han** - *[[email protected]](mailto:[email protected])*
* **Wu Jiaxu** - *[[email protected]](mailto:[email protected])*
* **Li Yingxin** - *[[email protected]](mailto:[email protected])*

