Merge pull request #25 from holmes1412/master

Update English version README.md
sogou · Aug 16, 2020 · 7eadd33
2 parents 2611326 + 3bd34a5
commit 7eadd33

Showing 2 changed files with 90 additions and 69 deletions.
diff --git a/README.md b/README.md
@@ -1,3 +1,5 @@
+[English version](README_en.md)
+
 ## Sogou C++ Workflow
 [![license MIT](https://img.shields.io/badge/License-Apache-yellow.svg)](https://git.sogou-inc.com/wujiaxu/Filter/blob/master/LICENSE)
 [![C++](https://img.shields.io/badge/language-c++-red.svg)](https://en.cppreference.com/)

diff --git a/README_en.md b/README_en.md
@@ -1,76 +1,52 @@
 [中文版](README.md)
 
+## Sogou C++ Workflow
 [![license MIT](https://img.shields.io/badge/License-Apache-yellow.svg)](https://git.sogou-inc.com/wujiaxu/Filter/blob/master/LICENSE)
 [![C++](https://img.shields.io/badge/language-c++-red.svg)](https://en.cppreference.com/)
 [![platform](https://img.shields.io/badge/platform-linux%20%7C%20macos-lightgrey.svg)](#%E9%A1%B9%E7%9B%AE%E7%9A%84%E4%B8%80%E4%BA%9B%E8%AE%BE%E8%AE%A1%E7%89%B9%E7%82%B9)
 
-# Sogou C++ Workflow
-#### As the backend C++ programming standard in Sogou, Workflow is an industrial-grade programming engine. 
-#### Main functions and features:
-  * An **asynchronous engine** based on **C++11** ``std::function`` which aims to solve all the **serial, parallel and asynchronous** problems.
-  * As a network framework, it is completely **protocol-agnostic** and directly facing applications.
-    * It can either be used as a Redis **client** or an Http **server**.
-    * Convenient to **customize protocols**, so you can quickly build your own RPC systems.
-      * Sogou RPC is developed based on Sogou Workflow and is open source as an independent project. The project supports srpc, brpc and thrift protocols ([benchmark](https://github.com/holmes1412/sogou-rpc-benchmark)).
-    * Support **SSL** (depends on openssl). Support **TCP, UDP, SCTP** and other common transport layer protocols. Support SSL on **SCTP**. Not support UDP server.
-  * Natively contains a variety of **common Internet protocol** implementations which are used in a unified way.
-    * Currently support **http, redis, mysql** and **kafka** protocols. You can directly access these resources or build **servers** for these protocols.
-    * Highly likely the only C++ full-featured **mysql asynchronous client** on the market.
-    * **DNS** protocol is being developing and currently we use the system library to access DNS.
-  * Powerful feature for **scheduling computing tasks**
-    * Computing task, as well as communication task, can be added into the task flow and they’re scheduled separately by their corresponding scheduler.
-    * You can use it as a parallel programming engine **without** the network features.
-    * Our biggest goal is to **maximize the performance** of every node when the calculation and communication environment is very complex.
-    * Some **common algorithm** implementations are provided, such as parallel sorting and MapReduce.
-    * In fact, all asynchronous processes (such as disk IO, GPU tasks, timers, etc.) **can be scheduled in coordination**.
-      * On the Linux system, the disk IO task is realized through the Linux underlying aio, which is extremely efficient.
-  * Support any task flow with **DAG** structure. However, in most cases, users only need **series-parallel** structure.
-  * Built-in **load balancing** and powerful **service governance** features.
-  * Easily used in conjunction with other asynchronous engines.
-  * **Streaming** communication engine is being developed.
-  * When working as a server, it supports **multi-processes** mode and supports precise **graceful restart**.
-
-#### Building
-  * Support **Linux, macOS, FreeBSD, Windows** and other systems so far. Installing **cmake** is necessary.
-    * Windows version is temporarily released as an independent [branch](https://github.com/sogou/workflow/tree/windows), which uses **iocp** as the basis for asynchronous communication and mean while, keeping the **same external interface**.
-  * As written in C/C++, it requires the users being able to proficiently use C++ programing. It **does not** rely on boost or asio, therefor the compiling speed is extremely fast.
-  * It contains a few C++11 features, so users should be able to use ``std::function`` and ``std::move``.
-  * Theoretically support all CPU architectures and can be compiled and run on **32-bit** or **64-bit arm processors**. Big endian CPU is not tested.
-  * **OpenSSL** is required. If users expect high performance of SSL, OpenSSL 1.1 or higher is strongly recommended.
-  * **No other dependencies**. Several compression libraries such as snappy and lz4  is contained by their unmodified source (required by the Kafka protocol).
-
-#### Some design features
-  * The basic usage is very simple and handy. Some features are designed to greatly reduce the difficulty of programming with general C++ projects.
-    * To **avoid** users to **derive** as much as possible, all user behaviors are wrapped with std::function, for example:
-      * the **callback** after every task ends
-      * the **algorithms** in computing tasks
-      * one server corresponds to one ``std::function``
-    * Trying to avoid complicated memory management, all tasks and frameworks are generated by **factory** classes, and their memory is **recycled automatically**. Which means,
-      * Every task is **automatically deleted after its callback**.
-      * If the users want to keep any data in the task (such as a network reply packet or the result of an algorithm), they need to use ``std::move`` to move it.
-      * We treat memory recycle as a strict and naturally logical mechanism, so we **don’t** use share_ptr.
-    * Avoid using complicated parameter configuration.
-      * Actually we have a lot of **configurable parameters**, though you can use our system **without** feeling the existence of parameters.
-      * If you have specific requirements for program behavior and resource ratio, you can definitely find the corresponding parameter configuration items in order to maximize the performance of you program.
-  * The project adopts a fully asynchronous design and is not transparent to users, which means users need to know that they are writing asynchronous programs.
-    * Thanks to the convenience brought by ``std::function`` and the automatic memory recycling mechanism, we have delicately designed **the simply possible usage of asynchrony** for users.
-    * **No** user-mode threads concepts. On the one hand, performance is considered. On the other hand, we have the concept of computing tasks (threaded tasks) scheduling.
-      * In our design, **computing** is one kind of **asynchronous task**, which has no differences from communication.
-      * Computing tasks are scheduled by **independent thread groups** according to specific algorithms, please note that they **may not** be executed **immediately**.
-      * As we have such computing tasks, user-mode threads become meaningless, and therefore users must understand asynchrony.
-    * Because of the full asynchrony, almost all core calls are **short** and **non-blocking** operations.
-      * That’s why we **don’t** recommend users to **block** their programs in callback or do some complex calculations. However, it acceptable if the logic is quite simple.
-   * Brief summary of the usage:
-    * The users can build the program just like building a **series-parallel** circuit. The circuit can be generated **at the beginning** or **dynamically generated during the program running**.
-    * **We provide various electronic components** for users. For instance, one http request, one GPU matrix multiplication, and one parallel sorting can all be understood as a electronic component.
-    * Every electronic component has its **standard input and output**. At the meantime, every electronic component can be a **complicated circuit**, which has no necessary to be perceived by the users.
-    * For example, an http request may go through **multiple asynchronous processes** such as DNS, redirect, and retry, but the entire processes is just a **component** in the perspective of the users.
-    * Users can easily **define their own** components, including algorithms and some kind of communication.
-      * To implement **stateless protocols** is extremely simple. It may be a little bit complicated when the protocol includes login, library selection, etc., at this time, you can refer to the redis implementation.
-    * Through the powerful Upstream system, complex **service governance** can be realized, such as communication node selection, load balancing, circuit breaker and recovery, master and slave, etc.
-    * **In conclusion, this is an enterprise-level, elegantly designed asynchronous framework which can cover almost all high-performance back-end service requirements.**
-
-#### Tutorials：
+As **Sogou's C++ server engine**, Workflow supports almost all of Sogou's **back-end C++ online services**, including all search services, cloud input method, online advertisements, etc., handling more than **10 billion** requests every day. It is a lightweight, elegantly designed **enterprise-level programming engine** that can satisfy most C++ back-end development requirements.
+
+#### You can use it:
+* To quickly build an **HTTP server**:
+~~~cpp
+#include <stdio.h>
+#include "workflow/WFHttpServer.h"
+
+int main()
+{
+    WFHttpServer server([](WFHttpTask *task) {
+        task->get_resp()->append_output_body("<html>Hello World!</html>");
+    });
+
+    if (server.start(8888) == 0) {  // start server on port 8888
+        getchar(); // press "Enter" to end.
+        server.stop();
+    }
+
+    return 0;
+}
+~~~
+* As a **powerful asynchronous client**. Currently supports ``http``, ``redis``, ``mysql`` and ``kafka`` protocols.
+* To implement a **client/server for a user-defined protocol** and build your own **RPC system**.
+  * Sogou RPC is built on it and is open source as an independent project, which supports the srpc, brpc and thrift protocols ([benchmark](https://github.com/holmes1412/sogou-rpc-benchmark)).
+* To build **asynchronous task flows**. It supports common **series** and **parallel** structures, as well as more complex **DAG** structures.
+* As a **parallel programming tool**. In addition to **network tasks**, it also includes **the scheduling of computing tasks**. All types of tasks can be put into **the same** task flow.
+* As an **asynchronous file IO tool** on ``Linux``, with high performance exceeding that of any system call. Disk IO is also a task.
+* To implement any **high-performance** and **high-concurrency** back-end service with a very complex relationship between computing and communication.
+* To build a **service mesh** system.
+  * The project has built-in **service governance** and **load balancing** features.
+
+#### Compile and run environment
+
+* This project supports ``Linux``, ``macOS``, ``Windows`` and other operating systems.
+  * The ``Windows`` version is temporarily released as an independent branch, using ``iocp`` to implement asynchronous networking. All user interfaces are consistent with the ``Linux`` version.
+* Supports all CPU platforms, including 32-bit or 64-bit ``x86`` processors and big-endian or little-endian ``arm`` processors.
+* Depends on ``OpenSSL``; ``OpenSSL 1.1`` or above is recommended.
+* Uses the ``C++11`` standard, and therefore needs to be compiled with a compiler that supports ``C++11``. Does not rely on ``boost`` or ``asio``.
+* No other dependencies. However, it contains the unmodified source code of several compression libraries such as ``lz4``, ``zstd`` and ``snappy`` (required by the ``Kafka`` protocol).
+
+# Try it!
   * Client
     * [Create your first task：wget](docs/tutorial-01-wget.md)
     * [Implement redis set and get：redis_cli](docs/tutorial-02-redis_cli.md)
@@ -83,6 +59,7 @@
   * Important topics
     * [About error](docs/about-error.md)
     * [About timeout](docs/about-timeout.md)
+    * [About global configuration](docs/about-config.md)
     * [About DNS](docs/about-dns.md)
     * [About exit](docs/about-exit.md)
   * Computing tasks
@@ -103,10 +80,52 @@
   * Built-in protocols
     * [Asynchronous MySQL client：mysql_cli](docs/tutorial-12-mysql_cli.md)
 
+#### System design features
+
+We believe that a typical back-end program consists of the following three parts, which should be developed completely independently.
+* Protocol
+  * In most cases, users use built-in common network protocols, such as http, redis or various rpc.
+  * Users can also easily define their own network protocol: they only need to provide serialization and deserialization functions to define their own client/server.
+* Algorithm
+  * In our design, an algorithm is the symmetrical counterpart of a protocol.
+    * If a protocol call is an rpc, then an algorithm call is an apc (Async Procedure Call).
+  * We have provided some general algorithms, such as sort, merge, psort, reduce, which can be used directly.
+  * Compared with user-defined protocols, user-defined algorithms are much more common. Any complex calculation with clear boundaries should be packaged into an algorithm.
+* Task flow
+  * Task flow is the actual business logic, which puts the protocols and algorithms into the flow graph for use.
+  * The typical task flow is a closed series-parallel graph. Complex business logic may be a non-closed DAG.
+  * The task flow graph can be constructed directly or dynamically generated based on the results of each step. All tasks are executed asynchronously.
+
+Basic task, task factory and complex task
+* Our system contains six basic tasks: communication, file IO, CPU, GPU, timer, and counter.
+* All tasks are generated by the task factory and automatically recycled after callback.
+  * Server task is one kind of special communication task, generated by the framework which calls the task factory, and handed over to the user through the process function.
+  * In most cases, the task generated by the user through the task factory is a complex task, although the user does not need to be aware of this.
+  * For example, an Http request may include many asynchronous processes (DNS, redirection), but for the user, it is just a communication task.
+  * File sorting seems to be an algorithm, but it actually includes many complex interaction processes between file IO and CPU calculation.
+  * If you think of business logic as building circuits with well-designed electronic components, then each electronic component may be a complex circuit.
+
+Asynchrony and encapsulation based on ``C++11 std::function``
+
+* Not based on user mode coroutines. Users need to know that they are writing asynchronous programs.
+* All calls are executed asynchronously, and almost no operation blocks a thread while waiting.
+  * Although we also provide some convenient semi-synchronous interfaces, they are not core features.
+* Please avoid derivation. Try to encapsulate user behavior with ``std::function`` instead, including:
+  * The callback of any task.
+  * Any server process. This conforms to the ``FaaS`` (Function as a Service) idea.
+  * The implementation of an algorithm is simply a ``std::function``, but an algorithm can also be implemented by derivation.
+
+Memory reclamation mechanism
+* Every task will be automatically reclaimed after the callback. If a task has been created but the user does not want to run it, the user needs to release it through the dismiss method.
+* Any data in the task, such as the response of the network request, will also be recycled with the task. If the data is still needed, the user can use ``std::move()`` to move it out before the callback returns.
+* SeriesWork and ParallelWork are two kinds of framework objects, which are also recycled after their callback.
+* This project doesn’t use ``std::shared_ptr`` to manage memory.
+
+#### More design documents
+To be continued...
+
 ## Authors
 
 * **Xie Han** - *[[email protected]](mailto:[email protected])*
 * **Wu Jiaxu** - *[[email protected]](mailto:[email protected])*
 * **Li Yingxin** - *[[email protected]](mailto:[email protected])*
-
-
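
Note on the "Memory reclamation mechanism" section added to README_en.md: the ownership rule it describes (tasks come from a factory, are reclaimed right after their callback, and must be dismissed if they never run) can be illustrated with a small stand-alone sketch. This is not Workflow code. `ToyTask`, `run_and_keep_result` and `create_and_dismiss` are hypothetical names invented for illustration, and the task runs synchronously here, whereas the README describes asynchronous execution.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for a framework task; NOT a Workflow class.
class ToyTask
{
public:
    using Callback = std::function<void (ToyTask *)>;

    // Tasks can only be created through this factory function.
    static ToyTask *create(std::string input, Callback cb)
    {
        return new ToyTask(std::move(input), std::move(cb));
    }

    // Runs synchronously to keep the sketch short (an assumption of this
    // example), then reclaims the task right after the callback returns.
    void start()
    {
        result_ = "echo: " + input_;
        if (callback_)
            callback_(this);
        delete this;
    }

    // Releases a task that was created but will never be started.
    void dismiss() { delete this; }

    std::string& result() { return result_; }

    // Number of ToyTask objects currently alive.
    static int live_count() { return live_; }

private:
    ToyTask(std::string input, Callback cb)
        : input_(std::move(input)), callback_(std::move(cb)) { ++live_; }
    ~ToyTask() { --live_; }   // private: only start() or dismiss() may delete

    std::string input_;
    std::string result_;
    Callback callback_;
    static int live_;
};

int ToyTask::live_ = 0;

// Moves the result out inside the callback, because the task (and its
// data) is gone once start() returns.
std::string run_and_keep_result(const std::string& input)
{
    std::string kept;
    ToyTask *task = ToyTask::create(input, [&kept](ToyTask *t) {
        kept = std::move(t->result());
    });
    task->start();
    assert(ToyTask::live_count() == 0);   // the task has been reclaimed
    return kept;
}

// Creates a task, decides not to run it, and releases it explicitly.
// Returns how many tasks the dismiss() call released.
int create_and_dismiss()
{
    ToyTask *task = ToyTask::create("unused", nullptr);
    int alive_before = ToyTask::live_count();
    task->dismiss();
    return alive_before - ToyTask::live_count();
}
```

Calling `run_and_keep_result("hi")` hands back the string built by the task even though the task object no longer exists when the call returns. The private destructor is a design choice of this sketch: it makes `start()` and `dismiss()` the only ways to end a task's lifetime, which is the discipline the README asks users to follow without `std::shared_ptr`.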