System Design

Must Learn System Design Concepts

Consistent Hashing
CAP Theorem
Load Balancing
Caching
Data Partitioning
Indexes
Proxies
Queues
Replication
SQL vs. NoSQL
Latency vs Throughput

Important System Design Algorithms To Read Before Interview

Trie algorithm
Reverse index
Frugal Streaming
Geo-hashing
Leaky bucket, Token bucket and their variations
Bloom Filters
Operational transformation
Quadtree / Rtree
Lossy Counting
XMPP / Web Sockets uses
HTTP Long polling
Consistent Hashing

Credit - Soumyajit Bhattacharyay Post

Bits And Bytes For Any System

ACID Transaction, ACL, AES, Alerting, Apache Kafka, Asymmetric Encryption, Availability, Availability Zone, Blob Storage, Cache, Cache Eviction Policy, Cache Hit, Cache Miss, CAP Theorem, Certificate Authority, Client, Client-Server Model, Cloud Pub-Sub, Configuration, Consensus Algorithm, Consistent Hashing, Concurrency, Content Delivery Network, Cypher, Database Index, Database Lock, Databases, DDoS Attack, Disk, Distributed File System, DNS, DoS Attack, Etcd, Eventual Consistency, File System, Forward Proxy, Google Cloud Storage, Gossip Protocol, Graph Database, Hadoop, Hashing Function, High Availability, Horizontal Scaling, Hot Spot, HTTP, HTTPS, Idempotent Operation, InfluxDB, IP, IP Address, IP Packet, JSON, Key-Value Store, Latency, Leader Election, Load Balancer, Logging, Man In The Middle Attack, Map Reduce, Memory, Microservice Architecture, MongoDB, Monitoring, Monolithic Architecture, MySQL, Neo4j, Nginx, Nines, Node/Instance/Host, Non-Relational Database, NoSQL Database, Pagination, Paxos & Raft, Zookeeper, YAML, Worker Pool Pattern, Virtual Machine, Vertical Scaling, TLS Handshake, TLS, Time Series Database, Throughput, TCP, Symmetric Encryption, Strong Consistency, Streaming, Stateless, Stateful, SSL Certificate, SQL Database, SQL, Spatial Database, Socket, SLO, SLA, Sharding, SHA, Server-Selection Strategy, Server, S3, Reverse Proxy, Replication, Rendezvous Hashing, Redundancy, Redis, Rate Limiting, Quadtree, Publish/Subscribe Pattern, Prometheus, Process, Postgres, Port, Polling, Persistent Storage, Percentiles, Peer-To-Peer Network

System Design Terminologies

Scalability, Availability, Efficiency, Reliability, Serviceability, Manageability, Extensibility, Client-Server, Protocol, Proxy, Gateway, DNS, Latency, Throughput, Read Heavy, Write Heavy

Best System Design Interview Format Or Flow

Clarifying Questions
Functional Requirements
Non Functional Requirements
Capacity/Storage Estimation
Drawing Constraints
Design Database Schema
Design APIs
Basic Architecture And Data Flow
Components or Functionality Implementation
Trade offs And Corner Cases

Some Capacity Estimation

Component Or Platform	Unit Per Second	Unit Per Day
Server	1000 Request
Google	3M Search Query	5B
Twitter	10K+ Tweets
Internet Traffic	0.11M GB
Youtube Upload

See What's happening at every second in the world at these giant platforms

Availability Industry Measurements Cheatsheet

Availability level	Downtime per year	Downtime per quarter	Downtime per month	Downtime per week	Downtime per day	Downtime per hour
90% ("one nine")	36.5 days	9 days	3 days	16.8 hours	2.4 hours	6 minutes
95%	18.25 days	4.5 days	1.5 days	8.4 hours	1.2 hours	3 minutes
99% ("two nines")	3.65 days	21.6 hours	7.2 hours	1.68 hours	14.4 minutes	36 seconds
99.5%	1.83 days	10.8 hours	3.6 hours	50.4 minutes	7.20 minutes	18 seconds
99.9% ("three nines")	8.76 hours	2.16 hours	43.2 minutes	10.1 minutes	1.44 minutes	3.6 seconds
99.95%	4.38 hours	1.08 hours	21.6 minutes	5.04 minutes	43.2 seconds	1.8 seconds
99.99% ("four nines")	52.6 minutes	12.96 minutes	4.32 minutes	60.5 seconds	8.64 seconds	0.36 seconds
99.999% ("five nines")	5.26 minutes	1.30 minutes	25.9 seconds	6.05 seconds	0.87 seconds	0.04 seconds
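
The rows above follow from one piece of arithmetic: downtime = window length × (1 − availability). The sketch below reproduces it. The window lengths (365-day year, 90-day quarter, 30-day month) are assumptions inferred from the table's values, and `downtime_seconds` is a hypothetical helper name.

```python
# Window lengths in seconds. These are assumptions that appear to match the table:
# a 365-day year, a 90-day quarter and a 30-day month.
WINDOW_SECONDS = {
    "year": 365 * 86_400,
    "quarter": 90 * 86_400,
    "month": 30 * 86_400,
    "week": 7 * 86_400,
    "day": 86_400,
    "hour": 3_600,
}


def downtime_seconds(availability_percent: float, window: str) -> float:
    """Return the downtime budget, in seconds, for one window."""
    unavailable_fraction = 1 - availability_percent / 100
    return WINDOW_SECONDS[window] * unavailable_fraction


for level in (99.0, 99.9, 99.99):
    hours_per_year = downtime_seconds(level, "year") / 3600
    print(f"{level}% -> {hours_per_year:.2f} hours of downtime per year")
```

The same function can be used in an interview to turn an SLA target into a concrete downtime budget for any of the listed windows.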

Concept Technology Implementation

System Design Concept	Technology Used To Implement	Others
Cache	Redis, DynamoDB, Memcached
Message Queue	RabbitMQ, Amazon SQS
NoSQL DB	MongoDB
SQL DB	MySQL
Proxy /Load Balancing	HAProxy, Nginx
Cross Language Service Development	Thrift
Centralized Service For Distributed Systems	Apache Zookeeper
Distributed RESTful Search Engine	Elasticsearch
Streaming	Kafka
Distributed File System	Hadoop
CDN (Content Delivery Network)	Open Connect (Used By Netflix), AWS, Cloudflare, Google Cloud CDN
Key-Value Store	Amazon DynamoDB, Redis, Zookeeper
Column Based DBs	Cassandra, HBase, ScyllaDB
Image/Video Storage, Large Datasets, Time Series Database	Amazon S3, GCP Bucket
Distributed Consensus/ Leader Election	Zookeeper, Etcd
Real Time Communication(Audio, Video)	WebRTC
Peer To Peer Communication	Kraken (Uber)	Kraken, Uber Blog

Important Tools/Software/Methods/command for Software Development

Tool Name	Use Case	Type
Postman	API development lifecycle	Software
Dbeaver	SQL client for all types of relational DBs	Software
Snakeviz	viewer for Python profiling data	Web-Based package tool
OpenAPI	Standards to Define Restful web services	Standard Documentation
Swagger, MkDocs	Software Documentation, API Documentation	Tool
Coala	linting, coding standards, fixing code	package
netcat	reading/writing network connections, port scanning	linux command
ps	list all running processes	linux command
Git	version control system	cli tool
Github, Gitlab, Bitbucket	Remote collaboration dev	web based tool
traceroute (tracert on Windows)	Server Router Hops Path Routing	linux command
Developer Tools Browser	API Analysis	browser built in tools
SSH	secure protocol for communicating with a server over an unsecured network	cli tool
Vim Editor	developer CLI editor	cli tool
Pre-commit-hooks	linting, testing	git cli tool
JUnit, pytest, jest	unit testing (java, python, javascript)	packages
maven, gradle	build automation and project management tool
logging
ngrok	Port Forwarding

Measure Performance of a System

Latency
Throughput
Availability
Consistency
Maintainability
Reliability
Bandwidth

Most used Software Architectures

Layered Architecture
Pipe and Filter
Client Server
Model View Controller
Event Driven Architecture
Microservices Architecture

Article Links

The complete guide to crack the System Design interview
Scalable Web Architecture and Distributed Systems
System Design Interview Questions – Concepts You Should Know
25 Software Design Interview Questions to Crack Any Programming and Technical Interviews
Top 10 System Design Interview Questions and Answers
Getting Started With System Design
How To Crack System Design Round in Interviews
Internet And Protocols IP TCP HTTP
Introduction to architecting systems for scale
Consistent Hashing
Uber System Design
https://www.geeksforgeeks.org/what-is-web-socket-and-how-it-is-different-from-the-http/
https://www.tutorialspoint.com/websockets/index.htm
https://javascript.info/long-polling
https://www.pubnub.com/blog/http-long-polling/
https://hackernoon.com/understand-service-discovery-in-microservice-c323e78f47fd
https://dzone.com/articles/service-discovery-in-a-microservices-architecture

Best Engineering Blogs or Platform To Follow For System Design

HighScalability
The Architecture of Open Source Applications
A Distributed Systems Reading List
Interviewbit System Design
Linkedin Engineering
Hike Blogs

Best Documentations For System Design Or Cheatsheets Or Handful Practicals

Locate Cache in Your Browser
System design primer in Github
AWS Documentation
System Design Cheatsheet
System Design Preparation
Grokking System Design Interview Github
High Availability and Nines Chart
AWS vs Azure vs GCP
Basic Project Ideas For Distributed Systems
Crio Byte System Design Repo
System Design Cheat sheet

Note: This content is under continuous development. It contains terminology and concepts that help when you are thinking about systems.

Content

Long Polling
Websockets
Service Discovery
Heartbeats
SQL vs NoSQL

Long Polling

Web app developers can implement a technique called HTTP long polling, where the client polls the server requesting new information. The server holds the request open until new data is available. Once available, the server responds and sends the new information. When the client receives the new information, it immediately sends another request, and the operation is repeated. This effectively emulates a server push feature. The flow:

A request is sent to the server.
The server doesn’t close the connection until it has a message to send.
When a message appears – the server responds to the request with it.
The browser makes a new request immediately.
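
The flow above can be sketched in a single process. This is a minimal simulation, not a real HTTP server: `LongPollServer`, `poll` and `client_loop` are hypothetical names, and a `threading.Condition` stands in for the held-open HTTP request.

```python
import threading


class LongPollServer:
    """In-memory stand-in for a server that holds requests open."""

    def __init__(self):
        self._messages = []
        self._cond = threading.Condition()

    def publish(self, message):
        with self._cond:
            self._messages.append(message)
            self._cond.notify_all()  # wake every request that is being held

    def poll(self, next_index, timeout=5.0):
        """Block until a message at next_index exists, or until the timeout."""
        with self._cond:
            self._cond.wait_for(lambda: len(self._messages) > next_index, timeout)
            return self._messages[next_index:]  # an empty list means a timeout


def client_loop(server, expected_count):
    received = []
    while len(received) < expected_count:
        # Send a request; the server holds it until a message is available.
        batch = server.poll(next_index=len(received))
        received.extend(batch)
        # Looping again is the "new request immediately" step.
    return received


server = LongPollServer()
result = []
client = threading.Thread(target=lambda: result.extend(client_loop(server, 2)))
client.start()
server.publish("hello")
server.publish("world")
client.join()
print(result)
```

A real deployment would add a request timeout on the client side as well, so that a dropped connection is retried rather than waited on forever.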

WebSockets

WebSocket is a protocol for two-way communication between server and client, which means both parties can send and receive data at the same time over a single connection. The protocol is full duplex from the ground up. WebSockets bring desktop-rich functionality to web browsers and represent a long-awaited evolution in client/server web technology.

WebSocket Uses

Real-time web application
Gaming application
Chat application

Gateway

A gateway is a piece of networking hardware used in telecommunications networks that allows data to flow from one discrete network to another. Gateways are distinct from routers or switches in that they communicate using more than one protocol to connect multiple networks and can operate at any of the seven layers of the Open Systems Interconnection (OSI) model.

A gateway is a network node that connects two networks using different protocols together. While a bridge is used to join two similar types of networks, a gateway is used to join two dissimilar networks.

Gateways can take several forms and perform a variety of tasks:

Web application firewall - filters traffic to and from a web server and looks at application-layer data
Cloud gateway - provides basic protocol translation and simple connectivity to allow the incompatible technologies to communicate transparently
API, SOA or XML gateway - manages traffic flowing into and out of a service, microservices-oriented architecture or an XML-based web service
IoT gateway - aggregates sensor data, translates between sensor protocols, processes sensor data before sending it onward and more
Cloud storage gateway - translates storage requests with various cloud storage service API calls
Media gateway - converts data from the format required for one type of network to the format required for another
Amazon API Gateway - allows a developer to connect non-AWS applications to AWS back-end resources
VoIP trunk gateway - facilitates the use of plain old telephone service (POTS) equipment, such as landline phones and fax machines, with a voice over IP (VoIP) network
Email security gateway - prevents the transmission of emails that break company policy or will transfer information with malicious intent

Service Discovery

Service discovery is the automatic detection of devices and services offered by these devices on a computer network. A service discovery protocol (SDP) is a network protocol that helps accomplish service discovery. Service discovery aims to reduce the configuration efforts from users.

Service discovery requires a common language to allow software agents to make use of one another's services without the need for continuous user intervention.

Service discovery involves 3 parties: service provider, service consumer and service registry.

The service provider registers itself with the service registry when it enters the system and deregisters itself when it leaves.
The service consumer gets the location of a provider from the registry, and then talks to the provider.
The service registry maintains the latest locations of providers.

Complexities to handle:

The service may not deregister itself when it goes away, so the registry hands an invalid address to the consumer. To tackle this, the service provider sends a heartbeat periodically (for example, every 5 seconds). If the provider has not sent a heartbeat for some time, the registry assumes the provider is dead and deregisters it.
Querying the registry before every call to a provider places too much load on the registry and adds unnecessary latency. It is better to keep a copy in the consumer itself.
If the copy is kept in the consumer, how is the consumer notified about changes in providers? There are two ways. 1) The consumer polls for the latest version. Since locations usually do not change frequently, this still works; the drawback is possible downtime between polls. 2) The pub/sub pattern. It delivers location updates more promptly, but it ties up an additional thread in the consumer.
Sending back all provider data may not be necessary. We can keep a global version number for the providers, so the consumer only updates its local copy when the version is incremented.
Single point of failure. If the service registry (e.g. the Redis instance we are using here) is down, all consumers and providers are affected. To alleviate this, we can use a distributed store as the service registry, such as ZooKeeper, etcd, or Consul.
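
The registry behaviour described above (register, heartbeat, expiry, and a global version) can be sketched in memory. This is an illustration under assumptions, not how any particular registry works: `ServiceRegistry`, the 15-second TTL and the injectable `clock` are hypothetical choices made so the example runs deterministically.

```python
import time


class ServiceRegistry:
    """Toy in-memory registry that expires providers without recent heartbeats."""

    def __init__(self, ttl_seconds=15.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._last_seen = {}  # (service, address) -> time of last heartbeat
        self.version = 0      # bumped whenever the set of providers changes

    def register(self, service, address):
        key = (service, address)
        if key not in self._last_seen:
            self.version += 1
        self._last_seen[key] = self._clock()

    heartbeat = register  # a heartbeat simply refreshes the timestamp

    def deregister(self, service, address):
        if self._last_seen.pop((service, address), None) is not None:
            self.version += 1

    def _evict_expired(self):
        now = self._clock()
        expired = [k for k, seen in self._last_seen.items() if now - seen > self._ttl]
        for key in expired:
            del self._last_seen[key]
        if expired:
            self.version += 1

    def lookup(self, service):
        self._evict_expired()
        return sorted(addr for (name, addr) in self._last_seen if name == service)


now = [0.0]  # fake clock so the example does not have to sleep
registry = ServiceRegistry(ttl_seconds=15.0, clock=lambda: now[0])
registry.register("orders", "10.0.0.1:8080")
registry.register("orders", "10.0.0.2:8080")
now[0] = 10.0
registry.heartbeat("orders", "10.0.0.1:8080")  # only the first provider stays alive
now[0] = 20.0
print(registry.lookup("orders"), "version", registry.version)
```

A consumer that caches provider locations can compare its stored version with `registry.version` and refresh its copy only when the number has changed.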

SQL vs NoSQL

When to pick a SQL database?

If you are writing a stock trading, banking, or finance-based app, or you need to store a lot of relationships (for instance, when writing a social networking app like Facebook), then you should pick a relational database. SQL databases store data durably on disk, which is one reason they are often more favourable than other databases for these workloads. Here’s why:

Transactions & Data Consistency

If you are writing software that has anything to do with money or numbers, then transactions, ACID, and data consistency are super important to you. Relational DBs shine when it comes to transactions & data consistency. They comply with the ACID properties, have been around for ages & are battle-tested.
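
As a small illustration of why transactions matter for money, the sketch below uses Python's built-in `sqlite3` module. The `accounts` table, the `transfer` helper and the sample balances are hypothetical; the point is only that a failed transfer leaves no partial update behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, sender, receiver, amount):
    """Move money atomically: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            # This debit violates the CHECK constraint when funds are insufficient.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(transfer(conn, "alice", "bob", 30))   # enough funds
print(transfer(conn, "alice", "bob", 500))  # insufficient funds, rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)
```

The second transfer credits the receiver before the debit fails, yet the rollback undoes that credit, which is the atomicity part of ACID.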

Storing Relationships

If your data has a lot of relationships (for example: which of your friends live in a particular city? Which of your friends already ate at the restaurant you plan to visit today?), there is nothing better than a relational database for storing this kind of data.

Relational databases are built to store relationships. They have been tried & tested & are used by big guns in the industry like Facebook as the main user-facing database.

Popular relational databases:

MySQL
Microsoft SQL Server
PostgreSQL
MariaDB

When to pick a NoSQL database

Here are a few reasons why you want to pick a NoSQL database:

Handling A Large Number Of Read Write Operations

Look towards NoSQL databases when you need to scale fast. For example, when there are a large number of read-write operations on your website and when dealing with a large amount of data, NoSQL databases fit best in these scenarios. Since they have the ability to add nodes on the fly, they can handle more concurrent traffic and large amounts of data with minimal latency.

Running data analytics

NoSQL databases also fit best for data analytics use cases, where we have to deal with an influx of massive amounts of data.

Types Of NoSQL DBs:

Key-Value Store (Cache Implementation, Hashmap Architecture)
Document Based DBs (Schema Undefined, R/W Heavy)
Column Based DBs (Events Store, Read Heavy)
Search Based DBs (Image/Video Store, Time Series Database)

Popular NoSQL databases:

MongoDB
Redis
Cassandra
HBase

Latency

The time data takes to travel from one place to another.

Data Size	Medium	Latency
1 MB	Memory	250 µs
1 MB	SSD	1,000 µs
1 MB	1 Gbps Network	10,000 µs
1 MB	HDD	20,000 µs
1 packet	Country To Country	150,000 µs

Latency Comparison Numbers (~2012)

----------------------------------

L1 cache reference ......................... 0.5 ns
Branch mispredict ............................ 5 ns
L2 cache reference ........................... 7 ns
Mutex lock/unlock ........................... 25 ns
Main memory reference ...................... 100 ns             
Compress 1K bytes with Zippy ............. 3,000 ns  =   3 µs
Send 2K bytes over 1 Gbps network ....... 20,000 ns  =  20 µs
SSD random read ........................ 150,000 ns  = 150 µs
Read 1 MB sequentially from memory ..... 250,000 ns  = 250 µs
Round trip within same datacenter ...... 500,000 ns  = 0.5 ms
Read 1 MB sequentially from SSD* ..... 1,000,000 ns  =   1 ms
Disk seek ........................... 10,000,000 ns  =  10 ms
Read 1 MB sequentially from disk .... 20,000,000 ns  =  20 ms
Send packet CA->Netherlands->CA .... 150,000,000 ns  = 150 ms

Notes
-----
1 ns = 10^-9 seconds
1 us = 10^-6 seconds = 1,000 ns
1 ms = 10^-3 seconds = 1,000 us = 1,000,000 ns

Credit
------
By Jeff Dean: http://research.google.com/people/jeff/
Originally by Peter Norvig: http://norvig.com/21-days.html#answers

Contributions
-------------
'Humanized' comparison: https://gist.github.com/hellerbarde/2843375
Visual comparison chart: http://i.imgur.com/k0t1e.png
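
The per-megabyte figures above are handy for back-of-the-envelope estimates. The sketch below scales them linearly to larger reads. It assumes 1 GB = 1000 MB and that sequential read time grows in proportion to size, which ignores seeks and caching; `sequential_read_ms` is a hypothetical helper name.

```python
# Time to read 1 MB sequentially, in nanoseconds, taken from the list above.
READ_1MB_NS = {
    "memory": 250_000,
    "ssd": 1_000_000,
    "disk": 20_000_000,
}


def sequential_read_ms(megabytes: float, medium: str) -> float:
    """Estimate a sequential read time in milliseconds by linear scaling."""
    return megabytes * READ_1MB_NS[medium] / 1_000_000


for medium in READ_1MB_NS:
    print(f"1 GB from {medium}: {sequential_read_ms(1000, medium):,.0f} ms")
```

Because the source numbers date from around 2012, the results are best read as relative orders of magnitude rather than as current hardware figures.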

Throughput

The amount of work a system component completes in a given amount of time. Example:

Number of API calls served per unit Time

Must Read for Latency vs Throughput vs Bandwidth

Throughput of a system increases with bandwidth
Throughput of a system decreases as latency increases (for a fixed number of concurrent requests)

Throughput ∝ Bandwidth
Throughput ∝ 1 / Latency
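
These two relationships can be turned into a rough upper-bound estimate. The sketch below assumes a fixed number of requests in flight (Little's law: throughput = concurrency / latency) and a fixed response size. The function names and sample numbers are hypothetical.

```python
def latency_limited_rps(concurrent_requests: int, latency_seconds: float) -> float:
    """Little's law: requests completed per second for a fixed concurrency."""
    return concurrent_requests / latency_seconds


def bandwidth_limited_rps(bandwidth_bytes_per_s: float, response_bytes: float) -> float:
    """Requests per second the link can carry for a given response size."""
    return bandwidth_bytes_per_s / response_bytes


def estimated_max_rps(concurrent_requests, latency_seconds,
                      bandwidth_bytes_per_s, response_bytes):
    """The tighter of the two limits bounds the achievable throughput."""
    return min(
        latency_limited_rps(concurrent_requests, latency_seconds),
        bandwidth_limited_rps(bandwidth_bytes_per_s, response_bytes),
    )


# Assumed example: 100 in-flight requests, 250 ms latency,
# a 1 Gbps link (125,000,000 bytes/s) and 500 KB responses.
print(latency_limited_rps(100, 0.25))
print(bandwidth_limited_rps(125_000_000, 500_000))
print(estimated_max_rps(100, 0.25, 125_000_000, 500_000))
```

In this example the link, not the latency, is the bottleneck, so halving latency alone would not raise throughput.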

Cache

A cache keeps frequently used data close to where it is needed so it can be served quickly, which reduces network calls and recomputation.

Reducing Repetitive Network Calls
Avoiding Recomputation
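
Both benefits can be seen with Python's standard `functools.lru_cache`. In this sketch `fetch_profile` is a hypothetical function standing in for an expensive network call or computation, and `call_count` only exists to show how often the real work runs.

```python
from functools import lru_cache

call_count = 0  # how many times the expensive work actually ran


@lru_cache(maxsize=128)  # least-recently-used entries are evicted beyond 128 keys
def fetch_profile(user_id: int) -> str:
    """Stand-in for a slow network call or heavy computation."""
    global call_count
    call_count += 1
    return f"user-{user_id}"


fetch_profile(1)  # cache miss: the work runs
fetch_profile(1)  # cache hit: served from memory
fetch_profile(2)  # cache miss: a different key

info = fetch_profile.cache_info()
print(call_count, info.hits, info.misses)
```

The `maxsize` argument is the eviction policy knob here; a shared cache such as Redis or Memcached plays the same role across multiple servers.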

Must Read

Load Balancing

in progress Load Balancing With Practical Implementation

Consistent Hashing

in progress Consistent Hashing

Database Index

in progress Must Read

Bloom Filters

in progress

Applications

Avoid One Hit Wonders
Check Username Available or Not
Weak password detection.
Internet Cache Protocol.
Safe browsing in Google Chrome.
Wallet synchronization in Bitcoin.
Hash based IP Traceback.
Cyber security like virus scanning.
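
For the username-availability use case listed above, a Bloom filter can be sketched as follows. This is a toy version under assumptions: the bit-array size, the number of hashes and the salted SHA-256 hashing scheme are illustrative choices, and `BloomFilter` is a hypothetical class name.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for readability

    def _positions(self, item: str):
        # Derive several positions by salting one hash function with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for position in self._positions(item):
            self.bits[position] = 1

    def might_contain(self, item: str) -> bool:
        return all(self.bits[position] for position in self._positions(item))


taken_usernames = BloomFilter()
for name in ("asha", "ravi", "meera"):
    taken_usernames.add(name)

print(taken_usernames.might_contain("asha"))  # True: it was added
```

A `False` answer means the username is definitely free, while a `True` answer means it is probably taken and should be confirmed against the database.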

Sharding

Types Of Sharding

Vertical Sharding: Splitting a large database by columns into multiple smaller databases, known as shards, each holding a subset of the columns.
Horizontal Sharding: Splitting a large database by rows into multiple smaller databases, known as shards, each with the same schema but a distinct set of rows.

Horizontal sharding is effective when queries tend to return a subset of rows that are often grouped together. For example, queries that filter data based on short date ranges are ideal for horizontal sharding since the date range will necessarily limit querying to only a subset of the servers.

Vertical sharding is effective when queries tend to return only a subset of columns of the data. For example, if some queries request only names, and others request only addresses, then the names and addresses can be sharded onto separate servers.
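
The two examples above (date ranges for horizontal sharding, names versus addresses for vertical sharding) can be sketched with plain Python dictionaries. The sample rows and the helper names `shard_horizontally` and `shard_vertically` are hypothetical.

```python
from collections import defaultdict

users = [
    {"id": 1, "name": "Asha", "address": "Pune", "signup": "2023-01-15"},
    {"id": 2, "name": "Ravi", "address": "Delhi", "signup": "2023-01-20"},
    {"id": 3, "name": "Meera", "address": "Kochi", "signup": "2023-02-03"},
]


def shard_horizontally(rows, shard_key):
    """Same schema everywhere; each shard holds a distinct subset of rows."""
    shards = defaultdict(list)
    for row in rows:
        shards[shard_key(row)].append(row)
    return dict(shards)


def shard_vertically(rows, column_groups, primary_key="id"):
    """Every shard holds all rows, but only its own subset of columns."""
    return {
        shard_name: [
            {primary_key: row[primary_key], **{col: row[col] for col in columns}}
            for row in rows
        ]
        for shard_name, columns in column_groups.items()
    }


by_month = shard_horizontally(users, lambda row: row["signup"][:7])
by_column = shard_vertically(users, {"names": ["name"], "addresses": ["address"]})
print(sorted(by_month))
print(by_column["names"])
```

A query for January signups touches only the `2023-01` shard, and a query for names touches only the `names` shard, which mirrors the two cases described above.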

Sharding vs Partitioning

Very Good Read On Sharding Strategies

Sharding Strategy	Database	Pros	Cons
Consistent Hash Sharding	DynamoDB, Cassandra	Ideal For Scalable Workloads, Adding And Removing Nodes Is Easy	Range Queries Are Inefficient
Simple Hashing, Modulo-Based Sharding (Algorithmic Sharding)	Memcached, Redis	No Need For A DB Load Balancer	Inefficient When Adding Or Removing Shard Nodes
Range Based Sharding	Google Spanner, HBase	Efficient For Range Queries	Database Warming, Hotspots
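
The first two rows of the table can be compared directly by counting how many keys change owner when a node is added. The sketch below is a simplified illustration: the MD5-based `stable_hash`, the 100 virtual nodes per server and the node names are assumptions, and `HashRing` is a hypothetical class rather than any database's actual implementation.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """Deterministic hash (Python's built-in hash() is randomized per process)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


def modulo_owner(key: str, nodes: list) -> str:
    return nodes[stable_hash(key) % len(nodes)]


class HashRing:
    """Consistent hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        self._ring = sorted(
            (stable_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def owner(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around at the end.
        index = bisect.bisect_right(self._points, stable_hash(key)) % len(self._ring)
        return self._ring[index][1]


def moved_fraction(keys, owner_before, owner_after) -> float:
    return sum(owner_before(k) != owner_after(k) for k in keys) / len(keys)


keys = [f"key-{i}" for i in range(5000)]
old_nodes = ["node-a", "node-b", "node-c", "node-d"]
new_nodes = old_nodes + ["node-e"]

modulo_moved = moved_fraction(
    keys, lambda k: modulo_owner(k, old_nodes), lambda k: modulo_owner(k, new_nodes)
)
old_ring, new_ring = HashRing(old_nodes), HashRing(new_nodes)
ring_moved = moved_fraction(keys, old_ring.owner, new_ring.owner)

print(f"modulo sharding moved {modulo_moved:.0%} of keys")
print(f"consistent hashing moved {ring_moved:.0%} of keys")
```

With modulo sharding most keys are remapped when the node count changes, whereas the ring only hands the new node its own share, which is why the table lists adding and removing nodes as easy for consistent hashing.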

Leader Election

in progress Must Read

Split Brain Problem

in progress

Distributed Consensus

in progress

Important Terms

Paxos
Raft
Zookeeper
Etcd
2PC
3PC
MVCC (Multi Version Concurrency Control)

API (Application Programming Interface)

Advantages

Communication
Abstraction
Platform Agnostic

Examples

Private APIs
Public APIs
Web APIs
SDK/Library APIs

Standards

RPC
SOAP
REST

Some API Strategies For Different Purposes

API Security (OAuth2)
Rate Limiting
Throttling
Session Login
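
Rate limiting is commonly implemented with the token bucket algorithm mentioned earlier in this document. The sketch below is one minimal way to write it. `TokenBucket`, its parameters and the injectable `clock` are hypothetical choices, and the fake clock exists only to make the example deterministic.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `refill_rate` tokens per second."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self._clock = clock
        self._tokens = float(capacity)  # start with a full bucket
        self._last = clock()

    def allow(self, cost=1.0) -> bool:
        now = self._clock()
        # Add the tokens earned since the last call, capped at capacity.
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False  # the caller would typically respond with HTTP 429


now = [0.0]  # fake clock so the example does not have to sleep
bucket = TokenBucket(capacity=3, refill_rate=1, clock=lambda: now[0])
first_burst = [bucket.allow() for _ in range(4)]
now[0] = 2.0  # two seconds later, two tokens have been refilled
second_burst = [bucket.allow() for _ in range(3)]
print(first_burst, second_burst)
```

In a real API one bucket is usually kept per client key (for example, per API token or IP address), so one noisy client cannot exhaust the budget of others.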

REST

REST stands for Representational State Transfer. It is an architectural style that client and server both agree on for exchanging data between them.

Rest Guidelines

Client-Server Architecture
Cacheable
Layered
Stateless
Uniform Interface
Code on Demand

Format

protocol://domain_name/resource_path

Most used protocols: HTTP, HTTPS

Types Of HTTP Method

GET
POST
DELETE
PUT (update)

Example Of Path Parameter and Query parameter

Path Parameter https://domain.io/resource/path/:ID_or_KEY

Query Parameter https://domain.io/resource/path?range_or_condition=value
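
To make the difference concrete, the sketch below separates path segments from query parameters using Python's standard `urllib.parse`. The URL, the `orders` resource and the `split_request_url` helper are hypothetical examples.

```python
from urllib.parse import parse_qs, urlparse


def split_request_url(url: str):
    """Return (path segments, query parameters) for a URL."""
    parts = urlparse(url)
    segments = [segment for segment in parts.path.split("/") if segment]
    # parse_qs returns a list per key; keep the first value for simplicity.
    query = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return segments, query


segments, query = split_request_url(
    "https://domain.io/orders/42?status=shipped&limit=10"
)
print(segments)  # the last segment is the path parameter (the resource ID)
print(query)     # filters and options arrive as query parameters
```

A path parameter identifies one specific resource, while query parameters filter or shape the result set.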

HTTP Status Codes

Status Code Format	Summary
1XX	Informational
2XX	Request Succeeded
3XX	Redirection Of Resource
4XX	Request Failed Because Of A Client Error
5XX	Request Failed Because Of A Server Error

Message Queues

A message broker service which provides asynchronous communication between producer and consumer services.

Example:

Domino's Pizza Service Queue
Ticket Window Queues
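
The pizza-counter analogy can be sketched in one process with Python's standard `queue` and `threading` modules. This is only an illustration of asynchronous producer/consumer hand-off, not a real broker such as RabbitMQ or SQS; the names `orders`, `kitchen_worker` and `STOP` are hypothetical.

```python
import queue
import threading

orders = queue.Queue()  # the "message queue" between the counter and the kitchen
delivered = []
STOP = object()         # sentinel that tells the consumer to shut down


def kitchen_worker():
    """Consumer: processes orders one at a time, in arrival order."""
    while True:
        order = orders.get()
        if order is STOP:
            break
        delivered.append(f"{order} ready")


worker = threading.Thread(target=kitchen_worker)
worker.start()

# Producer: placing an order returns immediately; it does not wait for cooking.
for pizza in ("margherita", "farmhouse", "veggie"):
    orders.put(pizza)

orders.put(STOP)
worker.join()
print(delivered)
```

Because the producer never waits for the consumer, a slow kitchen only makes the queue longer instead of blocking the counter.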

Note: Suggestions are welcome.

API Gateway vs Service Mesh

A service mesh’s primary purpose is to manage internal service-to-service communication, while an API Gateway is primarily meant for external client-to-service communication.

Popular apps: Zuul, Amazon API Gateway.

Authentication and Authorization

S. No.	Authentication	Authorization
1	In the authentication process, the identity of the user is checked before granting access to the system.	In the authorization process, the user's authority to access resources is checked.
2	In the authentication process, users or persons are verified.	In this process, users or persons are validated.
3	It is done before the authorization process.	It is done after the authentication process.
4	It usually needs the user's login details.	It needs the user's privilege or security levels.
5	Authentication determines whether the person is a user or not.	Authorization determines what permissions the user has.

Credit - GFG Post
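
The distinction in the table can be sketched as two separate checks. This is a minimal illustration using only the standard library: the `USERS` and `PERMISSIONS` dictionaries, the role names and the PBKDF2 iteration count are assumptions, not a recommended production setup.

```python
import hashlib
import hmac
import os


def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


_salt = os.urandom(16)
USERS = {  # hypothetical user store
    "asha": {"salt": _salt, "hash": hash_password("s3cret", _salt), "role": "viewer"},
}
PERMISSIONS = {"viewer": {"read"}, "admin": {"read", "write"}}  # hypothetical roles


def authenticate(username: str, password: str) -> bool:
    """Authentication: is this person who they claim to be?"""
    user = USERS.get(username)
    if user is None:
        return False
    candidate = hash_password(password, user["salt"])
    return hmac.compare_digest(user["hash"], candidate)  # constant-time comparison


def authorize(username: str, action: str) -> bool:
    """Authorization: is this (already authenticated) user allowed to do this?"""
    return action in PERMISSIONS[USERS[username]["role"]]


print(authenticate("asha", "s3cret"), authorize("asha", "read"), authorize("asha", "write"))
```

Authentication runs first and answers "who are you?", and only then does authorization answer "what may you do?".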
