- Types
- Horizontal partitioning (Sharding): Split data by row.
- Vertical partitioning: Split data by column.
- Functional partitioning (Federation): Split data into multiple databases by business boundary.
- Pros
- Less read and write traffic to a single database/shard.
- If one database/shard is down, the other databases/shards will still be operational.
- More writes in parallel to increase throughput.
- Cons and solutions
| Cons | Solution |
| --- | --- |
| Join operations are complex. | Denormalize the database so that queries can be performed in a single table (see Denormalization). |
| Rebalancing is complex. | |
| Hot spots (the celebrity problem). | Allocate a partition for each celebrity; each such partition might even require further partitioning. |
- When to use
- When to split data into multiple databases
- When concurrency is high (because the number of connections a single database can handle is limited).
- When the data needs to be archived (create a new database for storing archived data).
- When to split data into multiple tables
- When the data size is huge.
- Key mod partitioning
- Concept
- Use the key modulo the number of partitions to decide which partition stores the data.
- Example
- If we want to split the data across 128 tables, we take the key modulo 128. If a key's modulo is 2, we store the record in `table_002` (see the sketch below).
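- A minimal sketch of this mod-based routing. The `table_%03d` naming follows the example above; using CRC32 to turn a string key into an integer is an assumption made here, since any deterministic hash works:

```python
import zlib

NUM_TABLES = 128  # total number of partitioned tables

def route_table(key: str) -> str:
    """Pick a table by taking the key's hash modulo the table count."""
    # CRC32 turns a string key into a stable integer (an assumption;
    # a numeric key could be used directly).
    n = zlib.crc32(key.encode("utf-8")) % NUM_TABLES
    return f"table_{n:03d}"

# Every call with the same key routes to the same table.
print(route_table("order-42"))  # one of table_000 .. table_127
```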
- Key-range partitioning
- Concept
- Assign a range of keys to each partition.
- Pros
- Range queries are easy.
- Cons
- Certain access patterns can lead to hot spots (a partition with a disproportionately high load).
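- A minimal sketch of key-range routing, showing why range queries stay cheap. The boundary values and partition names are illustrative assumptions:

```python
import bisect

# Upper bounds (exclusive) of the key range owned by each partition;
# the last partition takes everything >= "t".
BOUNDS = ["g", "n", "t"]
PARTITIONS = ["p0", "p1", "p2", "p3"]

def route(key: str) -> str:
    """Find the partition whose key range contains the key."""
    return PARTITIONS[bisect.bisect_right(BOUNDS, key)]

def partitions_for_range(lo: str, hi: str) -> list:
    """A range query only touches the partitions covering [lo, hi]."""
    return PARTITIONS[bisect.bisect_right(BOUNDS, lo):bisect.bisect_right(BOUNDS, hi) + 1]

print(route("alice"))                  # p0
print(partitions_for_range("m", "p"))  # ['p1', 'p2'] -- a contiguous subset
```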
- Hash-range partitioning
- Concept
- Assign a range of hashes to each partition.
- Pros
- Keys are distributed fairly among the partitions.
- Cons
- Lose the ability to do efficient range queries.
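- A minimal sketch of hash-range routing, assuming a 32-bit hash space split evenly across four partitions (the bounds and partition names are made up for illustration):

```python
import bisect
import hashlib

# Upper bounds (exclusive) of the 32-bit hash range owned by each partition.
HASH_BOUNDS = [2**30, 2**31, 3 * 2**30]
PARTITIONS = ["p0", "p1", "p2", "p3"]

def route(key: str) -> str:
    """Hash the key first, then find the partition owning that hash."""
    h = int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")
    return PARTITIONS[bisect.bisect_right(HASH_BOUNDS, h)]

# Adjacent keys hash far apart: load evens out, but a range scan over
# keys now has to hit every partition.
for key in ("user1", "user2", "user3"):
    print(key, route(key))
```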
- Consistent hashing
- Concept
- Consistent hashing is a distributed hashing scheme that operates independently of the number of servers or objects in a distributed hash table by assigning them a position on an abstract circle, or hash ring.
- Pros
- When adding or removing servers, the number of keys that need to be relocated is minimized.
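- A bare-bones sketch of a hash ring (without the virtual nodes a production ring would add) to show why few keys move when a server joins:

```python
import bisect
import hashlib

def ring_pos(value: str) -> int:
    """Hash a string to a position on the ring (32-bit space)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:4], "big")

class HashRing:
    def __init__(self, servers):
        self.ring = sorted((ring_pos(s), s) for s in servers)
        self.positions = [pos for pos, _ in self.ring]

    def lookup(self, key: str) -> str:
        # Walk clockwise from the key's position to the next server.
        i = bisect.bisect_right(self.positions, ring_pos(key)) % len(self.ring)
        return self.ring[i][1]

keys = [f"key-{i}" for i in range(1000)]
ring_before = HashRing(["db0", "db1", "db2"])
ring_after = HashRing(["db0", "db1", "db2", "db3"])  # one server added
# Only keys falling between db3's ring position and its predecessor move.
moved = sum(ring_before.lookup(k) != ring_after.lookup(k) for k in keys)
print(f"{moved} of {len(keys)} keys relocated")
```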
- Dynamic lookup service
- Concepts
- Place a lookup service in front of the sharded databases.
- The lookup service knows the current partitioning scheme.
- The lookup service keeps a map of each entity and which database shard it is stored on.
- When we need to query the database, query the lookup service first to figure out which shard has the data we want.
- Pros
- Loose coupling: any change to the partitioning scheme is encapsulated by the lookup service and does not impact the application.
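- A minimal sketch of the lookup-service idea. The in-memory dict and the round-robin placement policy are illustrative assumptions; a real service would persist its map and handle rebalancing:

```python
class ShardLookupService:
    """Maps each entity to the database shard that stores it."""

    def __init__(self, shards):
        self.shards = shards
        self.location = {}  # entity id -> shard name

    def shard_for(self, entity_id: str) -> str:
        if entity_id not in self.location:
            # Place new entities round-robin; the application never
            # sees this policy, so it can change without app changes.
            self.location[entity_id] = self.shards[len(self.location) % len(self.shards)]
        return self.location[entity_id]

lookup = ShardLookupService(["shard-a", "shard-b"])
shard = lookup.shard_for("user:42")          # ask the lookup service first
print(f"-- then run the query against {shard}")
```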
- Query on the sharding key
- Use the sharding key to locate the right shard (table) directly.
- Query on the non-sharding columns
- Solution 1: Add the non-sharding column as part of the sharding key.
- Example: For the `order` table, the sharding key is `orderID`. But if we frequently query the `order` table by `userID`, we can add `userID` as part of the sharding key, so the final sharding key becomes `orderID + userID` (see the sketch below).
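- A hedged sketch of routing with that composite key. Deriving the table index from the `userID` component alone is an assumption made here, so that the frequent "orders by user" query can still reach a single table; the table count carries over from the earlier mod example:

```python
import zlib

NUM_TABLES = 128

def order_table(user_id: str) -> str:
    """Locate an order table via the userID part of the composite
    sharding key (orderID + userID)."""
    return f"order_{zlib.crc32(user_id.encode('utf-8')) % NUM_TABLES:03d}"

# A query by (orderID, userID) and a query by userID alone both
# route to the same single table for that user.
print(order_table("u-7"))
```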
- Solution 2: Double-write (write the same data into 2 different shards): one shard can be queried by some non-sharding columns, and the other shard can be queried by other non-sharding columns.
- Example: For the `order` table, the sharding key of one shard is `orderID + userID` (so it can be queried by `orderID` and `userID`), and the sharding key of the other shard is `sellerID` (so it can be queried by `sellerID`); see the sketch below.
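- A minimal sketch of double-writing the same order into two shard layouts. The table naming, the hash choice, and the `insert` stand-in are assumptions; a real system would also keep the two copies consistent (e.g. via a transaction or asynchronous replication):

```python
import zlib

NUM_TABLES = 128

def buyer_side_table(order_id: str, user_id: str) -> str:
    """Copy 1: sharded by orderID + userID, serving buyer queries."""
    return f"order_buyer_{zlib.crc32(f'{order_id}:{user_id}'.encode()) % NUM_TABLES:03d}"

def seller_side_table(seller_id: str) -> str:
    """Copy 2: sharded by sellerID, serving seller queries."""
    return f"order_seller_{zlib.crc32(seller_id.encode()) % NUM_TABLES:03d}"

def insert(table: str, row: dict) -> None:
    print(f"INSERT INTO {table}: {row}")  # stand-in for a real DB write

def write_order(order: dict) -> None:
    # Double-write: the same row lands in both layouts, so each
    # access pattern has a copy it can route to directly.
    insert(buyer_side_table(order["orderID"], order["userID"]), order)
    insert(seller_side_table(order["sellerID"]), order)

write_order({"orderID": "o-1", "userID": "u-7", "sellerID": "s-3"})
```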
- Solution 3: No sharding; create a better index with better hardware resources.