- Types
- Horizontal partitioning (Sharding): Split data by row.
- Vertical partitioning: Split data by column.
- Functional partitioning (Federation): Split data into multiple databases by business boundary.
- Pros
- Less read and write traffic to a single database/shard.
- If one database/shard is down, the other databases/shards will still be operational.
- More writes in parallel to increase throughput.
- Cons and solutions
| Cons | Solution |
| --- | --- |
| Join operations are complex. | Denormalize the database so that queries can be performed in a single table (see Denormalization). |
| Rebalancing is complex. | |
| Hot spots (the celebrity problem). | Allocate a partition for each celebrity; each such partition might even require further partitioning. |
- When to use
- When to split data into multiple databases
- When concurrency is high (because the number of connections a single database can handle is limited).
- When the data needs to be archived (create a new database for storing archived data).
- When to split data into multiple tables
- When the data size is huge.
- Key mod partitioning
- Concept
- Use the key modulo the number of partitions to decide which partition stores the data.
- Example
- If we want to split the data across 128 tables, we take the key modulo 128. If a key's modulo is 2, we store the record in `table_002` (see the sketch below).
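- A minimal sketch of this mod-based routing. The `table_%03d` naming follows the example above; using CRC32 to turn a string key into an integer is an assumption made here, since any deterministic hash works:

```python
import zlib

NUM_TABLES = 128  # total number of partitioned tables

def route_table(key: str) -> str:
    """Pick a table by taking the key's hash modulo the table count."""
    # CRC32 turns a string key into a stable integer (an assumption;
    # a numeric key could be used directly).
    n = zlib.crc32(key.encode("utf-8")) % NUM_TABLES
    return f"table_{n:03d}"

# Every call with the same key routes to the same table.
print(route_table("order-42"))  # one of table_000 .. table_127
```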
- Key-range partitioning
- Concept
- Assign a range of keys to each partition.
- Pros
- Range queries are easy.
- Cons
- Certain access patterns can lead to hot spots (a partition with a disproportionately high load).
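- A minimal sketch of key-range routing, showing why range queries stay cheap. The boundary values and partition names are illustrative assumptions:

```python
import bisect

# Upper bounds (exclusive) of the key range owned by each partition;
# the last partition takes everything >= "t".
BOUNDS = ["g", "n", "t"]
PARTITIONS = ["p0", "p1", "p2", "p3"]

def route(key: str) -> str:
    """Find the partition whose key range contains the key."""
    return PARTITIONS[bisect.bisect_right(BOUNDS, key)]

def partitions_for_range(lo: str, hi: str) -> list:
    """A range query only touches the partitions covering [lo, hi]."""
    return PARTITIONS[bisect.bisect_right(BOUNDS, lo):bisect.bisect_right(BOUNDS, hi) + 1]

print(route("alice"))                  # p0
print(partitions_for_range("m", "p"))  # ['p1', 'p2'] -- a contiguous subset
```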
- Hash-range partitioning
- Concept
- Assign a range of hashes to each partition.
- Pros
- Keys are distributed fairly among the partitions.
- Cons
- Lose the ability to do efficient range queries.
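- A minimal sketch of hash-range routing, assuming a 32-bit hash space split evenly across four partitions (the bounds and partition names are made up for illustration):

```python
import bisect
import hashlib

# Upper bounds (exclusive) of the 32-bit hash range owned by each partition.
HASH_BOUNDS = [2**30, 2**31, 3 * 2**30]
PARTITIONS = ["p0", "p1", "p2", "p3"]

def route(key: str) -> str:
    """Hash the key first, then find the partition owning that hash."""
    h = int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")
    return PARTITIONS[bisect.bisect_right(HASH_BOUNDS, h)]

# Adjacent keys hash far apart: load evens out, but a range scan over
# keys now has to hit every partition.
for key in ("user1", "user2", "user3"):
    print(key, route(key))
```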
- Consistent hashing
- Concept
- Consistent hashing is a distributed hashing scheme that operates independently of the number of servers or objects in a distributed hash table by assigning them a position on an abstract circle, or hash ring.
- Pros
- When adding or removing servers, the number of keys that need to be relocated is minimized.
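- A bare-bones sketch of a hash ring (without the virtual nodes a production ring would add) to show why few keys move when a server joins:

```python
import bisect
import hashlib

def ring_pos(value: str) -> int:
    """Hash a string to a position on the ring (32-bit space)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:4], "big")

class HashRing:
    def __init__(self, servers):
        self.ring = sorted((ring_pos(s), s) for s in servers)
        self.positions = [pos for pos, _ in self.ring]

    def lookup(self, key: str) -> str:
        # Walk clockwise from the key's position to the next server.
        i = bisect.bisect_right(self.positions, ring_pos(key)) % len(self.ring)
        return self.ring[i][1]

keys = [f"key-{i}" for i in range(1000)]
ring_before = HashRing(["db0", "db1", "db2"])
ring_after = HashRing(["db0", "db1", "db2", "db3"])  # one server added
# Only keys falling between db3's ring position and its predecessor move.
moved = sum(ring_before.lookup(k) != ring_after.lookup(k) for k in keys)
print(f"{moved} of {len(keys)} keys relocated")
```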
- Dynamic lookup service
- Concepts
- Place a lookup service in front of the sharded databases.
- The lookup service knows the current partitioning scheme.
- The lookup service keeps a map of each entity and which database shard it is stored on.
- When we need to query the database, query the lookup service first to figure out which shard has the data we want.
- Pros
- Loose coupling: any change to the partitioning scheme is encapsulated by the lookup service and does not impact the application.
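- A minimal sketch of the lookup-service idea. The in-memory dict and the round-robin placement policy are illustrative assumptions; a real service would persist its map and handle rebalancing:

```python
class ShardLookupService:
    """Maps each entity to the database shard that stores it."""

    def __init__(self, shards):
        self.shards = shards
        self.location = {}  # entity id -> shard name

    def shard_for(self, entity_id: str) -> str:
        if entity_id not in self.location:
            # Place new entities round-robin; the application never
            # sees this policy, so it can change without app changes.
            self.location[entity_id] = self.shards[len(self.location) % len(self.shards)]
        return self.location[entity_id]

lookup = ShardLookupService(["shard-a", "shard-b"])
shard = lookup.shard_for("user:42")          # ask the lookup service first
print(f"-- then run the query against {shard}")
```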
- Query on the sharding key
- Use the sharding key to locate the right shard (table) directly.
- Query on the non-sharding columns
- Solution 1: Add the non-sharding column as part of the sharding key.
- Example: For the `order` table, the sharding key is `orderID`. But if we frequently query the `order` table by `userID`, we can add `userID` as part of the sharding key, so the final sharding key becomes `orderID + userID` (see the sketch below).
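- A hedged sketch of routing with that composite key. Deriving the table index from the `userID` component alone is an assumption made here, so that the frequent "orders by user" query can still reach a single table; the table count carries over from the earlier mod example:

```python
import zlib

NUM_TABLES = 128

def order_table(user_id: str) -> str:
    """Locate an order table via the userID part of the composite
    sharding key (orderID + userID)."""
    return f"order_{zlib.crc32(user_id.encode('utf-8')) % NUM_TABLES:03d}"

# A query by (orderID, userID) and a query by userID alone both
# route to the same single table for that user.
print(order_table("u-7"))
```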
- Solution 2: Double-write (write the same data into 2 different shards): one shard can be queried by some non-sharding columns, and the other shard can be queried by other non-sharding columns.
- Example: For the `order` table, the sharding key of one shard is `orderID + userID` (so it can be queried by `orderID` and `userID`), and the sharding key of the other shard is `sellerID` (so it can be queried by `sellerID`); see the sketch below.
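- A minimal sketch of double-writing the same order into two shard layouts. The table naming, the hash choice, and the `insert` stand-in are assumptions; a real system would also keep the two copies consistent (e.g. via a transaction or asynchronous replication):

```python
import zlib

NUM_TABLES = 128

def buyer_side_table(order_id: str, user_id: str) -> str:
    """Copy 1: sharded by orderID + userID, serving buyer queries."""
    return f"order_buyer_{zlib.crc32(f'{order_id}:{user_id}'.encode()) % NUM_TABLES:03d}"

def seller_side_table(seller_id: str) -> str:
    """Copy 2: sharded by sellerID, serving seller queries."""
    return f"order_seller_{zlib.crc32(seller_id.encode()) % NUM_TABLES:03d}"

def insert(table: str, row: dict) -> None:
    print(f"INSERT INTO {table}: {row}")  # stand-in for a real DB write

def write_order(order: dict) -> None:
    # Double-write: the same row lands in both layouts, so each
    # access pattern has a copy it can route to directly.
    insert(buyer_side_table(order["orderID"], order["userID"]), order)
    insert(seller_side_table(order["sellerID"]), order)

write_order({"orderID": "o-1", "userID": "u-7", "sellerID": "s-3"})
```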
- Solution 3: No sharding; create a better index with better hardware resources.