Table of Contents
References
- Book "Designing Data-Intensive Applications"
- Chinese edition: 《数据密集型应用系统设计》
Thinking About Data Systems
Reliability ( 可靠性 )
- Hardware Faults
- Software Errors
- Human Errors
- How Important Is Reliability?
Scalability ( 可伸缩性 )
- Describing Load
- Describing Performance
- Approaches for Coping with Load
Maintainability ( 可维护性 )
- Operability: Making Life Easy for Operations ( 可运维性 )
- Simplicity: Managing Complexity ( 简单性 )
- Evolvability: Making Change Easy ( 可演化性 )
Relational Model Versus Document Model ( 关系模型 / 文档模型 )
- The Birth of NoSQL
- The Object-Relational Mismatch
- Many-to-One and Many-to-Many Relationships
- Are Document Databases Repeating History?
- Relational Versus Document Databases Today
Query Languages for Data
- Declarative Queries on the Web ( 声明式查询 )
- MapReduce Querying
Graph-Like Data Models ( 图模型 )
- Property Graphs
- The Cypher Query Language
- Graph Queries in SQL
- Triple-Stores and SPARQL
- The Foundation: Datalog
Data Structures That Power Your Database
- Hash Indexes
- SSTables and LSM-Trees ( 日志结构合并树 )
- B-Trees
- Comparing B-Trees and LSM-Trees
- Other Indexing Structures
Transaction Processing or Analytics?
- Data Warehousing ( 数据仓库 )
- Stars and Snowflakes: Schemas for Analytics
Column-Oriented Storage ( 列式存储 )
- Column Compression
- Sort Order in Column Storage
- Writing to Column-Oriented Storage
- Aggregation: Data Cubes and Materialized Views
Formats for Encoding Data
- Language-Specific Formats
- JSON, XML, and Binary Variants
- Thrift and Protocol Buffers
- Avro
- The Merits of Schemas
Modes of Dataflow
- Dataflow Through Databases
- Dataflow Through Services: REST and RPC
- Message-Passing Dataflow
Leaders and Followers
- Synchronous Versus Asynchronous Replication
- Setting Up New Followers
- Handling Node Outages
- Implementation of Replication Logs
Problems with Replication Lag
- Reading Your Own Writes
- Monotonic Reads
- Consistent Prefix Reads
- Solutions for Replication Lag
Multi-Leader Replication
- Use Cases for Multi-Leader Replication
- Handling Write Conflicts
- Multi-Leader Replication Topologies
Leaderless Replication
- Writing to the Database When a Node Is Down
- Limitations of Quorum Consistency
- Sloppy Quorums and Hinted Handoff
- Detecting Concurrent Writes
Partitioning and Replication
Partitioning of Key-Value Data
- Partitioning by Key Range
- Partitioning by Hash of Key
- Skewed Workloads and Relieving Hot Spots
Partitioning and Secondary Indexes
- Partitioning Secondary Indexes by Document
- Partitioning Secondary Indexes by Term
Rebalancing Partitions
- Strategies for Rebalancing
- Operations: Automatic or Manual Rebalancing
Request Routing
- Parallel Query Execution
The Slippery Concept of a Transaction
- The Meaning of ACID
- Single-Object and Multi-Object Operations
Weak Isolation Levels
- Read Committed
- Snapshot Isolation and Repeatable Read
- Preventing Lost Updates
- Write Skew and Phantoms
Serializability
- Actual Serial Execution
- Two-Phase Locking (2PL)
- Serializable Snapshot Isolation (SSI)
Faults and Partial Failures
- Cloud Computing and Supercomputing
Unreliable Networks
- Network Faults in Practice
- Detecting Faults
- Timeouts and Unbounded Delays
- Synchronous Versus Asynchronous Networks
Unreliable Clocks
- Monotonic Versus Time-of-Day Clocks
- Clock Synchronization and Accuracy
- Relying on Synchronized Clocks
- Process Pauses
Knowledge, Truth, and Lies
- The Truth Is Defined by the Majority
- Byzantine Faults
- System Model and Reality
Consistency Guarantees
Linearizability
- What Makes a System Linearizable?
- Relying on Linearizability
- Implementing Linearizable Systems
- The Cost of Linearizability
Ordering Guarantees
- Ordering and Causality
- Sequence Number Ordering
- Total Order Broadcast
Distributed Transactions and Consensus
- Atomic Commit and Two-Phase Commit (2PC)
- Distributed Transactions in Practice
- Fault-Tolerant Consensus
- Membership and Coordination Services
Batch Processing with Unix Tools
- Simple Log Analysis
- The Unix Philosophy
MapReduce and Distributed Filesystems
- MapReduce Job Execution
- Reduce-Side Joins and Grouping
- Map-Side Joins
- The Output of Batch Workflows
- Comparing Hadoop to Distributed Databases
Beyond MapReduce
- Materialization of Intermediate State
- Graphs and Iterative Processing
- High-Level APIs and Languages
Transmitting Event Streams
- Messaging Systems
- Partitioned Logs
Databases and Streams
- Keeping Systems in Sync
- Change Data Capture
- Event Sourcing
- State, Streams, and Immutability
Processing Streams
- Uses of Stream Processing
- Reasoning About Time
- Stream Joins
- Fault Tolerance
Data Integration
- Combining Specialized Tools by Deriving Data
- Batch and Stream Processing
Unbundling Databases
- Composing Data Storage Technologies
- Designing Applications Around Dataflow
- Observing Derived State
Aiming for Correctness
- The End-to-End Argument for Databases
- Enforcing Constraints
- Timeliness and Integrity
- Trust, but Verify
Doing the Right Thing
- Predictive Analytics
- Privacy and Tracking
- Summary
- Glossary
- Reader's Afterword (读者跋)