Designing Data-Intensive Applications. The Big Ideas Behind Reliable, Scalable and Maintainable Syst

PrefaceWho Should Read This Book?Scope of This BookOutline of This BookReferences and Further ReadingFoundations of Data SystemsReliable, Scalable, and Maintainable ApplicationsThinking About Data SystemsReliabilityHardware FaultsSoftware ErrorsHuman ErrorsHow Important Is Reliability?ScalabilityDescribing LoadDescribing PerformanceApproaches for Coping with LoadMaintainabilityOperability: Making Life Easy for OperationsSimplicity: Managing ComplexityEvolvability: Making Change EasySummaryReferencesData Models and Query LanguagesRelational Model Versus Document ModelThe Birth of NoSQLThe Object-Relational MismatchMany-to-One and Many-to-Many RelationshipsOrganizations and schools as entitiesRecommendationsAre Document Databases Repeating History?The network modelThe relational modelComparison to document databasesRelational Versus Document Databases TodayWhich data model leads to simpler application code?Schema flexibility in the document modelData locality for queriesConvergence of document and relational databasesQuery Languages for DataDeclarative Queries on the WebMapReduce QueryingGraph-Like Data ModelsProperty GraphsThe Cypher Query LanguageGraph Queries in SQLTriple-Stores and SPARQLThe semantic webThe RDF data modelThe SPARQL query languageGraph Databases Compared to the Network ModelThe Foundation: DatalogSummaryReferencesStorage and RetrievalData Structures That Power Your DatabaseHash IndexesSSTables and LSM-TreesConstructing and maintaining SSTablesMaking an LSM-tree out of SSTablesPerformance optimizationsB-TreesB-tree optimizationsComparing B-Trees and LSM-TreesDownsides of LSM-treesOther Indexing StructuresStoring values within the indexMulti-column indexesFull-text search and fuzzy indexesKeeping everything in memoryTransaction Processing or Analytics?Data WarehousingStars and Snowflakes: Schemas for AnalyticsColumn-Oriented StorageColumn CompressionColumn-oriented storage and column familiesMemory bandwidth and vectorized processingSort Order in Column StorageSeveral different sort ordersWriting to Column-Oriented StorageAggregation: Data Cubes and Materialized ViewsSummaryReferencesEncoding and EvolutionFormats for Encoding DataTerminology clashLanguage-Specific FormatsJSON, XML, and Binary VariantsBinary encodingThrift and Protocol BuffersField tags and schema evolutionDatatypes and schema evolutionAvroThe writer's schema and the reader's schemaSchema evolution rulesBut what is the writer's schema?Dynamically generated schemasCode generation and dynamically typed languagesThe Merits of SchemasModes of DataflowDataflow Through DatabasesDifferent values written at different timesArchival storageDataflow Through Services: REST and RPCWeb servicesCurrent directions for RPCData encoding and evolution for RPCMessage-Passing DataflowMessage brokersDistributed actor frameworksSummaryReferencesDistributed DataReplicationLeaders and FollowersSynchronous Versus Asynchronous ReplicationResearch on ReplicationSetting Up New FollowersHandling Node OutagesFollower failure: Catch-up recoveryLeader failure: FailoverImplementation of Replication LogsStatement-based replicationWrite-ahead log (WAL) shippingLogical (row-based) log replicationTrigger-based replicationProblems with Replication LagReading Your Own WritesMonotonic ReadsConsistent Prefix ReadsSolutions for Replication LagMulti-Leader ReplicationUse Cases for Multi-Leader ReplicationMulti-datacenter operationClients with offline operationCollaborative editingHandling Write ConflictsSynchronous versus asynchronous conflict detectionConflict avoidanceConverging toward a consistent stateCustom conflict resolution logicWhat is a conflict?Multi-Leader Replication TopologiesLeaderless ReplicationWriting to the Database When a Node Is DownRead repair and anti-entropyQuorums for reading and writingLimitations of Quorum ConsistencyMonitoring stalenessSloppy Quorums and Hinted HandoffMulti-datacenter operationDetecting Concurrent WritesLast write wins (discarding concurrent writes)The "happens-before" relationship and concurrencyConcurrency, Time, and RelativityCapturing the happens-before relationshipMerging concurrently written valuesVersion vectorsVersion vectors and vector clocksSummaryReferencesPartitioningPartitioning and ReplicationPartitioning of Key-Value DataPartitioning by Key RangePartitioning by Hash of KeySkewed Workloads and Relieving Hot SpotsPartitioning and Secondary IndexesPartitioning Secondary Indexes by DocumentPartitioning Secondary Indexes by TermRebalancing PartitionsStrategies for RebalancingHow not to do it: hash mod NFixed number of partitionsDynamic partitioningPartitioning proportionally to nodesOperations: Automatic or Manual RebalancingRequest RoutingParallel Query ExecutionSummaryReferencesTransactionsThe Slippery Concept of a TransactionThe Meaning of ACIDAtomicityConsistencyIsolationDurabilitySingle-Object and Multi-Object OperationsSingle-object writesThe need for multi-object transactionsHandling errors and abortsWeak Isolation LevelsRead CommittedNo dirty readsNo dirty writesImplementing read committedSnapshot Isolation and Repeatable ReadImplementing snapshot isolationVisibility rules for observing a consistent snapshotIndexes and snapshot isolationRepeatable read and naming confusionPreventing Lost UpdatesAtomic write operationsExplicit lockingAutomatically detecting lost updatesCompare-and-setConflict resolution and replicationWrite Skew and PhantomsCharacterizing write skewMore examples of write skewPhantoms causing write skewMaterializing conflictsSerializabilityActual Serial ExecutionEncapsulating transactions in stored proceduresPros and cons of stored proceduresPartitioningSummary of serial executionTwo-Phase Locking (2PL)Implementation of two-phase lockingPerformance of two-phase lockingPredicate locksIndex-range locksSerializable Snapshot Isolation (SSI)Pessimistic versus optimistic concurrency controlDecisions based on an outdated premiseDetecting stale MVCC readsDetecting writes that affect prior readsPerformance of serializable snapshot isolationSummaryReferencesThe Trouble with Distributed SystemsFaults and Partial FailuresCloud Computing and SupercomputingUnreliable NetworksNetwork Faults in PracticeNetwork partitionsDetecting FaultsTimeouts and Unbounded DelaysNetwork congestion and queueingSynchronous Versus Asynchronous NetworksCan we not simply make network delays predictable?Latency and Resource UtilizationUnreliable ClocksMonotonic Versus Time-of-Day ClocksTime-of-day clocksMonotonic clocksClock Synchronization and AccuracyRelying on Synchronized ClocksTimestamps for ordering eventsClock readings have a confidence intervalSynchronized clocks for global snapshotsProcess PausesResponse time guaranteesIs real-time really real?Limiting the impact of garbage collectionKnowledge, Truth, and LiesThe Truth Is Defined by the MajorityThe leader and the lockFencing tokensByzantine FaultsThe Byzantine Generals ProblemWeak forms of lyingSystem Model and RealityCorrectness of an algorithmSafety and livenessMapping system models to the real worldSummaryReferencesConsistency and ConsensusConsistency GuaranteesLinearizabilityWhat Makes a System Linearizable?Relying on LinearizabilityLocking and leader electionConstraints and uniqueness guaranteesCross-channel timing dependenciesImplementing Linearizable SystemsLinearizability and quorumsThe Cost of LinearizabilityThe CAP theoremLinearizability and network delaysOrdering GuaranteesOrdering and CausalityThe causal order is not a total orderLinearizability is stronger than causal consistencyCapturing causal dependenciesSequence Number OrderingNoncausal sequence number generatorsLamport timestampsTimestamp ordering is not sufficientTotal Order BroadcastScope of ordering guaranteeUsing total order broadcastImplementing total order broadcast using linearizable storageDistributed Transactions and ConsensusAtomic Commit and Two-Phase Commit (2PC)From single-node to distributed atomic commitIntroduction to two-phase commitDon't confuse 2PC and 2PLA system of promisesCoordinator failureThree-phase commitDistributed Transactions in PracticeExactly-once message processingXA transactionsHolding locks while in doubtRecovering from coordinator failureLimitations of distributed transactionsFault-Tolerant ConsensusConsensus algorithms and total order broadcastSingle-leader replication and consensusEpoch numbering and quorumsLimitations of consensusMembership and Coordination ServicesAllocating work to nodesService discoveryMembership servicesSummaryReferencesDerived DataBatch ProcessingBatch Processing with Unix ToolsSimple Log AnalysisChain of commands versus custom programSorting versus in-memory aggregationThe Unix PhilosophyA uniform interfaceSeparation of logic and wiringTransparency and experimentationMapReduce and Distributed FilesystemsMapReduce Job ExecutionDistributed execution of MapReduceMapReduce workflowsReduce-Side Joins and GroupingExample: analysis of user activity eventsSort-mergejoinsBringing related data together in the same placeGROUP BYHandling skewMap-Side JoinsBroadcast hash joinsPartitioned hash joinsMap-side merge joinsMapReduce workflows with map-side joinsThe Output of Batch WorkflowsBuilding search indexesKey-value stores as batch process outputPhilosophy of batch process outputsComparing Hadoop to Distributed DatabasesDiversity of storageDiversity of processing modelsDesigning for frequent faultsBeyond MapReduceMaterialization of Intermediate StateDataflow enginesFault toleranceDiscussion of materializationGraphs and Iterative ProcessingThe Pregel processing modelFault toleranceParallel executionHigh-Level APIs and LanguagesThe move toward declarative query languagesSpecialization for different domainsSummaryReferencesStream ProcessingTransmitting Event StreamsMessaging SystemsDirect messaging from producers to consumersMessage brokersMessage brokers compared to databasesMultiple consumersAcknowledgments and redeliveryPartitioned LogsUsing logs for message storageLogs compared to traditional messagingConsumer offsetsDisk space usageWhen consumers cannot keep up with producersReplaying old messagesDatabases and StreamsKeeping Systems in SyncChange Data CaptureImplementing change data captureInitial snapshotLog compactionAPI support for change streamsEvent SourcingDeriving current state from the event logCommands and eventsState, Streams, and ImmutabilityAdvantages of immutable eventsDeriving several views from the same event logConcurrency controlLimitations of immutabilityProcessing StreamsUses of Stream ProcessingComplex event processingStream analyticsMaintaining materialized viewsSearch on streamsMessage passing and RPCReasoning About TimeEvent time versus processing timeKnowing when you're readyWhose clock are you using, anyway?Types of windowsStream JoinsStream-stream join (window join)Stream-table join (stream enrichment)Table-table join (materialized view maintenance)Time-dependence of joinsFault ToleranceMicrobatching and checkpointingAtomic commit revisitedIdempotenceRebuilding state after a failureSummaryReferencesThe Future of Data SystemsData IntegrationCombining Specialized Tools by Deriving DataReasoning about dataflowsDerived data versus distributed transactionsThe limits of total orderingOrdering events to capture causalityBatch and Stream ProcessingMaintaining derived stateReprocessing data for application evolutionThe lambda architectureUnifying batch and stream processingUnbundling DatabasesComposing Data Storage TechnologiesCreating an indexThe meta-database of everythingMaking unbundling workUnbundled versus integrated systemsWhat's missing?Designing Applications Around DataflowApplication code as a derivation functionSeparation of application code and stateDataflow: Interplay between state changes and application codeStream processors and servicesObserving Derived StateMaterialized views and cachingStateful, offline-capable clientsPushing state changes to clientsEnd-to-end event streamsReads are events tooMulti-partition data processingAiming for CorrectnessThe End-to-End Argument for DatabasesExactly-once execution of an operationDuplicate suppressionOperation identifiersThe end-to-end argumentApplying end-to-end thinking in data systemsEnforcing ConstraintsUniqueness constraints require consensusUniqueness in log-based messagingMulti-partition request processingTimeliness and IntegrityCorrectness of dataflow systemsLoosely interpreted constraintsCoordination-avoiding data systemsTrust, but VerifyMaintaining integrity in the face of software bugsDon't just blindly trust what they promiseA culture of verificationDesigning for auditabilityThe end-to-end argument againTools for auditable data systemsDoing the Right ThingPredictive AnalyticsBias and discriminationResponsibility and accountabilityFeedback loopsPrivacy and TrackingSurveillanceConsent and freedom of choicePrivacy and use of dataData as assets and powerRemembering the Industrial RevolutionLegislation and self-regulationSummaryReferences
Next >