|Metrics for availability & performance; SLAs. A brief review of two important theorems in distributed systems, FLP and CAP, and what they mean for real systems.
Linearizability, Consistency, Serializability - Types of consistency: weak, eventual, strong, causal and FIFO. Lock protocols vs. leases. Problems of transactional systems. Highly available transactions between data centers.
Time - Uses of real and logical time for conflict resolution and maintaining causal histories. Real and vector clocks, version vectors. Facebook’s Cassandra database. Flexible consistency. Google TrueTime: real clock time using atomic clocks & interval arithmetic.
Fault Tolerance Patterns - A taxonomy of patterns (Architectural, Failure Detection, Error Recovery, Error Mitigation and Fault Treatment).
Study scenarios that are used at scale. Taxonomy; differences between database replication and distributed system replication. Transactional replication vs. state machine replication. Primary/backup. Sync/async and atomicity guarantees. Case studies: Riak, Microsoft Azure, Amazon SimpleDB, Google Datastore, Sinfonia.
Preventing data divergence - Consensus algorithms and state machine replication. Basic Paxos: issues and variety of implementations. Raft consensus algorithm: a vast improvement on Paxos. Chain replication.
Dynamic membership changes - What defines a cluster, and how services and data are migrated to new machines. This section is about managed membership changes.
Distributed Coordination - A look at how Yahoo and Google coordinate services using low-level services. Case study: ZooKeeper. Used by Yahoo, it provides services like publish/subscribe, locks, hierarchical naming and events on state changes.
Storage - The challenges and economies at scale of magnetic disks, SSDs and in-memory systems. Latency, failure rates, MTBFs and power analyses.
Monitoring - Tracing and logging at scale
|Distributed Systems: Concepts and Design, 5th Edition by George Coulouris, Jean Dollimore, Tim Kindberg, Gordon Blair; Publisher: Addison-Wesley; ISBN: 978-0132143011.
Distributed Systems: An Algorithmic Approach by Sukumar Ghosh; Publisher: Chapman and Hall/CRC; ISBN: 978-1584885641.
Introduction to Reliable and Secure Distributed Programming by Christian Cachin, Rachid Guerraoui, Luis Rodrigues; Publisher: Springer (February 23, 2011); ISBN: 978-3642152610.