Distributed systems: overview.

* Why do we need to distribute a system? To increase system capacity and performance (e.g., shard data across multiple replicas and access them in parallel), to increase fault tolerance (e.g., replicate data across multiple sites), to improve response time (e.g., fetch from multiple replicas and return the fastest response), and so on.

* We will first study systems in which functionality/data is replicated across multiple replicas for fault tolerance. We will later study systems where data is partitioned across multiple replicas for parallel performance gains.

* Understand the terms: fault tolerance (the ability to function correctly and not lose data in the face of link and node failures) and availability (the fraction of time the system is up and running, even in the case of failures). A common technique to provide fault tolerance and high availability is replication.

* A common way to think about multiple replicas is as replicated state machines. That is, each replica runs the state machine of the system. Every input that the system receives is sent to all replicas, and each replica updates its state machine. If all replicas start at the same state and see the same inputs in the same order, then they will all stay at the same state. For example, if the system is a distributed key-value store, then the state machine is the key-value store, and get/put operations are the inputs. If all replicas see the same get/put operations, then the key-value store will be the same across all of them (a minimal sketch of this idea appears at the end of these notes). What is the alternative to replicated state machines? Periodically transfer the entire state of the system (not just the inputs) to all replicas, e.g., transfer the entire key-value database. This can get cumbersome, so the replicated state machine abstraction is the most common way in which fault-tolerant systems are built.

* What is the challenge in replication? We need to ensure consistency across replicas. Understand the high-level definition of consistency: if I write a variable at one replica (or put a key-value pair) and read it from another replica (or get the key), I should see the latest value written. Or, if I get a key from two different replicas, I should get the same value, irrespective of which replica I go to.

* There are different models of consistency, as described below, each successive one more relaxed than the previous one. In all the examples below, assume multiple clients are issuing read/get or write/put operations to a distributed system.

* Strict consistency: the distributed system behaves exactly like a single system. That is, every write is instantaneously visible across all replicas, and every get returns the latest put value without any time lag. In general, this is hard to achieve, because there is a non-zero communication delay across the network, and it takes time to propagate information to all replicas.

* Atomic consistency or linearizability: all operations are executed in the same order across all replicas. Further, if an operation Y was started after operation X completed (according to some global clock), then operation Y should see the result of operation X. For example:
    Client 0: x = 0
    Client 1: x = 1, then x = 2
    Client 2: read x
  Client 2 reads x after client 1 finishes writing x = 1 but before it finishes the operation x = 2. Then, when client 2 reads x, a linearizable system must return the value 1. Because the write x = 2 is happening concurrently with the read, its value need not be reflected in client 2's read.
* Sequential consistency: all operations are executed in the same order across all replicas (without the extra real-time ordering constraint of linearizability). In the previous example, it is OK to have executed the order x = 1, x = 0 at all replicas, and to return the value 0 when client 2 reads x. Why does this happen? Maybe the write from client 0 reached the replicas after the write from client 1. As long as all replicas execute operations in the same order, this is considered alright in a sequentially consistent system.

* The above three models are considered strong consistency models. There are other, weaker consistency models.

* Causal consistency: if two operations X and Y are causally connected, i.e., if X causes Y, then X will appear before Y at every replica. Other operations can appear in different orders at different replicas. For example, if a client reads a variable and then does a write, then this read and write will appear in the same order across replicas. Other operations from other clients can come in any order.

* Eventual consistency: this is a very weak model, which says that a read will reflect the value of a write eventually, after an arbitrary delay.

* What about shared memory in a single system? A single-core machine without any reordering of instructions can provide atomic consistency. Modern multicore systems with reordering optimizations may provide weaker consistency guarantees.

* In general, implementing a stronger consistency model means more work to be done by the system. A distributed system providing a strong consistency guarantee will return a response to the client only after it is sure that replication has been performed in a manner such that consistency can be guaranteed. What if there are network or node failures and the system cannot replicate properly?

* For example, suppose the distributed system has two replicas. Each replica receives get/put requests from clients, and each replica performs the get/put at the other one as well before it returns a response to the client, so that both replicas are always consistent with each other. Suppose the replicas have stored (k, v0) initially. At a later time, the network develops a partition and the replicas cannot reach each other. Now, a client sends put(k, v1) to replica 1, but replica 1 cannot contact replica 2. Can replica 1 execute the put on its state machine and return a response to the client? What happens if a client does a get from the other replica? The latest put won't be reflected and the system won't be consistent (replica 2 may return the stale value v0). Therefore, in case of a network partition, replica 1 cannot (should not) return a response to the client; that is, the system becomes unavailable. However, if the system chooses to give up the consistency guarantee (replica 1 will convey the latest value to replica 2 at a later time, and until then it is OK for replica 2 to return old/stale results), then replica 1 can reply to the client's request (see the sketch at the end of these notes).

* What is the above example telling us? It is impossible to achieve the following three properties together: strong consistency (atomic or sequential consistency), availability, and tolerance to network partitions. This is called the CAP theorem (pick any two of consistency, availability, and partition tolerance; you cannot have all three).
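To make the replicated state machine bullet above concrete, here is a minimal Python sketch of a replicated key-value store; the class and operation names are illustrative assumptions, not the API of any particular system. Each replica applies the same stream of get/put inputs in the same order, so all replicas end up in the same state.

```python
# A minimal sketch (illustrative names only) of a key-value store built as a
# replicated state machine: every replica applies the same inputs in the same
# order, so every replica ends up in the same state.

class KVStateMachine:
    """The state machine: an in-memory key-value store."""
    def __init__(self):
        self.store = {}

    def apply(self, op):
        """Apply one input operation (a get or a put) and return its result."""
        kind, key, *rest = op
        if kind == "put":
            self.store[key] = rest[0]
            return "ok"
        if kind == "get":
            return self.store.get(key)
        raise ValueError(f"unknown operation: {kind}")


# All replicas start in the same (empty) state.
replicas = [KVStateMachine() for _ in range(3)]

# Every input the system receives is delivered to all replicas in the same order.
inputs = [("put", "k", "v0"), ("put", "k", "v1"), ("get", "k")]
for op in inputs:
    results = [replica.apply(op) for replica in replicas]
    print(op, "->", results)

# Same starting state + same inputs in the same order => same final state.
assert all(replica.store == replicas[0].store for replica in replicas)
```

The hard part in a real system is delivering the same inputs in the same order to every replica despite node and link failures; that is exactly where the consistency questions above come in.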
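And here is a hypothetical sketch of the two-replica partition example: when a replica cannot reach its peer, it must either reject the put (keeping consistency but giving up availability) or apply the put locally and acknowledge it (keeping availability but letting the other replica serve the stale value v0). All names below are made up for illustration.

```python
# A hypothetical sketch of the two-replica example: during a network partition
# a replica must choose between consistency (reject the put, be unavailable)
# and availability (accept the put locally, let the peer serve stale data).
# Replica, put, get, peer_reachable, etc. are illustrative names only.

class PartitionError(Exception):
    """Raised when a consistency-preferring replica cannot reach its peer."""

class Replica:
    def __init__(self, name, prefer_consistency=True):
        self.name = name
        self.store = {"k": "v0"}      # both replicas initially store (k, v0)
        self.peer = None              # set once both replicas exist
        self.peer_reachable = True    # becomes False when the network partitions
        self.prefer_consistency = prefer_consistency

    def put(self, key, value):
        if self.peer_reachable:
            # Normal case: replicate to the peer before acknowledging.
            self.peer.store[key] = value
        elif self.prefer_consistency:
            # Choose consistency: refuse the write, i.e., become unavailable.
            raise PartitionError(f"{self.name}: peer unreachable, rejecting put")
        # Choose availability (or replication succeeded): apply locally and
        # acknowledge; the stale peer will be synced after the partition heals.
        self.store[key] = value
        return "ok"

    def get(self, key):
        return self.store.get(key)


r1, r2 = Replica("replica1"), Replica("replica2")
r1.peer, r2.peer = r2, r1
r1.peer_reachable = r2.peer_reachable = False     # the network partitions

try:
    r1.put("k", "v1")                 # consistency choice: raises, unavailable
except PartitionError as err:
    print(err)

r1.prefer_consistency = False
print(r1.put("k", "v1"))              # availability choice: returns "ok"
print(r2.get("k"))                    # replica 2 still returns the stale "v0"
```

This is the CAP trade-off in miniature: during the partition the system can be consistent but unavailable, or available but inconsistent, not both.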