Presentation on theme: "Principles and Paradigms Consistency and Replication"— Presentation transcript:
1. Distributed Systems: Principles and Paradigms, Chapter 06: Consistency and Replication
2. Consistency & Replication
- Introduction (what's it all about)
- Data-centric consistency models
- Client-centric consistency models
- Distribution protocols
- Consistency protocols
- Examples
3. Replication
- What kind of things do we replicate in a distributed system? Data and servers.
- Why do we replicate things? To increase reliability and performance.
- What is the main problem in providing replication? Keeping replicas consistent!
4. Shared Objects
- Problem: if objects (or data) are shared, we need to do something about concurrent accesses to guarantee state consistency.
5. Concurrency Control (1/2)
- Solution (a): the shared object itself handles concurrent invocations.
- Solution (b): the system in which the object resides is responsible.
6. Concurrency Control (2/2)
- Problem: how do we manage replicated shared data objects?
- Solution (a): objects are replication-aware; an object-specific replication protocol is used for replica management.
- Solution (b): the distributed system is responsible for replica management.
7. Performance and Scalability
- Main issue: to keep replicas consistent, we generally need to ensure that all conflicting operations are done in the same order everywhere.
- Conflicting operations (from the world of transactions):
  - Read-write conflict: a read operation and a write operation act concurrently.
  - Write-write conflict: two concurrent write operations.
- Guaranteeing a global ordering on conflicting operations may be costly, downgrading scalability.
- Solution: weaken the consistency requirements so that, hopefully, global synchronization can be avoided.
8. Weakening Consistency Requirements
- What does it mean to "weaken consistency requirements"?
  - Relax the requirement that updates need to be executed as atomic operations.
  - Do not require global synchronization.
  - Copies may not always be the same everywhere.
- To what extent can consistency be weakened?
  - Depends highly on the access and update patterns of the replicated data.
  - Depends on the use of the replicated data (i.e., the application).
9. Data-Centric Consistency Models (1/2)
- Consistency model: a contract between a (distributed) data store and processes, in which the data store specifies precisely what the results of read and write operations are in the presence of concurrency.
- A data store is a distributed collection of storages accessible to clients.
10. Data-Centric Consistency Models (2/2)
- Strong consistency models: operations on shared data are synchronized (models that do not use explicit synchronization operations):
  - Strict consistency (related to absolute global time)
  - Linearizability (atomicity)
  - Sequential consistency (what we are used to: serializability)
  - Causal consistency (maintains only causal relations)
  - FIFO consistency (maintains only individual ordering)
- Weak consistency models: synchronization occurs only when shared data is locked and unlocked (models with explicit synchronization operations):
  - General weak consistency
  - Release consistency
  - Entry consistency
- Observation: the weaker the consistency model, the easier it is to build a scalable solution.
11. Strict Consistency (1/2)
- Any read on a shared data item x returns the value stored by the most recent write operation on x.
- Observation: it does not really make sense to talk about "the most recent" write in a distributed environment.
- Notation (all data items are assumed to be initialized to NIL):
  - W(x)a: value a is written to x.
  - R(x)a: reading x returns the value a.
- The behavior shown in Figure (a) is correct for strict consistency; the behavior shown in Figure (b) is not.
12. Strict Consistency (2/2)
- Strict consistency is what you get in the normal sequential case, where your program does not interfere with any other program.
- When a data store is strictly consistent, all writes are instantaneously visible to all processes and an absolute global time order is maintained.
- If a data item is changed, all subsequent reads on that item return the new value, no matter how soon after the change the reads are done, and no matter which processes do the reading or where they are located.
- If a read is done, it gets the current value, no matter how quickly the next write is done.
- Unfortunately, this is impossible to implement in a distributed system.
13. Sequential Consistency (1/2)
- Sequential consistency is a slightly weaker model than strict consistency. A data store is sequentially consistent when it satisfies the following condition:
  - The result of any execution is the same as if the (read and write) operations by all processes on the data store were executed in some sequential order, and the operations of each individual process appear in this sequence in the order specified by its program.
- When processes run concurrently on possibly different machines, any valid interleaving of read and write operations is acceptable behavior.
- All processes see the same interleaving of executions.
- Nothing is said about time.
- A process "sees" the writes from all processes but only its own reads.
14. Sequential Consistency (2/2)
- Figure (a): a sequentially consistent data store.
  - P1 first performs W(x)a. Later in absolute time, P2 performs W(x)b.
  - Both P3 and P4 first read value b and later value a; the write operation of P2 appears to have taken place before that of P1 to both P3 and P4.
- Figure (b): a data store that is not sequentially consistent.
  - Not all processes see the same interleaving of write operations.
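The two figures above can be checked mechanically. The following is a minimal brute-force sketch (not part of the slides; all names are illustrative): it searches for one interleaving that respects every process's program order and in which each read returns the most recently written value.

```python
# Histories are lists of per-process operation lists; an operation is
# ("W", var, val) or ("R", var, val). Items start out as NIL (None).

def is_sequentially_consistent(histories):
    """True if some program-order-preserving interleaving exists in which
    every read returns the value of the most recent write."""
    def search(positions, store):
        if all(positions[p] == len(h) for p, h in enumerate(histories)):
            return True
        for p, h in enumerate(histories):
            i = positions[p]
            if i == len(h):
                continue
            op, var, val = h[i]
            if op == "R" and store.get(var) != val:
                continue  # this read cannot be scheduled next
            new_store = dict(store)
            if op == "W":
                new_store[var] = val
            new_positions = list(positions)
            new_positions[p] += 1
            if search(new_positions, new_store):
                return True
        return False
    return search([0] * len(histories), {})

# Figure (a): P3 and P4 both see b before a -- one interleaving works.
ok = [[("W", "x", "a")],
      [("W", "x", "b")],
      [("R", "x", "b"), ("R", "x", "a")],
      [("R", "x", "b"), ("R", "x", "a")]]
# Figure (b): P3 sees b,a but P4 sees a,b -- no single interleaving works.
bad = [[("W", "x", "a")],
       [("W", "x", "b")],
       [("R", "x", "b"), ("R", "x", "a")],
       [("R", "x", "a"), ("R", "x", "b")]]
assert is_sequentially_consistent(ok)
assert not is_sequentially_consistent(bad)
```

Brute force is exponential in the number of operations, so this only serves to check small textbook histories like the ones in the figures.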
15. Linearizability
- A consistency model that is weaker than strict consistency but stronger than sequential consistency is linearizability.
- Operations are assumed to receive a timestamp from a globally available clock, but one with only finite precision.
- A data store is linearizable when each operation is timestamped and the following condition holds:
  - The result of any execution is the same as if the (read and write) operations by all processes on the data store were executed in some sequential order, and the operations of each individual process appear in this sequence in the order specified by its program. In addition, if ts_OP1(x) < ts_OP2(y), then operation OP1(x) should precede OP2(y) in this sequence.
- A linearizable data store is also sequentially consistent; linearizability adds ordering according to a set of synchronized clocks.
16. Causal Consistency (1/2)
- The causal consistency model is weaker than sequential consistency.
- It makes a distinction between events that are potentially causally related and those that are not.
- If event B is caused or influenced by an earlier event A, causality requires that everyone first sees A, then B.
- Operations that are not causally related are said to be concurrent.
- A data store is causally consistent if it obeys the following condition:
  - Writes that are potentially causally related must be seen by all processes in the same order. Concurrent writes may be seen in a different order by different processes.
- See Figure 6-9 for an example of a causally consistent store.
17. Causal Consistency (2/2)
- Figure (a): a data store that is not causally consistent.
  - The two writes W(x)a and W(x)b are causally related, since b may be the result of a computation involving the value read by R(x)a.
- Figure (b): a data store that is causally consistent.
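Causal relatedness between writes is usually tracked with vector clocks. A small sketch (the clocks and values below are illustrative, not from the figures): two writes are causally ordered exactly when one clock is componentwise less than or equal to the other; otherwise they are concurrent and may be delivered in different orders.

```python
# Vector-clock comparison for deciding causality between two writes.

def happens_before(vc_a, vc_b):
    """True if the write with clock vc_a causally precedes the one with vc_b."""
    return all(a <= b for a, b in zip(vc_a, vc_b)) and vc_a != vc_b

def concurrent(vc_a, vc_b):
    """True if neither write causally precedes the other."""
    return not happens_before(vc_a, vc_b) and not happens_before(vc_b, vc_a)

# W(x)a by P1 carries clock [1,0]; P2 reads a, merges, then issues W(x)b
# with clock [1,1] -- so W(x)b causally depends on W(x)a:
assert happens_before([1, 0], [1, 1])
# Two writes carrying [1,0] and [0,1] are concurrent: different processes
# may legitimately see them in different orders.
assert concurrent([1, 0], [0, 1])
```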
18. FIFO Consistency
- FIFO consistency is weaker than causal consistency: it removes the requirement that causally related writes must be seen in the same order by all processes.
- A data store is FIFO consistent when it satisfies the following condition:
  - Writes done by a single process are seen by all other processes in the order in which they were issued, but writes from different processes may be seen in a different order by different processes.
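FIFO consistency can be enforced with nothing more than a per-writer sequence number: a replica buffers a write that arrives out of order and applies it once the gap from that writer is filled. A toy sketch (class and field names are made up for illustration):

```python
from collections import defaultdict

class FifoReplica:
    """Applies writes from each process strictly in that process's issue
    order; writes from different processes need no mutual ordering."""

    def __init__(self):
        self.store = {}
        self.next_seq = defaultdict(int)   # next expected seq per writer
        self.pending = defaultdict(dict)   # buffered out-of-order writes

    def receive(self, writer, seq, var, val):
        self.pending[writer][seq] = (var, val)
        # Apply every write from this writer that is now in order.
        while self.next_seq[writer] in self.pending[writer]:
            v, value = self.pending[writer].pop(self.next_seq[writer])
            self.store[v] = value
            self.next_seq[writer] += 1

r = FifoReplica()
r.receive("P1", 1, "x", "b")   # arrives early: buffered, not applied
assert "x" not in r.store
r.receive("P1", 0, "x", "a")   # fills the gap: both writes applied in order
assert r.store["x"] == "b"
```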
19. Weak Consistency (1/2)
- Although FIFO consistency can give better performance than the stronger models, it is still unnecessarily restrictive for many applications, because it requires that writes originating in a single process be seen everywhere in order.
- Not all applications require seeing all writes, or seeing them in order.
- Solution: use a synchronization variable. Synchronize(S) synchronizes all local copies of the data store.
- Using synchronization variables to partly define consistency is called weak consistency, which has three properties:
  - Accesses to synchronization variables are sequentially consistent.
  - No access to a synchronization variable is allowed to be performed until all previous writes have completed everywhere.
  - No data access is allowed to be performed until all previous accesses to synchronization variables have been performed.
20. Weak Consistency (2/2)
- Figure (a): a valid sequence for weak consistency.
  - P1 performs W(x)a and W(x)b and then synchronizes. P2 and P3 have not yet synchronized, so no guarantees are given about what they see.
- Figure (b): a sequence that is not valid for weak consistency. Why not?
  - Since P2 has synchronized, R(x) in P2 must return b.
21. Release Consistency (1/2)
- Weak consistency has the problem that, when a synchronization variable is accessed, the data store does not know whether this is being done because the process has finished writing the shared data or is about to start reading it.
- Consequently, the data store must take the actions required in both cases:
  - Make sure that all locally initiated writes have completed (i.e., have been propagated to the other copies).
  - Gather in all writes from the other copies.
- If the data store could tell the difference between entering a critical region and leaving one, a more efficient implementation might be possible.
22. Release Consistency (2/2)
- Idea: divide access to a synchronization variable into two parts: an acquire phase and a release phase.
  - About to start accessing data: acquire forces the requester to wait until the shared data can be accessed.
  - Finished accessing the shared data: release sends the requester's local values to the other copies in the data store.
- Question: why did P3 get a instead of b when it executed R(x)?
  - Since P3 does not do an acquire before reading x, the data store has no obligation to give it the current value of x, so returning a is fine.
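The acquire/release idea can be sketched in a few lines. In this toy model (illustrative names, single-threaded, not the slides' figure), writes between acquire() and release() stay in a local copy; release() propagates them to the shared store and acquire() pulls the current state in. A process that never acquires may legitimately see stale data, exactly as P3 does in the figure.

```python
class ReleaseConsistentCopy:
    """Toy local copy under release consistency."""

    def __init__(self, shared):
        self.shared = shared   # dict standing in for the rest of the store
        self.local = {}

    def acquire(self):
        self.local = dict(self.shared)   # gather writes from other copies

    def write(self, var, val):
        self.local[var] = val            # local until released

    def read(self, var):
        return self.local.get(var)

    def release(self):
        self.shared.update(self.local)   # propagate local writes

shared = {}
p1 = ReleaseConsistentCopy(shared)
p3 = ReleaseConsistentCopy(shared)
p1.acquire(); p1.write("x", "a"); p1.write("x", "b"); p1.release()
assert p3.read("x") is None      # P3 never acquired: a stale read is legal
p3.acquire()
assert p3.read("x") == "b"       # after an acquire, the current value is seen
```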
23. Entry Consistency (1/3)
- With release consistency, all local updates are propagated to the other copies/servers on release of the shared data.
- With entry consistency, each shared data item is associated with its own synchronization variable.
- To access consistent data, the corresponding synchronization variable must be explicitly acquired.
- Release consistency affects all shared data, but entry consistency affects only the shared data associated with the acquired synchronization variable.
24. Entry Consistency (2/3)
- A data store exhibits entry consistency if it meets all of the following conditions:
  - An acquire access of a synchronization variable is not allowed to perform with respect to a process until all updates to the guarded shared data have been performed with respect to that process.
  - Before an exclusive-mode access to a synchronization variable by a process is allowed to perform with respect to that process, no other process may hold the synchronization variable, not even in nonexclusive mode.
  - After an exclusive-mode access to a synchronization variable has been performed, any other process's next nonexclusive-mode access to that synchronization variable may not be performed until it has performed with respect to that variable's owner.
25. Entry Consistency (3/3)
- Question: is this a valid event sequence for entry consistency? Yes.
- Question: why did P2 get NIL when R(y) was executed?
  - Since P2 did not do an acquire on y before reading it, P2 need not see the latest value.
- Question: what would be a convenient way of making entry consistency more or less transparent to programmers?
  - Have the distributed system use and handle distributed shared objects (i.e., the system does an acquire on the object's associated synchronization variable whenever a client accesses a shared distributed object).
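The per-item nature of entry consistency, and why P2 may read NIL for y, can be sketched by giving every data item its own guard (a toy single-threaded model with made-up names): acquiring x refreshes only x, so an unguarded read of y is allowed to return anything, including NIL.

```python
class EntryConsistentCopy:
    """Toy local copy under entry consistency: each item is refreshed and
    propagated only via an acquire/release on that item's own guard."""

    def __init__(self, shared):
        self.shared = shared       # {item: value} master copies
        self.local = {}

    def acquire(self, item):
        if item in self.shared:
            self.local[item] = self.shared[item]   # refresh only this item

    def write(self, item, val):
        self.local[item] = val

    def read(self, item):
        return self.local.get(item)                # may be stale or NIL

    def release(self, item):
        if item in self.local:
            self.shared[item] = self.local[item]   # propagate only this item

shared = {}
p1 = EntryConsistentCopy(shared)
p2 = EntryConsistentCopy(shared)
p1.acquire("x"); p1.write("x", "a"); p1.release("x")
p1.acquire("y"); p1.write("y", "b"); p1.release("y")
p2.acquire("x")
assert p2.read("x") == "a"     # acquired x: must see the latest x
assert p2.read("y") is None    # never acquired y: NIL is acceptable
```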
26. Summary of Consistency Models
- Strong consistency models: models that do not use synchronization operations.
- Weak consistency models: models that use synchronization operations.
27. Client-Centric Consistency Models
- Data-centric consistency models aim at providing a system-wide consistent view of a data store.
- Client-centric consistency models are generally used for applications that lack simultaneous updates, i.e., most operations involve reading data.
- The following are very weak, client-centric consistency models:
  - Eventual consistency
  - Monotonic reads
  - Monotonic writes
  - Read your writes
  - Writes follow reads
28. Client-Centric Consistency Models
- Goal: show how we can perhaps avoid system-wide consistency by concentrating on what specific clients want, instead of what should be maintained by servers.
- Background: most large-scale distributed systems (e.g., databases) apply replication for scalability, but can support only weak consistency:
  - DNS: updates are propagated slowly, and inserts may not be immediately visible.
  - News: articles and reactions are pushed and pulled throughout the Internet, such that reactions can be seen before their postings.
  - Lotus Notes: geographically dispersed servers replicate documents, but make no attempt to keep (concurrent) updates mutually consistent.
  - WWW: caches are all over the place, but there need be no guarantee that you are reading the most recent version of a page.
29. Eventual Consistency
- Systems such as DNS and the WWW can be viewed as applications of large-scale distributed and replicated databases that tolerate a relatively high degree of inconsistency.
- They have in common that if no updates take place for a long time, all replicas will gradually and eventually become consistent.
- This form of consistency is called eventual consistency.
- Eventual consistency requires only that updates are guaranteed to propagate to all replicas.
- Eventually consistent data stores work fine as long as clients always access the same replica; what happens when different replicas are accessed?
30. Consistency for Mobile Users
- Example: consider a distributed database to which you have access through your notebook, which acts as a front end to the database.
  - At location A you access the database, doing reads and updates.
  - At location B you continue your work, but unless you access the same server as at location A, you may detect inconsistencies:
    - your updates at A may not yet have been propagated to B;
    - you may be reading newer entries than the ones available at A;
    - your updates at B may eventually conflict with those at A.
- Note: the only thing you really want is that the entries you updated and/or read at A are in B the way you left them. In that case, the database will appear to be consistent to you.
32. Client-Centric Consistency
- For the mobile-user example, eventually consistent data stores will not work properly.
- Client-centric consistency provides guarantees for a single client concerning the consistency of that client's accesses to the data store.
- No guarantees are given concerning concurrent accesses by different clients.
33. Monotonic-Read Consistency
- A data store is monotonic-read consistent if the following condition holds:
  - If a process reads the value of a data item x, any successive read operation on x by that process will always return that same value or a more recent value.
- That is, if a process has seen a value of x at time t, it will never see an older version of x at a later time.
- Notation: WS(xi[t]) is the set of write operations (at Li) that lead to version xi of x at time t; WS(xi[t1];xj[t2]) indicates that it is known that WS(xi[t1]) is part of WS(xj[t2]).
- Note: the parameter t is omitted from the figures.
34. Monotonic Reads (1/2)
- Example: the read operations are performed by a single process P at two different local copies (L1 and L2) of the same data store.
- Figure (a): a monotonic-read consistent data store.
  - P performs a read operation on x at L1, R(x1). Later, P performs a read operation on x at L2, R(x2).
- Figure (b): a data store that is not monotonic-read consistent. Why not?
  - Because only the write operations in WS(x2) have been performed at L2, with no guarantee that this set also contains WS(x1).
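One common way to enforce monotonic reads is to let the client carry the write set it has observed and to let a replica serve a read only after applying at least that set. A sketch under these assumptions (the class and function names are illustrative):

```python
class Replica:
    """A replica tracks which write operations it has applied."""

    def __init__(self):
        self.applied = set()   # ids of write operations applied here
        self.store = {}

    def apply(self, write_id, var, val):
        self.applied.add(write_id)
        self.store[var] = val

def monotonic_read(client_ws, replica, var):
    """Serve the read only if the replica contains the client's write set;
    afterwards, fold everything seen here into the client's set."""
    if not client_ws <= replica.applied:
        raise RuntimeError("replica missing writes: %s"
                           % (client_ws - replica.applied))
    client_ws |= replica.applied
    return replica.store.get(var)

l1, l2 = Replica(), Replica()
l1.apply("w1", "x", "a")
ws = set()                                   # the client's observed write set
assert monotonic_read(ws, l1, "x") == "a"    # R(x1) at L1
# L2 has not applied w1 (Figure (b)): serving this client there would
# violate monotonic reads, so the check refuses.
try:
    monotonic_read(ws, l2, "x")
    raise AssertionError("should have refused")
except RuntimeError:
    pass
```

In a real system the refused read would be redirected, or the missing writes forwarded to L2 first.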
35. Monotonic Reads (2/2)
- Example 1: automatically reading your personal calendar updates from different servers. Monotonic reads guarantee that the user sees all updates, no matter which server the automatic reading takes place from.
- Example 2: reading (not modifying) incoming mail while you are on the move. Each time you connect to a different server, that server fetches (at least) all the updates from the server you previously visited.
36. Monotonic-Write Consistency
- A data store is monotonic-write consistent if the following condition holds:
  - A write operation by a process on a data item x is completed before any successive write operation on x by the same process.
- That is, a write operation on a copy of data item x is performed only if that copy has been brought up to date by means of any preceding write operations by that process, which may have taken place on other copies of x.
37. Monotonic Writes (1/2)
- Figure (a): a monotonic-write consistent data store.
  - P performs a write operation on x at L1, W(x1). Later, P performs a write operation on x at L2, W(x2).
  - W(x2) requires that W(x1) has been applied at L2 before it.
- Figure (b): a data store that is not monotonic-write consistent. Why not?
  - W(x1) has not been propagated to L2.
38. Monotonic Writes (2/2)
- Example 1: updating a program at server S2, and ensuring that all components on which its compilation and linking depend are also placed at S2.
- Example 2: maintaining versions of replicated files in the correct order everywhere (propagate the previous version to the server where the newest version is installed).
39. Read-Your-Writes Consistency
- A data store is read-your-writes consistent if the following condition holds:
  - The effect of a write operation by a process on data item x will always be seen by a successive read operation on x by the same process.
- That is, a write operation is always completed before a successive read operation by the same process, no matter where that read operation takes place.
40. Read Your Writes (1/2)
- Figure (a): a read-your-writes consistent data store.
  - P performs a write operation on x at L1, W(x1). Later, P performs a read operation on x at L2, R(x2).
  - WS(x1;x2) states that W(x1) is part of WS(x2).
- Figure (b): a data store that is not read-your-writes consistent.
  - W(x1) is left out of WS(x2); that is, the effects of the previous write operation by process P have not been propagated to L2.
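Read-your-writes can be enforced with similar bookkeeping on the client side: the client remembers the ids of its own writes, and a replica may answer a read only if it has applied all of them. A sketch with illustrative names:

```python
class Replica:
    def __init__(self):
        self.applied = set()   # ids of writes applied at this replica
        self.store = {}

    def write(self, write_id, var, val):
        self.applied.add(write_id)
        self.store[var] = val

class Client:
    """Tracks the client's own writes to enforce read-your-writes."""

    def __init__(self):
        self.my_writes = set()

    def write(self, replica, write_id, var, val):
        replica.write(write_id, var, val)
        self.my_writes.add(write_id)

    def read(self, replica, var):
        if not self.my_writes <= replica.applied:
            return None          # guarantee would be violated: refuse here
        return replica.store.get(var)

l1, l2 = Replica(), Replica()
c = Client()
c.write(l1, "w1", "x", "a")      # W(x1) at L1
assert c.read(l1, "x") == "a"    # same replica: the own write is visible
assert c.read(l2, "x") is None   # L2 lacks w1 (Figure (b)): must not answer
```

As with monotonic reads, a real system would redirect the client or pull the missing writes into L2 instead of returning nothing.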
41. Read Your Writes (2/2)
- Example: updating your Web page and guaranteeing that your Web browser shows the newest version instead of its cached copy.
42. Writes-Follow-Reads Consistency
- A data store is writes-follow-reads consistent if the following condition holds:
  - A write operation by a process on a data item x, following a previous read operation on x by the same process, is guaranteed to take place on the same or a more recent value of x than the one that was read.
- That is, any successive write operation by a process on a data item x will be performed on a copy of x that is up to date with the value most recently read by that process.
43. Writes Follow Reads (1/2)
- Figure (a): a writes-follow-reads consistent data store.
  - P performs a read operation on x at L1, R(x1).
  - The write operations that led to R(x1) also appear in the write set at L2, where P later performs W(x2).
- Figure (b): a data store that is not writes-follow-reads consistent.
  - The write operations that led to R(x1) do not appear in the write set at L2 before P performs W(x2) there.
44. Writes Follow Reads (2/2)
- Example: seeing reactions to posted articles only if you have the original posting (a read "pulls in" the corresponding write operation).
45. Distribution Protocols
- Distribution protocols focus on distributing updates over replicas.
- The following are important design issues:
  - Replica placement
  - Update propagation
  - Epidemic protocols
46. Replica Placement (1/2)
- Model: we consider objects (and do not worry whether they contain just data, just code, or both).
- Distinguish different kinds of processes capable of hosting a replica of an object:
  - Permanent replicas: processes/machines that always have a replica (i.e., the initial set of replicas).
  - Server-initiated replicas: processes that can dynamically host a replica on request of another server in the data store.
  - Client-initiated replicas: processes that can dynamically host a replica on request of a client (client caches).
48. Server-Initiated Replicas
- Keep track of access counts per file, aggregated by considering the server closest to the requesting clients.
- If the number of accesses drops below threshold D: drop the file.
- If the number of accesses exceeds threshold R: replicate the file.
- If the number of accesses lies between D and R: the file may be migrated.
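The three-way decision per (server, file) pair is simple enough to state directly. A minimal sketch assuming two thresholds D < R as on the slide (the function name and the example counts are made up):

```python
def placement_decision(access_count, drop_threshold, replication_threshold):
    """Decide what to do with a local file copy based on its access count,
    with drop threshold D below replication threshold R."""
    assert drop_threshold < replication_threshold
    if access_count < drop_threshold:
        return "drop"        # too few requests: delete the local copy
    if access_count > replication_threshold:
        return "replicate"   # hot file: create copies closer to clients
    return "migrate"         # in between: consider moving (not copying) it

assert placement_decision(2, 5, 50) == "drop"
assert placement_decision(120, 5, 50) == "replicate"
assert placement_decision(20, 5, 50) == "migrate"
```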
49. Update Propagation (1/3)
- Important design issue in update propagation: what do we propagate?
  - Only a notification/invalidation of the update (often used for caches).
  - The data from one copy to another (distributed databases).
  - The update operation itself to the other copies (also called active replication).
- Observation: no single approach is best; the choice depends highly on the available bandwidth and the read-to-write ratio at the replicas.
50. Update Propagation (2/3)
- Pushing updates: a server-initiated approach, in which an update is propagated regardless of whether the target asked for it.
- Pulling updates: a client-initiated approach, in which a client requests to be updated.
51. Update Propagation (3/3)
- Observation: we can dynamically switch between pulling and pushing using leases: a contract in which the server promises to push updates to the client until the lease expires.
- Issue: make the lease expiration time dependent on the system's behavior (adaptive leases):
  - Age-based leases: an object that hasn't changed for a long time is unlikely to change in the near future, so provide a long-lasting lease.
  - Renewal-frequency-based leases: the more often a client requests a specific object, the longer the expiration time for that client (and that object) will be.
  - State-based leases: the more loaded a server is, the shorter the expiration times become.
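The three adaptive strategies can be combined into one lease-duration policy. The sketch below is purely illustrative: the weights, base duration, and cap are invented, but the directions match the slide (older object and eager client lengthen the lease, server load shortens it).

```python
def lease_duration(age, renewal_rate, server_load,
                   base=10.0, max_lease=600.0):
    """Compute a lease duration in seconds (all weights are made up).

    age: seconds since the object last changed (age-based)
    renewal_rate: client requests per minute for this object (renewal-based)
    server_load: 0.0 (idle) .. 1.0 (saturated) (state-based)
    """
    d = base
    d += age * 0.1                  # stable object: grant a longer lease
    d += renewal_rate * 2.0         # eager client: grant a longer lease
    d *= (1.0 - server_load)        # loaded server: shorten all leases
    return min(max(d, 0.0), max_lease)

quiet = lease_duration(age=3600, renewal_rate=10, server_load=0.1)
busy = lease_duration(age=3600, renewal_rate=10, server_load=0.9)
assert quiet > busy                 # higher load means shorter expiration
```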
53. Epidemic Protocols: Principles
- Basic idea: assume there are no write-write conflicts:
  - Update operations are initially performed at one or only a few replicas.
  - A replica passes its updated state to a limited number of neighbors.
  - Update propagation is lazy, i.e., not immediate.
  - Eventually, each update should reach every replica.
- See the book for the theory of epidemics.
- Anti-entropy: each replica regularly chooses another replica at random and exchanges state differences, leading to identical states at both afterwards.
- Gossiping: a replica that has just been updated (i.e., has been contaminated) tells a number of other replicas about its update (contaminating them as well).
54. System Model
- We consider a collection of servers, each storing a number of objects.
- Each object O has a primary server at which updates for O are always initiated (avoiding write-write conflicts).
- An update of object O at server S is always timestamped; the value of O at S is denoted VAL(O,S).
- T(O,S) denotes the timestamp of the value of object O at server S.
55. Anti-Entropy
- Basic issue: when a server S contacts another server S* to exchange state information, three different strategies can be followed:
  - Push: S only forwards its updates to S*: if T(O,S*) < T(O,S) then VAL(O,S*) ← VAL(O,S).
  - Pull: S only fetches updates from S*: if T(O,S) < T(O,S*) then VAL(O,S) ← VAL(O,S*).
  - Push-pull: S and S* exchange their updates by both pushing and pulling values.
- Observation: if each server periodically and randomly chooses another server to exchange updates with, an update is propagated in O(log N) time units.
- Question: why is pushing alone not efficient when many servers have already been updated?
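A small simulation makes the strategies concrete. This is an illustrative sketch, not a protocol implementation: each server holds a (value, timestamp) pair for one object, and each round every server exchanges state with one random partner using the timestamp comparison from the slide.

```python
import random

def exchange(s, t, servers, mode):
    """One anti-entropy exchange between servers s and t on one object."""
    if mode in ("push", "push-pull") and servers[t][1] < servers[s][1]:
        servers[t] = servers[s]      # T(O,S*) < T(O,S): push newer value
    if mode in ("pull", "push-pull") and servers[s][1] < servers[t][1]:
        servers[s] = servers[t]      # T(O,S) < T(O,S*): pull newer value

def anti_entropy_rounds(n, mode, seed=1):
    """Rounds until one update at server 0 has reached all n servers."""
    random.seed(seed)
    servers = [("old", 0)] * n
    servers[0] = ("new", 1)
    rounds = 0
    while any(ts == 0 for _, ts in servers):
        for s in range(n):
            t = random.choice([i for i in range(n) if i != s])
            exchange(s, t, servers, mode)
        rounds += 1
    return rounds

# With push-pull, 32 servers converge in far fewer than 32 rounds,
# consistent with the O(log N) expectation.
assert anti_entropy_rounds(32, "push-pull") < 32
```

Running the same simulation in push-only mode illustrates the slide's question: once most servers are updated, a pushing server mostly contacts servers that already have the update, so the last few stragglers take disproportionately long.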
56. Gossiping
- Basic model: a server S having an update to report contacts other servers. If S contacts a server to which the update has already propagated, S stops contacting other servers with probability 1/k.
- If s is the fraction of ignorant servers (i.e., servers unaware of the update), it can be shown that with many servers s satisfies s = e^(-(k+1)(1-s)); for k = 1, roughly 20% of the servers remain ignorant.
- Observation: if we really have to ensure that all servers are eventually updated, gossiping alone is not enough; combining anti-entropy with gossiping solves this problem.
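The claim that gossiping alone leaves some servers ignorant can be checked with a small simulation (an illustrative sketch; the loop structure and names are made up): updated servers keep spreading the rumor, and an active server that contacts an already-updated server loses interest with probability 1/k.

```python
import random

def gossip_ignorant_fraction(n, k, seed=2):
    """Simulate rumor spreading among n servers; return the fraction of
    servers that never learn the update."""
    random.seed(seed)
    updated = {0}          # servers that know the update
    active = {0}           # servers still actively gossiping
    while active:
        s = random.choice(sorted(active))
        t = random.randrange(n)            # pick a random contact
        if t in updated:
            if random.random() < 1.0 / k:  # contact already knew it:
                active.discard(s)          # stop with probability 1/k
        else:
            updated.add(t)                 # contaminate the contact,
            active.add(t)                  # who starts gossiping too
    return 1.0 - len(updated) / n

# For k = 1 the theory predicts that roughly 20% of the servers stay
# ignorant for large n, so gossiping alone does not reach everyone.
frac = gossip_ignorant_fraction(2000, k=1)
assert 0.0 < frac < 0.6
```

This is exactly why the slide combines gossiping (fast initial spread) with anti-entropy (guaranteed eventual convergence).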
57. Deleting Values
- Fundamental problem: we cannot simply remove an old value from a server and expect the removal to propagate; a mere removal will be undone in due time by the epidemic algorithms.
- Solution: removal has to be registered as a special update, by inserting a death certificate.
- Next problem: when to remove a death certificate (it is not allowed to stay forever):
  - Run a global algorithm to detect whether the removal is known everywhere, and then collect the death certificates (this looks like garbage collection).
  - Assume death certificates propagate in finite time, and associate a maximum lifetime with each certificate (this can be done at the risk of not reaching all servers).
- Note: it is necessary that a removal actually reaches all servers.
- Question: what is the scalability problem here?
58. Consistency Protocols
- A consistency protocol describes the implementation of a specific consistency model. We will concentrate only on sequential consistency:
  - Primary-based protocols
  - Replicated-write protocols
  - Cache-coherence protocols
59. Primary-Based Protocols (1/4)
- Primary-based, remote-write protocol with a fixed server.
- Example: used in traditional client-server systems that do not support replication.
60. Primary-Based Protocols (2/4)
- Primary-backup protocol.
- Example: traditionally applied in distributed databases and file systems that require a high degree of fault tolerance. Replicas are often placed on the same LAN.
61. Primary-Based Protocols (3/4)
- Primary-based, local-write protocol.
- Example: establishes only a fully distributed, non-replicated data store. Useful when writes are expected to come in series from the same client (e.g., mobile computing without replication).
62. Primary-Based Protocols (4/4)
- Primary-backup protocol with local writes.
- Example: distributed shared memory systems, but also mobile computing in disconnected mode (ship all relevant files to the user before disconnecting, and propagate updates later on).
63. Replicated-Write Protocols (1/3)
- Active replication: updates are forwarded to multiple replicas, where they are carried out. There are some problems to deal with in the face of replicated invocations.
64. Replicated-Write Protocols (2/3)
- Replicated invocations: assign a coordinator on each side (client and server) that ensures only one invocation and one reply are sent.
65. Replicated-Write Protocols (3/3)
- Quorum-based protocols: ensure that each operation is carried out in such a way that a majority vote is established; distinguish a read quorum and a write quorum.
- Read the explanation of these examples on page 344.
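The quorum constraints are worth stating explicitly. For N replicas with read quorum NR and write quorum NW, the standard (Gifford-style) rules require NR + NW > N, so that every read quorum overlaps every write quorum and read-write conflicts are detected, and NW > N/2, so that two write quorums always overlap and write-write conflicts are detected. A minimal sketch (the N = 12 values are the usual textbook examples):

```python
def valid_quorum(n, nr, nw):
    """Check the quorum constraints for n replicas:
    nr + nw > n  (read-write conflicts detected)
    2 * nw > n   (write-write conflicts detected)."""
    return nr + nw > n and 2 * nw > n

assert valid_quorum(12, nr=3, nw=10)       # a correct choice
assert not valid_quorum(12, nr=3, nw=8)    # NR+NW <= N: a read can miss the
                                           # latest write
assert not valid_quorum(12, nr=6, nw=6)    # NW not a majority: two writes
                                           # can proceed disjointly
assert valid_quorum(12, nr=1, nw=12)       # ROWA: read one, write all
```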
66. Example: Lazy Replication
- Basic model: a number of replica servers jointly implement a causally consistent data store. Clients normally talk to front ends, which maintain the data needed to ensure causal consistency.
67. Lazy Replication: Vector Timestamps
- VAL(i): VAL(i)[i] denotes the total number of write operations sent directly by front ends (clients); VAL(i)[j] denotes the number of updates sent from replica #j.
- WORK(i): WORK(i)[i] is the total number of write operations directly from front ends, including the pending ones; WORK(i)[j] is the total number of updates from replica #j, including pending ones.
- LOCAL(C): LOCAL(C)[j] is (almost) the most recent value of VAL(j)[j] known to front end C (to be refined in a moment).
- DEP(R): the timestamp associated with a request, reflecting what the request depends on.