Replication (1). Topics r Why Replication? r Consistency Models – How do we reason about the consistency of the “global state”? m Data-centric consistency.

1 Replication (1)

2 Topics
- Why Replication?
- Consistency Models – How do we reason about the consistency of the “global state”?
  - Data-centric consistency
  - Client-centric consistency
- We will examine consistency protocols, which describe an implementation of a specific consistency model.
- Other Implementation Issues
- Examples

3 Readings
- Van Steen and Tanenbaum: 6.1, 6.2, 6.3 and 6.4
- Coulouris: 11, 14

4 Why Replicate?
- Replication refers to the maintenance of copies at multiple sites.
- Reliability
  - If one replica is unavailable or crashes, use another.
  - Avoid single points of failure.
- Performance
  - Placing copies of data close to the processes using them can improve performance by reducing access time.
  - If there is only one copy, the server could become overloaded.

5 Common Replication Examples
- DNS (Domain Name System) naming service
- Web browsers often locally store a copy of a previously fetched web page.
  - This is referred to as caching a web page.
- Replication of a database
- Replication of game state

6 Replication Problem
- Multiple copies may lead to consistency problems.
- Whenever a copy is modified, that copy becomes different from the rest.
- Modifications have to be carried out on all copies to ensure consistency.
- The type of application has an impact on the consistency requirements needed and thus on the implementation.

7 Consistency Model
- Some applications (e.g., banking) require that update operations are performed in the same order at each copy.
  - This is referred to as sequential consistency.
  - Possible implementation: using Lamport’s clocks
- Other applications (e.g., a bulletin board) require that if one update, U_1, causes another update, U_2, to occur, then U_1 is executed before U_2 at each copy.
  - This is referred to as causal consistency.
  - Possible implementation: using vector clocks
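The vector-clock option for causal consistency can be sketched as a delivery test at each replica. This is a minimal sketch, assuming every update message carries the sender's vector clock (as in causally ordered multicast); the class and function names are hypothetical. A replica buffers an update until it is the next one from its sender and the replica has already applied everything the sender had seen.

```python
from typing import Dict, List, Tuple

# A vector clock maps process id -> number of updates seen from that process.
VectorClock = Dict[str, int]


def can_deliver(msg_clock: VectorClock, sender: str, local: VectorClock) -> bool:
    """Causal-delivery test: the message must be the next update from its
    sender, and the sender must not have seen any update we have not seen."""
    if msg_clock.get(sender, 0) != local.get(sender, 0) + 1:
        return False
    return all(
        count <= local.get(proc, 0)
        for proc, count in msg_clock.items()
        if proc != sender
    )


class CausalReplica:
    def __init__(self) -> None:
        self.clock: VectorClock = {}
        self.pending: List[Tuple[str, VectorClock, str]] = []
        self.applied: List[str] = []

    def receive(self, sender: str, msg_clock: VectorClock, update: str) -> None:
        self.pending.append((sender, msg_clock, update))
        # Keep delivering until no buffered message becomes deliverable.
        progress = True
        while progress:
            progress = False
            for item in list(self.pending):
                s, vc, u = item
                if can_deliver(vc, s, self.clock):
                    self.pending.remove(item)
                    self.applied.append(u)
                    self.clock[s] = vc[s]
                    progress = True


# U_2 (from B) was caused by U_1 (from A) but reaches this replica first.
replica = CausalReplica()
replica.receive("B", {"A": 1, "B": 1}, "U2")  # buffered: U1 not applied yet
replica.receive("A", {"A": 1}, "U1")          # releases U1, then U2
print(replica.applied)  # ['U1', 'U2']
```

Sequential consistency would additionally require unrelated updates to be applied in one global order; the causal test above deliberately leaves concurrent updates unordered.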

8 Consistency Model
- Observe that, although replication is common to all these examples, the type of application determines which consistency model should be used.
- A consistency model describes the rules to be used in updating replicated data.
- There are more consistency models than sequential and causal.
- Other consistency models:
  - FIFO
  - Strict

9 FIFO Consistency
- Writes done by a single process are seen by all other processes in the order in which they were issued,
- … but writes from different processes may be seen in a different order by different processes.
- i.e., there are no guarantees about the order in which different processes see writes, except that two or more writes from a single source must arrive in order.

10 FIFO Consistency
- Caches in web browsers
  - All updates are made by the page owner.
  - There is no conflict between two writes.
  - Note: If a web page is updated twice in a very short period of time, it is possible that the browser never sees the first update.
- Implementation:
  - Each process adds the following to an update message: (process id, sequence number).
  - Every other process applies the update messages from each sender in the order of their sequence numbers.
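The (process id, sequence number) scheme can be sketched with a per-sender hold-back queue. This is a minimal sketch, assuming sequence numbers start at 1 for every sender and that messages may arrive out of order; the class name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class FifoReplica:
    """Applies updates from each sender in sequence-number order.

    Updates from different senders are not ordered relative to each other,
    which is exactly the FIFO guarantee."""

    def __init__(self) -> None:
        self.next_seq: Dict[str, int] = defaultdict(lambda: 1)
        self.held_back: Dict[str, Dict[int, str]] = defaultdict(dict)
        self.applied: List[Tuple[str, int, str]] = []

    def receive(self, sender: str, seq: int, update: str) -> None:
        self.held_back[sender][seq] = update
        # Release every consecutive update we now hold for this sender.
        while self.next_seq[sender] in self.held_back[sender]:
            n = self.next_seq[sender]
            self.applied.append((sender, n, self.held_back[sender].pop(n)))
            self.next_seq[sender] = n + 1


replica = FifoReplica()
replica.receive("p1", 2, "x = 5")  # arrives early: held back
replica.receive("p2", 1, "y = 7")  # other sender: applied immediately
replica.receive("p1", 1, "x = 4")  # releases p1's update 1, then update 2
print(replica.applied)
# [('p2', 1, 'y = 7'), ('p1', 1, 'x = 4'), ('p1', 2, 'x = 5')]
```

Another replica could legitimately apply p2's update after both of p1's; FIFO consistency only constrains the order within each sender.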

11 Strict Consistency
- Strict consistency is defined as follows:
  - A read is expected to return the value resulting from the most recent write operation.
  - Assumes absolute global time.
  - All writes are instantaneously visible to all processes.
- Suppose that process p_i updates the value of x from 4 to 5 at time t_1 and multicasts this value to all replicas.
  - Process p_j reads the value of x at t_2 (t_2 > t_1).
  - Process p_j should read x as 5 regardless of the size of the (t_2 - t_1) interval.

12 Strict Consistency
- What if t_2 - t_1 = 1 nsec and the optical fibre between the host machines with the two processes is 3 meters long?
  - The update message would have to travel at 10 times the speed of light.
  - This is not allowed by Einstein’s special theory of relativity.
  - We can’t have strict consistency.
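The factor of 10 follows directly from the distance and time interval on the slide, taking the speed of light as roughly 3 × 10^8 m/s (and ignoring the fact that light in fibre is somewhat slower still):

```latex
v = \frac{d}{t_2 - t_1}
  = \frac{3\ \text{m}}{10^{-9}\ \text{s}}
  = 3 \times 10^{9}\ \text{m/s}
  \approx 10\,c,
\qquad c \approx 3 \times 10^{8}\ \text{m/s}
```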


14 Implementation Options: Sequential Consistency
- We saw how to use Lamport’s logical clocks for sequential consistency.
- Another option is to have a centralized process that acts as a sequencer.
- Each update request is sent to the sequencer, which:
  - Assigns the request a unique sequence number
  - Forwards the update request to each replica
- Operations are carried out at each replica in the order of their sequence numbers.
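A centralized sequencer can be sketched in a few lines. This is a minimal sketch, assuming the sequencer never fails and that forwarded requests may arrive at a replica out of order; the `Sequencer` and `Replica` classes are hypothetical. Because every replica applies updates strictly by sequence number, all replicas end up with the same log.

```python
import itertools
from typing import Dict, List, Tuple


class Sequencer:
    """Central sequencer: stamps each update with a unique, increasing number."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)

    def submit(self, update: str) -> Tuple[int, str]:
        return next(self._counter), update


class Replica:
    """Applies stamped updates strictly in sequence-number order, buffering
    any that arrive early."""

    def __init__(self) -> None:
        self.expected = 1
        self.buffer: Dict[int, str] = {}
        self.log: List[str] = []

    def deliver(self, seq: int, update: str) -> None:
        self.buffer[seq] = update
        while self.expected in self.buffer:
            self.log.append(self.buffer.pop(self.expected))
            self.expected += 1


sequencer = Sequencer()
stamped = [sequencer.submit("x = 1"), sequencer.submit("x = 2"), sequencer.submit("x = 3")]

a, b = Replica(), Replica()
for s, u in stamped:
    a.deliver(s, u)
for s, u in reversed(stamped):  # b receives the forwarded updates out of order
    b.deliver(s, u)
print(a.log == b.log == ["x = 1", "x = 2", "x = 3"])  # True
```

The single `Sequencer` object is exactly the bottleneck and single point of failure raised on the next slide.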

15 Implementation Options: Sequential Consistency
- A single sequencer does not solve the scalability problem either.
  - It may become a performance bottleneck.
  - What if it goes down?
- A combination of Lamport timestamps and sequencers may be necessary.
- The approach is summarized as follows:
  - Each process has a unique identifier, p_i, and keeps a sent-message counter, c_i. The process identifier and message counter uniquely identify a message.
  - Active processes (the sequencers) keep an extra counter, t_i, called the ticket number. A ticket is a triplet (p_i, t_i, (p_j, c_j)).
  - All other processes are passive.

16 Implementation Options: Sequential Consistency
- Approach Summary (cont.)
  - Passive processes (non-sequencers) send their messages to their sequencer.
  - Lamport’s totally ordered multicast algorithm is used among the sequencers to determine the order of update operations.
  - When an operation is allowed, each sequencer sends the ticket to its associated passive processes. It is assumed that the passive processes receive these tickets in the order sent.

17 Implementation Options: Sequential Consistency
- Approach Summary (cont.)
  - If a sequencer terminates abnormally, then one of the passive processes associated with it can become the new sequencer.
  - An election algorithm may be used to choose the new sequencer.

18 Implementation Options: Sequential Consistency
- Let’s say that we have 6 processes: p_1, p_2, p_3, p_4, p_5, p_6.
- Assume that p_1 and p_2 are sequencers; p_3 and p_4 are associated with p_1, and p_5 and p_6 are associated with p_2.
- Let’s say that p_3 sends a message, which is identified by (p_3, 1).
- p_1 generates a ticket as follows: (p_1, 1, (p_3, 1))
- The ticket number (the 1 in the second position) is generated using the Lamport clock algorithm.

19 Implementation Options: Sequential Consistency
- Let’s say that p_5 sends a message, which is identified by (p_5, 1).
- p_2 generates a ticket as follows: (p_2, 1, (p_5, 1))
- Which update gets done first? Basically, p_1 and p_2 apply Lamport’s algorithm for totally ordered multicast.
- When an update operation is allowed to proceed, the sequencers send messages to their associated processes.
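The two tickets in this example can be ordered with a sketch of the ordering rule. This is a minimal sketch, assuming ties on the ticket number are broken by sequencer id (the usual convention for Lamport timestamps); it shows only how the tickets are stamped and sorted and omits the acknowledgement and hold-back steps of the full totally ordered multicast algorithm. `Ticket`, `SequencerNode` and `total_order` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Ticket:
    sequencer: str            # p_i: the sequencer that issued the ticket
    number: int               # t_i: Lamport timestamp used as the ticket number
    message: Tuple[str, int]  # (p_j, c_j): originating process and its counter


@dataclass
class SequencerNode:
    pid: str
    clock: int = 0

    def issue(self, message: Tuple[str, int]) -> Ticket:
        # Local event: advance the Lamport clock and stamp the ticket.
        self.clock += 1
        return Ticket(self.pid, self.clock, message)

    def observe(self, ticket: Ticket) -> None:
        # Receiving another sequencer's ticket: Lamport receive rule.
        self.clock = max(self.clock, ticket.number) + 1


def total_order(tickets: List[Ticket]) -> List[Ticket]:
    # Ties on the ticket number are broken by sequencer id, so every
    # sequencer computes the same order from the same set of tickets.
    return sorted(tickets, key=lambda t: (t.number, t.sequencer))


p1, p2 = SequencerNode("p1"), SequencerNode("p2")
t_a = p1.issue(("p3", 1))  # Ticket(sequencer='p1', number=1, message=('p3', 1))
t_b = p2.issue(("p5", 1))  # Ticket(sequencer='p2', number=1, message=('p5', 1))
p1.observe(t_b)
p2.observe(t_a)
for t in total_order([t_b, t_a]):
    print(t.message)
# ('p3', 1)
# ('p5', 1)
```

Both tickets carry ticket number 1, so under this tie-breaking rule p_3's update is applied first at every replica.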

20 Data-Centric Consistency Models
- The consistency models just discussed are called data-centric consistency models.
- Assumptions:
  - Concurrent processes may be updating the data simultaneously.
  - Updates need to be propagated quickly.

21 Eventual Consistency
- In the banking example, an account can receive many updates from different sources, e.g., a person at an ATM or the bank adding interest; updates should be “immediate”.
- In many applications, only one or a few processes perform updates.
- Example: DNS
  - The DNS name space is divided into domains.
  - Each domain has its own naming authority.
  - Only that authority is allowed to update its part of the name space, e.g., change the IP address associated with a host name.
  - This implies that there are no write-write conflicts.
  - Does the update have to be done immediately?
  - No.
  - An update can be propagated lazily, i.e., it is often acceptable to propagate an update only after some time has passed.

22 Eventual Consistency
- Example: WWW
  - Web pages are updated by a single authority.
  - Web pages are cached by browsers for efficiency.
  - The cached page that is returned to the requesting client may be an older version than the one available at the actual web server.
  - This inconsistency is usually acceptable.
- Some applications can tolerate relatively high inconsistency.
- Eventual consistency requires only that updates are guaranteed to propagate to all replicas.
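Because only one authority writes each entry, lazy propagation can be sketched with per-entry version numbers: a replica simply adopts any newer version it hears about. This is a minimal sketch, assuming a periodic pull (anti-entropy) step rather than any specific DNS mechanism; the function names, host name and addresses are hypothetical.

```python
from typing import Dict, Tuple

# Each entry: name -> (version, value). Only the owning authority bumps the
# version, so there are no write-write conflicts to resolve.
Store = Dict[str, Tuple[int, str]]


def authority_update(store: Store, name: str, value: str) -> None:
    version = store.get(name, (0, ""))[0] + 1
    store[name] = (version, value)


def anti_entropy(src: Store, dst: Store) -> None:
    """Lazy propagation: copy every entry for which src holds a newer version."""
    for name, (version, value) in src.items():
        if dst.get(name, (0, ""))[0] < version:
            dst[name] = (version, value)


authority: Store = {}
cache: Store = {}
authority_update(authority, "www.example.org", "192.0.2.1")
anti_entropy(authority, cache)
authority_update(authority, "www.example.org", "192.0.2.7")
print(cache["www.example.org"])  # (1, '192.0.2.1'): stale, but acceptable
anti_entropy(authority, cache)   # some time later
print(cache["www.example.org"])  # (2, '192.0.2.7'): converged
```

Between the two propagation rounds the cache is inconsistent; once updates stop, every replica that runs `anti_entropy` converges to the authority's latest version.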

23 Eventual Consistency
Figure: The principle of a mobile user accessing different replicas of a distributed database.

24 Eventual Consistency
- The mobile user accesses the database by connecting to one of the replicas in a transparent way.
- The application running on the user’s portable computer is (ideally) unaware of which replica it is actually operating on.
- Assume the user performs several update operations and then disconnects again.
- Later the user accesses the database again, possibly after moving to a different location or by using a different access device. The user may be connected to a different replica.
- What if the updates have not propagated? This could be confusing to the user.

25 Client-Centric Consistency Models
- Often there are some constraints placed on eventual consistency.
- These constraints help define client-centric consistency models.

26 Client-Centric Consistency Models
- Monotonic reads:
  - If a process reads the value of data item x, any subsequent read of x by the same process will return the same value or a more recent one.
  - Example: Consider a distributed e-mail database. In such a database, each user’s mailbox may be distributed and replicated across multiple machines. Mail can be inserted in a mailbox at any location. Updates are propagated in a lazy (i.e., on demand) fashion. Assume that reads don’t change the mailbox. Suppose a user reads their e-mail in Vancouver and then flies to Toronto and reads their e-mail again. Monotonic reads guarantee that the messages that were in the mailbox in Vancouver will also be in the mailbox in Toronto.

27 Client-Centric Consistency Models
- Monotonic writes:
  - A write operation on data item x is completed before any subsequent write on x by the same process.
  - Example: Updating a software library. An update may consist of replacing one or more functions, resulting in a new version. Updates performed on a copy of the library should be able to assume that all preceding updates have been performed first.

28 Client-Centric Consistency Models
- Read-your-writes:
  - A write operation by a process on data item x will always be seen by a subsequent read operation on x by the same process.
  - The absence of this consistency is seen in the following examples.
  - Example: Updating Web HTML pages. Cached web pages are still read even though the web page has been updated.
  - Example: Password updates for a digital library. A password change may take effect at one site but not be immediately propagated to the site where the account/password is actually needed.

29 Client-Centric Consistency Models
- Writes-follow-reads:
  - A write operation by a process on data item x following a previous read operation on x by the same process is guaranteed to take place on the same or a more recent value of x than the one that was read.

30 Implementing Client-Centric Models
- Globally unique ID per write operation
  - Assigned by the initiating server
  - Global IDs can be generated locally.
  - A server is required to log the write operation so that it can be replayed at another server.
- For each client, we keep track of two sets of write identifiers:
  - Read set: IDs of writes relevant to the client’s read operations
  - Write set: IDs of writes performed by the client
- Major performance issue:
  - Size of the read/write sets

31 Implementing Client-Centric Models
- Monotonic reads:
  - When a client issues a read, the server is given the client’s read set to check whether all the identified writes have taken place locally. If not, the server contacts other servers to ensure that it is brought up-to-date.
  - After the read, the client’s read set is updated with the server’s “relevant” writes.
- Monotonic writes:
  - When a client issues a write, the server is given the client’s write set … to ensure that all specified writes have been applied (in order).
  - The write operation’s ID is appended to the client’s write set.

32 Implementing Client-Centric Models
- Read-your-writes:
  - Before serving a read request, the server fetches (from other servers) all writes in the client’s write set.
- Writes-follow-reads:
  - The server is brought up-to-date with the writes in the client’s read set.
  - After the write, the new ID is added to the client’s write set, along with the IDs in the read set … as these have become “relevant” for the write just performed.
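The read-set/write-set bookkeeping from the last three slides can be combined into one sketch. This is a minimal sketch, assuming each server keeps an ordered log of globally unique write IDs (e.g. "van:1") and can fetch missing writes from a list of peers; `Server`, `Client`, `read` and `write` are hypothetical names. As a simplification, every write held by the server is treated as “relevant” after a read. A read requires both sets (monotonic reads plus read-your-writes), and a write requires both sets (monotonic writes plus writes-follow-reads).

```python
from typing import Iterable, List, Set


class Server:
    """A replica holding an ordered log of globally unique write IDs."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def bring_up_to_date(self, ids: Set[str], peers: Iterable["Server"]) -> None:
        # Fetch any write in `ids` that is missing here, in each peer's log order.
        for peer in peers:
            for wid in peer.log:
                if wid in ids and wid not in self.log:
                    self.log.append(wid)


class Client:
    def __init__(self) -> None:
        self.read_set: Set[str] = set()   # writes relevant to what we have read
        self.write_set: Set[str] = set()  # writes we have performed


def read(client: Client, server: Server, peers: List[Server]) -> List[str]:
    # Monotonic reads + read-your-writes: the server must hold both sets first.
    server.bring_up_to_date(client.read_set | client.write_set, peers)
    client.read_set |= set(server.log)  # simplification: all local writes are "relevant"
    return list(server.log)


def write(client: Client, server: Server, peers: List[Server], wid: str) -> None:
    # Monotonic writes + writes-follow-reads: apply our earlier writes and
    # the writes we have read before this one.
    server.bring_up_to_date(client.write_set | client.read_set, peers)
    server.log.append(wid)
    client.write_set |= {wid} | client.read_set


van, tor = Server(), Server()
alice = Client()
write(alice, van, [], "van:1")   # Alice writes at the Vancouver replica
print(read(alice, tor, [van]))   # ['van:1']: Toronto caught up before answering
```

Both sets only ever grow in this sketch, which illustrates the performance issue noted on slide 30.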

33 Impact of Mobility
- Mobility suggests that a user may be disconnected.
- Assume that a user of a mobile device has downloaded their calendar from their workstation.
- The user’s device is disconnected.
- The user makes changes to the calendar on the mobile device.
- A secretary makes changes to the calendar on the workstation.
- When the user reconnects, the calendar on the user’s device and the one on the user’s workstation should become the same.
- Some schemes have the user’s device be the primary and the workstation be a backup.
  - This implies that the calendar on the user’s device is considered the most recent.

34 Other Important Implementation Issues
- Important issues in implementation include the following:
  - Placement and nature of replicas
  - Distributing updates

35 Replica Placement
- Permanent
  - A process/machine always has a replica. Example: mirroring of a web site
- Server-initiated
  - Processes that can dynamically host a replica at the request of another server
- Client-initiated
  - Processes that can dynamically host a replica at the request of a client. Example: web caches

36 Server-Initiated Replicas
- Consider a web server placed in Toronto.
- Under normal conditions, the server can handle incoming requests easily, but it is predicted that in a couple of days there will be a sudden burst of requests.
- It may be worthwhile to install a number of temporary replicas in the regions where the requests are coming from.

37 Server-Initiated Replicas
- The ability to optimize the dynamic placement of replicas is of special interest to web hosting services.
  - ISPs pay a web hosting company (sometimes called an access-centric content distribution network) to serve popular content from caches close to the ISPs’ subscribers.
  - This model assumes that storage is cheaper than bandwidth, and that customers will not hesitate to move to other ISPs if they perceive their current ISP to be slow.

38 Server-Initiated Replicas
- Example heuristic:
  - Keep track of access counts per file.
  - If the number of accesses drops below some threshold value D, the file’s replica can be dropped.
  - If the number of accesses exceeds a threshold R, the file should be replicated.
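The threshold heuristic can be sketched as a per-file counter checked against D and R. This is a minimal sketch, assuming cumulative counts (a real system would count over a sliding time window); the class name and the threshold values in the usage are hypothetical, since the slide gives no numbers.

```python
from collections import Counter


class AccessTracker:
    """Counts accesses per file and applies the D / R threshold heuristic."""

    def __init__(self, drop_threshold: int, replicate_threshold: int) -> None:
        if drop_threshold >= replicate_threshold:
            raise ValueError("expected D < R")
        self.D = drop_threshold
        self.R = replicate_threshold
        self.counts: "Counter[str]" = Counter()

    def record(self, file: str) -> None:
        self.counts[file] += 1

    def decide(self, file: str) -> str:
        n = self.counts[file]  # a Counter returns 0 for unseen files
        if n < self.D:
            return "drop"
        if n > self.R:
            return "replicate"
        return "keep"


tracker = AccessTracker(drop_threshold=2, replicate_threshold=5)
for _ in range(8):
    tracker.record("index.html")
for _ in range(3):
    tracker.record("logo.png")
print(tracker.decide("index.html"), tracker.decide("logo.png"), tracker.decide("old.pdf"))
# replicate keep drop
```

Keeping D strictly below R leaves a band of counts where the replica is left alone, which avoids flapping between dropping and re-creating it.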

39 Client-Initiated Replicas
- Created at the initiative of clients.
- Known as caches.
- In essence, a cache is a local storage facility that is used by a client to temporarily store a copy of the data it has just requested.
- Client caches are used to improve access times to data.
- Data is generally kept in a cache for a limited amount of time, e.g., to prevent extremely stale data from being used or to make room for other data.
- A cache can be placed on the client’s machine or in a location that is easily accessible by other machines in the client’s organization.
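Keeping data “for a limited amount of time” is commonly done with a time-to-live (TTL) per entry. This is a minimal sketch, assuming a fixed TTL and a `fetch` callback that contacts the origin server; `TTLCache` is a hypothetical name, and the clock is injectable so the usage below is deterministic. Eviction to make room for other data is not shown.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Client-side cache that serves a stored copy until it is ttl seconds old."""

    def __init__(self, ttl_seconds: float, fetch: Callable[[str], str],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.fetch = fetch
        self.clock = clock
        self.entries: Dict[str, Tuple[float, str]] = {}  # key -> (fetched_at, value)

    def get(self, key: str) -> str:
        now = self.clock()
        hit = self.entries.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]        # fresh enough: serve the local copy
        value = self.fetch(key)  # missing or expired: go back to the server
        self.entries[key] = (now, value)
        return value


fake_time = [0.0]
origin = {"/index.html": "v1"}
cache = TTLCache(60, origin.__getitem__, clock=lambda: fake_time[0])
print(cache.get("/index.html"))  # v1 (fetched from the origin)
origin["/index.html"] = "v2"
fake_time[0] = 30
print(cache.get("/index.html"))  # v1 (stale, but still within the TTL)
fake_time[0] = 61
print(cache.get("/index.html"))  # v2 (expired, so refetched)
```

The TTL bounds how stale a served copy can be, which is the browser-cache inconsistency described under eventual consistency.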

40 Update Propagation
- Update operations are generally initiated by a client and subsequently forwarded to one of the copies.
- There are a number of design issues to consider.
- State or operation?
  - An important design issue concerns what is actually to be propagated.
  - Three possibilities:
    - Notification of an update
    - New copy of the data
    - Copy of the operation
  - Trade bandwidth for processing: propagating the operation sends less data, but every replica must re-execute it.
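The bandwidth-for-processing trade-off can be made concrete with a replicated list. This is a minimal sketch, assuming updates travel as JSON messages between a primary and its backups; the message formats are hypothetical. Shipping the new state costs a large message but no work at the backup, while shipping the operation costs a tiny message but the backup must re-execute it.

```python
import json
from typing import List

# A replicated list held by a primary and two backups (hypothetical setup).
primary: List[int] = list(range(1000))
backup_state: List[int] = list(primary)
backup_op: List[int] = list(primary)

primary.append(1000)  # the update

# Option 1: only notify; a backup would mark its copy invalid and refetch later.
notify_msg = json.dumps({"invalidate": "list"})

# Option 2: ship a new copy of the data (more bandwidth, no recomputation).
state_msg = json.dumps(primary)
backup_state = json.loads(state_msg)

# Option 3: ship the operation (small message, the backup re-executes it).
op_msg = json.dumps({"op": "append", "arg": 1000})
op = json.loads(op_msg)
if op["op"] == "append":
    backup_op.append(op["arg"])

print(backup_state == backup_op == primary)  # True
print(len(state_msg) > len(op_msg))          # True
```

For a cheap operation on large state, shipping the operation wins on bandwidth; for an expensive computation producing a small result, shipping the state can be preferable.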

41 Update Propagation
- Push vs. pull
  - Another design issue is whether updates are pulled or pushed.
  - Push by the server: the server must know the replicas; clients are updated immediately.
  - Pull by the client: the client must poll, or accept a delay in the response when an item is requested.

42 Update Propagation
- Push vs. pull (cont.)
  - Leases: we can dynamically switch between pulling and pushing using leases. A lease is a contract in which the server promises to push updates to the client until the lease expires.
    - Age-based leases: an object that hasn’t changed for a long time is unlikely to change in the near future, so it can be given a long-lasting lease.
    - Renewal-frequency-based leases: the more often a client requests a specific object, the longer the expiration time for that client (for that object) will be.
    - State-based leases: the more loaded a server is, the shorter the expiration times become.
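The three lease policies can be sketched as functions that compute a lease duration, plus a check for whether the server is still obliged to push. This is a minimal sketch; the slides give no formulas, so the constants `MIN_LEASE`/`MAX_LEASE` and the scaling factors below are hypothetical choices that only reproduce the stated direction of each rule.

```python
from dataclasses import dataclass

# Hypothetical tuning constants (seconds); the slides give no concrete values.
MIN_LEASE, MAX_LEASE = 10.0, 3600.0


def age_based_lease(seconds_since_last_change: float) -> float:
    # Objects that have been stable for a long time get a longer lease.
    return min(MAX_LEASE, max(MIN_LEASE, 0.5 * seconds_since_last_change))


def renewal_based_lease(requests_per_hour: float) -> float:
    # Clients that ask for the object more often get a longer lease.
    return min(MAX_LEASE, MIN_LEASE * (1 + requests_per_hour))


def state_based_lease(base: float, server_load: float) -> float:
    # server_load in [0, 1]: the busier the server, the shorter the lease.
    return max(MIN_LEASE, base * (1.0 - server_load))


@dataclass
class Lease:
    granted_at: float
    duration: float

    def server_must_push(self, now: float) -> bool:
        # While the lease is valid the server pushes; afterwards the client pulls.
        return now < self.granted_at + self.duration


print(age_based_lease(100.0))  # 50.0
print(Lease(granted_at=0.0, duration=60.0).server_must_push(now=30.0))  # True
```

Once `server_must_push` returns False, the client falls back to pulling (or renews the lease), which is how leases switch dynamically between the two modes.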

43 Consistency Requirements in Applications
- We have looked at several consistency models and possible implementations.
- There are many more out there that are variations of the models described.
- It is important to understand the consistency requirements of the application domain.
- Let’s look at some Internet applications.

44 Consistency Requirements for Applications
- Bulletin board
  - Replicated message posting service
  - As discussed earlier, causal order is needed. Some bulletin boards may also want total order.
  - There may be a requirement on how fast these updates should propagate.
- KaZaA
  - The order of updates doesn’t matter, since downloading a file is a commutative operation, i.e., it doesn’t matter whether song a is downloaded before song b or song b before song a.
  - Some would say that what is important is that eventually all sites have the same songs.
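The commutativity argument can be checked directly by modelling a site's state as the set of songs it holds. This is a minimal sketch, assuming downloading a song simply adds it to that set; the function name and song names are hypothetical. Every order of the same downloads produces the same final state.

```python
from itertools import permutations
from typing import FrozenSet, Iterable


def apply_downloads(downloads: Iterable[str]) -> FrozenSet[str]:
    # A site's state is just the set of songs it holds; adding a song is
    # commutative (and idempotent), so order does not affect the result.
    songs = set()
    for song in downloads:
        songs.add(song)
    return frozenset(songs)


downloads = ["song a", "song b", "song c"]
results = {apply_downloads(order) for order in permutations(downloads)}
print(len(results))  # 1: every order ends in the same state
```

Since order is irrelevant, no ordering protocol is needed; it suffices that every download eventually reaches every site.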

45 Consistency Requirements for Applications
- Chat service
  - Chat messages require causal order for discussions to make sense.
- Games
  - Players’ moves in a game must be delivered in the same order to all participants for fairness.
- In both these cases, timeliness is important.
- A centralized solution results in a performance bottleneck.
- Games sometimes guess at moves or the positions of objects on the game board.
  - E.g., instead of sending and receiving messages for the position of an object, the software predicts what the position will be.

46 Consistency Requirements for Applications
- Airline reservation
  - This is representative of replicated e-commerce services that accept inquiries (searches) and purchase orders on a catalog.
  - A measure of consistency is used: the percentage of requests that access inconsistent results.
  - Example: A user may see a seat as available when in fact the seat has been booked at another replica.
  - Isn’t this handled by using one of the approaches to providing total order?
  - Yes, but if a small violation of consistency is tolerated, we can achieve better performance.

47 Consistency Requirements for Applications
- Airline reservation (cont.)
  - Consistency requirements change dynamically.
  - Example: The cost of a transaction that must be rolled back is fairly small when a flight is nearly empty but grows as the flight fills. Why? When the flight is nearly empty, a customer whose booking is rolled back can likely find an alternate seat on the same flight.
  - Requests made when the flight is close to full may require a replica to be more aggressive in enforcing sequential consistency.

