1 6.4 Distribution Protocols
Different ways of propagating (distributing) updates to replicas, independent of the consistency model being supported.
First design issue for distributed data stores: deciding where, when, and by whom copies of the data store are placed.

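2 6.4.1 Replica Placement
The logical organization of different kinds of copies of a data store into three concentric rings: permanent replicas (innermost), server-initiated replicas, and client-initiated replicas (outermost), surrounded by the clients.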

3 Permanent Replicas
Web sites as an example:
– Files are replicated across a limited number of servers on a single local-area network.
– Mirroring: files are copied to mirror sites geographically spread across the Internet.
Distributed databases:
– The database can be distributed and replicated across a cluster of workstations, where neither disks nor main memory are shared by processors.
– The database can be distributed, and possibly replicated, across a number of geographically dispersed sites.

4 Server-Initiated Replicas
Definition: copies of a data store that exist to enhance performance and that are created at the initiative of (the owner of) the data store. Example: Web hosting services.
Problem: deciding when and where replicas should be created or deleted.
The Web hosting algorithm of Rabinovich addresses two issues:
– replication can take place to reduce the load on a server;
– specific files on a server can be migrated or replicated to servers in the proximity of the clients requesting them.

5 Server-Initiated Replicas
Each server counts the access requests for each of its files and records which clients they come from.
– When the number of requests for a specific file F at server S drops below a deletion threshold, F can be removed from S, provided that at least one copy of each file continues to exist.
– When the number of requests for F at S exceeds a replication threshold, F can be replicated to a server from whose vicinity many of the requests come.
– If the number of requests lies between the two thresholds, F can only be migrated, and only to a server that accounts for more than half of the total requests for F.
Server-initiated replicas are used mostly as read-only copies close to clients, whereas permanent replicas are used for backup or as the only updatable replicas, in order to guarantee consistency.
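The threshold rules above can be sketched as a single decision function. This is a minimal sketch, not Rabinovich's actual implementation: the names `decide`, `Decision`, `del_threshold`, and `rep_threshold`, and the assumption that request counts are already grouped by the server closest to each requesting client, are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Decision:
    action: str            # "delete", "replicate", "migrate", or "keep"
    target: Optional[str]  # server to replicate or migrate to, if any


def decide(requests_from: Dict[str, int], del_threshold: int,
           rep_threshold: int, copies_of_file: int) -> Decision:
    """Decide what server S does with file F.

    requests_from maps a (hypothetical) server P to the number of requests
    for F at S that came from clients for which P is the closest server.
    """
    total = sum(requests_from.values())
    if total < del_threshold:
        # Never remove the last remaining copy of a file.
        return Decision("delete" if copies_of_file > 1 else "keep", None)
    # Server in whose proximity most requests originate.
    best = max(requests_from, key=requests_from.get) if requests_from else None
    if total > rep_threshold:
        return Decision("replicate", best)
    # Between the thresholds: migration only, and only to a server that
    # accounts for more than half of all requests for F.
    if best is not None and 2 * requests_from[best] > total:
        return Decision("migrate", best)
    return Decision("keep", None)
```

The sketch keeps a file below the deletion threshold when it is the last copy, matching the requirement that at least one copy of each file continues to exist.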

6 Client-Initiated Replicas
Client-initiated replicas are better known as caches. In principle, managing the cache is left entirely to the client; however, the client may rely on the data store to inform it when cached data has become stale.
Cached data is generally kept only for a limited amount of time, to avoid using stale data or to make room for other data.
To increase the cache hit rate, caches can be shared between clients.
Placement of client caches is simple:
– on the same machine as the client
– on a machine shared by clients on the same local-area network
– extra levels of caching may be introduced
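A client cache that bounds staleness by age, and that also accepts an explicit invalidation from the data store, might look like the following. `TTLCache`, its `ttl` parameter, and the injectable `clock` (used so expiry can be tested deterministically) are illustrative assumptions, not a specific system's API.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Client-side cache whose entries expire ttl seconds after being fetched."""

    def __init__(self, ttl: float, fetch: Callable[[str], Any],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.fetch = fetch      # how to get a fresh value from the data store
        self.clock = clock
        self.entries: Dict[str, Tuple[Any, float]] = {}  # key -> (value, fetched_at)
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Any:
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and now - entry[1] < self.ttl:
            self.hits += 1
            return entry[0]
        # Missing or expired: refetch from the data store.
        self.misses += 1
        value = self.fetch(key)
        self.entries[key] = (value, now)
        return value

    def invalidate(self, key: str) -> None:
        """Called when the data store tells the client a cached item is stale."""
        self.entries.pop(key, None)
```

Clients sharing a cache would simply share one `TTLCache` instance, so a fetch triggered by one client becomes a hit for the others.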

7 6.4.2 Update Propagation
State versus operations: what is actually to be propagated?
– Propagate only a notification of an update. This is what invalidation protocols do. When an operation on an invalidated copy is requested, that copy first needs to be updated, depending on the specific consistency model supported. Notifications use little bandwidth, so this option works best when the read-to-write ratio is low.
– Transfer the modified data from one copy to another. This is used when the read-to-write ratio is high. It is also possible to log the changes and transfer only those logs.
– Propagate the update operation to the other copies, also called active replication. When the operation's parameters are small, this saves bandwidth; however, each replica may require more processing power.
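The three options differ in what a replica receives. The hypothetical `Replica` class below handles each kind of message; integer values and an operation passed as a Python function are assumptions made purely for illustration.

```python
from typing import Callable, Dict, Set


class Replica:
    def __init__(self, fetch_from_primary: Callable[[str], int]) -> None:
        self.data: Dict[str, int] = {}
        self.invalid: Set[str] = set()
        self.fetch = fetch_from_primary

    def on_invalidate(self, key: str) -> None:
        # Notification only: remember that the local copy is out of date.
        self.invalid.add(key)

    def on_state(self, key: str, value: int) -> None:
        # State transfer: the new value itself is shipped.
        self.data[key] = value
        self.invalid.discard(key)

    def on_operation(self, key: str, op: Callable[[int, int], int], arg: int) -> None:
        # Operation transfer (active replication): re-execute the update locally.
        self.data[key] = op(self.data.get(key, 0), arg)

    def read(self, key: str) -> int:
        # An invalidated copy must be brought up to date before it is used.
        if key in self.invalid:
            self.data[key] = self.fetch(key)
            self.invalid.discard(key)
        return self.data[key]
```

Only `on_state` ships the value itself; `on_invalidate` defers the transfer until a read needs it, and `on_operation` trades bandwidth for local computation.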

8 Pull versus Push Protocols
A comparison between push-based and pull-based protocols in the case of multiple-client, single-server systems:

Issue                    Push-based                                 Pull-based
State of server          List of client replicas and caches         None
Messages sent            Update (and possibly fetch update later)   Poll and update
Response time at client  Immediate (or fetch-update time)           Fetch-update time

Push-based protocols maintain a high degree of consistency and are useful when the read-to-write ratio is high. Pull-based protocols are efficient when the read-to-write ratio is low.
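A rough message count makes the read-to-write trade-off concrete. This sketch assumes one server, `n_clients` caching clients, that every push-based write is sent to every cache, and that every pull-based read costs one poll plus one reply; the function name and this cost model are simplifying assumptions.

```python
from typing import List


def count_messages(n_clients: int, workload: List[str], mode: str) -> int:
    """Count messages for a workload of "read"/"write" events under push or pull."""
    if mode not in ("push", "pull"):
        raise ValueError("mode must be 'push' or 'pull'")
    messages = 0
    for event in workload:
        if event == "write" and mode == "push":
            messages += n_clients   # server pushes the update to every cache
        elif event == "read" and mode == "pull":
            messages += 2           # client polls, server replies
        # Push-based reads and pull-based writes stay local in this model.
    return messages
```

With 3 clients, one write followed by ten reads costs 3 messages under push and 20 under pull; ten writes followed by one read costs 30 under push and 2 under pull.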

9 Unicasting versus Multicasting
In unicasting, when a server sends its updates to N other servers, it does so by sending N separate messages. With multicasting, the underlying network takes care of sending a message efficiently to multiple receivers.
Multicasting can be efficiently combined with a push-based approach to propagate updates. With a pull-based approach, unicasting may be more efficient, since generally only a single client or server requests that its copy be updated.

10 6.4.3 Epidemic Protocols
Epidemic algorithms do not resolve update conflicts. Instead, their only concern is propagating updates to all replicas in as few messages as possible. They assume that all updates for a specific data item are initiated at a single server, which avoids write-write conflicts.

11 Epidemic Protocols
A server is infective if it holds an update that it is willing to spread, and susceptible if it has not yet been updated.
A popular propagation model is anti-entropy: a server P picks another server Q at random and subsequently exchanges updates with Q in one of three ways:
– P only pushes its own updates to Q: a bad choice if many servers are already infective, since few susceptible servers remain to be reached.
– P only pulls in new updates from Q: useful when many servers are infective.
– P and Q send updates to each other (push-pull).
Rumor spreading (gossiping): if server P has just been updated for data item x, it contacts an arbitrary server Q and tries to push the update to Q. If Q has already received the update, P loses interest in spreading the update any further with probability 1/k.
– The fraction s of servers that will remain ignorant of the update satisfies $s = e^{-(k+1)(1-s)}$.
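Two parts of the slide can be sketched directly. `anti_entropy` merges two replicas in push, pull, or push-pull mode, assuming each item carries a timestamp assigned by its single originating server so that the newer entry always wins; choosing Q at random is left to the caller. `ignorant_fraction` solves the rumor-spreading equation by fixed-point iteration starting from 0, which converges to the non-trivial root rather than the trivial solution s = 1. Both function names are illustrative.

```python
import math
from typing import Dict, Tuple

Store = Dict[str, Tuple[int, str]]  # key -> (timestamp, value)


def _merge_into(dst: Store, src: Store) -> None:
    """Copy every entry of src that is newer than dst's copy."""
    for key, (ts, value) in src.items():
        if key not in dst or dst[key][0] < ts:
            dst[key] = (ts, value)


def anti_entropy(p: Store, q: Store, mode: str) -> None:
    """One anti-entropy exchange initiated by P with another server Q."""
    if mode not in ("push", "pull", "push-pull"):
        raise ValueError(mode)
    if mode in ("push", "push-pull"):
        _merge_into(q, p)  # P pushes its updates to Q
    if mode in ("pull", "push-pull"):
        _merge_into(p, q)  # P pulls new updates from Q


def ignorant_fraction(k: int, iterations: int = 200) -> float:
    """Solve s = exp(-(k+1)(1-s)) for the non-trivial root."""
    s = 0.0
    for _ in range(iterations):
        s = math.exp(-(k + 1) * (1 - s))
    return s
```

For k = 1 about 20% of the servers stay ignorant of the update; for k = 4 the fraction drops below 0.7%.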

12 Removing Data
Epidemic algorithms are good for spreading updates in eventually consistent data stores. However, spreading the deletion of a data item is hard: once a server deletes the item it loses all information about it, so an old copy later received from another server is treated as new and the item is re-created.
Trick: record the deletion as another update, and keep a record of that deletion. Deletions are recorded by spreading death certificates.
Death certificates should eventually be cleaned up. One way is to use timestamps: if it can be assumed that updates propagate to all servers within a known finite time, a death certificate can be removed after this maximum propagation time has elapsed.
To provide hard guarantees, a very few servers maintain dormant death certificates that are never thrown away.
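A sketch of death certificates in a timestamped store, assuming a deletion is stored as an entry whose value is `None` and that `max_propagation_time` is known. The class `ReplicaStore` and the `dormant` flag (for servers that never purge certificates) are hypothetical; real dormant certificates are also reactivated when an obsolete update arrives, which this sketch omits.

```python
from typing import Dict, Optional, Tuple

Entry = Tuple[float, Optional[str]]  # (timestamp, value); value None = death certificate


class ReplicaStore:
    def __init__(self, dormant: bool = False) -> None:
        self.items: Dict[str, Entry] = {}
        self.dormant = dormant  # hypothetical: dormant servers never purge certificates

    def _apply(self, key: str, ts: float, value: Optional[str]) -> None:
        # Keep whichever entry is newer; a certificate beats any older copy.
        if key not in self.items or self.items[key][0] < ts:
            self.items[key] = (ts, value)

    def put(self, key: str, ts: float, value: str) -> None:
        self._apply(key, ts, value)

    def delete(self, key: str, ts: float) -> None:
        # Record the deletion as an update instead of dropping the key.
        self._apply(key, ts, None)

    def get(self, key: str) -> Optional[str]:
        entry = self.items.get(key)
        return None if entry is None else entry[1]

    def merge_from(self, other: "ReplicaStore") -> None:
        for key, (ts, value) in other.items.items():
            self._apply(key, ts, value)

    def purge(self, now: float, max_propagation_time: float) -> None:
        """Drop certificates older than the maximum propagation time."""
        if self.dormant:
            return
        expired = [k for k, (ts, v) in self.items.items()
                   if v is None and now - ts > max_propagation_time]
        for k in expired:
            del self.items[k]
```

Because the certificate carries a newer timestamp than the deleted value, a stale copy arriving later loses the comparison in `_apply` instead of resurrecting the item.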

13 Consistency Protocols: Primary-based Remote-Write Protocols (1)
Primary-based remote-write protocol with a fixed server to which all read and write operations are forwarded.
After the primary finishes a write, each backup server performs the update too. Because the write blocks until this is done, it may take a long time before the updating process is allowed to continue (see next slide).

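14 Primary-backup Protocol
If the write is made non-blocking, so that the primary acknowledges the update as soon as its local copy has been updated, fault tolerance becomes a problem: an acknowledged update may be lost if the primary fails before the backups have performed it.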
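A minimal sketch of the blocking primary-backup write and its non-blocking variant. In-process method calls stand in for network messages, and `Primary`, `Backup`, `blocking`, and `flush` are hypothetical names.

```python
from typing import Dict, List, Tuple


class Backup:
    def __init__(self) -> None:
        self.data: Dict[str, int] = {}

    def update(self, key: str, value: int) -> str:
        self.data[key] = value
        return "ack"


class Primary:
    def __init__(self, backups: List[Backup], blocking: bool = True) -> None:
        self.data: Dict[str, int] = {}
        self.backups = backups
        self.blocking = blocking
        self.pending: List[Tuple[str, int]] = []  # updates not yet sent to backups

    def write(self, key: str, value: int) -> str:
        self.data[key] = value
        if self.blocking:
            # Wait for every backup before acknowledging the client.
            acks = [b.update(key, value) for b in self.backups]
            if not all(a == "ack" for a in acks):
                raise RuntimeError("backup did not acknowledge")
        else:
            # Acknowledge immediately; backups are updated later. A crash of the
            # primary before flush() loses an update the client believes is done.
            self.pending.append((key, value))
        return "ack"

    def flush(self) -> None:
        for key, value in self.pending:
            for b in self.backups:
                b.update(key, value)
        self.pending.clear()

    def read(self, key: str) -> int:
        return self.data[key]
```

Anything still in `pending` when the primary crashes is exactly the window of lost updates that makes the non-blocking variant a fault-tolerance problem.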

15 Local-Write Protocols (1)
Primary-based local-write protocol in which a single copy is migrated between processes. The main difficulty is keeping track of where each data item currently is.

16 Local-Write Protocols (2)
Primary-backup protocol in which the primary migrates to the process wanting to perform an update. This can be applied to mobile computers operating in disconnected mode.
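Both local-write variants can be sketched with one hypothetical class: a location table records which process holds the primary copy of each item, and a write migrates the primary to the writer. With `replicate=False` only one copy exists (the single-copy variant); with `replicate=True` the new primary also pushes the value to backups (the primary-backup variant). Representing processes as plain strings is an assumption for illustration.

```python
from typing import Dict, List


class LocalWriteStore:
    def __init__(self, processes: List[str], replicate: bool) -> None:
        self.replicate = replicate
        self.location: Dict[str, str] = {}  # item -> process holding the primary copy
        self.copies: Dict[str, Dict[str, int]] = {p: {} for p in processes}
        self.migrations = 0

    def write(self, process: str, item: str, value: int) -> None:
        owner = self.location.get(item)
        if owner is not None and owner != process:
            # Move the primary to the writer.
            self.migrations += 1
            if not self.replicate:
                del self.copies[owner][item]  # single copy: old holder gives it up
        self.location[item] = process
        self.copies[process][item] = value    # the write itself is local
        if self.replicate:
            # Primary-backup variant: propagate the update to all backups.
            for p, store in self.copies.items():
                if p != process:
                    store[item] = value

    def read(self, process: str, item: str) -> int:
        if self.replicate:
            return self.copies[process][item]            # read the local backup
        return self.copies[self.location[item]][item]    # locate the single copy
```

The `location` table is the bookkeeping the single-copy variant cannot avoid; a disconnected mobile host that holds the primary can keep writing locally and propagate to backups after reconnecting.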

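17 Replicated-Write Protocols
Active Replication (1)
Two problems must be solved:
1. Operations need to be carried out in the same order everywhere.
2. The problem of replicated invocations: when a replicated object invokes another object, every replica issues the invocation, so the invoked object may receive and execute the same request several times.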
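One common way to meet the ordering requirement is a central sequencer that stamps each operation with a sequence number, so every replica applies operations in stamp order even if messages arrive out of order. This is one concrete mechanism chosen for illustration (totally ordered multicast is another); `Sequencer` and `ActiveReplica` are hypothetical names.

```python
import heapq
from typing import Callable, List, Tuple

Op = Callable[[int], int]
Message = Tuple[int, Op]  # (sequence number, operation)


class Sequencer:
    def __init__(self) -> None:
        self.next_seq = 0

    def stamp(self, op: Op) -> Message:
        msg = (self.next_seq, op)
        self.next_seq += 1
        return msg


class ActiveReplica:
    def __init__(self) -> None:
        self.state = 0
        self.expected = 0                 # next sequence number to apply
        self.pending: List[Message] = []  # min-heap of early arrivals

    def deliver(self, msg: Message) -> None:
        heapq.heappush(self.pending, msg)
        # Apply every operation whose turn has come, in sequence order.
        while self.pending and self.pending[0][0] == self.expected:
            _, op = heapq.heappop(self.pending)
            self.state = op(self.state)
            self.expected += 1
```

With "add 5" stamped before "double", a replica that receives "double" first buffers it, so both replicas end at 10 instead of one of them computing 5.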

18 Active Replication (2)
a) Forwarding an invocation request from a replicated object.
b) Returning a reply to a replicated object.

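19 Quorum-Based Protocols
Three examples of the voting algorithm:
a) A correct choice of read and write set
b) A choice that may lead to write-write conflicts
c) A correct choice, known as ROWA (read one, write all)
Constraints on the read quorum $N_R$ and the write quorum $N_W$ for $N$ replicas:
$N_R + N_W > N$ (prevents read-write conflicts)
$N_W > N/2$ (prevents write-write conflicts)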
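A minimal in-memory sketch of the voting scheme, assuming each replica stores a (version, value) pair and that quorums are picked at random. `valid_quorum` checks the two constraints, and `QuorumStore` shows why they suffice: write quorums always overlap, and every read quorum overlaps every write quorum. All names and the parameter values in the test are illustrative.

```python
import random
from typing import Any, List, Tuple


def valid_quorum(n: int, nr: int, nw: int) -> bool:
    """Check N_R + N_W > N and N_W > N/2."""
    return nr + nw > n and 2 * nw > n


class QuorumStore:
    def __init__(self, n: int, nr: int, nw: int, seed: int = 0) -> None:
        if not valid_quorum(n, nr, nw):
            raise ValueError("quorum sizes violate the voting constraints")
        self.replicas: List[Tuple[int, Any]] = [(0, None)] * n  # (version, value)
        self.nr, self.nw = nr, nw
        self.rng = random.Random(seed)

    def _pick(self, k: int) -> List[int]:
        return self.rng.sample(range(len(self.replicas)), k)

    def write(self, value: Any) -> None:
        quorum = self._pick(self.nw)
        # Write quorums overlap (N_W > N/2), so this sees the latest version.
        version = max(self.replicas[i][0] for i in quorum) + 1
        for i in quorum:
            self.replicas[i] = (version, value)

    def read(self) -> Any:
        # A read quorum overlaps every write quorum (N_R + N_W > N),
        # so the highest version it sees is the most recent write.
        quorum = self._pick(self.nr)
        return max((self.replicas[i] for i in quorum), key=lambda e: e[0])[1]
```

ROWA corresponds to `nr = 1` and `nw = n`: reads are cheap, but every write must reach all replicas.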

20 Cache Coherence Protocols
Two criteria to classify caching protocols:
1. Coherence detection strategy: when inconsistencies are actually detected.
   – Static: by compiler analysis, before execution.
   – Dynamic: at run time. The question is when during a transaction the detection is done:
     1. The transaction cannot proceed to use the cached version until its consistency has been validated.
     2. The transaction is allowed to proceed while verification is taking place.
     3. Verification is done only when the transaction commits.
2. Coherence enforcement strategy: how caches are kept consistent with the copies stored at servers.
   1. Disallow shared data to be cached.
   2. Allow shared data to be cached, and either:
      1. let the server send an invalidation to all caches whenever a data item is modified, or
      2. simply propagate the update.
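Dynamic detection option 3 (verify only when the transaction commits) works like optimistic concurrency control. A minimal sketch, assuming the server keeps a version number per item and the cache stores (version, value) pairs; `VersionedServer` and `CachedTransaction` are hypothetical names.

```python
from typing import Dict, Tuple

Entry = Tuple[int, int]  # (version, value)


class VersionedServer:
    def __init__(self) -> None:
        self.data: Dict[str, Entry] = {}

    def version(self, key: str) -> int:
        return self.data.get(key, (0, 0))[0]

    def write(self, key: str, value: int) -> None:
        self.data[key] = (self.version(key) + 1, value)


class CachedTransaction:
    """Reads cached values freely and verifies them only at commit time."""

    def __init__(self, server: VersionedServer, cache: Dict[str, Entry]) -> None:
        self.server = server
        self.cache = cache
        self.read_set: Dict[str, int] = {}  # key -> version that was read

    def read(self, key: str) -> int:
        version, value = self.cache[key]
        self.read_set[key] = version
        return value

    def commit(self) -> bool:
        # Commit succeeds only if no cached item that was read has changed at the server.
        return all(self.server.version(k) == v for k, v in self.read_set.items())
```

Options 1 and 2 would perform the same version check before, or concurrently with, the first use of a cached value; in this sketch a failed `commit` is where the transaction would be aborted.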

21 Cache Coherence Protocols
What happens when a process modifies cached data?
– When read-only caches are used, update operations can be performed only by the servers, which subsequently follow some distribution protocol to ensure that updates are propagated to caches.
– Alternatively, clients can be allowed to modify the cached data directly and forward the update to the servers. This is the approach followed by write-through caches.
– Write-back caches delay the propagation of updates by allowing multiple writes to take place before informing the servers.
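The difference between write-through and write-back caching shows up in how many updates reach the server. A minimal sketch with hypothetical `Server`, `WriteThroughCache`, and `WriteBackCache` classes; an explicit `flush` stands in for whatever event makes a write-back cache inform the server.

```python
from typing import Dict, Set


class Server:
    def __init__(self) -> None:
        self.data: Dict[str, int] = {}
        self.writes_received = 0

    def write(self, key: str, value: int) -> None:
        self.data[key] = value
        self.writes_received += 1


class WriteThroughCache:
    def __init__(self, server: Server) -> None:
        self.server = server
        self.cache: Dict[str, int] = {}

    def write(self, key: str, value: int) -> None:
        self.cache[key] = value
        self.server.write(key, value)  # every write is forwarded immediately


class WriteBackCache:
    def __init__(self, server: Server) -> None:
        self.server = server
        self.cache: Dict[str, int] = {}
        self.dirty: Set[str] = set()

    def write(self, key: str, value: int) -> None:
        self.cache[key] = value
        self.dirty.add(key)  # the server is not informed yet

    def flush(self) -> None:
        # Only the latest value of each modified item is sent.
        for key in self.dirty:
            self.server.write(key, self.cache[key])
        self.dirty.clear()
```

Three writes to the same item reach the server three times through the write-through cache, but only once, at flush time, through the write-back cache.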
