Taming aggressive replication in the Pangaea wide-area file system. Authors: Yasushi Saito, Christos Karamanolis, Magnus Karlsson, Mallik Mahalingam.

Presentation transcript:

1 Taming aggressive replication in the Pangaea wide-area file system
Authors: Yasushi Saito, Christos Karamanolis, Magnus Karlsson, Mallik Mahalingam
Presented by: Ying Chen

2 Agenda
• Introduction
• Aggressive replication
• A structural overview
• Replica set management
• Propagating updates
• Failure recovery
• System evaluation
• Conclusion
• Q&A

3 Introduction
What is Pangaea?
Pangaea is a wide-area file system that enables ad-hoc collaboration in multinational corporations or in distributed groups of users.

4 Introduction
• Goals
  - Speed
  - Availability and autonomy
  - Network economy
• Key technique
  - Aggressive replication (pervasive replication)
  - Pangaea aggressively creates a replica of a file or directory whenever and wherever it is accessed.
  - Replicas exchange updates among themselves in a peer-to-peer (p2p) fashion.

5 Aggressive replication
• Main advantages of aggressive replication
  - Provides fault tolerance, which is stronger for popular files;
  - Hides network latency;
  - Supports disconnected operation by containing a user’s working set on a single server.
• Challenges
  - Keeping track of a large number of files and replicas in a decentralized way;
  - Propagating updates reliably yet efficiently.
How to implement pervasive replication?

6 Aggressive replication
Strategies to implement aggressive replication
• Graph-based replica management
  - A sparse, yet strongly connected and randomized graph of replicas for each file;
  - Used both to propagate updates and to discover other replicas during replica addition and removal.
• Optimistic replica coordination
  - Lets updates be issued on any replica at any time;
  - Maximizes availability, but sacrifices consistency.

7 A structural overview
• Structure of a server
[Figure 1: The structure of the Pangaea server. Labels: NFS protocol handler, replication engine, log, and membership (inside the Pangaea server); I/O request (application); NFS client; inter-node communication; user space; kernel.]
  - The NFS protocol handler receives requests from applications, updates local replicas, and generates requests for the replication engine.
  - The replication engine accepts requests from the NFS protocol handler and from the replication engines running on other nodes. It creates, modifies, or removes replicas, and forwards requests to other nodes if necessary.
  - The log module implements transaction-like semantics for local disk updates via redo logging. The server logs all replica-update operations using this service, allowing them to survive crashes.
  - The membership module maintains the status of other nodes, including their liveness, available disk space, the locations of root-directory replicas, the list of regions in the system, the set of nodes in each region, and an RTT estimate between every pair of regions.

8 A structural overview
• Structure of a file system
[Figure 2: An example of the Pangaea file system, showing /joe and /joe/foo. Legend: gold replica, bronze replica, peer edge, downlinks, backpointer.]
Notes: Pangaea has two types of replicas: gold and bronze. Both can be read and written by users at any time, and both run an identical update-propagation protocol. Gold replicas play an additional role in maintaining the hierarchical name space.
Notes: Each replica stores a backpointer that indicates its location in the file-system name space. A backpointer includes the parent directory’s ID and the file’s name within the directory.
Notes:
  - Node = server;
  - Pangaea replicates data at the granularity of files;
  - Directories are treated as files with special contents.
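
The replica metadata described in the notes above can be pictured with a small data-structure sketch. This is only an illustration: the class and field names (`Backpointer`, `Replica`, `peers`, `entries`), the IDs, and the idea that a downlink is a directory entry listing the nodes of the file's gold replicas are assumptions, not Pangaea's actual on-disk layout.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Backpointer:
    parent_id: str  # ID of the parent directory
    name: str       # this file's name within that directory

@dataclass
class Replica:
    file_id: str
    node: str       # server holding this replica
    is_gold: bool   # gold replicas also help maintain the name space
    backpointers: list = field(default_factory=list)
    peers: set = field(default_factory=set)      # peer edges, stored as node names
    entries: dict = field(default_factory=dict)  # directories only: name -> (file_id, gold nodes)

def link(a, b):
    """Add an undirected peer edge between two replicas of the same file."""
    assert a.file_id == b.file_id
    a.peers.add(b.node)
    b.peers.add(a.node)

# /joe is a directory; /joe/foo is a file with one gold and one bronze replica.
joe = Replica("dir-joe", "A", is_gold=True,
              backpointers=[Backpointer("dir-root", "joe")])
foo_gold = Replica("file-foo", "A", is_gold=True,
                   backpointers=[Backpointer("dir-joe", "foo")])
foo_bronze = Replica("file-foo", "B", is_gold=False,
                     backpointers=[Backpointer("dir-joe", "foo")])
link(foo_gold, foo_bronze)
joe.entries["foo"] = ("file-foo", ["A"])  # hypothetical downlink to the gold replica on A
```

Both replica kinds carry the same backpointer, which is what lets a bronze replica locate its place in the name space without consulting a gold one.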

9 Replica set management
• File creation
  - The creation of gold replicas;
  - The creation of backpointers and downlinks.
• Replica addition
  - Find the gold replica in the directory entry during the name-space lookup;
  - Perform short-cut replica creation to transfer data;
  - Gold replicas act as starting points;
  - Integrate the new copy into the file’s replica graph.

10 Replica set management
• Bronze replica removal
  - The server sends notices to the replica’s graph neighbors;
  - Each neighbor initiates a random walk to establish a replacement edge with another live replica.
• Name-space containment
  - For every replica of a file, its parent directories should also be replicated on the same node;
  - This simplifies the conflict resolution of directory operations and supports disconnected operation;
  - But this requirement increases the storage overhead by 1.5% to 25%.
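
The bronze-replica removal steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not Pangaea's protocol: the graph is a plain dictionary of neighbor sets, the walk length and retry count are arbitrary, and starting every walk from a caller-supplied `entry` node (for example a gold replica) is an assumption borrowed from the replica-addition slide.

```python
import random

def random_walk(graph, start, steps, rng):
    """Follow up to `steps` random peer edges from `start`; return the node reached."""
    current = start
    for _ in range(steps):
        neighbors = sorted(graph[current])
        if not neighbors:
            break
        current = rng.choice(neighbors)
    return current

def remove_replica(graph, leaving, entry, rng, steps=3, attempts=20):
    """Remove `leaving`, then let each former neighbor look for a replacement edge."""
    former_neighbors = sorted(graph.pop(leaving))
    for n in former_neighbors:
        graph[n].discard(leaving)      # the notice sent to each graph neighbor
    for n in former_neighbors:
        for _ in range(attempts):
            target = random_walk(graph, entry, steps, rng)
            if target != n and target not in graph[n]:
                graph[n].add(target)   # replacement edge, kept symmetric
                graph[target].add(n)
                break

# Five replicas in a ring; the replica on C is removed.
ring = {"A": {"B", "E"}, "B": {"A", "C"}, "C": {"B", "D"},
        "D": {"C", "E"}, "E": {"D", "A"}}
remove_replica(ring, "C", entry="A", rng=random.Random(0))
```

A neighbor that finds no suitable target within `attempts` walks simply keeps its reduced degree in this sketch; a real system would need a policy for that case.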

11 Propagating updates
• Optimistic replication brings 3 challenges:
  - Efficient and reliable update propagation;
  - Handling concurrent updates;
  - The lack of strong consistency guarantees.
• Solutions to these challenges:
  - Optimizations for efficient update propagation
  - Conflict resolution
  - Controlling replica divergence

12 Optimization
• Delta propagation
  - Pangaea propagates only a small, semantic description of the change, called a delta;
  - Each delta carries two timestamps.
• Harbingers
  - A harbinger is a small message that contains only the timestamps of the update;
  - The harbinger is flooded along the graph edges; the update body is sent only when requested by other nodes.
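
The harbinger idea above can be illustrated with a toy flooding simulation. It is a sketch under simplifying assumptions: messages are delivered in breadth-first order with no delays or failures, a node requests the body from whichever neighbor's harbinger reaches it first, and the function name `flood_harbingers` is made up for this example.

```python
from collections import deque

def flood_harbingers(graph, origin):
    """Flood a harbinger from `origin`; return (body sources, harbingers sent)."""
    body_source = {}     # node -> neighbor it pulled the update body from
    heard = {origin}     # nodes that have already seen this update's timestamps
    harbingers = 0
    queue = deque([origin])
    while queue:
        sender = queue.popleft()
        for neighbor in sorted(graph[sender]):
            if neighbor == body_source.get(sender):
                continue             # do not echo the harbinger back to its source
            harbingers += 1
            if neighbor not in heard:
                heard.add(neighbor)
                body_source[neighbor] = sender  # first harbinger: request the body
                queue.append(neighbor)
    return body_source, harbingers

# Four replicas connected in a square: A-B, A-C, B-D, C-D.
square = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A", "D"}, "D": {"B", "C"}}
sources, harbingers = flood_harbingers(square, "A")
print(sources, harbingers)
```

In this run five small harbingers cross the graph, but the potentially large update body is transferred only three times, once per replica other than the origin.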

13 Optimization
• Exploiting physical topology
[Figure: replica graph over nodes A, B, C, D, E, and F.]
Pangaea dynamically builds a spanning tree whose shape closely matches the physical network topology. This can greatly reduce the use of wide-area networks.
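
One way to see how a spanning tree can follow the physical topology is the sketch below. It assumes, purely for illustration, that each replica pulls the update body from the neighbor whose harbinger arrives first and that harbinger delay equals a fixed per-edge latency; under those assumptions the resulting tree is a shortest-path tree. The slide does not state Pangaea's exact mechanism, and the latencies and the name `first_arrival_tree` are invented.

```python
import heapq

def build_latency(edges):
    """Turn (a, b, ms) triples into a symmetric latency table."""
    latency = {}
    for a, b, ms in edges:
        latency.setdefault(a, {})[b] = ms
        latency.setdefault(b, {})[a] = ms
    return latency

def first_arrival_tree(latency, origin):
    """Return node -> parent, where the parent delivered the earliest harbinger."""
    parent = {}
    done = set()
    heap = [(0, origin, None)]       # (arrival time, node, sender)
    while heap:
        t, node, sender = heapq.heappop(heap)
        if node in done:
            continue                 # a later harbinger for a node already reached
        done.add(node)
        if sender is not None:
            parent[node] = sender
        for neighbor, delay in latency[node].items():
            if neighbor not in done:
                heapq.heappush(heap, (t + delay, neighbor, node))
    return parent

# Two sites: {A, B, C} and {D, E, F}. LAN edges cost 1 ms, WAN edges 100 ms or more.
latency = build_latency([
    ("A", "B", 1), ("A", "C", 1), ("B", "C", 1),
    ("D", "E", 1), ("E", "F", 1), ("D", "F", 1),
    ("A", "F", 100), ("C", "D", 120), ("B", "E", 120),
])
tree = first_arrival_tree(latency, "A")
wan_edges = [(child, p) for child, p in tree.items() if latency[child][p] >= 100]
print(tree, wan_edges)
```

Although three wide-area edges exist in the replica graph, only one of them ends up carrying the update body; the other replicas at the remote site are served over local edges.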

14 Conflict resolution
• Conflicts on the contents of a regular file. There are 2 options for resolving them:
  - The “last-writer-wins” rule;
  - Manual resolution by the user.
• Conflicts regarding file attributes or directory entries are more difficult to handle; they fall into 2 categories:
  - Conflict between 2 directory-update operations;
  - Conflict between “rmdir” and any other operation.

15 Conflict resolution
Example 1: a rename-rename conflict.
1. File /foo and directories /alice and /bob are initially replicated on replica sets { A, B }, { A, C }, and { B, D }, respectively.
2. Alice on node A does mv /foo /alice/foo1.
3. Simultaneously, Bob on node B does mv /foo /bob/foo2.
Example 2: an rmdir-update conflict.
1. An empty directory /foo is replicated on nodes { A, B }.
2. Alice on node A creates file /foo/bar.
3. Bob on node B does rmdir /foo.

16 Conflict resolution
• Solution:
  - Pangaea lets the “child” file have the final say on the conflict resolution, using the “last-writer-wins” rule;
  - Directory operations are implemented as a change to the file’s backpointer(s).
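
Applied to the rename-rename conflict of Example 1, the rule above might look like the sketch below. It is an illustration only: the timestamp format (a clock value paired with a node ID as tie-breaker), the names `BackpointerUpdate`, `resolve`, and `apply_to_directories`, and the way directories are represented are all assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BackpointerUpdate:
    timestamp: tuple  # (clock, node_id); the node ID only breaks ties
    parent: str       # new parent directory
    name: str         # new name within that directory

def resolve(updates):
    """Last-writer-wins: the update with the largest timestamp is kept."""
    return max(updates, key=lambda u: u.timestamp)

def apply_to_directories(directories, file_id, winner):
    """Make directory entries agree with the file's winning backpointer."""
    for entries in directories.values():
        for name in [n for n, f in entries.items() if f == file_id]:
            del entries[name]
    directories.setdefault(winner.parent, {})[winner.name] = file_id

# Alice (node A) and Bob (node B) rename /foo concurrently.
alice = BackpointerUpdate((10, "A"), "/alice", "foo1")
bob = BackpointerUpdate((12, "B"), "/bob", "foo2")
directories = {"/": {}, "/alice": {"foo1": "foo"}, "/bob": {"foo2": "foo"}}

winner = resolve([alice, bob])
apply_to_directories(directories, "foo", winner)
print(winner, directories)
```

Because the decision is made from the file's own backpointer, every replica of the file picks the same winner, and the directories are then corrected to match it.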

17 Failure recovery
• Recovering from temporary failures
  - The majority of failures are temporary;
  - The goal is to reduce the recovery cost;
  - A node retries logged updates upon reboot or after it detects another node’s recovery.
• Recovering from permanent failures
  - The goal is to clean all data structures associated with the failed node so that the system runs as if the node had never existed in the first place;
  - Permanent failures are handled by a garbage collection module.
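
The retry behavior for temporary failures can be sketched as a log of unacknowledged updates. This is a simplified, in-memory illustration: a real log would be persisted so that it survives a reboot, and the `UpdateLog` class and the `deliver` callback (which returns True on acknowledgment) are hypothetical names.

```python
class UpdateLog:
    """Keeps updates until the destination node acknowledges them."""

    def __init__(self):
        self.pending = []  # (destination, update) pairs not yet acknowledged

    def propagate(self, destination, update, deliver):
        self.pending.append((destination, update))  # log first, then try to send
        self.retry(deliver, destination)

    def retry(self, deliver, destination=None):
        """Resend pending updates, to one destination or to all if None."""
        remaining = []
        for dest, update in self.pending:
            wanted = destination is None or dest == destination
            if wanted and deliver(dest, update):
                continue                            # acknowledged: drop from the log
            remaining.append((dest, update))
        self.pending = remaining

live = {"B"}      # nodes currently reachable
received = []

def deliver(dest, update):
    if dest not in live:
        return False
    received.append((dest, update))
    return True

log = UpdateLog()
log.propagate("B", "u1", deliver)   # delivered immediately
log.propagate("C", "u2", deliver)   # C is down, so the update stays logged
pending_while_down = list(log.pending)
live.add("C")                       # C's recovery is detected
log.retry(deliver, "C")
```

Calling `retry(deliver)` with no destination corresponds to the reboot case, where every logged update is attempted again.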

18 System evaluation
[Figure: Performance of personal workload in WANs.]

19 System evaluation
[Figure: The average time needed to read a new file in a collaborative environment.]

20 System evaluation
[Figure: Availability analysis using a file-system trace.]

21 Conclusions
• Pangaea is a wide-area file system; it assumes trusted servers.
• 3 design principles:
  - Pervasive replication to provide low access latency and high availability;
  - Randomized graph-based replica management that adapts to changes in the system and conserves WAN bandwidth;
  - Optimistic consistency that allows users to access data at any time, from anywhere.
• In heterogeneous environments, Pangaea outperforms existing systems in 3 aspects: access latency, efficient usage of WAN bandwidth, and file availability.

22 Q & A
• Pangaea shares many goals (decentralization, availability, and autonomy) with recent p2p data-sharing systems, such as PAST. These p2p systems build flat distributed tables using randomization techniques. Could Pangaea also use this method?
No. Pangaea must maintain a graph of replicas explicitly, because in Pangaea:
  - Replicas are placed by user activity, not by randomization;
  - Files are updated frequently and are structured hierarchically.

23 Q & A
• Why can the harbinger algorithm shrink the effective window of replica inconsistency?
Harbinger-propagation delay is independent of the actual update size, so the chance of a user seeing stale file contents is greatly reduced.

24 Q & A
• Conflict resolution using backpointers requires that each file can perform a (local or remote) update to a replica of the directory that the backpointer refers to. One approach, adopted in Pangaea’s earlier implementation, is to embed pointers to (some of) the replicas of the parent directory in the backpointer and modify the parent directory using remote procedure calls. What is the problem with this design?
• This design turned out to be unwieldy: the backpointer is used to initiate a change in the directory, but its directory links must be changed when the directory’s replica set changes. Because of this circular control structure, we could not easily keep the information in the backpointer and the parent directory properly synchronized.

25 Thank you!

