
1 Distributed Systems CS 15-440 Programming Models- Part V Replication and Consistency- Part I Lecture 18, Oct 29, 2014 Mohammad Hammoud

2 Today…
 Last Session: Programming Models – Part IV: Pregel
 Today’s Session:
 Programming Models – Part V: GraphLab
 Replication and Consistency – Part I: Motivation, Overview & Types of Consistency Models
 Announcements:
 Project 3 is now posted. It is due on Wednesday, Nov 12, 2014 by midnight
 PS4 is now posted. It is due on Saturday, Nov 15, 2014 by midnight
 We will practice more on MPI tomorrow in the recitation

3 Objectives
Discussion on Programming Models:
Why parallelize our programs?
Parallel computer architectures
Traditional models of parallel programming
Types of parallel programs
Message Passing Interface (MPI)
MapReduce, Pregel and GraphLab (covered in the last 4 sessions; cont’d today)

4 Objectives
Discussion on Programming Models:
Why parallelize our programs?
Parallel computer architectures
Traditional models of parallel programming
Types of parallel programs
Message Passing Interface (MPI)
MapReduce, Pregel and GraphLab

5 The GraphLab Analytics Engine
GraphLab Motivation & Definition
The Programming Model
Input, Output & Components
The Architectural Model
Fault-Tolerance
The Computation Model

6 Motivation for GraphLab
 There is exponential growth in the scale of Machine Learning and Data Mining (MLDM) algorithms
 Designing, implementing and testing MLDM algorithms at large scale is challenging due to:
 Synchronization
 Deadlocks
 Scheduling
 Distributed state management
 Fault-tolerance
 Interest in analytics engines that can execute MLDM algorithms automatically and efficiently is increasing, but:
 MapReduce is inefficient for iterative jobs (common in MLDM algorithms)
 Pregel cannot run asynchronous problems (common in MLDM algorithms)

7 What is GraphLab?
 GraphLab is a large-scale graph-parallel distributed analytics engine
 Some characteristics:
 In-memory (unlike MapReduce, similar to Pregel)
 High scalability
 Automatic fault-tolerance
 Flexibility in expressing arbitrary graph algorithms (more flexible than Pregel)
 Shared-memory abstraction (unlike Pregel, similar to MapReduce)
 Peer-to-peer architecture (unlike both Pregel and MapReduce)
 Asynchronous (unlike both Pregel and MapReduce)

8 The GraphLab Analytics Engine
GraphLab Motivation & Definition
The Programming Model
Input, Output & Components
The Architectural Model
Fault-Tolerance
The Computation Model

9 Input, Graph Flow and Output
 GraphLab assumes problems modeled as graphs
 It adopts two phases: the initialization phase and the execution phase
[Figure: In the initialization phase, a graph builder (e.g., a MapReduce job) reads raw graph data from a distributed file system, parses and partitions it into an atom collection, and constructs an atom index. In the execution phase, the GraphLab (GL) engines on the cluster machines load the atom index and atom files from the distributed file system, handle monitoring and atom placement, and communicate via RPC over TCP.]

10 Components of the GraphLab Engine: The Data-Graph
 The GraphLab engine incorporates three main parts:
 1. The data-graph, which represents the user program state at a cluster machine
[Figure: a data-graph made up of vertices and edges]

11 Components of the GraphLab Engine: The Update Function
 The GraphLab engine incorporates three main parts:
 2. The update function, which involves two main sub-functions:
 2.1 Altering data within the scope of a vertex
 2.2 Scheduling future update functions at neighboring vertices
The scope of a vertex v (i.e., S_v) is the data stored in v and in all of v’s adjacent edges and vertices
[Figure: a vertex v and its scope S_v]

12 Components of the GraphLab Engine: The Update Function (Cont’d)
 The GraphLab engine incorporates three main parts:
 2. The update function, which involves two main sub-functions:
 2.1 Altering data within the scope of a vertex
 2.2 Scheduling future update functions at neighboring vertices
[Figure: Algorithm – The GraphLab Execution Engine: a scheduled vertex v is removed from the scheduler and its update function is executed, which may in turn schedule further vertices]

13 Components of the GraphLab Engine: The Update Function (Cont’d)
 The GraphLab engine incorporates three main parts:
 2. The update function, which involves two main sub-functions:
 2.1 Altering data within the scope of a vertex
 2.2 Scheduling future update functions at neighboring vertices
[Figure: multiple CPUs pull scheduled vertices (a, b, …, k) from a shared scheduler and execute their update functions in parallel; the process repeats until the scheduler is empty]
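To make this execution model concrete, here is a minimal single-machine Python sketch of the scheduler loop described above. It is an illustration only; the Graph, run and update names are assumptions of this sketch, not GraphLab’s actual API.

from collections import deque

class Graph:
    def __init__(self, edges):
        self.neighbors = {}   # vertex -> set of adjacent vertices
        self.data = {}        # vertex -> user-defined vertex data
        for u, v in edges:
            self.neighbors.setdefault(u, set()).add(v)
            self.neighbors.setdefault(v, set()).add(u)
            self.data.setdefault(u, 0.0)
            self.data.setdefault(v, 0.0)

def run(graph, update, initial_vertices):
    scheduler = deque(initial_vertices)   # vertices with pending updates
    pending = set(initial_vertices)
    while scheduler:                      # repeat until the scheduler is empty
        v = scheduler.popleft()
        pending.discard(v)
        # The update function reads/writes the scope of v and returns
        # the neighbors whose update functions should run next.
        for u in update(graph, v):
            if u not in pending:
                pending.add(u)
                scheduler.append(u)

A real engine runs this loop on many CPUs and machines at once, which is exactly why the consistency models discussed later are needed.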

14 Components of the GraphLab Engine: The Sync Operation
 The GraphLab engine incorporates three main parts:
 3. The sync operation, which maintains global statistics describing data stored in the data-graph
 Global values maintained by the sync operation can be written by all update functions across the cluster machines
 The sync operation is similar to Pregel’s aggregators
 A mutual exclusion mechanism is applied by the sync operation to avoid write-write conflicts
 For scalability reasons, the sync operation is not enabled by default
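Roughly, a sync operation behaves like a periodic, mutually exclusive fold over all vertex data. The sketch below is an assumption about the shape of such an operation, not GraphLab’s implementation:

import threading

class SyncOp:
    # Maintains one global statistic over the data-graph.
    def __init__(self, fold, zero):
        self.fold = fold              # combines (accumulator, vertex data)
        self.zero = zero              # initial accumulator value
        self.value = zero
        self.lock = threading.Lock()  # guards against write-write conflicts

    def run(self, graph):
        acc = self.zero
        for v, data in graph.data.items():
            acc = self.fold(acc, data)
        with self.lock:               # mutual exclusion on the global value
            self.value = acc

# Example: a global sum of all vertex values, recomputed periodically.
# total = SyncOp(lambda acc, d: acc + d, 0.0)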

15 The GraphLab Analytics Engine
GraphLab Motivation & Definition
The Programming Model
Input, Output & Components
The Architectural Model
Fault-Tolerance
The Computation Model

16 The Architectural Model
 GraphLab adopts a peer-to-peer architecture
 All engine instances are symmetric
 Engine instances communicate using the Remote Procedure Call (RPC) protocol over TCP/IP
 The first triggered engine has the additional responsibility of being a monitoring/master engine
 Advantages:
 Highly scalable
 Precludes centralized bottlenecks and single points of failure
 Main disadvantage:
 Complexity

17 The GraphLab Analytics Engine
GraphLab Motivation & Definition
The Programming Model
Input, Output & Components
The Architectural Model
Fault-Tolerance
The Computation Model

18 The Programming Model  GraphLab offers a shared-memory programming model  It allows scopes to overlap and vertices to read/write from/to their scopes

19 Consistency Models in GraphLab
 GraphLab guarantees sequential consistency
 Provides the same result as a sequential execution of the computational steps
 User-defined consistency models:
 Full Consistency
 Vertex Consistency
 Edge Consistency
[Figure: a vertex v and the portion of its scope covered by each model]

20 Consistency Models in GraphLab (Cont’d)
[Figure: a chain of five vertices 1–5 with vertex data D_1 … D_5 and edge data D_1↔2 … D_4↔5. Under the Full Consistency Model, an update at vertex 3 may read and write its entire scope: D_2, D_3, D_4 and the edges D_2↔3 and D_3↔4. Under the Edge Consistency Model, it may write D_3 and the adjacent edges D_2↔3 and D_3↔4, but only read D_2 and D_4. Under the Vertex Consistency Model, it may write only D_3.]
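The three models can be summarized by the set of data a single update at v is allowed to write; a minimal sketch, assuming the Graph class from the earlier sketch (the helper name write_set is illustrative):

def write_set(graph, v, model):
    # Data that one update at v may write under each consistency model.
    if model == "vertex":
        return {("vertex", v)}
    edges = {("edge", frozenset((v, u))) for u in graph.neighbors[v]}
    if model == "edge":
        return {("vertex", v)} | edges
    if model == "full":
        return {("vertex", v)} | edges | {("vertex", u) for u in graph.neighbors[v]}
    raise ValueError(model)

# Two updates can safely run in parallel only if neither one's write set
# intersects data the other touches: stronger models cover more data, so
# they admit more algorithms but permit less parallelism.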

21 The GraphLab Analytics Engine
GraphLab Motivation & Definition
The Programming Model
Input, Output & Components
The Architectural Model
Fault-Tolerance
The Computation Model

22 The Computation Model
 GraphLab employs an asynchronous computation model
 It suggests two asynchronous engines:
 Chromatic Engine
 Locking Engine
 The chromatic engine executes vertices partially asynchronously:
 It applies vertex coloring (i.e., no adjacent vertices share the same color)
 All vertices with the same color are executed before proceeding to a different color
 The locking engine executes vertices fully asynchronously:
 Data on vertices and edges are susceptible to corruption
 It applies a permission-based distributed mutual exclusion mechanism to avoid read-write and write-write hazards
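A minimal sketch of the chromatic engine’s idea, assuming the Graph class from the earlier sketch (the greedy coloring is an illustrative choice; GraphLab’s actual coloring and parallel scheduling differ):

def greedy_coloring(graph):
    # Give each vertex the smallest color unused by its neighbors,
    # so that no two adjacent vertices share a color.
    color = {}
    for v in graph.neighbors:
        used = {color[u] for u in graph.neighbors[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

def chromatic_sweep(graph, update, color):
    # Same-color vertices have no edges between them, so (under edge
    # consistency) the data they may write is disjoint and their updates
    # can run in parallel; colors are processed one at a time.
    for c in sorted(set(color.values())):
        for v in [u for u in graph.neighbors if color[u] == c]:
            update(graph, v)  # in a real engine, this inner loop is parallel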

23 The GraphLab Analytics Engine
GraphLab Motivation & Definition
The Programming Model
Input, Output & Components
The Architectural Model
Fault-Tolerance
The Computation Model

24 Fault-Tolerance in GraphLab
 GraphLab uses distributed checkpointing to recover from machine failures
 It suggests two checkpointing mechanisms:
 Synchronous checkpointing (it suspends the entire execution of GraphLab)
 Asynchronous checkpointing
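As a toy illustration of the synchronous variant only (the names and the pickle format are assumptions of this sketch, not GraphLab’s mechanism): execution is suspended, the data-graph and pending schedule are persisted, and the engine resumes or restarts from the file.

import pickle
from collections import deque

def synchronous_checkpoint(graph, scheduler, path):
    # Callers must suspend all updates before taking the snapshot.
    state = {"data": graph.data, "pending": list(scheduler)}
    with open(path, "wb") as f:
        pickle.dump(state, f)

def restore(graph, path):
    # Rebuild vertex data and the pending schedule after a failure.
    with open(path, "rb") as f:
        state = pickle.load(f)
    graph.data = state["data"]
    return deque(state["pending"])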

25 How Does GraphLab Compare to MapReduce and Pregel?

26 GraphLab vs. Pregel vs. MapReduce

Aspect                       | Hadoop MapReduce | Pregel          | GraphLab
Programming Model            | Shared-Memory    | Message-Passing | Shared-Memory
Computation Model            | Synchronous      | Synchronous     | Asynchronous
Parallelism Model            | Data-Parallel    | Graph-Parallel  | Graph-Parallel
Architectural Model          | Master-Slave     | Master-Slave    | Peer-to-Peer
Task/Vertex Scheduling Model | Pull-Based       | Push-Based      | Push-Based
Application Suitability      | Loosely-Connected/Embarrassingly Parallel Applications | Strongly-Connected Applications | Strongly-Connected Applications (more precisely, MLDM apps)

27 Today… (A New Chapter)
 Replication and Consistency:
 Motivation
 Overview
 Types of Consistency Models

28 Why Replication?
Replication is the process of maintaining the data at multiple computers
Replication is necessary for:
1. Improving performance: a client can access the replicated copy of the data that is near to its location
2. Increasing the availability of services: replication can mask failures such as server crashes and network disconnection
3. Enhancing the scalability of the system: requests to the data can be distributed to many servers which contain replicated copies of the data
4. Securing against malicious attacks: even if some replicas are malicious, secure data can be guaranteed to the client by relying on the replicated copies at the non-compromised servers

29 1. Replication for Improving Performance
Example applications:
Caching webpages at the client browser
Caching IP addresses at clients and DNS name servers
Caching in Content Delivery Networks (CDNs): commonly accessed contents, such as software and streaming media, are cached at various network locations
[Figure: a main server and its replicated servers]

30 2. Replication for High-Availability
Availability can be increased by storing the data at replicated locations (instead of storing one copy of the data at a server)
Example: the Google File System replicates the data at computers across different racks, clusters and data-centers
If one computer, rack or cluster crashes, the data can still be accessed from another source

31 3. Replication for Enhancing Scalability
Distributing the data across replicated servers helps in avoiding bottlenecks at the main server
It balances the load between the main and the replicated servers
Example: Content Delivery Networks decrease the load on the main servers of a website
[Figure: a main server and its replicated servers]

32 4. Replication for Securing Against Malicious Attacks
If a minority of the servers that hold the data are malicious, the non-malicious servers can outvote the malicious servers, thus providing security
The technique can also be used to provide fault-tolerance against non-malicious but faulty servers
Example: in a peer-to-peer system, peers can coordinate to prevent delivering faulty data to the requester
[Figure: a client queries several servers; some hold correct data, some hold faulty data, and some do not have the requested data. The servers with correct data outvote the faulty servers.]
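A minimal sketch of this majority-voting read (the replica interface here is an assumption for illustration):

from collections import Counter

def read_with_voting(servers, key):
    # Ask every replica that has the key, then accept the value returned
    # by a majority; this tolerates a minority of faulty replicas.
    answers = [s[key] for s in servers if key in s]
    if not answers:
        return None
    value, votes = Counter(answers).most_common(1)[0]
    return value if votes > len(answers) // 2 else None

# Example: five replicas, one faulty, one without the data.
# servers = [{"x": 7}, {"x": 7}, {"x": 7}, {"x": 9}, {}]
# read_with_voting(servers, "x")  # -> 7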

33 Why Consistency?
In a DS with replicated data, one of the main problems is keeping the data consistent
An example: in an e-commerce application, the bank database has been replicated across two servers
Maintaining consistency of replicated data is a challenge
[Figure: both replicas start with Bal=1000. Event 1 = add $1000; Event 2 = add interest of 5%. One replica applies Event 1 then Event 2 (1000 → 2000 → 2100); the other applies Event 2 then Event 1 (1000 → 1050 → 2050). The replicas diverge.]
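The divergence arises because the two events do not commute; a few lines of Python make the check concrete (a sketch, not tied to any real banking API):

def add_1000(bal): return bal + 1000      # Event 1
def add_interest(bal): return bal * 1.05  # Event 2: 5% interest

replica_1 = add_interest(add_1000(1000))  # 1000 -> 2000 -> 2100.0
replica_2 = add_1000(add_interest(1000))  # 1000 -> 1050 -> 2050.0
assert replica_1 != replica_2             # the replicas diverge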

34 Overview of Consistency and Replication
Consistency Models (today’s lecture):
Data-Centric Consistency Models
Client-Centric Consistency Models
Replica Management (next lectures):
When, where and by whom should replicas be placed?
Which consistency model to use for keeping replicas consistent?
Consistency Protocols (next lectures):
We study various implementations of consistency models

35 Overview
Consistency Models:
Data-Centric Consistency Models
Client-Centric Consistency Models
Replica Management
Consistency Protocols

36 Introduction to Consistency and Replication
In a distributed system, shared data is typically stored in distributed shared memory, distributed databases or distributed file systems
The storage can be distributed across multiple computers; we simply refer to a collection of such data storage units as a data-store
Multiple processes can access shared data by accessing any replica on the data-store
Processes generally perform read and write operations on the replicas
[Figure: several processes, each working on a local copy, on top of a distributed data-store]

37 Maintaining Consistency of Replicated Data
Notation: R(x)b = read variable x, with result b; W(x)b = write value b to variable x; P1 = process 1, shown with its own timeline
[Figure: replicas 1…n start with x=0. P1 performs R(x)0 then W(x)2; P2 then reads R(x)2 and writes W(x)5; P3 then reads R(x)5. Every read returns the most recent write.]
Strict Consistency:
Data is always fresh
After a write operation, the update is propagated to all the replicas
A read operation will result in reading the most recent write
Even occasional writes and reads lead to large overheads

38 Maintaining Consistency of Replicated Data (Cont’d)
[Figure: the same setup, but the replicas hold different values of x (0, 2, 3, 5) because they are out of sync. P1 reads R(x)0 and later R(x)5; P2 writes W(x)2 and later reads R(x)3; P3 writes W(x)5 and later reads R(x)5. Reads may return stale values.]
Loose Consistency:
Data might be stale
A read operation may result in reading a value that was written long back
Replicas are generally out-of-sync
The replicas may sync at coarse-grained times, thus reducing the overhead

39 Trade-offs in Maintaining Consistency
Maintaining consistency requires balancing the strictness of consistency against efficiency
Good-enough consistency depends on your application
Strict consistency: generally hard to implement, and inefficient
Loose consistency: easier to implement, and efficient

40 Consistency Model
A consistency model is a contract between the processes that want to use the data and the replicated data repository (or data-store)
A consistency model states the level of consistency provided by the data-store to the processes while reading and writing the data

41 Types of Consistency Models
Consistency models can be divided into two types:
Data-Centric Consistency Models: these models define how the data updates are propagated across the replicas to keep them consistent
Client-Centric Consistency Models: these models assume that clients connect to different replicas at different times; they ensure that whenever a client connects to a replica, the replica is brought up to date with the replica that the client accessed previously

42 Summary
Replication is necessary for improving performance, scalability, availability, and security
Replicated data-stores should be designed after carefully evaluating the trade-offs between tolerable data inconsistency and efficiency
Consistency models describe the contract between the data-store and processes about what form of consistency to expect from the system
Consistency models can be classified into two types:
Data-Centric Consistency Models
Client-Centric Consistency Models

43 Next Three Classes
Data-Centric Consistency Models: Sequential and Causal Consistency Models
Client-Centric Consistency Models: Eventual Consistency, Monotonic Reads, Monotonic Writes, Read Your Writes and Writes Follow Reads
Replica Management: when, where and by whom replicas should be placed, and which consistency model to use for keeping replicas consistent
Consistency Protocols: we study various implementations of consistency models


45 Back-up Slides

46 PageRank
 PageRank is a link analysis algorithm
 The rank value indicates the importance of a particular web page
 A hyperlink to a page counts as a vote of support
 A page that is linked to by many pages with high PageRank receives a high rank itself
 A PageRank of 0.5 means there is a 50% chance that a person clicking on a random link will be directed to the document with the 0.5 PageRank

47 PageRank (Cont’d)
 Iterate (the standard PageRank update, in which each in-neighbor j contributes its rank divided by its link count):
$R[i] = \alpha + (1 - \alpha) \sum_{j \in \mathrm{InNbrs}(i)} \frac{R[j]}{L[j]}$
 Where:
 α is the random reset probability
 L[j] is the number of links on page j
[Figure: a small example web graph of six pages]

48 PageRank Example in GraphLab
 The PageRank algorithm is defined as a per-vertex operation working on the scope of the vertex:

pagerank(i, scope) {
  // Get neighborhood data from the scope of vertex i
  (R[i], w_ji, R[j]) <- scope;
  // Update the vertex data (the formula from the previous slide, with w_ji = 1/L[j])
  R[i] <- alpha + (1 - alpha) * sum over j in InNbrs(i) of (w_ji * R[j]);
  // Reschedule neighbors if needed (this is what makes the computation dynamic)
  if R[i] changes then
    reschedule_neighbors_of(i);
}

