
1 Distributed Shared Memory, Related Issues, and New Challenges in Large-Scale Dynamic Systems
Vincent Gramoli

2 Roadmap
(Slide footer: ICDCS 2007, June. Fernandez, Gramoli, Jimenez, Kermarrec, Raynal)
Large-Scale Dynamic Systems
- Context and Motivations
Distributed Shared Memory
- Facing Dynamism
- Facing Scalability
A Probabilistic Solution
- Facing Dynamism and Scalability
A New Challenge
- Distributed Slicing

3 The Scale-Shift of Distributed Systems
Internet network growth: IPv4 to IPv6

4 The Scale-Shift of Distributed Systems
- Personal devices multiply
- All tend to be connected together

5 The Scale-Shift of Distributed Systems
(Figure: number of network devices over time.)
17 billion network devices by 2012, as predicted by IDC

6 Drawbacks of this Scale-Shift
Heterogeneity
- Each device acts independently
- Each device has a distinct lifetime
Out-of-control
- Global monitoring is impossible
- The system is unpredictable
Dynamism
- At any time some participants may leave/fail
- And some others may join
- Unbounded number of leaves/failures/joins

7 Problem: How to Communicate?
Shared-Memory Paradigm
- Simple programming style
- Appealing design for algorithms
(Figure: process P1 and memory MEM.)

8 Problem: How to Communicate?
Shared-Memory Paradigm
- Simple programming style
- Appealing design for algorithms
Message-Passing Paradigm
- Better suited for delayed messages
- Fault-tolerant system
(Figure: process P1, memory MEM, and a LINK.)

9 Problem: How to Communicate?
Shared-Memory Paradigm
- Simple programming style
- Appealing design for algorithms
Message-Passing Paradigm
- Better suited for delayed messages
- Fault-tolerant system
Emulating Shared-Memory in Message-Passing
- Simplicity
- Fault-tolerance
(Figure: process P1, memory MEM, and a LINK.)

10 Distributed Shared Memory: the emulation
Consistency criterion: a set of rules
Atomic object:
- If an operation ends before another starts, then it cannot be ordered after it
- Write operations are totally ordered and read operations are ordered w.r.t. write operations
- A read returns the last value written (or the default one if none exists)
If objects are atomic, then the system looks like a shared-memory model!

11 Quorum-based DSM [ABD]
Quorums
- Mutually intersecting sets of nodes: Q1 ∩ Q2 ≠ Ø, Q1 ∩ Q3 ≠ Ø, Q2 ∩ Q3 ≠ Ø
Each node maintains:
- A local value v of the object
- A unique version number t of this value

12 Quorum-based DSM [ABD]
Operations
A node reads the object value by:
- Asking all nodes of a quorum for their value and tag (Get)
- Choosing the value with the largest tag
- Replicating this value to all nodes of a quorum (Set)
A node writes a new object value by:
- Asking all nodes of a quorum for their tags (Get)
- Choosing a higher tag than any tag returned (t' = t + 1)
- Replicating its value with the new tag to a quorum (Set)
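The two operations on this slide can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the [ABD] algorithm as published: it assumes a single writer (so an integer tag is enough to be unique), no failures, and synchronous calls instead of messages. The names `Replica`, `read`, and `write` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    tag: int = 0          # version number t of the local value
    value: object = None  # local value v (None stands for the default value)

    def get(self):
        return self.tag, self.value

    def set(self, tag, value):
        # Adopt only strictly newer versions.
        if tag > self.tag:
            self.tag, self.value = tag, value


def read(get_quorum, set_quorum):
    # Get phase: ask a quorum for (tag, value) and keep the largest tag.
    tag, value = max((r.get() for r in get_quorum), key=lambda tv: tv[0])
    # Set phase: replicate that value to a quorum before returning it.
    for r in set_quorum:
        r.set(tag, value)
    return value


def write(value, get_quorum, set_quorum):
    # Get phase: learn the highest tag known to a quorum.
    highest = max(r.get()[0] for r in get_quorum)
    new_tag = highest + 1
    # Set phase: replicate the value with the new, higher tag.
    for r in set_quorum:
        r.set(new_tag, value)
    return new_tag


# Five replicas and three mutually intersecting quorums.
replicas = [Replica() for _ in range(5)]
q1 = replicas[0:3]
q2 = replicas[2:5]
q3 = [replicas[0], replicas[1], replicas[4]]

write("v1", q1, q2)
result = read(q3, q1)
print(result)
```

Because any Get quorum intersects the Set quorum of the last completed write, the reader always sees at least one replica holding the latest tag. With several writers, the tag would need a tie-breaker such as a node identifier.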

13 Quorum-based DSM [ABD]
Writing a value v1 (figure: quorums Q1, Q2, Q3). Input: v1

14 Quorum-based DSM [ABD]
Writing a value v1 (figure: quorums Q1, Q2, Q3). Query: max tag? Reply: t

15 Quorum-based DSM [ABD]
Writing a value v1 (figure: quorums Q1, Q2, Q3). Message: v1, t1 (with t1 > t)

16 Quorum-based DSM [ABD]
Reading a value (figure: quorums Q1, Q2, Q3). Query: value? tag? Reply: v1, t1

17 Quorum-based DSM [ABD]
Reading a value (figure: quorums Q1, Q2, Q3). Message: v1, t1

18 Quorum-based DSM [ABD]
Reading a value (figure: quorums Q1, Q2, Q3). Output: v1

19 Dynamic DSM [RDS]
Dynamism => Unbounded number of failures
Solution: Reconfiguration
- Replacing quorums periodically with quorums of active nodes
(Figure: quorums Q1, Q2, Q3.)
Problem: Q1 ∩ Q2 = Ø and Q1 ∩ Q3 = Ø and Q2 ∩ Q3 = Ø

20 Dynamic DSM [RDS]
All must agree on the next set of quorums
- Quorum-based consensus algorithm: Paxos
Reconfiguration must not block operations
- Up-to-date information is passed from old quorums to new quorums during reconfiguration
- Operations that discover a new quorum set must restart using it

21 Dynamic DSM [RDS]
Algorithm
- Reconfiguration is based on Paxos (a 3-phase leader-based consensus algorithm)
- l is the leader
- c is the current configuration
- configs is the set of active configurations
- A ballot has a unique identifier b and a value v, which is a configuration
Paxos phases:
- Prepare: l creates a new ballot and chooses/gets the value to propose
- Propose: l proposes and gathers votes from a majority
- Propagate: l propagates the decision
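The three phases can be illustrated with a textbook single-decree Paxos round. This is only a sketch under strong assumptions: everything runs in one process with no message loss, and it leaves out what [RDS] adds on top (the tag/value transfer and the configs set). `Acceptor` and `run_ballot` are hypothetical names, and the values "c1", "c2", "c3" stand in for configurations.

```python
class Acceptor:
    def __init__(self):
        self.promised = 0     # highest ballot identifier promised so far
        self.accepted = None  # (ballot identifier, value) last voted for, if any

    def prepare(self, b):
        # Promise to ignore smaller ballots and report any earlier vote.
        if b > self.promised:
            self.promised = b
            return True, self.accepted
        return False, None

    def propose(self, b, v):
        # Vote for (b, v) unless a larger ballot has been promised.
        if b >= self.promised:
            self.promised = b
            self.accepted = (b, v)
            return True
        return False


def run_ballot(b, own_value, acceptors):
    """Leader-side round; returns the decided value, or None if the ballot fails."""
    majority = len(acceptors) // 2 + 1

    # Prepare: gather promises and adopt the value of the highest earlier vote.
    replies = [a.prepare(b) for a in acceptors]
    promises = [acc for ok, acc in replies if ok]
    if len(promises) < majority:
        return None
    voted = [acc for acc in promises if acc is not None]
    v = max(voted, key=lambda bv: bv[0])[1] if voted else own_value

    # Propose: gather votes from a majority.
    votes = sum(a.propose(b, v) for a in acceptors)
    if votes < majority:
        return None

    # Propagate: the decision would now be sent to every participant.
    return v


acceptors = [Acceptor() for _ in range(3)]
first = run_ballot(1, "c1", acceptors)   # decides "c1"
second = run_ballot(2, "c2", acceptors)  # must re-decide "c1", not "c2"
stale = run_ballot(1, "c3", acceptors)   # smaller ballot is rejected
print(first, second, stale)
```

The second ballot shows why the leader "chooses/gets" its value in the Prepare phase: once a value may have been decided, later ballots have to carry the same value.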

22 Dynamic DSM [RDS]
(Figure: leader l and quorums Q1, Q2.) Recon(c, c')

23 Dynamic DSM [RDS]
Prepare phase, Recon(c, c'): creates a new larger ballot b

24 Dynamic DSM [RDS]
Prepare phase, Recon(c, c')

25 Dynamic DSM [RDS]
Prepare phase, Recon(c, c'): updates its ballot's value v with the one received; updates its configs set

26 Dynamic DSM [RDS]
Propose phase, Recon(c, c')

27 Dynamic DSM [RDS]
Propose phase, Recon(c, c'): update their tag and val; add v to their configs set

28 Dynamic DSM [RDS]
Propagation phase, Recon(c, c'): update their tag and val; remove configuration c from their configs set

29 Dynamic DSM [RDS]
Good News: The latency overhead to cope with dynamism is low

30 Dynamic DSM [RDS]
Bad News: In both solutions, congestion may increase the latency

31 First Conclusion
Communication complexity must be reduced to face scalability!

32 Scalable DSM [SQUARE]
Object replicated on failure-prone nodes
- The replicas r1, …, rk share a 2-dim coordinate space
(Figure: replicas r1, r2, …, rk-1, rk tiling the coordinate space.)

33 Scalable DSM [SQUARE]
Communication through neighborhood
- Each replica ri can communicate only with its nearest neighbors

34 Scalable DSM [SQUARE]
Topology takeover mechanism [CAN]
- Upon node failure/departure the space sharing is modified accordingly
- If a node ri fails, a takeover node rj replaces it

35 Scalable DSM [SQUARE]
Dynamic Quorums
- Vertical Quorum: all replicas responsible for an abscissa x
- Horizontal Quorum: all replicas responsible for an ordinate y
For any horizontal quorum H and any vertical quorum V: H ∩ V ≠ Ø
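The intersection property is easy to check on a toy layout. The sketch below assumes a regular 3 x 3 grid with exactly one replica per cell, which is a simplification: in [SQUARE] the zones of the coordinate space can have different sizes. The function names are hypothetical.

```python
def horizontal_quorum(grid, y):
    # All replicas responsible for ordinate y (one row of the grid).
    return set(grid[y])


def vertical_quorum(grid, x):
    # All replicas responsible for abscissa x (one column of the grid).
    return {row[x] for row in grid}


# Replicas r1..r9 laid out row by row.
grid = [[f"r{3 * y + x + 1}" for x in range(3)] for y in range(3)]

common = horizontal_quorum(grid, 1) & vertical_quorum(grid, 2)
print(common)

# Every row meets every column in exactly one replica.
always_intersect = all(
    len(horizontal_quorum(grid, y) & vertical_quorum(grid, x)) == 1
    for y in range(3)
    for x in range(3)
)
```

With k replicas on a square grid, each quorum has about √k members, which is what keeps the per-operation communication low compared with majority quorums.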

36 Scalable DSM [SQUARE]
Operation Execution
Read Operation:
1) Get up-to-date values and tags of a horizontal quorum
2) Replicate this value on a vertical quorum
Write Operation:
1) Get up-to-date value on a horizontal quorum
2) Replicate the value to write (and a higher tag) on the same vertical quorum

37 Scalable DSM [SQUARE]
Self-Adjusting Memory
- Memory thwarts if the requested replica is overloaded: other replicas on its diagonal are contacted in turn until a non-overloaded one is found.
- Memory expands if all contacted replicas are overloaded: a node outside the memory is added, and the object value is replicated at this node.
- Memory shrinks if a replica gets underloaded: the replica simply leaves the memory after notifying its neighbors.

38 Scalable DSM [SQUARE]
Good News: The memory self-adapts well in the face of dynamism

39 Scalable DSM [SQUARE]
Good News: The load is well-balanced over the replicas

40 Scalable DSM [SQUARE]
Bad News: The operation latency increases with the load (request rate)

41 Second Conclusion
No way to avoid the tradeoff between communication and time complexity!

42 Scalable and Dynamic DSM [TQS]
Motivations for Probabilistic Solutions
- Tradeoff between time and message complexity prevents deterministic solutions
- Allowing more realistic models: any node can fail independently, even if it is unlikely that many nodes fail at the same time
- Quality of Service (QoS) is often expressed in terms of percentage of success

43 Scalable and Dynamic DSM [TQS]
Dynamic System
- n interconnected nodes
- Nodes join/leave the system
- A joining node is new
- c is the churn:
  - At each time unit, $\lfloor cn \rfloor$ nodes leave the network
  - At each time unit, $\lfloor cn \rfloor$ nodes enter the network

44 Scalable and Dynamic DSM [TQS]
Probabilistic Atomicity
- If an operation ends before another starts, then it is ordered after it with probability $e^{-\beta^2}$ (with $\beta$ a constant). If this happens, the preceding operation is considered unsuccessful.
- Write operations are totally ordered and read operations are ordered w.r.t. write operations.
- A read returns the last successfully written value (or the default one if none exists) with probability $1 - e^{-\beta^2}$ (with $\beta$ a constant). If this does not hold, then the read is unsuccessful.

45 Scalable and Dynamic DSM [TQS]
Gossip-based algorithm in parallel
- Shuffle the set of neighbors using a gossip-based algorithm (e.g. Cyclon)
Quorum contact
- Disseminate a message with TTL l to k neighbors, such that #contacted nodes = $\beta \sqrt{n} / (1-c)^{\Delta/2}$
- Decrement the TTL of a message the first time it is received
- Forward received messages to k neighbors if their TTL is not null
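To get a feel for the numbers, the sketch below evaluates the target quorum size, reading the slide's formula as β√n / (1-c)^(Δ/2), together with the success bound 1 - e^(-β²) from the probabilistic atomicity definition. The function names and the sample parameters are illustrative assumptions, not values from [TQS].

```python
import math


def quorum_size(n, c, delta, beta):
    # Number of nodes to contact: beta * sqrt(n) / (1 - c)^(delta / 2).
    return math.ceil(beta * math.sqrt(n) / (1 - c) ** (delta / 2))


def success_probability(beta):
    # Probability bound 1 - e^(-beta^2) used in the probabilistic atomicity rules.
    return 1 - math.exp(-beta ** 2)


static_size = quorum_size(n=10_000, c=0.0, delta=5, beta=2)
churn_size = quorum_size(n=10_000, c=0.19, delta=2, beta=2)
print(static_size, churn_size, round(success_probability(2), 4))
```

Without churn the quorum reduces to β√n nodes. As the churn c or the period Δ grows, more nodes have to be contacted to keep the same probability, which is how the quorum size can be tuned to a QoS target.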

46 Scalable and Dynamic DSM [TQS]
Read
- Ask a quorum of nodes for their values
- Replicate the most up-to-date value to a quorum
Write
- Ask a quorum of nodes for their tags
- Choose a strictly higher tag
- Replicate the value to write with the new tag

47 Scalable and Dynamic DSM [TQS]
Assumptions:
- Success regularity: at least one operation succeeds every $\Delta$ time units.
- The underlying gossip-based algorithm provides each node with a neighbor chosen uniformly at random.
Results:
- Expected messages: $O(\sqrt{nD})$ without shuffling
- Expected latency: $O(\log \sqrt{nD})$
- where $D = (1-c)^{-\Delta}$ is the dynamic parameter
Given the application requirement in terms of QoS, the quorum size can be tuned.
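A rough way to see where the dynamic parameter D comes from is sketched below. It is a back-of-the-envelope argument, not the proof from [TQS], and it assumes that departing nodes and quorum members are chosen uniformly at random and independently. Here q denotes the quorum size.

```latex
% q: quorum size, n: system size, c: churn,
% \Delta: maximum time between two successful operations
\begin{align*}
q &= \beta \sqrt{nD} = \frac{\beta \sqrt{n}}{(1-c)^{\Delta/2}},
  \qquad D = (1-c)^{-\Delta} \\
\mathbb{E}[\text{members of an old quorum still present after } \Delta \text{ time units}]
  &\approx q\,(1-c)^{\Delta} \\
\Pr[\text{a fresh quorum misses all of them}]
  &\approx \left(1 - \frac{q\,(1-c)^{\Delta}}{n}\right)^{q}
  \le \exp\!\left(-\frac{q^{2}(1-c)^{\Delta}}{n}\right)
  = e^{-\beta^{2}}
\end{align*}
```

The factor D in the quorum size exactly compensates for the members lost to churn during Δ time units, so the miss probability stays at the same $e^{-\beta^2}$ level as in a static system.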

48 Third Conclusion
Trading deterministic for probabilistic guarantees seems to be the solution!

49 What could be the next step?
Today in Peer-to-Peer:
- Every node must participate
- Avoid Free-Riding/Lurking at any cost!
However, an old story says:
- Gnutella performance was limited by the performance of its least capable nodes

50 What could be the next step?
Some classifying solutions are efficient:
- Recommendation helps decision (e.g. eBay)
- Supernodes/ultrapeers help sharing files (e.g. Kazaa)
- Supernodes help NAT/FW bypassing (e.g. Skype)
Generally, we can benefit from heterogeneity
- Streaming service needs nodes with highest bandwidth
- Non-critical service can run on unstable nodes
- File-sharing service requires nodes with many files
- …

51 Problem
Classifying nodes into categories, slices
- Based on individual characteristics: attributes
- A slice corresponds to a portion of the system
Typically, answering the question:

52 Distributed Slicing [RANK]
Classifying nodes into categories, slices
- Based on individual characteristics: attributes
- A slice corresponds to a portion of the system
Typically, answering the question: HOW AM I COMPARED TO OTHERS?

53 Distributed Slicing [RANK]

54 Distributed Slicing [RANK]
(Figure: nodes labeled with attribute values 68, 70, 8, 72, 62, 75, 65, 20, 71, 48, 59, 89, 27.)
…using their attribute values (assume a single attribute for simplicity)

55 Distributed Slicing [RANK]
(Figure: the same nodes placed on an axis of attribute values ai, from 0 to 100.)

56 Distributed Slicing [RANK]
(Figure: attribute values ai on a 0 to 100 axis, mapped to normalized indices pi on a 0 to 1 axis.)

57 Distributed Slicing [RANK]
(Figure: attribute values ai on a 0 to 100 axis, normalized indices pi on a 0 to 1 axis, and slices si numbered #1 to #4 on a 0 to 1 axis.)
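The mapping from attribute values to normalized indices to slices can be sketched centrally, using the 13 attribute values shown in the figures. Several conventions here are assumptions rather than definitions from [RANK]: the normalized index is taken as rank/n, the four slices have equal width, and slices are numbered from the lowest values upward. In the distributed setting no node sees all values, which is what makes the problem hard.

```python
import math


def slice_of(value, all_values, num_slices):
    """Return (normalized index, slice number) of a node with this attribute value."""
    n = len(all_values)
    # Rank of the value among all values (1..n when values are distinct).
    rank = sum(1 for v in all_values if v <= value)
    p = rank / n
    # Equal-width slices over (0, 1]: slice j covers ((j - 1) / k, j / k].
    return p, math.ceil(p * num_slices)


values = [68, 70, 8, 72, 62, 75, 65, 20, 71, 48, 59, 89, 27]

for a in (8, 68, 89):
    p, s = slice_of(a, values, num_slices=4)
    print(f"attribute {a}: index {p:.3f}, slice #{s}")
```

A node with attribute value 68 is the 8th smallest of 13, so its normalized index is 8/13 and it falls in the third of four slices under these conventions.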

58 Distributed Slicing [RANK]
Existing solutions use a gossip-based mechanism

59 Distributed Slicing [RANK]
(Figure: nodes with the attribute values above; label shown: 4/11.)

60 Distributed Slicing [RANK]
(Figure: nodes with the attribute values above; labels shown: 68, 89.)

61 Distributed Slicing [RANK]
(Figure: nodes with the attribute values above; label shown: 0/2.)

62 Distributed Slicing [RANK]
(Figure: nodes with the attribute values above; labels shown: 72, 20.)

63 Distributed Slicing [RANK]
(Figure: nodes with the attribute values above; label shown: 1/4.)

64 Distributed Slicing [RANK]
(Figure: nodes with the attribute values above; labels shown: 48, 75.)

65 Distributed Slicing [RANK]
(Figure: nodes with the attribute values above; label shown: 1/3.)

66 Distributed Slicing [RANK]
Performance achieved so far:
- $d$ is the distance from $p_i'$ (the position estimate of node $i$) to the closest slice boundary.
- For a confidence coefficient of 99.99%, the required number of attribute values drawn is $m_i \ge z \, p_i' (1 - p_i') / d^2$, with $z < 16$ a constant.
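The bound can be evaluated directly. The sketch below reads it as m ≥ z p'(1 - p') / d² and plugs in z = 16 as a conservative stand-in, since the slide only states z < 16. The function name and the sample inputs are hypothetical.

```python
import math


def required_samples(p_est, d, z=16.0):
    # Attribute values to draw so that the estimate p_est is within d
    # of the true position, at the confidence level implied by z.
    return math.ceil(z * p_est * (1 - p_est) / d ** 2)


far_from_boundary = required_samples(p_est=0.5, d=0.25)
close_to_boundary = required_samples(p_est=0.5, d=0.03125)
print(far_from_boundary, close_to_boundary)
```

Since the bound grows as 1/d², a node whose estimate sits close to a slice boundary needs far more samples than one in the middle of a slice before it can be confident about its slice.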

67 Fourth Conclusion
Distributed Slicing is a new Challenge!

68 References
[RANK] Distributed Slicing in Dynamic Systems. A. Fernandez, V. Gramoli, E. Jimenez, A-M. Kermarrec, and M. Raynal. ICDCS 2007.
[TQS] Timed Quorum System for Large-Scale Dynamic Environments. V. Gramoli and M. Raynal. IRISA TR1859, 2007.
[SQUARE] SQUARE: Scalable Quorum-based Atomic Memory with Local Reconfiguration. V. Gramoli, E. Anceaume, and A. Virgillito. ACM SAC 2007.
[Cyclon] Cyclon: Inexpensive Membership Management for Unstructured P2P Overlays. S. Voulgaris, D. Gavidia, and M. van Steen. Journal of Network and Systems Management 13(2), 2005.
[RDS] Reconfigurable Distributed Storage for Dynamic Networks. G. Chockler, S. Gilbert, V. Gramoli, P.M. Musial, and A.A. Shvartsman. OPODIS 2005.
[CAN] A Scalable Content Addressable Network. S. Ratnasamy, P. Francis, M. Handley, R.M. Karp, and S. Shenker. ACM SIGCOMM 2001.
[ABD] Sharing Memory Robustly in Message-Passing Systems. H. Attiya, A. Bar-Noy, and D. Dolev. JACM 1995.

