Fault tolerance and related issues in distributed computing. Shmuel Zaks. GSSI - Feb 2016


1 Fault tolerance and related issues in distributed computing. Shmuel Zaks, zaks@cs.technion.ac.il. GSSI - Feb 2016

2 Haifa

3 CS, Technion

4 Part 0: Distributed computing, an overview: basic notions; seminar focus: from lower bounds, via impossibility, to fault tolerance and self-stabilization. Part 1: Lower bounds. Part 2: Computing in spite of faults: impossibility of consensus. Part 3: Detecting faults: the snapshot algorithm. Part 4: Self-stabilization: self-recovery from faults.

5 Part 0: An overview. Part 1: Lower bounds. Part 2: Computing in spite of faults. Part 3: Detecting faults. Part 4: Self-stabilization.

6 A. The model. Processors, communication, problem. Communication network.

7 Anonymous

8 Unique identities [figure: processors labelled 12, a, e, 6, c]

9 Communication: message passing; communication lines, channels; topology [figure: processors a, b, c, d, e]

10 (Message passing) Directed, undirected [figure: processors a, b, c, d, e]

11 (Message passing) Message delivery mechanism; FIFO; reliable, no faults; finite, arbitrary delay; queues of messages

12 (Message passing) Distributed algorithm, protocol: send a message; receive a message; do local computation. Execution.

13 Shared memory [figure: processors a, b, c, d, e and registers R1, R2, R3, R4, R5]

14 (Shared memory) Read/write [figure: A, B and the values 7, 2, 31, 25, 9, 88, 40, 9]

15 Synchronization: synchronous, asynchronous [figure: processors a, b, c, d, e]

16 (Synchronization) Asynchronous model [figure: processors i and j, network, clock; times t and t+???]

17 (Synchronization) Synchronous model [figure: processors i and j, network, clock; times t and t+d]

18 (Synchronization) Asynchronous model: many executions. Synchronous model: unique execution, rounds.

19 (Synchronization) Asynchronous / synchronous; shared memory / message passing

20 Asynchronous model: for correctness, for upper bound analysis. Synchronous model: for lower bound analysis.

21 Topology: ring [figure: processors a, b, c, d, e]

22 (Topology) Clique [figure: processors a, b, c, d, e]

23 (Topology) General [figure: nodes labelled 7, 2, 31, 25, 9, 88, 40, 12, 4]

24 (Topology) Why simple networks? They enable the understanding of many design issues. In existing general networks, assume that a virtual simple network (e.g. a ring) is implemented.

25 Complexity measures: communication (messages, bits); time (synchronous time, longest chain, bounded delay). Synchronous system: time. Asynchronous system: communication.

26 Parallel vs. distributed computing. Parallel computing: given a problem … (e.g. sorting). Distributed computing: given a network … (e.g. broadcast).

27 (Parallel vs. distributed computing) Parallel computing: time vs. number of processors. Distributed computing: number of messages. Complexity goals: parallel computing: efficiency; distributed computing: correctness.

28 B. Problems. Problem, task [figure: processors P1, P2, P3 with input and output; leader election: 3, 7, 5, yes, no; consensus: 1, 0, 0, 1, 1, 1]

29 Issues: design and analysis of algorithms; impossibility, lower bounds; fault tolerance

30 Problems: broadcast; snapshot; consensus; shortest path, maximal flow; leader election, breaking symmetry, maximum finding, spanning tree, center; termination; deadlock

31 Example: broadcast [figure: processors a, b, c, d, e, f]

32 Broadcast: BFS (breadth-first search) [figure: processors a, b, c, d, e, f]

33 Broadcast: DFS (depth-first search) [figure: processors a, b, c, d, e, f]

34 Message complexity: each edge carries exactly one message in each direction; the message complexity is 2|E|.

35 Time complexity: synchronous time 2|E|; longest chain 2|E|; bounded delay 2|E|.
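
The 2|E| counts above can be checked on a small example. The following is a minimal sketch, assuming that slides 34 and 35 refer to the DFS broadcast of slide 33 and that the broadcast is carried by a single token that a visited processor bounces straight back. The function name `dfs_broadcast` and the edge set are hypothetical; the slides do not give the edges of the 6-processor figure.

```python
def dfs_broadcast(adj, root):
    """Simulate a sequential token-passing DFS broadcast from `root`.

    Returns (sent, parent): `sent` maps each directed edge (u, v) to the
    number of messages sent from u to v; `parent` is the resulting DFS tree.
    """
    sent = {}
    untried = {v: set(adj[v]) for v in adj}   # edges each processor has not used yet
    parent = {root: None}
    holder = root                              # processor currently holding the token

    def send(u, v):
        sent[(u, v)] = sent.get((u, v), 0) + 1

    while True:
        if untried[holder]:
            nxt = min(untried[holder])         # deterministic choice for the demo
            untried[holder].discard(nxt)
            untried[nxt].discard(holder)       # the edge is now used at both ends
            send(holder, nxt)
            if nxt in parent:
                send(nxt, holder)              # already visited: bounce the token back
            else:
                parent[nxt] = holder
                holder = nxt
        elif parent[holder] is not None:
            send(holder, parent[holder])       # finished here: return token to parent
            holder = parent[holder]
        else:
            return sent, parent                # root has no untried edges left


# Hypothetical 6-processor network (assumed edge set, for illustration only).
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"),
         ("c", "e"), ("d", "e"), ("d", "f"), ("e", "f")]
ADJ = {}
for u, v in EDGES:
    ADJ.setdefault(u, []).append(v)
    ADJ.setdefault(v, []).append(u)

sent, parent = dfs_broadcast(ADJ, "a")
print("messages:", sum(sent.values()), "2|E| =", 2 * len(EDGES))
```

Because only one token is ever in transit in this sketch, the messages are sent one after another, which is why the same 2|E| figure also bounds the time measures.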

36 pi (propagation of information), shout-echo [figure: processors a, b, c, d, e, f]

37 Algorithm pi (propagation of information). Initiator: send m to each neighbour; stop. Other processors: if receive m along edge e: send m on all edges except e; stop.

38 Theorem (pi): The following holds for every execution of the pi algorithm. A processor receives the message m at most once. The execution terminates. Each processor receives the message m. The edges on which processors receive m form a spanning tree. The message complexity is 2|E|-|V|+1. The time complexity …
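
Several claims of the theorem can be checked by simulation. The sketch below runs the pi algorithm under an assumed asynchronous scheduler that delivers in-transit messages in a random order, and treats a message arriving at a processor that has already stopped as ignored. The function name `run_pi`, the scheduler, and the edge set are assumptions for illustration, not part of the slides.

```python
import random


def run_pi(adj, initiator, seed=0):
    """Simulate the pi algorithm; return (messages_sent, parent).

    `parent[v]` is the neighbour from which v first accepted m,
    i.e. the edge on which v received m.
    """
    rng = random.Random(seed)
    in_transit = [(initiator, v) for v in adj[initiator]]   # (sender, receiver)
    messages = len(in_transit)
    parent = {initiator: None}
    while in_transit:
        # Asynchrony: any in-transit message may be the next one delivered.
        sender, receiver = in_transit.pop(rng.randrange(len(in_transit)))
        if receiver in parent:
            continue                      # receiver has already stopped
        parent[receiver] = sender
        for v in adj[receiver]:
            if v != sender:               # send m on all edges except e
                in_transit.append((receiver, v))
                messages += 1
    return messages, parent


# Hypothetical connected 6-processor network (assumed edge set).
PI_EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"),
            ("c", "e"), ("d", "e"), ("d", "f"), ("e", "f")]
PI_ADJ = {}
for u, v in PI_EDGES:
    PI_ADJ.setdefault(u, []).append(v)
    PI_ADJ.setdefault(v, []).append(u)

expected = 2 * len(PI_EDGES) - len(PI_ADJ) + 1      # 2|E| - |V| + 1
for seed in range(5):
    count, tree = run_pi(PI_ADJ, "a", seed)
    print(seed, count, expected, len(tree) == len(PI_ADJ))
```

The count does not depend on the delivery order: the initiator sends deg(v) messages and every other processor sends deg(v)-1, which sums to 2|E|-(|V|-1).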

39 pif (propagation of information with feedback), shout-echo [figure: processors a, b, c, d, e, f]

40 C. In this seminar. Distributed algorithms. “Positive” results: design, analysis, upper bounds. “Negative” results: lower bounds, impossibility.

41 Part 1: Lower bounds [figure: processors P1, P2, P3 with input and output; leader election: 3, 7, 5, yes, no]

42 Leader election: message passing, asynchronous [figure: processors with identities 9, 4, 5, 8, 6; marks ?, x, x, x, x]

43 We’ll see: a lower bound of Ω(n log n) messages [figure: identities 9, 4, 5, 8, 6]
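
To make the message count concrete, here is a minimal sketch of one simple election algorithm on a unidirectional ring, in the style usually attributed to Chang and Roberts. It is not taken from the slides, which only announce the lower bound, and it is not message-optimal: its worst case is quadratic in n. The function name `ring_election` and the clockwise ordering of the identities 9, 4, 5, 8, 6 are assumptions.

```python
from collections import deque


def ring_election(ids):
    """Elect the maximum identity on a unidirectional ring.

    ids[i] is the (unique) identity of the processor at position i;
    processor i sends only to processor (i + 1) % n.
    Returns (leader, messages_sent).
    """
    n = len(ids)
    # Every processor starts by sending its own identity to its successor.
    queue = deque((ids[i], (i + 1) % n) for i in range(n))
    messages = n
    leader = None
    while queue:
        candidate, dest = queue.popleft()
        if candidate > ids[dest]:
            queue.append((candidate, (dest + 1) % n))   # forward a larger identity
            messages += 1
        elif candidate == ids[dest]:
            leader = candidate        # own identity came back: elected
        # A smaller identity is swallowed and travels no further.
    return leader, messages


leader, messages = ring_election([9, 4, 5, 8, 6])
print("leader:", leader, "messages:", messages)
```

Only the largest identity travels the whole ring; every other identity stops at the first larger one it meets, so the total depends on how the identities are arranged.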

44 Lower bound and fault tolerance. Usually all processors need to compute some function. Lower bound of Ω(|E|) [figure: processors a, b, c, d, e, f, g]

45 Part 2: Computing in spite of faults [figure: problem, task; processors P1, P2, P3 with input and output; consensus: 1, 0, 0, 1, 1, 1]

46 Consensus: message passing, asynchronous [figure: values 0, 1, 1, 0, 1]. We’ll see: the impossibility of reaching consensus.

47 Part 3: Detecting faults. Snapshot.

48 We’ll see: snapshot algorithm.

49 Part 4: Self-stabilization. Example: clock synchronization [figure: clock values 6, 6, 6, 6, 6 and 7, 7, 7, 7, 7]

50 [figure: clock values 6, 7, 6, 4, 6]

51 Let’s try … [figure: clock values 4, 6, 6, 6, 7]

52 But … [figure: clock values 4, 6, 6, 6, 7]

53 We’ll see: self-stabilizing algorithms, proofs and performance analysis [figure: clock values 6, 7, 6, 4, 6]
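
As a rough illustration of self-recovery in the clock example, the sketch below uses one well-known rule for synchronous systems with unbounded clocks: in every round each processor sets its clock to the maximum over itself and its neighbours, plus one. This rule is swapped in for illustration and is not necessarily the algorithm the seminar presents. The ring layout of the values 6, 7, 6, 4, 6 and the name `unison_round` are assumptions.

```python
def unison_round(clocks, adj):
    """One synchronous round: every processor reads its neighbours' clocks
    and moves to (maximum of its own and its neighbours' clocks) + 1."""
    return {v: max([clocks[v]] + [clocks[u] for u in adj[v]]) + 1 for v in adj}


# Assumed layout: a ring of 5 processors starting from an arbitrary (faulty) state.
N = 5
RING = {i: [(i - 1) % N, (i + 1) % N] for i in range(N)}
clocks = dict(enumerate([6, 7, 6, 4, 6]))

history = [clocks]
for _ in range(4):
    clocks = unison_round(clocks, RING)
    history.append(clocks)

for state in history:
    print([state[i] for i in range(N)])
```

Under these assumptions the largest clock value spreads one hop per round, so all clocks agree after a number of rounds equal to the network diameter and then advance together.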

