2 Tolerating Faults in Counting Networks. Marc D. Riedel, Jehoshua Bruck. California Institute of Technology, Parallel and Distributed Computing Group. http://www.paradise.caltech.edu

3 Multiprocessor Coordination. Shared counting: processes cooperate to assign successive values (601 through 610 in the figure). Uses: scheduling, load balancing, resource allocation.

4 Multiprocessor Coordination. Centralized solution: serialized access to a single counter (600, 601, 602, ...).

5 Multiprocessor Coordination. Centralized solution. Disadvantages: high contention, low throughput.
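The centralized solution on slides 4 and 5 can be sketched as one lock-protected counter. This is a minimal illustration, not code from the talk: the names `CentralCounter` and `run`, the thread counts, and the starting value 600 (borrowed from the slide's figure) are assumptions.

```python
import threading


class CentralCounter:
    """Hypothetical centralized counter: every request serializes on one lock."""

    def __init__(self, start=600):
        self._value = start
        self._lock = threading.Lock()

    def get_and_increment(self):
        # All processes contend for this single lock, which is the
        # source of the high contention and low throughput.
        with self._lock:
            value = self._value
            self._value += 1
            return value


def run(num_threads=4, requests_per_thread=250):
    """Have several threads draw values and return everything drawn, sorted."""
    counter = CentralCounter()
    results = []
    results_lock = threading.Lock()

    def worker():
        mine = [counter.get_and_increment() for _ in range(requests_per_thread)]
        with results_lock:
            results.extend(mine)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)
```

Every value is handed out exactly once, but only because each request waits its turn on the same lock.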

6-8 Counting Networks. A concurrent data structure for multiprocessor coordination (Aspnes, Herlihy & Shavit, 1991).

9 Counting Networks. Concurrent access by up to n processes. Each process accesses 1/n-th of the bits.

10 Counting Networks. Advantages: low contention, high throughput.

11-20 Balancer. Asynchronous token-routing device with inputs, outputs, and 1 bit of memory.

21 Balancer. Asynchronous token-routing device with inputs, outputs, and 1 bit of memory. Result: balanced token counts.
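The balancer's behavior can be sketched as a toggle bit. This is an illustrative sketch, not the authors' code: the class name, the lock, and the `counts` list (instrumentation only, not part of the 1 bit of memory) are assumptions.

```python
import threading


class Balancer:
    """Sketch of a balancer: successive tokens leave on top, bottom, top, ..."""

    def __init__(self):
        self.state = 0          # the 1 bit of memory: 0 = top next, 1 = bottom next
        self.counts = [0, 0]    # instrumentation: tokens emitted on [top, bottom]
        self._lock = threading.Lock()

    def route(self):
        # Read and flip the bit as one atomic step, so concurrent tokens
        # cannot both observe the same state.
        with self._lock:
            out = self.state
            self.state ^= 1
            self.counts[out] += 1
            return out


def push_tokens(balancer, num_threads=3, tokens_per_thread=7):
    """Send tokens through the balancer from several threads at once."""
    def worker():
        for _ in range(tokens_per_thread):
            balancer.route()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return balancer.counts
```

Whatever the interleaving, the two output counts differ by at most one, with any extra token on the top output.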

22 Shared Memory Architectures. Balancer: a shared boolean variable. Type balancer begin state: boolean; top: ptr to balancer; bottom: ptr to balancer; end. Processes shepherd tokens through the network.
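The record above maps directly onto a small pointer structure. The sketch below is sequential and illustrative only: the width-4 wiring in `build_width4` is a hypothetical example following the bitonic pattern, and integers stand in for output wires.

```python
class Balancer:
    """Mirror of the slide's record: a state bit plus top/bottom pointers."""

    def __init__(self):
        self.state = False   # False: next token leaves on top; True: on bottom
        self.top = None      # next Balancer, or an int naming an output wire
        self.bottom = None


def traverse(node):
    """Shepherd one token from an entry balancer to an output wire."""
    while isinstance(node, Balancer):
        nxt = node.bottom if node.state else node.top
        node.state = not node.state   # on shared memory this must be atomic
        node = nxt
    return node


def build_width4():
    """Hypothetical width-4 network of six balancers (an assumption)."""
    a, b, c, d, e, f = (Balancer() for _ in range(6))
    a.top, a.bottom = c, d
    b.top, b.bottom = d, c
    c.top, c.bottom = e, f
    d.top, d.bottom = e, f
    e.top, e.bottom = 0, 1
    f.top, f.bottom = 2, 3
    return [a, a, b, b]   # entry balancer for input wires 0..3
```

In a sequential run, tokens leave on output wires 0, 1, 2, 3, 0, 1, ... regardless of which input wires they entered on.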

23 Counting Network. Data structure for multiprocessor coordination (Aspnes, Herlihy & Shavit, 1991). (Figure: network of balancers labeled a through g, with inputs, outputs, and depth marked.)

24 Counting Network. Isomorphic to Batcher's bitonic sorting network. (Figure: the same network, labeled "step sequence".)
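The bitonic construction can be checked on quiescent token counts, where a balancer that has received a and b tokens emits ceil((a+b)/2) on top and floor((a+b)/2) on bottom. The sketch below follows the recursive Bitonic/Merger definition from Aspnes, Herlihy & Shavit as I understand it; the function names are assumptions and the width must be a power of two.

```python
def balance(a, b):
    """Quiescent outputs of one balancer given input counts a and b."""
    total = a + b
    return (total + 1) // 2, total // 2


def merger(x, y):
    """Merge two step sequences of equal length into one step sequence."""
    if len(x) == 1:
        return list(balance(x[0], y[0]))
    z = merger(x[0::2], y[1::2])    # even wires of x with odd wires of y
    zp = merger(x[1::2], y[0::2])   # odd wires of x with even wires of y
    out = []
    for a, b in zip(z, zp):         # final layer joins z[i] with zp[i]
        out.extend(balance(a, b))
    return out


def bitonic(counts):
    """Output counts of a bitonic counting network for the given input counts."""
    n = len(counts)
    if n == 1:
        return list(counts)
    half = n // 2
    return merger(bitonic(counts[:half]), bitonic(counts[half:]))


def is_step(seq):
    """Step property: for i < j, seq[i] - seq[j] is 0 or 1."""
    return all(0 <= seq[i] - seq[j] <= 1
               for i in range(len(seq)) for j in range(i + 1, len(seq)))
```

For example, `bitonic([5, 0, 0, 0])` gives `[2, 1, 1, 1]`: five tokens entering on one wire are spread as a step sequence.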

25 Snapshot. Balancer with 1 bit of memory; wires labeled x and y.

26 Counting Network. Execution trace: token counts on all wires. 3 1 3 0 1 2 2 2 2 1 2 2 2 2 1 2

27 Fault Tolerance. Dynamic faults in the concurrent data structure: corrupted data, inaccessible data. No errors in control: no lost tokens, no errors in network wiring.

28-34 Fault Model. After a fault, the balancer's state is inaccessible and tokens bypass the balancer. Result: imbalance in token counts.

35 Fault Model. Tokens bypass the balancer. (Figure labels: tokens received prior to the fault; tokens received after the fault.)
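The bypass fault can be sketched by letting a faulty balancer pass each token straight through. This is an assumption-laden illustration: the class name is hypothetical, and "bypass" is taken to mean the token leaves on the same wire it arrived on.

```python
class FaultProneBalancer:
    """Sketch of a balancer whose state can become inaccessible."""

    def __init__(self):
        self.state = 0
        self.faulty = False
        self.counts = [0, 0]   # instrumentation: tokens emitted on [top, bottom]

    def route(self, in_wire):
        if self.faulty:
            # Assumption: with the state inaccessible, the token stays
            # on the wire it arrived on.
            out = in_wire
        else:
            out = self.state
            self.state ^= 1
        self.counts[out] += 1
        return out
```

Four tokens before the fault leave two on each output; three more tokens arriving on the top wire after the fault all leave on top, so the counts become 5 and 2.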

36-43 Fault Tolerance. Naïve approach: replicate every balancer.

44 Fault Tolerance. Naïve approach: replicate every balancer. After a fault there is still an imbalance in token counts. Doesn't work!

45 Fault-Tolerant Balancer. k+1 "pseudo-balancers" (shown as L F F), two bits of memory each; tolerates k faults.

46 Pseudo-Balancer. Two bits of memory. State: up or down. Status: leader (L) or follower (F).

47 Fault Tolerance. 1st solution: counting network constructed with FT balancers. The resulting FT counting network tolerates k faults.

48 Fault Tolerance. 2nd solution: rectify errors with a correction network. A counting network with remapped faulty balancers is followed by a correction network of FT balancers (better provided that

49-51 Remapping Faulty Balancers. A fault leaves a balancer inaccessible.

52 Remapping Faulty Balancers. Redirect pointers from the inaccessible balancer to a spare balancer, which has a random initial state.

53-57 Fault Model. Remapped balancer: the fault appears as a spurious state transition. Result: imbalance in token counts.

58 Fault Model. Remapped balancer; wires labeled x and y.
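The effect of remapping can be sketched by replacing the state bit mid-stream with the spare's unrelated state. The function name and parameters below are assumptions for illustration; in practice `spare_state` would be random, for example `random.choice((0, 1))`.

```python
def run_with_remap(tokens_before, tokens_after, spare_state):
    """Return [top, bottom] counts for a balancer remapped mid-stream."""
    counts = [0, 0]
    state = 0
    for _ in range(tokens_before):
        counts[state] += 1
        state ^= 1
    # Remap: the spare's state is unrelated to the lost one, so this
    # may act as one spurious state transition.
    state = spare_state
    for _ in range(tokens_after):
        counts[state] += 1
        state ^= 1
    return counts
```

In this sketch, `run_with_remap(3, 3, 0)` returns `[4, 2]`: the spare repeats the top output, so the counts differ by two instead of at most one.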

59 Error Bound. Error bound for the output sequence of a balancing network with remapped balancers (k faults).

60 Distance Measure. Definition: the distance between two sequences gives the number of "misplaced tokens".

61 Error Bound. Two identical balancing networks, given the same inputs: one with k faults, one with no faults.

62 Error Bound. Execution without faults: 3 1 3 0 1 2 2 2 2 1 2 2 2 2 1 2

63 Error Bound. Execution with a fault: 3 1 3 0 1 2 2 2 2 1 1 3 2 1 1 3 (fault-free trace for comparison: 3 1 3 0 1 2 2 2 2 1 2 2 2 2 1 2)

64 Error Bound. Distance between the sequences 2 2 1 2 (no faults) and 2 1 1 3 (with a fault): = 1 = 0 = 1 = 0

65 Correction Network. Strategy: construct a block, CORRECT[n], which reduces error by one. It takes a step sequence with k errors and produces a step sequence with one fewer error.

66 Correction Network. To reduce error by one: balance the smallest and largest entries. BUTTERFLY[n] brings out the largest value and the smallest value.

67 Butterfly Network. Network which separates out the smallest and largest entries. (Figure: example wire values with the largest value and smallest value marked.)

68 Butterfly Network. Balance the smallest and largest entries: error reduced.

69 Correction Network. Strategy: to correct k faults, append k copies of CORRECT[n] (#1 through #k). (Figure labels: step sequence with k errors; smooth sequence; step sequence.)
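The strategy of balancing the smallest and largest entries, repeated k times, can be sketched on plain lists of token counts. This abstracts away the actual BUTTERFLY[n] wiring: selecting the extremes with `max` and `min` is a stand-in assumption, and the function names are hypothetical.

```python
def correct_once(seq):
    """Balance the largest entry against the smallest (one CORRECT stage)."""
    out = list(seq)
    hi = max(range(len(out)), key=out.__getitem__)   # index of largest entry
    lo = min(range(len(out)), key=out.__getitem__)   # index of smallest entry
    total = out[hi] + out[lo]
    # A balancer splits the combined tokens as evenly as possible.
    out[hi], out[lo] = (total + 1) // 2, total // 2
    return out


def correct(seq, k):
    """Apply k correction stages in a row."""
    for _ in range(k):
        seq = correct_once(seq)
    return seq


def is_smooth(seq):
    """Smooth: no two entries differ by more than one."""
    return max(seq) - min(seq) <= 1
```

For instance, `correct([5, 1, 1, 1], 3)` passes through `[3, 3, 1, 1]` and `[2, 3, 2, 1]` before reaching `[2, 2, 2, 2]`.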

70 Fault Tolerance. The correction network, constructed with FT balancers, is appended to the counting network, whose faulty balancers are remapped.

71 Conclusions. Upper bound on error resulting from faults. Practical method for tolerating faults with extra stages. Future work: extend concepts to Diffracting Trees (Shavit et al., 1996) and other constructs; general framework for fault-tolerant concurrent data structures.

72-76 Leader (L). Two bits of memory. Accepts tokens on either wire. Incoming tokens are colored green; the leader colors outgoing tokens red.

77-83 Follower (F). Two bits of memory. Accepts red tokens in order.

84-86 Follower (F). Accepts red tokens in order. Becomes a leader (L) if it receives a green token.

87-96 Fault-Tolerant Balancer. k+1 pseudo-balancers (shown as L F F).

97-102 Fault-Tolerant Balancer. k+1 pseudo-balancers. With the leader lost (shown as ? F F), a follower becomes the new leader (L).
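The leader and follower rules can be put together in a sequential sketch of a fault-tolerant balancer. This is one reading of the slides, not the authors' implementation: the class names are hypothetical, a faulty pseudo-balancer is assumed to pass tokens through unchanged, and the asynchronous reordering that "in order" guards against is not modeled.

```python
GREEN, RED = "green", "red"


class PseudoBalancer:
    """Two bits of memory: state (up/down) and status (leader/follower)."""

    def __init__(self, leader):
        self.state = 0          # 0 = up (top output next), 1 = down
        self.leader = leader
        self.faulty = False     # fault flag for the simulation only

    def handle(self, wire, color):
        if self.faulty:
            return wire, color  # assumption: token bypasses, color unchanged
        if color == GREEN:
            self.leader = True  # a follower that sees green takes over
        if self.leader:
            wire = self.state   # the leader chooses the output wire
        # A follower leaves the red token on its wire and only tracks state.
        self.state ^= 1
        return wire, RED


class FTBalancer:
    """Chain of k+1 pseudo-balancers: one leader followed by k followers."""

    def __init__(self, k):
        self.chain = [PseudoBalancer(leader=(i == 0)) for i in range(k + 1)]

    def route(self, in_wire):
        wire, color = in_wire, GREEN    # incoming tokens are green
        for pb in self.chain:
            wire, color = pb.handle(wire, color)
        return wire
```

Because every follower toggles in step with the leader, whichever follower first sees a green token already holds the correct state, so the output keeps alternating after up to k faults.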

