Brahms: Byzantine-Resilient Random Membership Sampling
Bortnikov, Gurevich, Keidar, Kliot, and Shraer. April 2008.

Slide 1: Brahms: Byzantine-Resilient Random Membership Sampling
Bortnikov, Gurevich, Keidar, Kliot, and Shraer

Slide 2: Edward (Eddie) Bortnikov, Maxim (Max) Gurevich, Idit Keidar, Gabriel (Gabi) Kliot, Alexander (Alex) Shraer

Slide 3: Why Random Node Sampling
- Gossip partners: random choices make gossip protocols work
- Unstructured overlay networks, e.g., among super-peers: random links provide robustness and expansion
- Gathering statistics: probe random nodes
- Choosing cache locations

Slide 4: The Setting
- Many nodes, n: 10,000s, 100,000s, 1,000,000s, ...
- Nodes come and go (churn)
- Every joining node knows some others (connectivity)
- Full network, like the Internet
- Byzantine failures

Slide 5: Adversary Attacks
- Faulty nodes (a portion f of the ids) attack other nodes
- May want to bias samples
- Isolate nodes, DoS nodes
- Promote themselves, bias statistics

Slide 6: Previous Work
- Benign gossip membership
  - Small (logarithmic) views, robust to churn and benign failures
  - Empirical studies [Lpbcast, Scamp, Cyclon, PSS]; analytical study [Allavena et al.]
  - Never proven to yield uniform samples; spatial correlation among neighbors' views [PSS]
- Byzantine-resilient gossip
  - Full views [MMR, MS, Fireflies, Drum, BAR]
  - Small views with some resilience [SPSS]
  - We are not aware of any analytical work

Slide 7: Our Contributions
1. Gossip-based attack-tolerant membership
   - Linear portion f of failures
   - O(n^{1/3})-size partial views
   - Correct nodes remain connected
   - Mathematically analyzed, validated in simulations
2. Random sampling
   - Novel memory-efficient approach
   - Converges to proven independent uniform samples
   - The view is not all bad: better than benign gossip

Slide 8: Brahms
1. Sampling: the local component (Sampler)
2. Gossip: the distributed component (maintains the view)
[Diagram: the gossip view feeds ids into the Sampler, which outputs the sample.]

Slide 9: Sampler Building Block
- Input: a data stream, one element at a time; the stream may be biased (some values appear more often than others)
- Used with the stream of gossiped ids
- Output: a uniform random sample of the unique elements seen thus far, converging one element at a time
- Independent of other Samplers
[Diagram: a Sampler box with a "next" input and a "sample" output.]

Slide 10: Sampler Implementation
- Memory: stores one element at a time
- init: choose a random hash function h from a min-wise independent family [Broder et al.]: for every set X and every x in X, Pr[min_{y in X} h(y) = h(x)] = 1/|X|
- next: keep the id with the smallest hash seen so far
- sample: return the stored id
[Diagram: a Sampler box with "init" and "next" inputs and a "sample" output.]

Slide 11: Component S: Sampling and Validation
[Diagram: S is an array of Samplers fed by the id stream from gossip; each Sampler's current sample is validated using pings, and a Sampler is re-initialized (init) when its sample fails validation.]
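
A minimal sketch of slides 10-11 in Python, under stated assumptions: a salted SHA-256 stands in for a truly min-wise independent hash function, and ping is a caller-supplied reachability check. Both are illustrative choices, not the paper's exact construction.

```python
import hashlib
import os


class Sampler:
    """Keeps the id with the smallest hash seen so far (slide 10)."""

    def __init__(self):
        self.init()

    def init(self):
        # "Choose a random hash function": approximated here by salting a
        # fixed hash with fresh randomness (a real implementation would use
        # a min-wise independent family [Broder et al.]).
        self.salt = os.urandom(16)
        self.best_id = None
        self.best_hash = None

    def _hash(self, node_id: str) -> int:
        return int.from_bytes(
            hashlib.sha256(self.salt + node_id.encode()).digest(), "big")

    def next(self, node_id: str) -> None:
        h = self._hash(node_id)
        if self.best_hash is None or h < self.best_hash:
            self.best_id, self.best_hash = node_id, h

    def sample(self):
        return self.best_id


class SamplingComponent:
    """Component S (slide 11): an array of Samplers plus validation."""

    def __init__(self, size: int, ping):
        self.samplers = [Sampler() for _ in range(size)]
        self.ping = ping  # ping(id) -> bool; assumed to be supplied by the node

    def update(self, ids) -> None:
        # Feed every id received via gossip to every Sampler.
        for node_id in ids:
            for s in self.samplers:
                s.next(node_id)

    def validate(self) -> None:
        # Re-initialize any Sampler whose current sample no longer responds.
        for s in self.samplers:
            if s.sample() is not None and not self.ping(s.sample()):
                s.init()

    def samples(self):
        return [s.sample() for s in self.samplers]
```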

Slide 12: Gossip Process
- Provides the stream of ids for S
- Needs to ensure connectivity
- Uses a bag of tricks to overcome attacks

Slide 13: Gossip-Based Membership Primer
- Small (sub-linear) local view V
- V constantly changes, which is essential due to churn
- Typically evolves in (unsynchronized) rounds
- Push: send my id to some node in V; reinforces underrepresented nodes
- Pull: retrieve the view of some node in V; spreads knowledge within the network
- [Allavena et al. '05]: both are essential for a low probability of partitions and star topologies

Slide 14: Brahms Gossip Rounds
Each round:
- Send pushes and pulls to random nodes from V
- Wait to receive pulls and pushes
- Update S with all received ids
- (Sometimes) re-compute V; this is the tricky part: beware of adversary attacks
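
A sketch of this round structure in Python; net.send_push, net.send_pull_request, and net.receive_round are hypothetical network helpers, S is the sampling component sketched above, and view is assumed to be a list of node ids. The view re-computation itself is deferred to the sketch after slide 22.

```python
import random


def gossip_round(self_id, view, S, net, fanout):
    """One Brahms gossip round (slide 14); V's re-computation is sketched
    separately after slide 22."""
    # Send my id (push) and view requests (pull) to random nodes from V.
    for target in random.sample(view, min(fanout, len(view))):
        net.send_push(target, self_id)
    for target in random.sample(view, min(fanout, len(view))):
        net.send_pull_request(target, self_id)

    # Wait to receive this round's pushes and pull replies.
    pushed_ids, pulled_views = net.receive_round()

    # Update S with all received ids.
    S.update(pushed_ids + [i for v in pulled_views for i in v])

    # (Sometimes) re-compute V -- the tricky, attack-sensitive part.
    return pushed_ids, pulled_views
```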

Slide 15: Problem 1: Push Drowning
[Diagram: the pushes a node receives from correct peers (Alice, Bob, Carol, Dana, Ed) are drowned by a flood of pushes from faulty nodes (Mallory, M&M, Malfoy).]

Slide 16: Trick 1: Rate-Limit Pushes
- Use limited messages to bound faulty pushes system-wide, e.g., computational puzzles or virtual currency
- Faulty nodes can send only a portion p of them
- Views won't be all bad
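
One way to make pushes "limited messages" is to attach a computational puzzle to each push; the toy hashcash-style check below only illustrates that idea, and the difficulty value and hash choice are arbitrary assumptions.

```python
import hashlib
import itertools

DIFFICULTY_BITS = 20  # arbitrary illustrative cost per push


def solve_push_puzzle(payload: bytes) -> int:
    """Find a nonce so SHA-256(payload || nonce) has DIFFICULTY_BITS leading zero bits."""
    for nonce in itertools.count():
        digest = hashlib.sha256(payload + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") >> (256 - DIFFICULTY_BITS) == 0:
            return nonce


def verify_push_puzzle(payload: bytes, nonce: int) -> bool:
    """Receiver-side check: cheap to verify, costly to produce."""
    digest = hashlib.sha256(payload + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") >> (256 - DIFFICULTY_BITS) == 0
```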

Slide 17: Problem 2: Quick Isolation
[Diagram: the adversary concentrates its pushes (Mallory, M&M, Malfoy) on a single victim until her view holds only faulty ids: "Ha! She's out! Now let's move on to the next guy!"]

Slide 18: Trick 2: Detection & Recovery
- Do not re-compute V in rounds when too many pushes are received
- This slows down isolation; it does not prevent it
[Diagram: a node flooded with pushes (Mallory, M&M, Malfoy, Bob): "Hey! I'm swamped! I better ignore all of 'em pushes..."]

Slide 19: Problem 3: Pull Deterioration
[Diagram: pulled views carry their owners' faulty ids, so repeated pulls increase contamination; in the example, 50% faulty ids in views become 75% faulty ids after a round of pulls.]

Slide 20: Trick 3: Balance Pulls & Pushes
- Control the contribution of push (α|V| ids) versus the contribution of pull (β|V| ids) with parameters α, β
- Pull-only → eventually all faulty ids
- Push-only → quick isolation of the attacked node
- Push ensures that, system-wide, not all ids in views are bad
- Pull slows down (but does not prevent) isolation

Slide 21: Trick 4: History Samples
- The attacker influences both push and pull
- Feed back γ|V| random ids from S, with parameters α + β + γ = 1
- The attacker loses control: samples are eventually perfectly uniform
[Cartoon: "Yoo-hoo, is there any good process out there?"]

Slide 22: View and Sample Maintenance
[Diagram: the new view V is assembled from α|V| pushed ids, β|V| pulled ids, and γ|V| ids fed back from S; all pushed and pulled ids also stream into the Sample component S.]
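
A sketch of this maintenance step, assuming α + β + γ = 1 and taking "too many pushes" (Trick 2, slide 18) to mean more than α|V| of them; both that threshold and the helper names are assumptions made for illustration.

```python
import random


def recompute_view(view, pushed_ids, pulled_ids, history_samples,
                   alpha, beta, gamma):
    """Re-compute V from alpha|V| pushed ids, beta|V| pulled ids, and
    gamma|V| ids fed back from S (slides 20-22)."""
    ell = len(view)

    # Trick 2 (slide 18): if swamped by pushes this round, keep the old view.
    if len(pushed_ids) > alpha * ell:
        return view
    if not pushed_ids or not pulled_ids:
        return view  # nothing useful received this round

    def pick(ids, k):
        # Draw k ids with repetition; an empty source contributes nothing.
        return [random.choice(ids) for _ in range(k)] if ids else []

    return (pick(pushed_ids, round(alpha * ell)) +
            pick(pulled_ids, round(beta * ell)) +
            pick(history_samples, round(gamma * ell)))
```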

Slide 23: Key Property
- Samples take time to help: assume the attack starts when the samples are empty
- With appropriate parameters, e.g., time to isolation > time to convergence
- Prove a lower bound (on isolation time) using tricks 1, 2, 3, not using samples yet
- Prove an upper bound on the time until some good sample persists forever
- Self-healing from partitions

Slide 24: History Samples: Rationale
- Judicious use is essential: bootstrap, avoid slow convergence, deal with churn
- With a little bit of history samples (10%) we can cope with any adversary: amplification!

Slide 25: Analysis
1. Sampling: mathematical analysis
2. Connectivity: analysis and simulation
3. Full-system simulation

Slide 26: Connectivity ⇒ Sampling
Theorem: if the overlay remains connected indefinitely, samples are eventually uniform.

Slide 27: Sampling ⇒ Connectivity Ever After
- The perfect sample of a Sampler with hash h is the id with the lowest h(id) system-wide
- If that id is correct, it sticks once the Sampler sees it
- A correct perfect sample ⇒ self-healing from partitions ever after
- We analyze PSP(t), the probability of a perfect sample at time t

Slide 28: Convergence to 1st Perfect Sample
[Plot; parameters: n = 1000, f = 0.2, 40% unique ids in the stream.]

Slide 29: Scalability
- The analysis says: for scalability, we want small and constant convergence time, independent of system size, e.g., when ...

Slide 30: Connectivity Analysis 1: Balanced Attacks
- Attack all nodes the same: this maximizes faulty ids in views system-wide in any single round
- If repeated, the system converges to a fixed-point ratio of faulty ids in views, which is < 1 if
  - γ = 0 (no history) and p < 1/3, or
  - history samples are used, for any p
- There are always good ids in views!

Slide 31: Fixed Point Analysis: Push
- x(t): portion of faulty ids in correct nodes' views at round t
- Portion of faulty pushes received by correct nodes: p / (p + (1 − p)(1 − x(t)))
[Diagram: between time t and t+1, a correct node's push is lost when its target, drawn from its view, is faulty.]

Slide 32: Fixed Point Analysis: Pull
- A pull goes to a faulty node with probability x(t); a pull answered by a correct node still returns a portion x(t) of faulty ids
- E[x(t+1)] = α · p / (p + (1 − p)(1 − x(t))) + β · (x(t) + (1 − x(t)) · x(t)) + γ · f
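
A small numeric sketch of this recursion; the parameter names follow the slides (p: portion of faulty pushes, f: portion of faulty ids, α/β/γ: push/pull/history weights), while the starting point and round count are arbitrary choices.

```python
def expected_next_x(x, p, f, alpha, beta, gamma):
    """E[x(t+1)] under a balanced attack (slides 31-32)."""
    push = p / (p + (1 - p) * (1 - x))   # faulty share of pushes reaching correct nodes
    pull = x + (1 - x) * x               # pull hits a faulty node, or a faulty id via a correct one
    return alpha * push + beta * pull + gamma * f


def iterate_to_fixed_point(p, f, alpha, beta, gamma, x0=0.2, rounds=10_000):
    x = x0
    for _ in range(rounds):
        x = expected_next_x(x, p, f, alpha, beta, gamma)
    return x


# With history samples (gamma > 0), x = 1 is no longer a fixed point of the
# recursion, so views are never fully poisoned, in line with slides 30 and 33.
print(iterate_to_fixed_point(p=0.2, f=0.2, alpha=0.5, beta=0.5, gamma=0.0))
print(iterate_to_fixed_point(p=0.4, f=0.2, alpha=0.45, beta=0.45, gamma=0.1))
```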

Slide 33: Faulty Ids in the Fixed Point
- With a few history samples, any portion of bad nodes can be tolerated
- The fixed points and convergence were validated: history samples are assumed perfect in the analysis, while the simulations use real history

Slide 34: Convergence to the Fixed Point
[Plot; parameters: n = 1000, p = 0.2, α = β = 0.5, γ = 0.]

Slide 35: Connectivity Analysis 2: Targeted Attack – Roadmap
- Step 1: analysis without history samples; isolation happens in logarithmic time, but not too fast, thanks to tricks 1, 2, 3
- Step 2: analysis of history-sample convergence; time to perfect sample < time to isolation
- Step 3: putting it all together; empirical evaluation shows no isolation happens

Slide 36: Targeted Attack – Step 1
- Q: How fast (a lower bound) can an attacker isolate one node from the rest?
- Worst-case assumptions:
  - No use of history samples (γ = 0)
  - Unrealistically strong adversary: observes the exact number of correct pushes and complements it to α|V|
  - Attacked node not represented initially
  - Balanced attack on the rest of the system

Slide 37: Isolation without History Samples
[Plot; parameters: n = 1000, p = 0.2, α = β = 0.5, γ = 0; the curves depend on α, β, p; the isolation time for |V| = 60 is marked.]

Slide 38: Step 2: Sample Convergence
- A perfect sample is obtained within 2-3 rounds; empirically verified
[Plot; parameters: n = 1000, p = 0.2, α = β = 0.5, γ = 0, 40% unique ids.]

Slide 39: Step 3: Putting It All Together
- No isolation with history samples; works well despite a small PSP
[Plot; parameters: n = 1000, p = 0.2, α = β = 0.45, γ = 0.1.]

Slide 40: Sample Convergence (Balanced)
- Convergence twice as fast with ...
[Plot; parameters: p = 0.2, α = β = 0.45, γ = 0.1.]

Slide 41: Summary
- O(n^{1/3})-size views
- Resist attacks / failures of a linear portion of nodes
- Converge to proven uniform samples
- Precise analysis of the impact of attacks

Slide 42: Balanced Attack Analysis (1)
- Assume (roughly) equal initial node degrees
- x(t) = portion of faulty ids in correct nodes' views at time t
- Compute E[x(t+1)] as a function of x(t), p, α, β, γ
- Result #1: Short-term optimality: any non-balanced attack schedule imposes a smaller x(t) in a single round

Slide 43: Balanced Attack Analysis (2)
- Result #2: Existence of a fixed point X̂ with E[x(t+1)] = x(t) = X̂
  - Analyze X̂ as a function of p, α, β, γ; conditions for uniqueness
  - For α = β = 0.5 and p < 1/3, there exists X̂ < 1: the view is not entirely poisoned, and history samples are not essential
- Result #3: Convergence to the fixed point from any initial portion < 1 of faulty ids, via [Hillam 1975] (sequence convergence)
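
To illustrate Result #2, a sketch that iterates the slide-32 recursion with γ = 0 and α = β = 0.5 for several values of p; below the p < 1/3 threshold the iteration settles strictly below 1, above it the views drift toward being fully poisoned. This is an illustration of the claim, not the proof.

```python
def expected_next_x(x, p, alpha=0.5, beta=0.5):
    # E[x(t+1)] with gamma = 0 (no history samples), slide 32.
    return alpha * p / (p + (1 - p) * (1 - x)) + beta * (x + (1 - x) * x)


for p in (0.10, 0.20, 0.30, 0.35, 0.40):
    x = 0.2
    for _ in range(20_000):
        x = expected_next_x(x, p)
    # For p < 1/3 the value settles at a fixed point X^ < 1;
    # for larger p it approaches 1 (all views poisoned).
    print(f"p = {p:.2f}: x settles near {x:.3f}")
```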

