April 2008 1 Brahms Byzantine-Resilient Random Membership Sampling Bortnikov, Gurevich, Keidar, Kliot, and Shraer.

1 April Brahms Byzantine-Resilient Random Membership Sampling Bortnikov, Gurevich, Keidar, Kliot, and Shraer

2 April Edward (Eddie) Bortnikov, Maxim (Max) Gurevich, Idit Keidar, Gabriel (Gabi) Kliot, Alexander (Alex) Shraer

3 April Why Random Node Sampling Gossip partners Random choices make gossip protocols work Unstructured overlay networks E.g., among super-peers Random links provide robustness, expansion Gathering statistics Probe random nodes Choosing cache locations

4 April The Setting Many nodes – n 10,000s, 100,000s, 1,000,000s, … Come and go Churn Every joining node knows some others Connectivity Full network Like the Internet Byzantine failures

5 April Adversary Attacks Faulty nodes (portion f of ids) Attack other nodes May want to bias samples Isolate nodes, DoS nodes Promote themselves, bias statistics

6 April Previous Work Benign gossip membership Small (logarithmic) views Robust to churn and benign failures Empirical study [Lpbcast,Scamp,Cyclon,PSS] Analytical study [Allavena et al.] Never proven uniform samples Spatial correlation among neighbors’ views [PSS] Byzantine-resilient gossip Full views [MMR,MS,Fireflies,Drum,BAR] Small views, some resilience [SPSS] We are not aware of any analytical work

7 April Our Contributions 1. Gossip-based attack-tolerant membership Linear portion f of failures O(n^(1/3))-size partial views Correct nodes remain connected Mathematically analyzed, validated in simulations 2. Random sampling Novel memory-efficient approach Converges to proven independent uniform samples The view is not all bad Better than benign gossip

8 April Brahms 1. Sampling - local component 2. Gossip - distributed component [Figure: the Gossip component with its view, and the Sampler component with its sample.]

9 April Sampler Building Block Input: data stream, one element at a time Bias: some values appear more than others Used with stream of gossiped ids Output: uniform random sample of unique elements seen thus far Independent of other Samplers One element at a time (converging) [Figure: a Sampler with operations next and sample.]

10 April Sampler Implementation Memory: stores one element at a time Use random hash function h From min-wise independent family [Broder et al.] For each set X, and all x ∈ X, Pr[min h(X) = h(x)] = 1/|X| init: choose random hash function next: keep id with smallest hash so far sample: return the stored id
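The Sampler on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the salted SHA-256 hash is an assumption standing in for a true min-wise independent family, and the method names simply mirror the slide's init / next / sample labels.

```python
import hashlib
import os


class Sampler:
    """Stores one id at a time: the one with the smallest hash seen so far."""

    def __init__(self):
        self.init()

    def init(self):
        # "Choose random hash function": here, a fresh random salt.
        # Assumption: salted SHA-256 approximates a min-wise independent hash.
        self._salt = os.urandom(16)
        self._best_id = None
        self._best_hash = None

    def _h(self, node_id):
        return hashlib.sha256(self._salt + node_id.encode("utf-8")).digest()

    def next(self, node_id):
        # "Keep id with smallest hash so far". Repeats of the same id hash
        # to the same value, so a biased stream cannot bias the outcome.
        h = self._h(node_id)
        if self._best_hash is None or h < self._best_hash:
            self._best_id = node_id
            self._best_hash = h

    def sample(self):
        return self._best_id


# Usage: a stream in which one id is heavily over-represented.
stream = ["mallory"] * 50 + ["alice", "bob", "carol"]
sampler = Sampler()
for node_id in stream:
    sampler.next(node_id)
current = sampler.sample()  # one of the four distinct ids
```

Because only the minimum hash matters, the fifty copies of "mallory" count once, so each of the four distinct ids should be returned with probability close to 1/4 across independently initialized Samplers.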

11 April Component S: Sampling and Validation [Figure: the id stream from gossip feeds the Samplers in S through next; a Validator checks each sample using pings and can call init on a Sampler.]

12 April Gossip Process Provides the stream of ids for S Needs to ensure connectivity Use a bag of tricks to overcome attacks

13 April Gossip-Based Membership Primer Small (sub-linear) local view V V constantly changes - essential due to churn Typically, evolves in (unsynchronized) rounds Push: send my id to some node in V Reinforce underrepresented nodes Pull: retrieve view from some node in V Spread knowledge within the network [ Allavena et al. ‘05]: both are essential Low probability for partitions and star topologies

14 April Brahms Gossip Rounds Each round: Send pushes, pulls to random nodes from V Wait to receive pulls, pushes Update S with all received ids (Sometimes) re-compute V Tricky! Beware of adversary attacks

15 April Problem 1: Push Drowning [Figure: pushes from Alice, Bob, Dana, Carol, and Ed compete with pushes from the faulty nodes Mallory, M&M, and Malfoy; the view shown holds A, D, B, E, and five faulty ids M.]

16 April Trick 1: Rate-Limit Pushes Use limited messages to bound faulty pushes system-wide E.g., computational puzzles/virtual currency Faulty nodes can send portion p of them Views won’t be all bad

17 April Problem 2: Quick Isolation [Figure: pushes from Alice, Bob, Dana, Carol, and Ed alongside pushes from Mallory, M&M, and Malfoy. Attacker: "Ha! She's out! Now let's move on to the next guy!"]

18 April Trick 2: Detection & Recovery Do not re-compute V in rounds when too many pushes are received Slows down isolation; does not prevent it [Figure: pushes from Mallory, M&M, and Malfoy arrive together with a push from Bob. Attacked node: "Hey! I'm swamped! I better ignore all of 'em pushes…"]

19 April Problem 3: Pull Deterioration [Figure: views mixing correct ids (A to F) with faulty ids (M1 to M8); after pulling, the pulling node's view holds more faulty ids.] 50% faulty ids in views → 75% faulty ids in views

20 April Trick 3: Balance Pulls & Pushes Control contribution of push - α|V| ids versus contribution of pull - β|V| ids Parameters α, β Pull-only → eventually all faulty ids Push-only → quick isolation of attacked node Push ensures: system-wide not all bad ids Pull slows down (does not prevent) isolation

21 April Trick 4: History Samples Attacker influences both push and pull Feedback γ|V| random ids from S Parameters α + β + γ = 1 Attacker loses control - samples are eventually perfectly uniform Yoo-hoo, is there any good process out there?

22 April View and Sample Maintenance [Figure: the view V is rebuilt from α|V| pushed ids, β|V| pulled ids, and γ|V| ids taken from the sample S; pushed and pulled ids also feed S.]
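The view maintenance rule, combined with Trick 2, can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name `recompute_view`, the use of α|V| as the "too many pushes" threshold, and sampling without replacement are all choices made here, not details given on the slides.

```python
import random


def pick(pool, k, rng):
    """Pick up to k random elements from pool (fewer if pool is small)."""
    return rng.sample(pool, min(k, len(pool)))


def recompute_view(view, pushed, pulled, history, alpha, beta, gamma, rng=random):
    """Return the next view, or the old one if the round looks like an attack."""
    size = len(view)
    n_push = round(alpha * size)
    n_pull = round(beta * size)
    n_hist = round(gamma * size)

    # Trick 2: do not re-compute V when too many pushes were received.
    # Assumption: "too many" means more than alpha * |V| pushes.
    if len(pushed) > n_push:
        return list(view)

    # Tricks 3 and 4: fixed shares for pushed ids, pulled ids, history samples.
    return (pick(pushed, n_push, rng)
            + pick(pulled, n_pull, rng)
            + pick(history, n_hist, rng))


# Usage with the parameters from the simulation slides.
old_view = [f"v{i}" for i in range(20)]
pushed = [f"p{i}" for i in range(9)]
pulled = [f"q{i}" for i in range(30)]
history = [f"s{i}" for i in range(10)]
new_view = recompute_view(old_view, pushed, pulled, history, 0.45, 0.45, 0.1)
```

With |V| = 20 and α = β = 0.45, γ = 0.1, the new view holds 9 pushed ids, 9 pulled ids, and 2 history samples. A round that delivers more than 9 pushes leaves the view unchanged.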

23 April Key Property Samples take time to help Assume attack starts when samples are empty With appropriate parameters E.g., Time to isolation > time to convergence Prove lower bound using tricks 1,2,3 (not using samples yet) Prove upper bound until some good sample persists forever Self-healing from partitions

24 April History Samples: Rationale Judicious use essential Bootstrap, avoid slow convergence Deal with churn With a little bit of history samples (10%) we can cope with any adversary Amplification!

25 April Analysis 1. Sampling - mathematical analysis 2. Connectivity - analysis and simulation 3. Full system simulation

26 April Connectivity ⇒ Sampling Theorem: If overlay remains connected indefinitely, samples are eventually uniform

27 April Sampling ⇒ Connectivity Ever After Perfect sample of a sampler with hash h: the id with the lowest h(id) system-wide If correct, sticks once the sampler sees it Correct perfect sample ⇒ self-healing from partitions ever after We analyze PSP(t) – probability of perfect sample at time t

28 April Convergence to 1st Perfect Sample (n = 1000, f = 0.2, 40% unique ids in stream)

29 April Scalability Analysis says: For scalability, want small and constant convergence time independent of system size, e.g., when

30 April Connectivity Analysis 1: Balanced Attacks Attack all nodes the same Maximizes faulty ids in views system-wide in any single round If repeated, system converges to fixed point ratio of faulty ids in views, which is < 1 if γ=0 (no history) and p < 1/3 or History samples are used, any p There are always good ids in views!

31 April Fixed Point Analysis: Push x(t) – portion of faulty nodes in views at round t Portion of faulty pushes to correct nodes: p / (p + (1 − p)(1 − x(t))) [Figure: local views of node 1 and node i; time t: push; time t+1: push from faulty node, lost push.]

32 April Fixed Point Analysis: Pull Time t: pull from i, which is faulty with probability x(t) Time t+1: E[x(t+1)] = α · p / (p + (1 − p)(1 − x(t))) + β · (x(t) + (1 − x(t)) · x(t)) + γf [Figure: local views of node 1 and node i; pull from faulty.]
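The recurrence E[x(t+1)] = α · p / (p + (1 − p)(1 − x(t))) + β · (x(t) + (1 − x(t)) · x(t)) + γf can be iterated numerically to see where the portion of faulty ids settles. The sketch below treats the expectation as a deterministic update, which is a simplification; the function names and the starting value x0 = 0 are assumptions made for illustration.

```python
def expected_faulty_fraction(x, p, alpha, beta, gamma, f):
    """One round of the recurrence: expected portion of faulty ids in views."""
    push_term = p / (p + (1 - p) * (1 - x))   # portion of faulty pushes
    pull_term = x + (1 - x) * x               # faulty ids obtained by pulls
    return alpha * push_term + beta * pull_term + gamma * f


def fixed_point(p, alpha, beta, gamma, f, x0=0.0, rounds=1000):
    """Iterate the recurrence from x0 and return the value after `rounds`."""
    x = x0
    for _ in range(rounds):
        x = expected_faulty_fraction(x, p, alpha, beta, gamma, f)
    return x


# No history samples (gamma = 0), as on the "Convergence to Fixed Point" slide.
no_history = fixed_point(p=0.2, alpha=0.5, beta=0.5, gamma=0.0, f=0.2)

# With 10% history samples, as on the "Putting It All Together" slide.
with_history = fixed_point(p=0.2, alpha=0.45, beta=0.45, gamma=0.1, f=0.2)
```

For p = 0.2 and α = β = 0.5 the iteration settles at roughly 0.64, so views are not entirely poisoned even without history samples. The γf term assumes perfect history samples, as the deck's analysis does.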

33 April Faulty Ids in Fixed Point With a few history samples, any portion of bad nodes can be tolerated Perfectly validated fixed points and convergence Assumed perfect in analysis, real history in simulations

34 April Convergence to Fixed Point (n = 1000, p = 0.2, α=β=0.5, γ=0)

35 April Connectivity Analysis 2: Targeted Attack – Roadmap Step 1: analysis without history samples Isolation in logarithmic time … but not too fast, thanks to tricks 1,2,3 Step 2: analysis of history sample convergence Time-to-perfect-sample < Time-to-Isolation Step 3: putting it all together Empirical evaluation No isolation happens

36 April Targeted Attack – Step 1 Q: How fast (lower bound) can an attacker isolate one node from the rest? Worst-case assumptions No use of history samples (γ = 0) Unrealistically strong adversary Observes the exact number of correct pushes and complements it to α|V| Attacked node not represented initially Balanced attack on the rest of the system

37 April Isolation w/out History Samples (n = 1000, p = 0.2, α=β=0.5, γ=0) Isolation time for |V|=60 Depends on α, β, p

38 April Step 2: Sample Convergence Perfect sample in 2-3 rounds (n = 1000, p = 0.2, α=β=0.5, γ=0, 40% unique ids) Empirically verified

39 April Step 3: Putting It All Together No Isolation with History Samples Works well despite small PSP (n = 1000, p = 0.2, α=β=0.45, γ=0.1)

40 April Sample Convergence (Balanced) (p = 0.2, α=β=0.45, γ=0.1) Convergence twice as fast with

41 April Summary O(n^(1/3))-size views Resist attacks / failures of linear portion Converge to proven uniform samples Precise analysis of impact of attacks

42 April Balanced Attack Analysis (1) Assume (roughly) equal initial node degrees x(t) = portion of faulty ids in correct node views at time t Compute E[x(t+1)] as function of x(t), p, α, β, γ Result #1: Short-term Optimality Any non-balanced schedule imposes a smaller x(t) in a single round

43 April Balanced Attack Analysis (2) Result #2: Existence of Fixed Point X^ E[x(t+1)] = x(t) = X^ Analyze X^ (function of p, α, β, γ) Conditions for uniqueness For α = β = 0.5, p < 1/3, exists X^ < 1 The view is not entirely poisoned – history samples are not essential Result #3: Convergence to fixed point From any initial portion < 1 of faulty ids From [ Hillam 1975 ] (sequence convergence)
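The p < 1/3 condition can be checked by hand. The derivation below is a reader's sketch, not taken from the slides: it assumes the pull-slide recurrence with α = β = 1/2 and γ = 0, and introduces the substitution u = 1 − X^ for convenience.

```latex
% Assumption: recurrence from the pull slide with \alpha=\beta=1/2, \gamma=0.
% Substitution (introduced here): u = 1 - \hat{X}.
\begin{align*}
\hat{X} &= \tfrac{1}{2}\cdot\frac{p}{p+(1-p)(1-\hat{X})}
         + \tfrac{1}{2}\bigl(\hat{X} + (1-\hat{X})\hat{X}\bigr) \\
\intertext{Using $\hat{X} + (1-\hat{X})\hat{X} = 1-u^2$ and multiplying by 2:}
(1-u)^2 &= \frac{p}{p+(1-p)u} \\
(1-u)^2\bigl(p+(1-p)u\bigr) - p &= 0 \\
u\,\bigl[(1-p)u^2 - (2-3p)u + (1-3p)\bigr] &= 0
\end{align*}
% u = 0 is the fully poisoned fixed point \hat{X} = 1.
% Let g(u) = (1-p)u^2 - (2-3p)u + (1-3p). Then g(0) = 1-3p and g(1) = -p < 0.
% If p < 1/3, g(0) > 0 > g(1), so g has a root u in (0,1), i.e. \hat{X} < 1.
% If p >= 1/3, g is convex with g(0) <= 0 and g(1) < 0, so it has no root in (0,1).
```

For p = 0.2 the root in (0, 1) is u ≈ 0.36, giving X^ ≈ 0.64.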
