
Slide 1: Brahms: Byzantine-Resilient Random Membership Sampling
Bortnikov, Gurevich, Keidar, Kliot, and Shraer. April 2008.

Slide 2: Authors
- Edward (Eddie) Bortnikov
- Maxim (Max) Gurevich
- Idit Keidar
- Gabriel (Gabi) Kliot
- Alexander (Alex) Shraer

Slide 3: Why Random Node Sampling?
- Gossip partners: random choices are what make gossip protocols work.
- Unstructured overlay networks, e.g., among super-peers: random links provide robustness and expansion.
- Gathering statistics: probe random nodes.
- Choosing cache locations.

Slide 4: The Setting
- Many nodes, n: 10,000s, 100,000s, 1,000,000s, ...
- Nodes come and go (churn).
- Full network, like the Internet.
- Every joining node knows some others (initial connectivity).

Slide 5: Adversary Attacks
- Faulty nodes (a portion f of the ids) attack other nodes: Byzantine failures.
- They may want to bias samples, isolate or DoS nodes, promote themselves, or bias statistics.

Slide 6: Previous Work
- Benign gossip membership: small (logarithmic) views, robust to churn and benign failures.
  - Empirical studies [Lpbcast, Scamp, Cyclon, PSS]; analytical study [Allavena et al.].
  - Never proven to give uniform samples; spatial correlation among neighbors' views [PSS].
- Byzantine-resilient gossip: full views [MMR, MS, Fireflies, Drum, BAR]; small views with some resilience [SPSS].
  - We are not aware of any analytical work.

Slide 7: Our Contributions
1. Gossip-based attack-tolerant membership
- Tolerates a linear portion f of failures with O(n^{1/3})-size partial views.
- Correct nodes remain connected.
- Mathematically analyzed, validated in simulations.
2. Random sampling
- A novel memory-efficient approach that converges to proven independent uniform samples.
- The view is not all bad: better than benign gossip.

Slide 8: Brahms
1. Sampling: the local component (the Sampler, producing the sample).
2. Gossip: the distributed component (maintaining the view).

Slide 9: Sampler Building Block
- Input: a data stream, one element at a time; biased (some values appear more than others). Used here with the stream of gossiped ids.
- Output: a uniform random sample of the unique elements seen thus far (converging), independent of other Samplers.
- Interface: next feeds one element, sample returns the current sample.

Slide 10: Sampler Implementation
- Memory: stores one element at a time.
- Uses a random hash function h from a min-wise independent family [Broder et al.]: for each set X and every x in X, Pr[h(x) = min h(X)] = 1/|X|.
- next: keep the id with the smallest hash seen so far. init: choose a new random hash function.
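A minimal Python sketch of this building block, under stated assumptions: a keyed SHA-256 stands in for a true min-wise independent hash, ids are strings, and the init/next/sample names mirror the slide's interface rather than the paper's actual code.

```python
import hashlib
import os

class Sampler:
    """Sketch of the Sampler building block: keeps the id whose hash is smallest."""

    def __init__(self):
        self.init()

    def init(self):
        # Choose a fresh random hash function (here: a random salt for SHA-256).
        self.salt = os.urandom(16)
        self.best_id = None
        self.best_hash = None

    def _hash(self, node_id):
        return hashlib.sha256(self.salt + node_id.encode()).digest()

    def next(self, node_id):
        # Keep the id with the smallest hash seen so far.
        h = self._hash(node_id)
        if self.best_hash is None or h < self.best_hash:
            self.best_id, self.best_hash = node_id, h
        return self.best_id

    def sample(self):
        return self.best_id
```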

Slide 11: Component S: Sampling and Validation
- S is an array of independent Samplers fed by the id stream from gossip.
- A validator checks sampled ids using pings and re-initializes (init) any Sampler whose current sample does not respond.
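A sketch of how S might be organized, reusing the Sampler sketch above and assuming a ping(id) callback for validation; the class and method names are illustrative, not taken from the paper.

```python
import random

class ComponentS:
    """Sketch: an array of independent Samplers plus ping-based validation."""

    def __init__(self, size, ping):
        # Sampler is the class sketched on the previous slide.
        self.samplers = [Sampler() for _ in range(size)]
        self.ping = ping  # assumed callback: returns True if the id responds

    def update(self, ids):
        # Feed every id from the gossip stream to every sampler.
        for node_id in ids:
            for s in self.samplers:
                s.next(node_id)

    def validate(self):
        # Re-initialize samplers whose sample fails to answer a ping.
        for s in self.samplers:
            sid = s.sample()
            if sid is not None and not self.ping(sid):
                s.init()

    def random_samples(self, k):
        # Return up to k distinct current samples, chosen at random.
        chosen = [s.sample() for s in self.samplers if s.sample() is not None]
        return random.sample(chosen, min(k, len(chosen)))
```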

Slide 12: Gossip Process
- Provides the stream of ids for S.
- Needs to ensure connectivity.
- Uses a bag of tricks to overcome attacks.

Slide 13: Gossip-Based Membership Primer
- Small (sub-linear) local view V; V constantly changes, which is essential under churn.
- Typically evolves in (unsynchronized) rounds.
- Push: send my id to some node in V; reinforces underrepresented nodes.
- Pull: retrieve the view of some node in V; spreads knowledge within the network.
- [Allavena et al. '05]: both are essential for a low probability of partitions and star topologies.

Slide 14: Brahms Gossip Rounds
Each round:
- Send pushes and pull requests to random nodes from V.
- Wait to receive pushes and pulls.
- Update S with all received ids.
- (Sometimes) re-compute V. This is the tricky part: beware of adversary attacks.
A sketch of such a round appears below.
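A hedged sketch of one such round, reusing the sampling component S above. The network helpers (send_push, pull_view, recv_pushes) are assumed stubs, and recompute_view is sketched after the "View and Sample Maintenance" slide; this is an illustration of the round structure, not the paper's exact algorithm.

```python
import random

def brahms_round(self_id, view, S, alpha, beta, send_push, pull_view, recv_pushes):
    """One gossip round, sketched from the slide's description."""
    # Send rate-limited pushes and pull requests to random members of V.
    for target in random.choices(view, k=round(alpha * len(view))):
        send_push(target, self_id)
    pulled = []
    for target in random.choices(view, k=round(beta * len(view))):
        pulled.extend(pull_view(target))        # retrieve the target's view

    pushed = recv_pushes()                      # ids pushed to us this round

    # Feed every received id into the sampling component S.
    S.update(pushed + pulled)

    # Trick 2 (later slide): skip the view update in rounds that look flooded.
    if pushed and pulled and len(pushed) <= round(alpha * len(view)):
        gamma = 1.0 - alpha - beta
        view = recompute_view(len(view), pushed, pulled, S,
                              alpha, beta, gamma)   # sketched on a later slide
    return view
```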

Slide 15: Problem 1: Push Drowning
(Illustration: a node's incoming pushes from correct nodes such as Alice, Bob, Carol, Dana, and Ed are drowned out by a flood of pushes from faulty nodes such as Mallory, M&M, and Malfoy.)

Slide 16: Trick 1: Rate-Limit Pushes
- Use limited push messages to bound faulty pushes system-wide, e.g., via computational puzzles or virtual currency.
- Faulty nodes can then send at most a portion p of all pushes; assume at most p of pushes are faulty.
- As a result, views won't be all bad.
A sketch of one possible rate-limiting mechanism follows.
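One way such rate limiting could look is a hash-based computational puzzle attached to each push. This is an illustrative sketch only; the DIFFICULTY constant and function names are assumptions, not the mechanism specified by Brahms.

```python
import hashlib
import itertools

DIFFICULTY = 16  # leading zero bits required; tune to bound the system-wide push rate

def solve_push_puzzle(sender_id, round_no):
    """Find a nonce so that H(sender | round | nonce) has DIFFICULTY leading zero bits."""
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{sender_id}|{round_no}|{nonce}".encode()).digest()
        if int.from_bytes(digest, "big") >> (256 - DIFFICULTY) == 0:
            return nonce

def verify_push_puzzle(sender_id, round_no, nonce):
    """Receiver-side check: accept a push only if its puzzle solution is valid."""
    digest = hashlib.sha256(f"{sender_id}|{round_no}|{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") >> (256 - DIFFICULTY) == 0
```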

Slide 17: Problem 2: Quick Isolation
(Illustration: the faulty nodes concentrate their pushes on a single target until its view contains only faulty ids. "Ha! She's out! Now let's move on to the next guy!")

Slide 18: Trick 2: Detection & Recovery
- Do not re-compute V in rounds when too many pushes are received ("Hey! I'm swamped! I'd better ignore all of 'em pushes...").
- This slows down isolation but does not prevent it: isolation takes time logarithmic in the view size.

Slide 19: Problem 3: Pull Deterioration
(Illustration: because faulty nodes return only faulty ids when pulled, pulling compounds the bias; in the example, views go from 50% faulty ids to 75% faulty ids after one round of pulls.)

Slide 20: Trick 3: Balance Pulls & Pushes
- Control the contribution of push (α|V| ids) versus the contribution of pull (β|V| ids), with parameters α, β.
- Pull-only → eventually all faulty ids; push-only → quick isolation of the attacked node.
- Push ensures that, system-wide, not all ids are bad; pull slows down (but does not prevent) isolation.

Slide 21: Trick 4: History Samples
- The attacker influences both push and pull.
- Feed back γ|V| random ids from S, with parameters α + β + γ = 1.
- The attacker loses control: samples are eventually perfectly uniform. ("Yoo-hoo, is there any good process out there?")

Slide 22: View and Sample Maintenance
(Diagram: the new view V is assembled from α|V| pushed ids, β|V| pulled ids, and γ|V| ids fed back from the sample component S; every received id also flows into S.)
A sketch of this view update follows.
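A minimal sketch of the view update, assuming the ComponentS.random_samples helper from the earlier sketch; slot counts are rounded, so a real implementation would need to handle remainders and duplicates more carefully.

```python
import random

def recompute_view(view_size, pushed, pulled, S, alpha, beta, gamma):
    """Mix alpha|V| random pushed ids, beta|V| random pulled ids,
    and gamma|V| history samples from S (alpha + beta + gamma = 1)."""
    n_push = round(alpha * view_size)
    n_pull = round(beta * view_size)
    n_hist = view_size - n_push - n_pull

    new_view = []
    new_view += random.sample(pushed, min(n_push, len(pushed)))
    new_view += random.sample(pulled, min(n_pull, len(pulled)))
    new_view += S.random_samples(n_hist)
    return new_view
```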

Slide 23: Samples Take Time To Help
- Judicious use is essential (e.g., 10%), to deal with churn and bootstrap.
- We need connectivity for uniform sampling, yet rely on sampling for connectivity. Breaking the cycle:
  - Assume the attack starts when samples are empty.
  - Analyze the time to partition without samples.
  - Samples become useful at least as fast.

Slide 24: Key Property
- With appropriate parameters, time to partition > time to convergence of the 1st good sample.
- Prove a lower bound on partition time using tricks 1, 2, 3 (not using samples yet).
- Prove an upper bound on the time until some good sample persists forever.
- Result: self-healing from partitions.

Slide 25: Time to Partition Analysis
- Easiest partition to cause: isolate one node (a targeted attack).
- Analysis of the targeted attack:
  - Assume an unrealistically strong adversary.
  - Analyze the evolution of faulty ids in views as a random process.
  - Show a lower bound on the time to isolation; it depends on p, |V|, α, β, and is independent of n.

Slide 26: Scalability of PSP(t), the Probability of a Perfect Sample at Time t
- For scalability, we want small, constant convergence time, independent of system size, e.g., with O(n^{1/3})-size views and samples.

Slide 27: Analysis
1. Sampling: mathematical analysis.
2. Connectivity: analysis and simulation.
3. Full-system simulation.

Slide 28: Connectivity ⇒ Sampling
Theorem: if the overlay remains connected indefinitely, samples are eventually uniform.

Slide 29: Sampling ⇒ Connectivity Ever After
- The perfect sample of a sampler with hash h is the id with the lowest h(id) system-wide.
- If that id is correct, it sticks once the sampler sees it.
- A correct perfect sample ⇒ self-healing from partitions ever after.
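The "sticks" property is easy to see with the earlier Sampler sketch: once the id minimizing h has been fed to the sampler, no later input can displace it, because the kept hash only ever decreases.

```python
import random

# Usage example reusing the Sampler sketch from slide 10.
s = Sampler()
ids = [f"node{i}" for i in range(1000)]
perfect = min(ids, key=s._hash)          # the sampler's perfect sample

random.shuffle(ids)
for node_id in ids:
    s.next(node_id)
assert s.sample() == perfect             # the perfect sample sticks...

for node_id in random.choices(ids, k=10000):
    s.next(node_id)
assert s.sample() == perfect             # ...no matter what is fed afterwards
```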

Slide 30: Convergence to the 1st Perfect Sample
(Plot; parameters: n = 1000, f = 0.2, 40% unique ids in the stream.)

Slide 31: Connectivity Analysis 1: Balanced Attacks
- Attack all nodes equally: this maximizes the faulty ids in views system-wide in any single round.
- If repeated, the system converges to a fixed-point ratio of faulty ids in views, which is < 1 if
  - γ = 0 (no history) and p < 1/3, or
  - history samples are used, for any p.
- There are always good ids in the views!

Slide 32: Fixed Point Analysis: Push
- x(t): the portion of faulty ids in correct nodes' views at round t.
- (Illustration: at time t a correct node pushes to a correct view entry; a push sent to a faulty view entry is lost.)
- Faulty nodes contribute a portion p of all pushes and can aim them all at correct nodes, while only a (1 − x(t)) fraction of correct pushes land on correct nodes; so the portion of faulty pushes received by correct nodes is p / (p + (1 − p)(1 − x(t))).

Slide 33: Fixed Point Analysis: Pull
- A pull request goes to a faulty view entry with probability x(t); a pull answered by a correct node returns a faulty id with probability x(t).
- Combining push, pull, and history samples:
  E[x(t+1)] = α · p / (p + (1 − p)(1 − x(t))) + β · (x(t) + (1 − x(t)) · x(t)) + γ · f
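Iterating this recurrence numerically illustrates the convergence claimed on the following slides; a small sketch with the parameters used in the plots (α = β = 0.5, γ = 0, p = 0.2):

```python
def expected_faulty_fraction(x, p, alpha, beta, gamma, f):
    """E[x(t+1)] from the slide's formula."""
    push = p / (p + (1 - p) * (1 - x))   # faulty fraction among received pushes
    pull = x + (1 - x) * x               # faulty id obtained via pull
    return alpha * push + beta * pull + gamma * f

# Iterate to the fixed point; convergence follows the argument cited from Hillam 1975.
x, p = 0.0, 0.2
for _ in range(200):
    x = expected_faulty_fraction(x, p, 0.5, 0.5, 0.0, 0.2)
print(f"fixed-point fraction of faulty ids: {x:.3f}")  # stays below 1 for p < 1/3
```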

Slide 34: Faulty Ids in the Fixed Point
- With a few history samples, any portion of bad nodes can be tolerated.
- The fixed points and convergence are validated: history samples are assumed perfect in the analysis, while real history samples are used in the simulations.

Slide 35: Convergence to the Fixed Point
(Plot; parameters: n = 1000, p = 0.2, α = β = 0.5, γ = 0.)

Slide 36: Connectivity Analysis 2: Targeted Attack, Roadmap
- Step 1: analysis without history samples. Isolation happens in logarithmic time, but not too fast, thanks to tricks 1, 2, 3.
- Step 2: analysis of history-sample convergence. Time to perfect sample < time to isolation.
- Step 3: putting it all together. Empirical evaluation: no isolation happens.

Slide 37: Targeted Attack, Step 1
- Q: how fast (lower bound) can an attacker isolate one node from the rest?
- Worst-case assumptions:
  - No use of history samples (γ = 0).
  - Unrealistically strong adversary: observes the exact number of correct pushes and complements it to α|V|.
  - The attacked node is not represented initially.
  - Balanced attack on the rest of the system.

Slide 38: Isolation Without History Samples
(Plot; parameters: n = 1000, p = 0.2, α = β = 0.5, γ = 0. Annotations: the curves depend on α, β, p; isolation time shown for |V| = 60.)

Slide 39: Step 2: Sample Convergence
- A perfect sample is obtained in 2-3 rounds; empirically verified.
(Plot; parameters: n = 1000, p = 0.2, α = β = 0.5, γ = 0, 40% unique ids.)

Slide 40: Step 3: Putting It All Together (No Isolation with History Samples)
- Works well despite a small PSP.
(Plot; parameters: n = 1000, p = 0.2, α = β = 0.45, γ = 0.1.)

Slide 41: Sample Convergence (Balanced)
- Convergence is twice as fast with history samples.
(Plot; parameters: p = 0.2, α = β = 0.45, γ = 0.1.)

Slide 42: Summary
- O(n^{1/3})-size views.
- Resists attacks and failures of a linear portion of the nodes.
- Converges to proven uniform samples.
- Precise analysis of the impact of attacks.

Slide 43: Balanced Attack Analysis (1)
- Assume (roughly) equal initial node degrees.
- x(t) = portion of faulty ids in correct nodes' views at time t.
- Compute E[x(t+1)] as a function of x(t), p, α, β, γ.
- Result #1 (short-term optimality): any non-balanced attack schedule yields a smaller x(t) in a single round.

Slide 44: Balanced Attack Analysis (2)
- Result #2 (existence of a fixed point x̂): E[x(t+1)] = x(t) = x̂.
  - Analyze x̂ as a function of p, α, β, γ; conditions for uniqueness.
  - For α = β = 0.5 and p < 1/3, there exists x̂ < 1: the view is not entirely poisoned, so history samples are not essential.
- Result #3 (convergence to the fixed point): from any initial portion < 1 of faulty ids, using [Hillam 1975] (sequence convergence).

