QUORUMS By Gil Ben-Zvi

Definition Assume a universe U of n servers. A quorum system S is a set of subsets of U, every pair of which intersects. Each Q in S is called a quorum.
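The definition can be checked mechanically for small universes. The following is a minimal sketch; the function name `is_quorum_system` and the example universe of five servers are illustrative assumptions, not part of the slides.

```python
from itertools import combinations

def is_quorum_system(quorums):
    """Return True if the collection is non-empty and every pair of quorums intersects."""
    sets = [frozenset(q) for q in quorums]
    if not sets or any(len(q) == 0 for q in sets):
        return False
    return all(a & b for a, b in combinations(sets, 2))

universe = range(5)
# All 3-element subsets of a 5-server universe: any two must share a server.
majorities = [set(c) for c in combinations(universe, 3)]
print(is_quorum_system(majorities))        # True
print(is_quorum_system([{0, 1}, {2, 3}]))  # False: the two sets are disjoint
```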

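EXAMPLES Weighted majorities: assume that every server s in the universe U is assigned a number of votes w(s). Then weighted majorities is the quorum system defined by $S = \{ Q \subseteq U : \sum_{s \in Q} w(s) > \frac{1}{2} \sum_{s \in U} w(s) \}$, that is, every set of servers holding a strict majority of the votes is a quorum.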

EXAMPLES Majorities: the weighted majorities quorum system in which all weights are equal. Singleton: the weighted majorities quorum system in which one server s has w(s)=1 and every other server v has w(v)=0, so the only minimal quorum is {s}.
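These examples can be enumerated directly for small universes. The sketch below assumes that a quorum is any set holding strictly more than half of all votes; the helper names `weighted_majorities` and `minimal_quorums` and the server labels are hypothetical.

```python
from itertools import combinations

def weighted_majorities(weights):
    """All subsets of servers whose votes exceed half of the total votes (assumed rule)."""
    servers = list(weights)
    total = sum(weights.values())
    quorums = []
    for r in range(1, len(servers) + 1):
        for subset in combinations(servers, r):
            if 2 * sum(weights[s] for s in subset) > total:
                quorums.append(frozenset(subset))
    return quorums

def minimal_quorums(quorums):
    """Keep only the quorums that do not strictly contain another quorum."""
    return [q for q in quorums if not any(p < q for p in quorums)]

# Majorities: equal weights on three servers give the three 2-element sets.
majority = minimal_quorums(weighted_majorities({"a": 1, "b": 1, "c": 1}))
# Singleton: one server holds the only vote, so it is the only minimal quorum.
singleton = minimal_quorums(weighted_majorities({"s": 1, "u": 0, "v": 0}))
print(sorted(sorted(q) for q in majority))
print(singleton)
```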

EXAMPLES Grid: suppose n = k^2 for some integer k, and arrange the universe in a k x k grid. A quorum is the union of a full row and one element from each row below it. FPP: take a finite projective plane over a field of size q. Each point is an element and each line is a quorum. Because any two lines of a projective plane meet in a point, every pair of quorums intersects.
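The Grid construction is easy to enumerate for small k. This is a minimal sketch in which servers are represented as (row, column) pairs; the representation and the function name `grid_quorums` are assumptions made for illustration.

```python
from itertools import combinations, product

def grid_quorums(k):
    """Quorums of the k x k grid: a full row plus one element from each row below it."""
    quorums = set()
    for r in range(k):
        full_row = [(r, c) for c in range(k)]
        rows_below = range(r + 1, k)
        # Choose one column in every row below the full row.
        for cols in product(range(k), repeat=k - 1 - r):
            picks = [(row, col) for row, col in zip(rows_below, cols)]
            quorums.add(frozenset(full_row + picks))
    return quorums

quorums = grid_quorums(3)
print(len(quorums))                                       # 9 + 3 + 1 quorums
print(all(a & b for a, b in combinations(quorums, 2)))    # pairwise intersection
```

Two quorums with the same full row share that row; otherwise the quorum with the higher full row picks an element in the other quorum's full row, which is why the intersection check succeeds.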

More definitions Coterie: a coterie S is a quorum system in which no quorum contains another: for any two distinct quorums Q1, Q2 in S, Q1 is not a subset of Q2. Domination: coterie S1 dominates coterie S2 if S1 differs from S2 and for every quorum Q2 in S2 there exists a quorum Q1 in S1 such that Q1 is contained in Q2. Strategy: a probability vector giving the probability with which each quorum is accessed.

measures Load: a strategy induces on each server the probability that it is accessed. The load L(S) of a quorum system is the access probability of the busiest server, minimized over all strategies. Resilience: the resilience is the largest k such that, whichever k servers crash, at least one quorum remains unhit.
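Both measures can be computed by brute force on small systems. The sketch below evaluates the load induced by one given strategy (L(S) is the minimum of this quantity over all strategies, which the sketch does not search for) and the resilience. The function names and the 5-server majority example are assumptions for illustration.

```python
from fractions import Fraction
from itertools import combinations

def induced_load(quorums, strategy):
    """Access probability of the busiest server under the given strategy."""
    servers = set().union(*quorums)
    return max(sum(p for q, p in zip(quorums, strategy) if s in q) for s in servers)

def resilience(universe, quorums):
    """Largest k such that after any k crashes some quorum has no crashed server."""
    universe = list(universe)
    k = 0
    for size in range(1, len(universe) + 1):
        survives = all(any(not (q & set(crashed)) for q in quorums)
                       for crashed in combinations(universe, size))
        if not survives:
            break
        k = size
    return k

universe = range(5)
quorums = [frozenset(c) for c in combinations(universe, 3)]
uniform = [Fraction(1, len(quorums))] * len(quorums)
print(induced_load(quorums, uniform))  # 3/5: each server lies in 6 of the 10 quorums
print(resilience(universe, quorums))   # 2
```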

measures Failure probability: if every server crashes with some probability (independently, in this presentation), the failure probability is the probability that every quorum is hit, that is, contains at least one crashed server. Usually every server is assumed to have the same crash probability p.
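For a small universe the failure probability can be computed exactly by enumerating all crash patterns. This is a sketch under the slide's assumptions (independent crashes, common probability p); the function name `failure_probability` is hypothetical and the enumeration is exponential in n.

```python
from itertools import combinations

def failure_probability(universe, quorums, p):
    """Probability that every quorum contains at least one crashed server."""
    universe = list(universe)
    n = len(universe)
    total = 0.0
    for size in range(n + 1):
        for crashed in combinations(universe, size):
            crashed = set(crashed)
            if all(q & crashed for q in quorums):
                total += p ** size * (1 - p) ** (n - size)
    return total

singleton = [frozenset({0})]
majority3 = [frozenset(c) for c in combinations(range(3), 2)]
print(failure_probability(range(3), singleton, 0.1))  # about p = 0.1
print(failure_probability(range(3), majority3, 0.1))  # about 3*p^2*(1-p) + p^3 = 0.028
```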

Measures examples Singleton: load = 1, resilience = 0, failure probability = p. Majorities: load about ½, resilience about (n-1)/2, failure probability (if p < ½) exponentially small in n, of order $e^{-\Omega(n)}$. Grid: load O(1/sqrt(n)), resilience sqrt(n)-1, failure probability tends to 1 as n grows.

Access protocol Implements the semantics of a multi-writer multi-reader atomic variable. Assumes that all clients and servers are non-Byzantine and that each client draws its timestamps from a set disjoint from every other client's. Write: a client queries every server in some quorum to obtain a set of value/timestamp pairs, then writes its value, with a timestamp higher than every timestamp received, to each server in the quorum.

Access protocol Read: a client queries every server in some quorum to obtain a set of value/timestamp pairs and chooses the pair with the highest timestamp. It then writes that pair back to each server in some quorum. Server update rule: a server S replaces its value/timestamp pair only if the received timestamp is greater than the timestamp S currently holds.
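The read and write operations can be sketched as a single-process simulation with no failures and no concurrency. The `Server` class, the function names, and the choice of (counter, client id) tuples as timestamps are assumptions for illustration; the slides only require that timestamps be unique per client.

```python
class Server:
    def __init__(self):
        self.value, self.timestamp = None, (0, 0)

    def get(self):
        return self.value, self.timestamp

    def update(self, value, timestamp):
        # Accept the pair only if it is newer than the one currently stored.
        if timestamp > self.timestamp:
            self.value, self.timestamp = value, timestamp

def write(servers, quorum, client_id, value):
    # Query the quorum, then pick a timestamp above everything received.
    highest = max(servers[s].get()[1] for s in quorum)
    timestamp = (highest[0] + 1, client_id)
    for s in quorum:
        servers[s].update(value, timestamp)

def read(servers, read_quorum, write_back_quorum):
    # Choose the pair with the highest timestamp, then write it back.
    value, timestamp = max((servers[s].get() for s in read_quorum),
                           key=lambda pair: pair[1])
    for s in write_back_quorum:
        servers[s].update(value, timestamp)
    return value

servers = [Server() for _ in range(5)]
write(servers, {0, 1, 2}, client_id=1, value="a")
write(servers, {2, 3, 4}, client_id=2, value="b")
result = read(servers, {0, 3, 4}, {0, 1, 2})
print(result)  # "b": the read quorum intersects the last write quorum
```

Because any two quorums intersect, the read quorum always contains at least one server that saw the latest completed write.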

Byzantine quorum systems We use the access protocol to demonstrate the subject. We assume that communication is reliable, that clients are correct, and that servers can be Byzantine. We also assume that a non-empty set BAD of subsets of U (the fail-prone system) is known, and that some B in BAD contains all the faulty servers.

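Masking quorum systems A quorum system S is a masking quorum system for a fail-prone system BAD if the following properties are satisfied: M1 (consistency): for all Q1, Q2 in S and all B1, B2 in BAD, $(Q_1 \cap Q_2) \setminus B_1 \not\subseteq B_2$. M2 (availability): for every B in BAD there is a Q in S with $B \cap Q = \emptyset$.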

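Access protocol Write: remains the same. Read: for a client to read the variable x, it queries the servers of some quorum Q to obtain a set of value/timestamp pairs.

Access protocol From these replies the client computes the set C of pairs that were returned by a set of servers not contained in any B in BAD, so that at least one correct server vouches for each pair in C. The client chooses the pair with the highest timestamp in C, or null if C is empty.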

Access protocol Claim: a read operation that is concurrent with no write operation returns the value written by the last preceding write operation in some serialization of all preceding write operations. Claim: there exists a masking quorum system for BAD iff $S = \{ U \setminus B : B \in \mathrm{BAD} \}$ is a masking quorum system for BAD.

Access protocol Criterion: there exists a masking quorum system for BAD iff for all B1, B2, B3, B4 in BAD, $U \not\subseteq B_1 \cup B_2 \cup B_3 \cup B_4$.
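An existence criterion of this kind can be tested by brute force on a small fail-prone system. The sketch below assumes the criterion is that no four (not necessarily distinct) sets of BAD cover the universe U; the function name `masking_system_exists` is hypothetical.

```python
from itertools import combinations_with_replacement

def masking_system_exists(universe, bad):
    """Assumed criterion: no four sets of BAD (repeats allowed) cover the universe."""
    universe = set(universe)
    bad_sets = [set(b) for b in bad]
    return all(not universe <= set().union(*four)
               for four in combinations_with_replacement(bad_sets, 4))

# BAD = all single servers (f = 1): four singletons cover 4 servers but not 5.
print(masking_system_exists(range(5), [{i} for i in range(5)]))  # True
print(masking_system_exists(range(4), [{i} for i in range(4)]))  # False
```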

F-masking quorum systems F-masking quorum system: a masking quorum system where BAD is the set of all sets of f servers. By the previous claims, there exists an f-masking quorum system iff n > 4f. Each pair of quorums must intersect in at least 2f+1 elements.
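The two requirements (pairwise intersections of at least 2f+1 servers, and some quorum that avoids any f faulty servers) can be checked arithmetically for a threshold construction. The quorum size ⌈(n+2f+1)/2⌉ used below is an assumption brought in for illustration, not a value stated on the slide; the function names are hypothetical.

```python
def threshold_quorum_size(n, f):
    """Assumed quorum size ceil((n + 2f + 1) / 2), computed with integers."""
    return (n + 2 * f + 2) // 2

def check_f_masking_threshold(n, f):
    q = threshold_quorum_size(n, f)
    min_overlap = 2 * q - n      # two q-subsets of an n-set share at least 2q - n elements
    available = q <= n - f       # some quorum avoids any f faulty servers
    return min_overlap >= 2 * f + 1 and available

print(check_f_masking_threshold(5, 1))  # True:  n > 4f
print(check_f_masking_threshold(4, 1))  # False: n = 4f
```

Under this assumed size, the check succeeds exactly when n > 4f, which matches the existence condition on the slide.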

examples For f-masking quorums:

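Dissemination quorum systems Assumes that clients can digitally sign the value/timestamp pairs they propagate, so the requirements are weaker than for masking. A quorum system S is a dissemination quorum system for a fail-prone system BAD if the following properties are satisfied: D1 (consistency): for all Q1, Q2 in S and all B in BAD, $Q_1 \cap Q_2 \not\subseteq B$. D2 (availability): for every B in BAD there is a Q in S with $B \cap Q = \emptyset$.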

Dissemination quorum systems In the same way as for masking we reach the (different) criterion: there exists a dissemination quorum system for BAD iff for all B1, B2, B3 in BAD, $U \not\subseteq B_1 \cup B_2 \cup B_3$. If any set of f servers can fail, but no more than f, then n > 3f must hold.

Opaque masking quorum systems Motivation: we do not want to expose the fail-prone system BAD to clients. This is done by majority decision. Properties for a quorum system to be an opaque masking quorum system:

Opaque masking quorum systems Read: the modification is that the client chooses the pair that appears most often; if several pairs are tied, it chooses the newest one. Claim: suppose at most f servers can fail. Then there exists an opaque quorum system for BAD iff n >= 5f. Sufficiency holds because the quorums of size ⌈(2n+2f)/3⌉ form an opaque quorum system for BAD.
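The modified read rule amounts to a vote over the replies. The following is a minimal sketch, assuming replies arrive as a list of (value, timestamp) pairs; the function name `opaque_read_choice` is hypothetical.

```python
from collections import Counter

def opaque_read_choice(replies):
    """Pick the most frequent (value, timestamp) pair; break ties by newest timestamp."""
    counts = Counter(replies)
    best = max(counts.values())
    candidates = [pair for pair, c in counts.items() if c == best]
    return max(candidates, key=lambda pair: pair[1])

print(opaque_read_choice([("a", 1), ("a", 1), ("b", 2)]))  # ('a', 1): appears most often
print(opaque_read_choice([("a", 1), ("b", 2)]))            # ('b', 2): tie, newest wins
```

Note that the client never consults BAD here, which is the point of the opaque variant.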

Opaque masking quorum system Claim: the load of any opaque quorum system is at least ½. Proof: summing the loads of the servers in a fixed quorum gives at least half the quorum's size, so some server in it has load at least ½. Example: a construction based on Hadamard matrices, for a universe of size 2^l.

Faulty clients Handles clients that try to make the protocol fail. The treatment here provides single-writer multi-reader semantics. A write operation starts when the first server receives the update request and ends when the last server has sent its acknowledgment.

Faulty clients Write: for a client c to write the value v, it chooses a legal timestamp t, larger than any timestamp it has chosen before, chooses a quorum Q, and sends <update, Q, v, t> to each server in Q. If after some timeout period it has not received the acknowledgments, it chooses another quorum and tries again.

Faulty clients-servers protocol The servers' protocol is as follows:
1. If a server receives <update, Q, v, t> from a client c, with a legal timestamp t, then it sends <echo, Q, v, t> to each member of Q.
2. If a server receives identical echo messages <echo, Q, v, t> from every server in Q, then it sends <ready, Q, v, t> to each member of Q.

Faulty clients-servers protocol
3. If a server receives identical ready messages from a set of servers that cannot consist only of faulty servers (a set not contained in any B in BAD), it sends <ready, Q, v, t> to each member of Q.
4. If a server receives identical ready messages from a set Q1 of servers such that Q1 = Q\B for some B in BAD, it sends an acknowledgment to c and updates its pair to <v, t> if t is greater than the timestamp it currently holds.

Faulty servers-properties Agreement: if a correct server delivers <v, t> and a correct server delivers <r, t>, then r = v. Proof: if a correct server delivers <v, t> through quorum Q1, then the echo for <v, t> must have been sent by all correct servers in Q1. The same holds for <r, t> and its quorum Q2. Q1 and Q2 intersect in a correct server, which does not send echoes for different values with the same timestamp.

Faulty servers-properties Claim: a read returns the last written value if it is not concurrent with any write operation. Proof: same as for masking quorum systems. Propagation: by ideas similar to reliable broadcast and Byzantine agreement, if a correct server decides to deliver, all other correct servers are guaranteed to deliver too. Validity: eventually a quorum of correct servers will be accessed, so the write can complete.

Load, capacity, availability Load: denoted L(S), defined as before. Availability: the failure probability when all servers have the same crash probability p, denoted Fp(S). Capacity: define a(S,k) as the maximum number of quorum accesses that S can handle during a period of k time units. Cap(S) is the limit of a(S,k)/k as k tends to infinity.

Load, capacity, availability Example: majorities. The claim is that Cap(S) = 1/L(S), and that there is a trade-off between good availability and good load.

definitions The cardinality of the smallest quorum is denoted by c(S). The degree deg(i) of an element i in a quorum system S is the number of quorums that contain i. Let S be a quorum system. S is s-uniform if |Q| = s for each Q in S. S is (s,d)-fair if it is s-uniform and deg(i) = d for each i. It is called s-fair if it is (s,d)-fair for some d.

LP We can use linear programming to calculate the load and a strategy achieving it.
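Written out, the program can be sketched as follows. The notation is assumed here rather than taken from the slides: the quorums are numbered Q_1, ..., Q_m, the variable w_j is the probability that the strategy picks Q_j, and L bounds the access probability of every server.

```latex
\begin{align*}
\text{minimize } \; & L \\
\text{subject to } \; & \sum_{j=1}^{m} w_j = 1, \\
& w_j \ge 0 \quad \text{for all } 1 \le j \le m, \\
& \sum_{j \,:\, i \in Q_j} w_j \le L \quad \text{for all } i \in U .
\end{align*}
```

The optimal value is L(S), and an optimal vector w is a strategy achieving it.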

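DUAL LP Sometimes we want to use the dual linear program, in which we assign probabilities to the elements of the universe rather than to the quorums. It is a known fact (weak duality) that the value of any feasible DLP solution is at most the value of any feasible LP solution: DLP <= LP.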
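One way to write the dual and the duality argument is sketched below. The symbols are assumptions: y_i is the probability assigned to element i, T is the dual objective, w_j is the probability that a strategy picks quorum Q_j, and L is the load that strategy induces.

```latex
\begin{align*}
\text{maximize } \; & T \\
\text{subject to } \; & \sum_{i \in U} y_i = 1, \quad y_i \ge 0 \ \text{for all } i \in U, \\
& \sum_{i \in Q_j} y_i \ge T \quad \text{for all } Q_j \in S .
\end{align*}
% Weak duality: for any strategy w with induced load L and any feasible (y, T)
\[
T \;\le\; \sum_{j} w_j \sum_{i \in Q_j} y_i
\;=\; \sum_{i \in U} y_i \sum_{j \,:\, i \in Q_j} w_j
\;\le\; L \sum_{i \in U} y_i \;=\; L .
\]
```

The first inequality uses that the w_j sum to 1 and every quorum has y-weight at least T; the last uses that no server is accessed with probability above L. Any feasible y therefore gives a lower bound on the load.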

The load with failures A configuration is a 0/1 vector x of length n that holds 1 in the positions of the failed elements of the universe. dead(x) is the set of failed elements and live(x) is the set of non-failed ones. S(x) is the subcollection of functioning quorums, those contained in live(x).

Load with failures The load of a quorum system S over a configuration x: if S(x) is empty then L(S(x)) = 1; if there are functioning quorums, it is defined as before, by the linear programming problem restricted to S(x). Let the elements fail independently with probabilities P = (p1, ..., pn). Then the load is a random variable Lp(S) defined by: Lp(S) = L(S(x)), where the configuration x occurs with probability $\prod_{i \in dead(x)} p_i \prod_{i \in live(x)} (1 - p_i)$.

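Load with failures Claim: E(Lp(S)) >= Fp(S), because L(S(x)) = 1 whenever no quorum functions. Claim: if configurations x >= z (every element failed in z is also failed in x), then L(S(x)) >= L(S(z)). Proof: S(z) contains S(x), so any strategy for S(x) is also a strategy for S(z). Claim: E(Lp(S)) is a non-decreasing function of p.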

Properties of the load Claim: L(S) >= c(S)/n. Claim: L(S) >= 1/c(S). Proof: assigning probability 1/c(S) to every element of a smallest quorum and 0 to the rest gives a feasible solution of the DLP with value 1/c(S), since every quorum intersects the smallest one. Conclusion: L(S) >= 1/sqrt(n) (approached when c(S) is close to sqrt(n)).
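The conclusion follows from combining the two claims, since the larger of two positive numbers is at least their geometric mean. A short worked derivation, using only c(S) and n as defined in the slides:

```latex
\[
L(S) \;\ge\; \max\!\left( \frac{c(S)}{n},\ \frac{1}{c(S)} \right)
\;\ge\; \sqrt{ \frac{c(S)}{n} \cdot \frac{1}{c(S)} }
\;=\; \frac{1}{\sqrt{n}} .
\]
```

The two bounds coincide when c(S) = sqrt(n), for example in a k x k grid whose smallest quorum has k elements.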

Load/fail probability trade off Claim: Fp(S) >= p^(n L(S)). Proof: the probability that all the elements of the smallest quorum fail is p^(c(S)), and when they do, every quorum is hit (each one intersects the smallest quorum), so the quorum system fails. Since c(S) <= n L(S) and p <= 1, the claim follows.

examples Optimal load, optimal load/failure trade-off, good failure load: Paths system, B-Grid system, SC-Grid system, AndOr system

Load analyses Claim: a dominating coterie has load no higher than the coterie it dominates, so non-dominated coteries attain the lowest loads. Proof: given a strategy for the dominated system, build a strategy for the dominator that puts probability only on quorums contained in a quorum of the dominated system; no server's load increases. Claim: voting systems have high load (more than ½).

Last slide!!!! Proof: if V is the sum of all votes Vi, then the vector y with yi = Vi/V is a feasible solution of the DLP with value larger than ½, because every quorum holds more than half of the votes.
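The argument can be checked numerically on a small voting system. The sketch below assumes that a quorum is any set with strictly more than half of the votes; the function name `dual_value_for_votes` and the vote assignment are hypothetical. It returns the smallest total y-weight over all quorums, which is the DLP value achieved by y.

```python
from fractions import Fraction
from itertools import combinations

def dual_value_for_votes(votes):
    """Smallest total of y_i = V_i / V over all quorums of the voting system."""
    total = sum(votes.values())
    y = {s: Fraction(v, total) for s, v in votes.items()}
    servers = list(votes)
    quorums = [c for r in range(1, len(servers) + 1)
               for c in combinations(servers, r)
               if 2 * sum(votes[s] for s in c) > total]
    return min(sum(y[s] for s in q) for q in quorums)

value = dual_value_for_votes({"a": 3, "b": 1, "c": 1, "d": 1})
print(value)                   # 2/3
print(value > Fraction(1, 2))  # True, so the load exceeds 1/2
```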