Parallel and Distributed Algorithms

Presentation transcript:

Parallel and Distributed Algorithms Amihood Amir

The PRAM Model Parallel Random Access Machine: many processors connected to a single shared memory. [Figure: processors surrounding a shared Memory block]

The PRAM Model A shared-memory multiprocessor with an unlimited number of processors, each of which has unlimited local memory, knows its ID, and can access the shared memory in constant time. The shared memory itself is unlimited. [Figure: processors P1 ... Pn attached to shared memory cells 1 ... m]

The PRAM Model Why do we need it? General theoretical framework of parallel algorithms. Model for proving complexity results. Analyse maximum parallelism.

The PRAM Model Complexity Measures For a processor: Time: number of instructions executed. Space: number of memory cells used. For the PRAM machine: Time: from the start to the end of the computation. Number of processors used.

The PRAM Model Complexity Work: PRAM time × number of processors. Optimal speedup: when the work equals the optimal time complexity of a serial algorithm.

The PRAM Model Complexity Example: 1. Sorting n numbers in time O(1) using n^2 processors. Work: O(n^2). Not optimal speedup. 2. Sorting n numbers in time O(log n) using n processors. Work: O(n log n). Optimal speedup.

Processor Activation Assume a central processor activates necessary processors in constant time. May be implemented by the same program in all processors. Common central clock.

Shared Memory Access Concurrent (C): many processors may perform the operation on the same memory location simultaneously. Exclusive (E): not concurrent. EREW (Exclusive Read Exclusive Write). CREW (Concurrent Read Exclusive Write): many processors may read the same location simultaneously, but only one may write to a given location. ERCW (Exclusive Read Concurrent Write). CRCW (Concurrent Read Concurrent Write): many processors may read from and write to the same memory location simultaneously.

Concurrent Write What value gets written? Priority CW – based either on processor priority (e.g., higher ID) or value priority (e.g., larger value). Common CW – multiple processors may write simultaneously only if the values are identical. Arbitrary/Random CW – any one of the values is chosen.
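The slide above describes the write-resolution policies only in words; the following small Python sketch is my illustration, not part of the slides – the function resolve_write and the policy names are assumptions – showing how a toy PRAM simulator could resolve one concurrent-write step under each policy.

import random

def resolve_write(writes, policy):
    # writes: list of (processor_id, value) pairs targeting the same cell
    if not writes:
        return None
    if policy == "priority":      # processor priority: highest id wins
        return max(writes, key=lambda w: w[0])[1]
    if policy == "common":        # allowed only if all written values agree
        values = {v for _, v in writes}
        assert len(values) == 1, "common CW requires identical values"
        return values.pop()
    if policy == "arbitrary":     # any single write may succeed
        return random.choice(writes)[1]
    raise ValueError("unknown policy: " + policy)

# Three processors write to the same cell in one step:
print(resolve_write([(1, 7), (3, 9), (2, 7)], "priority"))   # -> 9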

Concurrent Write – Comparison Priority ≥ Arbitrary ≥ Common ≥ CREW ≥ EREW (left to right: most powerful to least powerful, least realistic to most realistic).

Example Problem: “OR” Input: n bits. Output: The inclusive or of all bits. Serial Algorithm: If any of the bits is “1”, output “1”, otherwise output “0”. Time: O(n).

Example CRCW (priority) algorithm: when many processors write simultaneously, the maximum value is written. Algorithm: each processor writes its bit to location “output”. Correctness: if any of the bits is “1”, it is the maximum, so it gets written and becomes the output. Time: O(1). Work: O(n). OPTIMAL SPEEDUP
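To make the one-step CRCW algorithm concrete, here is a minimal sequential sketch (an illustration, not from the slides) that mimics a priority concurrent write in which the maximum written value survives.

def crcw_or(bits):
    # one parallel step: every processor holding a 1 writes it to "output";
    # the priority rule keeps the maximum value, which is 1 if any bit is 1
    writes = [b for b in bits if b == 1]
    return max(writes) if writes else 0

print(crcw_or([0, 0, 1, 0]))   # -> 1, O(1) parallel time, O(n) work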

Example EREW Algorithm: Idea: OR each pair of values and write down the result; this halves the input size. Keep going until only one value is left.

Example (animation frames consolidated): Input: 0 0 0 1 1 1 0 1 0 0 0 0 1 1 0 1 → after round 1: 0 1 1 1 0 0 1 1 → after round 2: 1 1 0 1 → after round 3: 1 1 → after round 4: 1.

Example EREW Algorithm for processor Pj In round i (i goes from 1 to log n): if there is a non-negative k such that j = k·2^i + 1, then write into memory location M[j] the result of M[j] inclusive-or M[j + 2^(i-1)]. Time: O(log n). Work: O(n log n). NOT OPTIMAL SPEEDUP
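The reduction above can be checked with a short sequential sketch (my illustration, not from the slides); it uses 0-based indexing, so processor j combines M[j] with M[j + 2^(i-1)] at stride 2^i, and it assumes n is a power of two.

import math

def erew_or(bits):
    M = list(bits)
    n = len(M)                           # assumed to be a power of two
    for i in range(1, int(math.log2(n)) + 1):
        stride, half = 2 ** i, 2 ** (i - 1)
        # each iteration of the inner loop is one processor's O(1) step;
        # all of them could run in parallel with no read/write conflicts
        for j in range(0, n, stride):
            M[j] = M[j] | M[j + half]
    return M[0]

print(erew_or([0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1]))   # -> 1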

The Distributed Model Local memory – processors have local connections and communicate over them. [Figure: processors connected by a network of links]

The Internet

Safety and Liveness Safety property: “nothing bad happens” – holds in every finite execution prefix. Examples: if one general attacks, both do; a program never terminates with a wrong answer. Liveness property: “something good eventually happens” – no partial execution is irremediable. Examples: both generals eventually attack; a program eventually terminates. Admissible executions satisfy the safety and liveness properties for a particular system type.

Communication: Asynchronous vs. Synchronous Asynchronous: no upper bound on message delivery time; no centralized clock; no bound on the relative speed of processes. Synchronous: communication proceeds in rounds. In a round, a process delivers all pending messages and takes an execution step (which may involve sending one or more messages). If there are no failures, every message sent is eventually delivered.

Uniform Algorithms The number of processors is not known. The algorithm cannot use information about the number of processors.

Anonymous Algorithms Processors have no I.D. Formally, they are identical automata. A processor can distinguish between its different ports.

Complexity Termination: each processor’s state set includes terminated states; the algorithm terminates when all processors are in terminated states and no messages are in transit. Message complexity: count the maximum total number of messages sent. Time complexity: Synchronous – number of rounds until termination. Asynchronous – set some unit of time as the maximum message delay and count time units until termination.

Leader Election Example 1: A bank maintains multiple servers in its cloud, but for each customer one of the servers is responsible, i.e., is the leader. What if there are two leaders per customer, or the servers disagree about who the leader is? Inconsistency. What if the leader crashes? Unavailability.

Leader Election Example 2: A group of cloud servers replicating a file needs to elect one among them as the primary replica that will communicate with the client machines. Example 3: A group of NTP servers: who is the root server?

Leader Election We consider leader election on a bidirectional ring.

Leader Election Two types of terminated state: elected and non-elected. In every admissible execution exactly one process enters an elected state – the leader; all others enter a non-elected state.

Impossibility Result Theorem: There is no deterministic leader election algorithm for an anonymous bidirectional ring, even in a synchronous and non-uniform system. Proof: Let S be a system of n>1 processors on a ring.

Theorem Proof Lemma: At the end of each round of an execution of S, every processor in S is in the same state. Proof: By induction on the number of rounds k. Base case: k=0. Clear: they all start in the same state. Inductive hypothesis: at the end of round k, all processors are in the same state. We need to prove that they are in the same state at the end of round k+1.

Theorem Proof (cont.) By the inductive hypothesis, the processors sent the same messages at the end of round k. Thus they all receive the same messages from left and from right. They apply the same transition functions (since they are identical automata). Thus they are in the same state at the end of round k+1. Conclusion: if one processor enters a “leader” state, they all do – contradicting the requirement that exactly one process is elected.

LCR Algorithm LeLann (1977), Chang and Roberts (1979). Not anonymous: every process has a unique ID. Asynchronous. Unidirectional. Uniform.

LCR Algorithm (cont.)
upon receiving no message:
    send uid_i to left (clockwise)
upon receiving m from right:
    case
        m.uid > uid_i: send m to left
        m.uid < uid_i: discard m
        m.uid = uid_i: leader := i; send <terminate, i> to left; terminate
    endcase
upon receiving <terminate, i> from right neighbor:
    leader := i
    send <terminate, i> to left
    terminate
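The following round-by-round Python simulation is a minimal sketch of the pseudocode above (my illustration, not from the slides); it omits the <terminate> broadcast and simply reports the elected uid.

def lcr(uids):
    n = len(uids)
    leader = None
    outgoing = list(uids)                    # round 0: every process sends its uid left
    while leader is None:
        incoming = [outgoing[(i + 1) % n] for i in range(n)]   # receive from the right
        outgoing = [None] * n
        for i in range(n):
            m = incoming[i]
            if m is None:
                continue
            if m > uids[i]:
                outgoing[i] = m              # forward to the left neighbour
            elif m == uids[i]:
                leader = uids[i]             # own uid came back: elected
            # m < uids[i]: discard
    return leader

print(lcr([613, 17, 365, 26, 18, 248]))      # -> 613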

Example run on a ring with uids 613, 17, 365, 26, 18, 248 (animation frames consolidated): each process first sends its own uid; every uid that reaches a process with a larger uid is discarded, while 613 is forwarded around the whole ring. When 613 returns to its originator, that process elects itself leader and sends <terminate, 613> around the ring, so every process learns that 613 is the leader.

LCR Algorithm Correctness The highest processor ID is always forwarded. Therefore it completes the cycle and the process with the highest ID is chosen leader (liveness). No other ID can complete the cycle, so no incorrect leader is chosen (safety).

Complexity Time complexity: O(n). Message complexity: O(n^2), and this bound is tight – for example, a ring whose ids decrease along the direction in which messages travel (n-1, n-2, ..., 2, 1). Can we do better?
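A small counting sketch (an illustration, not from the slides) shows why the worst case is quadratic: on a ring where ids decrease along the direction messages travel, the token with id k is forwarded k+1 times before being discarded or returning home.

def lcr_messages(uids):
    # count forwarding messages only (the <terminate> round adds n more)
    n, total = len(uids), 0
    for i, uid in enumerate(uids):
        hops, j = 0, i
        while True:
            hops += 1                  # uid forwarded one hop to the left
            j = (j - 1) % n
            if uids[j] >= uid:         # discarded, or returned to its owner
                break
        total += hops
    return total

n = 8
worst = list(range(n))                 # ids decrease along the send direction
print(lcr_messages(worst))             # -> 36 == n * (n + 1) // 2, i.e. Theta(n^2)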

HS Algorithm Hirschberg and Sinclair (1980). The ring is bidirectional. Each process pi operates in phases. In phase l, pi sends out “tokens” containing uid_i in both directions; the tokens are intended to travel distance 2^l and return to pi. A token continues outbound only if it is greater than the uids on its path; otherwise it is discarded. All processes always forward tokens moving inbound, but a token may not make it back. If pi receives its own token while it is still going outbound, pi is the leader. [Figure: phases 0, 1, 2 expanding around the ring]
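The phase structure can be summarised with a short sketch (an illustration of which processes keep initiating tokens, not the full message protocol): a process still initiates tokens in phase l only if its uid is the largest among all processes within distance 2^l on either side.

def hs_survivors(uids):
    n = len(uids)
    active = set(range(n))
    l = 0
    while len(active) > 1:
        dist = min(2 ** l, n - 1)
        # a process survives phase l only if its uid beats every uid
        # within distance dist in both directions around the ring
        active = {
            i for i in active
            if all(uids[(i + d) % n] < uids[i] and uids[(i - d) % n] < uids[i]
                   for d in range(1, dist + 1))
        }
        l += 1
    return active        # the single surviving index holds the leader's uid

print(hs_survivors([613, 17, 365, 26, 18, 248]))   # -> {0}, the process with uid 613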

HS algorithm (cont.) Correctness: As in LCR algorithm. Time: O(n) Message complexity: O(n log n)