Interprocess Communication and Synchronization based on Message Passing.


1 Interprocess Communication and Synchronization based on Message Passing

2 Approaches to Parallel Programming: a sequential language + library (MPI, PVM); an extended sequential language (C/Linda, Concurrent C++); new languages designed for parallel or distributed programming (SR, occam, Ada, Orca).

3 Paradigms for Parallel Programming: processes + shared variables; processes + message passing; concurrent object-oriented languages; concurrent functional languages; concurrent logic languages; data-parallelism (SPMD model); advanced communication models.

4 Overview: message passing (general issues; examples: rendezvous, Remote Procedure Calls, broadcast); nondeterminism (select statement); example language: SR (Synchronizing Resources); Traveling Salesman Problem in SR; example library: MPI (Message Passing Interface).

5 Point-to-point Message Passing: the basic primitives are send and receive. As library routines: send(destination, &MsgBuffer) and receive(source, &MsgBuffer). As language constructs: send MsgName(arguments) to destination and receive MsgName(arguments) from source.
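As a sketch of the library-routine form, Go channels can play the role of send and receive. Here a channel value stands for the destination and the source. The names Msg, send, receive and exchange are hypothetical helpers, not part of MPI, PVM or SR.

```go
package main

import "fmt"

// Msg is a hypothetical message type standing in for the slide's MsgBuffer.
type Msg struct {
	Seq  int
	Body string
}

// send delivers m to the destination (library-style send(destination, &MsgBuffer)).
func send(destination chan<- Msg, m Msg) {
	destination <- m
}

// receive blocks until a message arrives from the given source.
func receive(source <-chan Msg) Msg {
	return <-source
}

// exchange runs a sender process that sends n messages to a receiver over one
// channel and returns the messages in the order the receiver got them.
func exchange(n int) []Msg {
	ch := make(chan Msg) // this channel names the sender/receiver pair
	go func() {          // sender process
		for i := 1; i <= n; i++ {
			send(ch, Msg{Seq: i, Body: fmt.Sprintf("message %d", i)})
		}
	}()
	got := make([]Msg, 0, n) // receiver process (here: the calling goroutine)
	for i := 0; i < n; i++ {
		got = append(got, receive(ch))
	}
	return got
}

func main() {
	for _, m := range exchange(3) {
		fmt.Println(m.Seq, m.Body)
	}
}
```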

6 Issues in Message Passing: naming the sender and receiver; explicit or implicit receipt of messages; synchronous versus asynchronous messages.

7 Direct naming: sender and receiver directly name each other (S: send M to R; R: receive M from S). Asymmetric direct naming is more flexible (S: send M to R; R: receive M). Direct naming is easy to implement: the destination of a message is known in advance, so the implementation just maps logical names to machine addresses.

8 Indirect naming: indirect naming uses an extra level of indirection (S: send M to P -- P is a port name; R: receive M from P). Sender and receiver need not know each other. Port names can be moved around, e.g., in a message: send ReplyPort(P) to U -- P is the name of a reply port. Most languages allow only a single process at a time to receive from any given port; some languages allow multiple receivers that service messages on demand, which is called a mailbox.
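A minimal sketch of indirect naming follows, assuming Go channels are used as ports. A channel is a first-class value, so a reply port can travel inside a message just like ReplyPort(P) above. The Port and Request types, the doubling service in server, and the helper ask are all hypothetical.

```go
package main

import "fmt"

// Request carries a value plus the reply port on which the answer should go.
type Request struct {
	Value int
	Reply chan int // plays the role of ReplyPort(P)
}

// Port is a named endpoint; senders only need the port, not the receiver's identity.
type Port chan Request

// server receives from port p; it never learns who sent a request.
func server(p Port) {
	for req := range p {
		req.Reply <- req.Value * 2 // hypothetical service: double the value
	}
}

// ask sends a request to p together with a fresh private reply port and waits.
func ask(p Port, v int) int {
	reply := make(chan int)
	p <- Request{Value: v, Reply: reply}
	return <-reply
}

func main() {
	p := make(Port)
	go server(p)
	fmt.Println(ask(p, 21)) // 42
	close(p)
}
```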

9 Explicit Message Receipt: an existing process receives the message explicitly, so the receiving process only handles a message when it is willing to do so:
    process main() {
        // regular computation here
        receive M(...);  // explicit message receipt
        // code to handle message
        // more regular computation
        ...
    }

10 Implicit message receipt: the message is received by a new thread of control, created to handle the incoming message:
    int x;
    process main() {
        // just regular computation; this code can access x
    }
    message-handler M() {  // created whenever a message M arrives
        // code to handle the message; can also access x
    }
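Implicit receipt can be approximated in Go by starting a new goroutine for each arriving message. The runtime part (dispatch) and the handler's work (adding the message value to x) are illustrative assumptions. The handlers run concurrently with each other and with main, so access to the shared x is protected by a lock.

```go
package main

import (
	"fmt"
	"sync"
)

// Node holds the variable x that the main process and every handler can access.
type Node struct {
	mu sync.Mutex // handlers run concurrently, so x needs mutual exclusion
	x  int
}

// handleM is the message handler for M; a fresh goroutine runs it per message.
func (n *Node) handleM(delta int) {
	n.mu.Lock()
	n.x += delta
	n.mu.Unlock()
}

// X reads x under the lock, as regular computation in main would.
func (n *Node) X() int {
	n.mu.Lock()
	defer n.mu.Unlock()
	return n.x
}

// dispatch plays the runtime's role: it creates a new goroutine for every
// incoming message and returns once all handlers have finished.
func (n *Node) dispatch(incoming <-chan int) {
	var wg sync.WaitGroup
	for d := range incoming {
		wg.Add(1)
		go func(d int) {
			defer wg.Done()
			n.handleM(d)
		}(d)
	}
	wg.Wait()
}

// sumViaHandlers sends the values 1..k as messages M and returns the final x.
func sumViaHandlers(k int) int {
	n := &Node{}
	incoming := make(chan int)
	done := make(chan struct{})
	go func() {
		n.dispatch(incoming)
		close(done)
	}()
	for i := 1; i <= k; i++ {
		incoming <- i // a message M arrives
	}
	close(incoming)
	<-done
	return n.X()
}

func main() {
	fmt.Println(sumViaHandlers(10)) // 55
}
```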

11 Differences: implicit receipt is used if it is unknown when a message will arrive (e.g., a request for data). Explicit receive gives more control over when to accept which messages; e.g., SR allows:
    receive ReadFile(file, offset, NrBytes) by NrBytes
    // sorts messages by (increasing) 3rd parameter, i.e., small reads go first
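The effect of SR's "by NrBytes" clause can be sketched with a min-heap of pending messages. The sketch assumes that all requests are already queued when the server is ready to accept, and Go's container/heap package does the ordering. The names serveByNrBytes and queue are hypothetical.

```go
package main

import (
	"container/heap"
	"fmt"
)

// ReadFile mirrors the parameters of the slide's ReadFile message.
type ReadFile struct {
	File    string
	Offset  int
	NrBytes int
}

// queue is a min-heap of pending ReadFile messages keyed on NrBytes.
type queue []ReadFile

func (q queue) Len() int           { return len(q) }
func (q queue) Less(i, j int) bool { return q[i].NrBytes < q[j].NrBytes }
func (q queue) Swap(i, j int)      { q[i], q[j] = q[j], q[i] }
func (q *queue) Push(x any)        { *q = append(*q, x.(ReadFile)) }
func (q *queue) Pop() any {
	old := *q
	last := old[len(old)-1]
	*q = old[:len(old)-1]
	return last
}

// serveByNrBytes emulates "receive ReadFile(...) by NrBytes" for a batch of
// messages that are already pending: each accept takes the smallest NrBytes.
func serveByNrBytes(pending []ReadFile) []ReadFile {
	q := &queue{}
	for _, m := range pending {
		heap.Push(q, m)
	}
	served := make([]ReadFile, 0, len(pending))
	for q.Len() > 0 {
		served = append(served, heap.Pop(q).(ReadFile))
	}
	return served
}

func main() {
	pending := []ReadFile{
		{"log.txt", 0, 4096},
		{"a.cfg", 0, 16},
		{"data.bin", 512, 256},
	}
	for _, m := range serveByNrBytes(pending) {
		fmt.Println(m.File, m.NrBytes)
	}
}
```

In a real server, new requests would keep arriving between accepts. The heap would then be refilled before each pop.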

12 Synchronous vs. asynchronous message passing: with synchronous message passing, the sender is blocked until the receiver has accepted the message; this is too restrictive for many parallel applications. With asynchronous message passing, the sender continues immediately; this is more efficient, but it raises ordering problems and buffering problems.
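In Go, an unbuffered channel behaves like synchronous message passing, and a buffered channel behaves like (bounded) asynchronous message passing. The sketch below shows the difference with a non-blocking send, written as a select with a default branch. The helper trySend is hypothetical.

```go
package main

import "fmt"

// trySend attempts a send without blocking and reports whether it completed.
// The default branch fires exactly when a normal send would have to wait.
func trySend(ch chan<- int, v int) bool {
	select {
	case ch <- v:
		return true
	default:
		return false
	}
}

func main() {
	syncCh := make(chan int)     // synchronous: no buffer, sender waits for receiver
	asyncCh := make(chan int, 4) // asynchronous: sender continues while buffer has room

	fmt.Println(trySend(syncCh, 1))  // false: nobody is receiving, so a send would block
	fmt.Println(trySend(asyncCh, 1)) // true: message is buffered, sender moves on

	// With a receiver waiting, the synchronous send completes.
	done := make(chan int)
	go func() { done <- <-syncCh }()
	syncCh <- 42        // blocks until the goroutine above has accepted the message
	fmt.Println(<-done) // 42
}
```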

13 Ordering with asynchronous message passing:
    SENDER:               RECEIVER:
    send message(1)       receive message(N); print N
    send message(2)       receive message(M); print M
Messages may be received in any order, depending on the protocol, so the receiver may print message(2) before message(1).

14 Example: AT&T crash
[Figure, four panels showing processes P1 and P2: (1) "Are you still alive?"; (2) P1 crashes: "P1 is dead"; (3) P1: "I'm back", followed by a regular message; P2: "Something's wrong, I'd better crash!"; (4) "P2 is dead".]

15 Message buffering: keep messages in a buffer until the receive() is done. What if the buffer overflows? Either continue but delete some messages (e.g., the oldest one), or use flow control and block the sender temporarily. Flow control changes the semantics, since it introduces synchronization:
    S: send zillion messages to R; receive messages
    R: send zillion messages to S; receive messages
    -> deadlock!
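The deadlock scenario can be reproduced in a small Go sketch. It assumes that each process's inbox is a buffered channel of fixed capacity, so flow control means that a full buffer blocks the sender. To detect the deadlock instead of hanging, the sketch uses non-blocking sends. The names bothSendFirst and trySend are hypothetical.

```go
package main

import "fmt"

// trySend is a non-blocking send; false means flow control would block the sender.
func trySend(ch chan<- int, v int) bool {
	select {
	case ch <- v:
		return true
	default:
		return false
	}
}

// bothSendFirst models S and R each sending n messages to the other's bounded
// inbox (capacity messages) before either of them starts receiving. It reports
// whether both processes would end up blocked in send, i.e. deadlock.
func bothSendFirst(capacity, n int) bool {
	inboxS := make(chan int, capacity) // messages waiting for S
	inboxR := make(chan int, capacity) // messages waiting for R
	sBlocked, rBlocked := false, false
	for i := 0; i < n && !(sBlocked && rBlocked); i++ {
		if !sBlocked && !trySend(inboxR, i) {
			sBlocked = true // S is stuck in send and never reaches its receive
		}
		if !rBlocked && !trySend(inboxS, i) {
			rBlocked = true // R is stuck in send and never reaches its receive
		}
	}
	return sBlocked && rBlocked
}

func main() {
	fmt.Println(bothSendFirst(1000, 10))      // false: the buffers absorb all messages
	fmt.Println(bothSendFirst(1000, 1000000)) // true: "zillion" messages -> deadlock
}
```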

16 Example communication primitives: rendezvous (Ada), Remote Procedure Call (RPC), broadcast.

17 Rendezvous (Ada): a two-way interaction with a synchronous (blocking) send and an explicit receive; output parameters are sent back to the caller. An entry is a procedure implemented by a task that can be called remotely.

18 Example:
    task SERVER is
        entry INCREMENT(X: integer; Y: out integer);
    end;
Entry call:
    SERVER.INCREMENT(2, A);  -- invoke entry of task SERVER

19 Accept statement:
    task body SERVER is
    begin
        accept INCREMENT(X: integer; Y: out integer) do
            Y := X + 1;  -- handle entry call
        end;
        ...
    end;
The entry call is fully synchronous: the invoker waits until the server is ready to accept, the accept statement waits for an entry call, and the caller proceeds after the accept statement has been executed.
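A rough Go analogue of the SERVER task is sketched below. It assumes one goroutine for the task body and an unbuffered channel for the INCREMENT entry. The caller blocks until the server takes the call, and it resumes only when the out parameter Y comes back, i.e. after the accept body has run. Unlike the slide, whose task body accepts once, this body accepts calls in a loop. Server, NewServer and incrementCall are hypothetical names, and this is not how an Ada runtime implements rendezvous.

```go
package main

import "fmt"

// incrementCall is one entry call: the in parameter X and a channel for the out parameter Y.
type incrementCall struct {
	X int
	Y chan int
}

// Server models "task SERVER" with a single entry INCREMENT.
type Server struct {
	increment chan incrementCall // unbuffered: a call waits until the task accepts it
}

// NewServer starts the task body in its own goroutine.
func NewServer() *Server {
	s := &Server{increment: make(chan incrementCall)}
	go s.body()
	return s
}

// body is the task body; each receive plays the role of one
// "accept INCREMENT(...) do ... end", repeated in a loop.
func (s *Server) body() {
	for call := range s.increment {
		call.Y <- call.X + 1 // Y := X + 1; sending Y back ends the rendezvous
	}
}

// Increment is the entry call SERVER.INCREMENT(x, A): it returns only after
// the accept body has produced Y.
func (s *Server) Increment(x int) int {
	y := make(chan int)
	s.increment <- incrementCall{X: x, Y: y}
	return <-y
}

func main() {
	s := NewServer()
	a := s.Increment(2)
	fmt.Println(a) // 3
}
```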

20 Remote Procedure Call (RPC): similar to a traditional procedure call, but the caller and receiver are different processes, possibly on different machines. RPC is fully synchronous (the sender waits for the RPC to complete) and uses implicit message receipt (a new thread of control within the receiver).
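Go's standard net/rpc package gives a concrete example of these properties. The client's Call blocks until the reply arrives, and the server handles each incoming request in a separate goroutine, which corresponds to implicit message receipt. The Arith service and the incrementRPC helper are hypothetical, and net.Pipe stands in for a connection between two machines.

```go
package main

import (
	"fmt"
	"net"
	"net/rpc"
)

// Arith is a hypothetical service; its exported methods become remote procedures.
type Arith struct{}

// Increment has the shape net/rpc requires: exported, an argument value,
// a reply pointer, and an error result.
func (a *Arith) Increment(x int, y *int) error {
	*y = x + 1
	return nil
}

// incrementRPC performs one remote call over an in-memory connection.
func incrementRPC(x int) (int, error) {
	srv := rpc.NewServer()
	if err := srv.Register(new(Arith)); err != nil {
		return 0, err
	}
	clientConn, serverConn := net.Pipe() // stands in for a network link
	go srv.ServeConn(serverConn)         // serves requests, one goroutine per call

	client := rpc.NewClient(clientConn)
	defer client.Close()

	var y int
	// Call is fully synchronous: it returns when the reply has arrived.
	if err := client.Call("Arith.Increment", x, &y); err != nil {
		return 0, err
	}
	return y, nil
}

func main() {
	y, err := incrementRPC(2)
	if err != nil {
		panic(err)
	}
	fmt.Println(y) // 3
}
```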

21 Broadcast: many networks (e.g., Ethernet) support broadcast (send a message to all machines) and multicast (send a message to a set of machines). Hardware multicast is very efficient: on Ethernet it has the same delay as a unicast. Multicast can be made reliable using software protocols.

22 Nondeterminism: interactions may depend on run-time conditions, e.g., wait for a message from either A or B, whichever comes first. We need to express and control this nondeterminism, i.e., specify when to accept which message. Example (bounded buffer):
    do simultaneously
        when buffer not full: accept request to store message
        when buffer not empty: accept request to fetch message

23 Select statement: several alternatives of the form
    WHEN condition => ACCEPT message DO statement
Each alternative may
    succeed, if condition = true and a message is available;
    fail, if condition = false;
    suspend, if condition = true and no message is available yet.
The entire select statement may
    succeed, if any alternative succeeds (pick one nondeterministically);
    fail, if all alternatives fail;
    suspend, if some alternatives suspend and none succeeds yet.

24 Example: bounded buffer in Ada
    select
        when not FULL(BUFFER) =>
            accept STORE_ITEM(X: INTEGER) do
                ‘store X in buffer’
            end;
    or
        when not EMPTY(BUFFER) =>
            accept FETCH_ITEM(X: out INTEGER) do
                X := ‘first item from buffer’;
            end;
    end select;
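The same guarded select can be sketched in Go, assuming a server goroutine that owns the buffer. A select case on a nil channel is never chosen, so each guard is modeled by setting its channel to nil when the condition is false. When several cases are ready, Go picks one pseudo-randomly, which corresponds to the nondeterministic choice above. One difference: a Go select with no ready case blocks instead of failing, so the capacity must be at least 1. The names boundedBuffer and runBuffer are hypothetical.

```go
package main

import "fmt"

// boundedBuffer is a server process mirroring the Ada select statement.
// A case on a nil channel is never ready, so a nil channel acts like a
// guard that evaluated to false.
func boundedBuffer(capacity int, store <-chan int, fetch chan<- int, quit <-chan struct{}) {
	var buf []int
	for {
		var storeCh <-chan int // nil unless the guard below holds
		var fetchCh chan<- int // nil unless the guard below holds
		head := 0
		if len(buf) < capacity { // when not FULL(BUFFER)
			storeCh = store
		}
		if len(buf) > 0 { // when not EMPTY(BUFFER)
			fetchCh = fetch
			head = buf[0]
		}
		select { // Go picks pseudo-randomly among ready cases
		case x := <-storeCh: // accept STORE_ITEM
			buf = append(buf, x)
		case fetchCh <- head: // accept FETCH_ITEM
			buf = buf[1:]
		case <-quit:
			return
		}
	}
}

// runBuffer stores items through a buffer of the given capacity and fetches them back.
func runBuffer(capacity int, items []int) []int {
	store := make(chan int)
	fetch := make(chan int)
	quit := make(chan struct{})
	go boundedBuffer(capacity, store, fetch, quit)
	go func() { // producer
		for _, x := range items {
			store <- x
		}
	}()
	out := make([]int, 0, len(items)) // consumer
	for range items {
		out = append(out, <-fetch)
	}
	close(quit)
	return out
}

func main() {
	fmt.Println(runBuffer(2, []int{1, 2, 3, 4, 5})) // [1 2 3 4 5]
}
```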

25 Synchronizing Resources (SR): developed at the University of Arizona. Goals of SR: expressiveness (many message passing primitives); ease of use (minimize the number of underlying concepts; clean integration of language constructs); efficiency (each primitive must be efficient).

26 Overview of SR: multiple forms of message passing (asynchronous message passing, rendezvous with explicit receipt, Remote Procedure Call with implicit receipt, multicast); a powerful receive statement (conditional and ordered receive, based on the contents of the message); a select statement. A resource is a module run on one node (uniprocessor or multiprocessor); it contains multiple threads that share variables.

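27 Orthogonality in SR: the send and receive primitives can be combined in all 4 possible ways: send + explicit receive (in) gives asynchronous message passing; send + implicit receipt (proc) gives a fork; call + explicit receive gives a rendezvous; call + implicit receipt gives an RPC.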

28 Example:
    body S                       # sender
        send R.m1                # asynchronous message passing
        send R.m2                # fork
        call R.m1                # rendezvous
        call R.m2                # RPC
    end S
    body R                       # receiver
        proc m2()                # implicit receipt
            # code to handle m2
        end
        initial                  # main process of R
            do true ->           # infinite loop
                in m1() ->       # explicit receive
                    # code to handle m1
                ni
            od
        end
    end R

29 Traveling Salesman Problem (TSP) in SR: find the shortest route for a salesman among a given set of cities. Each city must be visited once; there is no return to the initial city.
[Figure: map of New York, Chicago, Saint Louis and Miami, with the distance labeled on each connection]

30 Sequential branch-and-bound: structure the entire search space as a tree, sorted using the nearest-city-first heuristic.
[Figure: search tree rooted at New York (n), branching over Chicago (c), Saint Louis (s) and Miami (m), with distances on the edges]

31 Pruning the search tree: keep track of the best solution found so far (the "bound") and cut off partial routes whose length is >= the bound.
[Figure: the search tree after a complete route of length 6 has been found; partial routes at least that long can be pruned]
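Below is a sequential branch-and-bound sketch in Go that combines the nearest-city-first ordering with the pruning rule above. The distance matrix is hypothetical, because the map's actual distances cannot be recovered here. City 0 is the starting city, and the route does not return to it. The names solver and shortestRoute are illustrative.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// dist is a hypothetical symmetric distance matrix for four cities;
// city 0 is the starting city.
var dist = [][]int{
	{0, 2, 4, 3},
	{2, 0, 1, 3},
	{4, 1, 0, 7},
	{3, 3, 7, 0},
}

type solver struct {
	dist  [][]int
	best  int   // the bound: length of the best complete route found so far
	route []int // that route
}

// search extends a partial route depth-first, trying the nearest cities first.
func (s *solver) search(path []int, visited []bool, length int) {
	if length >= s.best { // prune: this partial route cannot beat the bound
		return
	}
	if len(path) == len(s.dist) { // complete route (no return to the start)
		s.best = length
		s.route = append([]int(nil), path...)
		return
	}
	last := path[len(path)-1]
	var next []int
	for c := range s.dist {
		if !visited[c] {
			next = append(next, c)
		}
	}
	sort.Slice(next, func(i, j int) bool { // nearest-city-first heuristic
		return s.dist[last][next[i]] < s.dist[last][next[j]]
	})
	for _, c := range next {
		visited[c] = true
		s.search(append(path, c), visited, length+s.dist[last][c])
		visited[c] = false
	}
}

// shortestRoute returns the length and order of the shortest route from city 0.
func shortestRoute(d [][]int) (int, []int) {
	s := &solver{dist: d, best: math.MaxInt}
	visited := make([]bool, len(d))
	visited[0] = true
	s.search([]int{0}, visited, 0)
	return s.best, s.route
}

func main() {
	best, route := shortestRoute(dist)
	fmt.Println(best, route) // 7 [0 3 1 2]
}
```

With this matrix, the first complete route found is 0-1-2-3 with length 10. That length becomes the bound, so several later partial routes are cut off before they are completed.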

32 Parallelizing TSP: distribute the search tree over the CPUs, so that the CPUs analyze different routes. This results in reasonably large-grain jobs.

33 Distribution of TSP search tree:
[Figure: the search tree split into three subtrees, one each for CPU 1, CPU 2 and CPU 3]
Subtasks: New York -> Chicago; New York -> Saint Louis; New York -> Miami.

34 Distribution of the tree (2): with static distribution, each CPU gets a fixed part of the tree. This causes a load balancing problem, because subtrees take different amounts of time.
[Figure: the search tree divided statically over the CPUs]

35 Dynamic distribution: Replicated Workers Model. A master process generates a large number of jobs (subtrees) and repeatedly hands them out; worker processes (subcontractors) repeatedly take work and execute it, with one worker per processor. This is a general, frequently used model for parallel processing.

36 Implementing TSP in SR: communication is needed to distribute work and to implement the global bound.

37 Distributing work: the master generates jobs to be executed by workers, and it is not known in advance which worker will execute which job. A "mailbox" (a port with more than one receiver) would have helped; instead, an intermediate buffer process is used.
[Figure: master -> buffer -> workers]
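Here is a minimal sketch of the replicated workers model in Go. A Go channel can have several receivers, so in this sketch it serves directly as the mailbox that SR lacks, and no separate buffer process is needed. Job, replicatedWorkers and the work function are hypothetical; in TSP, a job would be a subtree of the search.

```go
package main

import (
	"fmt"
	"sync"
)

// Job is a hypothetical unit of work; in TSP it would be a route prefix (subtree).
type Job struct {
	ID   int
	Cost int
}

// replicatedWorkers runs one master and nWorkers workers. The master hands out
// jobs over one shared channel; every idle worker takes the next job from it,
// which balances the load dynamically. The results are summed.
func replicatedWorkers(jobs []Job, nWorkers int, work func(Job) int) int {
	jobCh := make(chan Job) // shared channel: one sender, many receivers
	results := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < nWorkers; w++ {
		wg.Add(1)
		go func() { // worker: repeatedly take work and execute it
			defer wg.Done()
			for j := range jobCh {
				results <- work(j)
			}
		}()
	}

	go func() { // master: generate jobs, then signal that there are no more
		for _, j := range jobs {
			jobCh <- j
		}
		close(jobCh)
	}()

	go func() { // close results once every worker has stopped
		wg.Wait()
		close(results)
	}()

	total := 0
	for r := range results {
		total += r
	}
	return total
}

func main() {
	var jobs []Job
	for i := 1; i <= 10; i++ {
		jobs = append(jobs, Job{ID: i, Cost: i})
	}
	total := replicatedWorkers(jobs, 3, func(j Job) int { return j.Cost })
	fmt.Println(total) // 55
}
```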

38 Implementing the global bound. Problem: the bound is a global variable, but it must be implemented with message passing. The bound is accessed millions of times but updated only when a better route is found, so the only efficient solution is to replicate it manually.

39 Managing a replicated variable in SR: use a BoundManager process to serialize updates.
[Figure: BoundManager, Worker 1 and Worker 2 each hold M, a copy of the global minimum. When process 2 assigns M := 3, it sends Assign(M,3) to the BoundManager, which sends Update(M,3) to all copies.]
Assign: asynchronous send + explicit ordered receive. Update: synchronous call + implicit receipt + multicast.

40 SR code fragments for TSP:
    body worker
        var M: int := Infinite       # copy of bound
        sem sema := 1                # semaphore protecting M
        proc update(value: int)
            P(sema)                  # lock copy
            M := value
            V(sema)                  # unlock
        end update
        initial                      # main code for worker
            # - can read M (using sema)
            # - can use send BoundManager.Assign(value)
        end
    end worker
    body BoundManager
        var M: int := Infinite
        do true ->                   # handle requests 1 by 1
            in Assign(value) by value ->
                if value < M ->
                    M := value
                    co (i := 1 to ncpus)     # multicast
                        call worker[i].update(value)
                    oc
                fi
            ni
        od
    end BoundManager
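The BoundManager scheme can be sketched in Go under a few substitutions: a mutex replaces the semaphore, a buffered channel models the asynchronous Assign, and per-copy goroutines plus a sync.WaitGroup stand in for SR's co ... oc multicast. Unlike "in Assign(value) by value", this sketch takes requests in arrival order. Values that do not improve the bound are ignored, so every copy still ends at the minimum, although more multicasts may happen along the way. The names worker, boundManager and runBound are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"sync"
)

// worker keeps a private copy of the global bound M. A mutex replaces the
// slide's semaphore: a worker's search would read M, the BoundManager writes it.
type worker struct {
	mu sync.Mutex
	m  int
}

func (w *worker) read() int {
	w.mu.Lock()
	defer w.mu.Unlock()
	return w.m
}

// update corresponds to the worker's proc update (implicit receipt).
func (w *worker) update(v int) {
	w.mu.Lock()
	w.m = v
	w.mu.Unlock()
}

// boundManager serializes Assign requests. When a value improves the bound,
// it updates every copy concurrently and waits for all of them (like co ... oc).
// Requests are handled in arrival order, not "by value" as in SR.
func boundManager(assign <-chan int, workers []*worker, done chan<- struct{}) {
	m := math.MaxInt
	for v := range assign {
		if v < m {
			m = v
			var wg sync.WaitGroup
			for _, w := range workers {
				wg.Add(1)
				go func(w *worker, v int) {
					defer wg.Done()
					w.update(v)
				}(w, v)
			}
			wg.Wait()
		}
	}
	close(done)
}

// runBound sends Assign requests for the given values and returns each worker's copy.
func runBound(values []int, nWorkers int) []int {
	workers := make([]*worker, nWorkers)
	for i := range workers {
		workers[i] = &worker{m: math.MaxInt}
	}
	assign := make(chan int, len(values)) // buffered: send Assign is asynchronous
	done := make(chan struct{})
	go boundManager(assign, workers, done)
	for _, v := range values {
		assign <- v
	}
	close(assign)
	<-done
	copies := make([]int, nWorkers)
	for i, w := range workers {
		copies[i] = w.read()
	}
	return copies
}

func main() {
	fmt.Println(runBound([]int{9, 5, 7, 3, 8}, 3)) // [3 3 3]
}
```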

41 Search overhead:
[Figure: the search tree split over CPU 1, CPU 2 and CPU 3]
Problem: the path with length 6 has not yet been computed by CPU 1 when CPU 3 starts on n->m->s, so that subtree is not pruned. The parallel algorithm therefore does more work than the sequential algorithm; this extra work is called search overhead.

42 Performance of TSP in SR: communication overhead (distribution of jobs and updating the global bound) is small. Load imbalances are avoided, since the replicated workers model has automatic load balancing. Synchronization overhead comes from the mutual exclusion (locking) needed for accessing the copy of the bound. Search overhead is the main performance problem. In practice, high speedups are possible.

