1
**Local-Spin Algorithms**

Multiprocessor Synchronization Algorithms: Local-Spin Algorithms. Lecturer: Danny Hendler. This presentation is based on the book “Synchronization Algorithms and Concurrent Programming” by G. Taubenfeld and on the survey “Shared-memory mutual exclusion: major research trends since 1986” by J. Anderson, Y-J. Kim and T. Herman.

2
**The Cache-Coherent (CC) and Distributed Shared Memory (DSM) models**

This figure is taken from the survey “Shared-memory mutual exclusion: major research trends since 1986” by J. Anderson, Y-J. Kim and T. Herman

3
**Remote and local memory accesses**

In a DSM system: an access of v by p is local if v resides in p's own memory module, and remote otherwise.

In a cache-coherent system: an access of v by p is remote if it is the first access of v or if v has been written by another process since p’s last access of it.
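The two definitions above can be turned into a small RMR counter. The sketch below is only an illustration: the trace format `(process, op, variable)`, the `home` map and both function names are assumptions made for this example, and the CC rule is modeled exactly as worded on the slide rather than after any real coherence protocol.

```python
def count_rmrs_cc(trace):
    """Count remote accesses per process under the CC rule stated above.

    trace is a list of (process, op, variable) with op in {"r", "w"}.
    """
    cached = set()   # (process, variable) pairs that hold a valid cached copy
    rmrs = {}
    for p, op, v in trace:
        rmrs.setdefault(p, 0)
        if (p, v) not in cached:
            # First access, or v was written by someone else since p's last access.
            rmrs[p] += 1
            cached.add((p, v))
        if op == "w":
            # A write invalidates every other process's copy of v.
            cached = {(q, u) for (q, u) in cached if u != v or q == p}
    return rmrs


def count_rmrs_dsm(trace, home):
    """Count remote accesses under DSM: v is local only to process home[v]."""
    rmrs = {}
    for p, _op, v in trace:
        rmrs[p] = rmrs.get(p, 0) + (0 if home[v] == p else 1)
    return rmrs
```

For a trace in which process 0 reads `x` five times, process 1 writes `x`, and process 0 reads `x` once more, the CC counter charges process 0 two RMRs, while the DSM counter charges it six when `x` lives in process 1's memory. This is why spinning on a remotely stored variable is not local-spin on DSM.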

4
**Local-spin algorithms**

In a local-spin algorithm, all busy waiting (‘await’) is done by read-only loops of local accesses that do not cause interconnect traffic. The same algorithm may be local-spin on one architecture (DSM or CC) and non-local-spin on the other. For local-spin algorithms, our complexity metric is the worst-case number of Remote Memory References (RMRs).

5
**Peterson’s 2-process algorithm**

Program for process 0:

    b[0]:=true
    turn:=0
    await (b[1]=false or turn=1)
    CS
    b[0]:=false

Program for process 1:

    b[1]:=true
    turn:=1
    await (b[0]=false or turn=0)
    CS
    b[1]:=false

Is this algorithm local-spin on a DSM machine? No.

Is this algorithm local-spin on a CC machine? Yes.

6
**Peterson’s 2-process algorithm**

Program for process 0:

    b[0]:=true
    turn:=0
    await (b[1]=false or turn=1)
    CS
    b[0]:=false

Program for process 1:

    b[1]:=true
    turn:=1
    await (b[0]=false or turn=0)
    CS
    b[1]:=false

What is the RMR complexity on a DSM machine? Unbounded.

What is the RMR complexity on a CC machine? Constant.
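A runnable sketch of this two-process algorithm, written generically for process `i` and its rival `1 - i`. It assumes CPython, where the interpreter executes these loads and stores in program order; the `process` and `run` helpers, the shared `counter` and the `time.sleep(0)` yields are illustrative additions, not part of the slides.

```python
import threading
import time

b = [False, False]   # b[i]: process i wants to enter
turn = 0             # tie-breaker: the last process to write turn waits
counter = 0          # shared data protected by the algorithm


def process(i, rounds):
    global turn, counter
    other = 1 - i
    for _ in range(rounds):
        b[i] = True
        turn = i
        # await (b[other]=false or turn=other)
        while b[other] and turn == i:
            time.sleep(0)            # yield so the other thread can run
        tmp = counter                # critical section: deliberately non-atomic
        time.sleep(0)
        counter = tmp + 1
        b[i] = False                 # exit section


def run(rounds=100):
    global counter
    counter = 0
    threads = [threading.Thread(target=process, args=(i, rounds)) for i in (0, 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

Both processes spin on `turn`, which can be local to at most one of them on a DSM machine, so one process always spins remotely. On a CC machine each spin loop reads cached copies until the rival writes.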

7
**Recall the following simple test-and-set based algorithm**

Shared: lock, initially 0

    while (! lock.test-and-set() ) // entry section: spin until test-and-set succeeds
    Critical Section
    lock := 0 // exit section

Is this algorithm local-spin on either a DSM or CC machine? Nope.

8
**A better algorithm: test-and-test-and-set**

Shared: lock, initially 0

    while (! lock.test-and-set() ) // entry section
        await(lock == 0)
    Critical Section
    lock := 0 // exit section

Creates less traffic in CC machines, but is still not local-spin.
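A minimal sketch of test-and-test-and-set. Python has no hardware test-and-set, so the `TASRegister` class below emulates one with a `threading.Lock`; the class, its `tas_calls` counter and the `run` helper are assumptions made for illustration. Here `test_and_set()` returns the old value, so 0 means the caller acquired the lock, which may differ from the convention used on the slide.

```python
import threading
import time


class TASRegister:
    """Emulated test-and-set bit; a threading.Lock stands in for hardware atomicity."""

    def __init__(self):
        self.value = 0
        self.tas_calls = 0
        self._atomic = threading.Lock()

    def test_and_set(self):
        with self._atomic:
            self.tas_calls += 1
            old, self.value = self.value, 1
            return old               # 0 means the caller now holds the lock

    def clear(self):
        with self._atomic:
            self.value = 0


def ttas_acquire(lock):
    while lock.test_and_set() == 1:  # failed: somebody else holds the lock
        while lock.value != 0:       # read-only spin before retrying
            time.sleep(0)


def run(n_threads=4, rounds=50):
    lock, total = TASRegister(), [0]

    def worker():
        for _ in range(rounds):
            ttas_acquire(lock)
            tmp = total[0]           # critical section: non-atomic increment
            time.sleep(0)
            total[0] = tmp + 1
            lock.clear()             # exit section

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0], lock.tas_calls
```

The inner read-only loop is what reduces traffic on CC machines. Every release still invalidates all cached copies and triggers a burst of test-and-set calls, so the number of RMRs per passage is not bounded by a constant.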

9
**Local Spinning Mutual Exclusion Using Strong Primitives**

10
**Anderson’s queue-based algorithm (Anderson, 1990)**

Shared:

    integer ticket – an RMW object, initially 0
    bit valid[0..n-1], initially valid[0]=1 and valid[i]=0 for i ∈ {1,..,n-1}

Local:

    integer myTicket

[Figure: the ticket counter and the valid array in the initial configuration.]

Program for process i:

    myTicket := fetch-and-inc-modulo-n(ticket) ; take a ticket
    await valid[myTicket]=1 ; wait for your turn
    CS
    valid[myTicket]:=0 ; dequeue
    valid[(myTicket+1) mod n]:=1 ; signal successor

11
**Anderson’s queue-based algorithm (cont’d)**

[Figure: four configurations of ticket, valid and the myTicket variables: the initial configuration; after the entry section of p3; after p1 performs its entry section; after p3 exits.]

12
**Anderson’s queue-based algorithm (cont’d)**

Program for process i:

    myTicket := fetch-and-inc-modulo-n(ticket) ; take a ticket
    await valid[myTicket]=1 ; wait for your turn
    CS
    valid[myTicket]:=0 ; dequeue
    valid[(myTicket+1) mod n]:=1 ; signal successor

What is the RMR complexity on a DSM machine? Unbounded.

What is the RMR complexity on a CC machine? Constant.
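The program above can be sketched as a small Python class. The fetch-and-inc-modulo-n primitive is emulated with a `threading.Lock`, and the `AndersonLock` and `run` names are hypothetical; the sketch assumes at most n threads use the lock at any time.

```python
import threading
import time


class AndersonLock:
    def __init__(self, n):
        self.n = n
        self.ticket = 0
        self.valid = [1] + [0] * (n - 1)   # valid[0]=1, all other slots 0
        self._atomic = threading.Lock()    # emulates the atomic RMW primitive

    def _fetch_and_inc_mod_n(self):
        with self._atomic:
            old = self.ticket
            self.ticket = (old + 1) % self.n
            return old

    def acquire(self):
        my_ticket = self._fetch_and_inc_mod_n()   # take a ticket
        while self.valid[my_ticket] != 1:         # wait for your turn
            time.sleep(0)
        return my_ticket

    def release(self, my_ticket):
        self.valid[my_ticket] = 0                     # dequeue
        self.valid[(my_ticket + 1) % self.n] = 1      # signal successor


def run(n=4, rounds=50):
    lock, total = AndersonLock(n), [0]

    def worker():
        for _ in range(rounds):
            t = lock.acquire()
            tmp = total[0]               # critical section: non-atomic increment
            time.sleep(0)
            total[0] = tmp + 1
            lock.release(t)

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]
```

Each process spins on a different slot of `valid`, so on a CC machine a waiting process is disturbed only when its own slot is written. The slot is chosen by the ticket rather than by the process, so on a DSM machine it cannot be placed in the spinner's local memory.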

13
**Graunke and Thakkar’s algorithm (Graunke and Thakkar, 1990)**

Uses the more common swap (a.k.a. fetch-and-store) primitive:

    swap(w, new)
        do atomically
            prev := *w
            *w := new
            return prev

14
**Graunke and Thakkar’s algorithm (cont’d)**

Shared:

    bit slots[0..n-1], initially slots[i]=1 for i ∈ {0,..,n-1}
    structure {bit value, bit* slot} tail, initially {0, &slots[0]}

Local:

    structure {bit value, bit* slot} myRecord, prev
    bit temp

[Figure: tail and the slots array, indexed 0..n-1, with every slot initially 1.]

15
**Graunke and Thakkar’s algorithm (cont’d)**

Shared:

    bit slots[0..n-1], initially slots[i]=1 for i ∈ {0,..,n-1}
    structure {bit value, bit* slot} tail, initially {0, &slots[0]}

Local:

    structure {bit value, bit* slot} myRecord, prev
    bit temp

Program for process i:

    myRecord.value := slots[i] ; prepare to thread yourself to queue
    myRecord.slot := &slots[i]
    prev := swap(&tail, myRecord) ; prev now points to predecessor
    await (*prev.slot ≠ prev.value) ; local spin until predecessor’s value changes
    CS
    temp := 1-slots[i]
    slots[i] := temp ; signal successor

16
**Graunke and Thakkar’s algorithm (cont’d)**

17
**Graunke and Thakkar’s algorithm (cont’d)**

Program for process i:

    myRecord.value := slots[i] ; prepare to thread yourself to queue
    myRecord.slot := &slots[i]
    prev := swap(&tail, myRecord) ; prev now points to predecessor
    await (*prev.slot ≠ prev.value) ; local spin until predecessor’s value changes
    CS
    temp := 1-slots[i]
    slots[i] := temp ; signal successor

What is the RMR complexity on a DSM machine? Unbounded.

What is the RMR complexity on a CC machine? Constant.
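A hedged Python sketch of the program above. Records are represented as `(value, slot_index)` tuples instead of a structure holding a pointer, and swap is emulated with a `threading.Lock`; the `GTLock` and `run` names are illustrative assumptions.

```python
import threading
import time


class GTLock:
    def __init__(self, n):
        self.slots = [1] * n
        # Initial tail is {0, &slots[0]}: slots[0]=1 differs from 0,
        # so the first process to arrive does not wait.
        self.tail = (0, 0)
        self._atomic = threading.Lock()   # emulates the atomic swap primitive

    def _swap_tail(self, new):
        with self._atomic:
            prev, self.tail = self.tail, new
            return prev

    def acquire(self, i):
        my_record = (self.slots[i], i)            # prepare to join the queue
        prev_value, prev_slot = self._swap_tail(my_record)
        while self.slots[prev_slot] == prev_value:
            time.sleep(0)                         # spin on the predecessor's slot

    def release(self, i):
        self.slots[i] = 1 - self.slots[i]         # flip my bit: signal successor


def run(n=4, rounds=50):
    lock, total = GTLock(n), [0]

    def worker(i):
        for _ in range(rounds):
            lock.acquire(i)
            tmp = total[0]               # critical section: non-atomic increment
            time.sleep(0)
            total[0] = tmp + 1
            lock.release(i)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]
```

A process spins on its predecessor's slot, not on its own. Even if `slots[i]` is stored in process i's memory, the spin is therefore remote on a DSM machine.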

18
**The MCS queue-based algorithm (Mellor-Crummey and Scott, 1991)**

Has constant RMR complexity under both the DSM and CC models. Uses swap and CAS.

Type:

    Qnode: structure {bit locked, Qnode *next}

Shared:

    Qnode nodes[0..n-1]
    Qnode *tail, initially nil

Local:

    Qnode *myNode, initially &nodes[i]
    Qnode *successor

[Figure: the tail pointer and the nodes array, each node showing its locked flag (T/F).]

19
**The MCS queue-based algorithm (cont’d)**

Program for process i:

    myNode->next := nil ; prepare to be last in queue
    pred := swap(&tail, myNode) ; tail now points to myNode
    if (pred ≠ nil) ; I need to wait for a predecessor
        myNode->locked := true ; prepare to wait
        pred->next := myNode ; let my predecessor know it has to unlock me
        await myNode->locked = false
    CS
    if (myNode->next = nil) ; if not sure there is a successor
        if (compare-and-swap(&tail, myNode, nil) = false) ; if there is a successor
            await (myNode->next ≠ nil) ; spin until successor lets me know its identity
            successor := myNode->next ; get a pointer to my successor
            successor->locked := false ; unlock my successor
    else ; for sure, I have a successor
        successor := myNode->next ; get a pointer to my successor
        successor->locked := false ; unlock my successor
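The MCS entry and exit code can be sketched in Python as follows. Both swap and compare-and-swap on `tail` are emulated with a `threading.Lock`, and `QNode`, `MCSLock` and `run` are names chosen for this illustration.

```python
import threading
import time


class QNode:
    def __init__(self):
        self.locked = False
        self.next = None


class MCSLock:
    def __init__(self):
        self.tail = None
        self._atomic = threading.Lock()   # emulates atomic swap and CAS on tail

    def _swap_tail(self, new):
        with self._atomic:
            old, self.tail = self.tail, new
            return old

    def _cas_tail(self, expected, new):
        with self._atomic:
            if self.tail is expected:
                self.tail = new
                return True
            return False

    def acquire(self, node):
        node.next = None                  # prepare to be last in queue
        pred = self._swap_tail(node)
        if pred is not None:              # somebody is ahead of me
            node.locked = True
            pred.next = node              # tell predecessor whom to unlock
            while node.locked:            # spin on my own node only
                time.sleep(0)

    def release(self, node):
        if node.next is None:
            if self._cas_tail(node, None):
                return                    # no successor: the queue is empty again
            while node.next is None:      # successor exists but has not linked yet
                time.sleep(0)
        node.next.locked = False          # unlock my successor


def run(n=4, rounds=50):
    lock, total = MCSLock(), [0]

    def worker():
        node = QNode()
        for _ in range(rounds):
            lock.acquire(node)
            tmp = total[0]               # critical section: non-atomic increment
            time.sleep(0)
            total[0] = tmp + 1
            lock.release(node)

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]
```

Every wait loop reads a field of the process's own node. The node can be placed in that process's local memory on a DSM machine, which is what gives constant RMR complexity in both models.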

20
**The MCS queue-based algorithm (cont’d)**

21
**Local Spinning Mutual Exclusion Using reads and writes**

22
**A local-spin tournament-tree algorithm (Anderson, Yang, 1993)**

[Figure: a binary tournament tree with levels 0, 1 and 2 and the processes at its leaves.]

Each node is identified by (level, number). O(log n) RMR complexity for both DSM and CC systems. This is optimal (Attiya, Hendler, Woelfel, 2008). Uses O(n log n) registers.

23
**A local-spin tournament-tree algorithm (cont’d)**

Shared:

    - Per each node v, there are 3 registers: name[level, 2node] and name[level, 2node+1], initially -1, and turn[level, node]
    - Per each level l and process i, a spin flag: flag[level, i], initially 0

Local: level, node, id

24
**A local-spin tournament-tree algorithm (cont’d)**

Program for process i:

    node := i
    for level = 0 to log n-1 do ;from leaf to root
        id := node mod 2 ;compute ID for 2-process mutex algorithm (0 or 1)
        node := ⌊node/2⌋ ;compute node in new level
        name[level, 2node+id] := i ;identify yourself
        turn[level, node] := i ;update the tie-breaker
        flag[level, i] := 0 ;initialize my locally-accessible spin flag
        rival := name[level, 2node+1-id]
        if ((rival ≠ -1) and (turn[level, node] = i)) ;if not sure I should precede rival
            if (flag[level, rival] = 0) ;if rival may get to wait for its flag to become non-zero
                flag[level, rival] := 1 ;release rival by letting it know I updated the tie-breaker
            await flag[level, i] ≠ 0 ;await until signaled by rival (so it updated the tie-breaker)
            if (turn[level, node] = i) ;if I lost
                await flag[level, i] = 2 ;wait till rival notifies me it's my turn
    EndFor
    CS
    for level = log n-1 downto 0 do ;begin exit code
        id := ⌊i/2^level⌋ mod 2, node := ⌊i/2^(level+1)⌋ ;set node and id
        name[level, 2node+id] := -1 ;erase name
        rival := turn[level, node] ;find who rival is (if there is one)
        if rival ≠ i ;if there is a rival
            flag[level, rival] := 2 ;notify rival

flag assumes values {0,1,2} and is initialized to 0. Value 1 is written in order to release the rival if it is spinning at “await flag[level, i] ≠ 0” (waiting to know whether it is the first or the last to write to turn). Value 2 indicates that the rival exited the CS.
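A sketch of the tournament tree using only reads and writes, for n = 4. It computes the 2-process ID (called `side` here) from the node number before halving it, and recomputes both values from the process number in the exit code. It assumes CPython, where plain list loads and stores behave as sequentially consistent registers; `N`, `enter`, `leave` and `run` are names made up for the example.

```python
import threading
import time

N = 4                              # number of processes, assumed a power of two
LEVELS = N.bit_length() - 1        # log2(N)

name = [[-1] * N for _ in range(LEVELS)]   # name[level][2*node + side]
turn = [[-1] * N for _ in range(LEVELS)]   # turn[level][node]
flag = [[0] * N for _ in range(LEVELS)]    # flag[level][process], values 0..2


def enter(i):
    node = i
    for level in range(LEVELS):            # from leaf to root
        side = node % 2                    # ID in the 2-process competition
        node //= 2                         # node at the new level
        name[level][2 * node + side] = i   # identify yourself
        turn[level][node] = i              # update the tie-breaker
        flag[level][i] = 0                 # reset my own spin flag
        rival = name[level][2 * node + 1 - side]
        if rival != -1 and turn[level][node] == i:
            if flag[level][rival] == 0:
                flag[level][rival] = 1     # tell rival the tie-breaker was updated
            while flag[level][i] == 0:     # wait until rival signals
                time.sleep(0)
            if turn[level][node] == i:     # I wrote turn last, so I lost
                while flag[level][i] != 2:
                    time.sleep(0)          # wait until rival leaves the CS


def leave(i):
    for level in reversed(range(LEVELS)):  # from root to leaf
        side = (i >> level) % 2
        node = i >> (level + 1)
        name[level][2 * node + side] = -1  # erase name
        rival = turn[level][node]
        if rival != i:                     # there is a waiting rival
            flag[level][rival] = 2         # notify it


def run(rounds=30):
    total = [0]

    def worker(i):
        for _ in range(rounds):
            enter(i)
            tmp = total[0]               # critical section: non-atomic increment
            time.sleep(0)
            total[0] = tmp + 1
            leave(i)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]
```

All busy waiting is on `flag[level][i]`, a register that only process i spins on, so it can be allocated in process i's memory on a DSM machine. Each level costs a constant number of RMRs, giving O(log n) per passage.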

25
**Local-Spin Leader Election**

Exactly one process is elected. All other processes are not-elected. Processes may busy-wait.

26
**Choy and Singh's filter**

[Figure: m processes enter the filter; between 1 and ⌈m/2⌉ processes “exit”; the rest are “halted”.]

Filter guarantees: Safety: if m processes enter a filter, at most ⌈m/2⌉ exit. Progress: if some processes enter a filter, at least one exits.

27
**Choy and Singh's filter (cont’d)**

Shared:

    integer turn
    Boolean b, initially false

Program for process i:

    turn := i
    await ¬b // wait for barrier to open
    b := true // close barrier
    if turn ≠ i // not last to cross the barrier
        b := false // open barrier
        halt
    else
        exit

Why does the barrier have to be re-opened? Why are filter guarantees satisfied?
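One way to experiment with the filter guarantees is a step-by-step simulation in which a seeded random scheduler interleaves the processes one shared-memory access at a time. The sketch below is an illustration under that model: `filter_process`, `run_filter` and the `"step"`/`"spin"` tags are invented for the example, and the barrier is treated as open while `b` is false. The bound checked is m/2 rounded up, since a process that runs alone always exits.

```python
import random


def filter_process(i, shared, result):
    """One process running the filter; every yield ends one shared-memory step."""
    shared["turn"] = i
    yield "step"
    while True:                       # await: wait for the barrier to open
        closed = shared["b"]
        if not closed:
            yield "step"
            break
        yield "spin"                  # observed a closed barrier
    shared["b"] = True                # close the barrier
    yield "step"
    last_writer = shared["turn"]
    yield "step"
    if last_writer != i:              # not last to cross the barrier
        shared["b"] = False           # re-open the barrier
        result[i] = "halted"
    else:
        result[i] = "exited"


def run_filter(m, seed):
    """Run m processes under a random schedule; return how many exited."""
    rng = random.Random(seed)
    shared = {"turn": None, "b": False}
    result = {}
    procs = {i: filter_process(i, shared, result) for i in range(m)}
    state = {i: "step" for i in procs}
    # Stop when every live process spins on a closed barrier:
    # none of them will ever write b, so nothing can change any more.
    while procs and not (shared["b"] and all(s == "spin" for s in state.values())):
        i = rng.choice(sorted(procs))
        try:
            state[i] = next(procs[i])
        except StopIteration:
            del procs[i], state[i]
    return sum(1 for r in result.values() if r == "exited")
```

Calling `run_filter(m, seed)` for several values of `m` and `seed` lets one check that the number of exiting processes stays between 1 and `(m + 1) // 2`. Processes that never get past the barrier are counted as halted.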

28
**Choy and Singh’s filter algorithm**

Filter #i

29
**Choy and Singh’s filter algorithm (cont’d)**

Shared:

    typedef struct {integer turn; boolean b, c initially false} filter
    filter A[log n + 1]

Program for process i:

    for (curr=0; curr < log n + 1; curr++)
        A[curr].turn := i
        await ¬A[curr].b
        A[curr].b := true
        if (A[curr].turn ≠ i)
            A[curr].c := true // mark that some process failed on filter
            A[curr].b := false
            return not-elected
        else if (curr > 0 and ¬A[curr-1].c)
            return elected // Other processes will never reach this filter
        else
            curr := curr+1
    EndFor

Do you see any problem with this algorithm? How can this be fixed?

30
**Choy and Singh’s filter algorithm (cont’d)**

What is the DSM RMR complexity? What is the CC RMR complexity? What is the worst-case average (CC) RMR complexity?
