
Published by Christopher Nicholson. Modified about 1 year ago.

1
Local-Spin Algorithms
Multiprocessor synchronization algorithms ( )
Lecturer: Danny Hendler
This presentation is based on the book “Synchronization Algorithms and Concurrent Programming” by G. Taubenfeld and on the survey “Shared-memory mutual exclusion: major research trends since 1986” by J. Anderson, Y-J. Kim and T. Herman.

2
The Cache-Coherent (CC) and Distributed Shared Memory (DSM) models
[Figure taken from the survey “Shared-memory mutual exclusion: major research trends since 1986” by J. Anderson, Y-J. Kim and T. Herman.]

3
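Remote and local memory accesses
In a DSM system: [Figure: local and remote accesses] each process has its own memory module; an access to a variable stored in the accessing process's module is local, and an access to another process's module is remote.
In a cache-coherent system: an access of v by p is remote if it is p's first access of v, or if v has been written by another process since p's last access of it.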

4
Local-spin algorithms
In a local-spin algorithm, all busy waiting (‘await’) is done by read-only loops of local accesses, which do not cause interconnect traffic.
The same algorithm may be local-spin on one architecture (DSM or CC) and not local-spin on the other.
For local-spin algorithms, our complexity metric is the worst-case number of Remote Memory References (RMRs).
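The two definitions of a remote access can be made concrete with a small RMR-counting model. This is a sketch of ours, not the lecture's: the names `CCModel` and `DSMModel`, the `home` map that assigns each variable to one memory module, and the convention that a write by the last accessor is local (a literal reading of the CC definition above) are all assumptions. It shows why a read-only spin costs one RMR on CC but one RMR per iteration on DSM, unless the spun-on variable lives in the spinner's own module.

```python
class CCModel:
    """RMR counter for the CC rule: p's access to v is remote if it is p's
    first access to v, or if another process wrote v since p's last access."""

    def __init__(self):
        self.values = {}   # variable name -> current value
        self.valid = {}    # variable name -> processes holding an up-to-date copy
        self.rmrs = {}     # process id -> RMRs incurred so far

    def _touch(self, p, v):
        holders = self.valid.setdefault(v, set())
        if p not in holders:           # first access, or copy invalidated by a writer
            self.rmrs[p] = self.rmrs.get(p, 0) + 1
            holders.add(p)

    def read(self, p, v):
        self._touch(p, v)
        return self.values.get(v, 0)

    def write(self, p, v, value):
        self._touch(p, v)
        self.values[v] = value
        self.valid[v] = {p}            # every other process's copy is now stale


class DSMModel:
    """RMR counter for DSM: each variable lives in one process's memory module,
    and any access to it by a different process is remote."""

    def __init__(self, home):
        self.home = home   # variable name -> id of the process whose module stores it
        self.values = {}
        self.rmrs = {}

    def _touch(self, p, v):
        if self.home[v] != p:
            self.rmrs[p] = self.rmrs.get(p, 0) + 1

    def read(self, p, v):
        self._touch(p, v)
        return self.values.get(v, 0)

    def write(self, p, v, value):
        self._touch(p, v)
        self.values[v] = value


def spin_rmrs(model, spins):
    """Process 0 busy-waits by reading 'flag' `spins` times; return its RMR count."""
    for _ in range(spins):
        model.read(0, "flag")
    return model.rmrs.get(0, 0)


def rmrs_after_remote_write():
    """Process 0 spins, process 1 writes 'flag', process 0 reads it once more."""
    cc = CCModel()
    spin_rmrs(cc, 10)
    cc.write(1, "flag", 1)
    cc.read(0, "flag")
    return cc.rmrs[0], cc.rmrs[1]


if __name__ == "__main__":
    print(spin_rmrs(CCModel(), 1000))              # 1: only the first read misses
    print(spin_rmrs(DSMModel({"flag": 1}), 1000))  # 1000: flag lives at process 1
    print(spin_rmrs(DSMModel({"flag": 0}), 1000))  # 0: flag is in process 0's module
    print(rmrs_after_remote_write())               # (2, 1)
```

The last call illustrates the invalidation rule: after another process writes `flag`, the spinner's next read is remote again, which is exactly the traffic a local-spin algorithm tries to bound.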

5
Peterson’s 2-process algorithm
Program for process 1
1. b[1] := true
2. turn := 1
3. await (b[0] = false or turn = 0)
4. CS
5. b[1] := false
Program for process 0
1. b[0] := true
2. turn := 0
3. await (b[1] = false or turn = 1)
4. CS
5. b[0] := false
Is this algorithm local-spin on a DSM machine? No
Is this algorithm local-spin on a CC machine? Yes

6
Peterson’s 2-process algorithm
Program for process 1
1. b[1] := true
2. turn := 1
3. await (b[0] = false or turn = 0)
4. CS
5. b[1] := false
Program for process 0
1. b[0] := true
2. turn := 0
3. await (b[1] = false or turn = 1)
4. CS
5. b[0] := false
What is the RMR complexity on a DSM machine? Unbounded
What is the RMR complexity on a CC machine? Constant
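A minimal sketch of the two-process code with Python threads, assuming standard CPython, whose global interpreter lock makes individual list and attribute reads and writes atomic and sequentially consistent (real hardware would also need memory fences). `Peterson` and `stress` are illustrative names, and the `time.sleep(0)` in the `await` loop only yields the interpreter lock so the demo finishes quickly; it is not part of the algorithm.

```python
import threading
import time


class Peterson:
    """Peterson's 2-process lock as on the slide: the last process to write
    `turn` is the one that defers."""

    def __init__(self):
        self.b = [False, False]
        self.turn = 0

    def lock(self, p):
        other = 1 - p
        self.b[p] = True                       # 1. b[p] := true
        self.turn = p                          # 2. turn := p
        # 3. await (b[other] = false or turn = other)
        while self.b[other] and self.turn == p:
            time.sleep(0)                      # yield the GIL between reads

    def unlock(self, p):
        self.b[p] = False                      # 5. b[p] := false


def stress(iters=200):
    """Two threads increment a shared counter non-atomically inside the CS."""
    lock, counter = Peterson(), 0

    def worker(p):
        nonlocal counter
        for _ in range(iters):
            lock.lock(p)
            tmp = counter
            time.sleep(0)                      # invite a context switch inside the CS
            counter = tmp + 1
            lock.unlock(p)

    threads = [threading.Thread(target=worker, args=(p,)) for p in (0, 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(stress())   # expected: 400 when mutual exclusion holds
```

Any increment lost to a race inside the critical section would make the final count smaller than `2 * iters`.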

7
Recall the following simple test-and-set based algorithm
Shared: lock, initially 0 (lock.test-and-set() atomically sets lock to 1 and returns true iff lock was 0)
1. while (!lock.test-and-set()) // entry section
2. Critical Section
3. lock := 0 // exit section
Is this algorithm local-spin on either a DSM or CC machine? Nope.
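The same entry and exit sections as a Python sketch. Python has no hardware test-and-set, so the hypothetical `TASBit` class emulates one with an internal `threading.Lock`, following the convention the pseudocode implies (`test_and_set` returns true iff the bit was 0). The exit write also goes through that internal lock: an unguarded store racing with an emulated read-modify-write could be overwritten, which real hardware rules out.

```python
import threading
import time


class TASBit:
    """Hypothetical stand-in for a hardware test-and-set bit; the internal
    threading.Lock only makes each operation atomic."""

    def __init__(self):
        self.value = 0
        self._atomic = threading.Lock()

    def test_and_set(self):
        """Atomically set the bit to 1; return True iff it was 0."""
        with self._atomic:
            was_free = self.value == 0
            self.value = 1
            return was_free

    def clear(self):
        with self._atomic:
            self.value = 0


class TASLock:
    def __init__(self):
        self.lock = TASBit()

    def acquire(self):
        # 1. while (!lock.test-and-set()): every attempt is an RMW on shared data
        while not self.lock.test_and_set():
            time.sleep(0)                  # yield CPython's GIL

    def release(self):
        self.lock.clear()                  # 3. lock := 0


def stress(n_threads=4, iters=100):
    """n_threads threads increment a counter non-atomically inside the CS."""
    lock, counter = TASLock(), 0

    def worker():
        nonlocal counter
        for _ in range(iters):
            lock.acquire()
            tmp = counter
            time.sleep(0)                  # invite a context switch inside the CS
            counter = tmp + 1
            lock.release()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

Every waiting iteration performs a test-and-set on the shared bit, which is why the loop generates remote traffic on both architectures.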

8
A better algorithm: test-and-test-and-set
Shared: lock, initially 0
1. while (!lock.test-and-set()) // entry section
2.    await (lock == 0)
3. Critical Section
4. lock := 0 // exit section
It creates less traffic on CC machines, but it is still not local-spin.
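A sketch of the test-and-test-and-set variant; the only change from the plain test-and-set loop is the inner read-only spin. As before, `TASBit` is a hypothetical emulation of an atomic test-and-set built on a `threading.Lock`, and CPython's global interpreter lock is assumed to make the plain read of `value` atomic.

```python
import threading
import time


class TASBit:
    """Hypothetical stand-in for a hardware test-and-set bit; the internal
    threading.Lock only makes each operation atomic."""

    def __init__(self):
        self.value = 0
        self._atomic = threading.Lock()

    def test_and_set(self):
        """Atomically set the bit to 1; return True iff it was 0."""
        with self._atomic:
            was_free = self.value == 0
            self.value = 1
            return was_free

    def clear(self):
        with self._atomic:
            self.value = 0


class TTASLock:
    def __init__(self):
        self.lock = TASBit()

    def acquire(self):
        while not self.lock.test_and_set():   # 1. RMW attempt
            while self.lock.value != 0:       # 2. await (lock == 0): reads only
                time.sleep(0)                 # yield CPython's GIL

    def release(self):
        self.lock.clear()                     # 4. lock := 0


def stress(n_threads=4, iters=100):
    """n_threads threads increment a counter non-atomically inside the CS."""
    lock, counter = TTASLock(), 0

    def worker():
        nonlocal counter
        for _ in range(iters):
            lock.acquire()
            tmp = counter
            time.sleep(0)                     # invite a context switch inside the CS
            counter = tmp + 1
            lock.release()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

While the lock is held, waiters only read `value`, which on a CC machine hits in their caches; but every release invalidates those copies and lets all waiters retry the RMW, so the number of RMRs per passage is still unbounded.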

9
Local Spinning Mutual Exclusion Using Strong Primitives

10
Anderson’s queue-based algorithm (Anderson, 1990)
Shared:
  integer ticket – an RMW object, initially 0
  bit valid[0..n-1], initially valid[0] = 1 and valid[i] = 0 for i ∈ {1,..,n-1}
Local: integer myTicket
Program for process i
1. myTicket := fetch-and-inc-modulo-n(ticket) ; take a ticket
2. await valid[myTicket] = 1 ; wait for your turn
3. CS
4. valid[myTicket] := 0 ; dequeue
5. valid[(myTicket + 1) mod n] := 1 ; signal successor
[Figure: the ticket variable and the valid array, entries 0, 1, 2, 3, ..., n-1.]

11
Anderson’s queue-based algorithm (cont’d)
[Figure: snapshots of ticket, valid and myTicket in four configurations: the initial configuration, after the entry section of p3, after p1 performs its entry section, and after p3 exits.]

12
Anderson’s queue-based algorithm (cont’d)
What is the RMR complexity on a DSM machine? Unbounded
What is the RMR complexity on a CC machine? Constant
Program for process i
1. myTicket := fetch-and-inc-modulo-n(ticket) ; take a ticket
2. await valid[myTicket] = 1 ; wait for your turn
3. CS
4. valid[myTicket] := 0 ; dequeue
5. valid[(myTicket + 1) mod n] := 1 ; signal successor
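A Python sketch of the array-based lock above, assuming at most n processes compete at once (the ticket wraps around modulo n). The fetch-and-increment-modulo-n primitive is emulated with an internal `threading.Lock`, and `AndersonLock` and `stress` are illustrative names.

```python
import threading
import time


class AndersonLock:
    """Array-based queue lock after Anderson (1990), for at most n processes."""

    def __init__(self, n):
        self.n = n
        self._ticket = 0
        self._atomic = threading.Lock()        # stands in for hardware fetch-and-inc
        self.valid = [1] + [0] * (n - 1)       # valid[0] = 1, all others 0

    def _fetch_and_inc_mod_n(self):
        with self._atomic:
            t = self._ticket
            self._ticket = (t + 1) % self.n
            return t

    def acquire(self):
        my_ticket = self._fetch_and_inc_mod_n()   # 1. take a ticket
        while self.valid[my_ticket] != 1:         # 2. wait for your turn
            time.sleep(0)                         # yield CPython's GIL
        return my_ticket

    def release(self, my_ticket):
        self.valid[my_ticket] = 0                 # 4. dequeue
        self.valid[(my_ticket + 1) % self.n] = 1  # 5. signal successor


def stress(n=4, iters=100):
    """n threads increment a counter non-atomically inside the CS."""
    lock, counter = AndersonLock(n), 0

    def worker():
        nonlocal counter
        for _ in range(iters):
            t = lock.acquire()
            tmp = counter
            time.sleep(0)                         # invite a context switch inside the CS
            counter = tmp + 1
            lock.release(t)

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

Each waiter spins on its own `valid` entry, so on CC it rereads a cached copy until its predecessor writes it; on DSM the entry a process spins on depends on the ticket it draws and is generally stored in another process's module.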

13
Graunke and Thakkar’s algorithm (Graunke and Thakkar, 1990)
Uses the more common swap (a.k.a. fetch-and-store) primitive:
swap(w, new)
  do atomically
    prev := *w
    *w := new
    return prev
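A minimal sketch of the swap semantics above. Python exposes no hardware swap, so the hypothetical `SwapRegister` class uses an internal `threading.Lock` to stand in for the atomicity. The demo shows the property swap-based queue locks rely on: when several processes swap concurrently, every value the register ever held is returned to exactly one caller, except the final one, so each swapper learns a unique predecessor.

```python
import threading


class SwapRegister:
    """Hypothetical register offering swap (fetch-and-store); the internal
    threading.Lock stands in for the hardware's atomicity."""

    def __init__(self, value):
        self._value = value
        self._atomic = threading.Lock()

    def swap(self, new):
        with self._atomic:          # do atomically
            prev = self._value      #   prev := *w
            self._value = new       #   *w := new
            return prev             #   return prev

    def read(self):
        return self._value


def concurrent_swaps(n):
    """n threads each swap their id into a register that starts at -1; return
    every value the register ever held (the n returned values plus the final one)."""
    reg = SwapRegister(-1)
    returned = [None] * n

    def worker(i):
        returned[i] = reg.swap(i)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return returned + [reg.read()]


if __name__ == "__main__":
    print(sorted(concurrent_swaps(8)))   # [-1, 0, 1, 2, 3, 4, 5, 6, 7]
```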

14
Graunke and Thakkar’s algorithm (cont’d)
Shared:
  bit slots[0..n-1], initially slots[i] = 1 for i ∈ {0,..,n-1}
  structure {bit value, bit* slot} tail, initially {0, &slots[0]}
Local: structure {bit value, bit* slot} myRecord, prev; bit temp
[Figure: tail and the slots array, entries 0..n-1.]

15
Graunke and Thakkar’s algorithm (cont’d)
Shared:
  bit slots[0..n-1], initially slots[i] = 1 for i ∈ {0,..,n-1}
  structure {bit value, bit* slot} tail, initially {0, &slots[0]}
Local: structure {bit value, bit* slot} myRecord, prev; bit temp
Program for process i
1. myRecord.value := slots[i] ; prepare to thread yourself to the queue
2. myRecord.slot := &slots[i]
3. prev := swap(&tail, myRecord) ; prev now describes my predecessor
4. await (*(prev.slot) ≠ prev.value) ; local spin until predecessor’s value changes
5. CS
6. temp := 1 - slots[i]
7. slots[i] := temp ; signal successor

16
Graunke and Thakkar’s algorithm (cont’d)

17
What is the RMR complexity on a DSM machine? Unbounded
What is the RMR complexity on a CC machine? Constant
Program for process i
1. myRecord.value := slots[i] ; prepare to thread yourself to the queue
2. myRecord.slot := &slots[i]
3. prev := swap(&tail, myRecord) ; prev now describes my predecessor
4. await (*(prev.slot) ≠ prev.value) ; local spin until predecessor’s value changes
5. CS
6. temp := 1 - slots[i]
7. slots[i] := temp ; signal successor
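A Python sketch of the program above for processes 0..n-1. Two representation choices are assumptions of this sketch: a record {value, slot} is the tuple `(value, i)`, where the index `i` stands in for the pointer `&slots[i]`, and the atomic swap on `tail` is emulated with an internal `threading.Lock`.

```python
import threading
import time


class GraunkeThakkarLock:
    """Sketch of Graunke and Thakkar's swap-based queue lock. A record is a
    tuple (value, slot index) standing in for {value, &slots[index]}."""

    def __init__(self, n):
        self.slots = [1] * n                   # slots[i] = 1 for all i
        self._tail = (0, 0)                    # tail = {0, &slots[0]}
        self._atomic = threading.Lock()        # stands in for hardware swap

    def _swap_tail(self, new):
        with self._atomic:
            prev, self._tail = self._tail, new
            return prev

    def acquire(self, i):
        my_record = (self.slots[i], i)                      # 1-2. build my record
        prev_value, prev_slot = self._swap_tail(my_record)  # 3. enqueue
        while self.slots[prev_slot] == prev_value:          # 4. wait for the flip
            time.sleep(0)                                   # yield CPython's GIL

    def release(self, i):
        self.slots[i] = 1 - self.slots[i]      # 6-7. flip my slot: signal successor


def stress(n=4, iters=100):
    """Processes 0..n-1 increment a counter non-atomically inside the CS."""
    lock, counter = GraunkeThakkarLock(n), 0

    def worker(i):
        nonlocal counter
        for _ in range(iters):
            lock.acquire(i)
            tmp = counter
            time.sleep(0)                      # invite a context switch inside the CS
            counter = tmp + 1
            lock.release(i)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

Flipping the bit, rather than setting it to a fixed value, lets process i reuse the same slot on its next passage: a successor waits for a change relative to the value it recorded, not for a particular value.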

18
The MCS queue-based algorithm (Mellor-Crummey and Scott, 1991)
Type: Qnode: structure {bit locked, Qnode *next}
Shared: Qnode nodes[0..n-1]; Qnode *tail, initially nil
Local: Qnode *myNode, initially &nodes[i]; Qnode *successor
Has constant RMR complexity under both the DSM and CC models (in DSM, nodes[i] is stored in process i's memory module)
Uses swap and CAS
[Figure: the tail pointer and the nodes array.]

19
The MCS queue-based algorithm (cont’d)
Program for process i
1. myNode->next := nil ; prepare to be last in queue
2. pred := swap(&tail, myNode) ; tail now points to myNode
3. if (pred ≠ nil) ; I need to wait for a predecessor
4.    myNode->locked := true ; prepare to wait
5.    pred->next := myNode ; let my predecessor know it has to unlock me
6.    await (myNode->locked = false)
7. CS
8. if (myNode->next = nil) ; if not sure there is a successor
9.    if (compare-and-swap(&tail, myNode, nil) = false) ; if there is a successor
10.      await (myNode->next ≠ nil) ; spin until successor lets me know its identity
11.      successor := myNode->next ; get a pointer to my successor
12.      successor->locked := false ; unlock my successor
13. else ; for sure, I have a successor
14.    successor := myNode->next ; get a pointer to my successor
15.    successor->locked := false ; unlock my successor
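A Python sketch of the MCS program, with object references playing the role of `Qnode` pointers and `None` playing nil. Swap and compare-and-swap on `tail` are emulated with one internal `threading.Lock` (an assumption of the sketch; hardware provides them directly), and the CAS compares node identity with `is`.

```python
import threading
import time


class QNode:
    def __init__(self):
        self.locked = False
        self.next = None


class MCSLock:
    """Sketch of the MCS lock for processes 0..n-1; nodes[i] belongs to process i."""

    def __init__(self, n):
        self.nodes = [QNode() for _ in range(n)]
        self._tail = None
        self._atomic = threading.Lock()         # emulates hardware swap / CAS on tail

    def _swap_tail(self, new):
        with self._atomic:
            prev, self._tail = self._tail, new
            return prev

    def _cas_tail(self, expected, new):
        with self._atomic:
            if self._tail is expected:
                self._tail = new
                return True
            return False

    def acquire(self, i):
        my_node = self.nodes[i]
        my_node.next = None                     # 1. prepare to be last in queue
        pred = self._swap_tail(my_node)         # 2. enqueue
        if pred is not None:                    # 3. must wait for a predecessor
            my_node.locked = True               # 4. prepare to wait
            pred.next = my_node                 # 5. link behind the predecessor
            while my_node.locked:               # 6. spin on my own node only
                time.sleep(0)                   # yield CPython's GIL

    def release(self, i):
        my_node = self.nodes[i]
        if my_node.next is None:                # 8. no known successor
            if self._cas_tail(my_node, None):   # 9. really none: queue is now empty
                return
            while my_node.next is None:         # 10. successor is still linking in
                time.sleep(0)
        my_node.next.locked = False             # 11-15. unlock my successor


def stress(n=4, iters=100):
    """Processes 0..n-1 increment a counter non-atomically inside the CS."""
    lock, counter = MCSLock(n), 0

    def worker(i):
        nonlocal counter
        for _ in range(iters):
            lock.acquire(i)
            tmp = counter
            time.sleep(0)                       # invite a context switch inside the CS
            counter = tmp + 1
            lock.release(i)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

Setting `locked` before publishing the node through `pred.next` matters: once the predecessor can see the node it may clear `locked`, and setting it afterwards would lose that wake-up.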

20
The MCS queue-based algorithm (cont’d)

21
Local Spinning Mutual Exclusion Using reads and writes

22
A local-spin tournament-tree algorithm (Anderson, Yang, 1993)
O(log n) RMR complexity for both DSM and CC systems
This is optimal (Attiya, Hendler, Woelfel, 2008)
Uses O(n log n) registers
[Figure: a tournament tree with levels 0, 1 and 2 above the processes; each node is identified by (level, number).]

23
A local-spin tournament-tree algorithm (cont’d)
Shared:
- For each node (level, node) there are 3 registers: name[level, 2node] and name[level, 2node+1], initially -1, and turn[level, node]
- For each level l and process i, a spin flag: flag[level, i], initially 0
Local: level, node, id, rival

24
A local-spin tournament-tree algorithm (cont’d)
Program for process i
1. id := i
2. for level = 0 to log n - 1 do ; from leaf to root
3.    node := ⌊id/2⌋ ; compute node in new level
4.    id := id mod 2 ; compute ID for the 2-process mutex algorithm (0 or 1)
5.    name[level, 2node + id] := i ; identify yourself
6.    turn[level, node] := i ; update the tie-breaker
7.    flag[level, i] := 0 ; initialize my locally-accessible spin flag
8.    rival := name[level, 2node + 1 - id]
9.    if ((rival ≠ -1) and (turn[level, node] = i)) ; if not sure I should precede rival
10.      if (flag[level, rival] = 0) ; if rival may get to wait at line 12
11.         flag[level, rival] := 1 ; release rival by letting it know I updated the tie-breaker
12.      await flag[level, i] ≠ 0 ; wait until signaled by rival (so it updated the tie-breaker)
13.      if (turn[level, node] = i) ; if I lost
14.         await flag[level, i] = 2 ; wait until rival notifies me it is my turn
15.   id := node ; move to the next level
16. endfor
17. CS
18. for level = log n - 1 downto 0 do ; begin exit code
19.    id := ⌊i/2^level⌋, node := ⌊id/2⌋, id := id mod 2 ; set node and id
20.    name[level, 2node + id] := -1 ; erase name
21.    rival := turn[level, node] ; find who rival is (if there is one)
22.    if rival ≠ i ; if there is a rival
23.       flag[level, rival] := 2 ; notify rival
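A Python sketch of the tournament tree written from the program above; `TournamentLock` and `stress` are illustrative names, n is assumed to be a power of two, and CPython's global interpreter lock is assumed to make each list read and write atomic and sequentially consistent. Only reads and writes are used, as in the algorithm. `i >> level` is the index of process i's subtree at a level, so its low bit is `id` and the remaining bits are `node`.

```python
import threading
import time


class TournamentLock:
    """Sketch of the read/write tournament tree for n processes (n a power of 2);
    every internal node runs the 2-process algorithm from the slide."""

    def __init__(self, n):
        assert n >= 2 and n & (n - 1) == 0, "n must be a power of two"
        self.levels = n.bit_length() - 1                     # log2(n)
        # name[level][2*node + id]: process competing from that side, or -1
        self.name = [[-1] * (n >> lvl) for lvl in range(self.levels)]
        # turn[level][node]: tie-breaker; the last process to write it defers
        self.turn = [[-1] * (n >> (lvl + 1)) for lvl in range(self.levels)]
        # flag[level][i]: process i's spin flag at that level (0, 1 or 2)
        self.flag = [[0] * n for _ in range(self.levels)]

    def acquire(self, i):
        for level in range(self.levels):                   # from leaf to root
            child = i >> level                              # my subtree at this level
            node, side = child >> 1, child & 1
            self.name[level][2 * node + side] = i           # identify yourself
            self.turn[level][node] = i                      # update the tie-breaker
            self.flag[level][i] = 0                         # reset my spin flag
            rival = self.name[level][2 * node + 1 - side]
            if rival != -1 and self.turn[level][node] == i:
                if self.flag[level][rival] == 0:
                    self.flag[level][rival] = 1             # release rival's first await
                while self.flag[level][i] == 0:             # await flag != 0
                    time.sleep(0)                           # yield CPython's GIL
                if self.turn[level][node] == i:             # I lost the tie-break
                    while self.flag[level][i] != 2:         # await flag = 2
                        time.sleep(0)

    def release(self, i):
        for level in reversed(range(self.levels)):         # from root to leaf
            child = i >> level
            node, side = child >> 1, child & 1
            self.name[level][2 * node + side] = -1          # erase name
            rival = self.turn[level][node]
            if rival != i:
                self.flag[level][rival] = 2                 # notify rival


def stress(n=4, iters=50):
    """Processes 0..n-1 increment a counter non-atomically inside the CS."""
    lock, counter = TournamentLock(n), 0

    def worker(i):
        nonlocal counter
        for _ in range(iters):
            lock.acquire(i)
            tmp = counter
            time.sleep(0)                                   # invite a context switch
            counter = tmp + 1
            lock.release(i)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

Each process spins only on `flag[level][i]`, which can be placed in its own memory module on DSM, so a passage costs a constant number of RMRs per level, O(log n) in total.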

25
Local-Spin Leader Election
Exactly one process is elected.
All other processes are not-elected.
Processes may busy-wait.

26
Choy and Singh's filter
[Figure: m processes enter a filter; between 1 and ⌈m/2⌉ processes “exit”; the rest are “halted”.]
Filter guarantees:
Safety: if m processes enter a filter, at most ⌈m/2⌉ exit.
Progress: if some processes enter a filter, at least one exits.

27
Choy and Singh's filter (cont’d)
Shared: integer turn; Boolean b, initially false
Program for process i
1. turn := i
2. await (b = false) // wait for barrier to open
3. b := true // close barrier
4. if turn ≠ i // not last to cross the barrier
5.    b := false // open barrier
6.    halt
7. else
8.    exit
Why are the filter guarantees satisfied? Why does the barrier have to be re-opened?
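Because a process can wait at the barrier forever, threads are awkward for exercising the filter. This sketch instead models each process as a Python generator that performs one shared-memory access per step and lets a seeded random scheduler pick the interleaving. `filter_process`, `run_filter` and `exit_counts` are hypothetical names, and processes still spinning after `max_steps` are reported as waiting. The safety bound checked is m/2 rounded up: with m = 3 there is a schedule in which one process exits, a second one that crossed the barrier together with it halts and re-opens the barrier, and the third then crosses alone and also exits.

```python
import math
import random


def filter_process(mem, i):
    """Choy-Singh filter for process i as a generator: each resumption performs
    exactly one shared-memory access, so the scheduler controls interleavings."""
    mem["turn"] = i                        # 1. turn := i
    yield
    while True:                            # 2. await (b = false)
        barrier_open = not mem["b"]
        yield
        if barrier_open:
            break
    mem["b"] = True                        # 3. close barrier
    yield
    still_mine = mem["turn"] == i          # 4. read turn
    yield
    if not still_mine:
        mem["b"] = False                   # 5. re-open barrier
        return "halt"                      # 6.
    return "exit"                          # 8.


def run_filter(m, seed, max_steps=2_000):
    """Run m processes under a seeded random scheduler; map each id to its outcome."""
    rng = random.Random(seed)
    mem = {"turn": None, "b": False}
    running = {i: filter_process(mem, i) for i in range(m)}
    outcome = {}
    for _ in range(max_steps):
        if not running:
            break
        i = rng.choice(list(running))
        try:
            next(running[i])
        except StopIteration as done:
            outcome[i] = done.value
            del running[i]
    outcome.update({i: "waiting" for i in running})
    return outcome


def exit_counts(m, seeds):
    """Number of processes that exit the filter, one count per seed."""
    return [list(run_filter(m, s).values()).count("exit") for s in seeds]


if __name__ == "__main__":
    for m in (1, 2, 3, 5, 8):
        counts = exit_counts(m, range(50))
        print(f"m={m}: {min(counts)}..{max(counts)} exits, bound {math.ceil(m / 2)}")
```

The waiting processes are those that arrived after a winner closed the barrier for good, which is one way to see why losers must re-open it: otherwise even the eventual winner could be stuck behind a closed barrier.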

28
Choy and Singh’s filter algorithm
[Figure: a sequence of filters: Filter #1, Filter #2, ..., Filter #i.]

29
Choy and Singh’s filter algorithm (cont’d)
Shared:
  typedef struct {integer turn; boolean b, c, initially false} filter
  filter A[log n + 1]
Program for process i
1. for (curr = 0; curr < log n + 1; curr++)
2.    A[curr].turn := i
3.    await (A[curr].b = false)
4.    A[curr].b := true
5.    if (A[curr].turn ≠ i)
6.       A[curr].c := true // mark that some process failed on filter
7.       A[curr].b := false
8.       return not-elected
9.    else if (curr > 0) A[curr-1].c
10.      return elected // Other processes will never reach this filter
11.   else
12.      curr := curr + 1
13. endfor
Do you see any problem with this algorithm? How can this be fixed?

30
Choy and Singh’s filter algorithm (cont’d)
What is the DSM RMR complexity?
What is the CC RMR complexity?
What is the worst-case average (CC) RMR complexity?


© 2016 SlidePlayer.com Inc.

All rights reserved.
