Local-Spin Algorithms


Local-Spin Algorithms
Multiprocessor synchronization algorithms (20225241)
Lecturer: Danny Hendler
This presentation is based on the book "Synchronization Algorithms and Concurrent Programming" by G. Taubenfeld and on the survey "Shared-memory mutual exclusion: major research trends since 1986" by J. Anderson, Y.-J. Kim and T. Herman.

The Cache-Coherent (CC) and Distributed Shared Memory (DSM) models
[Figure taken from the survey "Shared-memory mutual exclusion: major research trends since 1986" by J. Anderson, Y.-J. Kim and T. Herman.]

Remote and local memory accesses
In a DSM system, an access by process p to variable v is local if v resides in the memory segment attached to p's processor, and remote otherwise.
In a cache-coherent system, an access of v by p is remote if it is the first access of v, or if v has been written by another process since p's last access of it.

Local-spin algorithms
In a local-spin algorithm, all busy waiting ('await') is done by read-only loops of local accesses, which do not cause interconnect traffic.
The same algorithm may be local-spin on one architecture (DSM or CC) and non-local-spin on the other.
For local-spin algorithms, our complexity metric is the worst-case number of Remote Memory References (RMRs).

Peterson's 2-process algorithm

Program for process 0:
b[0] := true
turn := 0
await (b[1] = false or turn = 1)
CS
b[0] := false

Program for process 1:
b[1] := true
turn := 1
await (b[0] = false or turn = 0)
CS
b[1] := false

Is this algorithm local-spin on a DSM machine? No.
Is this algorithm local-spin on a CC machine? Yes.

Peterson's 2-process algorithm

Program for process 0:
b[0] := true
turn := 0
await (b[1] = false or turn = 1)
CS
b[0] := false

Program for process 1:
b[1] := true
turn := 1
await (b[0] = false or turn = 0)
CS
b[1] := false

What is the RMR complexity on a DSM machine? Unbounded.
What is the RMR complexity on a CC machine? Constant.
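As a concrete illustration, here is a minimal C11 sketch of the 2-process variant shown above (process i writes turn := i and waits until the other process overwrites it). The function names peterson_lock/peterson_unlock are ours, and sequentially consistent atomics stand in for the algorithm's shared registers; this is a sketch, not the slides' own code.

    #include <stdatomic.h>
    #include <stdbool.h>

    atomic_bool b[2];        /* b[i] = true while process i wants the CS */
    atomic_int  turn;        /* tie-breaker register */

    void peterson_lock(int i) {              /* i is 0 or 1 */
        int other = 1 - i;
        atomic_store(&b[i], true);
        atomic_store(&turn, i);
        /* busy-wait: on a DSM machine these reads may all be remote (unbounded RMRs);
           on a CC machine the loop spins on cached copies (O(1) RMRs). */
        while (atomic_load(&b[other]) && atomic_load(&turn) == i)
            ;
    }

    void peterson_unlock(int i) {
        atomic_store(&b[i], false);
    }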

Recall the following simple test-and-set based algorithm

Shared: lock, initially 0

while (!lock.test-and-set())   // entry section
Critical Section
lock := 0                      // exit section

Is this algorithm local-spin on either a DSM or a CC machine? No.
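A minimal C11 sketch of this test-and-set lock, using atomic_flag (whose test_and_set returns the previous value, so the spin condition is inverted relative to the slide's convention); the names tas_lock/tas_unlock are ours.

    #include <stdatomic.h>

    atomic_flag lock_bit = ATOMIC_FLAG_INIT;   /* the shared lock, initially 0 */

    void tas_lock(void) {
        /* every failed attempt performs a remote write, so this loop is not
           local-spin on either a DSM or a CC machine */
        while (atomic_flag_test_and_set(&lock_bit))
            ;
    }

    void tas_unlock(void) {
        atomic_flag_clear(&lock_bit);          /* lock := 0 */
    }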

A better algorithm: test-and-test-and-set

Shared: lock, initially 0

while (!lock.test-and-set())   // entry section
  await (lock = 0)
Critical Section
lock := 0                      // exit section

Creates less traffic on CC machines, but still not local-spin.
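A corresponding C11 sketch of test-and-test-and-set: a process issues a test-and-set only after observing the lock free, so on a CC machine the inner loop spins on a cached copy; as the slide notes, it is still not local-spin. The names ttas_lock/ttas_unlock are ours.

    #include <stdatomic.h>
    #include <stdbool.h>

    atomic_bool lock_word;                            /* shared lock, initially false (0) */

    void ttas_lock(void) {
        for (;;) {
            if (!atomic_exchange(&lock_word, true))   /* test-and-set: old value 0 means acquired */
                return;
            while (atomic_load(&lock_word))           /* await (lock = 0): read-only spin */
                ;
        }
    }

    void ttas_unlock(void) {
        atomic_store(&lock_word, false);
    }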

Local Spinning Mutual Exclusion Using Strong Primitives

Anderson's queue-based algorithm (Anderson, 1990)

Shared:
  integer ticket – an RMW object, initially 0
  bit valid[0..n-1], initially valid[0] = 1 and valid[i] = 0 for i ∈ {1,..,n-1}
Local: integer myTicket

[Figure: the ticket counter and the valid array; only valid[0] is set initially.]

Program for process i:
myTicket := fetch-and-inc-modulo-n(ticket)   ; take a ticket
await valid[myTicket] = 1                    ; wait for your turn
CS
valid[myTicket] := 0                         ; dequeue
valid[myTicket+1 mod n] := 1                 ; signal successor

Anderson's queue-based algorithm (cont'd)
[Figure: an example execution showing the ticket and valid arrays in the initial configuration, after p3's entry section, after p1's entry section, and after p3 exits.]

Anderson's queue-based algorithm (cont'd)

Program for process i:
myTicket := fetch-and-inc-modulo-n(ticket)   ; take a ticket
await valid[myTicket] = 1                    ; wait for your turn
CS
valid[myTicket] := 0                         ; dequeue
valid[myTicket+1 mod n] := 1                 ; signal successor

What is the RMR complexity on a DSM machine? Unbounded.
What is the RMR complexity on a CC machine? Constant.
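A minimal C11 sketch of Anderson's array-based queue lock. It assumes a fixed bound MAX_PROCS on the number of processes, chosen as a power of two so that an ordinary fetch_add with unsigned wrap-around behaves like the slide's fetch-and-inc-modulo-n; the bound and the identifiers are our assumptions.

    #include <stdatomic.h>
    #include <stdbool.h>

    #define MAX_PROCS 64                         /* assumed power-of-two bound on n */

    atomic_uint ticket;                          /* RMW object, initially 0 */
    atomic_bool valid[MAX_PROCS];                /* valid[0] = true, others false (see init) */

    void anderson_init(void) {
        atomic_store(&ticket, 0);
        for (unsigned j = 0; j < MAX_PROCS; j++)
            atomic_store(&valid[j], j == 0);
    }

    unsigned anderson_lock(void) {
        unsigned myTicket = atomic_fetch_add(&ticket, 1) % MAX_PROCS;  /* take a ticket */
        while (!atomic_load(&valid[myTicket]))   /* wait for your turn, spinning on your own slot */
            ;
        return myTicket;
    }

    void anderson_unlock(unsigned myTicket) {
        atomic_store(&valid[myTicket], false);                     /* dequeue */
        atomic_store(&valid[(myTicket + 1) % MAX_PROCS], true);    /* signal successor */
    }

Note that the slot a process spins on is chosen by the ticket value, not by the process id, so in the DSM model it may reside in another process's memory segment; this is why the DSM RMR complexity is unbounded while the CC RMR complexity is constant.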

Graunke and Thakkar's algorithm (Graunke and Thakkar, 1990)

Uses the more common swap (a.k.a. fetch-and-store) primitive:

swap(w, new)
  do atomically
    prev := *w
    *w := new
    return prev

Graunke and Thakkar's algorithm (cont'd)

Shared:
  bit slots[0..n-1], initially slots[i] = 1 for i ∈ {0,..,n-1}
  structure {bit value, bit *slot} tail, initially {0, &slots[0]}
Local:
  structure {bit value, bit *slot} myRecord, prev
  bit temp

[Figure: the tail record and the slots array, all slots initially 1.]

Graunke and Thakkar's algorithm (cont'd)

Program for process i:
myRecord.value := slots[i]        ; prepare to thread yourself onto the queue
myRecord.slot := &slots[i]
prev := swap(&tail, myRecord)     ; prev now describes the predecessor
await (*prev.slot ≠ prev.value)   ; local spin until the predecessor's value changes
CS
temp := 1 - slots[i]
slots[i] := temp                  ; signal successor

Graunke and Thakkar’s algorithm (cont’d)

Graunke and Thakkar's algorithm (cont'd)

Program for process i:
myRecord.value := slots[i]        ; prepare to thread yourself onto the queue
myRecord.slot := &slots[i]
prev := swap(&tail, myRecord)     ; prev now describes the predecessor
await (*prev.slot ≠ prev.value)   ; local spin until the predecessor's value changes
CS
temp := 1 - slots[i]
slots[i] := temp                  ; signal successor

What is the RMR complexity on a DSM machine? Unbounded.
What is the RMR complexity on a CC machine? Constant.
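A hedged C11 sketch of Graunke and Thakkar's lock. Since C has no atomic swap of a two-field record, tail is packed here into a single 64-bit word holding (slot index, value bit) so that swap becomes one atomic_exchange; the packing, the MAX_PROCS bound, and the function names are our assumptions, not part of the original algorithm text.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define MAX_PROCS 64

    atomic_bool slots[MAX_PROCS];                /* initially all 1 (true) */
    _Atomic uint64_t tail;                       /* packed {slot index, value bit} */

    static uint64_t pack(unsigned idx, bool val) { return ((uint64_t)idx << 1) | (val ? 1u : 0u); }

    void gt_init(void) {
        for (unsigned j = 0; j < MAX_PROCS; j++)
            atomic_store(&slots[j], true);
        atomic_store(&tail, pack(0, false));     /* the slide's initial value {0, &slots[0]} */
    }

    void gt_lock(unsigned i) {
        bool myValue = atomic_load(&slots[i]);                    /* prepare to thread myself onto the queue */
        uint64_t prev = atomic_exchange(&tail, pack(i, myValue)); /* swap: prev now describes my predecessor */
        unsigned prevIdx = (unsigned)(prev >> 1);
        bool prevValue = (prev & 1u) != 0;
        while (atomic_load(&slots[prevIdx]) == prevValue)         /* spin until predecessor flips its slot */
            ;
    }

    void gt_unlock(unsigned i) {
        atomic_store(&slots[i], !atomic_load(&slots[i]));         /* flip my slot to signal my successor */
    }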

The MCS queue-based algorithm (Mellor-Crummey and Scott, 1991)

Has constant RMR complexity under both the DSM and CC models.
Uses swap and CAS.

Type: Qnode: structure {bit locked, Qnode *next}
Shared:
  Qnode nodes[0..n-1]
  Qnode *tail, initially nil
Local:
  Qnode *myNode, initially &nodes[i]
  Qnode *successor

[Figure: the tail pointer and the nodes array.]

The MCS queue-based algorithm (cont'd)

Program for process i:
myNode->next := nil                          ; prepare to be last in queue
pred := swap(&tail, myNode)                  ; tail now points to myNode
if (pred ≠ nil)                              ; I need to wait for a predecessor
  myNode->locked := true                     ; prepare to wait
  pred->next := myNode                       ; let my predecessor know it has to unlock me
  await (myNode->locked = false)
CS
if (myNode->next = nil)                      ; if not sure there is a successor
  if (compare-and-swap(&tail, myNode, nil) = false)   ; there is a successor after all
    await (myNode->next ≠ nil)               ; spin until successor lets me know its identity
    successor := myNode->next                ; get a pointer to my successor
    successor->locked := false               ; unlock my successor
else                                         ; for sure, I have a successor
  successor := myNode->next                  ; get a pointer to my successor
  successor->locked := false                 ; unlock my successor
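A minimal C11 sketch of the MCS lock shown above. Each process passes its own queue node; swap becomes atomic_exchange and compare-and-swap becomes atomic_compare_exchange_strong. The function names mcs_lock/mcs_unlock are ours.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>

    typedef struct qnode {
        _Atomic(struct qnode *) next;
        atomic_bool locked;
    } qnode;

    _Atomic(qnode *) tail;                       /* initially NULL (nil) */

    void mcs_lock(qnode *myNode) {
        atomic_store(&myNode->next, NULL);                        /* prepare to be last in the queue */
        qnode *pred = atomic_exchange(&tail, myNode);             /* tail now points to myNode */
        if (pred != NULL) {                                       /* I have a predecessor to wait for */
            atomic_store(&myNode->locked, true);                  /* prepare to wait */
            atomic_store(&pred->next, myNode);                    /* tell predecessor it must unlock me */
            while (atomic_load(&myNode->locked))                  /* spin on my own node: local in DSM and CC */
                ;
        }
    }

    void mcs_unlock(qnode *myNode) {
        qnode *successor = atomic_load(&myNode->next);
        if (successor == NULL) {                                  /* not sure whether I have a successor */
            qnode *expected = myNode;
            if (atomic_compare_exchange_strong(&tail, &expected, NULL))
                return;                                           /* no successor: the queue is now empty */
            while ((successor = atomic_load(&myNode->next)) == NULL)
                ;                                                 /* a successor exists; wait for it to announce itself */
        }
        atomic_store(&successor->locked, false);                  /* unlock my successor */
    }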

The MCS queue-based algorithm (cont’d)

Local Spinning Mutual Exclusion Using Reads and Writes

A local-spin tournament-tree algorithm (Anderson, Yang, 1993)

Each node is identified by (level, number).
[Figure: a binary tree with internal nodes 1-7 on levels 0-2; the processes are the leaves.]

O(log n) RMR complexity for both DSM and CC systems.
This is optimal (Attiya, Hendler, Woelfel, 2008).
Uses O(n log n) registers.

A local-spin tournament-tree algorithm (cont'd)

Shared:
  For each node v = (level, node), three registers:
    name[level, 2node], name[level, 2node+1], initially -1
    turn[level, node]
  For each level l and process i, a spin flag:
    flag[level, i], initially 0
Local: level, node, id

A local-spin tournament-tree algorithm (cont'd)

Program for process i:
node := i
for level = 0 to log n - 1 do                ; from leaf to root
  node := node/2                             ; compute node in new level
  id := node mod 2                           ; compute ID for 2-process mutex algorithm (0 or 1)
  name[level, 2node + id] := i               ; identify yourself
  turn[level, node] := i                     ; update the tie-breaker
  flag[level, i] := 0                        ; initialize my locally-accessible spin flag
  rival := name[level, 2node + 1 - id]
  if ((rival ≠ -1) and (turn[level, node] = i))   ; if not sure I should precede rival
    if (flag[level, rival] = 0)              ; rival may be about to wait on its own flag
      flag[level, rival] := 1                ; release rival by letting it know I updated the tie-breaker
    await flag[level, i] ≠ 0                 ; wait until signaled by rival (so it updated the tie-breaker)
    if (turn[level, node] = i)               ; if I lost
      await flag[level, i] = 2               ; wait till rival notifies me it's my turn
  id := node                                 ; move to the next level
EndFor
CS
for level = log n - 1 downto 0 do            ; begin exit code
  id := ⌊i/2^level⌋, node := id/2            ; set node and id
  name[level, 2node + id] := -1              ; erase name
  rival := turn[level, node]                 ; find who the rival is (if there is one)
  if rival ≠ i                               ; if there is a rival
    flag[level, rival] := 2                  ; notify rival

flag assumes values {0,1,2} and is initialized to 0.
Value 1 is written in order to release the rival if it is spinning on its flag (waiting to know whether it was the first or last to write to turn).
Value 2 indicates that the rival exited the CS.
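To make the tree structure concrete, here is a simplified C11 sketch of a tournament tree built from 2-process Peterson locks, one per internal node, acquired along the leaf-to-root path. This is only an illustration of the O(log n) tournament idea and is not the Anderson-Yang algorithm above: it omits the per-level spin flags flag[level, i], so its spinning at internal nodes is not local in the DSM model. LOG_N, the names, and the heap-style node numbering are our assumptions.

    #include <stdatomic.h>
    #include <stdbool.h>

    #define LOG_N 3                              /* assumed: n = 8 processes (hypothetical) */
    #define N (1 << LOG_N)

    typedef struct { atomic_bool b[2]; atomic_int turn; } peterson_t;

    peterson_t tree[N];                          /* internal nodes 1..N-1, heap-style numbering */

    static void node_lock(peterson_t *v, int id) {     /* 2-process entry section, id in {0,1} */
        atomic_store(&v->b[id], true);
        atomic_store(&v->turn, id);
        while (atomic_load(&v->b[1 - id]) && atomic_load(&v->turn) == id)
            ;
    }

    static void node_unlock(peterson_t *v, int id) {
        atomic_store(&v->b[id], false);
    }

    void tree_lock(int i) {                      /* i in 0..N-1 */
        int node = N + i;                        /* start at my leaf */
        for (int level = 0; level < LOG_N; level++) {
            int id = node % 2;                   /* which child of the parent I am (0 or 1) */
            node = node / 2;                     /* move one level closer to the root */
            node_lock(&tree[node], id);          /* win the 2-process mutex at this node */
        }
    }

    void tree_unlock(int i) {
        for (int level = LOG_N - 1; level >= 0; level--) {   /* release from the root back down */
            int node = (N + i) >> (level + 1);
            int id   = ((N + i) >> level) & 1;
            node_unlock(&tree[node], id);
        }
    }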

Local-Spin Leader Election
Exactly one process is elected.
All other processes are not elected.
Processes may busy-wait.

Choy and Singh's filter

[Figure: m processes enter a filter; between 1 and m/2 processes "exit", the rest are "halted".]

Filter guarantees:
Safety: if m processes enter a filter, at most m/2 exit.
Progress: if some processes enter a filter, at least one exits.

Choy and Singh's filter (cont'd)

Shared:
  integer turn
  boolean b, initially false

Program for process i:
turn := i
await ¬b           // wait for the barrier to open
b := true          // close the barrier
if turn ≠ i        // not the last to cross the barrier
  b := false       // re-open the barrier
  halt
else
  exit

Why does the barrier have to be re-opened?
Why are the filter guarantees satisfied?
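A minimal C11 sketch of a single filter, following the pseudocode above; filter_enter returns true if the calling process exits the filter and false if it halts. The struct and function names are ours.

    #include <stdatomic.h>
    #include <stdbool.h>

    typedef struct {
        atomic_int  turn;
        atomic_bool b;      /* the barrier: false = open, true = closed; initially false */
    } filter_t;

    bool filter_enter(filter_t *f, int i) {
        atomic_store(&f->turn, i);
        while (atomic_load(&f->b))              /* wait for the barrier to open */
            ;
        atomic_store(&f->b, true);              /* close the barrier */
        if (atomic_load(&f->turn) != i) {       /* not the last to cross the barrier */
            atomic_store(&f->b, false);         /* re-open the barrier for later processes */
            return false;                       /* halt */
        }
        return true;                            /* exit the filter */
    }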

Choy and Singh's filter algorithm
[Figure: Filter #i]

Choy and Singh's filter algorithm (cont'd)

Shared:
  typedef struct {integer turn, boolean b, c initially false} filter
  filter A[log n + 1]

Program for process i:
for (curr = 0; curr < log n + 1; curr++)
  A[curr].turn := i
  await ¬A[curr].b
  A[curr].b := true
  if (A[curr].turn ≠ i)
    A[curr].c := true            // mark that some process failed on this filter
    A[curr].b := false
    return not-elected
  else if ((curr > 0) and ¬A[curr-1].c)
    return elected               // other processes will never reach this filter
  else
    curr := curr + 1
EndFor

Do you see any problem with this algorithm? How can this be fixed?

Choy and Singh's filter algorithm (cont'd)
What is the DSM RMR complexity?
What is the CC RMR complexity?
What is the worst-case average (CC) RMR complexity?