Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms. Presenter: Jim Santmyer. By: Maged M. Michael and Michael L. Scott, Department of Computer Science, University of Rochester.

Presentation transcript:

Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms Presenter: Jim Santmyer By: Maged M. Michael and Michael L. Scott, Department of Computer Science, University of Rochester

Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms Contributions from Past Presentations by: Ahmed Badran and Joseph Rosenski

Agenda
- Presentation of the Issues
- Non-Blocking Queue Algorithm
- Two-Lock Concurrent Queue Algorithm
- Performance
- Conclusion

Issue – Concurrent FIFO Queues
Access must be synchronized to ensure correctness. There are generally two types of synchronization:
- Blocking: allows a slower process to delay a faster process.
- Non-blocking: guarantees that if there are one or more active processes trying to perform operations on a shared data structure, SOME operation will complete within a finite number of time steps.

Issue – Blocking
Blocking algorithms in general:
- Use locks
- May deadlock
- Processes may wait for arbitrarily long times
- Lock/unlock primitives need to interact with scheduling logic to avoid priority inversion
- Possibility of starvation

Issue – Blocking
If blocking is used for queues:
- Two locks are better than one
- Two locks allow concurrency between enqueue and dequeue processes
- A dummy node prevents contention between the two ends

Issue – Blocking: Why Two Locks?
Queue with one lock:
- P1 (enqueue) acquires the lock and finishes
- P2 (enqueue) acquires the lock and proceeds
- P3 (dequeue) is blocked, because P2 holds the lock
A more efficient algorithm would allow P3 to proceed as soon as P1 has completed, i.e. as soon as there is a node available to be dequeued.

Issue – Blocking Using Two Locks
Two locks allow more concurrency: one lock protects tail pointer updates and a second lock protects head pointer updates.
- The enqueue process locks the tail pointer
- The dequeue process locks the head pointer
A dummy node prevents contention because the enqueue process never accesses the head pointer/lock and the dequeue process never accesses the tail pointer/lock.

Issue – Blocking Using Two Locks Without a Dummy Node
[Diagrams: a one-node queue, with Head, Tail, H_lock, and T_lock, is dequeued to empty, and then an empty queue is enqueued into.] Without a dummy node, dequeueing the last node requires an update of both head and tail, and enqueueing into an empty queue requires an update of both tail and head, so the two locks cannot be kept independent.

Issue – Tail Lags Behind Head
[Diagram: a linked queue with Head and Tail pointers and their locks.] This illustrates a blocking/locking algorithm, but the same issue applies to non-blocking algorithms. The issue is caused by the separate head and tail pointers.

Issue – Tail Lags Behind Head
[Diagram: a dequeue is about to free a node that Tail still points to.] One solution is for a faster dequeue process to complete the work of the slower enqueue process by swinging the tail pointer to the next node before dequeuing the node.
- What about lock contention, or deadlock?
- A reference counter can be used, so that a node is not freed until its counter is zero.

Issue – Tail Lags Behind Head: Problem with Reference Counts
[Diagram: nodes carry a reference count, and a node is freed only when its count reaches zero (if cnt == 0, free(node)).] A process Y that is slow, or that never releases its reference, can cause the system to run out of memory. Solutions are discussed during the algorithm walkthrough.

Non-Blocking Algorithms: Optimistic
Non-blocking algorithms have a structure similar to:
  initialize local structures
  begin loop
    do some work
    if CAS succeeds, break
  end loop
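As a hedged illustration (not taken from the paper), the same optimistic pattern can be written in C11 for a shared counter; the function and variable names here are invented for this sketch.

  #include <stdatomic.h>

  /* Optimistic update: read the shared word, compute the new value locally,
   * then try to publish it with CAS; if another thread changed the word
   * first, the CAS fails and the loop retries. */
  static void optimistic_increment(_Atomic int *shared)
  {
      int observed = atomic_load(shared);          /* initialize local structures */
      for (;;) {                                   /* begin loop */
          int desired = observed + 1;              /* do some work */
          if (atomic_compare_exchange_weak(shared, &observed, desired))
              break;                               /* CAS succeeded */
          /* on failure, 'observed' has been refreshed with the current value */
      }
  }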

Issue - Linearizability
A linearizable data structure gives an external observer the illusion that each operation takes effect instantaneously. This requires that there be a specific point during each operation at which it is considered to “take effect”.

Issue - Linearizability
Similar to serializability in databases:
- Histories instead of transactions
- Invocations/responses instead of reads/writes

ABA Problem
Consider a lock-free stack whose shared top-of-stack marker SM currently holds 5, with value x stored in the top slot. Pop and Push use a CAS retry loop:

  Pop()
    loop
      SP = SM
      newSP = SP - 1
      data = Stack[SP]
      if CAS(&SM, SP, newSP) break
    return data

  Push(d)
    loop
      SP = SM
      newSP = SP + 1
      Stack[newSP] = d
      if CAS(&SM, SP, newSP) break

Interleaving over time:
1. Thread 1 starts v1 = Pop(): it reads SP = 5, computes newSP = 4, and reads data = x, but is delayed just before its CAS.
2. Thread 2 runs v2 = Pop(): SP = 5, newSP = 4, data = x; its CAS(&SM, SP, newSP) succeeds, so v2 = x and SM becomes 4.
3. Thread 2 runs Push(z): SP = 4, newSP = 5; its CAS(&SM, SP, newSP) succeeds, so SM is 5 again and the top slot now holds z.
4. Thread 1 resumes and executes CAS(&SM, 5, 4). The CAS should fail, but it succeeds because SM once again holds 5, and Thread 1 returns v1 = x.

Both threads retrieved data = x and, more important, the stack is now corrupt: the pushed value z is silently lost.

ABA Solution
Add a modification counter to the pointer, with atomic update of the pointer and counter together:
- A double-word CAS could be used, but it is not available on most platforms
- Instead, pack the pointer and counter into a single word and use a single-word CAS to update both (the authors' solution)
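A minimal sketch of the pack-into-one-word idea, assuming nodes can be named by 32-bit indexes so that an index and a counter fit in one 64-bit word; pack, idx_of, cnt_of, and try_advance are hypothetical names, not the paper's code.

  #include <stdatomic.h>
  #include <stdint.h>

  typedef uint64_t tagged_t;   /* [ count : 32 | node index : 32 ] in one CAS-able word */

  static inline tagged_t pack(uint32_t idx, uint32_t cnt) { return ((uint64_t)cnt << 32) | idx; }
  static inline uint32_t idx_of(tagged_t t) { return (uint32_t)t; }
  static inline uint32_t cnt_of(tagged_t t) { return (uint32_t)(t >> 32); }

  /* Swing a shared tagged pointer to 'new_idx'. The counter is incremented on
   * every successful update, so even if the index later returns to an old
   * value, a stale CAS will observe a different counter and fail (no ABA). */
  static void try_advance(_Atomic tagged_t *p, uint32_t new_idx)
  {
      tagged_t old = atomic_load(p);
      for (;;) {
          tagged_t desired = pack(new_idx, cnt_of(old) + 1);
          if (atomic_compare_exchange_weak(p, &old, desired))
              break;
          /* CAS failed: 'old' now holds the current value; retry */
      }
  }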

Correctness Properties
1. The linked list is always connected.
2. Nodes are only inserted after the last node in the linked list.
3. Nodes are only deleted from the beginning of the linked list.
4. Head always points to the first node in the linked list.
5. Tail always points to a node in the linked list.

Algorithm – Non-Blocking Queue: Globals
  structure pointer_t {ptr: pointer to node_t, count: unsigned integer}   (packed into a single word)
  structure node_t    {value: data type, next: pointer_t}
  structure queue_t   {Head: pointer_t, Tail: pointer_t}
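For concreteness, here is a hypothetical C rendering of these three structures; field names follow the slide, the int value type is an assumption, and in real code a pointer_t must be read and CAS-ed as a single atomic unit (e.g. packed into one word as sketched above).

  #include <stdint.h>

  typedef struct node node_t;

  typedef struct {
      node_t   *ptr;    /* pointer to node_t */
      uintptr_t count;  /* modification counter; updated together with ptr by one CAS */
  } pointer_t;

  struct node {
      int       value;  /* "data type" shown as int for illustration */
      pointer_t next;
  };

  typedef struct {
      pointer_t Head;
      pointer_t Tail;
  } queue_t;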

Algorithm – Non-Blocking Queue: Initialization
  initialize(Q: pointer to queue_t)
    node = new_node()
    node->next.ptr = NULL
    Q->Head.ptr = Q->Tail.ptr = node
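A matching sketch of this initialization, using the hypothetical C types above; error handling is omitted, and a real implementation would draw the dummy node from a free list as the paper does.

  #include <stdlib.h>

  /* Create the dummy node that both Head and Tail initially point to. */
  void queue_init(queue_t *Q)
  {
      node_t *node = malloc(sizeof *node);
      node->next.ptr   = NULL;
      node->next.count = 0;
      Q->Head.ptr = Q->Tail.ptr = node;
      Q->Head.count = Q->Tail.count = 0;
  }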

Algorithm – Non-Blocking Enqueue
  enqueue(Q: pointer to queue_t, value: data type)
    node = new_node()
    node->value = value
    node->next.ptr = NULL
    loop
      tail = Q->Tail
      next = tail.ptr->next
      if tail == Q->Tail
        if next.ptr == NULL
          if CAS(&tail.ptr->next, next, <node, next.count+1>)
            break
          endif
        else
          CAS(&Q->Tail, tail, <next.ptr, tail.count+1>)
        endif
      endif
    endloop
    CAS(&Q->Tail, tail, <node, tail.count+1>)
[Diagrams: the new node is linked after the last node by the first CAS, then Q->Tail is swung to the new node by the final CAS; each successful CAS also increments the count field of the pointer it modifies.]

Algorithm – Non-Blocking Enqueue: Two Processes
The pseudocode is the same as above. [Diagrams: two processes enqueue concurrently; only one CAS on tail.ptr->next can succeed, and the other process then sees a non-NULL next, helps by swinging Q->Tail to the node that was just linked, and retries its own enqueue.]

Algorithm – Non-Blocking Dequeue
  dequeue(Q: pointer to queue_t, pvalue: pointer to data type): boolean
    loop
      head = Q->Head
      tail = Q->Tail
      next = head.ptr->next
      if head == Q->Head
        if head.ptr == tail.ptr
          if next.ptr == NULL
            return FALSE
          endif
          CAS(&Q->Tail, tail, <next.ptr, tail.count+1>)
        else
          *pvalue = next.ptr->value
          if CAS(&Q->Head, head, <next.ptr, head.count+1>)
            break
          endif
        endif
      endif
    endloop
    free(head.ptr)
    return TRUE
[Diagrams: if the queue appears empty, FALSE is returned; if the tail is lagging, the dequeuer helps swing Q->Tail forward; otherwise the value is read from the node after the dummy, Head is swung to that node by a CAS that also increments its count, and the old dummy node is freed.]

Algorithm – Two-Lock Concurrent Queue: Globals
  structure node_t  {value: data type, next: pointer to node_t}
  structure queue_t {Head: pointer to node_t, Tail: pointer to node_t, H_lock: lock type, T_lock: lock type}

Algorithm – Two-Lock Concurrent Queue: Initialization
  initialize(Q: pointer to queue_t)
    node = new_node()
    node->next = NULL
    Q->Head = Q->Tail = node
    Q->H_lock = Q->T_lock = FREE

Algorithm – Two-Lock Concurrent Queue: Enqueue
  enqueue(Q: pointer to queue_t, value: data type)
    node = new_node()        // Allocate a new node from the free list
    node->value = value      // Copy enqueued value into node
    node->next = NULL        // Set next pointer of node to NULL
    lock(&Q->T_lock)         // Acquire T_lock in order to access Tail
    Q->Tail->next = node     // Link node at the end of the linked list
    Q->Tail = node           // Swing Tail to node
    unlock(&Q->T_lock)       // Release T_lock
[Diagrams: T_lock is held while the new node is linked and Tail is swung; H_lock stays free.]

Algorithm – Two-Lock Concurrent Queue: Dequeue
  dequeue(Q: pointer to queue_t, pvalue: pointer to data type): boolean
    lock(&Q->H_lock)            // Acquire H_lock in order to access Head
    node = Q->Head              // Read Head
    new_head = node->next       // Read next pointer
    if new_head == NULL         // Is queue empty?
      unlock(&Q->H_lock)        // Release H_lock before return
      return FALSE              // Queue was empty
    endif
    *pvalue = new_head->value   // Queue not empty; read value before release
    Q->Head = new_head          // Swing Head to next node
    unlock(&Q->H_lock)          // Release H_lock
    free(node)                  // Free node
    return TRUE
[Diagrams: H_lock is held while Head is swung to the next node and the old dummy node is freed; T_lock stays free.]
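To make the two-lock algorithm concrete, here is a hedged C sketch that uses POSIX mutexes in place of the paper's test-and-set locks, with int payloads; names such as tl_queue_init are invented for this sketch.

  #include <pthread.h>
  #include <stdlib.h>

  typedef struct tl_node { int value; struct tl_node *next; } tl_node_t;

  typedef struct {
      tl_node_t      *head, *tail;     /* head always points at a dummy node */
      pthread_mutex_t h_lock, t_lock;  /* one lock per end of the queue */
  } tl_queue_t;

  void tl_queue_init(tl_queue_t *q)
  {
      tl_node_t *dummy = malloc(sizeof *dummy);
      dummy->next = NULL;
      q->head = q->tail = dummy;
      pthread_mutex_init(&q->h_lock, NULL);
      pthread_mutex_init(&q->t_lock, NULL);
  }

  void tl_enqueue(tl_queue_t *q, int value)
  {
      tl_node_t *node = malloc(sizeof *node);
      node->value = value;
      node->next  = NULL;
      pthread_mutex_lock(&q->t_lock);   /* only the tail end is touched */
      q->tail->next = node;             /* link node at the end of the list */
      q->tail = node;                   /* swing tail to the new node */
      pthread_mutex_unlock(&q->t_lock);
  }

  int tl_dequeue(tl_queue_t *q, int *pvalue)
  {
      pthread_mutex_lock(&q->h_lock);   /* only the head end is touched */
      tl_node_t *node = q->head;        /* current dummy node */
      tl_node_t *new_head = node->next;
      if (new_head == NULL) {           /* queue is empty */
          pthread_mutex_unlock(&q->h_lock);
          return 0;
      }
      *pvalue = new_head->value;        /* read value before releasing the lock */
      q->head = new_head;               /* the dequeued node becomes the new dummy */
      pthread_mutex_unlock(&q->h_lock);
      free(node);                       /* free the old dummy */
      return 1;
  }

Because enqueuers take only t_lock and dequeuers take only h_lock, and the dummy node keeps dequeuers from ever needing the tail pointer (and enqueuers from ever needing the head), an enqueue and a dequeue can proceed in parallel.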

Performance Parameters
- Net execution time for one million enqueue/dequeue pairs
- 12-processor Silicon Graphics Challenge multiprocessor
- Algorithms compiled at the highest optimization level, including many hand optimizations

Performance – Dedicated Processor

Performance – Two Processes Per Processor

Performance – Three Processes Per Processor

Conclusion
- Non-blocking synchronization (NBS) is the clear winner for multiprogrammed multiprocessor systems
- Above 5 processors, use the new non-blocking queue
- If the hardware only supports test-and-set, use the two-lock queue
- For two or fewer processors, use a single-lock algorithm for queues