3 Synchronization
- Shared data structures need synchronization
- Synchronization using locks
  - Mutually exclusive access to the whole or parts of the data structure
[Figure: processes P1, P2, P3 accessing the shared data structure]
4 Blocking Synchronization
- Drawbacks
  - Blocking
  - Priority inversion
  - Risk of deadlock
- Locks: semaphores, spinning, disabling interrupts etc.
  - Reduced efficiency because of reduced parallelism
5 Non-blocking Synchronization
- Lock-Free Synchronization
  - Optimistic approach (i.e. assumes no interference)
  - The operation is prepared to take effect later (unless interfered with) using hardware atomic primitives
  - Possible interference is detected via the atomic primitives, and causes a retry
  - Can cause starvation
- Wait-Free Synchronization
  - Always finishes in a finite number of its own steps
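The optimistic prepare/retry pattern described on this slide can be sketched in C++ with `std::atomic`. This is only an illustration on a shared integer; the function name `lock_free_add` is hypothetical and not taken from the slides.

```cpp
#include <atomic>

// Hypothetical example: add `delta` to a shared counter without taking a lock.
int lock_free_add(std::atomic<int>& counter, int delta) {
    // Optimistic read: assume nobody interferes between this load and the CAS.
    int seen = counter.load();
    // Prepare the new value (seen + delta) and try to make it take effect.
    // If another thread changed the counter in between, the CAS fails,
    // `seen` is refreshed with the current value, and the loop retries.
    while (!counter.compare_exchange_weak(seen, seen + delta)) {
        // Nothing else to do: the retry itself is the recovery.
    }
    return seen + delta;  // the value this operation installed
}
```

Any number of threads may call `lock_free_add` on the same counter. An individual thread can in principle keep losing the race (starvation), but every failed CAS means some other thread succeeded.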
6 Dictionaries (Sets)
- Fundamental data structure
- Works on a set of <key,value> pairs
- Three basic operations:
  - Insert(k,v): Adds a new item
  - v=FindKey(k): Finds the item <k,v>
  - v=DeleteKey(k): Finds and removes the item <k,v>
7 Previous Non-blocking Dictionaries
- M. Michael: "High Performance Dynamic Lock-Free Hash Tables and List-Based Sets", SPAA 2002
  - Based on a singly-linked list
    - Linear time complexity!
  - Fast lock-free memory management
    - Causes retries of concurrent search operations!
  - Building block of hash tables
    - Assumes each branch is of length <<10
    - However, hash tables might not be uniformly distributed
8 Randomized Algorithm: Skip Lists
- William Pugh: "Skip Lists: A Probabilistic Alternative to Balanced Trees", 1990
- Layers of ordered lists with different densities, achieving a tree-like behavior
- Time complexity: O(log_2 N), probabilistic!
[Figure: skip list from Head to Tail over keys 1 to 7; layer densities 50%, 25%, …]
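The layer densities can be produced by drawing each node's height at random. A minimal sketch follows, assuming a promotion probability of 1/2 per level (consistent with the 50% and 25% densities in the figure) and a caller-supplied maximum level; the name `random_level` is hypothetical.

```cpp
#include <cassert>
#include <random>

// Draws a node height in [1, max_level].
// Assumption: each level is kept with probability 1/2, so about half of the
// nodes reach level 2, a quarter reach level 3, and so on.
int random_level(std::mt19937& rng, int max_level) {
    assert(max_level >= 1);
    std::bernoulli_distribution promote(0.5);
    int level = 1;
    while (level < max_level && promote(rng)) {
        ++level;
    }
    return level;
}
```

With this distribution the expected number of nodes visited per level is constant, which is where the probabilistic logarithmic search time comes from.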
9 New Lock-Free Concurrent Skip List
- Define node state to depend on the insertion status at the lowest level as well as a deletion flag
- Insert from the lowest level going upwards
- Set deletion flag. Delete from the highest level going downwards
[Figure: skip list with nodes 1 to 7, each carrying a deletion flag D; node layout with pointers p on levels 1 to 3]
10 Overlapping operations on shared data
- Example: Insert operation - which of 2 or 3 gets inserted?
[Figure: list 1 -> 4 with concurrent Insert 2 and Insert 3 between nodes 1 and 4]
- Solution: Compare-And-Swap atomic primitive:
    CAS(p: pointer to word, old: word, new: word): boolean
      atomic do
        if *p = old then *p := new; return true;
        else return false;
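The Insert 2 / Insert 3 race can be sketched with `std::atomic`, whose `compare_exchange_strong` plays the role of CAS. The node layout and the names `try_insert_after` and `insert_sorted` are assumptions made for this illustration; deletion is deliberately left out.

```cpp
#include <atomic>

struct Node {
    int key;
    std::atomic<Node*> next;
    explicit Node(int k, Node* n = nullptr) : key(k), next(n) {}
};

// Links `node` between `prev` and `expected_next`, but only if prev->next
// still equals `expected_next`. Returns false if another insert got there first.
bool try_insert_after(Node* prev, Node* expected_next, Node* node) {
    node->next.store(expected_next);
    return prev->next.compare_exchange_strong(expected_next, node);
}

// Full insert into a sorted list with a sentinel head: search, try the CAS,
// and redo the search whenever the CAS reports interference.
void insert_sorted(Node* head, Node* node) {
    for (;;) {
        Node* prev = head;
        Node* cur = prev->next.load();
        while (cur != nullptr && cur->key < node->key) {
            prev = cur;
            cur = cur->next.load();
        }
        if (try_insert_after(prev, cur, node)) return;
    }
}
```

If both inserters read node 1's successor as node 4, exactly one CAS on node 1's next pointer succeeds. The loser sees the failure, searches again, and links its node in the right place, so neither 2 nor 3 is lost.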
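11 Concurrent Insert vs. Delete operations
- Problem: both nodes are deleted! (the newly inserted node is lost together with the deleted one)
- Solution (Harris et al): Use bit 0 of the pointer to mark deletion status
[Figure: list 1, 2, 4 with a concurrent Delete and Insert of node 3, steps a) and b); in the solution node 2 is first marked (2*), steps a), b), c)]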
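A sketch of the mark-bit idea, under the assumption that nodes are at least 2-byte aligned so bit 0 of a node address is always zero. The next pointer is stored as an integer word so the mark and the address can be changed by a single CAS. All names here (`pack`, `ptr_of`, `is_marked`, `mark_deleted`, `insert_after`) are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

struct Node {
    int key;
    std::atomic<std::uintptr_t> next;  // successor address, bit 0 = deletion mark
    explicit Node(int k) : key(k), next(0) {}
};

inline std::uintptr_t pack(Node* p, bool marked) {
    std::uintptr_t w = reinterpret_cast<std::uintptr_t>(p);
    assert((w & 1u) == 0);  // relies on the alignment assumption
    return w | (marked ? 1u : 0u);
}
inline Node* ptr_of(std::uintptr_t w) {
    return reinterpret_cast<Node*>(w & ~std::uintptr_t{1});
}
inline bool is_marked(std::uintptr_t w) { return (w & 1u) != 0; }

// Logical deletion: set the mark on node->next. Returns false if some other
// thread had already marked the node.
bool mark_deleted(Node* node) {
    std::uintptr_t w = node->next.load();
    while (!is_marked(w)) {
        if (node->next.compare_exchange_weak(w, w | 1u)) return true;
    }
    return false;
}

// Insert `node` between `prev` and `succ`. The expected word is unmarked,
// so the CAS fails if `prev` has been marked as deleted in the meantime.
bool insert_after(Node* prev, Node* succ, Node* node) {
    node->next.store(pack(succ, false));
    std::uintptr_t expected = pack(succ, false);
    return prev->next.compare_exchange_strong(expected, pack(node, false));
}
```

Once node 2 is marked, an insert of node 3 after node 2 can no longer succeed, so the inserter has to retry from a live predecessor instead of attaching its node to a node that is about to disappear.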
12 New Lock-Free Dictionary - Techniques Summary
- Based on skip lists
  - Treated as layers of ordered lists
- Uses CAS atomic primitive
- Lock-free memory management
  - IBM freelists
  - Reference counting (Valois + Michael & Scott)
- Helping scheme
- Back-off strategy
- All together proved to be linearizable
13 Experiments
- Experiments with 1-30 threads performed on systems with 2 and 64 CPUs, respectively.
- Each thread performs operations, whereof the first total operations are Inserts; the remaining are distributed uniformly at random over Insert, FindKey and DeleteKey.
- Fixed skip list maximum level of 10.
- Compared with the implementation by Michael, using the same scenarios.
- Averaged execution time of 50 experiments.
16 Conclusions
- Our lock-free implementation also includes the value-oriented operations FindValue and DeleteValue.
- Our lock-free algorithm is suitable for both pre-emptive systems and systems with full concurrency.
- Will be available as part of the NOBLE software library.
- See the Technical Report for full details.
17 Questions?
Contact Information:
- Address: Håkan Sundell and Philippas Tsigas, Computing Science, Chalmers University of Technology
- <phs , cs.chalmers.se
- Web:
18 Dynamic Memory Management
- Problem: System memory allocation functionality is blocking!
- Solution (lock-free), IBM freelists:
  - Pre-allocate a number of nodes, link them into a dynamic stack structure, and allocate/reclaim using CAS
[Figure: freelist stack Head -> Mem 1 -> Mem 2 -> … -> Mem n; Allocate takes from Head, Reclaim returns Used 1]
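A freelist of this kind can be sketched as a pre-allocated pool whose free nodes form a CAS-managed stack. The class and member names below are hypothetical, and this simplified version does nothing about the ABA problem, so it is not safe as-is under real concurrency.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

struct FreeNode {
    FreeNode* next = nullptr;
    int payload = 0;  // placeholder for the memory handed to the user
};

class FreeList {
public:
    // Pre-allocate n nodes and link all of them into the stack.
    explicit FreeList(std::size_t n) : pool_(n), head_(nullptr) {
        for (auto& node : pool_) reclaim(&node);
    }

    // Allocate: pop the top node with CAS; returns nullptr when empty.
    FreeNode* allocate() {
        FreeNode* top = head_.load();
        while (top != nullptr &&
               !head_.compare_exchange_weak(top, top->next)) {
            // `top` was refreshed by the failed CAS; retry with the new head.
        }
        return top;
    }

    // Reclaim: push the node back on top of the stack with CAS.
    void reclaim(FreeNode* node) {
        node->next = head_.load();
        while (!head_.compare_exchange_weak(node->next, node)) {
            // node->next was refreshed with the current head; retry.
        }
    }

private:
    std::vector<FreeNode> pool_;   // storage is never resized after construction
    std::atomic<FreeNode*> head_;
};
```

Because no call ever reaches the system allocator after construction, neither `allocate` nor `reclaim` can block on an allocator lock.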
19 The ABA problem
- Problem: Because of concurrency (pre-emption in particular), the same pointer value does not always mean the same node (i.e. CAS succeeds even though the node has been reclaimed and reused)!
[Figure: Step 1: nodes 1, 6, 7, 4; Step 2: nodes 2, 3, 7, 4]
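The effect can be reproduced deterministically by replaying one bad interleaving in a single thread. The scenario below uses a hypothetical three-node stack rather than the list in the figure, and the function name `demonstrate_aba` is made up for this sketch.

```cpp
#include <atomic>

struct Node {
    int value;
    Node* next;
};

// Replays an ABA interleaving step by step.
// Returns true if the stale CAS succeeded and corrupted the stack.
bool demonstrate_aba() {
    Node c{3, nullptr};
    Node b{2, &c};
    Node a{1, &b};
    std::atomic<Node*> head{&a};  // stack: a -> b -> c

    // "Thread 1" starts a pop: it reads the head and its successor,
    // and is then pre-empted before its CAS.
    Node* seen_head = head.load();      // &a
    Node* seen_next = seen_head->next;  // &b

    // "Thread 2" runs in the meantime:
    head.store(&b);   // pops a
    head.store(&c);   // pops b; b is now considered reclaimed
    a.next = &c;
    head.store(&a);   // reuses a and pushes it again: a -> c

    // "Thread 1" resumes. The head holds the same pointer value as before,
    // so the CAS succeeds although the stack underneath has changed.
    bool succeeded = head.compare_exchange_strong(seen_head, seen_next);

    // The head now refers to b, a node that is no longer in the stack.
    return succeeded && head.load() == &b;
}
```

The CAS only compares pointer values, so it cannot tell that node `a` left the structure and came back in between.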
20 The ABA problem
- Solution: (Valois et al) Add reference counting to each node, in order to prevent nodes that are of interest to some thread from being reclaimed until all threads have left the node
[Figure: New Step 2: the referenced nodes (1*, 6*) are not reclaimed, so the CAS fails!]
21 Helping Scheme
- Threads need to traverse safely
- Need to remove marked-to-be-deleted nodes while traversing: Help!
- Finds the previous node, finishes the deletion and continues traversing from the previous node
[Figure: list 1, 2*, 4 showing the alternative ways to continue a traversal that reaches the marked node 2*]
22 Back-Off Strategy
- For pre-emptive systems, helping is necessary for efficiency and lock-freeness
- For truly concurrent systems, overlapping CAS operations (caused by helping and others) on the same node can cause heavy contention
- Solution: For every failed CAS attempt, back off (i.e. sleep) for a certain duration, which increases exponentially
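A minimal sketch of exponential back-off around a CAS loop. The initial delay of 1 microsecond, the cap of 1024 microseconds, and the names `backoff_delay` and `add_with_backoff` are assumptions for illustration; the slides do not give concrete values.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Delay to wait after the given number of consecutive failed CAS attempts.
// Doubles with every failure, starting at `base` and never exceeding `cap`.
std::chrono::microseconds backoff_delay(
        int failures,
        std::chrono::microseconds base = std::chrono::microseconds(1),
        std::chrono::microseconds cap = std::chrono::microseconds(1024)) {
    assert(failures >= 1);
    std::chrono::microseconds delay = base;
    for (int i = 1; i < failures && delay < cap; ++i) {
        delay *= 2;
    }
    return std::min(delay, cap);
}

// Adds `delta` to the counter, sleeping after every failed CAS.
// Returns how many attempts failed before the update took effect.
int add_with_backoff(std::atomic<int>& counter, int delta) {
    int failures = 0;
    int seen = counter.load();
    // The strong variant is used so that only real interference, not a
    // spurious failure, triggers a sleep.
    while (!counter.compare_exchange_strong(seen, seen + delta)) {
        ++failures;
        std::this_thread::sleep_for(backoff_delay(failures));
    }
    return failures;
}
```

Spreading the retries out in time reduces the number of threads hitting the same word at once; the cap keeps a thread from sleeping for very long after a burst of contention.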
23 Non-blocking Synchronization
- Lock-Free Synchronization
  - Avoids problems with locks
  - Simple algorithms
  - Fast under low contention
- Wait-Free Synchronization
  - Always finishes in a finite number of its own steps
  - Complex algorithms
  - Memory consuming
  - Less efficient on average than lock-free
26 The algorithm in more detail
- Insert:
  - Create node with random height
  - Search position (remember drops)
  - Insert or update on level 1
  - Insert on level 2 to top (unless already deleted)
  - If already deleted then HelpDelete(1)
- All of this while keeping track of references, helping deleted nodes etc.
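The search-then-insert-upwards order can be illustrated with a purely sequential sketch. It assumes that "drops" means the last node visited on each level before the search moves down a level; it leaves out CAS, deletion marks, helping and reference counting entirely, so it is not the lock-free algorithm itself. All names are hypothetical; the maximum level of 10 is the one used in the experiments.

```cpp
#include <array>
#include <cassert>

constexpr int kMaxLevel = 10;

struct SkipNode {
    int key;
    int height;                               // number of levels the node is in
    std::array<SkipNode*, kMaxLevel> next;    // next[i] is the link on level i+1
    SkipNode(int k, int h) : key(k), height(h), next{} {}
};

// Searches from the top level downwards and records, for every level, the
// last node with a key smaller than `key` (the point where the search drops).
std::array<SkipNode*, kMaxLevel> search_drops(SkipNode* head, int key) {
    std::array<SkipNode*, kMaxLevel> drops{};
    SkipNode* cur = head;
    for (int lvl = kMaxLevel - 1; lvl >= 0; --lvl) {
        while (cur->next[lvl] != nullptr && cur->next[lvl]->key < key) {
            cur = cur->next[lvl];
        }
        drops[lvl] = cur;
    }
    return drops;
}

// Sequential insert: link the node on level 1 first, then on levels 2 to its
// height, using the remembered drops as predecessors.
void insert_sequential(SkipNode* head, SkipNode* node) {
    assert(node->height >= 1 && node->height <= kMaxLevel);
    std::array<SkipNode*, kMaxLevel> drops = search_drops(head, node->key);
    for (int lvl = 0; lvl < node->height; ++lvl) {
        node->next[lvl] = drops[lvl]->next[lvl];
        drops[lvl]->next[lvl] = node;
    }
}
```

In a concurrent version each of the two pointer writes per level would have to become a CAS against the remembered predecessor, with a fresh search on failure.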
27 The algorithm in more detail
- DeleteKey:
  - Search position (remember drops)
  - Mark node at level 1 as deleted, otherwise fail
  - Mark next pointers on level 1 to top
  - Delete on level top to 1 while detecting helping, indicate success
  - Free node
- All of this while keeping track of references, helping deleted nodes etc.
28 The algorithm in more detail
- HelpDelete(level):
  - Mark next pointer at level to top
  - Find previous node (info in node)
  - Delete on level unless already helped, indicate success
  - Return previous node
- All of this while keeping track of references, helping deleted nodes etc.
29 Correctness
- Linearizability (Herlihy 1991)
  - For an implementation to be linearizable, for every concurrent execution there should exist an equivalent sequential execution that respects the partial order of the operations in the concurrent execution
30 Correctness
- Define precise sequential semantics
- Define abstract state and its interpretation
  - Show that state is atomically updated
- Define linearizability points
  - Show that operations take effect atomically at these points with respect to sequential semantics
- Creates a total order using the linearizability points that respects the partial order
  - The algorithm is linearizable
31 Correctness
- Lock-freeness
  - At least one operation should always make progress
- There are no cyclic loop dependencies, and all potentially unbounded loops are "gate-kept" by CAS operations
  - The CAS operation guarantees that at least one CAS will always succeed
    - The algorithm is lock-free