Queue Locks and Local Spinning Some Slides based on: The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.


1 Queue Locks and Local Spinning Some Slides based on: The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

2 Memory Models (Art of Multiprocessor Programming)
– Memory contention
– Communication contention
– Communication latency
– Cache Coherent (CC) memory
– Distributed Shared Memory (DSM)

3 Today: Revisit Mutual Exclusion
– Think of performance, not just correctness and progress
– Begin to understand how performance depends on our software properly utilizing the multiprocessor machine’s hardware

4 Remote Access
Remote access is expensive! Allow spinning only on local variables:
– DSM: spin only on variables in the local memory
– CC: spin only on variables in the cache

5 Basic Spin-Lock (diagram: threads spin on the lock; one thread is in the critical section (CS) and resets the lock upon exit)

6 Basic Spin-Lock: …the lock suffers from contention, with no local spinning!
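The basic spin lock on the two slides above can be sketched in Java as a test-and-set lock. This is a minimal illustration, not the deck's own code: the names `TasLockSketch`, `TASLock` and `contendedCount` are hypothetical, and `Thread.yield()` is used in the wait loop only so the demo behaves when threads outnumber cores.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class TasLockSketch {
    // Test-and-set spin lock: every waiter spins on the same shared flag.
    static final class TASLock {
        private final AtomicBoolean held = new AtomicBoolean(false);

        void lock() {
            // getAndSet writes the flag on every attempt, so even a failed
            // attempt invalidates the other waiters' cached copies.
            while (held.getAndSet(true)) {
                Thread.yield(); // assumption: demo-friendly stand-in for pure spinning
            }
        }

        void unlock() {
            held.set(false); // reset the lock upon exit
        }
    }

    // Hypothetical helper: each thread increments a shared counter `iters` times
    // inside the critical section; returns the final counter value.
    static int contendedCount(int threads, int iters) {
        final TASLock lock = new TASLock();
        final int[] counter = {0};
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < iters; j++) {
                    lock.lock();
                    try {
                        counter[0]++; // critical section
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(contendedCount(4, 1000));
    }
}
```

If mutual exclusion holds, `contendedCount` returns `threads * iters`. The point of the sketch is the single shared `held` flag: all waiters read and write the same location, which is the contention the queue locks are designed to avoid.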

7 Idea
Avoid useless invalidations
– By keeping a queue of threads
Each thread
– Notifies the next in line
– Without bothering the others

8 Anderson Queue Lock (diagram: flags array T F F F F F F F; next index advanced with getAndIncrement; threads shown as acquiring and acquired)

9 Anderson Queue Lock
Good
– Local spinning (CC model)
– Simple, easy to implement
Bad
– One bit per thread: what if the number of threads is unknown, or the number of actual contenders is small?
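The array-based scheme above can be sketched as follows. This is an illustrative version under stated assumptions: the class and field names are hypothetical, the lock supports at most `capacity` concurrent contenders, counter overflow is ignored, and `Thread.yield()` again stands in for pure spinning so the demo runs well on few cores.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

class AndersonLockSketch {
    static final class ALock {
        private final int size;
        private final AtomicBoolean[] flags;              // one flag per slot
        private final AtomicInteger next = new AtomicInteger(0);
        private final ThreadLocal<Integer> mySlot = new ThreadLocal<>();

        ALock(int capacity) {
            size = capacity;
            flags = new AtomicBoolean[capacity];
            for (int i = 0; i < capacity; i++) {
                flags[i] = new AtomicBoolean(i == 0);     // initially T F F F ...
            }
        }

        void lock() {
            int slot = next.getAndIncrement() % size;     // take a ticket
            mySlot.set(slot);
            while (!flags[slot].get()) {                  // spin on my own slot only
                Thread.yield();
            }
        }

        void unlock() {
            int slot = mySlot.get();
            flags[slot].set(false);                       // reset my slot for reuse
            flags[(slot + 1) % size].set(true);           // notify the next in line
        }
    }

    // Hypothetical helper: returns the counter after all threads finish.
    static int contendedCount(int threads, int iters) {
        final ALock lock = new ALock(threads);            // capacity = number of threads
        final int[] counter = {0};
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < iters; j++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(contendedCount(4, 1000));
    }
}
```

The constructor argument makes the "Bad" point concrete: the flag array must be sized for the maximum number of threads up front, even if only a few of them ever contend.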

10 CLH Lock
– FIFO order
– Small, constant-size overhead per thread

11 Initially (diagram: tail; node false; idle)

12 Green Wants the Lock (diagram: tail; node false; acquiring)

13 Green Wants the Lock (diagram: tail; nodes false, true; acquiring)

14 Green Wants the Lock (diagram: tail; nodes false, true; acquiring; Swap)

15 Green Has the Lock (diagram: tail; nodes false, true; acquired)

16 Blue Wants the Lock (diagram: tail; nodes false, true; acquired, acquiring)

17 Blue Wants the Lock (diagram: tail; nodes false, true, true; acquired, acquiring; Swap)

18, 19 Blue Wants the Lock (diagram: tail; nodes false, true; acquired, acquiring)

20 Blue Wants the Lock: implicitly linked list

21 Blue Wants the Lock (diagram: tail; nodes false, true; acquired, acquiring)

22 Blue Wants the Lock: actually, it spins on a cached copy

23 Green Releases (diagram: tail; release, acquiring; node values false, false, true, false) Bingo!

24 Green Releases (diagram: tail; node true; released, acquired)

25 CLH Queue Lock
Entry section:
    new myNode
    myNode := true
    do myPred := tail
    while !CAS(tail, myPred, myNode)
    wait until !myPred
Exit section:
    myNode := false
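The entry and exit sections above might look like this in Java. It is a sketch under assumptions, not the slide's exact code: it uses `getAndSet` (the swap shown in the animation) instead of the CAS loop, recycles the predecessor's node instead of allocating a new node per acquisition, and uses hypothetical names throughout.

```java
import java.util.concurrent.atomic.AtomicReference;

class ClhLockSketch {
    static final class QNode {
        volatile boolean locked; // true while the owner holds or wants the lock
    }

    static final class CLHLock {
        // tail starts at a dummy node whose locked flag is false
        private final AtomicReference<QNode> tail = new AtomicReference<>(new QNode());
        private final ThreadLocal<QNode> myNode = ThreadLocal.withInitial(QNode::new);
        private final ThreadLocal<QNode> myPred = new ThreadLocal<>();

        void lock() {
            QNode node = myNode.get();
            node.locked = true;                 // myNode := true
            QNode pred = tail.getAndSet(node);  // swap myself in as the new tail
            myPred.set(pred);
            while (pred.locked) {               // wait until !myPred
                Thread.yield();                 // assumption: demo-friendly spinning
            }
        }

        void unlock() {
            myNode.get().locked = false;        // myNode := false
            myNode.set(myPred.get());           // reuse the predecessor's node next time
        }
    }

    // Hypothetical helper: returns the counter after all threads finish.
    static int contendedCount(int threads, int iters) {
        final CLHLock lock = new CLHLock();
        final int[] counter = {0};
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < iters; j++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(contendedCount(4, 1000));
    }
}
```

Note that the queue is never stored explicitly: each thread only remembers its predecessor's node, which is the "implicitly linked list" from the animation.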

26 CLH Lock
Good
– Lock release affects only the successor (the one thread spinning on the released node)
– Small, constant-sized space
Bad
– Not local spinning for DSM model

27 CLH Lock: each thread spins on its predecessor’s memory, which could be far away …

28 MCS Lock
– FIFO order
– Spin on local memory only
– Small, constant-size overhead

29 Initially (diagram: tail; false; idle)

30 Acquiring (diagram: tail; false, true; acquiring; allocate Qnode)

31 Acquiring (diagram: tail; true, false; acquiring; swap)

32 Acquiring (diagram: tail; true, false; acquiring)

33 Acquired (diagram: tail; true, false; acquired)

34 Acquiring (diagram: tail; false, true; acquired, acquiring; swap)

35, 36, 37, 38 Acquiring (diagram: tail; true, false; acquired, acquiring)

39 Acquiring (diagram: tail; true; acquired, acquiring) Yes!

40 MCS Queue Lock
Entry section:
    new myNode
    do myPred := tail
    while !CAS(tail, myPred, myNode)
    if myPred != null
        myNode.locked := true
        myPred.next := myNode
        wait until !(myNode.locked)
Exit section:
    if myNode.next == null
        if CAS(tail, myNode, null) then return
        wait until myNode.next != null
    myNode.next.locked := false
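A Java rendering of the MCS entry and exit sections could look like the sketch below. The assumptions: names are hypothetical, `getAndSet` replaces the CAS loop on `tail`, each thread reuses one node via `ThreadLocal`, and `Thread.yield()` stands in for pure spinning. Each waiter spins on the `locked` field of its own node, which is what makes the spinning local.

```java
import java.util.concurrent.atomic.AtomicReference;

class McsLockSketch {
    static final class QNode {
        volatile boolean locked;
        volatile QNode next;
    }

    static final class MCSLock {
        private final AtomicReference<QNode> tail = new AtomicReference<>(null);
        private final ThreadLocal<QNode> myNode = ThreadLocal.withInitial(QNode::new);

        void lock() {
            QNode node = myNode.get();
            QNode pred = tail.getAndSet(node);   // enqueue myself
            if (pred != null) {
                node.locked = true;
                pred.next = node;                // link in behind the predecessor
                while (node.locked) {            // spin on my own node
                    Thread.yield();
                }
            }
        }

        void unlock() {
            QNode node = myNode.get();
            if (node.next == null) {
                if (tail.compareAndSet(node, null)) {
                    return;                      // nobody was waiting
                }
                while (node.next == null) {      // a successor is still linking in
                    Thread.yield();
                }
            }
            node.next.locked = false;            // hand the lock to the successor
            node.next = null;                    // reset for reuse
        }
    }

    // Hypothetical helper: returns the counter after all threads finish.
    static int contendedCount(int threads, int iters) {
        final MCSLock lock = new MCSLock();
        final int[] counter = {0};
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < iters; j++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(contendedCount(4, 1000));
    }
}
```

The second wait loop in `unlock` covers the window in which a successor has already swapped itself into `tail` but has not yet written `pred.next`.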

41 Green Release (diagram: false, false; releasing; swap)

42 Green Release: by looking at the queue, I see another thread is active

43 Green Release: by looking at the queue, I see another thread is active; I have to wait for that thread to finish

44 Green Release (diagram: false, true; releasing; prepare to spin)

45 Green Release (diagram: false, true; releasing; spinning)

46 Green Release (diagram: false, false; releasing; spinning)

47 Green Release (diagram: false, false; releasing; Acquired lock)

48 Non-Uniform Memory Architecture (NUMA) (diagram: memory)

49 Non-Uniform Memory Architecture (NUMA)
Today, many large-scale modern multiprocessors are NUMA:
– Clusters of processors with shared local memory
– Access by a processor to the memory of its own cluster is two or more times faster than access to remote memory
– Per-cluster cache

50 Lock Bouncing (diagram: memory)

51 Hierarchical Locks
– Encourage threads with high mutual memory locality to acquire the lock consecutively
– Reduce overall cache misses

52 Hierarchical CLH (HCLH) Lock [Luchangco, Nussbaum and Shavit 2006]
– Local queue per cluster
– Global queue to enter the critical section
– A local queue is added to the global queue with a single CAS

53 HCLH Lock
– First, add the thread to the local queue
– If a thread is the first in the local queue, it is responsible for merging into the global queue

54 HCLH Lock (diagram: Local tail; node false; acquiring)

55 HCLH Lock (diagram: Local tail; node false; acquiring; new node with fields cid, Successor_must_wait = true, Tail_when_merged = false)

56 HCLH Lock (diagram: as in 55; Swap)

57 HCLH Lock (diagram: Local tail; nodes false and [cid, true, false]; acquiring)

58 HCLH Lock (diagram: Local tail; nodes false, [cid, true, false], [cid, true, false]; two threads acquiring)

59 HCLH Lock (diagram: as in 58; Swap)

60, 61 HCLH Lock (diagram: Local tail; nodes false, [cid, true, false], [cid, true, false]; two threads acquiring)

62 HCLH Lock (diagram: Local tail; nodes false, [cid, true, false], [cid, true, false])

63, 64 HCLH Lock (diagram: Local tail with nodes [cid, true, false] ×3; Global tail with node [cid, true, TRUE]) Cluster master: sees lock is held, so waits a “combining delay”

65 HCLH Lock (diagram: as in 63; SWAP)

66 HCLH Lock (diagram: Local tail; Global tail; nodes [cid, true, false] ×3 and [cid, true, TRUE])

67 HCLH Lock (diagram: Local tail; Global tail; nodes [cid, true, false], [cid, true, TRUE], [cid, true, false], [cid, true, TRUE])
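The three node fields in the HCLH frames (cid, Successor_must_wait, Tail_when_merged) can be packed into one word so that a waiter reads them in a single atomic load. The sketch below is illustrative only: the bit layout, the names `HclhNodeSketch`, `Status` and `inspect`, and the three-way classification are assumptions made for this example, and it covers just the node state, not the queue splicing.

```java
import java.util.concurrent.atomic.AtomicInteger;

class HclhNodeSketch {
    // Assumed bit layout of the packed state word.
    static final int TAIL_WHEN_MERGED = 0x80000000;
    static final int SUCCESSOR_MUST_WAIT = 0x40000000;
    static final int CLUSTER_MASK = 0x3FFFFFFF;

    enum Status { KEEP_WAITING, LOCK_GRANTED, CLUSTER_MASTER }

    private final AtomicInteger state = new AtomicInteger(0);

    void set(int cid, boolean successorMustWait, boolean tailWhenMerged) {
        int s = cid & CLUSTER_MASK;
        if (successorMustWait) s |= SUCCESSOR_MUST_WAIT;
        if (tailWhenMerged) s |= TAIL_WHEN_MERGED;
        state.set(s);
    }

    int clusterId() { return state.get() & CLUSTER_MASK; }

    boolean successorMustWait() { return (state.get() & SUCCESSOR_MUST_WAIT) != 0; }

    boolean tailWhenMerged() { return (state.get() & TAIL_WHEN_MERGED) != 0; }

    // What a thread in cluster myCid learns from one look at its predecessor's node.
    static Status inspect(HclhNodeSketch pred, int myCid) {
        int s = pred.state.get();                        // one atomic read of all fields
        boolean sameCluster = (s & CLUSTER_MASK) == myCid;
        boolean merged = (s & TAIL_WHEN_MERGED) != 0;
        boolean mustWait = (s & SUCCESSOR_MUST_WAIT) != 0;
        if (!sameCluster || merged) {
            // Predecessor is from another cluster, or was the tail of a batch already
            // merged: this thread is first in its local queue and must do the merge.
            return Status.CLUSTER_MASTER;
        }
        return mustWait ? Status.KEEP_WAITING : Status.LOCK_GRANTED;
    }

    public static void main(String[] args) {
        HclhNodeSketch pred = new HclhNodeSketch();
        pred.set(2, true, false);
        System.out.println(inspect(pred, 2));
    }
}
```

A waiter would call `inspect` repeatedly on its predecessor while it returns `KEEP_WAITING`; the other two results tell it either to enter the critical section or to act as cluster master.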

68 References
– Spin, Anderson, CLH, MCS Locks: “The Art of Multiprocessor Programming”, Herlihy and Shavit, Chapter 7.
– HCLH Lock: “A Hierarchical CLH Queue Lock”, Luchangco, Nussbaum and Shavit, Euro-Par 2006.

