Presentation is loading. Please wait.

Presentation is loading. Please wait.

Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Similar presentations


Presentation on theme: "Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit."— Presentation transcript:

1 Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

2 Art of Multiprocessor Programming2 Kinds of Architectures SISD (Uniprocessor) –Single instruction stream –Single data stream SIMD (Vector) –Single instruction –Multiple data MIMD (Multiprocessors) –Multiple instruction –Multiple data.

3 Art of Multiprocessor Programming3 Kinds of Architectures SISD (Uniprocessor) –Single instruction stream –Single data stream SIMD (Vector) –Single instruction –Multiple data MIMD (Multiprocessors) –Multiple instruction –Multiple data. Our space (1)

4 Art of Multiprocessor Programming4 MIMD Architectures Memory Contention Communication Contention Communication Latency Shared Bus memory Distributed

5 Art of Multiprocessor Programming5 What Should you do if you can’t get a lock? Keep trying –“spin” or “busy-wait” –Good if delays are short Give up the processor –Good if delays are long –Always good on uniprocessor (1)

6 Art of Multiprocessor Programming6 What Should you do if you can’t get a lock? Keep trying –“spin” or “busy-wait” –Good if delays are short Give up the processor –Good if delays are long –Always good on uniprocessor our focus

7 Art of Multiprocessor Programming7 Basic Spin-Lock CS Resets lock upon exit spin lock critical section...

8 Art of Multiprocessor Programming8 Basic Spin-Lock CS Resets lock upon exit spin lock critical section... …lock introduces sequential bottleneck

9 Art of Multiprocessor Programming9 Basic Spin-Lock CS Resets lock upon exit spin lock critical section... …lock suffers from contention

10 Art of Multiprocessor Programming10 Basic Spin-Lock CS Resets lock upon exit spin lock critical section... …lock suffers from contention Seq Bottleneck  no parallelism

11 Art of Multiprocessor Programming11 Basic Spin-Lock CS Resets lock upon exit spin lock critical section... Contention  ??? …lock suffers from contention

12 Art of Multiprocessor Programming12 Test-and-Set Boolean value Test-and-set (TAS) –Swap true with current value –Return value tells if prior value was true or false Can reset just by writing false TAS aka “getAndSet”

13 Art of Multiprocessor Programming13 Test-and-Set public class AtomicBoolean { boolean value; public synchronized boolean getAndSet(boolean newValue) { boolean prior = value; value = newValue; return prior; } (5)

14 Art of Multiprocessor Programming14 Test-and-Set public class AtomicBoolean { boolean value; public synchronized boolean getAndSet(boolean newValue) { boolean prior = value; value = newValue; return prior; } Package java.util.concurrent.atomic

15 Art of Multiprocessor Programming15 Test-and-Set public class AtomicBoolean { boolean value; public synchronized boolean getAndSet(boolean newValue) { boolean prior = value; value = newValue; return prior; } Swap old and new values

16 Art of Multiprocessor Programming16 Test-and-Set AtomicBoolean lock = new AtomicBoolean(false) … boolean prior = lock.getAndSet(true)

17 Art of Multiprocessor Programming17 Test-and-Set AtomicBoolean lock = new AtomicBoolean(false) … boolean prior = lock.getAndSet(true) (5) Swapping in true is called “test-and-set” or TAS

18 Art of Multiprocessor Programming18 Test-and-Set Locks Locking –Lock is free: value is false –Lock is taken: value is true Acquire lock by calling TAS –If result is false, you win –If result is true, you lose Release lock by writing false

19 Art of Multiprocessor Programming19 Test-and-set Lock class TASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (state.getAndSet(true)) {} } void unlock() { state.set(false); }}

20 Art of Multiprocessor Programming20 Test-and-set Lock class TASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (state.getAndSet(true)) {} } void unlock() { state.set(false); }} Lock state is AtomicBoolean

21 Art of Multiprocessor Programming21 Test-and-set Lock class TASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (state.getAndSet(true)) {} } void unlock() { state.set(false); }} Keep trying until lock acquired

22 Art of Multiprocessor Programming22 Test-and-set Lock class TASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (state.getAndSet(true)) {} } void unlock() { state.set(false); }} Release lock by resetting state to false

23 Art of Multiprocessor Programming23 Performance Experiment –n threads –Increment shared counter 1 million times How long should it take? How long does it take?

24 Art of Multiprocessor Programming24 Graph ideal time threads no speedup because of sequential bottleneck

25 Art of Multiprocessor Programming25 Mystery #1 time threads TAS lock Ideal (1) What is going on?

26 Art of Multiprocessor Programming26 Bus-Based Architectures Bus cache memory cache

27 Art of Multiprocessor Programming27 Bus-Based Architectures Bus cache memory cache Random access memory (10s of cycles)

28 Art of Multiprocessor Programming28 Bus-Based Architectures cache memory cache Shared Bus Broadcast medium One broadcaster at a time Processors and memory all “snoop” Bus

29 Art of Multiprocessor Programming29 Bus-Based Architectures Bus cache memory cache Per-Processor Caches Small Fast: 1 or 2 cycles Address & state information

30 Art of Multiprocessor Programming30 Jargon Watch Cache hit –“I found what I wanted in my cache” –Good Thing™

31 Art of Multiprocessor Programming31 Jargon Watch Cache hit –“I found what I wanted in my cache” –Good Thing™ Cache miss –“I had to shlep all the way to memory for that data” –Bad Thing™

32 Art of Multiprocessor Programming32 Bus Processor Issues Load Request cache memory cache data

33 Art of Multiprocessor Programming33 Bus Processor Issues Load Request Bus cache memory cache data Gimme data

34 Art of Multiprocessor Programming34 cache Bus Memory Responds Bus memory cache data Got your data right here data

35 Art of Multiprocessor Programming35 Bus Processor Issues Load Request memory cache data Gimme data

36 Art of Multiprocessor Programming36 Bus Processor Issues Load Request Bus memory cache data Gimme data

37 Art of Multiprocessor Programming37 Bus Processor Issues Load Request Bus memory cache data I got data

38 Art of Multiprocessor Programming38 Bus Other Processor Responds memory cache data I got data data Bus

39 Art of Multiprocessor Programming39 Bus Other Processor Responds memory cache data Bus

40 Art of Multiprocessor Programming40 Modify Cached Data Bus data memory cachedata (1)

41 Art of Multiprocessor Programming41 Modify Cached Data Bus data memory cachedata (1)

42 Art of Multiprocessor Programming42 memory Bus data Modify Cached Data cachedata

43 Art of Multiprocessor Programming43 memory Bus data Modify Cached Data cache What’s up with the other copies? data

44 Art of Multiprocessor Programming44 Cache Coherence We have lots of copies of data –Original copy in memory –Cached copies at processors Some processor modifies its own copy –What do we do with the others? –How to avoid confusion?

45 Art of Multiprocessor Programming45 Write-Back Caches Accumulate changes in cache Write back when needed –Need the cache for something else –Another processor wants it On first modification –Invalidate other entries –Requires non-trivial protocol …

46 Art of Multiprocessor Programming46 Write-Back Caches Cache entry has three states –Invalid: contains raw seething bits –Valid: I can read but I can’t write –Dirty: Data has been modified Intercept other load requests Write back to memory before using cache

47 Art of Multiprocessor Programming47 Bus Invalidate memory cachedata

48 Art of Multiprocessor Programming48 Bus Invalidate Bus memory cachedata Mine, all mine!

49 Art of Multiprocessor Programming49 Bus Invalidate Bus memory cachedata cache Uh,oh

50 Art of Multiprocessor Programming50 cache Bus Invalidate memory cachedata Other caches lose read permission

51 Art of Multiprocessor Programming51 cache Bus Invalidate memory cachedata Other caches lose read permission This cache acquires write permission

52 Art of Multiprocessor Programming52 cache Bus Invalidate memory cachedata Memory provides data only if not present in any cache, so no need to change it now (expensive) (2)

53 Art of Multiprocessor Programming53 cache Bus Another Processor Asks for Data memory cachedata (2) Bus

54 Art of Multiprocessor Programming54 cache data Bus Owner Responds memory cachedata (2) Bus Here it is!

55 Art of Multiprocessor Programming55 Bus End of the Day … memory cachedata (1) Reading OK, no writing data

56 Art of Multiprocessor Programming56 Mutual Exclusion What do we want to optimize? –Bus bandwidth used by spinning threads –Release/Acquire latency –Acquire latency for idle lock

57 Art of Multiprocessor Programming57 Simple TASLock TAS invalidates cache lines Spinners –Miss in cache –Go to bus

58 Art of Multiprocessor Programming58 NUMA Architecturs Acronym: –Non-Uniform Memory Architecture Illusion: –Flat shared memory Truth: –No caches (sometimes) –Some memory regions faster than others

59 Art of Multiprocessor Programming59 NUMA Machines Spinning on local memory is fast

60 Art of Multiprocessor Programming60 NUMA Machines Spinning on remote memory is slow


Download ppt "Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit."

Similar presentations


Ads by Google