Presentation is loading. Please wait.

Presentation is loading. Please wait.

Transactional Memory The Art of Multiprocessor Programming Herlihy and Shavit.

Similar presentations


Presentation on theme: "Transactional Memory The Art of Multiprocessor Programming Herlihy and Shavit."— Presentation transcript:

1 Transactional Memory The Art of Multiprocessor Programming Herlihy and Shavit

2 From the New York Times … SAN FRANCISCO, May 7. 2004 - Intel said on Friday that it was scrapping its development of two microprocessors, a move that is a shift in the company's business strategy….

3 Moore’s Law (hat tip: Herb Sutter) Clock speed flattening sharply Transistor count still rising

4 Art of Multiprocessor Programming 4 Multicore Architetures “Learn how the multi-core processor architecture plays a central role in Intel's platform approach. ….” “AMD is leading the industry to multi-core technology for the x86 based computing market …” “Sun's multicore strategy centers around multi-threaded software.... “

5 © 2006 Herlihy & Shavit5 Why do we care? Time no longer cures software bloat When you double your path length –You can’t just wait 6 months –Your software must somehow exploit twice as much concurrency

6 © 2006 Herlihy & Shavit6 The Problem Cannot exploit cheap threads Today’s Software –Non-scalable methodologies Today’s Hardware –Poor support for scalable synchronization

7 © 2006 Herlihy & Shavit7 Locking

8 © 2006 Herlihy & Shavit8 Coarse-Grained Locking Easily made correct … But not scalable.

9 © 2006 Herlihy & Shavit9 Fine-Grained Locking Here comes trouble …

10 © 2006 Herlihy & Shavit10 Why Locking Doesn’t Scale Not Robust Relies on conventions Hard to Use –Conservative –Deadlocks –Lost wake-ups Not Composable

11 © 2006 Herlihy & Shavit11 Locks are not Robust If a thread holding a lock is delayed … No one else can make progress

12 © 2006 Herlihy & Shavit12 Why Locking Doesn’t Scale Not Robust Relies on conventions Hard to Use –Conservative –Deadlocks –Lost wake-ups Not Composable

13 © 2006 Herlihy & Shavit13 Locking Relies on Conventions Relation between –Lock bit and object bits –Exists only in programmer’s mind /* * When a locked buffer is visible to the I/O layer * BH_Launder is set. This means before unlocking * we must clear BH_Launder,mb() on alpha and then * clear BH_Lock, so no reader can see BH_Launder set * on an unlocked buffer and then risk to deadlock. */ Actual comment from Linux Kernel (hat tip: Bradley Kuszmaul)

14 © 2006 Herlihy & Shavit14 Why Locking Doesn’t Scale Not Robust Relies on conventions Hard to Use –Conservative –Deadlocks –Lost wake-ups Not Composable

15 © 2006 Herlihy & Shavit15 Sadistic Homework enq(x) enq(y) Double-ended queue No interference if ends “far enough” apart

16 © 2006 Herlihy & Shavit16 Sadistic Homework deq() ceq() Double-ended queue Interference OK if ends “close enough” together

17 © 2006 Herlihy & Shavit17 Sadistic Homework deq() Double-ended queue Make sure suspended dequeuers awake as needed 

18 In Search of the Lost Wake-Up Waiting thread doesn’t realize when to wake up It’s a real problem in big systems –“Calling pthread_cond_signal() or pthread_cond_broadcast() when the thread does not hold the mutex lock associated with the condition can lead to lost wake-up bugs.” from Google™ search for “lost wake-up”

19 © 2006 Herlihy & Shavit19 You Try It … One lock? –Too Conservative Locks at each end? –Deadlock, too complicated, etc Waking blocked dequeuers? –Harder than it looks

20 © 2006 Herlihy & Shavit20 Actual Solution Clean solution would be a publishable result [Michael & Scott, PODC 96] High performance fine-grained lock- based solutions are good for libraries… not general consumption by programmers

21 © 2006 Herlihy & Shavit21 Why Locking Doesn’t Scale Not Robust Relies on conventions Hard to Use –Conservative –Deadlocks –Lost wake-ups Not Composable

22 © 2006 Herlihy & Shavit22 Locks do not compose add(T 1, item) delete(T 1, item) add(T 2, item) item Move from T 1 to T 2 Must lock T 2 before deleting from T 1 lock T2 lock T1 item Exposing lock internals breaks abstraction Hash Table Must lock T 1 before adding item

23 © 2006 Herlihy & Shavit23 Monitor Wait and Signal zzz If buffer is empty, wait for item to show up Empty buffer Yes!

24 © 2006 Herlihy & Shavit24 Wait and Signal do not Compose empty zzz… Wait for either?

25 © 2006 Herlihy & Shavit25 The Transactional Manifesto What we do now is inadequate to meet the multicore challenge Research Agenda –Replace locking with a transactional API –Design languages to support this model –Implement the run-time to be fast enough

26 © 2006 Herlihy & Shavit26 Transactions Atomic –Commit: takes effect –Abort: effects rolled back Usually retried Linearizable –Appear to happen in one-at-a-time order

27 © 2006 Herlihy & Shavit27 atomic { x.remove(3); y.add(3); } atomic { y = null; } Atomic Blocks

28 © 2006 Herlihy & Shavit28 atomic { x.remove(3); y.add(3); } atomic { y = null; } Atomic Blocks No data race

29 © 2006 Herlihy & Shavit29 Public void LeftEnq(item x) { Qnode q = new Qnode(x); q.left = this.left; this.left.right = q; this.left = q; } Sadistic Homework Revisited (1) Write sequential Code

30 © 2006 Herlihy & Shavit30 Public void LeftEnq(item x) { atomic { Qnode q = new Qnode(x); q.left = this.left; this.left.right = q; this.left = q; } Sadistic Homework Revisited (1)

31 © 2006 Herlihy & Shavit31 Public void LeftEnq(item x) { atomic { Qnode q = new Qnode(x); q.left = this.left; this.left.right = q; this.left = q; } Sadistic Homework Revisited (1) Enclose in atomic block

32 © 2006 Herlihy & Shavit32 Warning Not always this simple –Conditional waits –Enhanced concurrency –Overlapping locks But often it is –Works for sadistic homework

33 © 2006 Herlihy & Shavit33 Public void Transfer(Queue q1, q2) { atomic { T x = q1.deq(); q2.enq(x); } Composition (1) Trivial or what?

34 © 2006 Herlihy & Shavit34 Public T LeftDeq() { atomic { if (this.left == null) retry; … } Wake-ups: lost and found (1) Roll back transaction and restart when something changes

35 © 2006 Herlihy & Shavit35 OrElse Composition atomic { x = q1.deq(); } orElse { x = q2.deq(); } Run 1 st method. If it retries … Run 2 nd method. If it retries … Entire statement retries

36 Transactional Memory Software transactional memory (STM) Hardware transactional memory (HTM) Hybrid transactional memory (HyTM, try in hardware and default to software if unsuccessful) © 2006 Herlihy & Shavit36

37 © 2006 Herlihy & Shavit37 Hardware versus Software Do we need hardware at all? –Analogies: Virtual memory: yes! Garbage collection: no! –Probably do need HW for performance Do we need software? –Policy issues don’t make sense for hardware

38 Design Issues Implementation choices Language design issues Semantic issues

39 Granularity Object –managed languages, Java, C#, … –Easy to control interactions between transactional & non-trans threads Word –C, C++, … –Hard to control interactions between transactional & non-trans threads

40 Direct/Deferred Update Deferred –modify private copies & install on commit –Commit requires work –Consistency easier Direct –Modify in place, roll back on abort –Makes commit efficient –Consistency harder

41 Conflict Detection Eager –Detect before conflict arises –“Contention manager” module resolves Lazy –Detect on commit/abort Mixed –Eager write/write, lazy read/write …

42 Conflict Detection Eager detection may abort transaction that could have committed. Lazy detection discards more computation.

43 Contention Managers Oracle that resolves conflicts –For eager conflict detection CM decides –Whether to abort other transaction –Or give it a chance to finish …

44 Contention Manager Strategies Exponential backoff Oldest gets priority Most work done gets priority Non-waiting has priority over waiting Lots of alternatives –None seems to dominate –But choice can have big impact

45 I/O & System Calls? Some I/O revocable –Provide transaction-safe libraries –Undoable file system/DB calls Some not –Opening cash drawer –Firing missile

46 I/O & System Calls One solution: make transaction irrevocable –If transaction tries I/O, switch to irrevocable mode. There can be only one … –Requires serial execution No explicit aborts –In irrevocable transactions

47 Exceptions int i = 0; try { atomic { i++; node = new Node(); } } catch (Exception e) { print(i); } int i = 0; try { atomic { i++; node = new Node(); } } catch (Exception e) { print(i); }

48 Exceptions int i = 0; try { atomic { i++; node = new Node(); } } catch (Exception e) { print(i); } int i = 0; try { atomic { i++; node = new Node(); } } catch (Exception e) { print(i); } Throws OutOfMemoryException!

49 Exceptions int i = 0; try { atomic { i++; node = new Node(); } } catch (Exception e) { print(i); } int i = 0; try { atomic { i++; node = new Node(); } } catch (Exception e) { print(i); } Throws OutOfMemoryException! What is printed?

50 Unhandled Exceptions Aborts transaction –Preserves invariants –Safer Commits transaction –Like locking semantics –What if exception object refers to values modified in transaction?

51 Nested Transactions atomic void foo() { bar(); } atomic void bar() { … } atomic void foo() { bar(); } atomic void bar() { … }

52 Nested Transactions Needed for modularity –Who knew that cosine() contained a transaction? Flat nesting –If child aborts, so does parent First-class nesting –If child aborts, partial rollback of child only

53 Open Nested Transactions Normally, child commit –Visible only to parent In open nested transactions –Commit visible to all –Escape mechanism –Dangerous, but useful What escape mechanisms are needed?

54 Privatization Node n; atomic { n = head; if (n != null) head = head.next; } // use n.val many times Node n; atomic { n = head; if (n != null) head = head.next; } // use n.val many times

55 Node n; atomic { n = head; if (n != null) head = head.next; } // use n.val many times Node n; atomic { n = head; if (n != null) head = head.next; } // use n.val many times Privatization “privatize” node

56 Privatization Node n; atomic { n = head; if (n != null) head = head.next; } // use n.val many times Node n; atomic { n = head; if (n != null) head = head.next; } // use n.val many times Thread 0 Access without transactional synchronization

57 Transactions Node n; atomic { n = head; if (n != null) head = head.next; } // use n.val many times Node n; atomic { n = head; if (n != null) head = head.next; } // use n.val many times atomic { Node n = head; while (n != null) { n.val++; n = n.next; } atomic { Node n = head; while (n != null) { n.val++; n = n.next; } Thread 0 Thread 1

58 Transactions Node n; atomic { n = head; if (n != null) head = head.next; } // use n.val many times Node n; atomic { n = head; if (n != null) head = head.next; } // use n.val many times atomic { Node n = head; while (n != null) { n.val++; n = n.next; } atomic { Node n = head; while (n != null) { n.val++; n = n.next; } Thread 0 Thread 1 Is this safe?

59 Transactions Thread 0 Thread 1 Is this safe? For locks, yes. Node n; atomic { n = head; if (n != null) head = head.next; } // use n.val many times Node n; atomic { n = head; if (n != null) head = head.next; } // use n.val many times atomic { Node n = head; while (n != null) { n.val++; n = n.next; } atomic { Node n = head; while (n != null) { n.val++; n = n.next; }

60 Transactions Thread 0 Thread 1 Node n; atomic { n = head; if (n != null) head = head.next; } // use n.val many times Node n; atomic { n = head; if (n != null) head = head.next; } // use n.val many times atomic { Node n = head; while (n != null) { n.val++; n = n.next; } atomic { Node n = head; while (n != null) { n.val++; n = n.next; } Is this safe? For locks, yes. For STMs, maybe not.

61 Strong vs Weak Isolation Non-transactional accesses don’t synchronize Non-transactional thread may see delayed updates/recovery Solutions –Privatization barrier? –Other techniques?

62 Single Global Lock Semantics Transactions act as if it acquires SGL Good: –Intuitively appealing Bad: –What about aborted transactions? –May be expensive to implement

63 Simple Lock-Based STM STMs come in different forms –Lock-based –Lock-free Here we will describe a simple lock- based STM

64 Synchronization Transaction keeps –Read set: locations & values read –Write set: locations & values to be written Deferred update –Changes installed at commit Lazy conflict detection –Conflicts detected at commit

65 © 2006 Herlihy & Shavit65 STM: Transactional Locking Map Array of Versioned- Write-Locks Application Memory V#

66 © 2006 Herlihy & Shavit66 Reading an Object Put V#s & value in RS If not already locked Mem Locks V#

67 © 2006 Herlihy & Shavit67 To Write an Object Add V# and new value to WS Mem Locks V#

68 © 2006 Herlihy & Shavit68 To Commit Acquire W locks Check V#s unchanged In RS & WS Install new values Increment V#s Release … Mem Locks V# X Y V#+1

69 Transactional Consistency Application Memory x y 4 2 8 4 Invariant x = 2y Transaction A: Write x Write y Transaction B: Read x Read y Compute z = 1/(x-y) = 1/2

70 Zombies! A transaction that will abort because of a synchronization conflict But doesn’t know it yet And keeps running

71 Zombies! Application Memory x y 4 2 8 4 Invariant x = 2y Transaction A: Write x (kills B) Write y Transaction B: Read x = 4 Transaction B: (zombie) Read y = 4 Compute z = 1/(x-y) DIV by 0 ERROR

72 Solution: The “Global Clock” Have one shared global clock Incremented by (small subset of) writing transactions Read by all transactions Used to validate that state worked on is always consistent

73 © 2006 Herlihy & Shavit73 Read-Only Transactions Copy V Clock to RV Read lock,V# Read mem Check unlocked Recheck V# unchanged Check V# < RV Mem Locks 12 32 56 19 17 100 Shared Version Clock Private Read Version 100 Reads form a snapshot of memory. No read set!

74 © 2006 Herlihy & Shavit74 Regular Transactions Copy V Clock to RV On read/write, check: Unlocked V# ≤ RV Add to R/W set Mem Locks 100 Shared Version Clock Private Read Version 100 12 32 56 19 17

75 © 2006 Herlihy & Shavit75 Regular Transactions Acquire locks WV = F&Inc(V Clock) Check each V# ≤ RV Update memory Set write V#s to WV Mem Locks 100 Shared Version Clock Private Read Version 100 101 x y 12 32 56 19 17 100

76 © 2006 Herlihy & Shavit76 Hardware Transactional Memory Exploit Cache coherence protocols Already do almost what we need –Invalidation –Consistency checking Exploit Speculative execution –Branch prediction = optimistic synch!

77 © 2006 Herlihy & Shavit77 HW Transactional Memory Interconnect caches memory read active T

78 © 2006 Herlihy & Shavit78 Transactional Memory caches memory read active TT

79 © 2006 Herlihy & Shavit79 Transactional Memory caches memory active TT committed

80 © 2006 Herlihy & Shavit80 Transactional Memory caches memory write active committed TD

81 © 2006 Herlihy & Shavit81 Rewind caches memory active TT write aborted D

82 © 2006 Herlihy & Shavit82 Transaction Commit At commit point –If no cache conflicts, we win. Mark transactional entries –Read-only: valid –Modified: dirty (eventually written back) That’s all, folks! –Except for a few details …

83 © 2006 Herlihy & Shavit83 Not all Skittles and Beer Limits to –Transactional cache size –Scheduling quantum Transaction cannot commit if it is –Too big –Too slow –Actual limits platform-dependent

84 Approaches Virtualization –Spill transactional cache to memory Hybrid –Use limited HTM as HW accelerator for STM –Sun’s Rock™ processor …

85 Back to Amdahl’s Law: Speedup = 1/(ParallelPart/N + SequentialPart) Pay for N = 8 cores SequentialPart = 25% Speedup = only 2.9 times! Must parallelize applications on a very fine grain! So…How will we make use of multicores?

86 Art of Multiprocessor Programming 86 Traditional Scaling Process User code Traditional Uniprocessor Speedup 1.8x 7x 3.6x Time: Moore’s law

87 Art of Multiprocessor Programming 87 Multicore Scaling Process User code Multicore Speedup 1.8x7x3.6x Unfortunately, not so simple…

88 Art of Multiprocessor Programming 88 Actual Scaling Process 1.8x 2x 2.9x User code Multicore Speedup Parallelization and Synchronization require great care…

89 Need Ways to Make Scaling Smoother… TM code Speedup 1.8x7x3.6x Multicore User code

90 I, for one, Welcome our new Multicore Overlords … Multicore architectures force us to rethink how we do synchronization Standard locking model won’t work Transactional model might –Software –Hardware A full-employment act!


Download ppt "Transactional Memory The Art of Multiprocessor Programming Herlihy and Shavit."

Similar presentations


Ads by Google