1 Caltech CS184 Spring2003 -- DeHon 1 CS184b: Computer Architecture (Abstractions and Optimizations) Day 12: May 3, 2003 Shared Memory

2 Today
Shared Memory
  – Model
  – Bus-based Snooping
  – Cache Coherence
Synchronization
  – Primitives
  – Algorithms
  – Performance

3 Shared Memory Model
Same model as a multithreaded uniprocessor
  – Single, shared, global address space
  – Multiple threads (PCs)
  – Run in the same address space
  – Communicate through memory
      Memory appears identical to all threads
      Communication is hidden from users (looks like a memory op)

4 Synchronization
For correctness, we have to worry about synchronization
  – Otherwise behavior is non-deterministic
  – Threads run asynchronously
  – Without an additional synchronization discipline
      We cannot say anything about relative timing

5 Models
Conceptual model:
  – Processor per thread
  – Single shared memory
Programming model:
  – Sequential language
  – Thread package
  – Synchronization primitives
Architecture model: multithreaded uniprocessor

6 Conceptual Model
[Figure: Memory]

7 Architecture Model Implications
Coherent view of memory
  – Any processor reading at time X will see the same value
  – All writes eventually affect memory
      Until overwritten
  – Writes to memory are seen in the same order by all processors
Sequentially consistent memory view

8 Sequential Consistency
Memory must reflect some valid sequential interleaving of the threads

9 Sequential Consistency
P1: A = 0
    A = 1
    L1: if (B==0)
P2: B = 0
    B = 1
    L2: if (A==0)
Can both conditionals be true?

10 Sequential Consistency
P1: A = 0
    A = 1
    L1: if (B==0)
P2: B = 0
    B = 1
    L2: if (A==0)
Both can be false

11 Sequential Consistency
P1: A = 0
    A = 1
    L1: if (B==0)
P2: B = 0
    B = 1
    L2: if (A==0)
If P1 enters L1, then A must already be 1 → P2 does not enter L2

12 Sequential Consistency
P1: A = 0
    A = 1
    L1: if (B==0)
P2: B = 0
    B = 1
    L2: if (A==0)
If P2 enters L2, then B must already be 1 → P1 does not enter L1
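The argument on these slides can be checked by brute force. The following is a minimal Python sketch that enumerates every sequential interleaving of P1 and P2 and records which conditionals are entered. It assumes both variables start at 0; the names `interleavings`, `run`, and `outcomes` are illustrative and not from the lecture.

```python
from itertools import combinations

# Each thread is a list of (operation, variable, argument) steps in program order.
P1 = [("write", "A", 0), ("write", "A", 1), ("test", "B", "L1")]
P2 = [("write", "B", 0), ("write", "B", 1), ("test", "A", "L2")]

def interleavings(a, b):
    """Yield every merge of a and b that preserves each thread's program order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        ai, bi = iter(a), iter(b)
        yield [next(ai) if i in slots else next(bi) for i in range(n)]

def run(schedule):
    """Execute one interleaving against a single shared memory."""
    memory = {"A": 0, "B": 0}  # assumed initial contents
    entered = set()
    for op, var, arg in schedule:
        if op == "write":
            memory[var] = arg
        elif memory[var] == 0:  # "test": the conditional is true
            entered.add(arg)
    return frozenset(entered)

outcomes = {run(s) for s in interleavings(P1, P2)}
both_entered = frozenset({"L1", "L2"}) in outcomes
```

Under this model `both_entered` is False: the reachable outcomes are only L1 entered, only L2 entered, or neither.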

13 Coherence Alone
Coherent view of memory
  – Any processor reading at time X will see the same value
  – All writes eventually affect memory
      Until overwritten
  – Writes to memory are seen in the same order by all processors
Coherence alone does not guarantee sequential consistency

14 Sequential Consistency
P1: A = 0
    A = 1
    L1: if (B==0)
P2: B = 0
    B = 1
    L2: if (A==0)
If changes to the variables (the assignments to A and B) are not forced to become visible, both threads could end up inside their conditionals.
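One way writes can fail to become visible in time is a per-processor write buffer. That mechanism is an assumption chosen for illustration; the slide does not name one. The Python sketch below models it with a hypothetical `BufferedProcessor` class and shows both conditionals being entered even though every write eventually reaches memory.

```python
class BufferedProcessor:
    """Processor whose writes sit in a private buffer until drained (illustrative model)."""

    def __init__(self, memory):
        self.memory = memory
        self.buffer = []  # pending (variable, value) writes, invisible to other processors

    def write(self, var, value):
        self.buffer.append((var, value))

    def read(self, var):
        # A processor sees its own pending writes first, newest first.
        for pending_var, value in reversed(self.buffer):
            if pending_var == var:
                return value
        return self.memory[var]

    def drain(self):
        # Writes finally reach shared memory, in program order.
        for var, value in self.buffer:
            self.memory[var] = value
        self.buffer.clear()

memory = {"A": 0, "B": 0}  # assumed initial contents
p1, p2 = BufferedProcessor(memory), BufferedProcessor(memory)

p1.write("A", 0); p1.write("A", 1)
p2.write("B", 0); p2.write("B", 1)
entered_l1 = p1.read("B") == 0  # P1 does not see P2's buffered B = 1
entered_l2 = p2.read("A") == 0  # P2 does not see P1's buffered A = 1
p1.drain(); p2.drain()
```

Both `entered_l1` and `entered_l2` are True here, an outcome no sequential interleaving allows.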

15 Consistency
Deals with when a written value must be seen by readers
Coherence – with respect to the same memory location
Consistency – with respect to other memory locations
…there are less strict consistency models…

16 Implementation

17 Naïve
What’s wrong with the naïve model?
[Figure: Memory]

18 What’s Wrong?
Memory bandwidth
  – 1 instruction reference per instruction
  – 0.3 memory references per instruction
  – 333 ps cycle
  – N*5 Gwords/s?
Interconnect
Memory access latency
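The bandwidth estimate can be worked through in a few lines of Python. This sketch assumes one instruction issued per cycle, which the slide does not state, and the function name `demand_gwords_per_s` is hypothetical. With that assumption the slide's numbers give about 3.9 Gwords/s per processor, the same order of magnitude as the slide's N*5 Gwords/s estimate.

```python
def demand_gwords_per_s(n_procs, cycle_ps=333.0, instr_refs=1.0, data_refs=0.3, ipc=1.0):
    """Aggregate memory reference rate in Gwords/s with no caches.

    ipc (instructions per cycle) is an assumption; the slide gives only
    the per-instruction reference counts and the cycle time.
    """
    refs_per_cycle = (instr_refs + data_refs) * ipc
    cycles_per_s = 1e12 / cycle_ps
    return n_procs * refs_per_cycle * cycles_per_s / 1e9

per_processor = demand_gwords_per_s(1)
four_processors = demand_gwords_per_s(4)
```

The demand scales linearly with the number of processors, which is what makes a single uncached shared memory impractical.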

19 Optimizing
How do we improve?

20 Naïve Caching
What happens when we add caches to the processors?
[Figure: four processor (P) and cache ($) pairs sharing one Memory]

21 Naïve Caching
Cached answers may be stale
  Shadow the correct value
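The stale-value problem is easy to reproduce in a toy model. The Python sketch below uses a hypothetical `NaiveCache` class with write-through and no coherence mechanism; the address 0x0300 and the value 42 are arbitrary.

```python
class NaiveCache:
    """Private cache with no coherence: a hit never consults memory."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}

    def read(self, addr):
        if addr not in self.lines:          # miss: fetch from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]             # hit: return the local copy

    def write(self, addr, value):
        self.lines[addr] = value
        self.memory[addr] = value           # write-through, but no other cache is told

memory = {0x0300: 0}
p0, p1 = NaiveCache(memory), NaiveCache(memory)

p1.read(0x0300)            # P1 caches the old value
p0.write(0x0300, 42)       # P0 updates memory
stale = p1.read(0x0300)    # P1 hits in its own cache
```

P1's cached copy shadows the correct value: `stale` is still 0 although memory holds 42.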

22 How to have both?
Keep caching
  – Reduces main memory bandwidth demand
  – Reduces access latency
Satisfy the model

23 Cache Coherence
Make sure everyone sees the same values
Avoid having stale values in caches
At the end of a write, all cached values should be the same

24 Idea
Make sure everyone sees the new value
Broadcast the new value to everyone who needs it
  – Use the bus in a shared-bus system
[Figure: four processor (P) and cache ($) pairs sharing one Memory]

25 Effects
Memory traffic is now just:
  – Cache misses
  – All writes
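A minimal Python sketch of the write-broadcast idea, assuming a single shared bus on which every write is broadcast and every cache holding the address updates its copy. The `Bus` and `UpdateCache` classes and the transaction counter are illustrative, not from the lecture.

```python
class Bus:
    """Shared bus: carries cache misses and all writes (illustrative model)."""

    def __init__(self, memory):
        self.memory = memory
        self.caches = []
        self.transactions = 0

    def read_miss(self, addr):
        self.transactions += 1
        return self.memory[addr]

    def broadcast_write(self, addr, value):
        self.transactions += 1
        self.memory[addr] = value
        for cache in self.caches:
            if addr in cache.lines:   # update every cached copy, including the writer's
                cache.lines[addr] = value

class UpdateCache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = self.bus.read_miss(addr)
        return self.lines[addr]       # hits generate no bus traffic

    def write(self, addr, value):
        self.bus.broadcast_write(addr, value)

bus = Bus({0x0300: 0})
p0, p1 = UpdateCache(bus), UpdateCache(bus)

p1.read(0x0300)           # miss: 1 bus transaction
p0.write(0x0300, 42)      # write: 1 bus transaction, P1's copy is updated
seen = p1.read(0x0300)    # hit: no bus transaction
```

In this run P1 sees 42 and the bus carried exactly two transactions: one miss and one write.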

26 Additional Structure?
Only necessary to write/broadcast a value if someone else has it cached
Can write locally if known to be the sole owner
  – Reduces main memory traffic
  – Reduces write latency

27 Idea
Track usage in cache state
“Snoop” on the shared bus to detect changes in state
[Figure: four processor (P) and cache ($) pairs on a shared bus with Memory; a bus read “RD 0300…” is answered “Someone has copy…”]

28 Cache State
Data in cache can be in one of several states
  – Not cached (not present)
  – Exclusive (not shared)
      Safe to write to
  – Shared
      Must share writes with others
Update state with each memory op

29 Cache Protocol
[Culler/Singh/Gupta 5.13]
RdX = Read Exclusive
Perform Write by:
  Reading exclusive
  Writing locally
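The three states and the RdX-then-write-locally rule can be sketched as a small snooping simulation in Python. This is a simplified model under stated assumptions, not a transcription of the cited figure: the class names `SnoopBus` and `SnoopyCache` are hypothetical, caches have unlimited capacity, and an exclusive owner flushes its data to memory whenever it snoops another request.

```python
from enum import Enum

class State(Enum):
    NOT_CACHED = "not cached"
    SHARED = "shared"
    EXCLUSIVE = "exclusive"

class SnoopBus:
    def __init__(self, memory):
        self.memory = memory
        self.caches = []
        self.log = []  # every bus transaction, in order

    def issue(self, kind, addr, requester):
        self.log.append((kind, addr))
        for cache in self.caches:            # all other caches snoop the request
            if cache is not requester:
                cache.snoop(kind, addr)
        return self.memory[addr]             # read after any owner has flushed

class SnoopyCache:
    def __init__(self, bus):
        self.bus = bus
        self.state = {}
        self.data = {}
        bus.caches.append(self)

    def _state(self, addr):
        return self.state.get(addr, State.NOT_CACHED)

    def read(self, addr):
        if self._state(addr) is State.NOT_CACHED:
            self.data[addr] = self.bus.issue("Rd", addr, self)
            self.state[addr] = State.SHARED
        return self.data[addr]

    def write(self, addr, value):
        if self._state(addr) is not State.EXCLUSIVE:
            self.data[addr] = self.bus.issue("RdX", addr, self)  # read exclusive
            self.state[addr] = State.EXCLUSIVE
        self.data[addr] = value                                   # write locally

    def snoop(self, kind, addr):
        st = self._state(addr)
        if st is State.EXCLUSIVE:
            self.bus.memory[addr] = self.data[addr]  # supply the up-to-date value
        if kind == "RdX" and st is not State.NOT_CACHED:
            self.state[addr] = State.NOT_CACHED      # someone else wants to write
        elif kind == "Rd" and st is State.EXCLUSIVE:
            self.state[addr] = State.SHARED          # someone else now has a copy

bus = SnoopBus({0x0300: 0})
p0, p1 = SnoopyCache(bus), SnoopyCache(bus)

p0.read(0x0300)                 # Rd
p1.read(0x0300)                 # Rd
p0.write(0x0300, 7)             # RdX invalidates P1's copy
p0.write(0x0300, 8)             # already exclusive: no bus traffic
value_seen = p1.read(0x0300)    # Rd forces P0 to flush and drop to shared
```

In this run the second write by P0 stays off the bus, and P1's later read still returns the latest value, 8.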

30 Snoopy Cache Organization
[Culler/Singh/Gupta 6.4]

31 Cache States
Extra bits in cache
  – Like valid, dirty

32 Misses
Numbers are cache line sizes
[Culler/Singh/Gupta 5.23]

33 Misses
[Culler/Singh/Gupta 5.27]

34 Synchronization

35 Problem
If correctness requires an ordering between threads,
  – have to enforce it
Was not a problem we had in the single-thread case
  – does occur in the case of multiple threads on a single processor

36 Desired Guarantees
Precedence
  – Barrier synchronization
      Everything before the barrier completes before anything after it begins
  – Producer-consumer
      Consumer reads the value produced by the producer
Atomic operation set
Mutual exclusion
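The barrier guarantee can be illustrated with Python's standard `threading.Barrier`. This is a sketch of the guarantee itself, not of how a barrier is built from the primitives discussed later; the worker function and array names are illustrative.

```python
import threading

N = 4
phase1 = [None] * N   # each thread writes its own slot before the barrier
phase2 = [None] * N   # each thread reads every slot after the barrier
barrier = threading.Barrier(N)

def worker(i):
    phase1[i] = i * i          # work before the barrier
    barrier.wait()             # nobody proceeds until all N threads arrive
    phase2[i] = sum(phase1)    # safe: every phase1 slot has been written

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because everything before the barrier completes before anything after it begins, every thread computes the same sum, 0 + 1 + 4 + 9 = 14.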

37 Read/Write Locks?
Try to implement a lock with only reads and writes:
if (~A.lock)
    A.lock=true
    do stuff
    A.lock=false

38 Problem with R/W locks?
Consider a context switch between the test (~A.lock = true?) and the assignment (A.lock=true)
if (~A.lock)
    A.lock=true
    do stuff
    A.lock=false
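The race can be replayed deterministically. In this Python sketch each thread is a generator that yields wherever a context switch might occur, and a hypothetical `run` helper steps the threads in a chosen order. The dictionary fields are illustrative bookkeeping, not part of the slide's code.

```python
def naive_lock_thread(shared):
    """The slide's read/write 'lock', pausing at each possible context switch."""
    saw_locked = shared["lock"]          # test: read A.lock
    yield                                # context switch between test and assignment
    if not saw_locked:
        shared["lock"] = True            # assignment: A.lock = true
        shared["inside"] += 1
        shared["max_inside"] = max(shared["max_inside"], shared["inside"])
        yield                            # do stuff
        shared["inside"] -= 1
        shared["lock"] = False           # A.lock = false

def run(schedule):
    """Step two threads in the given order; return the most threads ever inside at once."""
    shared = {"lock": False, "inside": 0, "max_inside": 0}
    threads = [naive_lock_thread(shared), naive_lock_thread(shared)]
    for tid in schedule:
        next(threads[tid], None)         # a finished thread is simply skipped
    return shared["max_inside"]

unlucky = run([0, 1, 0, 1, 0, 1])   # switch right after each test
lucky = run([0, 0, 0, 1, 1, 1])     # no switch between test and assignment
```

With the unlucky schedule both threads pass the test before either sets the lock, so both are inside at once (`unlucky` is 2); with the lucky one only one is (`lucky` is 1).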

39 Primitive Need
Need an indivisible primitive to enable atomic operations

40 Original Examples
Test-and-set
  – combine the test of A.lock and the set into a single atomic operation
  – once a thread has the lock, it can guarantee mutual exclusion at a higher level
Read-Modify-Write
  – atomic read…write sequence
Exchange

41 Examples (cont.)
Exchange
  – Exchange true with A.lock
  – if the value retrieved was false
      this process got the lock
  – if the value retrieved was true
      already locked (didn’t change the value)
      keep trying
  – the key is that only a single exchanger gets the false value
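The exchange-based lock can be sketched in Python. Python has no hardware exchange instruction, so the hypothetical `AtomicCell` class below uses an internal `threading.Lock` purely to stand in for the indivisibility the hardware would provide; the spin lock built on top follows the slide's rule.

```python
import threading
import time

class AtomicCell:
    """A memory word with an indivisible exchange (hardware atomicity is modeled, not real)."""

    def __init__(self, value):
        self._value = value
        self._hw = threading.Lock()   # stands in for the atomic bus/cache operation

    def exchange(self, new):
        with self._hw:
            old, self._value = self._value, new
            return old

def acquire(cell):
    # Exchange true with A.lock; getting false back means we took the lock.
    while cell.exchange(True):
        time.sleep(0)                 # already locked: yield and keep trying

def release(cell):
    cell.exchange(False)

a_lock = AtomicCell(False)
shared = {"counter": 0}

def worker(iterations):
    for _ in range(iterations):
        acquire(a_lock)
        shared["counter"] += 1        # critical section
        release(a_lock)

threads = [threading.Thread(target=worker, args=(200,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Since only one exchanger can receive the false value, every increment happens under mutual exclusion and the counter ends at 4 × 200 = 800.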

42 Implementing...
What is required to implement?
  – Uniprocessor
  – Bus-based

43 Implement: Uniprocessor
Prevent interrupt/context switch
Primitives use a single address
  – so any page fault occurs at the beginning
  – then ok to do the computation (defer faults…)
SMT?

44 Implement: Snoop Bus
Need to reserve for write
  – write-through
      hold the bus between read and write
      Guarantee no operation can intervene
  – write-back
      need exclusive read
      and a way to defer other writes until written

45 Performance Concerns?
Locking resources reduces parallelism
Bus (network) traffic
Processor utilization
Latency of operation

46 Basic Synch. Components
Acquisition
Waiting
Release

47 Possible Problems
Spin wait generates considerable memory traffic
Release traffic
Bottleneck on resources
Invalidation
  – can’t cache locally…
Fairness

48 Test-and-Set
Try: t&s R1, A.lock    # R1 = old A.lock; A.lock = 1, atomically
     bnz R1, Try       # was already locked: retry
     return
Simple algorithm generates considerable traffic
p contenders
  – p try first, 1 wins
  – for O(1) time p-1 spin
  – …then p-2…
  – c*(p + (p-1) + (p-2) + …)
  – O(p^2)

49 Backoff
Instead of immediately retrying
  – wait some time before retrying
  – reduces contention
  – may increase latency
      (what if I’m the only contender and the lock is about to be released?)
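A minimal Python sketch of retry with backoff. The slide says only to wait before retrying; the doubling (exponential) schedule, the delay cap, and the names `acquire_with_backoff`, `try_lock`, and `wait` are assumptions made for illustration.

```python
def acquire_with_backoff(try_lock, wait, base_delay=1, max_delay=64):
    """Retry try_lock() until it succeeds, waiting longer after each failure.

    try_lock: callable returning True when the lock was obtained.
    wait:     callable taking the delay to wait (e.g. time.sleep in real use).
    Returns the number of attempts made.
    """
    delay = base_delay
    attempts = 1
    while not try_lock():
        wait(delay)                        # stay off the bus for a while
        delay = min(2 * delay, max_delay)  # assumed policy: double, up to a cap
        attempts += 1
    return attempts

# Usage with a scripted lock that is busy three times, then free.
outcomes = iter([False, False, False, True])
delays = []
attempts = acquire_with_backoff(lambda: next(outcomes), delays.append)
```

In this run the contender waits 1, 2, then 4 time units and succeeds on the fourth attempt. The growing waits are the latency cost the slide warns about.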

50 Primitive Bus Performance
[Culler/Singh/Gupta 5.29]

51 Bad Effects
Performance decreases as the number of users grows
  – From the growing traffic already noted

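52 Test-test-and-Set
Try: ld  R1, A.lock    # ordinary read, can hit in the local cache
     bnz R1, Try       # still locked: keep spinning locally
     t&s R1, A.lock    # looked free: now try to take it atomically
     bnz R1, Try       # someone else got it first: retry
     return
Read can be to the local cache
  Does not generate bus traffic
Generates less contention traffic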

53 Detecting atomicity sufficient
Fine to detect whether the operation will appear atomic
Pair of instructions
  – ll (load locked)
      load value and mark it in the cache as locked
  – sc (store conditional)
      stores value iff there was no intervening write to the address
      e.g. cache line never invalidated by a write

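54 LL/SC operation
Try: LL   R1, A.lock   # load locked
     BNZ  R1, Try      # lock already held: retry
     SC   R2, A.lock   # store conditional; assuming MIPS-style SC, R2 becomes 0 on failure
     BEQZ R2, Try      # store failed: retry
     return from lock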

55 LL/SC
Pair doesn’t really lock the value
Just detects whether the result would appear that way
Ok to have arbitrary interleaving between LL and SC
Ok to have capacity eviction between LL and SC
  – will just fail and retry
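The detect-rather-than-lock behavior can be modeled in Python. The hypothetical `Word` class below keeps a set of processors holding an LL reservation, and any successful write clears all reservations. This is a simplified stand-in for marking and invalidating a cache line, not a description of real hardware.

```python
class Word:
    """A memory word with LL reservations tracked per processor (simplified model)."""

    def __init__(self, value=0):
        self.value = value
        self.reserved = set()            # processors with a live LL reservation

    def load_locked(self, pid):
        self.reserved.add(pid)
        return self.value

    def store_conditional(self, pid, new):
        if pid not in self.reserved:     # an intervening write was detected
            return False
        self.value = new
        self.reserved.clear()            # this write kills every outstanding reservation
        return True

    def store(self, new):                # ordinary store, e.g. unlock
        self.value = new
        self.reserved.clear()

def try_lock(word, pid):
    """One pass of the LL/SC lock sequence; True if the lock was taken."""
    if word.load_locked(pid) != 0:
        return False                     # already held
    return word.store_conditional(pid, 1)

a_lock = Word(0)
a_lock.load_locked(0)                        # P0 and P1 both see the lock free...
a_lock.load_locked(1)
p1_won = a_lock.store_conditional(1, 1)      # ...P1's store lands first
p0_won = a_lock.store_conditional(0, 1)      # P0's reservation is gone: fails
retry = try_lock(a_lock, 0)                  # lock is held: P0 must keep trying
a_lock.store(0)                              # P1 unlocks
after_unlock = try_lock(a_lock, 0)           # now P0 succeeds
```

Nothing is locked between LL and SC; P0's failed store is simply detected and retried, as the slide describes.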

56 LL/SC and MP Traffic
Address can be cached
Spinning on LL does not generate global traffic (everyone has their own copy)
After a write (e.g. unlock)
  – everyone misses: O(p) message traffic
No need to lock down the bus during the operation

57 Performance Bus
[Culler/Singh/Gupta 5.30]
[talk about array+ticket later]

58 Big Ideas
Simple Model
  – Preserve model
  – While optimizing implementation
Exploit Locality
  – Reduce bandwidth and latency

59 Big Ideas
Simple primitives
  – Must have primitives to support atomic operations
  – Don’t have to implement atomically
      just detect non-atomicity
Make fast case common
  – optimize for locality
  – minimize contention

