
Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark D. Hill & David A. Wood Presented by: Eduardo Cuervo.




1 Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark D. Hill & David A. Wood Presented by: Eduardo Cuervo

2
• Previous TM systems abort fast, commit slow
  ◦ Old values “in place”
  ◦ New values somewhere else
• Commit is the common case!
  ◦ Remember Amdahl’s Law
• Conflicts usually solved by hardware
  ◦ Fast but myopic
  ◦ Trapping to SW if needed for careful resolution

3
                           Lazy version management   Eager version management
Lazy conflict detection    OCC DBMSs, TCC            none
Eager conflict detection   LTM, VTM                  CCC DBMSs, UTM, LogTM

4
• Eager version management
  ◦ Puts new values in place for faster commits
  ◦ No data moves even on cache overflow
• Eager conflict detection
  ◦ Detects offending ld/st immediately
  ◦ Fast conflict detection on evicted blocks
  ◦ Fast commit by lazy reset of directory state
• Handle aborts by SW
  ◦ Aborts are much less common than commits

5
• Per-thread log in cacheable virtual memory
  ◦ On a store, logs the address and previous contents of the block
• Write (W) bit
  ◦ Tracks whether a block has already been stored to and logged
• Faster commits
  ◦ Clear W bits and reset log (pointer)
• Slower aborts
  ◦ Also has to write old values back
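
The logging scheme on this slide can be sketched as a toy software model. This is only an illustration under stated assumptions, not the hardware design: the class name `LogTMThread`, the dict-based memory, and the set used for W bits are all hypothetical.

```python
class LogTMThread:
    """Toy model of one thread's undo log (all names are illustrative)."""

    def __init__(self, memory):
        self.memory = memory   # dict: block address -> block contents
        self.log = []          # undo log of (address, old contents) pairs
        self.w_bits = set()    # blocks already stored to and logged
        self.in_tx = False

    def begin(self):
        self.in_tx = True

    def store(self, addr, value):
        if self.in_tx and addr not in self.w_bits:
            # First store to this block in the transaction: log the old
            # contents once and set the W bit.
            self.log.append((addr, self.memory.get(addr)))
            self.w_bits.add(addr)
        self.memory[addr] = value  # the new value goes in place

    def commit(self):
        # Fast path: new values are already in place, so only the
        # W bits and the log need to be reset.
        self.w_bits.clear()
        self.log.clear()
        self.in_tx = False

    def abort(self):
        # Slow path: walk the log in reverse and write old values back.
        for addr, old in reversed(self.log):
            self.memory[addr] = old
        self.w_bits.clear()
        self.log.clear()
        self.in_tx = False
```

In this model a second store to the same block adds no log entry, which is the job the W bit does on the slide.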

6 [Figure: log example, columns Virtual Address, Data Block, R, W, plus LogBase, LogPtr and TMcount registers. Transaction begins: LogBase 1000, LogPtr 1000, TMcount 1; blocks 00, 40 and c0 have R and W bits clear.]

7 [Figure: log example, continued. Block 00 is read: its R bit is set.]

8 [Figure: log example, continued. Block c0 is written: address c0 and old data (3 4) are logged at 1000, new data (5 6) is stored in place, the W bit is set, LogPtr advances to 1048.]

9 [Figure: log example, continued. Block 40 is read and written: address 40 and old data (2 3) are logged, new data (2 4) is stored in place, R and W bits are set, LogPtr advances to 1090.]

10 [Figure: log example, commit. R and W bits cleared, LogPtr reset to 1000, TMcount 0; new values remain in place.]

11 [Figure: log example, abort. Old values (2 3 and 3 4) restored from the log, R and W bits cleared, LogPtr reset to 1000, TMcount 0.]

12
• Coherence requests sent to directory
• Directory will forward to other processor(s)
• Processors will detect conflict
  ◦ Using local state
  ◦ Ack/Nack as response
  ◦ Requester resolves any conflict
• Adds read bit to each cache block
• Extends MOESI protocol
  ◦ “Sticky” states
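
The local conflict check can be sketched as a small function. This is a minimal sketch assuming the usual read-set/write-set rule (a read request conflicts with a transactional write, a write request with a transactional read or write); the function name `respond` and its parameters are hypothetical, while the request names GETS and GETX come from the later directory slides.

```python
def respond(r_bit, w_bit, request):
    """Answer a forwarded coherence request using the block's local R/W bits.

    request is "GETS" (read/shared) or "GETX" (write/exclusive).
    Returns "NACK" on a conflict, otherwise "ACK".
    """
    if request == "GETX":
        # A remote write conflicts with any transactional access.
        conflict = r_bit or w_bit
    elif request == "GETS":
        # A remote read conflicts only with a transactional write.
        conflict = w_bit
    else:
        raise ValueError(f"unknown request type: {request}")
    return "NACK" if conflict else "ACK"
```

For example, `respond(True, True, "GETS")` returns `"NACK"`, which corresponds to a processor holding the block in M (R W) refusing a forwarded GETS.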

13
• Works even after cache overflow
  ◦ Forwards conflicting requests to “interested” processors
• Adds a per-processor overflow bit
  ◦ The transactional block can be updated
  ◦ Requests will still be redirected to the processor
  ◦ Processor can Nack on conflict
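
How the overflow bit might be used once the block has left the cache can be sketched as follows. This assumes a conservative rule (NACK whenever a transaction is active and something has overflowed), inferred from the directory example slides; the function name and parameters are hypothetical.

```python
def respond_after_eviction(overflow, tm_count):
    """Answer a forwarded request for a block that is no longer cached.

    The R/W bits were lost with the evicted block, so the processor can
    only consult its overflow bit and transaction count (assumed rule).
    """
    if tm_count > 0 and overflow:
        # The block may belong to the running transaction: be conservative.
        return "NACK"
    # No active transaction or nothing overflowed: the forwarded request
    # refers to stale directory state and can be acknowledged.
    return "ACK"
```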

14
• Depends on MOESI state
• M: Replace with transactional writeback
  ◦ Sets state as “Sticky@Processor”
  ◦ Requests are forwarded to the processor
• S: Silently replaced
  ◦ Adds processor to sharer list
  ◦ Requests forwarded to all sharers
• O: Write back to directory
  ◦ Add itself to sharer list, same as S if requested exclusively
• E: Same as O

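15 [Figure: directory example. Directory: Idle [old]. P: TMcount 1, Overflow 0, cache state I (- -) [none].]

16 [Figure: directory example, P requests the block exclusively. Messages: GETX, DATA, ACK. Directory: M@P [old]. P: TMcount 1, Overflow 0, M (R W) [new].]

17 [Figure: directory example, Q requests the block. Messages: GETS, Fwd_GETS, NACK, NACK. Directory: M@P [old]. P: TMcount 1, Overflow 0, M (R W) [new]. Q: TMcount 1, Overflow 0, I (- -) [ ].]

18 [Figure: directory example, P evicts the transactional block. Messages shown: PUTX, NACK, WB_XACT. Directory: M@P [new]. P: TMcount 1, Overflow 1, I (- -) [ ].]

19 [Figure: directory example, Q requests the evicted block. Messages: GETS, Fwd_GETS, NACK, NACK. Directory: M@P [new]. P: TMcount 1, Overflow 1, I (- -) [ ]. Q: TMcount 1, Overflow 0, I (- -) [ ].]

20 [Figure: directory example, Q requests the block after P's transaction has ended. Messages: GETS, Fwd_GETS, ACK, DATA, CLEAN. Directory: E@Q [new]. P: TMcount 0, Overflow 0, I (- -) [ ]. Q: TMcount 1, Overflow 0, E (R -) [new].]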

21
• Lazy cleanup is better if overflow is rare
  ◦ Can be improved otherwise (e.g. by using Bloom filters)
• Ambiguities handled conservatively
  ◦ On a refetch, an eviction during the same transaction cannot be told apart from one in an earlier transaction
  ◦ Set R&W bits
  ◦ Log old values


23
• When two transactions conflict
  ◦ At least one must stall or abort
  ◦ Quick myopic decision by HW
  ◦ Slow and careful by SW
• Hybrid approach:
  ◦ HW seeks fast solution, traps to software if problem persists

24
• Distributed timestamp
• Trap to conflict handler (SW) when
  ◦ Transaction could cause deadlock
  ◦ It is logically later than the transaction in conflict
• Per-processor “possible cycle” flag
  ◦ Conflict if a NACK is received from a logically earlier transaction while the possible cycle flag is set
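
The timestamp and possible cycle rule can be sketched as below. The slide only states when a conflict is declared; the rule for setting the flag (when a processor NACKs a logically earlier transaction) is an assumption added here for completeness, and the class and method names are hypothetical.

```python
class Tx:
    """Toy model of one processor's conflict-resolution state."""

    def __init__(self, timestamp):
        self.timestamp = timestamp    # smaller value = logically earlier
        self.possible_cycle = False   # per-processor flag from the slide

    def on_sending_nack(self, requester_ts):
        # Assumption: stalling a logically earlier transaction means this
        # transaction could be part of a deadlock cycle.
        if requester_ts < self.timestamp:
            self.possible_cycle = True

    def on_receiving_nack(self, nacker_ts):
        # Trap to the software conflict handler only if the NACK came from
        # a logically earlier transaction and the flag is already set.
        if nacker_ts < self.timestamp and self.possible_cycle:
            return "trap"
        return "stall"
```

Under this rule the logically earlier of two transactions always keeps stalling and retrying, so only the later one ever traps to software.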

25
• Target System
  ◦ SPARC Solaris, 32 processors, 1 GHz
  ◦ L1: 16 KB 4-way split, 1-cycle latency
  ◦ L2: 4 MB 4-way unified, 12-cycle latency
  ◦ Memory: 4 GB, 80-cycle latency
  ◦ Directory: Full-bit vector sharer list, migratory sharing optimization, directory cache, 6-cycle latency
  ◦ Interconnection: Hierarchical switch topology, 14-cycle link latency
• Simulated using Simics
  ◦ LogTM interface added by “magic” instructions

26
• Shared counter micro-benchmark
• Compared to
  ◦ Exponential Backoff
  ◦ MCS locks
• LogTM outperforms them
• LogTM does not abort transactions

27
• Evaluated using a subset of SPLASH-2
• Used two versions of raytrace (with/without false sharing)
• False sharing has significant impact!
• Performance gains from moderate to large

28
• LogTM must read a block before writing it to the log
  ◦ Benchmarks showed that data is usually read anyway
• LogTM is more sensitive to false sharing than lock approaches
• The log is required to be valid only until an abort
  ◦ A k-block log write buffer avoids most log writes, as shown in the benchmarks

29
• TCC
  ◦ Lazy version management (slow commits)
  ◦ Lazy conflict detection (detect on commit)
• LTM
  ◦ On overflow stores new values in uncacheable in-memory hash table
  ◦ LogTM allows both old and new versions cached

30
• UTM
  ◦ Logs blocks targeted by both loads and stores
  ◦ More complete conflict detection
  ◦ Must walk log on certain coherence requests
• VTM
  ◦ Per-address-space virtual mode for cache evictions, paging, context switches
  ◦ Virtualized VTM uses microcode for conflict detection (LogTM uses a MOESI extension)

31
• Presents a TM implementation designed to speed up the common case
• Efficiently handles cache evictions
• Requires simple architectural changes
  ◦ Registers, state, directory extension
• Work towards hybrid conflict detection
• No paging or context switch support
• Very sensitive to false sharing

