A Regulated Transitive Reduction (RTR) for Longer Memory Race Recording (ASPLOS’06). Min Xu, Rastislav Bodik, Mark D. Hill. Shimin Chen, LBA Reading Group Presentation.


1 A Regulated Transitive Reduction (RTR) for Longer Memory Race Recording (ASPLOS’06). Min Xu, Rastislav Bodik, Mark D. Hill. Shimin Chen, LBA Reading Group Presentation

2 1 Why Do You Need a Recorder?
Sequential program: the crash reproduces under the debugger.
  % gcc sim.c
  % a.out
  Segmentation fault
  % gdb a.out
  gdb> run
  Program received SIGSEGV.
  In get() at hash.c:45
  45 a = bucket->d;
Parallel program: the crash does not reproduce.
  % gcc para-sim.c
  % a.out
  Segmentation fault
  % gdb a.out
  gdb> run
  Program exited normally.
  gdb>
Parallel program with a recorder: the race is logged and the crash replays.
  % gcc para-sim.c
  % a.out
  Segmentation fault
  Race recorded in “log”
  % gdb a.out log
  gdb> run
  Program received SIGSEGV.
  In get() at para-hash.c:67
  67 a = bucket->d;

3 2 Ideally …
  % gcc para-sim.c
  % a.out
  Segmentation fault
  Race recorded in “log”
  % gdb a.out log
  gdb> run
  Program received SIGSEGV.
  In get() at para-hash.c:67
  67 a = bucket->d;
Long recording: small log
Low runtime overhead
Low cost
Applicability:
  Programs: data races
  Systems: non-SC

4 3 Flight Data Recorder (ISCA’03)
Full-system record-replay
Recording memory races:
  Assumes Sequential Consistency (SC)
  Records the order of instruction interleaving
  Targets cache-coherent multiprocessor servers
  Piggybacks on the coherence protocol: little extra H/W
Recording system states: SafetyNet
Recording I/Os
Results:
  Non-trivial recording interval: 1 second
  Negligible runtime overhead: less than 2%
  Can be “Always On”

5 4 RTR
Better memory race log compression: 1 byte per kilo-instruction
Dealing with Total Store Ordering
In this talk, I will try to describe a full picture combining FDR and RTR.

6 5 Outline
  Introduction
  Recording System State
  Recording Input/Output
  Recording Memory Races
  Dealing with TSO
  Summary

7 6 Recording System State (based on SafetyNet)
Purpose: reconstruct the initial state (registers, TLB, main memory) at the beginning of the replay interval
Policy: FDR’s 1-second replay interval
  Take a logical checkpoint every 1/3 second
  Reserve memory space to store logs for 4 checkpoints
Logical checkpoint:
  Quiesce the entire system to take a physical checkpoint
    Register and TLB state (4248 bytes/processor on SPARC V9)
  Log the old value of a cache line upon its first update
  Add an “already-updated” bit per cache line
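
The first-update logging rule above can be sketched as a small undo log. This is a software illustration only, not the SafetyNet hardware: the class name `UndoLog`, the dictionary standing in for memory, and the set standing in for the per-line “already-updated” bits are all assumptions made for the example.

```python
class UndoLog:
    """Toy model of logical checkpointing (hypothetical names, not SafetyNet's design)."""

    def __init__(self, memory):
        self.memory = memory      # cache line address -> current value
        self.updated = set()      # stands in for the per-line "already-updated" bit
        self.log = []             # (address, old value) pairs since the checkpoint

    def checkpoint(self):
        # Start a new checkpoint interval: clear the bits and the log.
        self.updated.clear()
        self.log.clear()

    def write(self, addr, value):
        # Log the old value only on the first update after the checkpoint.
        if addr not in self.updated:
            self.log.append((addr, self.memory.get(addr)))
            self.updated.add(addr)
        self.memory[addr] = value

    def rollback(self):
        # Restore logged old values, newest first, to rebuild the checkpoint state.
        for addr, old in reversed(self.log):
            if old is None:
                # In this toy model, None means the line did not exist before.
                self.memory.pop(addr, None)
            else:
                self.memory[addr] = old
        self.checkpoint()
```

Writing the same line twice adds only one log entry, which is why the log grows with the number of distinct lines touched per interval rather than with the number of stores.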

8 7 FDR paper

10 9 Recording I/O
  I/O loads
  Instruction count + interrupt number
  DMA store values

12 11 Log All Dependence
[Figure: threads I and J, six instructions each (ld/st to A, B, C, D plus add and sub), with cross-thread dependence arrows and the replay order]
Dependence log:
  Log J: 2→3, 1→4, 3→5, 4→6
  Log I: 2→3
Log size: 5 × 16 = 80 bytes (10 integers); each dependence takes 16 bytes
But too many dependences

13 12 Netzer’s Transitive Reduction (TR), approximated by FDR
[Figure: the same two-thread example with the transitively implied dependence removed (“TR reduced”)]
TR-reduced log:
  Log J: 2→3, 3→5, 4→6
  Log I: 2→3
Log size: 64 bytes (8 integers)
How to further reduce log size?
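
The reduction on this slide can be sketched with a per-sender “largest timestamp seen” rule, which is what the H/W slides later call VIC. This is a minimal sketch, assuming each log entry `a→b` means that instruction `a` of the other thread must precede instruction `b` of the logging thread, and that dependences arrive in the receiver's program order. The function name `reduce_log` and the tuple layout are hypothetical. As the slide title says, this only approximates Netzer's TR: it does not propagate knowledge through third threads.

```python
def reduce_log(deps):
    """Drop dependences already implied by an earlier one from the same sender.

    deps: list of (sender, sender_ic, receiver_ic) tuples in receiver program order.
    Returns the dependences that still need to be logged.
    """
    vic = {}      # sender -> largest sender instruction count seen so far
    kept = []
    for sender, sender_ic, receiver_ic in deps:
        # An earlier logged dependence from a later-or-equal sender instruction
        # already orders this one, given program order on both threads.
        if sender_ic <= vic.get(sender, 0):
            continue
        vic[sender] = sender_ic
        kept.append((sender, sender_ic, receiver_ic))
    return kept


# Thread J's full dependence log from the "Log All Dependence" slide (sender is thread I).
log_j = [("I", 2, 3), ("I", 1, 4), ("I", 3, 5), ("I", 4, 6)]
print(reduce_log(log_j))  # [('I', 2, 3), ('I', 3, 5), ('I', 4, 6)]
```

The dependence 1→4 is dropped because 2→3 is already logged: instruction 1 precedes 2 on thread I, and 3 precedes 4 on thread J.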

14 13 RTR
Actively creating artificial dependencies
  Stricter
  Vectorized

15 14 The Intuition of the RTR Algorithm
[Figure: dependences from I to J and from J to I after reduction; vectors “regulate” replay]

16 15 Stricter Dependences to Aid Vectorization
[Figure: the same two-thread example; the stricter dependence 4→5 replaces 3→5 and makes 4→6 redundant]
New reduced log:
  Log J: 2→3, 4→5
  Log I: 2→3
Log size: 48 bytes (6 integers)
Fewer dependencies to log

17 16 Compress Vectorized Dependencies
[Figure: the same two-thread example with the remaining dependences grouped into vectors]
Vectorized log:
  Log J: x=3,5, ∆=1
  Log I: x=3, ∆=1
Log size: 40 bytes (5 integers)
TR → RTR: fewer deps + fewer bytes/dep
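
The log sizes quoted on the last four slides follow from simple arithmetic. This is a minimal sketch, assuming 8-byte integers (inferred from “80 bytes (10 integers)”) and reading a vector entry `x=3,5, ∆=1` as the dependences `x-∆ → x` for each listed `x`. That reading and the helper names are assumptions for illustration, not the paper's encoding.

```python
INT_BYTES = 8  # assumed: the slides equate 10 integers with 80 bytes


def pairwise_log_bytes(num_deps):
    # Each point-to-point dependence stores two instruction counts.
    return num_deps * 2 * INT_BYTES


def expand_vector(xs, delta):
    # Assumed reading of "x=..., ∆=delta": one dependence (x - delta) -> x per x.
    return [(x - delta, x) for x in xs]


def vector_log_bytes(vectors):
    # Each vector stores its x values plus one shared delta.
    return sum((len(xs) + 1) * INT_BYTES for xs, _delta in vectors)


print(pairwise_log_bytes(5))                      # all dependences: 80
print(pairwise_log_bytes(4))                      # after TR: 64
print(pairwise_log_bytes(3))                      # stricter dependences: 48
print(vector_log_bytes([([3, 5], 1), ([3], 1)]))  # vectorized: 40
print(expand_vector([3, 5], 1))                   # [(2, 3), (4, 5)]
```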

18 17

19 18 H/W Considerations
IC: instruction count per core (easy)
VIC[p]: records the largest time stamp previously seen from each sender, for transitive reduction
CTS[b]: time stamp per cache block, i.e. record IC when a load/store commits
At commit time:
  Figure out the memory address: how difficult?
  Write CTS: decoupled timestamp memory

20 19 H/W Considerations Cont’d
Piggyback on cache coherence messages
  FDR: CTS[b]
  RTR: CTS[b] & sender’s IC
Logic to perform the algorithm at the receiver side
  FDR: integer comparison, update VIC[sender], generate log record
  RTR: in addition, max/min, integer subtraction
Augment directory structure
  Record last owner for evicted blocks
  Cache must respond to inquiries about evicted blocks: reply with CTS[SET/LRU]

22 21 Total Store Ordering
FIFO write buffer
A store commits by placing its value into the write buffer
A store is ordered when it exits the write buffer and updates memory
Stores are ordered in commit order (FIFO)
A load can obtain its value from the write buffer or from the memory system
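
The write-buffer behaviour described above can be sketched in a few lines. This is a toy model under assumed names (`Core`, `commit_store`, `drain_one`), not the hardware design from the paper. It shows how, under TSO, two cores can each load the old value of the other's variable.

```python
from collections import deque


class Core:
    """Toy TSO core: a FIFO write buffer in front of shared memory."""

    def __init__(self, memory):
        self.memory = memory
        self.write_buffer = deque()   # (address, value) in commit order

    def commit_store(self, addr, value):
        # A store commits by entering the write buffer; memory is unchanged.
        self.write_buffer.append((addr, value))

    def drain_one(self):
        # The oldest store becomes ordered when it leaves the buffer.
        addr, value = self.write_buffer.popleft()
        self.memory[addr] = value

    def load(self, addr):
        # Forward from the youngest matching buffered store, else read memory.
        for buffered_addr, value in reversed(self.write_buffer):
            if buffered_addr == addr:
                return value
        return self.memory[addr]


memory = {"A": 0, "B": 0}
core_i, core_j = Core(memory), Core(memory)
core_i.commit_store("A", 1)
core_j.commit_store("B", 1)
r_i = core_i.load("B")   # core J's store is still buffered
r_j = core_j.load("A")   # core I's store is still buffered
print(r_i, r_j)          # 0 0
core_i.drain_one()
core_j.drain_one()
print(memory)            # {'A': 1, 'B': 1}
```

Both loads return 0, an outcome that sequential consistency forbids.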

23 22 Problems with TSO
[Figure: two example interleavings; /* XXX */ is memory order]
The two examples create cycles that will result in replay deadlocks

24 23 Solution
Identify problematic load instructions
  Monitor invalidations in [t1, t2]
  t1: the load (or the previous store that feeds the load) is ordered at memory
  t2: all preceding instructions are ordered
Log load values and replay these load instructions by value
HW: similar to the misspeculation detection circuitry in SC systems (e.g. MIPS R10000)
Insufficient for supporting Processor Consistency and other more relaxed models

25 24 Conclusion
RTR → 1 byte/kilo-instruction
Based on Netzer’s transitive reduction
Create stricter dependencies
Vectorize dependencies to compress the log
Avoid overly strict dependencies, hence no deadlock

