Recording Inter-Thread Data Dependencies for Deterministic Replay. Tarun Goyal, Kevin Waugh, Arvind Gopalakrishnan.
Debugging Multithreaded Programs. Debuggers are always helpful. Aim of the discussion: deterministic replay of multiprocessor execution by recording non-deterministic events, especially memory races. Systems covered: Flight Data Recorder (FDR), BugNet, and Strata.
FDR: Approach. Deterministic replayers and data-race detectors already exist; FDR also records operating-system and I/O activity, enabling full-system replay.
FDR: Assumptions. Sequential consistency (SC); a directory-based coherence scheme; cache size equal to memory size.
FDR: Kinds of Logs. Three kinds of logs meet the performance, space, and complexity requirements: (1) checkpoint log: to restore a consistent state, FDR takes checkpoints and logs the old value of memory on updates; (2) race log: records the outcomes of races, assuming SC, and logs only a subset (races implied by other logged races are omitted); (3) system I/O log: records interrupt timing and treats device interfaces as pseudo-processors. Time and space overhead are low enough for recording to stay continuously enabled.
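The checkpoint log can be illustrated with a minimal undo-log sketch in Python. It assumes memory is a dict mapping block addresses to values, and that only the first update to a block in each checkpoint interval needs its old value saved, since restoring only requires the value at the checkpoint. The class and method names are hypothetical; the sketch models the idea, not FDR's hardware.

```python
from __future__ import annotations

_ABSENT = object()  # marks blocks that did not exist at the checkpoint


class CheckpointLog:
    """Hypothetical undo log: saves old values so memory can roll back to the last checkpoint."""

    def __init__(self, memory: dict):
        self.memory = memory
        self.undo: dict = {}  # block -> value at the last checkpoint

    def checkpoint(self) -> None:
        # The current memory state becomes the new recovery point.
        self.undo = {}

    def write(self, block, value) -> None:
        # Log the old value only on the first update since the checkpoint;
        # later updates do not matter because restore returns to the checkpoint.
        if block not in self.undo:
            self.undo[block] = self.memory.get(block, _ABSENT)
        self.memory[block] = value

    def restore(self) -> None:
        # Undo every logged update, returning memory to the checkpoint state.
        for block, old in self.undo.items():
            if old is _ABSENT:
                self.memory.pop(block, None)
            else:
                self.memory[block] = old
        self.undo = {}
```

Logging only the first update per block keeps the log proportional to the number of distinct blocks written in an interval rather than the number of stores.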
Recording Races. Replay requires logging the non-deterministic thread interleaving, i.e., the outcomes of races. Question: how much must be logged? The answer depends on the memory model (here, SC). FDR records arcs, ordered pairs of dynamic instructions, but not all of them. Timestamps of cached blocks are stored; missing timestamps are approximated.
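Arc recording can be sketched as a small software model. This is not FDR's hardware: it assumes SC, gives every block its own timestamp as if the cache were as large as memory, and uses hypothetical names (`ArcRecorder`, `access`). Each thread counts its dynamic memory instructions (other instructions are omitted for brevity); a block remembers its last writer and the readers since that write, and an access that conflicts with another thread's access logs an arc from the source event to the destination event.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Block:
    last_write: tuple | None = None            # (thread, count) of the last write
    reads: dict = field(default_factory=dict)  # thread -> count of its last read since that write


class ArcRecorder:
    """Hypothetical SC race recorder that logs arcs between conflicting accesses."""

    def __init__(self, nthreads: int):
        self.count = [0] * nthreads  # dynamic memory-instruction count per thread
        self.blocks: dict = {}       # every block keeps a timestamp (cache as large as memory)
        self.log: list = []          # arcs: ((src_thread, src_count), (dst_thread, dst_count))

    def access(self, tid: int, addr, is_write: bool) -> None:
        self.count[tid] += 1
        now = (tid, self.count[tid])
        block = self.blocks.setdefault(addr, Block())
        # Write->read and write->write: order after another thread's last write.
        if block.last_write is not None and block.last_write[0] != tid:
            self.log.append((block.last_write, now))
        if is_write:
            # Read->write: order after other threads' reads of the previous value.
            for reader, read_count in block.reads.items():
                if reader != tid:
                    self.log.append(((reader, read_count), now))
            block.last_write = now
            block.reads = {}
        else:
            block.reads[tid] = self.count[tid]
```

When one thread reads and then writes a block that another thread wrote, this model logs two arcs from the same write; the second is implied by the first plus program order, which is exactly the redundancy a transitive reduction removes.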
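FDR Issues and Optimizations. Log size: Regulated Transitive Reduction (RTR) judiciously logs stricter vector dependences, so that more dependences are implied and need not be logged. Hardware cost: approximating timestamps using LRU within an associative set causes false races but needs only about 24 KB per core. Simpler design: timestamps are taken out of the cache. TSO model: logging additional information (load values) avoids the replay deadlocks that an SC-based log would cause.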
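The idea of dropping implied dependences can be sketched with a Netzer-style transitive reduction, the baseline that RTR improves on; RTR's stricter vector dependences are not modeled here. The function name and the arc format (source and destination as (thread, instruction count) pairs) are assumptions. For each thread, a vector records the latest instruction count of every other thread it is already ordered after; an arc whose source is not newer than that entry is implied by an earlier logged arc plus program order, so it is dropped. The sketch assumes arcs arrive in program order of their destination thread.

```python
from __future__ import annotations
from typing import List, Tuple

Event = Tuple[int, int]   # (thread id, dynamic instruction count); counts start at 1
Arc = Tuple[Event, Event]  # (source event, destination event)


def reduce_arcs(arcs: List[Arc], nthreads: int) -> List[Arc]:
    """Drop arcs implied by an earlier logged arc plus program order (hypothetical helper).

    Assumes arcs are processed in increasing destination count for each thread.
    """
    # seen[dst][src] = latest count of thread `src` already ordered before thread `dst`
    seen = [[0] * nthreads for _ in range(nthreads)]
    kept: List[Arc] = []
    for (src_t, src_c), (dst_t, dst_c) in arcs:
        if src_c <= seen[dst_t][src_t]:
            continue  # implied: an earlier arc from the same or a later source instruction covers it
        seen[dst_t][src_t] = src_c
        kept.append(((src_t, src_c), (dst_t, dst_c)))
    return kept
```

The vector costs one entry per pair of threads, and a dropped arc is always safe to omit because replay still enforces the earlier arc and each thread's program order.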
BugNet: Net the Bug. Architectural support for deterministic replay debugging, focused on replaying user code and shared libraries. Builds on and improves the ideas of FDR. Claims to be viable for use during application software development.
Architecture Overview. Checkpoint-based recording: each checkpoint interval starts with a snapshot in the checkpoint (CP) buffer (PC and register map). The complete execution of a thread is traced by observing its loads: the initial register values at the checkpoint plus the trace of load values. Tracking loads works in spite of interrupts, DMA transfers, and other threads writing to shared memory. Load bits in the cache avoid logging repeated loads of the same location, reducing log size; the bits are updated when external events store to a block. Logs: First-Load Log (FLL) and Memory Race Buffer (MRB), with dictionary-based compression for the log data.
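First-load logging can be sketched in Python for a single thread. The class and method names are hypothetical, and so is the compression: a small move-to-front table of recent values stands in for BugNet's dictionary compressor, and its size and replacement policy are assumptions. A load is logged only if its load bit is clear; each entry stores how many unlogged loads preceded it plus either a dictionary index or the raw value. A store from an external event clears the bit so the next load is logged again. Setting the bit on the thread's own stores is also an assumption of this sketch, justified because replay re-executes those stores.

```python
from __future__ import annotations


class FirstLoadLog:
    """Hypothetical first-load log (FLL) for one thread with per-word load bits."""

    def __init__(self, dict_size: int = 8):
        self.memory: dict = {}       # address -> value, as seen by this thread
        self.load_bits: set = set()  # addresses whose value replay can already reproduce
        self.log: list = []          # (skipped loads, "dict" or "raw", index or value)
        self.skipped = 0             # loads not logged since the last logged load
        self.dictionary: list = []   # recent values, most recent first
        self.dict_size = dict_size

    def _encode(self, value):
        # Dictionary hit: log the small index and move the value to the front.
        if value in self.dictionary:
            index = self.dictionary.index(value)
            self.dictionary.insert(0, self.dictionary.pop(index))
            return "dict", index
        # Miss: log the raw value and remember it, evicting the oldest entry.
        self.dictionary.insert(0, value)
        del self.dictionary[self.dict_size:]
        return "raw", value

    def new_interval(self) -> list:
        # Start a checkpoint interval: clear load bits and return the finished log.
        finished, self.log = self.log, []
        self.load_bits.clear()
        self.dictionary.clear()  # each interval's log can be decoded on its own
        self.skipped = 0
        return finished

    def external_store(self, addr: int, value: int) -> None:
        # Interrupt, DMA, or another thread changed the word: log its next load.
        self.memory[addr] = value
        self.load_bits.discard(addr)

    def store(self, addr: int, value: int) -> None:
        # Replay re-executes this store, so later loads of it need not be logged.
        self.memory[addr] = value
        self.load_bits.add(addr)

    def load(self, addr: int) -> int:
        value = self.memory.get(addr, 0)  # unwritten words read as 0 (assumption)
        if addr in self.load_bits:
            self.skipped += 1  # value is reproducible during replay
        else:
            kind, payload = self._encode(value)
            self.log.append((self.skipped, kind, payload))
            self.skipped = 0
            self.load_bits.add(addr)
        return value
```

Replaying an interval would restore the checkpointed registers, re-execute the thread, and take a load's value from the log only when the skip count says that load was logged.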
FDR vs BugNet. FDR: features include tracking I/O, interrupts, and DMA accesses, at the cost of extra hardware and log-size overhead. BugNet: focuses on application-level software debugging with a simpler scheme that needs less hardware and produces smaller logs.
Assumptions/Limitations. Assumes a sequentially consistent memory model. Won't help find bugs caused by interactions with the OS and other system code. Usability in mainstream systems is questionable: for debugging user-level applications, is software-based recording more viable?
Strata: Logging Shared Memory Dependencies. On a dependency, record the memory-operation counts of the threads (a stratum). Hardware/cache-based scheme; assumes sequential consistency; designs for both directory-based and snoopy cache coherence. Drop-in replacement for Netzer's scheme: smaller log size and less computation to create the log, but a more complicated replay. Narayanasamy et al., ASPLOS 2006.
Strata (cont.). Low resource overhead: 12% bandwidth on the directory scheme, ~0% on the snoopy scheme. Each stratum holds one word per thread, so stratum size scales linearly with the number of threads; with many threads this is potentially worse than Netzer's scheme.
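The stratum-logging rule can be sketched as a simplified software model with hypothetical names. It assumes SC, tracks per block which threads read or wrote it since the last stratum, and logs a stratum (a copy of every thread's memory-operation count, one word per thread) whenever an access would depend on another thread's access in the current region. Accesses between two consecutive strata then never conflict across threads, so during replay each thread can run until its count reaches its entry in the next stratum, in any interleaving.

```python
from __future__ import annotations


class StrataRecorder:
    """Hypothetical SC model: log a stratum when a cross-thread dependency hits the current region."""

    def __init__(self, nthreads: int):
        self.counts = [0] * nthreads  # memory operations executed so far, per thread
        self.readers: dict = {}       # block -> threads that read it since the last stratum
        self.writers: dict = {}       # block -> threads that wrote it since the last stratum
        self.strata: list = []        # each stratum: one count per thread

    def access(self, tid: int, block, is_write: bool) -> None:
        others_wrote = self.writers.get(block, set()) - {tid}
        others_read = self.readers.get(block, set()) - {tid}
        # Read-after-write, write-after-write, or write-after-read across threads
        # inside the current region: close the region by logging a stratum.
        if others_wrote or (is_write and others_read):
            self.strata.append(list(self.counts))
            self.readers.clear()
            self.writers.clear()
        table = self.writers if is_write else self.readers
        table.setdefault(block, set()).add(tid)
        self.counts[tid] += 1
```

Each stratum has one entry per thread, so its size grows linearly with the thread count, as the slide notes.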
Concerns and Criticisms. All of these systems require hardware support with significant resource overhead; software recording would be slower, but still useful. The consistency models are restrictive and exclude commodity hardware (x86). Encourages sloppy programming. Users != Testers.