Presentation is loading. Please wait.

Presentation is loading. Please wait.

Detecting and surviving data races using complementary schedules Kaushik Veeraraghavan Peter Chen, Jason Flinn, Satish Narayanasamy University of Michigan.

Similar presentations


Presentation on theme: "Detecting and surviving data races using complementary schedules Kaushik Veeraraghavan Peter Chen, Jason Flinn, Satish Narayanasamy University of Michigan."— Presentation transcript:

1 Detecting and surviving data races using complementary schedules Kaushik Veeraraghavan Peter Chen, Jason Flinn, Satish Narayanasamy University of Michigan

2 Multicores/multiprocessors are ubiquitous Most desktops, laptops & cellphones use multiprocessors Multithreading is a common way to exploit hardware parallelism Problem: it is hard to write correct multithreaded programs! 2Kaushik Veeraraghavan

3 Data races are a serious problem Data race: Two instructions (at least one of which is a write) that access the same shared data without being ordered by synchronization Data races can cause catastrophic failures – Therac-25 radiation overdose – 2003 Northeast US power blackout Kaushik Veeraraghavan3 proc_info = 0; MySQL bug #3596 crash If (proc_info) { fputs (proc_info, f); }

4 First goal: efficient data race detection Data race detection – High coverage (find harmful data races) – Accurate (no false positives) – Low overhead Kaushik Veeraraghavan4 High coverageSampling Native (C/C++)ThreadSanitizer (30X) Frost (3X) DataCollider (1.1x with 4 watchpoints) Frost 3.5% coverage) Managed (Java/C#)FastTrack (8.5X)PACER 3% coverage)

5 Second goal: data race survival Unknown data race might manifest at runtime Mask harmful effect so system stays running Kaushik Veeraraghavan5

6 Outline Motivation Design – Outcome-based race detection – Complementary schedules Implementation: Frost – New, fast method to detect the effect of a data race – Masks effect of harmful data race bug Evaluation Kaushik Veeraraghavan6

7 State is what matters All prior data race detectors analyze events – Shared memory accesses are very frequent New idea: run multiple replicas and analyze state Goal: replicas diverge if and only if harmful data race Kaushik Veeraraghavan7 proc_info = 0; crash If (proc_info) { fputs (proc_info, f); } proc_info = 0; If (proc_info) { fputs (proc_info, f); } ✔

8 No false positives Divergence  data race Race-free replicas will never diverge – Identical inputs – Obey same happens-before ordering Outcome-based race detection – Divergence in program or output state indicates race Kaushik Veeraraghavan8

9 Minimize false negatives Harmful data race  divergence Complementary schedules – Make replica schedules as dissimilar as possible – If instructions A & B are unordered, one replica executes A before B and the other executes B before A Kaushik Veeraraghavan9

10 Complementary schedules in action We do not know a priori that a race exists Replicas schedule unordered instructions in opposite orders – Race detection: replicas diverge in output – Race survival: use surviving replica to continue program Kaushik Veeraraghavan10 unlock (*fifo); fifo = NULL; crash ✔ unlock (*fifo); fifo = NULL;

11 Problem: we don’t know which instructions race – Try and flip all pairs of unordered instructions Record total ordering of instructions in one replica – Only one thread runs at a time – Each thread runs non-preemptively until it blocks Other replica executes instructions in reverse order How to construct complementary schedules? Kaushik Veeraraghavan11 T3 T1 T2 T3 T2 T1

12 Type I data race bug Kaushik Veeraraghavan12 Failure requirement: order of instructions that leads to failure – E.g.: if “fifo = NULL;” is ordered first, program crashes Type I bug: all failure requirements point in same direction Guarantee race detection for synchronization-free region as replicas diverge Survival if we can identify correct replica crash unlock (*fifo); fifo = NULL; crash unlock (*fifo); fifo = NULL; Replica 1 ✔ unlock (*fifo); fifo = NULL; Replica 2

13 Type II data race bug Kaushik Veeraraghavan13 Type II bug: failure requirements point in opposite directions Guarantee data race survival for synchronization-free region – Both replicas avoid the failure proc_info = 0; crash If (proc_info) { fputs (proc_info, f); } proc_info = 0; If(proc_info) { fputs(proc_info, f); } Replica 2 ✔ proc_info = 0; If(proc_info) { fputs(proc_info, f); } Replica 1 ✔

14 Leverage uniparallelism to scale performance 14Kaushik Veeraraghavan CPU 4 CPU 2 CPU 5 CPU 3 Frost executes three replicas of each epoch – Leading replica provides checkpoint and non-deterministic event log – Trailing replicas run complementary schedules Upto 3X overhead, but still cheaper than traditional race detectors T2 T1 T2 T1 CPU 0 CPU 1 TIME T1 T2 T1 T2 T1 ckpt Each epoch has three replicas

15 Analyzing epoch outcomes for race detection 15Kaushik Veeraraghavan CPU 4 CPU 2 CPU 5 CPU 3 Race detected if replicas diverge – Self-evident failure? Output or memory difference? Frost guarantees replay for offline debugging T2 T1 T2 T1 CPU 0 CPU 1 TIME T1 T2 T1 T2 T1 Do replica states match? Each epoch has three replicas

16 OutcomesLikely bugSurvival strategy A-AANoneCommit A F-FFNon-race bugRollback A-AB/A-BAType IRollback A-AF/A-FAType ICommit A F-FA/F-AFType ICommit A A-BBType IICommit B A-BCType IICommit B or C F-AAType IICommit A F-ABType IICommit A or B A-BF/A-FBMultipleRollback A-FFMultipleRollback Analyzing epoch outcomes for survival Kaushik Veeraraghavan16

17 OutcomesLikely bugSurvival strategy A-AANoneCommit A F-FFNon-race bugRollback A-AB/A-BAType IRollback A-AF/A-FAType ICommit A F-FA/F-AFType ICommit A A-BBType IICommit B A-BCType IICommit B or C F-AAType IICommit A F-ABType IICommit A or B A-BF/A-FBMultipleRollback A-FFMultipleRollback Analyzing epoch outcomes for survival Kaushik Veeraraghavan17 All replicas agree

18 OutcomesLikely bugSurvival strategy A-AANoneCommit A F-FFNon-race bugRollback A-AB/A-BAType IRollback A-AF/A-FAType ICommit A F-FA/F-AFType ICommit A A-BBType IICommit B A-BCType IICommit B or C F-AAType IICommit A F-ABType IICommit A or B A-BF/A-FBMultipleRollback A-FFMultipleRollback Analyzing epoch outcomes for survival Kaushik Veeraraghavan18 Two outcomes/traili ng replicas differ

19 OutcomesLikely bugSurvival strategy A-AANoneCommit A F-FFNon-race bugRollback A-AB/A-BAType IRollback A-AF/A-FAType ICommit A F-FA/F-AFType ICommit A A-BBType IICommit B A-BCType IICommit B or C F-AAType IICommit A F-ABType IICommit A or B A-BF/A-FBMultipleRollback A-FFMultipleRollback Analyzing epoch outcomes for survival Kaushik Veeraraghavan19 Trailing replicas do not fail

20 OutcomesLikely bugSurvival strategy A-AANoneCommit A F-FFNon-race bugRollback A-AB/A-BAType IRollback A-AF/A-FAType ICommit A F-FA/F-AFType ICommit A A-BBType IICommit B A-BCType IICommit B or C F-AAType IICommit A F-ABType IICommit A or B A-BF/A-FBMultipleRollback A-FFMultipleRollback Analyzing epoch outcomes for survival Kaushik Veeraraghavan20

21 Limitations Multiple type I bugs in an epoch – Rollback and reduce epoch length to separate bugs Priority-inversion – If >2 threads involved in race, 2 replicas insufficient to flip races – Heuristic: threads with frequent constraints are adjacent in order Epoch boundaries – Insert epochs only on system calls. Detection of Type II bugs – Usually some difference in program state or output Kaushik Veeraraghavan21

22 Frost detects and survives all harmful races ApplicationBug manifestation Outcome% survived% detectedRecovery time (sec) pbzip2crashF-AA100% 0.01 Apache #21287double freeA-BB/A-AB100% 0.00 Apache #25520corrupted out.A-BC100% 0.00 Apache #45605assertionA-AB100% 0.00 MySQL #644crashA-BC100% 0.02 MySQL #791missing outputA-BC100% 0.00 MySQL #2011corrupted out.A-BC100% 0.22 MySQL #3596crashF-BC100% 0.00 MySQL #12848crashF-FA100% 0.29 pfscaninfinite loopF-FA100% 0.00 Glibc #12486assertionF-AA100% 0.01 Kaushik Veeraraghavan22

23 Frost detects all harmful races as traditional detector ApplicationHarmful race detectedBenign races TraditionalFrostTraditionalFrost pbzip25531 Apache: # Apache: # Apache: # MySQL: # MySQL: # MySQL: # MySQL: # MySQL: # pfscan5500 Glibc: # Kaushik Veeraraghavan23

24 Frost: performance given spare cores Overhead 3% to 12% given spare cores Kaushik Veeraraghavan24 8% 12% 3% 11%

25 Frost: performance without spare cores Kaushik Veeraraghavan25 127% 194% Overhead ≈200% for cpu-bound apps without spare cores

26 Frost summary Two new ideas – Outcome-based race detection – Complementary schedules Fast data race detection with high coverage – 3%—12% overhead, given spare cores – ≈200% overhead, without spare cores Survives all harmful data race bugs in our tests Kaushik Veeraraghavan26

27 Backup Kaushik Veeraraghavan27

28 Performance: scalability on a 32-core pfscan: Frost scales upto 7 cores Kaushik Veeraraghavan28


Download ppt "Detecting and surviving data races using complementary schedules Kaushik Veeraraghavan Peter Chen, Jason Flinn, Satish Narayanasamy University of Michigan."

Similar presentations


Ads by Google