
1 1 SigRace: Signature-Based Data Race Detection Abdullah Muzahid, Dario Suarez*, Shanxiang Qi & Josep Torrellas Computer Science Department University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu *Universidad de Zaragoza, Spain

2 2 Debugging Multithreaded Programs “Debugging a multithreaded program has a lot in common with medieval torture methods” -- Random quote found via Google search

3 3 Data Race
Two threads access the same variable without intervening synchronization, and at least one access is a write
T1 T2 lock L unlock L x++
Hard to detect and reproduce
Common bug

4 4 Dynamic Data Race Detection Mainly two approaches

5 5 Dynamic Data Race Detection Mainly two approaches –Lockset: Finds violation of locking discipline

6 6 Dynamic Data Race Detection Mainly two approaches –Lockset: Finds violation of locking discipline –Happened-Before: Finds concurrent conflicting accesses
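
To make the lockset idea concrete, here is a minimal sketch in the spirit of Eraser-style checkers. It is heavily simplified (no state machine for initialization or read-sharing), and the class and method names are hypothetical, not taken from the talk. Each variable keeps the set of locks that were held on every access so far; when that set becomes empty, no single lock protects the variable and the locking discipline is violated.

```python
# Hypothetical, simplified lockset checker; all names are illustrative.

class LocksetChecker:
    def __init__(self):
        self.held = {}        # thread id -> set of locks currently held
        self.candidates = {}  # variable -> locks held on every access so far

    def acquire(self, tid, lock):
        self.held.setdefault(tid, set()).add(lock)

    def release(self, tid, lock):
        self.held.setdefault(tid, set()).discard(lock)

    def access(self, tid, var):
        """Record an access; return True if no common lock protects var."""
        locks = self.held.get(tid, set())
        if var not in self.candidates:
            self.candidates[var] = set(locks)
        else:
            self.candidates[var] &= locks
        return not self.candidates[var]


checker = LocksetChecker()
checker.acquire("T1", "L")
first = checker.access("T1", "x")    # x is accessed while holding L
checker.release("T1", "L")
second = checker.access("T2", "x")   # T2 holds no lock: discipline violated
```

In this run `first` is False and `second` is True, because the candidate set for `x` shrinks from {L} to the empty set.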

7 7 Happened-Before Approach

8 8 [0, 0] Thread 0 Thread 1

9 9 Happened-Before Approach Lock L [0, 0] [1, 0] [0, 0] Thread 0 Thread 1

10 10 Happened-Before Approach Lock L Unlock L [0, 0] [1, 0] [2, 0] [0, 0] Thread 0 Thread 1

11 11 Happened-Before Approach Lock L Unlock L [0, 0] [1, 0] [2, 0] [0, 0] Thread 0 Thread 1

12 12 Happened-Before Approach Lock L Unlock L [0, 0] [1, 0] [2, 0] [0, 0] Thread 0 Thread 1 Lock L

13 13 Happened-Before Approach Lock L Unlock L [0, 0] [1, 0] [2, 0] [0, 0] Thread 0 Thread 1 Lock L [0, 1]

14 14 Happened-Before Approach Lock L Unlock L [0, 0] [1, 0] [2, 0] [0, 0] Thread 0 Thread 1 Lock L [2, 1] + [0, 1]

15 15 Happened-Before Approach Epoch: sync to sync Lock L Unlock L [0, 0] [1, 0] [2, 0] [0, 0] Thread 0 Thread 1 Lock L [2, 1]

16 16 Happened-Before Approach Epoch: sync to sync Lock L Unlock L [0, 0] [1, 0] [2, 0] [0, 0] Thread 0 Thread 1 Lock L [2, 1] a b c d e

17 17 Happened-Before Approach Lock L Unlock L [0, 0] [1, 0] [2, 0] [0, 0] Thread 0 Thread 1 Lock L [2, 1] a b c d e Epoch: sync to sync

18 18 Happened-Before Approach Lock L Unlock L [0, 0] [1, 0] [2, 0] [0, 0] Thread 0 Thread 1 Lock L [2, 1] a b c d e Epoch: sync to sync

19 19 Happened-Before Approach Lock L Unlock L [0, 0] [1, 0] [2, 0] [0, 0] Thread 0 Thread 1 Lock L [2, 1] a b c d e Epoch: sync to sync

20 20 Happened-Before Approach Lock L Unlock L [0, 0] [1, 0] [2, 0] [0, 0] Thread 0 Thread 1 Lock L [2, 1] a b c d e Epoch: sync to sync a, b happened before e

21 21 Happened-Before Approach Lock L Unlock L [0, 0] [1, 0] [2, 0] [0, 0] Thread 0 Thread 1 Lock L [2, 1] a b c d e Epoch: sync to sync a, b happened before e

22 22 Happened-Before Approach Lock L Unlock L [0, 0] [1, 0] [2, 0] [0, 0] Thread 0 Thread 1 Lock L [2, 1] a b c d e Epoch: sync to sync a, b happened before e c, d unordered

23 23 Happened-Before Approach Lock L Unlock L [0, 0] [1, 0] [2, 0] [0, 0] Thread 0 Thread 1 Lock L [2, 1] a b c d e Epoch: sync to sync a, b happened before e c, d unordered: "= x" and "x =" Data Race
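
The ordering claims on this slide can be checked with a short sketch. The epoch timestamps are copied from the figure; which thread owns each epoch and the comparison rule itself are assumptions (the usual vector-clock test), not something the slides spell out.

```python
# Epoch -> (owning thread, vector timestamp), as read off the slide.
EPOCHS = {
    "a": (0, (0, 0)),
    "b": (0, (1, 0)),
    "c": (0, (2, 0)),
    "d": (1, (0, 0)),
    "e": (1, (2, 1)),
}

def happened_before(x, y):
    """True if epoch x is ordered before epoch y."""
    tid_x, ts_x = EPOCHS[x]
    tid_y, ts_y = EPOCHS[y]
    if tid_x == tid_y:
        return ts_x[tid_x] < ts_y[tid_x]   # program order within one thread
    # Assumed rule: y's thread has already observed a later epoch of x's thread.
    return ts_y[tid_x] > ts_x[tid_x]

def unordered(x, y):
    return not happened_before(x, y) and not happened_before(y, x)

a_before_e = happened_before("a", "e")
b_before_e = happened_before("b", "e")
c_d_unordered = unordered("c", "d")
```

Under these assumptions a and b come out ordered before e, while c and d are unordered, so conflicting accesses to x in c and d would be reported as a race.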

24 24 Software Implementation Need to instrument every memory access –10x – 50x slowdown –Not suitable for production runs

25 25 Hardware Implementation

26 26 Hardware Implementation P1 P2 … C1 C2

27 27 Hardware Implementation P1 P2 … TS C1 C2

28 28 Hardware Implementation P1 P2 … x ts1 TS C1 C2

29 29 Hardware Implementation P1 P2 … x x ts1 ts2 … TS C1 C2 WR

30 30 Hardware Implementation P1 P2 … x x ts1 ts2 … check TS C1 C2 WR

31 31 Limitations of HW Approaches

32 32 Limitations of HW Approaches P1 P2 … C1 C2
Modify cache and coherence protocol

33 33 Limitations of HW Approaches P1 P2 … C1 C2 ts1 ts2 check
Modify cache and coherence protocol
Perform checking at least on every coherence transaction

34 34 Limitations of HW Approaches P1 P2 … C1 C2
Modify cache and coherence protocol
Perform checking at least on every coherence transaction
Lose detection ability when cache line is displaced or invalidated

35 35 Our Contributions
SigRace: Novel HW mechanism for race detection based on signatures
–Simple HW
  Cache and coherence protocol are unchanged
–Higher coverage than existing HW schemes
  Detect races even if the line is displaced/invalidated
  SigRace finds 150% more injected races than a state-of-the-art HW proposal
–Usable on-the-fly in production runs

36 36 Outline Motivation Main Idea Implementation Results Conclusions

37 37 Main Idea Address Signature + Happened-before

38 38 Hardware Address Signatures [Ceze et al., ISCA06] … Permute … ROB address ld/st

39 39 Hardware Address Signatures

40 40 Hardware Address Signatures Logical AND for intersection Has false positives but not false negatives
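
A minimal software sketch of such a signature, assuming a banked Bloom-filter layout. The bank count, bank width, and the multiplicative hash are illustrative stand-ins, not the permutation or hash logic of the hardware. It shows why a bitwise AND can report a conflict that is not real (aliasing) but can never miss a real one.

```python
BANKS = 4             # assumed number of banks
BITS_PER_BANK = 512   # assumed bank width (4 x 512 bits in total)
MULTIPLIERS = (0x9E3779B1, 0x85EBCA6B, 0xC2B2AE35, 0x27D4EB2F)  # arbitrary odd constants

def _bit(addr, bank):
    # Stand-in hash; real hardware would use permutation/hash circuits.
    return ((addr * MULTIPLIERS[bank]) >> 7) % BITS_PER_BANK

def empty_signature():
    return [0] * BANKS

def insert(sig, addr):
    # Every address sets exactly one bit in every bank.
    for bank in range(BANKS):
        sig[bank] |= 1 << _bit(addr, bank)

def intersect(sig_a, sig_b):
    # Intersection is a plain bitwise AND, bank by bank.
    return [a & b for a, b in zip(sig_a, sig_b)]

def is_empty(sig):
    # An all-zero bank proves that no address can be present.
    return any(bank == 0 for bank in sig)

def may_contain(sig, addr):
    return all((sig[bank] >> _bit(addr, bank)) & 1 for bank in range(BANKS))

w1 = empty_signature()
for addr in (0x1000, 0x2040):
    insert(w1, addr)
r2 = empty_signature()
for addr in (0x2040, 0x3000):
    insert(r2, addr)
conflict = intersect(w1, r2)   # 0x2040 is in both, so this cannot be empty
```

Because both signatures set the same bit in every bank for 0x2040, the shared address always survives the AND. Two unrelated addresses may also happen to map to the same bits, which is where false positives come from.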

41 41 Using Signatures for Race Detection sync Epoch

42 42 Using Signatures for Race Detection sync Epoch Sig TS

43 43 Using Signatures for Race Detection Block sync Epoch Sig1 TS1
Block is a fixed number of dynamic instructions (not a cache block, basic block, or atomic block)

44 44 Using Signatures for Race Detection Block sync Epoch Race Detection Module Sig1 TS1

45 45 Using Signatures for Race Detection Block sync Epoch Ø TS1 Race Detection Module

46 46 Using Signatures for Race Detection Block sync Epoch Sig2 TS1 Race Detection Module

47 47 Using Signatures for Race Detection Block sync Epoch Sig2 TS1 Race Detection Module

48 48 Using Signatures for Race Detection Block sync Epoch Ø TS1 Race Detection Module

49 49 Using Signatures for Race Detection Block sync Epoch Sig3 TS2 Race Detection Module

50 50 Using Signatures for Race Detection Block sync Epoch Sig3 TS2 Race Detection Module

51 51 Using Signatures for Race Detection Block sync Epoch Sig3 TS2 Race Detection Module

52 52 Using Signatures for Race Detection Block sync Epoch Sig3 TS2 Sig ∩ Sig Race Detection Module

53 53 On Chip Race Detection Module (RDM)

54 54 On Chip Race Detection Module (RDM) P1 P2 Q1 Q2 Chip RDM

55 55 On Chip Race Detection Module (RDM) P1 P2 T1 R1 W1 Q1 Q2 Chip RDM

56 56 On Chip Race Detection Module (RDM) P1 P2 T1 R1 W1 Q1 Q2 Chip RDM

57 57 On Chip Race Detection Module (RDM) P1 P2 T1 R1 W1 Q1 Q2 Chip RDM

58 58 On Chip Race Detection Module (RDM) P1 P2 T2 R2 W2 Q1 Q2 T1 R1 W1 Chip RDM

59 59 On Chip Race Detection Module (RDM) P1 P2 T2 R2 W2 Q1 Q2 T1 R1 W1 Chip RDM

60 60 On Chip Race Detection Module (RDM) P1 P2 Q1 Q2 T2 R2 W2 T1 R1 W1 Chip RDM

61 61 On Chip Race Detection Module (RDM) P1 P2 Q1 Q2 T2 R2 W2 T1 R1 W1 TJ RJ WJ Chip RDM
If T2 & TJ unordered: R2 ∩ WJ, W2 ∩ WJ, W2 ∩ RJ
Else stop

62 62 On Chip Race Detection Module (RDM) P1 P2 Q1 Q2 T2 R2 W2 T1 R1 W1 TJ' RJ' WJ' Chip RDM
If T2 & TJ' unordered: R2 ∩ WJ', W2 ∩ WJ', W2 ∩ RJ'
Else stop

63 63 On Chip Race Detection Module (RDM) P1 P2 Q1 Q2 T2 R2 W2 T1 R1 W1 TJ'' RJ'' WJ'' Chip RDM
If T2 & TJ'' unordered: R2 ∩ WJ'', W2 ∩ WJ'', W2 ∩ RJ''
Else stop

64 64 On Chip Race Detection Module (RDM) P1 P2 Q1 Q2 T2 R2 W2 T1 R1 W1 TJ''' RJ''' WJ''' Chip RDM
If T2 & TJ''' unordered: R2 ∩ WJ''', W2 ∩ WJ''', W2 ∩ RJ'''
Else stop

65 65 On Chip Race Detection Module (RDM) P1 P2 Q1 Q2 T2 R2 W2 T1 R1 W1 TJ''' RJ''' WJ''' Chip RDM
If T2 & TJ''' unordered: R2 ∩ WJ''', W2 ∩ WJ''', W2 ∩ RJ'''
Else stop
Done in Background

66 66 On Chip Race Detection Module (RDM) P1 P2 Q1 Q2 T2 R2 W2 T1 R1 W1 TJ''' RJ''' WJ''' Chip RDM
If T2 & TJ''' unordered: R2 ∩ WJ''', W2 ∩ WJ''', W2 ∩ RJ'''
Else stop
False Positives
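
The queue walk on these slides could look roughly like the following sketch. Python sets stand in for the hardware signatures (so there are no false positives here). The ordering rule, the function names, and the example accesses are all assumptions made for illustration. The walk goes from the newest entry of the other thread's queue to the oldest and stops at the first entry that is ordered with the incoming one, since all older entries are then ordered as well.

```python
def happened_before(ts_x, tid_x, ts_y):
    # Assumed rule: y has observed a later epoch of x's thread.
    return ts_y[tid_x] > ts_x[tid_x]

def unordered(ts_a, tid_a, ts_b, tid_b):
    return (not happened_before(ts_a, tid_a, ts_b)
            and not happened_before(ts_b, tid_b, ts_a))

def check_incoming(tid_new, ts_new, r_new, w_new, queues):
    """Check an incoming (TS, R, W) entry against the other threads' queues.

    queues maps thread id -> list of (ts, R, W) entries, oldest first.
    Returns a list of (thread id, ts, overlapping addresses).
    """
    races = []
    for tid_old, queue in queues.items():
        if tid_old == tid_new:
            continue
        for ts_old, r_old, w_old in reversed(queue):      # newest first
            if not unordered(ts_new, tid_new, ts_old, tid_old):
                break                                     # "Else stop"
            overlap = (r_new & w_old) | (w_new & w_old) | (w_new & r_old)
            if overlap:
                races.append((tid_old, ts_old, overlap))
    queues.setdefault(tid_new, []).append((ts_new, r_new, w_new))
    return races

# Thread 0 has already dumped three entries (hypothetical accesses).
queues = {0: [((0, 0), set(), {"y"}),
              ((1, 0), {"y"}, set()),
              ((2, 0), {"x"}, set())]}
races_d = check_incoming(1, (0, 0), set(), {"x"}, queues)   # writes x
races_e = check_incoming(1, (2, 1), {"y"}, set(), queues)   # reads y after the lock
```

Here `races_d` reports the overlap on `x` with thread 0's entry at [2, 0], while `races_e` is empty: the walk stops at the [1, 0] entry, which is ordered before [2, 1], so the older write to `y` is never compared.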

67 67 Re-execution
Needed to
–Discard the report if it is a false positive
–Identify the accesses involved

68 68 Support for Re-execution
Save synchronization history in TS Log
–Timestamp at sync points
Log inputs (interrupts, sys calls, etc.)
Take periodic checkpoints: ReVive [Prvulovic et al., ISCA02]

69 69 Modes of Operation
Normal Execution
Re-Execution: Bring the program to just before the race
Race Analysis: Pinpoint the racy accesses or discard the false positive

70 70 SigRace Re-execution Mode Can be done in another machine

71 71 SigRace Re-execution Mode Can be done in another machine Periodic checkpoint of memory state sync checkpoint T0 T1 T2

72 72 SigRace Re-execution Mode Can be done in another machine Periodic checkpoint of memory state sync checkpoint s1s1 s2s2 Data Race T0 T1 T2

73 73 SigRace Re-execution Mode Can be done in another machine Periodic checkpoint of memory state sync checkpoint s1s1 s2s2 Data Race ∩ Conflict Sig T0 T1 T2

74 74 SigRace Re-execution Mode Can be done in another machine Periodic checkpoint of memory state sync checkpoint s1s1 s2s2 Data Race ∩ Conflict Sig checkpoint T0 T1 T2 T0 T1 T2

75 75 SigRace Re-execution Mode Can be done in another machine Periodic checkpoint of memory state sync checkpoint s1s1 s2s2 Data Race ∩ Conflict Sig sync checkpoint T0 T1 T2 T0 T1 T2 Use the TS Log

76 76 SigRace Analysis Mode sync checkpoint T0 T1 T2

77 77 SigRace Analysis Mode sync checkpoint T0 T1 T2

78 78 SigRace Analysis Mode sync checkpoint T0 T1 T2 ld ∩ Conflict Sig

79 79 SigRace Analysis Mode sync checkpoint T0 T1 T2 ld ∩ Conflict Sig log

80 80 SigRace Analysis Mode sync checkpoint T0 T1 T2 ld ∩ Conflict Sig log sync

81 81 SigRace Analysis Mode sync checkpoint T0 T1 T2 log sync log

82 82 SigRace Analysis Mode sync checkpoint T0 T1 T2 log sync log
Pinpoints racy addresses, or identifies and discards false positives
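
One way to picture the analysis step is the sketch below. The 64-bit "address mod 64" signature is a deliberately crude stand-in chosen so that aliasing is easy to trigger, and the function names and traces are hypothetical. During replay, every load or store whose address hits the conflict signature is logged; comparing the exact logged addresses afterwards either pinpoints the racy address or shows that the conflict was only an aliasing artifact.

```python
SIG_BITS = 64   # tiny on purpose, so that two addresses can alias

def sig_of(addresses):
    """Stand-in signature: one bit per (address mod SIG_BITS)."""
    sig = 0
    for addr in addresses:
        sig |= 1 << (addr % SIG_BITS)
    return sig

def hits(sig, addr):
    return bool((sig >> (addr % SIG_BITS)) & 1)

def analyze(conflict_sig, trace_a, trace_b):
    """Replay two epochs and return the set of truly racy addresses.

    Each trace is a list of (kind, address), kind being "ld" or "st".
    An empty result means the reported conflict was a false positive.
    """
    log_a = [(k, a) for k, a in trace_a if hits(conflict_sig, a)]
    log_b = [(k, a) for k, a in trace_b if hits(conflict_sig, a)]
    racy = set()
    for kind_a, addr_a in log_a:
        for kind_b, addr_b in log_b:
            if addr_a == addr_b and "st" in (kind_a, kind_b):
                racy.add(addr_a)
    return racy

# True race: both epochs touch 0x100 and one access is a store.
true_conflict = sig_of([0x100]) & sig_of([0x100])
true_result = analyze(true_conflict,
                      [("st", 0x100), ("ld", 0x104)],
                      [("ld", 0x100)])

# False positive: 0x100 and 0x140 alias to the same signature bit.
alias_conflict = sig_of([0x100]) & sig_of([0x140])
alias_result = analyze(alias_conflict, [("st", 0x100)], [("ld", 0x140)])
```

The first analysis returns the single address 0x100. The second returns an empty set even though the signature intersection was non-empty, which corresponds to discarding the false positive.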

83 83 Outline Motivation Main Idea Implementation Results Conclusions

84 84 New Instructions collect_on –Enable R and W address collection in current thread

85 85 New Instructions collect_on –Enable R and W address collection in current thread collect_off –Disable R and W address collection in current thread

86 86 New Instructions sync_reached

87 87 New Instructions sync_reached –Dump TS, R and W P RDM Network TS R W

88 88 New Instructions sync_reached –Dump TS, R and W –Clear signatures P RDM Network TS Ø Ø

89 89 New Instructions sync_reached –Dump TS, R and W –Clear signatures –Update TS P RDM Network TS' Ø Ø

90 90 Modifications in Sync Libraries

91 91 Modifications in Sync Libraries
Synchronization object: variable, timestamp

92 92 Modifications in Sync Libraries
Synchronization object: variable, timestamp
Unlock macro: UNLOCK (‘{ }’) unlock($1.lock)

93 93 Modifications in Sync Libraries
Synchronization object: variable, timestamp
Unlock macro: UNLOCK (‘{ }’) unlock($1.lock) sync_reached
P Network RDM TS R W

94 94 Modifications in Sync Libraries
Synchronization object: variable, timestamp
Unlock macro: UNLOCK (‘{ }’) unlock($1.lock) sync_reached
P Network RDM TS Ø Ø

95 95 Modifications in Sync Libraries
Synchronization object: variable, timestamp
Unlock macro: UNLOCK (‘{ }’) unlock($1.lock) sync_reached
P Network RDM TS Ø Ø

96 96 Modifications in Sync Libraries
Synchronization object: variable, timestamp
Unlock macro: UNLOCK (‘{ }’) unlock($1.lock) sync_reached $1.timestamp = TS
lock TS

97 97 Modifications in Sync Libraries
Synchronization object: variable, timestamp
Unlock macro: UNLOCK (‘{ }’) unlock($1.lock) sync_reached $1.timestamp = TS

98 98 Modifications in Sync Libraries
Synchronization object: variable, timestamp
Unlock macro: UNLOCK (‘{ }’) unlock($1.lock) sync_reached $1.timestamp = TS AppendtoTSLog(TS)
TS Log TS

99 99 Modifications in Sync Libraries
Synchronization object: variable, timestamp
Lock macro: LOCK (‘{ }’) lock($1.lock) … …

100 100 Modifications in Sync Libraries
Synchronization object: variable, timestamp
Lock macro: LOCK (‘{ }’) lock($1.lock) … TS = GenerateTS (TS, $1.timestamp) …
lock timestamp

101 101 Modifications in Sync Libraries
Synchronization object: variable, timestamp
Lock macro: LOCK (‘{ }’) lock($1.lock) … TS = GenerateTS (TS, $1.timestamp) …
lock timestamp
Transparent to Application Code
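
A single-threaded simulation of how such library wrappers might drive the timestamps. Several points are assumptions rather than facts from the slides: the elided "…" step in the lock macro is taken to be a `sync_reached`, `GenerateTS` is taken to be an element-wise maximum, and the timestamp is stored into the sync object before the lock is released. Python sets stand in for the R and W signatures, and a plain list stands in for the RDM.

```python
class Lock:
    def __init__(self, n_threads):
        self.timestamp = (0,) * n_threads   # "timestamp" field of the sync object

def generate_ts(own, other):
    # Assumed merge rule: element-wise maximum of the two timestamps.
    return tuple(max(a, b) for a, b in zip(own, other))

class ThreadState:
    def __init__(self, tid, n_threads, rdm, ts_log):
        self.tid = tid
        self.ts = (0,) * n_threads
        self.r_sig, self.w_sig = set(), set()   # stand-ins for signatures
        self.rdm, self.ts_log = rdm, ts_log

    def sync_reached(self):
        # Dump TS, R and W to the RDM, clear the signatures, update TS.
        self.rdm.append((self.tid, self.ts, self.r_sig, self.w_sig))
        self.r_sig, self.w_sig = set(), set()
        ts = list(self.ts)
        ts[self.tid] += 1
        self.ts = tuple(ts)

    def unlock(self, lock):
        # Assumed order: publish the new TS, log it, then release.
        self.sync_reached()
        lock.timestamp = self.ts
        self.ts_log.append((self.tid, self.ts))

    def lock(self, lock):
        # Assumed: acquiring also ends an epoch, then merges the lock's TS.
        self.sync_reached()
        self.ts = generate_ts(self.ts, lock.timestamp)

rdm, ts_log = [], []
L = Lock(2)
t0 = ThreadState(0, 2, rdm, ts_log)
t1 = ThreadState(1, 2, rdm, ts_log)
t0.lock(L)      # thread 0: [0, 0] -> [1, 0]
t0.unlock(L)    # thread 0: [1, 0] -> [2, 0], stored in the lock
t1.lock(L)      # thread 1: [0, 0] -> [0, 1] -> merged to [2, 1]
```

With these assumptions the simulation ends with thread 0 at [2, 0] and thread 1 at [2, 1]. Application code only calls `lock` and `unlock`, so the bookkeeping stays inside the library.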

102 102 Other Topics in Paper Easy to virtualize Queue Overflow Detailed HW structures

103 103 Outline Motivation Main Idea Implementation Results Conclusions

104 104 Experimental Setup
PIN – Binary Instrumentation Tool
Benchmarks: SPLASH2, PARSEC
Default parameters
– # of proc: 8
– Signature size: 2 Kbits
– Block size: 2,000 ins
– Queue size: 16 entries
– Checkpoint interval: 1 Million ins

105 105 Race Detection Ability
Three configurations
–SigRace Default
–SigRace Ideal: Stores every signature between 2 checkpoints
–ReEnact [Prvulovic et al., ISCA03]: Cache-based approach with timestamp per word

106 106 Race Detection Ability
App: Ideal SigRace, Default SigRace, ReEnact
Cholesky: 16
Barnes: 11, 6
Volrend: 27, 18
Ocean: 1, 1, 1
Radiosity: 15, 12
Raytrace: 4, 4, 3
Water-sp: 8, 4, 2
Streamcluster: 13, 12, 13
Total: 95, 90, 70

107 107 Race Detection Ability
More coverage than ReEnact
(same table as slide 106)

108 108 Race Detection Ability
More coverage than ReEnact
Coverage comparable to ideal configuration
(same table as slide 106)

109 109 Injected Races
Removed one dynamic sync per run
Each application runs 25 times, each time with a different sync eliminated

110 110 Injected Races

111 111 Injected Races More overall coverage than ReEnact

112 112 Injected Races More overall coverage than ReEnact –150% more coverage

113 113 Injected Races More overall coverage than ReEnact –150% more coverage

114 114 Conclusions
Proposed SigRace:
–Simple HW
  Cache and coherence protocol are unchanged
–Higher coverage than existing HW schemes
  Detect races even if the line is displaced/invalidated
–Usable on-the-fly in production runs
SigRace finds 150% more injected races than word-based ReEnact

115 115 SigRace: Signature-Based Data Race Detection Abdullah Muzahid, Dario Suarez*, Shanxiang Qi & Josep Torrellas Computer Science Department University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu *Universidad de Zaragoza, Spain

116 116 Back Up Slides

117 117 Execution Overhead
No overhead in generating signatures (HW)
Additional instructions are negligible
Main overheads
–Checkpointing (ReVive – 6.3%)
–Network traffic (63 bytes per 1000 ins, compressed)
–Re-execution (depends on false positives & race position)
  Can be done offline

118 118 Network Traffic Overhead 63 bytes ≈ 1 cache line

119 119 Re-execution Overhead
Instructions re-executed until the first true data race is analyzed are shown as overhead
In this process, re-execution may also encounter many false positive races
Instructions re-executed to analyze only the true race are shown as true overhead
Instructions re-executed to filter out the false positives are shown as false overhead

120 120 Re-execution Overhead 22% Modest overhead

121 121 False Positives Parallel Bloom filters with H3 hash function 1.57% Low False Positive

122 122 Virtualization

123 123 Virtualization
RDM uses as many queues as the number of threads
Timestamp is accessed by thread id
Thread id remains the same even after migration
Timestamps, flags, and conflict signature are saved and restored at context switch
RDM intersects incoming signatures against the signatures of all other threads (even inactive ones)
Threads can be re-executed without any scheduling constraints

124 124 Scalability
For small # of proc., scalability is not a problem
The operation of RDM can be pipelined
–Simple repetitive operation
Network traffic (compressed message) around 63 bytes per thousand ins
Checkpoint is an issue

