Download presentation
Presentation is loading. Please wait.
1
1 SigRace: Signature-Based Data Race Detection Abdullah Muzahid, Dario Suarez*, Shanxiang Qi & Josep Torrellas Computer Science Department University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu *Universidad de Zaragoza, Spain
2
2 Debugging Multithreaded Programs “Debugging a multithreaded program has a lot in common with medieval torture methods” -- Random quote found via Google search
3
3 Data Race Two threads access the same variable without intervening synchronization and at least one is a write T1T2 lock L unlock L x++ Hard to detect and reproduce Common bug
4
4 Dynamic Data Race Detection Mainly two approaches
5
5 Dynamic Data Race Detection Mainly two approaches –Lockset: Finds violation of locking discipline
6
6 Dynamic Data Race Detection Mainly two approaches –Lockset: Finds violation of locking discipline –Happened-Before: Finds concurrent conflicting accesses
7
7 Happened-Before Approach
8
8 [0, 0] Thread 0Thread 1
9
9 Happened-Before Approach Lock L [0, 0] [1, 0] [0, 0] Thread 0Thread 1
10
10 Happened-Before Approach Lock L Unlock L [0, 0] [1, 0] [2, 0] [0, 0] Thread 0Thread 1
11
11 Happened-Before Approach Lock L Unlock L [0, 0] [1, 0] [2, 0] [0, 0] Thread 0Thread 1
12
12 Happened-Before Approach Lock L Unlock L [0, 0] [1, 0] [2, 0] [0, 0] Thread 0Thread 1 Lock L
13
13 Happened-Before Approach Lock L Unlock L [0, 0] [1, 0] [2, 0] [0, 0] Thread 0Thread 1 Lock L [0, 1]
14
14 Happened-Before Approach Lock L Unlock L [0, 0] [1, 0] [2, 0] [0, 0] Thread 0Thread 1 Lock L [2, 1] + [0, 1]
15
15 Happened-Before Approach Epoch : sync to sync Lock L Unlock L [0, 0] [1, 0] [2, 0] [0, 0] Thread 0Thread 1 Lock L [2, 1]
16
16 Happened-Before Approach Epoch : sync to sync Lock L Unlock L [0, 0] [1, 0] [2, 0] [0, 0] Thread 0Thread 1 Lock L [2, 1] a b c d e
17
17 Happened-Before Approach Lock L Unlock L [0, 0] [1, 0] [2, 0] [0, 0] Thread 0Thread 1 Lock L [2, 1] a b c d e Epoch : sync to sync
18
18 Happened-Before Approach Lock L Unlock L [0, 0] [1, 0] [2, 0] [0, 0] Thread 0Thread 1 Lock L [2, 1] a b c d e Epoch : sync to sync
19
19 Happened-Before Approach Lock L Unlock L [0, 0] [1, 0] [2, 0] [0, 0] Thread 0Thread 1 Lock L [2, 1] a b c d e Epoch : sync to sync
20
20 Happened-Before Approach Lock L Unlock L [0, 0] [1, 0] [2, 0] [0, 0] Thread 0Thread 1 Lock L [2, 1] a b c d e Epoch : sync to sync a, b happened before e
21
21 Happened-Before Approach Lock L Unlock L [0, 0] [1, 0] [2, 0] [0, 0] Thread 0Thread 1 Lock L [2, 1] a b c d e Epoch : sync to sync a, b happened before e
22
22 Happened-Before Approach Lock L Unlock L [0, 0] [1, 0] [2, 0] [0, 0] Thread 0Thread 1 Lock L [2, 1] a b c d e Epoch : sync to sync a, b happened before e c, d unordered
23
23 Happened-Before Approach Lock L Unlock L [0, 0] [1, 0] [2, 0] [0, 0] Thread 0Thread 1 Lock L [2, 1] a b c d e Epoch : sync to sync a, b happened before e c, d unordered = x x = Data Race
24
24 Software Implementation Need to instrument every memory access –10x – 50x slowdown –Not suitable for production runs
25
25 Hardware Implementation
26
26 Hardware Implementation P1P2 …… C1C2
27
27 Hardware Implementation P1P2 …… TS C1C2
28
28 Hardware Implementation P1P2 …… xts 1 TS C1C2
29
29 Hardware Implementation P1P2 …… xxts 1 ts 2 … TS C1C2 WR
30
30 Hardware Implementation P1P2 …… xxts 1 ts 2 … check TS C1C2 WR
31
31 Limitations of HW Approaches
32
32 Limitations of HW Approaches P1P2 …… … C1 C2 Modify cache and coherence protocol
33
33 Limitations of HW Approaches P1P2 …… … C1 C2 Modify cache and coherence protocol Perform checking at least on every coherence transaction ts 1 ts 2 check
34
34 Limitations of HW Approaches P1P2 …… C1 C2 Modify cache and coherence protocol Perform checking at least on every coherence transaction Lose detection ability when cache line is displaced or invalidated
35
35 Our Contributions SigRace: Novel HW mechanism for race detection based on signatures –Simple HW Cache and coherence protocol are unchanged –Higher coverage than existing HW schemes Detect races even if the line is displaced/invalidated SigRace finds 150% more injected races than a state-of-the-art HW proposal –Usable on-the-fly in production runs
36
36 Outline Motivation Main Idea Implementation Results Conclusions
37
37 Main Idea Address Signature + Happened-before
38
38 Hardware Address Signatures [C eze et al, ISCA06] … Permute … ROB address ld/st
39
39 Hardware Address Signatures
40
40 Hardware Address Signatures Logical AND for intersection Has false positives but not false negatives
41
41 Using Signatures for Race Detection sync Epoch
42
42 Using Signatures for Race Detection sync Epoch Sig TS
43
43 Using Signatures for Race Detection Block Block is a fixed number of dynamic instructions (not a cache block or basic block or atomic block) sync Epoch Sig1 TS1
44
44 Using Signatures for Race Detection Block sync Epoch Race Detection Module Sig1 TS1
45
45 Using Signatures for Race Detection Block sync Epoch Ø TS1 Race Detection Module
46
46 Using Signatures for Race Detection Block sync Epoch Sig2 TS1 Race Detection Module
47
47 Using Signatures for Race Detection Block sync Epoch Sig2 TS1 Race Detection Module
48
48 Using Signatures for Race Detection Block sync Epoch Ø TS1 Race Detection Module
49
49 Using Signatures for Race Detection Block sync Epoch Sig3 TS2 Race Detection Module
50
50 Using Signatures for Race Detection Block sync Epoch Sig3 TS2 Race Detection Module
51
51 Using Signatures for Race Detection Block sync Epoch Sig3 TS2 Race Detection Module
52
52 Using Signatures for Race Detection Block sync Epoch Sig3 TS2 Sig Ո Sig Race Detection Module
53
53 On Chip Race Detection Module (RDM)
54
54 On Chip Race Detection Module (RDM) P1P2 Q1Q2 Chip RDM
55
55 On Chip Race Detection Module (RDM) P1P2 T1 R1 W1 Q1Q2 Chip RDM
56
56 On Chip Race Detection Module (RDM) P1P2 T1 R1 W1 Q1Q2 Chip RDM
57
57 On Chip Race Detection Module (RDM) P1P2 T1 R1 W1 Q1Q2 Chip RDM
58
58 On Chip Race Detection Module (RDM) P1P2 T2 R2 W2 Q1Q2 T1 R1 W1 Chip RDM
59
59 On Chip Race Detection Module (RDM) P1P2 T2 R2 W2 Q1Q2 T1 R1 W1 Chip RDM
60
60 On Chip Race Detection Module (RDM) P1P2 Q1Q2 T2 R2 W2 T1 R1 W1 Chip RDM
61
61 On Chip Race Detection Module (RDM) P1P2 Q1Q2 T2 R2 W2 T1 R1 W1 TJ RJ WJ If T2 & TJ unordered R2 Ո WJ W2 Ո WJ W2 Ո RJ Else stop Chip RDM
62
62 On Chip Race Detection Module (RDM) P1P2 Q1Q2 T2 R2 W2 T1 R1 W1 TJ` RJ` WJ` If T2 & TJ` unordered R2 Ո WJ` W2 Ո WJ` W2 Ո RJ` Else stop Chip RDM
63
63 On Chip Race Detection Module (RDM) P1P2 Q1Q2 T2 R2 W2 T1 R1 W1 TJ” RJ” WJ” If T2 & TJ” unordered R2 Ո WJ” W2 Ո WJ” W2 Ո RJ” Else stop Chip RDM
64
64 On Chip Race Detection Module (RDM) P1P2 Q1Q2 T2 R2 W2 T1 R1 W1 TJ”` RJ”` WJ”` If T2 & TJ”` unordered R2 Ո WJ”` W2 Ո WJ”` W2 Ո RJ”` Else stop Chip RDM
65
65 On Chip Race Detection Module (RDM) P1P2 Q1Q2 T2 R2 W2 T1 R1 W1 TJ”` RJ”` WJ”` If T2 & TJ”` unordered R2 Ո WJ”` W2 Ո WJ”` W2 Ո RJ”` Else stop Chip RDM Done in Background
66
66 On Chip Race Detection Module (RDM) P1P2 Q1Q2 T2 R2 W2 T1 R1 W1 TJ”` RJ”` WJ”` If T2 & TJ”` unordered R2 Ո WJ”` W2 Ո WJ”` W2 Ո RJ”` Else stop Chip RDM False Positives
67
67 Re-execution Needed for –Discard if a false positive –Identify the accesses involved
68
68 Support for Re-execution Save synchronization history in TS Log –Timestamp at sync points Log inputs (interrupts, sys calls, etc) Take periodic checkpoints: ReVive [Prvulovic et al, ISCA02]
69
69 Modes of Operation Normal Execution Re-Execution: Bring the program to just before the race Race Analysis: Pinpoint the racy accesses or discarding the false positive
70
70 SigRace Re-execution Mode Can be done in another machine
71
71 SigRace Re-execution Mode Can be done in another machine Periodic checkpoint of memory state sync checkpoint T0T1T2
72
72 SigRace Re-execution Mode Can be done in another machine Periodic checkpoint of memory state sync checkpoint s1s1 s2s2 Data Race T0T1T2
73
73 SigRace Re-execution Mode Can be done in another machine Periodic checkpoint of memory state sync checkpoint s1s1 s2s2 Data Race Ո Conflict Sig T0T1T2
74
74 SigRace Re-execution Mode Can be done in another machine Periodic checkpoint of memory state sync checkpoint s1s1 s2s2 Data Race Ո Conflict Sig checkpoint T0T1T2T0T1T2
75
75 SigRace Re-execution Mode Can be done in another machine Periodic checkpoint of memory state sync checkpoint s1s1 s2s2 Data Race Ո Conflict Sig sync checkpoint T0T1T2T0T1T2 Use the TS Log
76
76 SigRace Analysis Mode sync checkpoint T0T1T2
77
77 SigRace Analysis Mode sync checkpoint T0T1T2
78
78 SigRace Analysis Mode sync checkpoint T0T1T2 Conflict Sig ld Ո
79
79 SigRace Analysis Mode sync checkpoint T0T1T2 Conflict Sig ld log Ո
80
80 SigRace Analysis Mode sync checkpoint T0T1T2 Conflict Sig ld log sync Ո
81
81 SigRace Analysis Mode sync checkpoint T0T1T2 log sync log
82
82 SigRace Analysis Mode sync checkpoint T0T1T2 log sync log Pinpoints racy addresses or, Identifies and discards false positives
83
83 Outline Motivation Main Idea Implementation Results Conclusions
84
84 New Instructions collect_on –Enable R and W address collection in current thread
85
85 New Instructions collect_on –Enable R and W address collection in current thread collect_off –Disable R and W address collection in current thread
86
86 New Instructions sync_reached
87
87 New Instructions sync_reached –Dump TS, R and W P RDM Network TS R W
88
88 New Instructions sync_reached –Dump TS, R and W –Clear signatures P RDM Network TS Ø Ø
89
89 New Instructions sync_reached –Dump TS, R and W –Clear signatures –Update TS P RDM Network TS` Ø Ø
90
90 Modifications in Sync Libraries
91
91 Modifications in Sync Libraries Synchronization object variabletimestamp
92
92 Modifications in Sync Libraries Synchronization object Unlock macro variabletimestamp UNLOCK (‘{ }’) unlock($1.lock)
93
93 Modifications in Sync Libraries Synchronization object Unlock macro variabletimestamp UNLOCK (‘{ }’) unlock($1.lock) sync_reached RDM Network TS R W P
94
94 Modifications in Sync Libraries Synchronization object Unlock macro variabletimestamp UNLOCK (‘{ }’) unlock($1.lock) sync_reached RDM Network TS Ø Ø P
95
95 Modifications in Sync Libraries Synchronization object Unlock macro variabletimestamp UNLOCK (‘{ }’) unlock($1.lock) sync_reached RDM Network TS Ø Ø P
96
96 Modifications in Sync Libraries Synchronization object Unlock macro variabletimestamp UNLOCK (‘{ }’) unlock($1.lock) sync_reached $1.timestamp = TS lockTS
97
97 Modifications in Sync Libraries Synchronization object Unlock macro variabletimestamp UNLOCK (‘{ }’) unlock($1.lock) sync_reached $1.timestamp = TS
98
98 Modifications in Sync Libraries Synchronization object Unlock macro variabletimestamp UNLOCK (‘{ }’) unlock($1.lock) sync_reached $1.timestamp = TS AppendtoTSLog(TS) TS LogTS
99
99 Modifications in Sync Libraries Synchronization object Lock macro variabletimestamp LOCK (‘{ }’) lock($1.lock) … …
100
100 Modifications in Sync Libraries Synchronization object Lock macro variabletimestamp LOCK (‘{ }’) lock($1.lock) … TS = GenerateTS (TS, $1.timestamp) locktimestamp …
101
101 Modifications in Sync Libraries Synchronization object Lock macro variabletimestamp LOCK (‘{ }’) lock($1.lock) … TS = GenerateTS (TS, $1.timestamp) locktimestamp … Transparent to Application Code
102
102 Other Topics in Paper Easy to virtualize Queue Overflow Detailed HW structures
103
103 Outline Motivation Main Idea Implementation Results Conclusions
104
104 Experimental Setup PIN – Binary Instrumention Tool Default parameters Benchmarks: SPLASH2, PARSEC – # of proc: 8 – Signature size: 2 Kbits – Block size: 2,000 ins – Queue size: 16 entries – Checkpoint interval: 1 Million ins
105
105 Race Detection Ability Three configurations –SigRace Default –SigRace Ideal : Stores every signature between 2 checkpoints –ReEnact [Prvulovic et al, ISCA03]: Cache based approach with timestamp per word
106
106 Race Detection Ability AppIdeal SigRace Default SigRce ReEnact Cholesky 16 Barnes 11 6 Volrend 27 18 Ocean 111 Radiosity 15 12 Raytrace 443 Water-sp 842 Streamclus ter 131213 Total959070
107
107 Race Detection Ability More coverage than ReEnact AppIdeal SigRace Default SigRce ReEnact Cholesky 16 Barnes 11 6 Volrend 27 18 Ocean 111 Radiosity 15 12 Raytrace 443 Water-sp 842 Streamclus ter 131213 Total959070
108
108 Race Detection Ability More coverage than ReEnact Coverage comparable to ideal configuration AppIdeal SigRace Default SigRce ReEnact Cholesky 16 Barnes 11 6 Volrend 27 18 Ocean 111 Radiosity 15 12 Raytrace 443 Water-sp 842 Streamclus ter 131213 Total959070
109
109 Injected Races Removed one dynamic sync per run Each application runs 25 times with diff sync elimination
110
110 Injected Races
111
111 Injected Races More overall coverage than ReEnact
112
112 Injected Races More overall coverage than ReEnact –150% more coverage
113
113 Injected Races More overall coverage than ReEnact –150% more coverage
114
114 Conclusions Proposed SigRace: –Simple HW Cache and coherence protocol are unchanged –Higher coverage than existing HW schemes Detect races even if the line is displaced/invalidated –Usable on-the-fly in production runs SigRace finds 150% more injected races than word-based ReEnact
115
115 SigRace: Signature-Based Data Race Detection Abdullah Muzahid, Dario Suarez*, Shanxiang Qi & Josep Torrellas Computer Science Department University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu *Universidad de Zaragoza, Spain
116
116 Back Up Slides
117
117 Execution Overhead No overhead in generating signatures (HW) Additional instructions are negligible Main overheads –Checkpointing (ReVive – 6.3%) –Network traffic (63 bytes per 1000 ins - compressed) –Re-execution (depends on false positives & race position) Can be done offline
118
118 Network Traffic Overhead 63 ̴ 1 cache line
119
119 Re-execution Overhead Instructions re-executed until the first true data race is analyzed are shown as overhead In this process, it may also encounter many false positive races Instructions re-executed to analyze only the true race are shown as true overhead Instructions re-executed to filter out the false positives are shown as false overhead
120
120 Re-execution Overhead 22% Modest overhead
121
121 False Positives Parallel bloom filters with H 3 hash function 1.57% Low False Positive
122
122 Virtualization
123
123 Virtualization RDM uses as many queues as the number of threads Timestamp is accessed by thread id Thread id remains same even after migration Timestamps, flags, conflict signature are saved and restored at context switch RDM intersects incoming signatures against all other threads’(even inactive ones) signatures Threads can be re-executed without any scheduling constraints
124
124 Scalability For small # of proc., scalability is not a problem The operation of RDM can be pipelined –Simple repetitive operation Network traffic (compressed message) around 63Bytes/thousand ins Checkpoint is an issue.
Similar presentations
© 2024 SlidePlayer.com Inc.
All rights reserved.