HARD: Hardware-Assisted Lockset-based Race Detection. P. Zhou, R. Teodorescu, Y. Zhou. HPCA’07. Shimin Chen, LBA Reading Group Presentation.


1 HARD: Hardware-Assisted Lockset-based Race Detection. P. Zhou, R. Teodorescu, Y. Zhou. HPCA’07. Shimin Chen, LBA Reading Group Presentation

2 Motivation Data race detection is important. S/W solutions are slow (not good for production runs). Previous H/W solutions focus on the happens-before relation → cannot detect potential races

3 Motivating Example

4 Solution: HARD (h/w lockset) Challenges: – How to efficiently store and maintain lockset for each variable in hardware? – How to efficiently perform the set operation in the lockset algorithm? Main ideas (will be detailed later) – h/w bloom filter – Piggybacking on cache coherence protocols – Reset all bloom filters after exiting a barrier

5 Outline LockSet (refresh our memory) HARD Evaluation Conclusion

6 Main Lockset Algorithm Idea: accesses to every shared variable should be protected by some common lock. Data structures: – Thread t’s current lock set: L(t) – Candidate set for a variable v: C(v) Algorithm: – Modify L(t) upon lock acquire and release – Initialize C(v) to the set of all locks – When t accesses v, C(v) = C(v) ∩ L(t) – If C(v) == ∅, report a violation on variable v
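
The steps on slide 6 can be sketched in software as follows. This is a minimal Python sketch of the basic lockset algorithm only, not HARD's hardware mechanism and without the false-positive refinements of slide 7. The class name `LocksetDetector` and its method names are hypothetical, and "the set of all locks" is represented lazily: C(v) is created as a copy of L(t) on the first access, which equals (all locks) ∩ L(t).

```python
class LocksetDetector:
    """Basic lockset algorithm: every shared variable needs a common lock."""

    def __init__(self):
        self.held = {}         # L(t): thread -> set of locks currently held
        self.candidates = {}   # C(v): variable -> candidate lock set
        self.violations = []   # variables whose candidate set became empty

    def acquire(self, thread, lock):
        self.held.setdefault(thread, set()).add(lock)

    def release(self, thread, lock):
        self.held.setdefault(thread, set()).discard(lock)

    def access(self, thread, var):
        locks = self.held.get(thread, set())
        if var not in self.candidates:
            # First access: (all locks) ∩ L(t) is just L(t).
            self.candidates[var] = set(locks)
        else:
            # C(v) = C(v) ∩ L(t)
            self.candidates[var] &= locks
        if not self.candidates[var]:
            self.violations.append(var)


detector = LocksetDetector()
detector.acquire("t1", "A")
detector.access("t1", "x")      # C(x) = {A}
detector.release("t1", "A")
detector.acquire("t2", "B")
detector.access("t2", "x")      # C(x) = {A} ∩ {B} = ∅, so x is reported
```

In this usage the two threads never run concurrently, yet `x` is still reported. That is the "potential race" a happens-before detector would not flag.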

7 Reducing False Positives

8 Outline LockSet (refresh our memory) HARD Evaluation Conclusion

9 HARD Overview LState (2 bits): exclusive, shared, etc. BFVector (16 bits): candidate lock set for the cache line. Lock Register: thread’s lockset. Counter Register (32 bits): used for resolving hash collisions (more detail later)

10 HARD Overview: Operations A lock → a ‘1’ in the bloom filter. Fetching a line from memory: set the BFVector to all 1s, LState to exclusive. Update BFVector and LState on accesses. Communicate them through the coherence protocol. Lock register: thread’s lock set

11 Bloom Filter Bloom filter: A bit vector that represents a set of keys – A key is hashed d (e.g. d=3) times and represented by d bits Construct: for every key in the set, set its 3 bits in the vector Membership Test: given a key, check if all its 3 bits are 1 – Definitely not in the set if some bits are 0 – May have false positives [Figure: example filter 00011100011001000001, with Bit0 = H0(key), Bit1 = H1(key), Bit2 = H2(key)]

12 Representing LockSet as Bloom Filter 4 hash functions, one per 4-bit part of the 16-bit vector. Lockset intersection: bloom filter intersection (bitwise AND). Lockset empty: any of the 4 parts is all 0
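
The representation on slide 12 can be sketched as plain bit manipulation. This is a minimal Python sketch assuming a 16-bit vector split into 4 parts of 4 bits, with each lock setting one bit per part. The hash used here (two lock-address bits per part) is a hypothetical stand-in chosen for illustration, and the function names are not from the paper.

```python
K_PARTS = 4                                  # number of parts / hash functions
N_BITS = 4                                   # bits per part
PART_MASK = (1 << N_BITS) - 1
ALL_ONES = (1 << (K_PARTS * N_BITS)) - 1     # "set of all locks"


def lock_signature(lock_addr):
    """Map a lock address to a vector with exactly one bit set per part."""
    sig = 0
    for part in range(K_PARTS):
        # Hypothetical hash: pick 2 address bits per part (assumption).
        idx = (lock_addr >> (2 + 2 * part)) & (N_BITS - 1)
        sig |= 1 << (part * N_BITS + idx)
    return sig


def intersect(a, b):
    """Lockset intersection is a bitwise AND of the two vectors."""
    return a & b


def is_empty(vec):
    """The lockset is empty if any one part is all zeros."""
    return any(((vec >> (p * N_BITS)) & PART_MASK) == 0 for p in range(K_PARTS))


lock_a = lock_signature(0x1000)              # 0x1111
lock_b = lock_signature(0x1004)              # 0x1112
candidate = intersect(ALL_ONES, lock_a)      # line fetched, then accessed under A
candidate = intersect(candidate, lock_b)     # later accessed under B only
race_suspected = is_empty(candidate)         # part 0 is now all zeros
```

Because two different locks may share bits in several parts, the AND can stay non-empty when the true intersection is empty, which is the source of the false negatives discussed on the following slides.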

13 False Negative Caused by Bloom Filter

14 Prob of False Negatives The filter has k parts of n bits each. Suppose the candidate set contains m locks. Given a lock, probability of recognizing it as a member: $\text{prob\_whole} = \text{prob\_part}^{k}$, where $\text{prob\_part} = 1 - (1 - 1/n)^{m}$. When k=4, n=4: – 0.0039 (m=1), 0.037 (m=2), 0.111 (m=3) – Paper says: “experiments show that no races were missed” But what if the thread currently holds multiple locks?
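
As a check on the single-lock figures of slide 14, the m=2 case with k=4, n=4 can be worked out by hand. This only instantiates the slide's two formulas and adds no assumptions.

```latex
\begin{align*}
\text{prob\_part}  &= 1 - \left(1 - \tfrac{1}{4}\right)^{2} = 1 - \tfrac{9}{16} = \tfrac{7}{16} \\
\text{prob\_whole} &= \left(\tfrac{7}{16}\right)^{4} = \tfrac{2401}{65536} \approx 0.0366
\end{align*}
```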

15 If threads hold 1 to 8 locks (not in the paper): n = 4 bits per part, k = 4 parts; rows t = number of locks held by the thread, columns m = number of locks in the candidate set

        m=1      m=2      m=3      m=4
t=1     0.0039   0.0366   0.1117   0.2184
t=2     0.0078   0.0719   0.2109   0.3891
t=3     0.0117   0.1059   0.2991   0.5225
t=4     0.0155   0.1387   0.3774   0.6267
t=5     0.0194   0.1702   0.4469   0.7083
t=6     0.0232   0.2006   0.5087   0.7720
t=7     0.0270   0.2299   0.5636   0.8218
t=8     0.0308   0.2581   0.6123   0.8607

16 Try another design: n = 8 bits per part, k = 8 parts; rows t = number of locks held by the thread, columns m = number of locks in the candidate set

        m=1      m=2      m=3      m=4
t=1     0.0000   0.0000   0.0001   0.0009
t=2     0.0000   0.0000   0.0003   0.0017
t=3     0.0000   0.0000   0.0004   0.0026
t=4     0.0000   0.0000   0.0006   0.0034
t=5     0.0000   0.0000   0.0007   0.0043
t=6     0.0000   0.0001   0.0008   0.0051
t=7     0.0000   0.0001   0.0010   0.0060
t=8     0.0000   0.0001   0.0011   0.0069
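
The slides do not state how the multi-lock tables were computed. The tabulated values are consistent with treating each of the t held locks as an independent trial, so that the probability that at least one is wrongly recognized is 1 − (1 − prob_whole)^t. That reading is an inference from the numbers, not something the slides or the paper say. A minimal Python sketch under that assumption, with hypothetical function names:

```python
def prob_single(m, n, k):
    """Slide 14: chance one lock is wrongly recognized as a member.

    m: locks in the candidate set, n: bits per part, k: number of parts.
    """
    prob_part = 1 - (1 - 1 / n) ** m
    return prob_part ** k


def prob_any(t, m, n, k):
    """Assumed model: at least one of t held locks is wrongly recognized."""
    return 1 - (1 - prob_single(m, n, k)) ** t


def table(n, k, max_t=8, max_m=4):
    """Rows are t = 1..max_t, columns are m = 1..max_m, rounded to 4 places."""
    return [
        [round(prob_any(t, m, n, k), 4) for m in range(1, max_m + 1)]
        for t in range(1, max_t + 1)
    ]


for row in table(n=4, k=4):
    print("  ".join(f"{p:.4f}" for p in row))
```

Calling `table(n=8, k=8)` gives the alternative design; doubling both the part width and the number of parts lowers the worst listed case (t=8, m=4) from roughly 0.86 to under 0.01.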

17 Unlock operation → remove bit from bloom filter? 32-bit counter register: each bloom filter bit has a 2-bit counter. Lock: increment the 2-bit counter if the bloom filter bit is set. Unlock: decrement the 2-bit counter; if it reaches 0, clear the bloom filter bit
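
The counter scheme on slide 17 can be sketched as follows. This is a minimal Python sketch assuming a 16-bit lock register with one 2-bit counter per bit (32 counter bits in total). The class name `LockRegister` is hypothetical, each counter here tracks how many held locks map to its bit, and raising an error when a counter would exceed 3 is an assumption, since the slides do not say what the hardware does on overflow.

```python
class LockRegister:
    """Thread lockset as a bit vector plus a small counter per bit."""

    WIDTH = 16          # bits in the lock register
    COUNTER_MAX = 3     # largest value a 2-bit counter can hold

    def __init__(self):
        self.bits = 0
        self.counters = [0] * self.WIDTH

    def acquire(self, signature):
        """Add a lock: set its bits and count how many locks share each bit."""
        for i in range(self.WIDTH):
            if (signature >> i) & 1:
                if self.counters[i] == self.COUNTER_MAX:
                    # Overflow handling is an assumption, not from the slides.
                    raise OverflowError(f"2-bit counter {i} would overflow")
                self.counters[i] += 1
                self.bits |= 1 << i

    def release(self, signature):
        """Remove a lock: clear a bit only when no held lock still uses it."""
        for i in range(self.WIDTH):
            if (signature >> i) & 1 and self.counters[i] > 0:
                self.counters[i] -= 1
                if self.counters[i] == 0:
                    self.bits &= ~(1 << i)


reg = LockRegister()
reg.acquire(0x1111)     # lock A
reg.acquire(0x1112)     # lock B shares three bits with A
reg.release(0x1111)     # shared bits survive because B still holds them
bits_after_a_released = reg.bits
reg.release(0x1112)
```

Without the counters, releasing lock A would clear bits that lock B still needs, and the register would under-report the thread's lockset.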

18 Candidate Set and LState Communications must broadcast changes to C(v) if cache line is in shared state

19 Handling Barriers Set BFVectors to all 1s after exiting a barrier (what if t2 does not hold any lock?)

20 Three Approximations Bloom filter to represent lockset Lockset info only in cache – Can only detect races in a short window of execution Cache line granularity – False sharing – Compiler to put shared variables to different lines? – Removing false sharing is generally good

21 Outline LockSet (refresh our memory) HARD Evaluation Conclusion

22 Methodology SESC: cycle-accurate execution-driven simulator (MIPS instruction set) Six SPLASH-2 benchmarks Randomly inject a data race: randomly remove a dynamic instance of lock and corresponding unlock Compare with happens-before, ideal lockset

23 Bugs detected, false alarms Ideal: word granularity, state kept in memory, perfect lockset. # of false alarms counts source code locations; the number of dynamic errors is much larger

24 Mainly bus traffic increase Note that HARD requires bloom filter operation per memory access in processor pipeline

25 Conclusion Main idea: bloom filter to represent lockset Three approximations: – Bloom filter to represent lockset – Lockset info only in cache – Cache line granularity Problems: – Lockset: false positives – Seems hard to add operations into processor pipeline – Are these the right approximations for monitoring production runs?

