Cooperative Crug Isolation. Aditya Thakur, Rathijit Sen, Ben Liblit, Shan Lu. University of Wisconsin–Madison. Workshop on Dynamic Analysis 2009.

Presentation transcript:

1 Cooperative Crug Isolation. Aditya Thakur, Rathijit Sen, Ben Liblit, Shan Lu. University of Wisconsin–Madison. Workshop on Dynamic Analysis 2009.

2 Crug (concurrency bug). Race: Thread 1 executes read(x) while Thread 2 executes write(x). Atomicity violation: Thread 1 executes read(x) and then write(x), while Thread 2 executes write(x).

3 The developer runs threaded.exe file.in; the user runs threaded.exe file.in. Non-determinism! More cores, more threads, more crugs.

4 Cooperative Crug Isolation

5 Simplified crug from PBZIP2 (global variables are shown in bold). Thread 1: lock(mut); unlock(mut); Thread 2: mut = NULL;

6 Identify the root cause of the crug (global variables are shown in bold). Thread 1: lock(mut); unlock(mut); Thread 2: mut = NULL;

7 Current techniques: not scalable, high overhead; report benign crugs; target specific types of crugs and synchronization.

8 Cooperative Crug Isolation: scalable, low overhead; does not report benign crugs; handles multiple types of crugs and synchronization.

9 Cooperative Bug Isolation (CBI). Program Source -> Compiler (with Sampler and Predicates) -> Shipping Application -> Counts & success/failure outcome -> Statistical Debugging -> Top bugs with likely causes.
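
The statistical debugging step on this slide can be illustrated with a small scoring sketch. It assumes the "Increase" score from the CBI literature (the fraction of failing runs among runs where a predicate was observed true, minus the fraction among runs where its site was merely sampled); the slides do not spell out the scoring, and the struct and function names below are hypothetical.

```c
#include <assert.h>

/* Hypothetical per-predicate run counts, aggregated over many user runs. */
typedef struct {
    unsigned f_true;     /* failing runs in which predicate P was observed true */
    unsigned s_true;     /* successful runs in which P was observed true */
    unsigned f_observed; /* failing runs in which P's site was sampled at all */
    unsigned s_observed; /* successful runs in which P's site was sampled at all */
} pred_counts_t;

/* Safe division: a predicate that was never seen gets a ratio of 0. */
static double ratio(unsigned num, unsigned den) {
    return den == 0 ? 0.0 : (double)num / (double)den;
}

/* Increase(P) = Failure(P) - Context(P).
 * A large positive value suggests that P being true predicts failure
 * beyond what merely reaching P's site already does. */
static double increase_score(const pred_counts_t *p) {
    double failure = ratio(p->f_true, p->f_true + p->s_true);
    double context = ratio(p->f_observed, p->f_observed + p->s_observed);
    return failure - context;
}
```

A predicate that is true only in failing runs scores high, while one that is true equally often in both kinds of runs scores near zero, which is one way to see why predicates whose values are the same in successful and failing runs cannot isolate a crug.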

10 Cooperative Bug Isolation. Thread 1: lock(mut); unlock(mut); Thread 2: mut = NULL; (the code is shown twice on the slide). CBI predicates are inadequate for crug isolation: the values of the predicates are the same for successful and failing runs.

11 Cooperative Bug Isolation. Thread 1: lock(mut); unlock(mut); Thread 2: mut = NULL; CBI sampling is inadequate for crug isolation: sampling is thread-local and independent across threads.

12 Cooperative Bug Isolation. CBI was unable to diagnose crugs in any of the benchmarks used. No bug predictors reported!

13 Cooperative Crug Isolation (CCI) extends the CBI framework to target crugs: (1) a new predicate capturing interleaving events; (2) a new cross-thread sampling scheme.

14 Predicate Design. Thread 1: lock(mut); S: unlock(mut); Thread 2: mut = NULL; Predicates at S: "remote S is true" and "local S is true".

15 Predicate Instrumentation. At runtime, maintain a hashtable that maps each address to the id of the thread that last accessed it.

Address     Thread Id
0xb1ab1a    1
0xf00f00    2
0xb1af00    1

16 Predicate Instrumentation. lock(glock); differs = test_and_insert(&x, curTid); record(S, differs); access(x); unlock(glock);

17 Predicate Instrumentation (same code). curTid is the thread id of the currently executing thread.

18 Predicate Instrumentation (same code). Check if curTid was the thread which previously accessed x.

19 Predicate Instrumentation (same code). Set differs to true if it was not.

20 Predicate Instrumentation (same code). Update the hashtable.

21 Predicate Instrumentation (same code). Increment the counter for the predicate at S.

22 Predicate Instrumentation (same code). Execute the block atomically.

23 Predicate Instrumentation (same code). Handles accesses through pointers. No need for static pointer analysis.

24 Predicate Instrumentation (summary). lock(glock); differs = test_and_insert(&x, curTid); record(S, differs); access(x); unlock(glock); curTid is the thread id of the currently executing thread. Check if curTid was the thread which previously accessed x. Set differs to true if it was not. Update the hashtable. Increment the counter for the predicate at S. Execute the block atomically.
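
The instrumentation summarized above can be sketched in C roughly as follows. This is a minimal illustration, not the authors' implementation: the fixed-size linear-probing table, the site_counts_t struct, and the instrumented_read helper are hypothetical, thread ids are passed in explicitly, and a first-ever access to an address is assumed to count as local.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <pthread.h>

#define TABLE_SIZE 1024  /* hypothetical capacity */

/* One hashtable entry: address -> id of the thread that last accessed it. */
typedef struct {
    const void *addr;  /* NULL marks an empty slot */
    int tid;
} entry_t;

/* Counters for the two predicates at one instrumentation site S. */
typedef struct {
    unsigned long remote;  /* previous access came from a different thread */
    unsigned long local;   /* previous access came from the same thread */
} site_counts_t;

static entry_t table[TABLE_SIZE];
static pthread_mutex_t glock = PTHREAD_MUTEX_INITIALIZER;

/* Linear probing: return the slot holding addr, or the first empty slot. */
static entry_t *find_slot(const void *addr) {
    size_t start = (size_t)(((uintptr_t)addr >> 3) % TABLE_SIZE);
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        entry_t *e = &table[(start + i) % TABLE_SIZE];
        if (e->addr == addr || e->addr == NULL) return e;
    }
    return NULL;  /* table full */
}

/* Report whether a different thread touched addr last, then make
 * cur_tid the last accessor. The caller must hold glock. */
static bool test_and_insert(const void *addr, int cur_tid) {
    entry_t *e = find_slot(addr);
    assert(e != NULL);
    bool differs = (e->addr == addr) && (e->tid != cur_tid);
    e->addr = addr;
    e->tid = cur_tid;
    return differs;
}

/* Increment the counter for the predicate observed at site S. */
static void record(site_counts_t *site, bool differs) {
    if (differs) site->remote++;
    else site->local++;
}

/* Instrumented read of *x at site S; the whole block runs under glock. */
static int instrumented_read(int *x, int cur_tid, site_counts_t *site) {
    pthread_mutex_lock(&glock);
    bool differs = test_and_insert(x, cur_tid);
    record(site, differs);
    int value = *x;  /* access(x) */
    pthread_mutex_unlock(&glock);
    return value;
}
```

Because the table is keyed on the runtime address, the same code covers accesses made through pointers, which matches the slide's point that no static pointer analysis is needed.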

25 Sampling Mechanism. Instrumented access: lock(glock); differs = test_and_insert(&x, curTid); record(S, differs); access(x); unlock(glock); Sampling control: if (gsample == 0) { access(x); gsample = curTid; insert(&x, curTid); } else if (gsample == curTid) { gsample = 0; clear(); } Annotations: Is sampling on? (gsample == 0: sampling not on; otherwise: sampling already on.) Turn on sampling (gsample = curTid). Update hashtable (insert). Did the current thread initiate sampling? (gsample == curTid.) Stop sampling, clear hashtable (gsample = 0; clear()).

26 Sampling Mechanism. Thread 1: lock(mut); Hashtable (Address -> Thread Id): empty. gsample = 0.

27 Sampling Mechanism. Thread 1: lock(mut); Thread 2: (nothing yet). Hashtable: &mut -> 1. gsample = 1.

28 Sampling Mechanism. Thread 1: lock(mut); Thread 2: mut = NULL; Hashtable: &mut -> 2. gsample = 1.

29 Sampling Mechanism. Thread 1: lock(mut); S: unlock(mut); Thread 2: mut = NULL; Hashtable: &x -> 2. gsample = 1. Record "remote S is true".

30 Sampling Mechanism. Thread 1: lock(mut); S: unlock(mut); Thread 2: mut = NULL; Hashtable: empty. gsample = 0. Stop sampling.
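
The walkthrough in slides 26 to 30 can be reproduced with a short C sketch. How the slide's two code fragments nest is an assumption here: the sketch turns sampling on at the first access seen while gsample is 0, records predicates while sampling is on, and lets the initiating thread stop sampling on its next access. A real deployment would presumably also gate the "turn on" branch on a sampling countdown, which the slide does not show. The table layout and the sampled_access name are hypothetical, and thread ids are assumed to be nonzero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <pthread.h>

#define TABLE_SIZE 64  /* hypothetical capacity */

typedef struct { const void *addr; int tid; } entry_t;
typedef struct { unsigned long remote, local; } site_counts_t;

static entry_t table[TABLE_SIZE];
static size_t table_len = 0;
static int gsample = 0;  /* 0: sampling off; else id of the initiating thread */
static pthread_mutex_t glock = PTHREAD_MUTEX_INITIALIZER;

/* Linear scan; fine for a sketch, since the table is cleared often. */
static entry_t *lookup(const void *addr) {
    for (size_t i = 0; i < table_len; i++)
        if (table[i].addr == addr) return &table[i];
    return NULL;
}

/* Record tid as the last thread to access addr. */
static void insert(const void *addr, int tid) {
    entry_t *e = lookup(addr);
    if (e == NULL) {
        assert(table_len < TABLE_SIZE);
        e = &table[table_len++];
        e->addr = addr;
    }
    e->tid = tid;
}

/* True if another thread accessed addr last; then update the table. */
static bool test_and_insert(const void *addr, int tid) {
    entry_t *e = lookup(addr);
    bool differs = (e != NULL) && (e->tid != tid);
    insert(addr, tid);
    return differs;
}

static void clear(void) { table_len = 0; }

/* Instrumentation for one access to addr at a site with counters `site`. */
static void sampled_access(const void *addr, int cur_tid, site_counts_t *site) {
    pthread_mutex_lock(&glock);
    if (gsample == 0) {             /* sampling not on: turn it on */
        gsample = cur_tid;
        insert(addr, cur_tid);
    } else {                        /* sampling already on: record predicate */
        bool differs = test_and_insert(addr, cur_tid);
        if (differs) site->remote++;
        else site->local++;
        if (gsample == cur_tid) {   /* initiator stops sampling */
            gsample = 0;
            clear();
        }
    }
    /* access(x) would run here, still under glock */
    pthread_mutex_unlock(&glock);
}
```

Calling sampled_access for Thread 1's lock(mut), Thread 2's mut = NULL, and Thread 1's unlock(mut), in that order and on the same address, leaves the remote counter at the unlock site incremented and sampling switched off, mirroring the slide sequence.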

31 Experimental Evaluation. Benchmarks used: Apache HTTP server, PBZIP2; SPLASH-2: FFT, LU. Machine used: dual-core Intel P4. Questions answered: runtime overhead; accuracy of predictors.

32 Runtime Overhead (overhead compared to uninstrumented code)

Benchmark   No sampling   Sampling
Apache      25%           2%
PBZIP2      200%          7%
FFT         650%          25%
LU          1,300%        800%

Low overheads for both real-world applications. Large difference between no sampling and sampling.

33 Predictor Accuracy (R: remote predicate)

Apache
Predictor                                 Function
R: buf->outcnt += len                     ap_buffered_log_writer()

PBZIP2
Predictor                                 Function
R: pthread_mutex_unlock(fifo->mut);       consumer_decompress()

34 Predictor Accuracy (R: remote predicate, L: local predicate)

FFT
Predictor                                    Function
R: Global->finishtime=finish                 SlaveStart()
R: Global->initdonetime=initdone             SlaveStart()
R: printf(“..”,Global->transtime[0]…)        main()
L: malloc(2*(rootN-1)*sizeof(double));       SlaveStart()

LU
Predictor                                    Function
R: Global->rf=rf                             OneSolve()
L: (Global->start).gsense=-lsense;           OneSolve()

35 Conclusion. CCI is a low-overhead, scalable approach for root cause analysis of crugs. Effective on two widely deployed applications. Simple predicates are effective because of the use of statistical models.

36 Next time on: What other events are useful for crug isolation? Is there scope for static analysis to help? Other cross-thread sampling mechanisms (e.g., bursty sampling)? From crug isolation to crug tolerance? Thank you!

