
1 Scalable Dynamic Analysis for Automated Fault Location and Avoidance Rajiv Gupta Funded by NSF grants from CPA, CSR, & CRI programs and grants from Microsoft Research

2 Motivation
- Software bugs cost the U.S. economy about $59.5 billion each year [NIST 02].
- Embedded systems perform mission-critical / safety-critical tasks, where a failure can lead to loss of mission or life:
  - (Ariane 5) arithmetic overflow led to shutdown of the guidance computer.
  - (Mars Climate Orbiter) a missed unit conversion led to faulty navigation data.
  - (Mariner I) a missing superscripted bar in the specification for the guidance program led to its destruction 293 seconds after launch.
  - (Mars Pathfinder) a priority inversion error caused a system reset.
  - (Boeing 747-400) loss of engine & flight displays while in flight.
  - (Toyota hybrid Prius) VSC fault; the gasoline-powered engine shut off.
  - (Therac-25) wrong dosage during radiation therapy.

3 Overview [Diagram: fault location via dynamic slicing, performed offline; fault avoidance of environment faults, performed online; scalability through tracing + logging of long-running, multi-threaded program executions.]

4 Fault Location. Goal: assist the programmer in debugging by automatically narrowing the fault to a small section of the code.
- Dynamic information used: data dependences, control dependences, values.
- Execution runs used: one failed execution and its perturbations.

5 Dynamic Information [Diagram: an execution of the program yields a dynamic dependence graph with data and control dependence edges.]

6 Approach. Detect the execution of a statement s such that the faulty code affects the value computed by s, or is affected by the value computed by s, through a chain of dependences. Then estimate the set of potentially faulty statements from s:
- Affects: statements from which s is reachable in the dynamic dependence graph (backward slice).
- Affected-by: statements that are reachable from s in the dynamic dependence graph (forward slice).
- Intersect the slices to obtain a smaller fault candidate set.
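The intersection idea above can be sketched as plain graph reachability over a dynamic dependence graph. Everything below (statement names, the toy graph) is hypothetical illustration, not the actual tool:

```python
# Sketch: deps[n] = set of nodes that n depends on (data or control).
# Backward slice = reachability against the edges from the erroneous
# output; forward slice = reachability along the edges from the
# failure-inducing input; their intersection is the fault candidate set.

def backward_slice(deps, start):
    seen, work = set(), [start]
    while work:
        n = work.pop()
        if n not in seen:
            seen.add(n)
            work.extend(deps.get(n, ()))
    return seen

def forward_slice(deps, start):
    # Invert the dependence edges, then reuse backward reachability.
    inv = {}
    for n, preds in deps.items():
        for p in preds:
            inv.setdefault(p, set()).add(n)
    return backward_slice(inv, start)

# Toy dynamic dependence graph: 'out' depends on 's2', 's2' on 'in', etc.
deps = {'out': {'s2'}, 's2': {'in', 's1'}, 's1': set(), 'in': set(),
        's3': {'s1'}}
fault_candidates = backward_slice(deps, 'out') & forward_slice(deps, 'in')
print(sorted(fault_candidates))   # statements that both affect the output
                                  # and are affected by the input
```

Note how 's1' (in the backward slice only) and 's3' (in neither) drop out of the candidate set.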

7 Backward & Forward Slices [Diagram: the backward slice is computed from the erroneous output [Korel & Laski, 1988]; the forward slice is computed from the failure-inducing input [ASE-05].]

8 Backward & Forward Slices [ASE-05]. For memory bugs, the number of statements in the intersection of the two slices is very small (< 5).

9 Bidirectional Slices [ICSE-06]. Critical predicate (CP): an execution instance of a predicate such that changing its outcome "repairs" the program state. Bidirectional slice = backward slice of the CP + forward slice of the CP; intersecting it with the other slices yields the combined slice. Critical predicates were found in 12 out of 15 bugs. Cost of the search for the critical predicate: brute force, 32 to 155K predicates; after filtering and ordering, 1 to 7K predicates.

10 Pruning Slices [PLDI-06]. Confidence in a value v, C(v) ∈ [0, 1]: C(v) = 0 means all possible values of v produce the same output, while C(v) = 1 means any change in v will change the output. How is confidence estimated? From value profiles.
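A minimal sketch of a confidence measure with the endpoints described above. The log-ratio formula is an assumption made here for illustration; it matches the slide's 0/1 behavior but may differ in detail from the actual [PLDI-06] definition:

```python
import math

# Assumed formula: C(v) = 1 - log(|alt(v)|) / log(|range(v)|), where
# range(v) is every value v could take and alt(v) is the subset of
# values that still reproduces the already-verified correct outputs.
# C = 1 when only the observed value works; C = 0 when any value works.

def confidence(num_alternatives, range_size):
    if range_size <= 1:
        return 1.0
    return 1.0 - math.log(num_alternatives) / math.log(range_size)

print(confidence(1, 256))     # only one value works -> 1.0 (keep in slice)
print(confidence(256, 256))   # every value works    -> 0.0 (prune)
```

Statements whose computed values all have confidence 1 can be pruned from the slice, since altering them could not fix the observed wrong output.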

11 Test Programs
- Real reported bugs:
  - Nine logical bugs (incorrect output) in Unix utilities: grep 2.5, grep 2.5.1, flex 2.5.31, make 3.80.
  - Six memory bugs (program crashes) in Unix utilities: gzip, ncompress, polymorph, tar, bc, tidy.
- Injected bugs:
  - Siemens suite (numerous versions): schedule, schedule2, replace, print_tokens, ...
  - Unix utilities: gzip, flex.

12 Dynamic Slice Sizes

Buggy Run         BS    FS    BiS
flex 2.5.31(a)    695   605   225
flex 2.5.31(b)    272   257   NA
flex 2.5.31(c)    50    1368  NA
grep 2.5          NA    731   88
grep 2.5.1(a)     NA    32    111
grep 2.5.1(b)     NA    599   NA
grep 2.5.1(c)     NA    124   53
make 3.80(a)      981   1239  1372
make 3.80(b)      1290  1646  1436
gzip-1.2.4        34    3     39
ncompress-4.2.4   18    2     30
polymorph-0.4.0   21    3     22
tar 1.13.25       105   202   117
bc 1.06           204   188   267
tidy              554   367   541

13 Combined Slices

Buggy Run         BS    BS^FS^BiS (% of BS)
flex 2.5.31(a)    695   27 (3.9%)
flex 2.5.31(b)    272   102 (37.5%)
flex 2.5.31(c)    50    5 (10%)
grep 2.5          NA    86 (7.4% of EXEC)
grep 2.5.1(a)     NA    25 (4.9% of EXEC)
grep 2.5.1(b)     NA    599 (53.3% of EXEC)
grep 2.5.1(c)     NA    12 (0.9% of EXEC)
make 3.80(a)      981   739 (81.4%)
make 3.80(b)      1290  1051 (75.3%)
gzip-1.2.4        34    3 (8.8%)
ncompress-4.2.4   18    2 (14.3%)
polymorph-0.4.0   21    3 (14.3%)
tar 1.13.25       105   45 (42.9%)
bc 1.06           204   102 (50%)
tidy              554   161 (29.1%)

14 Evaluation of Pruning

Program        Description          LOC    Versions  Tests
print_tokens   Lexical analyzer     565    5         4072
print_tokens2  Lexical analyzer     510    5         4057
replace        Pattern replacement  563    8         5542
schedule       Priority scheduler   412    3         2627
schedule2      Priority scheduler   307    3         2683
gzip           Unix utility         8009   1         1217
flex           Unix utility         12418  8         525

Siemens suite: a single error is injected in each version. Not all versions are included, namely those where:
- there is no output, or the very first output is wrong;
- the root cause is not contained in the BS (code-missing error).

15 Evaluation of Pruning

Program        BS   Pruned Slice  Pruned Slice / BS
print_tokens   110  35            31.8%
print_tokens2  114  55            48.2%
replace        131  60            45.8%
schedule       117  70            59.8%
schedule2      90   58            64.4%
gzip           357  121           33.9%
flex           727  27            3.7%

16 Effectiveness
- Backward slice [AADEBUG-05] (from the erroneous output): ≈ 31% of executed statements.
- Combined slice [ASE-05, ICSE-06] (erroneous output + failure-inducing input + critical predicate): ≈ 36% of the backward slice, ≈ 11% of executed statements.
- Pruned slice [PLDI-06] (confidence analysis): ≈ 41% of the backward slice, ≈ 13% of executed statements.

17 Effectiveness. Slicing is effective in locating faults: no more than 10 static statements had to be inspected.

Program - bug           Inspected Stmts.
mutt - heap overflow    8
pine - stack overflow   3
pine - heap overflow    10
mc - stack overflow     2
squid - heap overflow   5
bc - heap overflow      3

18 Execution Omission Errors [PLDI-07] [Diagram: a predicate (A < 0) guards a definition of X; when the branch is not taken, the later use of X has an implicit dependence on the predicate.] Approach: inspect the pruned slice; dynamically detect the implicit dependence; incrementally expand the pruned slice.

19 Scalability of Tracing. Dynamic information needed:
- Dynamic dependences: for all slicing.
- Values: for confidence analysis (pruning slices).
The Whole Execution Trace (WET) annotates the static program representation; trace size ≈ 15 bytes / instruction.

20 Trace Sizes & Collection Overheads. Trace sizes are very large for even tens of seconds of execution.

Program   Running Time  Dep. Trace  Collection Time
mysql     13 s          21 GB       2886 s
prozilla  8 s           6 GB        2640 s
proxyC    10 s          456 MB      880 s
mc        10 s          55 GB       418 s
mutt      20 s          388 GB      3238 s
pine      14 s          156 GB      2088 s
squid     15 s          88 GB       1132 s

21 Compacting Whole Execution Traces
- Explicitly remember the dynamic control flow trace.
- Infer as many dynamic dependences as possible from control flow (≈ 94%); remember the remaining dependences explicitly (≈ 6%). A specialized graph representation enables the inference.
- Explicitly remember the value trace.
- Use a context-based method to compress the dynamic control flow, value, and address traces.
- The representation supports bidirectional traversal with equal ease. [MICRO-04, TACO-05]
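The inference bullet can be illustrated in miniature: given static def/use sets per basic block, replaying the control-flow trace recovers most dynamic data dependences for free, so only the address-dependent remainder needs explicit storage. Block names and def/use sets below are made up:

```python
# Hypothetical static info: block -> (variables defined, variables used).
blocks = {
    'B1': (['a'], []),          # a = ...
    'B2': (['b'], ['a']),       # b = f(a)
    'B3': (['a'], ['a', 'b']),  # a = g(a, b)
}

def infer_dependences(cf_trace):
    """Replay the control-flow trace, tracking the last definition of
    each variable; every use's dependence is then inferred, not stored."""
    last_def = {}   # variable -> (block, trace position) of last definition
    deps = []       # (use site, def site) pairs recovered from control flow
    for pos, blk in enumerate(cf_trace):
        defs, uses = blocks[blk]
        for v in uses:
            if v in last_def:
                deps.append(((blk, pos), last_def[v]))
        for v in defs:
            last_def[v] = (blk, pos)
    return deps

print(infer_dependences(['B1', 'B2', 'B3']))
```

Dependences through pointers (where the accessed address is not statically known) are the ≈ 6% that this replay cannot recover and that the representation stores explicitly.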

22 Dependence Graph Representation. Example program (statements 8 and 9 are in the loop body, as the trace shows):
1: z=0
2: a=0
3: b=2
4: p=&b
5: for i = 1 to N do
6:   if (i%2 == 0) then
7:     p=&a
     endif
8:   a=a+1
9:   z=2*(*p)
   endfor
10: print(z)
For input N=2, the execution trace is 1^1, 2^1, 3^1, 4^1, 5^1, 6^1, 8^1, 9^1, 5^2, 6^2, 7^1, 8^2, 9^2, 10^1, where s^i denotes the i-th execution instance of statement s.

23 Dependence Graph Representation (contd.) [Diagram: the control flow graph of the example, annotated with the same execution trace for N=2; the 14 execution instances are numbered 1-14 by timestamp, and dynamic dependence edges are labeled with timestamp pairs.]

24 Transform: Traces of Blocks [Diagram: the single whole-program control flow trace is transformed into per-block traces; e.g., iteration paths such as 3 4 6 7 9 / 3 5 6 7 9 / 3 4 6 8 9 / 3 5 6 8 9 are stored as traces attached to the basic blocks.]

25 Infer: Local Dependence Labels [Diagram: a dependence whose definition and use occur in the same block execution gets matching timestamps, e.g. (10,10), (20,20), (30,30); such local dependence labels can be inferred from the block's trace (10, 20, 30) and need not be stored. A non-local dependence, e.g. label (20,21), is kept explicitly.]

26 Transform: Local Dep. Labels [Diagram: when a potentially aliasing store (*P = ...) intervenes between a definition and its use, matching-timestamp labels such as (10,10) cannot all be inferred; the transformation keeps explicitly only the labels, e.g. (20,20), that inference cannot recover.]

27 Transform: Local Dep. Labels (contd.) [Diagram: definition and use timestamps are offset, e.g. labels (10,11) and (20,21), so that the block traces 10, 20 and 11, 21 still allow the local labels to be inferred rather than stored.]

28 Group: Non-Local Dep. Edges [Diagram: non-local dependence edges with labels such as (10,21) and (20,11) are grouped between the defining and using blocks (traces 10, 20 and 11, 21), so that a single grouped edge replaces many individually labeled edges.]

29 Compacted WET Sizes

Program     Stmts Executed (Millions)  WET Before (MB)  WET After (MB)  Before / After
300.twolf   690                        10,666           647             16.49
256.bzip2   751                        11,921           221             54.02
255.vortex  609                        8,748            105             83.63
197.parser  615                        8,730            188             46.34
181.mcf     715                        10,541           416             25.33
164.gzip    650                        9,688            538             18.02
130.li      740                        10,399           203             51.22
126.gcc     365                        5,238            89              58.84
099.go      685                        10,369           575             18.04
Average     647                        9,589            331             41.33

Compacted size ≈ 4 bits / instruction.

30 Slicing Times: [PLDI-04] vs. [ICSE-03] [Chart omitted.]

31 Dep. Graph Generation Times
- Offline post-processing after collecting address and control flow traces: ≈ 35x of execution time.
- Online techniques [ICSM 2007]:
  - Information flow: 9x to 18x slowdown.
  - Basic block opt.: 6x to 10x slowdown.
  - Trace level opt.: 5.5x to 7.5x slowdown.
  - Dual core: ≈ 1.5x slowdown.
- Online filtering techniques:
  - Forward slice of all inputs.
  - User-guided bypassing of functions.

32 Reducing Online Overhead
- Record non-deterministic events online: less than 2x overhead; enables deterministic replay of executions.
- Trace faulty executions off-line: replay the execution, switch on tracing, collect and inspect the traces.
- Trace analysis is still a problem: the traces correspond to huge executions, and the off-line overhead of trace collection is still significant.
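The record/replay split above can be sketched with a toy program whose only non-determinism is a random input. The program, event source, and trace format are all hypothetical; the point is that the online run logs only events, while heavy tracing happens during the deterministic offline replay:

```python
import random

def program(next_event, trace=None):
    """Toy deterministic program driven by non-deterministic events."""
    total = 0
    for _ in range(3):
        x = next_event()                     # non-deterministic step
        total += x
        if trace is not None:
            trace.append(('add', x, total))  # heavy tracing, replay only
    return total

log = []
def record_event():
    v = random.randrange(100)
    log.append(v)        # cheap online logging of the event outcome
    return v

original = program(record_event)             # online: log only, no tracing

replay_iter = iter(log)                      # offline: feed logged events
trace = []
replayed = program(lambda: next(replay_iter), trace)
assert replayed == original                  # replay reproduces the run
print(len(trace))                            # full trace collected offline
```

Because the replay consumes the log instead of the live event source, tracing can be switched on without perturbing the execution being analyzed.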

33 Reducing Trace Sizes
- Checkpointing schemes: trace from the most recent checkpoint. Checkpoints are of the order of minutes apart; better, but the trace sizes are still very large.
- Exploiting program characteristics [ISSTA-07, FSE-06]: multithreaded and server-like programs (examples: mysql, apache), where each request spawns a new thread. Do not trace irrelevant threads.

34 Beyond Tracing
- Checkpoint: capture the memory image.
- Execute and record (log) events.
- Upon a crash, roll back to the checkpoint.
- Reduce the log, and replay the execution using the reduced log.
- Turn on tracing during the replay.
Applicable to multithreaded programs. [ISSTA-07]

35 An Example. A mysql bug: the "load ..." command will crash the server if a database is not specified. Without typing "use database_name", thd->db is NULL.

36 Example - Execution and Log File (threads: gray = scheduler, blue = T0, red = T1, green = T2)
- Run mysql server: open path=/etc/my.cnf ...; wait for connection.
- User 1 connects to the server: create Thread 1; wait for command.
- User 2 connects to the server: create Thread 2; wait for command.
- User 1: "show databases" - recv "show databases"; handle command.
- User 2: "use test; select * from b" - recv; handle command.
- User 1: "load data into table1" - recv "load data ..."; handle - (server crashes).

37 Execution Replay Using the Reduced Log [Timeline: of the original events, only those needed to reproduce the crash are replayed - run mysql server (open path=/etc/my.cnf ..., wait for connection), User 1 connects (create Thread 1), User 1: "load data into table1" (recv "load data ...", handle - server crashes). User 2's activity is dropped from the reduced log.]

38 Execution Reduction
- Effects of reduction: identify irrelevant threads; distinguish replay-only threads from threads to replay & trace.
- How? By identifying inter-thread dependences: event dependences (found using the log), file dependences (found using the log), and shared-memory dependences (found using replay).
- Space requirement reduced by 4x; time requirement reduced by 2x.
- A naive approach requires the thread id of the last writer of each address. Space- and time-efficient detection instead exploits:
  - memory regions: non-shared vs. shared;
  - locality of references to regions.
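The region-based idea in the last bullet can be sketched as follows. This is a hypothetical illustration (region size, class names, and the promotion rule are assumptions): per-address last-writer tracking is paid only for regions that two or more threads actually touch.

```python
REGION = 4096   # assumed region granularity

class RegionTracker:
    """Track writes per region; fall back to per-address tracking only
    for regions observed to be shared between threads."""
    def __init__(self):
        self.owner = {}    # region -> owning thread (non-shared regions)
        self.shared = {}   # region -> {address: last writer thread}

    def write(self, tid, addr):
        r = addr // REGION
        if r in self.shared:
            self.shared[r][addr] = tid       # precise tracking, shared only
        elif self.owner.setdefault(r, tid) != tid:
            self.shared[r] = {addr: tid}     # second thread: promote region
            del self.owner[r]

    def is_shared(self, addr):
        return addr // REGION in self.shared

t = RegionTracker()
t.write(1, 100); t.write(1, 200)   # thread 1 owns region 0
t.write(2, 300)                    # thread 2 touches it -> region shared
print(t.is_shared(100), t.is_shared(5000))
```

Locality of references means most regions stay in the cheap `owner` map, which is where the 4x space reduction comes from.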

39 Experimental Results [Chart omitted.]

40 Experimental Results [Chart: trace sizes and number of dependences per program-bug, original vs. optimized.]

41 Experimental Results [Chart: execution times in seconds per program-bug - original, optimized, and logging.]

42 Debugging System [Architecture diagram:
- Execution Engine (Valgrind): instruments the application binary, runs it on the input, and produces the output plus compressed traces.
- Static Binary Analyzer (Diablo): supplies control dependence information.
- Record/Replay (Jockey): produces the checkpoint + log, and the reduced log for replay.
- Slicing Module: builds the WET from the traces and computes slices.]

43 Fault Avoidance. A large number of faults in server programs are caused by the environment - 56% of faults in the Apache server. Types of faults handled:
- Atomicity violation faults: try alternate scheduling decisions.
- Heap buffer overflow faults: pad memory requests.
- Bad user request faults: drop the bad requests.
Avoidance strategy: recover the first time, prevent later - record the change that avoided the fault.
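The "recover first time, prevent later" loop can be sketched as below. Everything here is a hypothetical stand-in (the toy `execute` fails like a heap overflow unless allocations are padded); the point is the control structure: try environment changes on failure, and cache whichever one worked:

```python
avoidance_cache = {}   # fault site -> environment change that worked

def execute(request, change):
    """Toy server step: oversized requests overflow a buffer unless the
    allocation is padded; dropping the request sidesteps it entirely."""
    if change == 'drop_request':
        return 'dropped'
    if request > 10 and change != 'pad_memory':
        raise RuntimeError('heap overflow')
    return 'ok'

def serve(request):
    # Try the cached change first (prevention), then the repertoire.
    attempts = [avoidance_cache.get('overflow_site'),
                'pad_memory', 'drop_request']
    for change in attempts:
        try:
            result = execute(request, change)
            avoidance_cache['overflow_site'] = change   # remember the fix
            return result
        except RuntimeError:
            continue                                    # recovery: retry
    raise RuntimeError('could not avoid fault')

print(serve(20))   # first failure: recovers by padding, caches the change
print(serve(25))   # later requests: prevented using the cached change
```

Padding is tried before dropping because it preserves the request's semantics; dropping is the last resort, matching the slide's ordering of strategies.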

44 Experiments

Program  Type of Bug       Env. Change   # of Trials  Time taken (secs.)
mysql-1  Atomicity Violn.  Scheduler     1            130
mysql-2  Atomicity Violn.  Scheduler     1            65
mysql-3  Atomicity Violn.  Scheduler     1            65
mysql-4  Buffer Overflow   Mem. Padding  1            700
pine-1   Buffer Overflow   Mem. Padding  1            325
pine-2   Buffer Overflow   Mem. Padding  1            270
mutt-1   Bad User Req.     Drop Req.     3            205
bc-1     Bad User Req.     Drop Req.     3            290
bc-2     Bad User Req.     Drop Req.     3            195

45 Summary [Diagram, as in the overview: fault location via dynamic slicing, performed offline; fault avoidance of environment faults, performed online; scalability through tracing + logging of long-running, multi-threaded program executions.]

46 Dissertations
- Xiangyu Zhang, Purdue University: Fault Location Via Precise Dynamic Slicing, 2006. SIGPLAN Outstanding Doctoral Dissertation Award.
- Sriraman Tallam, Google: Fault Location and Avoidance in Long-Running Multithreaded Programs, 2007.

47 Fault Location via State Alteration CS 206 Fall 2011

48 Value Replacement: Overview. Aggressive state alteration to locate faulty program statements [Jeffrey et al., ISSTA 2008].
INPUT: faulty program and test suite (1+ failing runs).
TASK: (1) perform value replacements in failing runs; (2) rank program statements according to the collected information.
OUTPUT: ranked list of program statements.

49 Alter State by Replacing Values [Diagram: values observed in a passing execution are substituted into the failing execution's state at the point of the error; the question is whether the altered failing execution now produces correct or incorrect output.]

50 Example of a Value Replacement. Program:
1: read (x, y);
2: a := x - y;
3: if (x < y)
4:   write (a);
   else
5:   write (a + 1);
PASSING EXECUTION (x=1, y=0): a = 1, branch false; statements 1, 2, 3, 5 execute. (Output: ?)

51 Example of a Value Replacement. Buggy program (ERROR at statement 2: plus should be minus):
1: read (x, y);
2: a := x + y;
3: if (x < y)
4:   write (a);
   else
5:   write (a + 1);
FAILING EXECUTION (x=1, y=1): a = 2, branch false; statements 1, 2, 3, 5 execute. Expected output: 1; actual output: 3.

52 Example of a Value Replacement. STATE ALTERATION: at statement 2, instance 1, replace the original values {a=2, x=1, y=1} with the alternate values {a=1, x=0, y=1}. Now the branch (x < y) is true, statement 4 executes, and the output is 1 - the expected output. This gives an Interesting Value Mapping Pair (IVMP): location: statement 2, instance 1; original: {a=2, x=1, y=1}; alternate: {a=1, x=0, y=1}.

53 Searching for IVMPs in a Failing Run
- Step 1: Compute the value profile - the sets of values used at each statement, with respect to all available test case executions.
- Step 2: Replace values to search for IVMPs:
  for each statement instance in the failing run
    for each alternate set of values in the value profile
      replace the values to see if an IVMP is found
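The two steps above can be sketched on the slides' toy program. This is a simplified model (only statements 1 and 2 accept replacements here; the full technique also replaces at statements 3-5), but it finds the same IVMPs at statements 1 and 2 as the worked example:

```python
# Buggy program from the slides: 2: a := x + y, where + should be -.
# An IVMP is a value replacement at one statement instance that turns
# the failing run's output into the expected output.

def run(x, y, replace_at=None, alt=None):
    vals = {'x': x, 'y': y}
    if replace_at == 1:
        vals.update(alt)                  # replace values used at stmt 1
    vals['a'] = vals['x'] + vals['y']     # BUG: should be x - y
    if replace_at == 2:
        vals.update(alt)                  # replace values used at stmt 2
    if vals['x'] < vals['y']:
        return vals['a']                  # 4: write(a)
    return vals['a'] + 1                  # 5: write(a + 1)

# Step 1 -- value profile: value sets seen at each statement across the
# test runs (1,1), (-1,0), (0,0).
profile = {1: [{'x': 0, 'y': 0}, {'x': -1, 'y': 0}, {'x': 1, 'y': 1}],
           2: [{'x': 0, 'y': 0, 'a': 0}, {'x': -1, 'y': 0, 'a': -1},
               {'x': 1, 'y': 1, 'a': 2}]}

# Step 2 -- try every alternate set at every statement of the failing run.
failing_input, expected = (1, 1), 1
ivmps = []
for stmt, alt_sets in profile.items():
    for alt in alt_sets:
        if run(*failing_input, replace_at=stmt, alt=alt) == expected:
            ivmps.append((stmt, tuple(sorted(alt.items()))))
print(ivmps)
```

Replacing {x=1, y=1} with {x=0, y=0} at statement 1, or {x=1, y=1, a=2} with {x=0, y=0, a=0} at statement 2, yields the expected output 1, so both are IVMPs, matching the slides.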

54 Searching for IVMPs: Example. Buggy program:
1: read (x, y);
2: a := x + y;  // + should be -
3: if (x < y)
4:   write (a);
   else
5:   write (a + 1);
Test cases (x, y) with actual / expected output: (1, 1): 3 / 1 (failing); (-1, 0): -1 / -1 (passing); (0, 0): 1 / 1 (passing).
VALUE PROFILE (values used at each statement across the three runs):
1: {x=0, y=0}, {x=-1, y=0}, {x=1, y=1}
2: {x=0, y=0, a=0}, {x=-1, y=0, a=-1}, {x=1, y=1, a=2}
3: {x=0, y=0, branch=F}, {x=-1, y=0, branch=T}, {x=1, y=1, branch=F}
4: {a=-1, output=-1}
5: {a=0, output=1}, {a=2, output=3}
First value replacement, at statement 1 of the failing run: replace x=1, y=1 with x=-1, y=0. The resulting output is not the expected output, so no IVMP.

55 Searching for IVMPs: Example (contd.). At statement 1, replacing x=1, y=1 with x=0, y=0 yields output 1, the expected output.
IVMPs identified so far:
stmt 1, inst 1: ({x=1, y=1} -> {x=0, y=0})

56 Searching for IVMPs: Example (contd.). At statement 2, replacing x=1, y=1, a=2 with x=0, y=0, a=0 yields output 1, the expected output (the alternate set x=-1, y=0, a=-1 does not).
IVMPs identified so far:
stmt 1, inst 1: ({x=1, y=1} -> {x=0, y=0})
stmt 2, inst 1: ({x=1, y=1, a=2} -> {x=0, y=0, a=0})

57 Searching for IVMPs: Example (contd.). At statement 3, neither alternate set ({x=-1, y=0, branch=T} nor {x=0, y=0, branch=F}) produces the expected output, so no IVMP is added.
IVMPs identified so far:
stmt 1, inst 1: ({x=1, y=1} -> {x=0, y=0})
stmt 2, inst 1: ({x=1, y=1, a=2} -> {x=0, y=0, a=0})

58 Searching for IVMPs: Example (contd.). At statement 5, replacing a=2, output=3 with a=0, output=1 yields the expected output 1.
IVMPs identified so far:
stmt 1, inst 1: ({x=1, y=1} -> {x=0, y=0})
stmt 2, inst 1: ({x=1, y=1, a=2} -> {x=0, y=0, a=0})
stmt 5, inst 1: ({a=2, output=3} -> {a=0, output=1})

59 Searching for IVMPs: Example (contd.). DONE. IVMPs identified:
stmt 1, inst 1: ({x=1, y=1} -> {x=0, y=0})
stmt 2, inst 1: ({x=1, y=1, a=2} -> {x=0, y=0, a=0})
stmt 5, inst 1: ({a=2, output=3} -> {a=0, output=1})

60 IVMPs at Non-Faulty Statements
- Causes of IVMPs at non-faulty statements: statements in the same dependence chain as the fault; coincidence.
- Therefore, consider multiple failing runs: a statement with IVMPs in more runs is more likely to be faulty; a statement with IVMPs in fewer runs is less likely to be faulty.

61 Multiple Failing Runs: Example. Buggy program as before (1: read(x,y); 2: a := x + y; 3: if (x < y) 4: write(a); else 5: write(a+1)).
Test cases (x, y) with actual / expected output: [A] (1, 1): 3 / 1 (failing); [B] (0, 1): 1 / -1 (failing); [C] (-1, 0): passing; [D] (0, 0): 1 / 1 (passing).
Test case [A] IVMPs - stmts {1, 2, 5}:
stmt 1, inst 1: ({x=1, y=1} -> {x=0, y=1}); stmt 1, inst 1: ({x=1, y=1} -> {x=0, y=0})
stmt 2, inst 1: ({x=1, y=1, a=2} -> {x=0, y=1, a=1}); stmt 2, inst 1: ({x=1, y=1, a=2} -> {x=0, y=0, a=0})
stmt 5, inst 1: ({a=2, output=3} -> {a=0, output=1})
Test case [B] IVMPs - stmts {1, 2, 4}:
stmt 1, inst 1: ({x=0, y=1} -> {x=-1, y=0})
stmt 2, inst 1: ({x=0, y=1, a=1} -> {x=-1, y=0, a=-1})
stmt 4, inst 1: ({a=1, output=1} -> {a=-1, output=-1})
Ranking: {1, 2} most likely to be faulty; then {4, 5}; {3} least likely to be faulty.

62 Ranking Statements Using IVMPs
- Sort statements in decreasing order of: the number of failing runs in which the statement is associated with at least one IVMP.
- Break ties using the Tarantula technique [Jones et al., ICSE 2002]: suspiciousness(s) = (fraction of failing runs exercising s) / (fraction of failing runs exercising s + fraction of passing runs exercising s).
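The two-level ranking can be sketched directly; the coverage and IVMP counts below are made-up illustration data:

```python
# Tarantula suspiciousness: %failed / (%failed + %passed), used here
# only to break ties in the primary IVMP-count ordering.
def tarantula(failed_cov, passed_cov, total_failed, total_passed):
    f = failed_cov / total_failed if total_failed else 0.0
    p = passed_cov / total_passed if total_passed else 0.0
    return f / (f + p) if f + p else 0.0

# stmt -> (#failing runs with an IVMP here, failing cov, passing cov)
stmts = {1: (2, 2, 5), 2: (2, 2, 1), 3: (0, 2, 5),
         4: (1, 1, 3), 5: (1, 1, 2)}
total_failed, total_passed = 2, 5

ranked = sorted(stmts,
                key=lambda s: (stmts[s][0],               # primary key
                               tarantula(stmts[s][1], stmts[s][2],
                                         total_failed, total_passed)),
                reverse=True)
print(ranked)
```

Statements 1 and 2 tie on the primary key (IVMPs in both failing runs), but statement 2 is exercised by fewer passing runs, so Tarantula ranks it first.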

63 Techniques Evaluated
- Value Replacement technique: consider all available failing runs (ValRep-All); consider only 2 failing runs (ValRep-2); consider only 1 failing run (ValRep-1).
- Tarantula technique (Tarantula): considers all available test cases; the most effective technique previously known for our benchmarks.
- Only statements exercised by failing runs are ranked.

64 Metric for Comparison
- A score is computed for each ranked statement list: score = 100% x (size of list - rank of the faulty stmt) / size of list.
- The score represents the percentage of statements (those ranked below the faulty one, going from high to low suspiciousness) that need not be examined before the error is located; a higher score is better.
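As a sanity check on the metric, a one-line sketch (the formula is my reconstruction of the slide's garbled fraction, chosen to match the stated interpretation):

```python
# score = 100% x (list size - rank of the faulty statement) / list size:
# the percentage of ranked statements that need not be examined.
def score(rank_of_faulty_stmt, list_size):
    return 100.0 * (list_size - rank_of_faulty_stmt) / list_size

print(score(1, 100))    # fault ranked first in a 100-stmt list -> 99.0
print(score(50, 100))   # fault in the middle -> 50.0
```

This is why the later result tables bucket programs by "score ≥ 99%" and "score ≥ 90%": a 99% score on a 100-statement list means the fault was ranked first.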

65 Benchmark Programs

Program  LOC  # Faulty Ver.  Avg. Suite Size (Pool Size)
tcas     138  41             17 (1608)
totinfo  346  23             15 (1052)
sched    299  9              20 (2650)
sched2   297  9              17 (4130)
ptok     402  7              17 (4130)
ptok2    483  9              23 (4115)
replace  516  31             29 (5542)

- 129 faulty programs (errors) derived from 7 base programs.
- Each faulty program is associated with a branch-coverage-adequate test suite containing at least 5 failing and 5 passing test cases.
- The test suite is used by Value Replacement; the test pool is used by Tarantula.

66 Effectiveness Results: Value Replacement technique. Number (%) of faulty programs achieving each score:

Score   ValRep-All  ValRep-2    ValRep-1
>= 99%  23 (17.8%)  21 (16.3%)  18 (14.0%)
>= 90%  89 (69.0%)  84 (65.1%)  75 (58.1%)

67 Effectiveness Results: comparison to Tarantula. Number (%) of faulty programs achieving each score:

Score   ValRep-All  ValRep-2    ValRep-1    Tarantula
>= 99%  23 (17.8%)  21 (16.3%)  18 (14.0%)  7 (5.4%)
>= 90%  89 (69.0%)  84 (65.1%)  75 (58.1%)  48 (37.2%)

68 Value Replacement: Summary
- Highly effective: precisely locates 39 / 129 errors (30.2%); the most effective previously known technique locates 5 / 129 (3.9%).
- Limitations: can require significant computation time to search for IVMPs; assumes multiple failing runs are caused by the same error.

69 Handling Multiple Errors. Effectively locate multiple simultaneous errors [Jeffrey et al., ICSM 2009]: iteratively compute a ranked list of statements to find and fix one error at a time. Three variations of this technique:
- MIN: minimal computation; use the same list each time.
- FULL: full computation; produce a new list each time.
- PARTIAL: partial computation; revise the list each time.

70 Multiple-Error Techniques [Flowcharts:
- Single error: Value Replacement takes the faulty program and test suite and produces a ranked list of program statements; the developer finds/fixes the error; done.
- Multiple errors (MIN): Value Replacement produces one ranked list; the developer finds/fixes an error using that same list; repeat while a failing run remains.]

71 Multiple-Error Techniques (contd.) [Flowcharts:
- Multiple errors (FULL): after each fix, Value Replacement is rerun in full to produce a new ranked list, while a failing run remains.
- Multiple errors (PARTIAL): after each fix, partial Value Replacement revises the ranked lists, while a failing run remains.]

72 PARTIAL Technique
- Step 1: Initialize the ranked lists and locate the first error. For each statement s, compute a ranked list by considering only the failing runs exercising s. Report the ranked list with the highest suspiciousness value at the front of the list.
- Step 2: Iteratively revise the ranked lists and locate each remaining error. For each remaining failing run that exercises the statement just fixed, recompute the IVMPs; update any affected ranked lists; report the ranked list whose front elements differ most from the previously selected lists.

73 PARTIAL Technique: Example. Program with statements 1-5, two of them faulty (2 and 4).

Failing Run  Execution Trace  Statements with IVMPs
A            (1, 2, 3, 5)     {2, 5}
B            (1, 2, 3, 5)     {1, 2}
C            (1, 2, 4, 5)     {2, 4, 5}

Computed ranked lists (statement: suspiciousness):
- [based on runs A, B, C]: 2:3, 5:2, 1:1, 4:1, 3:0
- [based on runs A, B]: 2:2, 1:1, 5:1, 3:0, 4:0
- [based on run C]: 2:1, 4:1, 5:1, 1:0, 3:0
- [based on runs A, B, C]: 2:3, 5:2, 1:1, 4:1, 3:0
Report list 1, 2, or 5 (assume 1) -> fix faulty statement 2.

74 PARTIAL Technique: Example (contd.). One faulty statement remains.

Failing Run  Execution Trace  Statements with IVMPs
C            (1, 2, 4, 5)     {4}

Recomputed ranked lists (statement: suspiciousness):
- [based on runs A, B, C] (C updated): 2:2, 1:1, 4:1, 5:1, 3:0
- [based on runs A, B] (no updates): 2:2, 1:1, 5:1, 3:0, 4:0
- [based on run C] (C updated): 4:1, 1:0, 2:0, 3:0, 5:0
- [based on runs A, B, C] (C updated): 2:2, 1:1, 4:1, 5:1, 3:0
Report list 4 -> fix faulty statement 4 -> done.

75 Techniques Compared
- MIN: only compute the ranked list once.
- FULL: fully recompute the ranked list each time.
- PARTIAL: compute IVMPs for a subset of failing runs and revise the ranked lists each time.
- ISOLATED: locate each error in isolation.

76 Benchmark Programs

Program  # 5-Error Faulty Versions  Avg. Suite Size (# Failing / # Passing Runs)
tcas     20                         11 (5 / 6)
totinfo  20                         22 (10 / 12)
sched    20                         29 (10 / 19)
sched2   20                         30 (9 / 21)
ptok     2                          32 (8 / 24)
ptok2    11                         29 (5 / 24)
replace  20                         38 (9 / 29)

Each faulty program contains 5 seeded errors, each in a different statement, and is associated with a statement-coverage-adequate test suite such that at least one failing run exercises each error.

77 Effectiveness Results [Bar chart: average score per ranked list (%), roughly 50-90%, for Isolated, Full, Partial, and Min on tcas, totinfo, sched, sched2, ptok, ptok2, replace.]

78 Efficiency of Value Replacement. Searching for IVMPs is time-consuming: 5 failing runs x 50,000 statement instances per run x 15 alternate value sets per instance = 3.75 million value-replacement program executions - over 10 days if each execution requires a quarter-second.
- Lossy techniques: reduce the search space for finding IVMPs; may miss some IVMPs. Applied to the single-error benchmarks.
- Lossless techniques: only affect the efficiency of the implementation; miss no IVMPs. Applied to the multi-error benchmarks.

79 Lossy Techniques
- Limit the considered statement instances: if an IVMP is found, skip all subsequent instances of the same statement in the current run; if no IVMP is found, skip the statement in subsequent runs.
- Limit the considered alternate value sets: relative to the original value, use only the minimum and maximum of the alternate values below it and the minimum and maximum of those above it; skip the rest.
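My reading of the second (garbled) bullet, as a sketch: keep only the extremes of the alternate values on each side of the original value. The selection rule is an assumption reconstructed from the slide's "min/max below and above orig" notation:

```python
# Keep at most four alternates per statement instance: the min and max
# of the values below the original, and the min and max of those above.
def limited_alternates(orig, values):
    below = [v for v in values if v < orig]
    above = [v for v in values if v > orig]
    keep = set()
    if below:
        keep.update({min(below), max(below)})
    if above:
        keep.update({min(above), max(above)})
    return sorted(keep)

print(limited_alternates(10, [1, 3, 7, 12, 15, 40]))   # [1, 7, 12, 40]
```

This caps the 15 alternate sets per instance from the earlier cost estimate at 4, which is where much of the reported factor-of-67 reduction would come from.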

80 Lossless Techniques. In regular value replacement, each replacement is an independent execution, so portions of the original execution are duplicated many times - e.g., with 2 alternate value sets at each of 3 statement instances, the prefix before instance 1 is re-executed 6 times, the portion between instances 1 and 2 four times, and the portion between instances 2 and 3 twice. Efficiency improvements: (1) fork a child process to perform each value replacement from within the original failing execution; (2) perform the value replacements in parallel.

81 Lossless Techniques (contd.) [Diagrams: with redundant execution removed, no portion of the original execution is duplicated - each value replacement is forked off at its statement instance; with parallelization, the total time required to perform all value replacements is reduced.]

82 Search Reduction by Lossy Techniques. Reduction in the number of executions (single-error benchmarks):

      # value replacements needed
      Full    Limited
Mean  2.0 M   0.03 M
Max   21.5 M  0.4 M

83 Search Reduction by Lossy Techniques (contd.). On average, the total number of executions is reduced by a factor of 67.

84 Time Required for Reduced Search. Time required to search using the lossy techniques (single-error benchmarks):

Mean       55.6 min
< 1 min    39% of programs
< 10 min   60% of programs
< 100 min  87% of programs
Max        846.5 min

85 Time Required for Reduced Search (contd.). Only 13% of the faulty programs required more than 100 minutes of IVMP search time.

86 Time Required with Lossless Techniques [Bar chart: average time (seconds), roughly 0-600, to search each faulty program using the lossless techniques on the multi-error benchmarks, for Full, Partial, and Min.] With the lossless techniques, multiple errors in a program can be located in minutes; with the lossy techniques, some single errors require hours to locate.

87 Execution Suppression [Jeffrey et al., TOPLAS 2010]. Efficient location of memory errors through targeted state alteration: alter state in a way that is guaranteed to get closer to the goal each time. Goal: identify the first point of memory corruption in a failing execution.

88 Memory Errors and Corruption
- Memory errors: buffer overflow, uninitialized read, dangling pointer, double free, memory leak.
- Memory corruption: an incorrect memory location is accessed, or an incorrect value is assigned to a pointer variable.

89 Study of Memory Corruption. Traversal of error -> first point of memory corruption -> failure.

Program    LOC      Memory Error Type  Analyzed Input Types
gzip       6.3 K    Global overflow    No crash; Crash 1
man        10.8 K   Global overflow    Crash 1
bc         10.7 K   Heap overflow      No crash; Crash 1
pine       211.9 K  Heap overflow      No crash; Crash 1
mutt       65.9 K   Heap overflow      No crash; Crash 1
ncompress  1.4 K    Stack overflow     No crash; Crash 1; Crash 2
polymorph  1.1 K    Stack overflow     No crash; Crash 1; Crash 2
xv         69.2 K   Stack overflow     No crash; Crash 1
tar        28.4 K   NULL dereference   Crash 1
tidy       35.9 K   NULL dereference   Crash 1
cvs        104.1 K  Double free        Crash 1

90 Observations from the Study
- The total distance from the point of error traversal until failure can be large.
- Different inputs triggering memory corruption may result in different crashes, or no crash at all.
- The distance from error traversal to the first memory corruption is considerably less than the distance from the first memory corruption to the failure.

91 Execution Suppression: High-Level
- A program crash reveals memory corruption. Key assumption: memory corruption leads to a crash.
- Component 1: suppression - iteratively identify the first point of memory corruption by omitting the effect of certain statements during execution.
- Component 2: variable re-ordering - expose crashes where they may not otherwise occur; helpful since the key assumption does not always hold.

92 Suppression: How It Works
while a crash occurs:
  identify the accessed location L directly causing the crash
  identify the last definition D of location L
  re-execute the program, omitting the execution of D and anything dependent on it
report the statement associated with the most recent D as the first point of memory corruption
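The loop above can be sketched on an abstract re-execution. The trace below is a hypothetical condensation of the upcoming example (statements 4, 8, 11 define locations; statements 12, 13, 15 crash when reached with live values), but the driver loop is exactly the while-loop just described:

```python
def run(suppressed):
    """Toy re-execution: (stmt, defined loc, used loc, crashes?).
    A suppressed statement, or one whose used location is poisoned,
    is omitted and poisons whatever it would have defined."""
    trace = [(4, 'q2', None, False), (8, '*q2', 'q2', False),
             (11, 'c', '*q2', False), (12, None, 'c', True),
             (13, None, '*q2', True), (15, None, 'q2', True)]
    poisoned = set()
    for stmt, defined, used, crashes in trace:
        if stmt in suppressed or used in poisoned:
            if defined:
                poisoned.add(defined)    # omit D and its effects
            continue
        if crashes:
            return ('crash', used)       # location whose use crashed
    return ('ok', None)

last_def_of = {'c': 11, '*q2': 8, 'q2': 4}   # last definition per location
suppressed, last_def = set(), None
while True:                                   # the slide's while-loop
    status, loc = run(suppressed)
    if status != 'crash':
        break
    last_def = last_def_of[loc]               # D: last definition of L
    suppressed.add(last_def)                  # re-execute without D
print(last_def)                               # first point of corruption
```

Each iteration suppresses one more definition (11, then 8, then 4), so the fourth execution completes without a crash and statement 4 is reported, mirroring the four-execution example that follows.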

93 Suppression: Example
1: int *p1 = &x[1];
2: int *p2 = &x[0];
3: int *q1 = &y[1];
4: int *q2 = &x[0];
5: *p1 = readInt();
6: *p2 = readInt();
7: *q1 = readInt();
8: *q2 = readInt();
9: int a = *p1 + *p2;
10: int b = *q1 + *q2;
11: int c = a + b + 1;
12: intArray[c] = 0;
13: structArray[*p2]->f = 0;
14: free(p2);
15: free(q2);
Notes: stmt 4 is a copy-paste error ("x" should be "y"); stmt 8 clobbers the definition at stmt 6; stmts 9-11 propagate the corruption; stmt 12 is a potential buffer overflow; stmt 13 is a potential overflow or NULL dereference; stmt 15 is a double free.

94 Suppression: Example (contd.). Stmt 4 is the error as well as the first point of memory corruption; it is located in 4 executions.

95 Example: Execution 1 of 4. Locations defined: p1 (stmt 1), p2 (2), q1 (3), *p1 (5), *p2 (6), *q1 (7), q2 (4), *p2/*q2 (8), a (9), b (10), c (11); stmt 12 CRASHES. Action: suppress the definition of c at stmt 11 and all of its effects.

96 Example: Execution 2 of 4 (same code as slide 93)  Locations defined (Stmt: Loc): 1: p1, 2: p2, 3: q1, 5: *p1, 6: *p2, 7: *q1, 4: q2, 8: *p2/*q2, 9: a, 10: b; 11, 12: no defs (suppressed); 13: CRASH  Action: suppress def of *p2/*q2 at stmt 8 and all of its effects

97 Example: Execution 3 of 4 (same code as slide 93)  Locations defined (Stmt: Loc): 1: p1, 2: p2, 3: q1, 5: *p1, 6: *p2, 7: *q1, 4: q2; 8-14: no further defs; 15: CRASH  Action: suppress q2 at stmt 4 and its effects

98 Example: Execution 4 of 4 (same code as slide 93)  Locations defined (Stmt: Loc): 1: p1, 2: p2, 3: q1, 5: *p1, 6: *p2, 7: *q1; 4: suppressed; 8-15: complete without crashing  Result: stmt 4 identified

99 Example: Summary  Execution 1: crash at stmt 12; suppress def of c at stmt 11. Execution 2: crash at stmt 13; suppress def of *p2/*q2 at stmt 8. Execution 3: crash at stmt 15; suppress def of q2 at stmt 4. Execution 4: no crash; REPORT stmt 4.
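The four-execution walkthrough can be replayed with a small simulation. The sketch below is a hypothetical Python model, not the authors' implementation: the statement triples (id, defined location, used locations), the invented location names (m12, f14, ...), and the crash table are mine, chosen to mirror the slides (q2 aliases *p2 because of the bug at stmt 4, so stmt 8 actually redefines *p2). Each re-execution omits the suppressed definitions and everything dependent on them; the loop reports the most recently suppressed definition site.

```python
def run_once(stmts, crashers, suppressed):
    """One execution; return the def-site (stmt id) to suppress next, or None."""
    corrupt = set()                      # locations whose defs were omitted
    last_def = {}                        # location -> stmt id of its last live def
    for sid, target, uses in stmts:
        if sid in suppressed or any(u in corrupt for u in uses):
            corrupt.add(target)          # omit this def and taint its target
            continue
        corrupt.discard(target)          # a fresh, live definition
        last_def[target] = sid
        loc = crashers.get(sid)          # does this stmt crash on a bad value?
        if loc is not None and loc not in corrupt:
            return last_def[loc]         # crash: last def of the bad location
    return None                          # ran to completion, no crash

def execution_suppression(stmts, crashers):
    """Iterate until no crash occurs; report the first point of memory corruption."""
    suppressed, report = set(), None
    while True:
        d = run_once(stmts, crashers, suppressed)
        if d is None:
            return report
        report = d
        suppressed.add(d)

# Hypothetical encoding of the 15-statement example:
S = [(1, 'p1', []), (2, 'p2', []), (3, 'q1', []), (4, 'q2', []),
     (5, '*p1', ['p1']), (6, '*p2', ['p2']), (7, '*q1', ['q1']),
     (8, '*p2', ['q2']),                 # *q2 aliases *p2 due to the bug at 4
     (9, 'a', ['*p1', '*p2']), (10, 'b', ['*q1', '*p2']),
     (11, 'c', ['a', 'b']),
     (12, 'm12', ['c']), (13, 'm13', ['*p2']),
     (14, 'f14', ['p2']), (15, 'f15', ['q2'])]
CRASH = {12: 'c', 13: '*p2', 15: 'q2'}   # crash-causing location per stmt
print(execution_suppression(S, CRASH))   # → 4, found in four executions
```

Running this reproduces the slides' sequence: suppress 11, then 8, then 4, then no crash, so stmt 4 is reported.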

100 Variable Re-Ordering  Re-order variables in memory prior to execution  Try to cause a crash due to corruption in cases where a crash does not otherwise occur  Can overcome limitations of suppression Does not terminate prematurely when corruption does not cause a crash Applicable to executions that do not crash  Position address variables after buffers

101 Variable Re-Ordering: Example  From program ncompress:
void comprexx(char **fileptr)
{
  int fdin;
  int fdout;
  char tempname[1024];
  strcpy(tempname, *fileptr);
  ...
}
On the call stack, tempname, fdout, fdin, and the return address are laid out in one order originally and in another after re-ordering: with the original variable ordering, an overflow of tempname causes no stack smash; with the re-ordering, the same overflow smashes the stack.
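The effect of re-ordering can be sketched with a toy byte-layout model. Everything below is an assumption for illustration, not taken from ncompress: the frame is an ordered list of (name, size) slots, the slot sizes are invented, and an n-byte write into the buffer simply clobbers whatever slots it overlaps. Moving the buffer so that an address-holding slot sits right after it turns silent corruption into a crash.

```python
def clobbered(frame, buf, nbytes):
    """Names of slots overwritten by an nbytes-long write starting at `buf`.

    `frame` is an ordered list of (name, size) slots, lowest address first.
    """
    offsets, pos = {}, 0
    for name, size in frame:
        offsets[name] = (pos, size)      # byte offset and extent of each slot
        pos += size
    start = offsets[buf][0]
    end = start + nbytes
    return [name for name, (p, s) in offsets.items()
            if name != buf and p < end and p + s > start]

# Hypothetical layouts for the comprexx frame (order and sizes assumed):
original  = [('tempname', 1024), ('fdout', 4), ('fdin', 4), ('ret', 8)]
reordered = [('fdout', 4), ('fdin', 4), ('tempname', 1024), ('ret', 8)]

# A 1030-byte strcpy into tempname:
print(clobbered(original, 'tempname', 1030))   # → ['fdout', 'fdin'], silent
print(clobbered(reordered, 'tempname', 1030))  # → ['ret'], crash on return
```

In the original layout the overflow silently clobbers two integers; after re-ordering, the same overflow hits the return address, so the corruption surfaces as a crash.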

102 The Complete Algorithm
exec := original failing execution;
Do
    (A) identifiedStmt, exec := run suppression component using exec;
    (B) reordering, exec := run variable re-ordering component using exec;
While (crashing reordering is found);
Report identifiedStmt;
Explanation: (A) runs suppression until no further crashes occur; (B) attempts to expose an additional crash. The Do/While loop iterates as long as variable re-ordering exposes a new crash.
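A minimal sketch of this driver loop, with the two components passed in as callables. The stub functions and their canned return values below are invented purely for illustration; in the real system (A) and (B) would re-execute the program.

```python
def suppression_algorithm(run_suppression, find_crashing_reordering, exec0):
    """Do/while: suppress until no crash (A), then try re-ordering (B)."""
    exec_ = exec0
    while True:
        stmt, exec_ = run_suppression(exec_)          # (A)
        crashing = find_crashing_reordering(exec_)    # (B)
        if crashing is None:                          # no new crash exposed
            return stmt
        exec_ = crashing                              # iterate on the new crash

# Toy stubs: the first suppression pass stops at stmt 8, re-ordering
# exposes one additional crash, and the second pass pins the error at stmt 4.
passes = iter([(8, 'exec1'), (4, 'exec2')])
reorderings = iter(['exec1-reordered', None])
print(suppression_algorithm(lambda e: next(passes),
                            lambda e: next(reorderings), 'exec0'))  # → 4
```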

103 Execution Suppression: Evaluation  Suppression-only results (no variable re-ordering). Columns: Program | Input Type | # Exec. Required | Max Static Dependence Distance From Located Stmt To 1st Memory Corruption | To Error
gzip      | Crash 1 | 2 | 0 | 0
man       | Crash 1 | 2 | 1 | 2
bc        | Crash 1 | 2 | 0 | 1
pine      | Crash 1 | 2 | 0 | 5
mutt      | Crash 1 | 3 | 0 | 1
ncompress | Crash 1 | 2 | 0 | 0
ncompress | Crash 2 | 4 | 0 | 0
polymorph | Crash 1 | 2 | 0 | 1
polymorph | Crash 2 | 3 | 0 | 1
xv        | Crash 1 | 4 | 0 | 2
tar       | Crash 1 | 2 | 0 | 0
tidy      | Crash 1 | 2 | 0 | 0
cvs       | Crash 1 | 2 | 0 | 0

104 Execution Suppression: Evaluation  Suppression and variable re-ordering results. Columns: Program | Input Type | # Crashes Exposed | # Var R-O Exec. | Max Static Dependence Distance From Located Stmt To 1st Memory Corruption | To Error
gzip      | No Crash | 0 | 15  | --- | ---
man       | Crash 1  | 1 | 18  | 0   | 1
bc        | No Crash | 1 | --- | 0   | 1
pine      | No Crash | 1 | --- | 0   | 5
mutt      | No Crash | 1 | --- | 0   | 1
ncompress | No Crash | 1 | 5   | 0   | 0
polymorph | No Crash | 1 | 6   | 0   | 1
xv        | No Crash | 1 | 135 | 0   | 2

105 Memory Errors in Multithreaded Programs  Assumes programs run on a single processor  Two main enhancements required for Execution Suppression Reproduce the failure across multiple executions  Record the failure-inducing thread interleaving  Replay the same interleaving on subsequent executions  In general, other factors should be recorded/replayed Identify data race errors  Data race: concurrent, unsynchronized access of a shared memory location by multiple threads, at least one a write  Identified on-the-fly during suppression

106 Identifying Data Races  Data races involve WAR, WAW, or RAW dependences  Identified points of suppression are writes Can be involved in a WAR or WAW dependence prior to that point Can be involved in a RAW dependence after that point  Monitor for an involved data race on-the-fly during a suppression execution

107 On-the-fly Data Race Detection  Identified suppression point (thread T1 writes to location L)  Last access to L by a thread other than T1  Next read from L by a thread other than T1  Monitor for synchronization on L  Suppression Execution

108 On-the-fly Data Race Detection  Last access to L by a thread other than T1  Monitor for synchronization on L  Suppression Execution  Next read from L by a thread other than T1  WAR or WAW data race may be identified at this point

109 On-the-fly Data Race Detection  Last access to L by a thread other than T1  Monitor for synchronization on L  Suppression Execution  RAW data race may be identified at this point  WAR or WAW data race may be identified at this point
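The three timeline slides condense into one scan over a recorded access trace. The sketch below is an assumed model, not the paper's implementation: each event is a (thread, op, loc) triple with op in {'r', 'w', 'sync'}, and the suppression point is a write by T1 to L. Scanning backward finds a WAR/WAW race with the last access to L by another thread (unless a synchronization on L intervenes); scanning forward finds a RAW race with the next read of L by another thread.

```python
def races_at(trace, idx):
    """Data races involving the suppression-point write trace[idx] = (T1,'w',L)."""
    t1, _, loc = trace[idx]
    found = []
    # Backward: last access to L by another thread, unless a
    # synchronization operation on L intervenes (then they are ordered).
    for t, op, l in reversed(trace[:idx]):
        if l != loc:
            continue
        if op == 'sync':
            break
        if t != t1:
            found.append('WAW' if op == 'w' else 'WAR')
            break
    # Forward: next read from L by another thread, before any sync on L
    # and before T1 redefines L.
    for t, op, l in trace[idx + 1:]:
        if l != loc:
            continue
        if op == 'sync' or (t == t1 and op == 'w'):
            break
        if t != t1 and op == 'r':
            found.append('RAW')
            break
    return found

trace = [('T2', 'w', 'L'), ('T1', 'w', 'L'), ('T2', 'r', 'L')]
print(races_at(trace, 1))   # → ['WAW', 'RAW']
```

With a sync on L between the two accesses, the backward scan stops and no race is reported for that direction.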

110 Potentially-Harmful Data Races  Given two memory accesses involved in a data race, force other thread interleavings to see if the state is altered Memory access point 1: access to L from thread T1 Memory access point 2: access to L from thread T2 For each ready thread besides T1, re-execute from this point and schedule it in place of T1 If the value in L is changed at this point, the data race is potentially harmful  Harmful Data Race Checking Executions
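A sketch of that check, assuming a replay facility that can re-execute up to the racing access with a chosen thread scheduled in place of T1 and then report the resulting memory state. The replay stub and its canned values below are invented for illustration.

```python
def potentially_harmful(replay_to_race_point, ready_threads, t1, loc):
    """Re-execute from the racing access, scheduling each other ready thread
    in place of t1; the race is potentially harmful if the value left in
    loc differs from the original interleaving's value."""
    baseline = replay_to_race_point(t1)[loc]      # original interleaving
    return any(replay_to_race_point(t)[loc] != baseline
               for t in ready_threads if t != t1)

# Toy replay stub: which thread runs first determines the value left in L.
outcome = {'T1': {'L': 1}, 'T2': {'L': 2}}
print(potentially_harmful(lambda t: outcome[t], ['T1', 'T2'], 'T1', 'L'))  # → True
```

If every alternative interleaving leaves the same value in L, the race is benign with respect to this check.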

111 Evaluation with Multithreaded Programs  Multithreaded Benchmark Programs and Results. Columns: Program | LOC | Error Type | # Executions Required | Precisely Identifies Error?
apache     | 191 K | Data race          | 3 | yes
mysql-1    | 508 K | Data race          | 3 | yes
mysql-2    | 508 K | Data race          | 3 | yes
mysql-3    | 508 K | Uninitialized read | 2 | yes
prozilla-1 | 16 K  | Stack overflow     | 2 | yes
prozilla-2 | 16 K  | Stack overflow     | 4 | yes
axel       | 3 K   | Stack overflow     | 3 | yes

112 Implementing Suppression: General
 Global variables
    count: dynamic instruction count value
    suppress: suppression mode flag (boolean)
 Variables associated with each register and memory word
    lastDef: count value of the instruction that last defined it
    corrupt: whether associated effects need to be suppressed
At a program instruction (defines target, uses src1 and src2):
    Ensure the instruction responsible for a crash can be identified:
        target.lastDef := ++count;
    Carry out suppression as necessary:
        if (current instruction is a suppression point)
            suppress := true; target.corrupt := true;
        else if (suppress)
            if (src1.corrupt or src2.corrupt)
                target.corrupt := true;
            else
                execute instruction; target.corrupt := false;
        else
            execute instruction;
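The instrumentation above can be sketched directly, with lastDef and corrupt kept per location as on the slide. The instruction encoding (target, src1, src2, compute function) and the three-instruction driver below are my assumptions for illustration, not the paper's instrumentation.

```python
class Suppressor:
    """Per-location metadata as on the slide: [lastDef, corrupt]."""
    def __init__(self, suppression_point):
        self.count = 0                    # dynamic instruction count
        self.suppress = False             # suppression mode flag
        self.point = suppression_point    # dynamic count of the def to omit
        self.meta = {}                    # location -> [lastDef, corrupt]

    def m(self, loc):
        return self.meta.setdefault(loc, [0, False])

    def step(self, target, src1, src2, fn, state):
        """One instruction: target := fn(src1, src2)."""
        self.count += 1
        self.m(target)[0] = self.count    # target.lastDef := ++count
        if self.count == self.point:      # suppression point reached
            self.suppress = True
            self.m(target)[1] = True      # omit this definition
            return
        if self.suppress and (self.m(src1)[1] or self.m(src2)[1]):
            self.m(target)[1] = True      # omit dependent effect
            return
        state[target] = fn(state.get(src1), state.get(src2))
        self.m(target)[1] = False         # live, uncorrupted definition

# Suppress the 2nd dynamic instruction (b's definition) and its effects:
s, state = Suppressor(2), {}
s.step('a', None, None, lambda x, y: 1, state)       # a := 1
s.step('b', 'a', None, lambda x, y: x + 1, state)    # b := a + 1 (suppressed)
s.step('c', 'b', None, lambda x, y: x + 1, state)    # c := b + 1 (tainted)
print(state, s.m('c')[1])   # → {'a': 1} True
```

Only a's definition lands in memory; b is omitted and c, which depends on b, is marked corrupt rather than executed.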

113 Software/Hardware Support A software-only implementation can incur relatively high overhead (SW) Overhead can be reduced with hardware support Existing support in Itanium processors for deferred exception handling: an extra bit per register (HW1) Further memory augmentation: an extra bit per memory word (HW1 + HW2) Overheads compared in a simulator

114 Performance Overhead Comparison  Average overhead: 7.2x (SW), 2.7x (HW1), 1.8x (HW1 + HW2)

115 Other Dynamic Error Location Techniques  Other state-alteration techniques Delta Debugging [Zeller et al., FSE 2002, TSE 2002, ICSE 2005]  Search in space for values relevant to a failure  Search in time for failure cause transitions Predicate Switching [Zhang et al., ICSE 2006]  Alter predicate outcomes to correct failing output  Value Replacement is more aggressive; Execution Suppression is better targeted at memory errors

116 Other Dynamic Error Location Techniques  Program slicing-based techniques Pruning Dynamic Slices with Confidence [Zhang et al., PLDI 2006] Failure-Inducing Chops [Gupta et al., ASE 2005]  Invariant-based techniques Daikon [Ernst et al., IEEE TSE, Feb. 2001] AccMon [Zhou et al., HPCA 2007]

117 Other Dynamic Error Location Techniques  Statistical techniques Cooperative Bug Isolation [Ben Liblit, doctoral dissertation, 2005] SOBER [Liu et al., FSE 2005] Tarantula [Jones et al., ICSE 2002]  Spectra-based techniques Nearest Neighbor [Renieris and Reiss, ASE 2003]

118 Future Directions  Enhancements to Value Replacement Improve scalability Study when IVMPs cannot be found at the faulty statement  Enhancements to Execution Suppression Improve scalability of variable re-ordering Other techniques to expose crashes Handle memory errors that do not involve corruption  Applications to Fixing Errors IVMPs can be used in BugFix [Jeffrey et al., ICPC 2009] Comparatively little research exists on automated techniques for fixing errors  Applications to Tolerating Errors Suppression can be used to recover from failures in server programs [Nagarajan et al., ISMM 2009] Other applications?

119 Dissertations Dennis Jeffrey, Google  Dynamic State Alteration Techniques for Automatically Locating Software Errors, 2009. Vijay Nagarajan, University of Edinburgh  IMPRESS: Improving Multicore Performance and Reliability via Efficient Support for Software Monitoring, 2007.

120 Dissertations Chen Tian, Samsung R&D Center  Speculative Parallelization on Multicore Processors, 2010. Min Feng  The SpiceC Parallel Programming System, 2012 (expected).

121 Ongoing Work Yan Wang  Qzdb: The QuickZoom Debugger Li Tan  Debugging SpiceC Programs

122 Ongoing Work Kishore Kumar Pusukuri  OS-Architecture Interaction on Multicore Processors. Sai Charan Koduru  Resource Allocation Issues in Multicore Systems. Changhui Lin  Memory Consistency Models for Multicore Systems.

