Execution Indexing Xiangyu Zhang.

1 Execution Indexing Xiangyu Zhang

2 What is Execution Indexing (EI)
A technique that aligns two executions so that equivalent points in the two executions can be identified. It applies in multiple scenarios: running a program twice on one input (with perturbations); running two versions of a program on one input; running a program twice with different inputs.

3 Why EI – Case I: Running the same program with the same input
-- Setting breakpoints in debugging
    void foo() {
        x = ..;
    }
    y = ..;
    if (pred(y)) foo();
    ..
    foo();
1. Break at the 1st instance.
2. Restart, "set y=<val>".
3. Continue running.
4. Stopped, but not at the desired point.
Assume that in the first execution the fall-through edge is taken, so the foo() guarded by the predicate is not called, and a breakpoint is set at the first instance of the statement inside foo(). In the second execution, the value of y is perturbed and the predicate outcome flips, so the guarded foo() is called. As a result, the breakpoint stops execution inside the foo() invoked under the predicate, whereas ideally it should stop inside the second call to foo() in the perturbed execution.
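The misfiring breakpoint can be reproduced with a small C sketch. It is a hypothetical model, not the course's tooling: foo() counts executions of its statement x = .., the two call sites are numbered 7 (guarded) and 10 (unconditional) as on the Applications slide, and the "breakpoint" is simply a counter compared with k.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: foo() is called from line 7 only when pred(y) holds,
   and always from line 10. */
static int current_site;   /* line of the call that is currently executing foo() */
static int x_count;        /* executions of "x = .." seen so far */
static int break_target;   /* naive breakpoint: stop at this instance of "x = .." */
static int stopped_site;   /* call site active when the breakpoint fired, 0 if never */

static void foo(void) {
    ++x_count;                                   /* x = .. */
    if (x_count == break_target && stopped_site == 0)
        stopped_site = current_site;             /* the breakpoint fires here */
}

/* One run; pred_y stands for the outcome of pred(y). Returns the call site
   at which the naive "k-th instance" breakpoint stopped. */
static int run(bool pred_y, int k) {
    assert(k >= 1);
    x_count = 0;
    stopped_site = 0;
    break_target = k;
    if (pred_y) {              /* if (pred(y)) foo(); */
        current_site = 7;
        foo();
    }
    current_site = 10;         /* foo(); */
    foo();
    return stopped_site;
}
```

In the first run the 1st instance lies in the call from line 10; after perturbing y, the same breakpoint stops in the guarded call from line 7. The desired stop is now the 2nd instance, which a plain counter cannot know in advance.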

4 Setting breakpoints for multiple threads
Assume there are one producer thread, two consumer threads T1 and T2, and five tasks: s1, s2, s3, s4, and s5.
    producer thread {
        while (…) {
            s = generate_task(…);
            q.enqueue(s);
        }
    }
    consumer thread {
        while (s = dequeue()) {
            work_on_task(s);
        }
    }
In one execution: T1: s1; T2: s2, s3, s4, s5.
In another execution: T1: s2, s3; T2: s1, s4, s5.
A breakpoint such as "the first task processed by T1" therefore stops at s1 in one execution and at s2 in the other.
Heisenbugs are bugs that do not manifest themselves when the faulty program is run again (within a debugger) with the same input.

5 Why EI – Case II & III
Running two program versions with the same input: program de-obfuscation, debugging compilers, malware detection, debugging regression errors.
Running a program twice with different inputs: comparison-based debugging.
Code obfuscation is a technique that makes a program harder to understand. It is often used to protect the intellectual property in software or to hide malicious logic. For example,
    s1;
    if (p) s2;
    s3;
is transformed to
    i = 0;
    L: switch (i) {
        case 0: s1; i = 3; goto L;
        case 1: s2; i = 2;
        case 2: s3; goto X;
        case 3: i = p ? 1 : 2; goto L;
    }
    X:
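A runnable C sketch of the flattening, useful for comparing the two versions side by side. It assumes that each original statement s1, s2, s3 merely appends its number to a trace (hypothetical stand-ins) and that they sit in the switch as s1 in case 0, s2 in case 1, and s3 in case 2, with case 1 falling through to case 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins: s1, s2, s3 each append one character to a trace. */
static char trace[16];
static size_t len;

static void reset(void) { memset(trace, 0, sizeof trace); len = 0; }
static void emit(char c) { assert(len + 1 < sizeof trace); trace[len++] = c; }

/* Original: s1; if (p) s2; s3; */
static const char *original(bool p) {
    reset();
    emit('1');
    if (p) emit('2');
    emit('3');
    return trace;
}

/* Flattened: the dispatcher variable i selects the next block.
   *dispatches counts how often control passes through the switch. */
static const char *flattened(bool p, int *dispatches) {
    int i = 0;
    reset();
    *dispatches = 0;
L:
    ++*dispatches;
    switch (i) {
    case 0: emit('1'); i = 3; goto L;
    case 1: emit('2'); i = 2; /* falls through */
    case 2: emit('3'); goto X;
    case 3: i = p ? 1 : 2; goto L;
    }
X:
    return trace;
}
```

Both versions execute the same original statements in the same order, but the flattened one also passes through the dispatcher three times and has an entirely different control structure, so aligning executions of the two versions takes more than matching statement numbers.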

6 Using S_i (statement S, instance i)
Identify an execution point using a pair <statement, instance>.
    F ( ) {
    1:     X = …
    2:     if (P) {
    3:         *p = *p …
    4:         F( )
           }
    5:     … = X
    }
E: 1: X = …   2: if (P)   5: … = X
E′ (with P in E switched): 1: X = …   2: if (P)   3: *p = *p …   4: F( )   1: X = …   2: if (P)   5: … = X   5: … = X
A naïve approach that aligns the ith instance of a statement s in one run with the ith instance of the same statement in the other run often fails. Here, the first instance of statement 5 in E corresponds to the second instance of statement 5 in E′, not the first. Equivalently, setting a breakpoint at the ith instance of a statement s does not work.
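The failure of <statement, instance> alignment can be checked with a small C sketch that instruments F from the slide. The trace buffer, the record() helper, and modelling E′ by forcing P to be true at its first evaluation only are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_TRACE 64
static int trace[MAX_TRACE];   /* statement numbers of F in execution order */
static size_t trace_len;
static int p_evals;            /* how many times P has been evaluated */
static bool switch_first;      /* E': P is switched to true at its first evaluation */

static void record(int stmt) { assert(trace_len < MAX_TRACE); trace[trace_len++] = stmt; }

static void F(void) {
    record(1);                               /* 1: X = ... */
    record(2);                               /* 2: if (P) */
    bool P = switch_first && p_evals == 0;
    p_evals++;
    if (P) {
        record(3);                           /* 3: *p = *p ... */
        record(4);                           /* 4: F( ) */
        F();
    }
    record(5);                               /* 5: ... = X */
}

/* Runs F once and copies its trace into out; returns the trace length. */
static size_t run(bool switched, int out[MAX_TRACE]) {
    trace_len = 0;
    p_evals = 0;
    switch_first = switched;
    F();
    memcpy(out, trace, trace_len * sizeof trace[0]);
    return trace_len;
}

/* Naive alignment: position of the k-th (1-based) instance of stmt, or -1. */
static long nth_instance(const int *t, size_t n, int stmt, int k) {
    for (size_t pos = 0; pos < n; pos++)
        if (t[pos] == stmt && --k == 0)
            return (long)pos;
    return -1;
}
```

E is 1 2 5 and E′ is 1 2 3 4 1 2 5 5. Asking for the 1st instance of statement 5 in E′ yields position 6, the end of the recursive call, while the point that corresponds to 5_1 in E is the 2nd instance at position 7.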

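7 Using Calling Context and Instance
Let's look at a more sophisticated proposal: identify an execution point using a triple <calling context, statement, instance>.
    1. while (…) {
    2.     fgets(buf, 256, h);
    3.     for (i = 0; i < strlen(buf); i++) {
    4.         F(buf[i]);
    5.     }
    6. }
    7. F (char c) {
    8.     … = c;
    9. }
In one execution, the input file is ab\n 1; in the other execution, it is a\n 1.
A point <CC, s, i> means the ith instance of statement s WITH CONTEXT CC. In other words, it may be the jth instance of statement s overall, with j > i. Let's look at <[(main, 4)], 8, 2>. Every iteration of the while loop calls F from line 4, so the context [(main, 4)] does not distinguish loop iterations: instance numbers keep counting across iterations, and because the first line of the second input yields fewer calls, the same triple can denote calls made in different iterations of the while loop in the two executions.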
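A C sketch of the triple <[(main, 4)], 8, k> on both inputs. It assumes the input file holds the two lines ab\n and 1 (respectively a\n and 1), feeds them from an array instead of calling fgets on a file, and passes the while iteration to F purely for bookkeeping; all helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One execution of statement 8 (... = c in F) and the while iteration it ran in. */
struct call { int while_iter; char c; };

#define MAX_CALLS 32
static struct call calls[MAX_CALLS];
static int ncalls;

static void F(char c, int while_iter) {          /* 7: F (char c) */
    assert(ncalls < MAX_CALLS);
    calls[ncalls].while_iter = while_iter;       /* 8: ... = c */
    calls[ncalls].c = c;
    ncalls++;
}

/* Runs the loop from the slide over in-memory lines. */
static void run(const char *const lines[], int nlines) {
    ncalls = 0;
    for (int it = 1; it <= nlines; it++) {            /* 1: while (...) */
        const char *buf = lines[it - 1];              /* 2: fgets(buf, 256, h) */
        for (size_t i = 0; i < strlen(buf); i++)      /* 3: for (...) */
            F(buf[i], it);                            /* 4: F(buf[i]) */
    }
}

/* Naive triple <[(main, 4)], 8, k>: every call reaches F from line 4 with the
   same context, so the k-th instance is simply the k-th call overall. */
static const struct call *triple(int k) {
    return (k >= 1 && k <= ncalls) ? &calls[k - 1] : NULL;
}
```

With k = 3, the triple names the newline of the first line in one run but the 1 read by the second while iteration in the other.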

8 Basic Idea: Align executions top-down, region by region
Region: the statements executed between a predicate instance and its immediate post-dominator, or between a function entry and the corresponding exit, form a region. A statement instance x_i is dynamically control dependent (DCD) on the predicate instance that leads x_i's enclosing region. Regions are either nested or disjoint; they never overlap.

9 Basic Idea: Align executions top-down, region by region
[Figure: executions E and E′ of F( ) aligned at the highest level.]

10 Basic Idea: Align executions top-down, region by region
[Figure: executions E and E′ of F( ) aligned one level down.]

11-15 [Figures: derivation trees of the while/fgets/for program from slide 7 under inputs ab\n 1 and a\n 1, aligned step by step from the while iterations down through fgets(buf), for (i…), and F(buf[i]) to …=c.]
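The level-by-level alignment in these figures can be approximated in C by indexing each execution of … = c with the while iteration and for iteration that enclose it. This is a simplification under stated assumptions: the EDL nests each loop iteration inside the previous one (R1 → 2 1 R1 on slide 17), so for this program the iteration counts carry the same information as the derivation path; input lines come from an array rather than fgets, and all names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Simplified index of an execution of statement 8: the enclosing while
   iteration and for iteration (both 1-based). */
struct loop_index { int while_iter, for_iter; };
struct point { struct loop_index idx; char c; };

#define MAX_POINTS 32
static struct point pts[MAX_POINTS];
static int npts;

/* Runs the slide's loop over in-memory lines (instead of fgets on a file). */
static void run(const char *const lines[], int nlines) {
    npts = 0;
    for (int w = 1; w <= nlines; w++) {                   /* while (...) */
        const char *buf = lines[w - 1];                   /* fgets(buf, ...) */
        for (size_t i = 0; i < strlen(buf); i++) {        /* for (...) */
            assert(npts < MAX_POINTS);
            pts[npts].idx = (struct loop_index){ w, (int)i + 1 };  /* F(buf[i]) */
            pts[npts].c = buf[i];                         /* ... = c */
            npts++;
        }
    }
}

/* Returns the point in the current run whose index equals want, or NULL. */
static const struct point *lookup(struct loop_index want) {
    for (int k = 0; k < npts; k++)
        if (pts[k].idx.while_iter == want.while_iter &&
            pts[k].idx.for_iter == want.for_iter)
            return &pts[k];
    return NULL;
}
```

The index {2, 1} finds the first character read in the second while iteration in both runs, even though it is the 4th call to F in one run and the 3rd in the other; an index with no counterpart, such as {1, 3} in the second run, is reported as missing.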

16 Formal Definition: Execution description language (EDL)
A context-free grammar that is constructed automatically from the program and describes all possible executions: an execution is a string accepted by the grammar. The grammar describes region nestings. Why a context-free grammar and not a regular expression? A regular expression is equivalent to a finite state machine, which cannot count: the language a^n b^n is not regular, so no regular expression describes it. Recursion produces exactly this kind of nesting; for example, every invocation of F, including each recursive one, must end with its own execution of statement 5.

17 Execution Description Language
Execution Indexing 2019/1/14
Program: 1: while (..)  2: s1;  3: s2;
    EDL: L → 1 R1 3;  R1 → 2 1 R1 | ε
    Executions: 13, 1213, 121213
Program: 1: if (..)  2: s1;  3: else  4: s2;
    EDL: L → 1 R1;  R1 → 2 | 4
    Executions: 12, 14
Program: 1: s1;  2: s2;  3: s3;
    EDL: L → 1 2 3
    Executions: 123
Grammar construction: a left-hand-side (nonterminal) symbol is generated for each predicate and function, representing a region; the statements that are control dependent on the predicate (or function entry), together with their subregions, constitute the right-hand side. In the if-else example, the first rule describes the top-level control structure, and the second rule describes the lower-level control structure: either statement 2 or statement 4 is executed. Note that EDL is different from the grammar of a programming language: in EDL, nonterminals are regions and terminals are program statements.
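Because an execution is a string accepted by the EDL, membership can be checked with an ordinary recursive-descent recognizer. This C sketch encodes the three example grammars, writing statement numbers as characters; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Consumes statement `want` at s[*pos]; never reads past the terminating '\0'. */
static bool eat(const char *s, size_t *pos, char want) {
    assert(s != NULL && want != '\0');
    if (s[*pos] != want) return false;
    ++*pos;
    return true;
}

/* while example:  L -> 1 R1 3,   R1 -> 2 1 R1 | empty */
static bool while_R1(const char *s, size_t *pos) {
    if (s[*pos] != '2') return true;                     /* R1 -> empty */
    return eat(s, pos, '2') && eat(s, pos, '1') && while_R1(s, pos);
}
static bool while_L(const char *s) {
    size_t pos = 0;
    return eat(s, &pos, '1') && while_R1(s, &pos) && eat(s, &pos, '3') && s[pos] == '\0';
}

/* if-else example:  L -> 1 R1,   R1 -> 2 | 4 */
static bool if_L(const char *s) {
    size_t pos = 0;
    return eat(s, &pos, '1') && (eat(s, &pos, '2') || eat(s, &pos, '4')) && s[pos] == '\0';
}

/* sequence example:  L -> 1 2 3 */
static bool seq_L(const char *s) {
    size_t pos = 0;
    return eat(s, &pos, '1') && eat(s, &pos, '2') && eat(s, &pos, '3') && s[pos] == '\0';
}
```

One character of lookahead is enough here: in the while grammar, seeing 2 selects R1 → 2 1 R1, and anything else selects the empty production.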

18 Example Revisit
    F ( ) {
    1:     X = …
    2:     if (P) {
    3:         *p = *p …
    4:         F( )
           }
    5:     … = X
    }
RF → 1 2 R2 5
R2 → 3 4 RF | ε
[Figure: derivation trees of E and E′ under this grammar.]

19 Execution Index
Given an execution, the index of an execution point is the path in the EDL derivation tree that leads from the root to the execution point; it reflects region nestings. Two points in two respective executions align if they have the same index. In other words, two points align if their nesting regions align (are identical). 5_1 in E has the index (RF, 5), while 5_1 in E′ has the index (RF, R2, RF, 5), so they do not align. 5_2 in E′ has the index (RF, 5), so it aligns with 5_1 in E.
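The index definition can be made concrete by parsing a trace of F with the EDL RF → 1 2 R2 5, R2 → 3 4 RF | ε and recording, for each executed statement, the path of nonterminals above it. This C sketch is an offline illustration that builds the path while parsing; the trace encoding (statement numbers as characters) and all names are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_LEN 32
#define MAX_PATH 128

static const char *tr;                 /* trace being parsed, e.g. "12341255" */
static size_t pos;                     /* read cursor into tr */
static char path[MAX_PATH];            /* nonterminals from the root, e.g. "RF,R2," */
static char idx[MAX_LEN][MAX_PATH];    /* idx[k]: execution index of tr[k] */

/* Consumes statement `want`; its index is the current path followed by itself. */
static bool term(char want) {
    if (tr[pos] != want) return false;
    snprintf(idx[pos], MAX_PATH, "%s%c", path, want);
    pos++;
    return true;
}

/* Enters a region by appending its nonterminal; returns the old path length. */
static size_t enter(const char *nt) {
    size_t mark = strlen(path);
    assert(mark + strlen(nt) + 2 <= MAX_PATH);
    strcat(path, nt);
    strcat(path, ",");
    return mark;
}

static bool RF(void);

static bool R2(void) {                 /* R2 -> 3 4 RF | empty */
    if (tr[pos] != '3') return true;   /* empty: P was false */
    size_t mark = enter("R2");
    bool ok = term('3') && term('4') && RF();
    path[mark] = '\0';                 /* leave the region */
    return ok;
}

static bool RF(void) {                 /* RF -> 1 2 R2 5 */
    size_t mark = enter("RF");
    bool ok = term('1') && term('2') && R2() && term('5');
    path[mark] = '\0';
    return ok;
}

/* Parses a whole trace; on success idx[k] holds the index of its k-th element. */
static bool index_trace(const char *trace) {
    assert(strlen(trace) < MAX_LEN);
    tr = trace;
    pos = 0;
    path[0] = '\0';
    return RF() && tr[pos] == '\0';
}
```

For E = 125 the index of 5_1 is RF,5; for E′ = 12341255 the two 5s get RF,R2,RF,5 and RF,5, so 5_2 in E′ is the one that aligns with 5_1 in E. A trace such as 1234125, whose inner call has no matching outer 5, is rejected.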

20 Basic Algorithm: Compute and maintain indices online
The derivation tree is not built explicitly; only the current index, i.e., the current region nesting, is maintained. A stack maintains the index; essentially, it is very similar to the control dependence stack (CDS). Instrumentation is inserted at branches and their post-dominators, and at function calls and returns. The entire stack stands for the index of the current execution point.
June 2008
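A C sketch of the online algorithm for F from slide 18: instead of building a derivation tree, the instrumented code pushes RF on function entry and R2 at the predicate, pops R2 at the predicate's immediate post-dominator (statement 5), and pops RF on return. The stack layout, the string form of the index, and the helper names are assumptions; E′ is again modelled by making P true only at its first evaluation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_DEPTH 64
static const char *stack[MAX_DEPTH];   /* the index stack: one entry per open region */
static int depth;

static void push(const char *region) { assert(depth < MAX_DEPTH); stack[depth++] = region; }
static void pop(void) { assert(depth > 0); depth--; }

/* Writes the current index followed by stmt, e.g. "RF,R2,RF,5". */
static void current_index(int stmt, char *out, size_t n) {
    size_t used = 0;
    out[0] = '\0';
    for (int k = 0; k < depth && used < n; k++)
        used += (size_t)snprintf(out + used, n - used, "%s,", stack[k]);
    if (used < n)
        snprintf(out + used, n - used, "%d", stmt);
}

#define MAX_HITS 16
static char hits[MAX_HITS][128];   /* index of every execution of statement 5 */
static int nhits;
static int p_evals;                /* evaluations of P so far */
static bool switch_first;          /* E': P is true at its first evaluation only */

static void F(void) {
    push("RF");                              /* instrumented: function entry */
    /* 1: X = ... */
    bool P = switch_first && p_evals == 0;   /* 2: if (P) */
    p_evals++;
    push("R2");                              /* instrumented: predicate opens its region */
    if (P) {
        /* 3: *p = *p ... */
        F();                                 /* 4: F( ) */
    }
    pop();                                   /* instrumented: post-dominator 5 closes R2 */
    assert(nhits < MAX_HITS);
    current_index(5, hits[nhits++], sizeof hits[0]);   /* 5: ... = X */
    pop();                                   /* instrumented: function return */
}

static void run(bool switched) {
    depth = 0;
    nhits = 0;
    p_evals = 0;
    switch_first = switched;
    F();
}
```

When P is false, R2 is pushed and immediately popped, so it never appears in an index, matching R2 → ε in the grammar. The online indices equal those read off the derivation trees: both 5_1 in E and 5_2 in E′ get RF,5.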

21 Semantic Augmentation
Structural indexing only encodes control structures, which is not sufficient in some cases, such as an event dispatch loop implementation, characterized by a switch inside a while. Semantic augmentation provides a way to encode data in indices, based on programmer annotation.

22 Example
    1: ...
    2: while (..) {
    3:     c = getc();
    4:     …
    5:     switch (c) {
    6:     case 'a':
    7:         F(c); break;
    8:     case 'b':
    9:         G(c); break;
    10:    }
    11: }
    12:
R → 1 2 R2
R2 → 3 4 5 R5 2 R2 | ε
R5 → 7 RF | 9 RG
[Figure: derivation trees for inputs "a b" and "b a".]
This program is similar to an event dispatch loop. Every time a data item is processed, we restart the index with a symbol created for that data value. The identity of a program point here is determined more by data than by structure; to compensate for this, we temporarily restart the index with the data value.

23 Example
    1: ...
    2: while (..) {
    3:     c = getc();
    4:     IDX_DATA(c);
    5:     switch (c) {
    6:     case 'a':
    7:         F(c); break;
    8:     case 'b':
    9:         G(c); break;
    10:    }
    11: }
    12:
R → R4a | R4b
R4a → 5 R5 2 R2
R2 → 3 | ε
R5 → 7 RF | 9 RG
[Figure: the index forest for input "a b", with one tree rooted at R4a and one at R4b.]
The programmer annotates the program at semantic indexing points, where root nodes are created depending on the VALUES of the variable, and new rules are generated following the previous grammar construction rules. An index tree now becomes an index forest with multiple trees.
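A C sketch contrasting structural and data-augmented indices for the dispatch loop. It models IDX_DATA(c) as discarding the current index and starting a new tree named after the value of c (R4a or R4b); the input string stands in for getc(), only 'a' and 'b' are handled, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_DEPTH 32
static const char *stack[MAX_DEPTH];
static int depth;
static char f_index[128];              /* index observed when F is entered */

static void push(const char *r) { assert(depth < MAX_DEPTH); stack[depth++] = r; }
static void pop(void) { assert(depth > 0); depth--; }

/* Joins the stack into out, e.g. "R,R2,R5,RF". */
static void snapshot(char *out, size_t n) {
    size_t used = 0;
    out[0] = '\0';
    for (int k = 0; k < depth && used < n; k++)
        used += (size_t)snprintf(out + used, n - used, "%s%s", k ? "," : "", stack[k]);
}

static void F(char c) { (void)c; push("RF"); snapshot(f_index, sizeof f_index); pop(); }
static void G(char c) { (void)c; push("RG"); pop(); }

/* The dispatch loop over `input` ('a'/'b' only). With data_index set,
   IDX_DATA(c) restarts the index at a root named after c. */
static void event_loop(const char *input, bool data_index) {
    depth = 0;
    f_index[0] = '\0';
    push("R");                                   /* 1: ... */
    for (size_t k = 0; input[k] != '\0'; k++) {  /* 2: while (..) */
        char c = input[k];                       /* 3: c = getc(); */
        if (data_index) {                        /* 4: IDX_DATA(c); */
            depth = 0;
            push(c == 'a' ? "R4a" : "R4b");
        } else {
            push("R2");                          /* structural: each iteration nests deeper */
        }
        push("R5");                              /* 5: switch (c) */
        switch (c) {
        case 'a': F(c); break;                   /* 7: F(c); */
        case 'b': G(c); break;                   /* 9: G(c); */
        }
        pop();                                   /* leave R5 */
    }
}
```

Structurally, the call to F gets a different index depending on which iteration happens to handle 'a'; with the annotation, the index is R4a,R5,RF for both inputs.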

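24 Applications: Running the same program with the same input
-- Setting breakpoints in debugging
    1. void foo() {
    2.     x = ..;
    3. }
    4.
    5. y = ..;
    6. …
    7. if (pred(y)) foo();
    10. foo();
1. Break at <Rmain, Rfoo, 1>.
2. Restart, "set y=<val>".
3. Continue running.
The guarded call at line 7 executes inside the region of the predicate at line 7, so its statements have a deeper index than <Rmain, Rfoo, 1>, and execution stops only in the call from line 10, as intended.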
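A C sketch of the index-based breakpoint, using the stack idea from the Basic Algorithm slide. Assumptions: the index is a comma-separated string, the predicate at line 7 opens a region we call R7 (a hypothetical name following the slides' habit of naming regions after predicate lines), and the final component 1 is taken verbatim from <Rmain, Rfoo, 1> as foo's first statement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_DEPTH 16
static const char *stack[MAX_DEPTH];
static int depth;
static const char *breakpoint;   /* index to stop at, e.g. "Rmain,Rfoo,1" */
static int call_site;            /* line of the call currently executing foo() */
static int stopped_at;           /* call site when the breakpoint fired, 0 if never */

static void push(const char *r) { assert(depth < MAX_DEPTH); stack[depth++] = r; }
static void pop(void) { assert(depth > 0); depth--; }

/* Compares the current index, extended by stmt, with the breakpoint. */
static void check(int stmt) {
    char idx[128];
    size_t used = 0;
    idx[0] = '\0';
    for (int k = 0; k < depth && used < sizeof idx; k++)
        used += (size_t)snprintf(idx + used, sizeof idx - used, "%s,", stack[k]);
    if (used < sizeof idx)
        snprintf(idx + used, sizeof idx - used, "%d", stmt);
    if (stopped_at == 0 && strcmp(idx, breakpoint) == 0)
        stopped_at = call_site;
}

static void foo(void) {
    push("Rfoo");              /* function entry opens Rfoo */
    check(1);                  /* statement 1 of foo, as in <Rmain, Rfoo, 1> */
    pop();
}

/* One run; pred_y stands for pred(y). Returns the call site where bp fired. */
static int run(bool pred_y, const char *bp) {
    depth = 0;
    stopped_at = 0;
    breakpoint = bp;
    push("Rmain");
    push("R7");                /* 7: if (pred(y)) opens its region */
    if (pred_y) {
        call_site = 7;
        foo();
    }
    pop();                     /* post-dominator of line 7 closes R7 */
    call_site = 10;
    foo();                     /* 10: foo(); */
    pop();
    return stopped_at;
}
```

Unlike the instance counter, the index breakpoint stops in the call from line 10 whether or not y is perturbed; the guarded call is reachable only through its own index, Rmain,R7,Rfoo,1.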

25 Reading Assignment Efficient Program Execution Indexing, PLDI 2008.

26 Challenge Three (2 extra credits)
Heisenbugs are bugs that change or disappear once a debugger is used. They are notoriously difficult to debug. The classic solution is tracing, which is prohibitively expensive for production runs. With the primitives of indexing and slicing, sketch a plan to reproduce heisenbugs. Assume that heisenbugs are caused by thread concurrency. You can assume that indexing is supported in production runs, so that when a failure occurs, the index of the failure can be reported. The failure may not manifest itself in a simple re-execution; you may have to insert perturbations (e.g., synchronizations) to alter the timing of the original execution. The challenge is how and where to perturb. A simple motivating example is expected. Limit: 2 pages.

