ANALYSIS OF PROG. LANG. PROGRAM ANALYSIS Instructors: Crista Lopes Copyright © Instructors. 1.


1 ANALYSIS OF PROG. LANG.: PROGRAM ANALYSIS
Instructors: Crista Lopes
Copyright © Instructors.

2 Motivation(s)
- Where do you see PA in your everyday life?
- How does PA “work”?
- What is PA anyway?

3 Auto-completion

4 Pre-compilation error detection
- Ex: missing parenthesis

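5 How do you know...

    int a;
    increment_a() { a++; }
    while (true) {
        String a = "hello";
        increment_a();
    }

This “a” is not that “a”: the a incremented inside increment_a() is the outer int a, not the String a declared inside the loop.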

6 How do you remember...

    int a;
    increment_a() { a++; }
    while (true) {
        String a = "hello";
        increment_a();
    }

Wait, what’s the type of “a” again? Inside increment_a(), “a” is of type int (FYI...)

7 Outline
- Introduction/motivations
- Program representation
  - AST
  - 3-address code
- Control flow analysis
- Data flow

8 Intermediate Representation (IR)
- Initial Point
- Abstract Syntax Tree
  - Abstract vs Concrete Syntax
  - Parse Tree vs Abstract Syntax Tree
- Three-address Codes

9 IR-1 Starting Point
- Pipeline: Source code → (Parsing, Lexical Analysis) → Intermediate representation → (Code Generation, Optimization) → Target code → (Code Execution)
- Analyze IR: perform analysis on the results
- Use this information for applications

10 IR-2. Abstract Syntax Tree (AST)
- Concrete vs Abstract Syntax
  - Concrete syntax shows structure and is language-specific
  - Abstract syntax shows structure only
- Representations
  - Parse Tree represents Concrete Syntax
  - Abstract Syntax Tree represents Abstract Syntax

11 IR-2. Example: Grammar
- Example
  - a := b+c (Language 1)
  - a = b+c; (Language 2)
- Grammar for 1

    stmtlist → stmt | stmt stmtlist
    stmt     → assign | if-then | …
    assign   → ident “:=” ident binop ident
    binop    → “+” | “-” | …

- Grammar for 2

    stmtlist → stmt “;” | stmt “;” stmtlist
    stmt     → assign | if-then | …
    assign   → ident “=” ident binop ident
    binop    → “+” | “-” | …

12 IR-2. Example: Parse Tree

Parse tree for a:=b+c

    stmtlist
      stmt
        assign
          ident (a)
          :=
          ident (b)
          binop (“+”)
          ident (c)

Parse tree for a=b+c;

    stmtlist
      stmt
        assign
          ident (a)
          =
          ident (b)
          binop (“+”)
          ident (c)
      “;”

13 IR-2 Example: Abstract Syntax Tree
- Example
  1. a:=b+c
  2. a=b+c;
- Abstract Syntax Tree for 1 and 2

    assign
      a
      add
        b
        c
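The point that two concrete spellings share one abstract syntax tree can be tried out with Python's standard `ast` module. This is only an illustration in a different language than the slide's Language 1 and 2: Python happens to accept the statement both with and without a trailing semicolon, and the sketch checks that both parse to the same tree.

```python
import ast

# Two concrete spellings of the same statement (Python accepts both).
without_semicolon = ast.parse("a = b + c")
with_semicolon = ast.parse("a = b + c;")

# ast.dump omits line/column attributes by default, so it reflects structure only.
same_tree = ast.dump(without_semicolon) == ast.dump(with_semicolon)

# Pull out the pieces that correspond to the slide's tree: assign(a, add(b, c)).
assign = without_semicolon.body[0]
summary = (
    type(assign).__name__,           # node kind of the statement
    assign.targets[0].id,            # assigned variable
    type(assign.value.op).__name__,  # operator node
    assign.value.left.id,            # left operand
    assign.value.right.id,           # right operand
)
print(same_tree, summary)
```

The semicolon is part of the concrete syntax only, which is why it leaves no trace in the dumped tree.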

14 IR-3. Three Address Code
- General form: x = y op z
- More generally: (operator, operand1, operand2, result)
  - at most 3 spots besides the operator
- May include temporary variables
- Examples

    Kind                   Statement              Quadruple
    Assignment, binary     x := y op z            (op, y, z, x)
    Assignment, unary      x := op y              (op, y, _, x)
    Copy                   x := y                 (_, y, _, x)
    Jump, unconditional    goto L                 (goto, L, _, _)
    Jump, conditional      if x relop y goto L    (relop, x, y, L)
    …

15 IR-3. Example: Three Address Code

if a>10 then x=y+z else x=y-z

    1. if a>10 goto 4
    2. x = y-z
    3. goto 5
    4. x = y+z
    5. …
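To see where the temporary variables in three-address code come from, here is a minimal sketch that lowers one nested assignment into quadruples of the form (operator, operand1, operand2, result). It reuses Python's `ast` module as the front end; the function name `to_quads`, the temporary names `t1`, `t2`, and the `"copy"` operator label are assumptions made for this illustration (the slides leave the copy operator blank).

```python
import ast
from itertools import count

# Operator spellings used in the emitted quadruples (an illustrative choice).
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

def to_quads(source):
    """Lower one assignment with nested binary operators into quadruples."""
    temps = count(1)
    quads = []

    def lower(node):
        # Return the name holding the value of `node`, emitting quads as needed.
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return str(node.value)
        if isinstance(node, ast.BinOp):
            left = lower(node.left)
            right = lower(node.right)
            temp = f"t{next(temps)}"
            quads.append((OPS[type(node.op)], left, right, temp))
            return temp
        raise NotImplementedError(type(node).__name__)

    stmt = ast.parse(source).body[0]
    result = lower(stmt.value)
    # Final copy from the last temporary into the assigned variable.
    quads.append(("copy", result, "_", stmt.targets[0].id))
    return quads

quads = to_quads("x = y + z * 2")
for q in quads:
    print(q)
```

Each quadruple has at most three operand slots besides the operator, so the nested expression needs one temporary per inner operation.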

16 Analysis Levels
- Local: within a single basic block or statement
- Intraprocedural: within a single procedure, function, or method
- Interprocedural: across procedure boundaries, procedure calls, shared globals, etc.
- Intraclass: within a single class
- Interclass: across class boundaries
- …

17 Outline
- Introduction/motivations
- Program representation
- Control flow analysis
  - Computing Control Flow (analysis and representation)
  - Search and Traversals
  - Applications
- Data flow

18 Computing Control flow (example)

    Procedure AVG
    S1    count = 0
    S2    fread(fptr, n)
    S3    while (not EOF) do
    S4        if (n < 0)
    S5            return (error)
              else
    S6            nums[count] = n
    S7            count++
              endif
    S8        fread(fptr, n)
          endwhile
    S9    avg = mean(nums, count)
    S10   return (avg)

[Figure: control flow graph of AVG with nodes entry, S1 to S10, and EXIT]

19 CF1: Control Flow (Basic Blocks)
- A basic block is a sequence of consecutive statements in which flow of control enters at the beginning and leaves at the end, without halting or any possibility of branching except at the end
- A basic block may or may not be maximal
- For compiler optimizations, maximal blocks are desirable
- For software engineering tasks, basic blocks that represent one source code statement are often used

20 Computing Control flow (example)

[Repeat of slide 18: Procedure AVG (statements S1 to S10) and its control flow graph with nodes entry, S1 to S10, and EXIT]

21 CF1: Computing Control Flow
- Input: a list of program statements in some form
- Output: a list of CFG nodes and edges
- Procedure:
  - Construct basic blocks
  - Create entry and exit nodes; create edge (entry, B1); create edge (Bk, exit) for each Bk that represents an exit from the program
  - Add a CFG edge from Bi to Bj if Bj can immediately follow Bi in some execution, i.e.,
    - there is a conditional or unconditional goto from the last statement of Bi to the first statement of Bj, or
    - Bj immediately follows Bi in the order of the program and Bi does not end in an unconditional goto statement
  - Label edges that represent conditional transfers of control
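The procedure above can be sketched on the three-address code from slide 15. This is a minimal illustration, not the course's implementation: the instruction encoding (text, kind, jump target), the function name `build_cfg`, and the use of block indices as node names are all assumptions, and edge labelling for conditional transfers is left out.

```python
# Slide 15's three-address code; each entry is (text, kind, jump_target).
# kind is "cond" for a conditional jump, "goto" for an unconditional one, else "plain".
code = {
    1: ("if a>10 goto 4", "cond", 4),
    2: ("x = y-z", "plain", None),
    3: ("goto 5", "goto", 5),
    4: ("x = y+z", "plain", None),
    5: ("...", "plain", None),
}

def build_cfg(code):
    lines = sorted(code)
    # Leaders: the first statement, every jump target, every statement after a jump.
    leaders = {lines[0]}
    for n in lines:
        _, kind, target = code[n]
        if kind in ("cond", "goto"):
            leaders.add(target)
            if n + 1 in code:
                leaders.add(n + 1)

    # A basic block runs from one leader up to (not including) the next leader.
    blocks, current = [], []
    for n in lines:
        if n in leaders and current:
            blocks.append(current)
            current = []
        current.append(n)
    blocks.append(current)

    block_of = {b[0]: i for i, b in enumerate(blocks)}  # leader line -> block index
    edges = {("entry", 0)}
    for i, b in enumerate(blocks):
        _, kind, target = code[b[-1]]
        if kind in ("cond", "goto"):
            edges.add((i, block_of[target]))
        if kind != "goto":
            # Fall through to the next block, or to exit after the last block.
            edges.add((i, i + 1 if i + 1 < len(blocks) else "exit"))
    return blocks, edges

blocks, edges = build_cfg(code)
print(blocks)
print(sorted(edges, key=str))
```

The blocks produced here are maximal; a software engineering tool that wants one node per statement would skip the grouping step and treat every line as its own block.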

22 CF2: Search and Ordering
- Many ways to visit the nodes in the graph
- Depth First Search: visits descendants of the node before visiting any of its siblings
- Breadth First Search: all of the node’s immediate descendants are processed before any of their unprocessed children
- Preorder Traversal: a node is processed before its descendants
- Postorder Traversal: a node is processed after its descendants

23 CF2: Search and Ordering (cont’d) (DFS)
- One DFS of the CFG: 1 → 3 → 4 → 6 → 7 → 8 → 10, back to 8, → 9, back to 8, 7, 6, 4, → 5, back to 4, 3, 1, → 2, back to 1
- The number assigned to a node during DFS is its depth first number
- Depth first ordering of nodes is the reverse of the order in which nodes are last visited in DFS
- For the DFS, nodes are visited 1, 3, 4, 6, 7, 8, 10, 8, 9, 8, 7, 6, 4, 5, 4, 3, 1, 2, 1
- Depth first ordering is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10

[Figure: control flow graph being traversed]
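The traversal on this slide can be replayed with a short sketch. The successor lists below contain only the edges that the listed traversal walks along (any other edges of the original figure are not recoverable from the transcript), and the function name `depth_first_order` is an assumption for this illustration. Depth first ordering is computed as the reverse of the postorder, that is, the reverse of the order in which nodes are visited for the last time.

```python
# Successors in the order the slide's DFS explores them (tree edges only).
succ = {1: [3, 2], 3: [4], 4: [6, 5], 6: [7], 7: [8], 8: [10, 9],
        2: [], 5: [], 9: [], 10: []}

def depth_first_order(succ, root):
    visited, first_visit, postorder = set(), [], []

    def dfs(n):
        visited.add(n)
        first_visit.append(n)      # preorder: node processed before its descendants
        for m in succ[n]:
            if m not in visited:
                dfs(m)
        postorder.append(n)        # postorder: node processed after its descendants

    dfs(root)
    return first_visit, postorder[::-1]

first_visit, ordering = depth_first_order(succ, 1)
print(first_visit)
print(ordering)
```

Iterative data-flow algorithms often visit blocks in this ordering because, for forward problems, most predecessors of a block are processed before the block itself.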

24 CF: Types of Edges
- Depth first representation is the depth first spanning tree along with the other edges that are not part of the tree (tree edges, other edges)
- Three kinds of edges
  - Advancing (forward) edges: go from a node to one of its proper descendants in the tree; these include tree edges
  - Back edges: go from a node to one of its ancestors in the tree
  - Cross edges: connect nodes such that neither is an ancestor of the other
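A minimal sketch of how a DFS can classify edges follows. The small graph is hypothetical (it is not taken from the slides), as is the function name `classify_edges`. Tree edges are reported separately here, although by the definition above they are also advancing edges.

```python
def classify_edges(succ, root):
    disc, done, kinds = {}, set(), {}

    def dfs(n):
        disc[n] = len(disc)                  # discovery number of n
        for m in succ[n]:
            if m not in disc:
                kinds[(n, m)] = "tree"       # first time m is reached
                dfs(m)
            elif m not in done:
                kinds[(n, m)] = "back"       # m is still active, so it is an ancestor
            elif disc[n] < disc[m]:
                kinds[(n, m)] = "forward"    # m is a finished descendant of n
            else:
                kinds[(n, m)] = "cross"      # neither is an ancestor of the other
        done.add(n)

    dfs(root)
    return kinds

# Hypothetical graph: 3 -> 1 closes a loop, 1 -> 3 skips ahead, 4 -> 3 goes sideways.
succ = {1: [2, 3, 4], 2: [3], 3: [1], 4: [3]}
kinds = classify_edges(succ, 1)
print(kinds)
```

Back edges are the ones that matter most for control flow analysis, since each one indicates a loop in the graph.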

25 Applications of Control Flow
- Complexity: pointers to refactoring
- Testing: Branch, Path, Basis Path
  - Branch: must test 1-2, 1-3, 4-5, 4-8, 5-6, 5-7
  - Path: infinite, due to loop
  - Basis Path: set of paths that covers all the edges at least once, e.g. 1,2,4,8; 1,3,4,5,6,7,4,8
- Program Understanding
- Recover program structure
- Impact analysis
- …

26 Outline
- Introduction/motivations
- Program representation
- Control flow
- Data flow
  - Introduction
  - Reaching definitions

27 Data flow - Introduction
- Flow of various data throughout the program
- Obtained from AST or CFG
- Used in software engineering tasks
- Computing exact solutions to most data flow problems is undecidable
  - May depend on input
  - May depend on the outcome of a conditional statement
  - May depend on termination of a loop
- Thus we compute approximations of the exact solution

28 Data flow - Introduction
- Some approximations “overestimate” the solution
  - They contain the actual information plus some spurious information, but do not omit any actual information
  - Conservative and safe approach
- Some approximations “underestimate” the solution
  - They may not contain all the information of the actual solution
  - Unsafe
- Research challenge: providing safe but precise information in an efficient way
- Uses of data flow:
  - Compiler optimization requires conservative analysis
  - Software engineering tasks may only need unsafe info

29 Data flow – Compiler Optimization
- Common subexpression elimination

[Figure: CFG fragment containing the statements c=a+b, d=a+b, and e=a+b]

30 Data flow – Compiler Optimization
- Common subexpression elimination
- Need to know available expressions: which expressions have already been computed when control reaches this statement

[Figure: before, the statements c=a+b, d=a+b, and e=a+b; after, t=a+b; c=t in one block, t=a+b; d=t in another, and e=t where the expression was recomputed]
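As a simplified illustration, the sketch below performs common subexpression elimination inside a single basic block. This is a local variant: the slide's version works across blocks and therefore needs a global available-expressions analysis. The statement encoding (dest, left, op, right) and the function name `local_cse` are assumptions, and the sketch reuses the variable that already holds the value instead of introducing the temporary t shown on the slide.

```python
def local_cse(block):
    """block: list of (dest, left, op, right) statements in one basic block."""
    available = {}   # (left, op, right) -> variable currently holding that value
    out = []
    for dest, left, op, right in block:
        expr = (left, op, right)
        if expr in available:
            # Expression already computed and still valid: turn into a copy.
            out.append((dest, available[expr], None, None))
        else:
            out.append((dest, left, op, right))
        # Assigning dest invalidates expressions that read dest or are held in dest.
        available = {e: v for e, v in available.items()
                     if dest not in (e[0], e[2]) and v != dest}
        if expr not in available and dest not in (left, right):
            available[expr] = dest
    return out

block = [
    ("c", "a", "+", "b"),
    ("d", "a", "+", "b"),   # a+b is available in c
    ("a", "d", "+", "1"),   # redefines a, so a+b is no longer available
    ("e", "a", "+", "b"),   # must be recomputed
]
optimized = local_cse(block)
for stmt in optimized:
    print(stmt)
```

The third statement shows why the analysis has to be conservative: once an operand is redefined, reusing the old value would change the program's meaning.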

31 Data Flow - Compiler Optimization
- Register (de)allocation
- When assigning memory locations to registers, if a value in a register (i.e. a memory location) is not used again, there is no need to keep it in a register
- Statement: R1 = R2 + 10
- Is R2 needed after this statement?
- Need to know “live variables”: which variables are still used after the current line
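Within one basic block, live variables can be found with a single backward scan. The sketch below is a minimal illustration under assumptions: each statement is encoded as (defined variable, list of used variables), the second statement `R3 = R1 * 2` and the live-out set are made up for the example, and the function name `live_before_each` is hypothetical.

```python
def live_before_each(block, live_out):
    """Return the set of live variables just before each statement of the block."""
    live = set(live_out)
    result = []
    for defined, used in reversed(block):
        # A definition ends the liveness of its target; uses make variables live.
        live = (live - {defined}) | set(used)
        result.append(set(live))
    return result[::-1]

block = [
    ("R1", ["R2"]),   # R1 = R2 + 10 (from the slide)
    ("R3", ["R1"]),   # R3 = R1 * 2  (hypothetical follow-up statement)
]
live_sets = live_before_each(block, live_out={"R3"})
# Variables live after the first statement are those live before the second one.
r2_needed_after_first = "R2" in live_sets[1]
print(live_sets, r2_needed_after_first)
```

Since R2 is not live after the first statement in this example, its register could be reused from that point on.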

32 Data Flow - Compiler Optimization
- Statement: a=c+10 // need 3 registers
- Suppose every assignment that reaches this statement assigns 5 to c
  - then the statement can be replaced by a=15 // need 2 registers
- But: need to know reaching definitions: which definition(s) of variable c reach this statement

33 Data Flow - Sw Eng Tasks
- Data-Flow testing
- Statements: a=c+10 (definition of a); d=a+y (use of a)
- Suppose that a statement assigns a value but the use of that value is never executed under test (“a” is never used on the tested path)
- Need to know definition-use pairs: the link between definition(s) and use(s) of a variable (or a memory location)

34 Data Flow - Sw Eng Tasks
- Debugging
- Statements: a=c+y; d=a+y
- Suppose that ‘a’ has an incorrect value in the statement, e.g. because of an int overflow
- Need data dependence information: some statements produce erroneous values, others are affected by those values

35 Data flow - Example
- Compute the flow of data throughout the program
  - Where does the assignment to i in statement 1 reach?
  - Where does the expression computed in statement 2 reach?
  - Which uses of a variable are reachable from the end of Block 1?
  - Is the value of variable i live after statement 2?

[Figure: CFG with blocks B1 to B4 containing the statements 1. i=2; 2. k=i+1; 3. i=1; 4. k=k+1; 5. k=k-4]

36 Reaching definitions analysis
- Definition = statement where a variable is assigned a value (e.g. input statement, assignment statement)
- A definition of ‘a’ reaches a point ‘p’ if there exists a control flow path in the CFG from the definition to ‘p’ with no other definitions of ‘a’ on the path
- Such a path may exist in the graph yet never be executable (an infeasible path)

[Figure: CFG with blocks B1 to B4 containing the statements 1. i=2; 2. k=i+1; 3. i=1; 4. k=k+1; 5. k=k-4]

37 Reaching definitions analysis
- What are the definitions in the program?
  - Of variable i:
  - Of variable k:
- Which basic blocks (before block) do these definitions reach?
  - Def 1 reaches:
  - Def 2 reaches:
  - Def 3 reaches:
  - Def 4 reaches:
  - Def 5 reaches:

[Figure: CFG with blocks B1 to B4 containing the statements 1. i=2; 2. k=i+1; 3. i=1; 4. k=k+1; 5. k=k-4]

38 Reaching definitions analysis
- What are the definitions in the program?
  - Of variable i: 1, 3
  - Of variable k: 2, 4, 5
- Which basic blocks (before block) do these definitions reach?
  - Def 1 reaches: B2
  - Def 2 reaches: B1, B2, B3
  - Def 3 reaches: B1, B3, B4
  - Def 4 reaches: B4
  - Def 5 reaches: exit

[Figure: CFG with blocks B1 to B4 containing the statements 1. i=2; 2. k=i+1; 3. i=1; 4. k=k+1; 5. k=k-4]

39 Reaching definitions analysis
- Method
  - Compute two kinds of basic information (within the block)
    - GEN[B]: set of definitions generated within B
    - KILL[B]: set of definitions that, if they reach the point before B, won’t reach the end of B
  - Compute two other sets by propagation
    - IN[B]: set of definitions that reach the beginning of B
    - OUT[B]: set of definitions that reach the end of B

[Figure: CFG with blocks B1 to B4 containing the statements 1. i=2; 2. k=i+1; 3. i=1; 4. k=k+1; 5. k=k-4]
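The two block-local sets can be computed directly from the statements. In the sketch below, the assignment of statements to blocks (1 and 2 in B1, 3 in B2, 4 in B3, 5 in B4) is an assumption inferred from the GEN sets of the running example, and the function name `gen_kill` is hypothetical.

```python
# Definitions of the running example: definition number -> variable assigned.
defs = {1: "i", 2: "k", 3: "i", 4: "k", 5: "k"}
# Assumed block membership (inferred, not stated explicitly in the transcript).
blocks = {"B1": [1, 2], "B2": [3], "B3": [4], "B4": [5]}

def gen_kill(blocks, defs):
    gen, kill = {}, {}
    for name, stmts in blocks.items():
        # Within a block, only the last definition of each variable reaches the end.
        last = {defs[d]: d for d in stmts}
        gen[name] = set(last.values())
        assigned = set(last)
        # Every definition elsewhere of a variable assigned here is killed by the block.
        kill[name] = {d for d, var in defs.items()
                      if var in assigned and d not in stmts}
    return gen, kill

gen, kill = gen_kill(blocks, defs)
print(gen)
print(kill)
```

Both sets depend only on the block's own statements, which is why they can be computed once up front and then reused in every propagation pass.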

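40 Reaching definitions analysis

    Block   Init GEN   Init KILL   Init IN   Init OUT   IN     OUT
    B1      1,2        3,4,5       --        1,2        2,3    1,2
    B2      3          1           --        3          1,2    2,3
    B3      4          2,5         --        4          2,3    3,4
    B4      5          2,4         --        5          3,4    3,5

[Figure: CFG with blocks B1 to B4 containing the statements 1. i=2; 2. k=i+1; 3. i=1; 4. k=k+1; 5. k=k-4]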

41 Iterative Data-Flow analysis algorithm
- Algorithm for Reaching Definitions
- Input: CFG with GEN[B], KILL[B] for all B
- Output: IN[B], OUT[B] for all B

    Begin RD
      IN[B] = empty, OUT[B] = GEN[B] for all B
      change = true
      While change do begin
        change = false
        For each B do begin
          IN[B] = union of OUT[P] over all predecessors P of B
          OLDOUT = OUT[B]
          OUT[B] = GEN[B] union (IN[B] - KILL[B])
          if (OUT[B] != OLDOUT) then change = true
        End for
      End while
    End RD
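The algorithm translates almost line for line into Python. The sketch below runs it on the four-block running example. The GEN and KILL sets follow the example's table, while the predecessor map (B1 and B2 form a loop, then B2 → B3 → B4) is an assumption inferred from the answers on slide 38, since the figure itself did not survive in the transcript. The function name `reaching_definitions` is hypothetical.

```python
def reaching_definitions(blocks, preds, gen, kill):
    in_sets = {b: set() for b in blocks}
    out_sets = {b: set(gen[b]) for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            # IN[B] is the union of OUT[P] over all predecessors P of B.
            in_sets[b] = set().union(*(out_sets[p] for p in preds[b]))
            new_out = gen[b] | (in_sets[b] - kill[b])
            if new_out != out_sets[b]:
                out_sets[b] = new_out
                changed = True
    return in_sets, out_sets

blocks = ["B1", "B2", "B3", "B4"]
gen = {"B1": {1, 2}, "B2": {3}, "B3": {4}, "B4": {5}}
kill = {"B1": {3, 4, 5}, "B2": {1}, "B3": {2, 5}, "B4": {2, 4}}
# Assumed CFG edges: B1 -> B2, B2 -> B1, B2 -> B3, B3 -> B4.
preds = {"B1": ["B2"], "B2": ["B1"], "B3": ["B2"], "B4": ["B3"]}

in_sets, out_sets = reaching_definitions(blocks, preds, gen, kill)
print(in_sets)
print(out_sets)
```

The loop terminates because the OUT sets only ever grow and are bounded by the finite set of definitions. On this example the sets stabilise after the first pass, and the second pass only confirms that nothing changed.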

42 Tools
- Eclipse JDT/AST (APIs to construct, traverse and manipulate AST)
- Sourcerer
- Crystal (Data Analysis Framework, mostly for academic purposes)

43 Mandatory Reading List
- Representation and Analysis of Software: Rep-Analysis.pdf
- Crystal Notes: CrystalTutorialNotes.pdf, CrystalTutorial.ppt
- Eclipse JDT - AST -

44 More (optional) Reading List
- Principles of Program Analysis, Nielson, Nielson, and Hankin
- Invariant Detection using Daikon: daikon.pdf
- More optional readings available at Program Analysis course material at CMU

