Lexical Analysis Uses formalism of Regular Languages – Regular Expressions – Deterministic Finite Automata (DFA) – Non-deterministic Finite Automata (NDFA)


1 Lexical Analysis Uses formalism of Regular Languages – Regular Expressions – Deterministic Finite Automata (DFA) – Non-deterministic Finite Automata (NDFA). RE → NDFA → DFA → minimal DFA. (F)Lex uses RE as input, builds lexer

2 Regular Expressions Regular expression (over Σ): ∅, ε, a where a ∈ Σ, r+r’, r r’, r* where r, r’ regular (over Σ). Notational shorthand: r^0 = ε, r^i = r r^(i-1), r^+ = r r*

3 DFAs: Formal Definition DFA M = (Q, Σ, δ, q0, F) – Q = states, a finite set – Σ = alphabet, a finite set – δ = transition function, a function in Q × Σ → Q – q0 = initial/starting state, q0 ∈ Q – F = final states, F ⊆ Q

4 DFAs: Example: strings over {a,b} with next-to-last symbol = a [State diagram: states labeled by the most recent symbols read (a, …aa, …ab, …ba, …bb), with transitions on a and b]

5 Nondeterministic Finite Automata “Nondeterminism” implies having a choice: multiple possible transitions from a state on a symbol. δ(q,a) is a set of states, δ : Q × Σ → Pow(Q). The set can be empty, so no need for error/nonsense state. Acceptance: does a path to a final state exist? I.e., try all choices. Also allow transitions on no input: δ : Q × (Σ ∪ {ε}) → Pow(Q)

6 NFAs: Example: strings over {a,b} with next-to-last symbol = a. Loop until we “guess” which is the next-to-last a. [State diagram]
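"Try all choices" can be sketched by tracking the set of states the NFA could be in. The three-state machine below (`q0`, `q1`, `q2`) is an assumed encoding of the example without ε-transitions; the slide's own diagram may differ.

```python
# Hypothetical NFA: q0 loops on every symbol and may "guess" that the
# current 'a' is the next-to-last symbol by also moving to q1.
DELTA = {
    ("q0", "a"): {"q0", "q1"},
    ("q0", "b"): {"q0"},
    ("q1", "a"): {"q2"},
    ("q1", "b"): {"q2"},
}
START = "q0"
FINAL = {"q2"}


def nfa_accepts(text: str) -> bool:
    """Simulate every nondeterministic choice at once using a set of states."""
    current = {START}
    for symbol in text:
        # A missing (state, symbol) entry means the empty set: that path dies.
        current = set().union(*(DELTA.get((q, symbol), set()) for q in current))
    return bool(current & FINAL)


if __name__ == "__main__":
    for word in ["ab", "aa", "ba", "bbab", ""]:
        print(repr(word), nfa_accepts(word))
```

The set-of-states simulation is the same idea that underlies the NDFA → DFA subset construction: each reachable set becomes one DFA state.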

7 CFGs: Formal Definition G = (V, Σ, P, S) – V = variables, a finite set – Σ = alphabet or terminals, a finite set – P = productions, a finite set – S = start variable, S ∈ V. Productions’ form, where A ∈ V, α ∈ (V ∪ Σ)*: A → α

8 CFGs: Derivations Derivations in one step: βAγ ⇒_G βαγ ⟺ A → α ∈ P, where x ∈ Σ*, α, β, γ ∈ (V ∪ Σ)*. Can choose any variable for use in a derivation step. Derivations in zero-or-more steps: ⇒_G* is the reflexive and transitive closure of ⇒_G. Language of a grammar: L(G) = {x ∈ Σ* | S ⇒_G* x}

9 Parse Trees
Grammar: S → A | A B; A → ε | a | A b | A A; B → b | b c | B c | b B
Sample derivations:
S ⇒ AB ⇒ AAB ⇒ aAB ⇒ aaB ⇒ aabB ⇒ aabb
S ⇒ AB ⇒ AbB ⇒ Abb ⇒ AAbb ⇒ Aabb ⇒ aabb
These two derivations use the same productions, but in different orders. Root label = start variable. Each interior label = variable. Each parent/child relation = derivation step. Each leaf label = terminal or ε. All leaf labels together = derived string = yield. [Parse tree figure: root S, yield aabb]

10 Left- & Rightmost Derivations
Grammar: S → A | A B; A → ε | a | A b | A A; B → b | b c | B c | b B
Sample derivations:
S ⇒ AB ⇒ AAB ⇒ aAB ⇒ aaB ⇒ aabB ⇒ aabb
S ⇒ AB ⇒ AbB ⇒ Abb ⇒ AAbb ⇒ Aabb ⇒ aabb
These two derivations are special. 1st derivation is leftmost: always picks leftmost variable. 2nd derivation is rightmost: always picks rightmost variable. [Parse tree figure: root S, yield aabb]

11 Disambiguation Example Exp → n | Exp + Exp | Exp * Exp. What is an equivalent unambiguous grammar? Exp → Term | Term + Exp; Term → n | n * Term. Uses – operator precedence – associativity (both rules are right-recursive, so this grammar groups + and * to the right)

12 Parsing Designations – Major parsing algorithm classes are LL and LR. The first letter indicates what order the input is read: L means left to right. The second letter indicates the derivation that is built: L = leftmost derivation (top down), R = rightmost derivation in reverse (bottom up) – k of LL(k) or LR(k) is the number of symbols of lookahead in input during parsing – Power of parsing techniques: LL(k) < LR(k), LL(n) < LL(n+1), LR(n) < LR(n+1) – Choice of LL or LR largely religious

13 Items and Itemsets An itemset is merely a set of items. In LR parsing terminology an item – Looks like a production with a ‘.’ in it – The ‘.’ indicates how far the parse has gone in recognizing a string that matches this production – e.g. A -> aAb.BcC suggests that we’ve “seen” input that matches aAb. If, by following the rules, we get A -> aAbBcC. then we can reduce by A -> aAbBcC

14 Building LR(0) Itemsets Start with an augmented grammar; if S is the grammar start symbol add S’ -> S. The first set of items is the closure of {S’ -> .S}. Itemset construction requires two functions – Closure – Goto

15 Closure of LR(0) Itemset If J is a set of items for Grammar G, then closure(J) is the set of items constructed from G by two rules: 1) Each item in J is added to closure(J) 2) If A → α.Bβ is in closure(J) and B → φ is a production, add B → .φ to closure(J)

16 Closure Example
Grammar: A → aBC, A → aA, B → bB, B → bC, C → cC, C → λ
J: A → a.BC, A → a.A
Closure(J): A → a.BC, A → a.A, A → .aBC, A → .aA, B → .bB, B → .bC
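The two closure rules can be sketched as a worklist loop. The item representation below, a tuple `(lhs, rhs, dot)`, and the helper `show` are assumptions made for illustration; the grammar and the set J are the ones from the example.

```python
# Productions as (lhs, rhs) pairs; an empty rhs stands for lambda.
GRAMMAR = [
    ("A", ("a", "B", "C")),
    ("A", ("a", "A")),
    ("B", ("b", "B")),
    ("B", ("b", "C")),
    ("C", ("c", "C")),
    ("C", ()),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """LR(0) closure: items are (lhs, rhs, dot) with dot an index into rhs."""
    result = set(items)              # rule 1: every item of J is in closure(J)
    worklist = list(result)
    while worklist:
        _, rhs, dot = worklist.pop()
        # rule 2: a nonterminal B right after the dot pulls in every B -> .phi
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for lhs, prod_rhs in GRAMMAR:
                new_item = (lhs, prod_rhs, 0)
                if lhs == rhs[dot] and new_item not in result:
                    result.add(new_item)
                    worklist.append(new_item)
    return result


def show(item):
    """Render an item in the slide's dotted notation."""
    lhs, rhs, dot = item
    return f"{lhs} -> {''.join(rhs[:dot])}.{''.join(rhs[dot:])}"


J = {("A", ("a", "B", "C"), 1), ("A", ("a", "A"), 1)}

if __name__ == "__main__":
    for line in sorted(show(item) for item in closure(J)):
        print(line)
```

Running it on J yields the same six items listed in the example: the two kernel items plus the items for A and B with the dot at the far left.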

17 GoTo Goto(J,X), where J is a set of items and X is a grammar symbol (either terminal or non-terminal), is defined to be the closure of the set of items A → αX.β for A → α.Xβ in J. So, in English: take all items in J which have a ‘.’ immediately preceding X, move the ‘.’ past X, and take the closure of the result

18 Set of Items Construction
Procedure items(G’)
Begin
  C = {closure({[S’ → .S]})}
  repeat
    for each set of items J in C and each grammar symbol X
        such that GoTo(J,X) is not empty and not in C do
      add GoTo(J,X) to C
  until no more sets of items can be added to C
End
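The procedure can be sketched in Python as follows. This is an illustrative version under stated assumptions: items are `(lhs, rhs, dot)` tuples, the grammar is the augmented form of {S → (S), S → λ}, and the function name `canonical_collection` is hypothetical.

```python
# Augmented grammar for {S -> (S), S -> lambda}; an empty rhs stands for lambda.
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("(", "S", ")")),
    ("S", ()),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
SYMBOLS = sorted({sym for _, rhs in GRAMMAR for sym in rhs})


def closure(items):
    """LR(0) closure of a set of (lhs, rhs, dot) items."""
    result = set(items)
    worklist = list(result)
    while worklist:
        _, rhs, dot = worklist.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for lhs, prod_rhs in GRAMMAR:
                new_item = (lhs, prod_rhs, 0)
                if lhs == rhs[dot] and new_item not in result:
                    result.add(new_item)
                    worklist.append(new_item)
    return frozenset(result)


def goto(items, symbol):
    """Move the dot past symbol in every item that allows it, then close."""
    moved = {(lhs, rhs, dot + 1) for lhs, rhs, dot in items
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved)


def canonical_collection():
    """Collect itemsets until no new GoTo(J, X) can be added."""
    collection = [closure({("S'", ("S",), 0)})]
    for itemset in collection:       # the list grows while we iterate over it
        for symbol in SYMBOLS:
            target = goto(itemset, symbol)
            if target and target not in collection:
                collection.append(target)
    return collection


if __name__ == "__main__":
    for number, itemset in enumerate(canonical_collection()):
        print(number, sorted(itemset))
```

For this small grammar the loop should stop after five itemsets; the index of each itemset in the returned list can serve as its state number when building a parse table.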

19 Build LR(0) Itemsets for: {S → (S), S → λ} and {S → (S), S → SS, S → λ}

20 Building LR(0) Table from Itemsets One row for each Itemset. One column for each terminal or non-terminal symbol, and one for $. Table[J][X] is: – Rn if J includes A → rhs., A → rhs is rule number n, and X is a terminal – Sn if Goto(J,X) is itemset n

21 LR(0) Parse Table for: {S → (S), S → λ} and {S → (S), S → SS, S → λ}

22 Building SLR Table from Itemsets One row for each Itemset. One column for each terminal or non-terminal symbol, and one for $. Table[J][X] is: – Rn if J includes A → rhs., A → rhs is rule number n, X is a terminal, AND X is in Follow(A) – Sn if Goto(J,X) is itemset n

23 LR(0) and LR(1) Items LR(0) item “is” a production with a ‘.’ in it. LR(1) item has a “kernel” that looks like LR(0), but also has a “lookahead” – e.g. A → α.Xβ, {terminals} – A → α.Xβ, a/b/c ≠ A → α.Xβ, a/b/d

24 Closure of LR(1) Itemset If J is a set of LR(1) items for Grammar G, then closure(J) includes 1) Each LR(1) item in J 2) If A → α.Bβ, a is in closure(J) and B → φ is a production, add B → .φ, First(βa) to closure(J)

25 LR(1) Itemset Construction
Procedure items(G’)
Begin
  C = {closure({[S’ → .S, $]})}
  repeat
    for each set of items J in C and each grammar symbol X
        such that GoTo(J,X) is not empty and not in C do
      add GoTo(J,X) to C
  until no more sets of items can be added to C
End

26 Build LR(1) Itemsets for: {S → (S), S → SS, S → λ}

27 {S → CC, C → cC, C → d} Is this grammar LR(0)? SLR? LR(1)? How can we tell?

28 LR(1) Table from LR(1) Itemsets One row for each Itemset. One column for each terminal or non-terminal symbol, and one for $. Table[J][X] is: – Rn if J includes A → rhs., a; A → rhs is rule number n; X = a – Sn if Goto(J,X) is LR(1) itemset n

29 LALR(1) Parsing LookAhead LR(1). Start with LR(1) items. LALR(1) items: combine LR(1) items with same kernel, different lookahead sets. Build table just as LR(1) table but use LALR(1) items. Same number of states (rows) as LR(0)

30 Code Generation Pick three registers to be used throughout. Assuming stmt of form dest = s1 op s2, generate code by: – Load source 1 into r5 – Load source 2 into r6 – r7 = r5 op r6 – Store r7 into destination

31 Three-Address Code section 6.2.1 (new), pp 467 (old) Assembler for generic computer. Types of statements, 3-address (Dragon) – Assignment statement x = y op z – Unconditional jump br label – Conditional jump if( cond ) goto label – Parameter x – Call statement call f

32 Example “Source” a = ((c-1) * b) + (-c * b)

33 Example 3-Address
t1 = c - 1
t2 = b * t1
t3 = -c
t4 = t3 * b
t5 = t2 + t4
a = t5
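One way to arrive at code like this is a post-order walk over the expression tree that allocates a fresh temporary per operator. The sketch below assumes a hypothetical tuple encoding of the tree and a made-up `neg` tag for unary minus. It emits operands in source order, so its second line reads `t2 = t1 * b` rather than `t2 = b * t1`.

```python
from itertools import count


def gen_three_address(target, expr):
    """Emit 3-address statements for expr, then assign the result to target.

    expr is a variable/constant name (str), ("neg", operand),
    or (op, left, right) with op one of + - * /.
    """
    code = []
    temps = count(1)

    def walk(node):
        if isinstance(node, str):          # leaf: variable or constant
            return node
        if node[0] == "neg":               # unary minus
            operand = walk(node[1])
            temp = f"t{next(temps)}"
            code.append(f"{temp} = -{operand}")
            return temp
        op, left, right = node             # binary operator
        left_name = walk(left)
        right_name = walk(right)
        temp = f"t{next(temps)}"
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    result = walk(expr)
    code.append(f"{target} = {result}")
    return code


# a = ((c-1) * b) + (-c * b)
SOURCE = ("+", ("*", ("-", "c", "1"), "b"), ("*", ("neg", "c"), "b"))

if __name__ == "__main__":
    print("\n".join(gen_three_address("a", SOURCE)))
```

In a yacc-style compiler the same effect usually comes from grammar rule actions: each expression rule returns the name of the temporary holding its value.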

34 Three-Address Implementation (Quadruples, sec 6.2.2; pp 470-2)
     op      arg1  arg2  result
0    -       c     1     t1
1    *       b     t1    t2
2    uminus  c           t3
3    *       b     t3    t4
4    +       t2    t4    t5
5    =       t5          a

35 Three-Address Implementation (Triples, section 6.2.3)
     op      arg1  arg2
0    -       c     1
1    *       b     (0)
2    uminus  c
3    *       b     (2)
4    +       (1)   (3)
5    =       a     (4)

36 Three-Address Implementation N-tuples (my choice – and yours ??) – Lhs = oper(op1, op2, …, opn) – Lhs = call(func, arg1, arg2, …, argn) – If condOper(op1, op2, Label) – br Label

37 Three-Address Code 3-address operands – Variable – Constant – Array – Pointer

38 Variable Storage Memory Locations (Logical): Stack, Heap, Program Code, Register. Variable Classes: Automatic (locals), Parameters, Globals

39 Variable Types: Scalars, Arrays, Structs, Unions, Objects ?

40 Row Major Array Storage
char A[20][15][10];
Address  Element
1000     A[0][0][0]
...
1150     A[1][0][0]
1160     A[1][1][0]
1161     A[1][1][1]
3999     A[19][14][9]

41 Column Major Array Storage
char A[20][15][10];
Address  Element
1000     A[0][0][0]
1001     A[1][0][0]
...
1021     A[1][1][0]
1321     A[1][1][1]
3999     A[19][14][9]

42 OR (Row Major)
char A[20][15][10];
Address  Element
3999     A[0][0][0]
...
3849     A[1][0][0]
3839     A[1][1][0]
3838     A[1][1][1]
1000     A[19][14][9]
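The addresses in these three layouts can be checked with a short sketch. The base address 1000 and 1-byte elements come from the tables; the function names and the Horner-style evaluation are illustrative choices.

```python
DIMS = (20, 15, 10)   # char A[20][15][10]
BASE = 1000           # lowest address used in the tables
ELEM_SIZE = 1         # sizeof(char)


def row_major_offset(index, dims=DIMS):
    """Element offset when the LAST subscript varies fastest."""
    offset = 0
    for i, n in zip(index, dims):
        offset = offset * n + i
    return offset


def column_major_offset(index, dims=DIMS):
    """Element offset when the FIRST subscript varies fastest."""
    offset = 0
    for i, n in zip(reversed(index), reversed(dims)):
        offset = offset * n + i
    return offset


def row_major_address(index):
    return BASE + row_major_offset(index) * ELEM_SIZE


def column_major_address(index):
    return BASE + column_major_offset(index) * ELEM_SIZE


def descending_row_major_address(index, top=3999):
    """Row-major layout that grows downward from the highest address."""
    return top - row_major_offset(index) * ELEM_SIZE


if __name__ == "__main__":
    for idx in [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1), (19, 14, 9)]:
        print(idx, row_major_address(idx), column_major_address(idx),
              descending_row_major_address(idx))
```

The only difference between the two orders is which end of the subscript list is treated as the fastest-varying one.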

43 Array Declaration Algorithm Dimension Node { int min; int max; int size; }

44 Declaration Algorithm (2) Doubly linked list of dimension nodes. Pass 1 (while parsing) – Build linked list from left to right – Insert min, max – Size = size of an element (e.g. 4 for int) – Append node to end of list with min = max = size = 1

45 Declaration Algorithm (3) Pass 2 – Traverse list from tail to head – For each node, n, going “right” to “left”: factor = n.max - n.min + 1; for each node, m, right to left starting with n: m.size = m.size * factor – For each node, n, going right to left: max = n->left->max; min = n->left->min – Save size of first node as size of entire array – Delete first element of list – Set tail->size = size of an element (e.g. 4 for int)

46 Array Declaration (Row Major) int weight[2000..2005][1..12][1..31]; List of “dimension” nodes, each holding int min, max, size, where size is the size of an element of this dimension. Sizes for the three dimensions: 1488, 124, 4

47 Array Offset (Row Major) Traverse list summing (index - min) * size. int weight[2000..2005][1..12][1..31]; x = weight[2002][5][31]: (2002-2000) * 1488 + (5-1) * 124 + (31-1) * 4 = 3592

48 Array Offset (Row Major) Traverse list summing (index - min) * size. int weight[2000..2005][1..12][1..31]; x = weight[i][j][k]: (i - 2000) * 1488 + (j-1) * 124 + (k-1) * 4
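The per-dimension sizes and the offset sum can be reproduced with a small sketch. It uses a plain Python list of (min, max) bounds instead of the linked list of dimension nodes, and the names `dimension_sizes` and `element_offset` are hypothetical. For the weight array the first dimension's size works out to 12 × 124 = 1488 bytes.

```python
def dimension_sizes(bounds, elem_size):
    """Return (sizes, total_bytes) for a row-major array.

    bounds is a list of (min, max) pairs, leftmost dimension first.
    sizes[d] is the number of bytes spanned by one step of subscript d.
    """
    sizes = [0] * len(bounds)
    size = elem_size
    for d in range(len(bounds) - 1, -1, -1):   # right to left
        sizes[d] = size
        lo, hi = bounds[d]
        size *= hi - lo + 1
    return sizes, size


def element_offset(bounds, sizes, index):
    """Sum (index - min) * size over all dimensions."""
    return sum((i - lo) * s for i, (lo, _), s in zip(index, bounds, sizes))


# int weight[2000..2005][1..12][1..31];
WEIGHT_BOUNDS = [(2000, 2005), (1, 12), (1, 31)]

if __name__ == "__main__":
    sizes, total = dimension_sizes(WEIGHT_BOUNDS, elem_size=4)
    print(sizes, total)
    print(element_offset(WEIGHT_BOUNDS, sizes, (2002, 5, 31)))
```

When the subscripts are variables rather than constants, a compiler would emit the same multiply-and-add sequence as 3-address code instead of evaluating it at compile time.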

49 Your Turn Assume – int A[10][20][30]; – Row major order. “Show” A’s dimension list. Show hypothetical 3-addr code for – X = A[2][3][4]; – A[3][4][5] = 9

50 My “Assembly” code
X = A[2][3][4];
T1 = 2 * 2400
T2 = 3 * 120
T3 = T1 + T2
T4 = 4 * 4
T5 = T3 + T4
T6 = T5 + 64        # 64 is A’s offset
%eax = T6
%eax = %ebp + %eax
%eax = 0(%eax)
16(%ebp) = %eax     # 16 is X’s offset

51 Your Turn 2 Assume – int A[10][20][30]; – Column major order. “Show” A’s dimension list. Show hypothetical 3-addr code for – X = A[2][3][4]; – A[3][4][5] = 9

52 Road Map – Regular Exprs, Context-Free Grammars – LR parsing algorithm – Building LR parse tables – Compiling Expressions – Build control flow intermediates – Generate target code – Optimize intermediate

53 Control Constructs Can be cumbersome, but not difficult. “Write” control construct in 3-addr pseudo code using labels and gotos. Map that “control construct” to grammar rule action(s).

54 Semantic Hooks Selection_statement : IF '(' comma_expr ')' stmt | IF '(' comma_expr ')' stmt ELSE stmt ; 1 shift/reduce conflict

55 Add actions (1) Selection_statement : IF '(' comma_expr ')' {printf("start IF \n");} stmt {printf("IF Body \n");}

56 Add actions (2) | IF '(' comma_expr ')' {printf("start IF \n");} stmt {printf("Then Body \n");} ELSE stmt {printf("ELSE Body \n");} ; 31 reduce/reduce conflicts !

57 Solution (1) Selection_statement : if_start | if_start ELSE stmt {printf("ELSE body");} ;

58 Solution (2) if_start : IF '(' comma_expr ')' {printf("start IF \n");} stmt {printf("Then Body \n");} ; 1 shift/reduce conflict

59 Control Flow Graph sec 8.4 (new); sec 9.4 (old) Nodes are Basic Blocks – Single entry, single exit – No branch except (possibly) at bottom. Edges represent one possible flow of execution between two basic blocks. Whole CFG represents a function

60 Bubble Sort
begin;
int A[10];
main(){
  int i,j;
  Do 10 i = 0, 9, 1
10  A[i] = random();
  Do 20 i = 1, 9, 1
  Do 20 j = 1, 9, 1
20  if( A[j] > A[j+1]) swap(j);
}

61 Bubble Sort (cont.)
int swap(int i) {
  int temp;
  temp = A[i];
  A[i] = A[i+1];
  A[i+1] = temp;
}
end;

62 Example Generate 3-addr code for BubbleSort

63 Building CFG alg 8.5 (pp 526-7); alg 9.1 (p 529) Starting with 3-addr code, ID “leaders” (heads of basic blocks) – First statement is a leader – Any statement that is the target of a conditional or unconditional goto is a leader – Any statement immediately following a goto or conditional goto is a leader. Each leader starts a basic block which ends immediately before the next leader
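The three leader rules can be sketched over 3-address code held as a list of strings. The textual conventions assumed here (a `Label:` prefix marks a jump target, jumps are written `goto Label` or `br Label`) and the sample code are illustrative, not taken from the textbook algorithm.

```python
import re

LABEL = re.compile(r"(\w+):")
JUMP = re.compile(r"\b(?:goto|br)\s+(\w+)")


def find_leaders(code):
    """Return the sorted indices of statements that start a basic block."""
    label_at = {}
    for i, stmt in enumerate(code):
        match = LABEL.match(stmt)
        if match:
            label_at[match.group(1)] = i

    leaders = {0} if code else set()          # rule 1: first statement
    for i, stmt in enumerate(code):
        match = JUMP.search(stmt)
        if match:
            leaders.add(label_at[match.group(1)])   # rule 2: jump target
            if i + 1 < len(code):
                leaders.add(i + 1)                  # rule 3: statement after a jump
    return sorted(leaders)


def basic_blocks(code):
    """Split code so each block runs from one leader up to the next."""
    leaders = find_leaders(code)
    return [code[start:end] for start, end in zip(leaders, leaders[1:] + [len(code)])]


# Hypothetical 3-address code for a loop that zeroes a 10-element int array.
CODE = [
    "i = 0",
    "L1: t1 = i * 4",
    "t2 = &A + t1",
    "*t2 = 0",
    "i = i + 1",
    "if (i < 10) goto L1",
    "x = 1",
]

if __name__ == "__main__":
    for block in basic_blocks(CODE):
        print(block)
```

Edges of the control flow graph then follow from each block's last statement: a jump adds an edge to the block that starts at its target, and any block not ending in an unconditional jump also falls through to the next block.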

64 Example Build control flow graphs for BubbleSort

65 “Simple Optimizations” Once called “Dragon Book” optimizations. Now often called “Machine Independent Optimizations” (chapter 9 of text) – Common subexpression elimination (CSE) – Copy propagation – Dead code elimination – Partial redundancy elimination

66 Machine Independent Optimization (cont.) – Code motion – Induction variable simplification – Constant propagation – Local vs. Global optimization – Interprocedural optimization

67 Loop Optimization “Programs spend 90% of time in loops”. Loop optimizations well studied – “simple” optimizations – “loop mangling”

68 Loop Invariant Code Motion – Identify code that computes same value during each iteration – Move loop invariant code to above loop – “Standard” optimization in most compilers

69 Loop Invariant Example
for (i = 0; i < N; i++)
  for (j = 0; j < N; j++) {
    c[i][j] = 0;
    for (k = 0; k < N; k++)
      c[i][j] += a[i][k] * b[k][j];
  }

70 Example (cont.) “Assembler” for innermost (k) loop
L1: t1 = i * N
t2 = t1 + j
t3 = t2 * 4
t4 = &c + t3
t12 = t1 + k
t13 = t12 * 4
t14 = &a + t13
t21 = k * N
t22 = t21 + j
t23 = t22 * 4
t24 = &b + t23
t31 = *t14 * *t24
*t4 = *t4 + t31
k = k + 1
if( k < N) goto L1

71 Example (cont.)
t1 = i * N
t2 = t1 + j
t3 = t2 * 4
t4 = &c + t3
L1: t12 = t1 + k
t13 = t12 * 4
t14 = &a + t13
t21 = k * N
t22 = t21 + j
t23 = t22 * 4
t24 = &b + t23
t31 = *t14 * *t24
*t4 = *t4 + t31
k = k + 1
if( k < N) goto L1

72 Induction Variables – Changes by constant amount per iteration – Often used in array address computation – Simplification of induction variables – Strength reduction --- convert * to +

73 Example (cont.)
t1 = i * N
t2 = t1 + j
t3 = t2 * 4
t4 = &c + t3
t5 = t1 * 4
t14 = &a + t5
t6 = j * 4
t24 = &b + t6
t32 = N * 4
t33 = t14 + t32
L1: t31 = *t14 * *t24
*t4 = *t4 + t31
t14 = t14 + 4
t24 = t24 + t32
if(t14 < t33) goto L1

74 Loop Transformations More sophisticated. Relatively few compilers include them – Loop Interchange (for nested loops) – Unroll and Jam (for nested loops) – Loop fusion – Loop distribution – Loop unrolling

75 Register Usage Keep as many values in registers as possible. Register assignment. Register allocation. Popular techniques – Local vs. global – Graph coloring – Bin packing

76 Local Register Assignment Given – Control-flow graph of basic blocks – List of 3-addr statements per BB – Set of “live” scalar values per stmt – Sets of scalar values used, defined per stmt. Design a local register assignment/allocation algorithm

77 Graph Coloring Assign a color to each node in graph. Two nodes connected by an edge must have different colors. Classic problem in graph theory. NP complete – But good heuristics exist for register allocation

78 Live Ranges [Figure: control-flow diagram whose blocks contain def x, def y, use x, and use y statements]

79 Graph Coloring Register Assign Each value is allocated a (symbolic) register. “Variables” interfere iff live ranges overlap. Two interfering values cannot share register. How can we tell if two values interfere? [Figure: values s1, s2, s3, s4]

80 Interference Graph Values and interference – Nodes are the values – Edge between two nodes iff they interfere [Figure: interference graph on s1, s2, s3, s4]

81 Graph Coloring Example [Figure]

82 Graph Coloring Example [Figure: 3 Colors]

83 Heuristics for Register Coloring Coloring a graph with N colors. For each node, m – If degree(m) < N: node can always be colored, because after coloring adjacent nodes, at least one color is left for current node – If degree(m) >= N: still may be colorable with N colors

84 Heuristics for Register Coloring Remove nodes that have degree < N – Push the removed nodes onto a stack. When all the nodes have degree >= N – Find a node to spill (no color for that node) – Remove that node. When graph empty, start to color – Pop a node from stack back – Color node different from adjacent (colored) nodes
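The remove/push/pop heuristic can be sketched as below. The interference graph is a made-up example (the edges of the slide figures are not available in the transcript), and picking the highest-degree node as the spill candidate is an assumption made to keep the sketch short; a real allocator would normally weigh spill cost.

```python
def color_graph(graph, n_colors):
    """Simplify-and-select coloring.

    graph maps each node to the set of nodes it interferes with.
    Returns (colors, spilled): colors maps node -> color index,
    spilled lists the nodes that were given no color.
    """
    work = {node: set(adj) for node, adj in graph.items()}
    stack, spilled = [], []

    while work:
        low_degree = [n for n in work if len(work[n]) < n_colors]
        if low_degree:
            node = min(low_degree)           # deterministic choice
            stack.append(node)
        else:
            # Assumed spill choice: the node with the most remaining neighbors.
            node = max(work, key=lambda n: (len(work[n]), n))
            spilled.append(node)
        for neighbor in work.pop(node):      # remove node from the graph
            work[neighbor].discard(node)

    colors = {}
    while stack:
        node = stack.pop()
        used = {colors[nb] for nb in graph[node] if nb in colors}
        colors[node] = min(c for c in range(n_colors) if c not in used)
    return colors, spilled


# Hypothetical interference graph on five values.
EDGES = [("s0", "s1"), ("s0", "s2"), ("s1", "s2"),
         ("s1", "s3"), ("s2", "s3"), ("s3", "s4")]
GRAPH = {}
for u, v in EDGES:
    GRAPH.setdefault(u, set()).add(v)
    GRAPH.setdefault(v, set()).add(u)

if __name__ == "__main__":
    print(color_graph(GRAPH, n_colors=3))
```

A node pushed with degree < N always finds a free color when popped, because only the neighbors that were still in the graph when it was removed can already be colored at that point.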

85-96 Another Coloring Example [Figure sequence: interference graph on s0, s1, s2, s3, s4 with N = 3. Successive frames list s4, then s4 s3, then s4 s3 s2 next to the graph, and later frames list fewer nodes again]

97 Which value to pick? One with interference degree >= N. One with minimal spill cost (cost of placing value in memory rather than in register). What is spill cost? – Cost of extra load and store instructions

98 One Way to Compute Spill Cost Goal: give priority to values used in loops, so assume loops execute 10 times. Spill cost = defCost + useCost. defCost = sum over all definitions of (cost of a store × 10^nestingDepthOfLoop). useCost = sum over all uses of (cost of a load × 10^nestingDepthOfLoop). Choose the value with the lowest spill cost
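As a worked instance of this estimate, the sketch below computes spill costs from lists of loop nesting depths. The unit load/store costs and the two sample values `x` and `y` are assumptions for illustration only.

```python
def spill_cost(def_depths, use_depths, store_cost=1, load_cost=1):
    """Estimate spill cost, assuming every loop executes 10 times.

    def_depths / use_depths hold the loop nesting depth of each
    definition / use of the value (0 = outside all loops).
    """
    def_cost = sum(store_cost * 10 ** depth for depth in def_depths)
    use_cost = sum(load_cost * 10 ** depth for depth in use_depths)
    return def_cost + use_cost


# Hypothetical candidates: x is defined once outside any loop and used twice
# inside a doubly nested loop; y is defined and used once in a single loop.
CANDIDATES = {
    "x": spill_cost(def_depths=[0], use_depths=[2, 2]),
    "y": spill_cost(def_depths=[1], use_depths=[1]),
}

if __name__ == "__main__":
    print(CANDIDATES)
    print("spill:", min(CANDIDATES, key=CANDIDATES.get))
```

Under these assumptions x costs 1 + 100 + 100 = 201 and y costs 10 + 10 = 20, so y would be the cheaper value to spill.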

