Topic #4: Syntactic Analysis (Parsing) EE 456 – Compiling Techniques Prof. Carl Sable Fall 2003.


1 Topic #4: Syntactic Analysis (Parsing) EE 456 – Compiling Techniques Prof. Carl Sable Fall 2003

2 Lexical Analyzer and Parser

3 Parser Accepts string of tokens from lexical analyzer (usually one token at a time) Verifies whether or not string can be generated by grammar Reports syntax errors (recovers if possible)

4 Errors Lexical errors (e.g. misspelled word) Syntax errors (e.g. unbalanced parentheses, missing semicolon) Semantic errors (e.g. type errors) Logical errors (e.g. infinite recursion)

5 Error Handling Report errors clearly and accurately Recover quickly if possible Poor error recovery may lead to an avalanche of errors

6 Error Recovery Panic mode: discard tokens one at a time until a synchronizing token is found Phrase-level recovery: Perform local correction that allows parsing to continue Error productions: Augment grammar to handle predicted, common errors Global correction: Use a complex algorithm to compute least-cost sequence of changes leading to parseable code

7 Context Free Grammars CFGs can represent recursive constructs that regular expressions cannot A CFG consists of: –Tokens (terminals, symbols) –Nonterminals (syntactic variables denoting sets of strings) –Productions (rules specifying how terminals and nonterminals can combine to form strings) –A start symbol (the set of strings it denotes is the language of the grammar)

8 Derivations (Part 1) One definition of language: the set of strings that have valid parse trees Another definition: the set of strings that can be derived from the start symbol E → E + E | E * E | (E) | -E | id E ⇒ -E (read: E derives -E) E ⇒ -E ⇒ -(E) ⇒ -(id)

9 Derivations (Part 2) αAβ ⇒ αγβ if A → γ is a production and α and β are arbitrary strings of grammar symbols If α₁ ⇒ α₂ ⇒ … ⇒ αₙ, we say α₁ derives αₙ ⇒ means derives in one step ⇒* means derives in zero or more steps ⇒+ means derives in one or more steps

10 Sentences and Languages Let L(G) be the language generated by the grammar G with start symbol S: –Strings in L(G) may contain only tokens of G –A string w is in L(G) if and only if S ⇒+ w –Such a string w is a sentence of G Any language that can be generated by a CFG is said to be a context-free language If two grammars generate the same language, they are said to be equivalent

11 Sentential Forms If S ⇒* α, where α may contain nonterminals, we say that α is a sentential form of G A sentence is a sentential form with no nonterminals

12 Leftmost Derivations Only the leftmost nonterminal in any sentential form is replaced at each step A leftmost step can be written as wAγ ⇒lm wδγ –w consists of only terminals –γ is a string of grammar symbols If α derives β by a leftmost derivation, then we write α ⇒*lm β If S ⇒*lm α then we say that α is a left-sentential form of the grammar Analogous terms exist for rightmost derivations

13 Parse Trees A parse tree can be viewed as a graphical representation of a derivation Every parse tree has a unique leftmost derivation (not true of every sentence) An ambiguous grammar has: –more than one parse tree for at least one sentence –more than one leftmost derivation for at least one sentence

14 Capability of Grammars Can describe most programming language constructs An exception: requiring that variables are declared before they are used –Therefore, grammar accepts superset of actual language –Later phase (semantic analysis) does type checking

15 Regular Expressions vs. CFGs Every construct that can be described by an RE can also be described by a CFG Why use REs at all? –Lexical rules are simpler to describe this way –REs are often easier to read –More efficient lexical analyzers can be constructed

16 Verifying Grammars A proof that a grammar generates a language has two parts: –Must show that every string generated by the grammar is part of the language –Must show that every string that is part of the language can be generated by the grammar Rarely done for complete programming languages!

17 Eliminating Ambiguity (1) stmt → if expr then stmt | if expr then stmt else stmt | other if E₁ then if E₂ then S₁ else S₂

18 Eliminating Ambiguity (2)

19 Eliminating Ambiguity (3) stmt → matched | unmatched matched → if expr then matched else matched | other unmatched → if expr then stmt | if expr then matched else unmatched

20 Left Recursion A grammar is left recursive if there exists a nonterminal A with a derivation A ⇒+ Aα for some string α Most top-down parsing methods cannot handle left-recursive grammars

21 Eliminating Left Recursion (1) A → Aα₁ | Aα₂ | … | Aαₘ | β₁ | β₂ | … | βₙ becomes A → β₁A' | β₂A' | … | βₙA' A' → α₁A' | α₂A' | … | αₘA' | ε Harder case: S → Aa | b A → Ac | Sd | ε
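As an illustration (not part of the original slides), the transformation for immediate left recursion can be sketched in Python; the grammar encoding and all names here are assumptions, with productions as lists of symbols and "eps" standing for ε:

```python
# Illustrative sketch: eliminate *immediate* left recursion for one
# nonterminal. Productions are lists of symbols; "eps" marks ε.

def eliminate_immediate(nt, productions):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn  as
       A  -> b1 A' | ... | bn A'
       A' -> a1 A' | ... | am A' | eps"""
    recursive = [p[1:] for p in productions if p[0] == nt]  # the alphas
    others = [p for p in productions if p[0] != nt]         # the betas
    if not recursive:
        return {nt: productions}        # nothing to do
    new_nt = nt + "'"
    return {
        nt: [([] if p == ["eps"] else p) + [new_nt] for p in others],
        new_nt: [a + [new_nt] for a in recursive] + [["eps"]],
    }

# E -> E + T | T   becomes   E -> T E',  E' -> + T E' | eps
g = eliminate_immediate("E", [["E", "+", "T"], ["T"]])
```

The harder case on the slide (S → Aa | b, A → Ac | Sd | ε) needs the general, ordering-based algorithm on the next slide, since the recursion there goes through two nonterminals.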

22 Eliminating Left Recursion (2) First arrange the nonterminals in some order A₁, A₂, …, Aₙ Apply the following algorithm:
for i = 1 to n {
    for j = 1 to i-1 {
        replace each production of the form Aᵢ → Aⱼγ by the productions Aᵢ → δ₁γ | δ₂γ | … | δₖγ, where Aⱼ → δ₁ | δ₂ | … | δₖ are the current Aⱼ productions
    }
    eliminate the immediate left recursion among the Aᵢ productions
}

23 Left Factoring Rewriting productions to delay decisions Helpful for predictive parsing Not guaranteed to remove ambiguity A → αβ₁ | αβ₂ becomes A → αA' A' → β₁ | β₂
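A minimal sketch of one round of this rewrite (not from the slides; names and the grammar encoding are assumptions). It factors only on the first symbol, assumes at most one group of alternatives needs factoring, and leaves longer common prefixes and repeated rounds out for brevity:

```python
# Illustrative sketch: one round of left factoring on the first symbol.
# Alternatives sharing a first symbol X become A -> X A', with the
# tails moved to A'; "eps" marks ε.
from collections import defaultdict

def left_factor(nt, productions):
    groups = defaultdict(list)
    for p in productions:
        groups[p[0]].append(p)
    rules, alternatives = {}, []
    for head, ps in groups.items():
        if len(ps) == 1:
            alternatives.append(ps[0])      # nothing shared: keep as is
        else:
            new_nt = nt + "'"
            alternatives.append([head, new_nt])
            rules[new_nt] = [p[1:] or ["eps"] for p in ps]
    rules[nt] = alternatives
    return rules

# A -> a b | a c | d   becomes   A -> a A' | d,  A' -> b | c
r = left_factor("A", [["a", "b"], ["a", "c"], ["d"]])
```

Applied to the dangling-else grammar, one round pulls the shared `if expr then stmt` prefix out of the two if-alternatives, which is exactly the delayed decision predictive parsing needs.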

24 Limitations of CFGs Cannot verify repeated strings –Example: L₁ = {wcw | w is in (a|b)*} –Abstracts checking that variables are declared before use Cannot verify repeated counts –Example: L₂ = {aⁿbᵐcⁿdᵐ | n ≥ 1 and m ≥ 1} –Abstracts checking that the numbers of formal and actual parameters are equal Therefore, some checks are put off until semantic analysis

25 Top Down Parsing Can be viewed two ways: –Attempt to find leftmost derivation for input string –Attempt to create parse tree, starting from the root, creating nodes in preorder General form is recursive descent parsing –May require backtracking –Backtracking parsers are not used frequently because they are rarely needed

26 Predictive Parsing A special case of recursive-descent parsing that does not require backtracking Must always know which production to use based on current input symbol Can often create an appropriate grammar by: –removing left recursion –left factoring the resulting grammar
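As an illustrative sketch (not from the slides), a predictive recursive-descent parser writes one procedure per nonterminal and decides on the current token alone. This version is for the expression grammar used later (E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id); the token encoding as plain strings is an assumption:

```python
# One procedure per nonterminal; "$" marks end of input.

class Parser:
    def __init__(self, tokens):
        self.toks = tokens + ["$"]
        self.pos = 0

    def peek(self):
        return self.toks[self.pos]

    def eat(self, t):
        if self.peek() != t:
            raise SyntaxError(f"expected {t}, got {self.peek()}")
        self.pos += 1

    def E(self):
        self.T(); self.Eprime()

    def Eprime(self):
        if self.peek() == "+":              # E' -> + T E'
            self.eat("+"); self.T(); self.Eprime()
        # otherwise E' -> eps: consume nothing

    def T(self):
        self.F(); self.Tprime()

    def Tprime(self):
        if self.peek() == "*":              # T' -> * F T'
            self.eat("*"); self.F(); self.Tprime()

    def F(self):
        if self.peek() == "(":              # F -> ( E )
            self.eat("("); self.E(); self.eat(")")
        else:                               # F -> id
            self.eat("id")

def accepts(tokens):
    p = Parser(tokens)
    try:
        p.E()
        return p.peek() == "$"
    except SyntaxError:
        return False
```

Note that no procedure ever backtracks: each one commits to a production after inspecting a single lookahead token.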

27 Transition Diagrams For parser: –One diagram for each nonterminal –Edge labels can be tokens or nonterminal A transition on a token means we should take that transition if token is next input symbol A transition on a nonterminal can be thought of as a call to a procedure for that nonterminal As opposed to lexical analyzers: –One (or more) diagrams for each token –Labels are symbols of input alphabet

28 Creating Transition Diagrams First eliminate left recursion from grammar Then left factor grammar For each nonterminal A: –Create an initial and final state –For every production A → X₁X₂…Xₙ, create a path from the initial to the final state with edges labeled X₁, X₂, …, Xₙ

29 Using Transition Diagrams Predictive parsers: –Start at start symbol of grammar –From state s with edge to state t labeled with token a, if next input token is a : State changes to t Input cursor moves one position right –If edge labeled by nonterminal A : State changes to start state for A Input cursor is not moved If final state of A reached, then state changes to t –If edge labeled by ε, state changes to t Can be recursive or non-recursive using stack

30 Transition Diagram Example E → E + T | T T → T * F | F F → (E) | id After eliminating left recursion: E → TE' E' → +TE' | ε T → FT' T' → *FT' | ε F → (E) | id

31 Simplifying Transition Diagrams

32 Nonrecursive Predictive Parsing (1)

33 Nonrecursive Predictive Parsing (2) Program considers X, the symbol on top of the stack, and a, the next input symbol If X = a = $, parser halts successfully If X = a ≠ $, parser pops X off stack and advances to next input symbol If X is a nonterminal, the program consults M[X, a] (production or error entry)

34 Nonrecursive Predictive Parsing (3) Initialize stack with start symbol of grammar Initialize input pointer to first symbol of input After consulting parsing table: –If entry is production, parser replaces top entry of stack with right side of production (leftmost symbol on top) –Otherwise, an error recovery routine is called

35 Predictive Parsing Table

Nonterminal  id        +           *            (         )        $
E            E → TE'                            E → TE'
E'                     E' → +TE'                          E' → ε   E' → ε
T            T → FT'                            T → FT'
T'                     T' → ε      T' → *FT'              T' → ε   T' → ε
F            F → id                             F → (E)

36 Using a Predictive Parsing Table

Stack      Input       Output
$E         id+id*id$
$E'T       id+id*id$   E → TE'
$E'T'F     id+id*id$   T → FT'
$E'T'id    id+id*id$   F → id
$E'T'      +id*id$
$E'        +id*id$     T' → ε
$E'T+      +id*id$     E' → +TE'
$E'T       id*id$
$E'T'F     id*id$      T → FT'
$E'T'id    id*id$      F → id
$E'T'      *id$
$E'T'F*    *id$        T' → *FT'
$E'T'F     id$
$E'T'id    id$         F → id
$E'T'      $
$E'        $           T' → ε
$          $           E' → ε
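The same table-driven loop can be sketched in Python (an illustration, not from the slides). M encodes the parsing table above as a dictionary; EPS (the empty tuple) stands for ε, and the stack/token encodings are assumptions:

```python
# Table-driven (nonrecursive) predictive parser for the expression
# grammar. M maps (nonterminal, lookahead) to a production right side.

EPS = ()
M = {
    ("E", "id"): ("T", "E'"), ("E", "("): ("T", "E'"),
    ("E'", "+"): ("+", "T", "E'"), ("E'", ")"): EPS, ("E'", "$"): EPS,
    ("T", "id"): ("F", "T'"), ("T", "("): ("F", "T'"),
    ("T'", "+"): EPS, ("T'", "*"): ("*", "F", "T'"),
    ("T'", ")"): EPS, ("T'", "$"): EPS,
    ("F", "id"): ("id",), ("F", "("): ("(", "E", ")"),
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def parse(tokens, start="E"):
    stack = ["$", start]
    toks = tokens + ["$"]
    i = 0
    while True:
        X, a = stack[-1], toks[i]
        if X == a == "$":
            return True                        # accept
        if X == a:
            stack.pop(); i += 1                # match a terminal
        elif X in NONTERMINALS and (X, a) in M:
            stack.pop()
            stack.extend(reversed(M[(X, a)]))  # leftmost symbol on top
        else:
            return False                       # error entry
```

Running `parse(["id", "+", "id", "*", "id"])` walks through exactly the stack/input configurations in the trace above.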

37 FIRST FIRST(α) is the set of all terminals that begin any string derived from α Computing FIRST: –If X is a terminal, FIRST(X) = {X} –If X → ε is a production, add ε to FIRST(X) –If X is a nonterminal and X → Y₁Y₂…Yₙ is a production: For each terminal a, add a to FIRST(X) if a is in FIRST(Yᵢ) and ε is in all of FIRST(Y₁), FIRST(Y₂), …, FIRST(Yᵢ₋₁) If ε is in all of FIRST(Y₁), FIRST(Y₂), …, FIRST(Yₙ), add ε to FIRST(X)
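These rules can be run to a fixed point; here is a hedged sketch in Python (the grammar encoding and names are assumptions, not from the slides), using the expression grammar that appears throughout these slides:

```python
# Fixed-point computation of FIRST; "eps" stands for ε.
EPS = "eps"

def first_sets(grammar, terminals):
    """grammar maps each nonterminal to a list of productions
    (tuples of symbols); returns FIRST for every symbol."""
    FIRST = {t: {t} for t in terminals}
    FIRST.update({nt: set() for nt in grammar})
    changed = True
    while changed:
        changed = False
        for X, prods in grammar.items():
            for prod in prods:
                before = len(FIRST[X])
                if prod == (EPS,):
                    FIRST[X].add(EPS)
                else:
                    for Y in prod:
                        FIRST[X] |= FIRST[Y] - {EPS}
                        if EPS not in FIRST[Y]:
                            break
                    else:                  # every Yi can derive eps
                        FIRST[X].add(EPS)
                changed |= len(FIRST[X]) != before
    return FIRST

G = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), (EPS,)],
    "T": [("F", "T'")],
    "T'": [("*", "F", "T'"), (EPS,)],
    "F": [("(", "E", ")"), ("id",)],
}
FIRST = first_sets(G, {"+", "*", "(", ")", "id"})
```

The computed sets match the worked example later in the deck, e.g. FIRST(E) = {(, id} and FIRST(E') = {+, ε}.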

38 FOLLOW FOLLOW(A), for any nonterminal A, is the set of terminals a that can appear immediately to the right of A in some sentential form More formally, a is in FOLLOW(A) if and only if there exists a derivation of the form S ⇒* αAaβ $ is in FOLLOW(A) if and only if there exists a derivation of the form S ⇒* αA

39 Computing FOLLOW Place $ in FOLLOW(S) If there is a production A → αBβ, then everything in FIRST(β) (except for ε) is in FOLLOW(B) If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is also in FOLLOW(B)
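A hedged sketch of these three rules as a fixed-point iteration (names and encodings are assumptions, not from the slides). The FIRST sets for the expression grammar are hard-coded here so the block stands alone:

```python
# Fixed-point computation of FOLLOW, given precomputed FIRST sets.
EPS = "eps"

FIRST = {
    "E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
    "T'": {"*", EPS}, "F": {"(", "id"},
    "+": {"+"}, "*": {"*"}, "(": {"("}, ")": {")"}, "id": {"id"},
}
G = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), (EPS,)],
    "T": [("F", "T'")],
    "T'": [("*", "F", "T'"), (EPS,)],
    "F": [("(", "E", ")"), ("id",)],
}

def first_of(symbols):
    """FIRST of a string of grammar symbols."""
    out = set()
    for Y in symbols:
        out |= FIRST[Y] - {EPS}
        if EPS not in FIRST[Y]:
            return out
    out.add(EPS)               # the whole (possibly empty) string can vanish
    return out

def follow_sets(grammar, start):
    FOLLOW = {nt: set() for nt in grammar}
    FOLLOW[start].add("$")                       # rule 1
    changed = True
    while changed:
        changed = False
        for A, prods in grammar.items():
            for prod in prods:
                if prod == (EPS,):
                    continue
                for i, B in enumerate(prod):
                    if B not in grammar:
                        continue                 # terminals have no FOLLOW
                    rest = first_of(prod[i + 1:])
                    before = len(FOLLOW[B])
                    FOLLOW[B] |= rest - {EPS}    # rule 2
                    if EPS in rest:
                        FOLLOW[B] |= FOLLOW[A]   # rule 3
                    changed |= len(FOLLOW[B]) != before
    return FOLLOW

FOLLOW = follow_sets(G, "E")
```

For this grammar the computation yields FOLLOW(E) = FOLLOW(E') = {), $}, FOLLOW(T) = FOLLOW(T') = {+, ), $}, and FOLLOW(F) = {+, *, ), $}.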

40 FIRST and FOLLOW Example E → TE' E' → +TE' | ε T → FT' T' → *FT' | ε F → (E) | id FIRST(E) = FIRST(T) = FIRST(F) = {(, id} FIRST(E') = {+, ε} FIRST(T') = {*, ε} FOLLOW(E) = FOLLOW(E') = {), $} FOLLOW(T) = FOLLOW(T') = {+, ), $} FOLLOW(F) = {+, *, ), $}

41 Creating a Predictive Parsing Table For each production A → α: –For each terminal a in FIRST(α), add A → α to M[A, a] –If ε is in FIRST(α), add A → α to M[A, b] for every terminal b in FOLLOW(A) –If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $] Mark each undefined entry of M as an error entry (use some recovery strategy)
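A sketch of these three rules in Python (an illustration, not from the slides; FIRST and FOLLOW are hard-coded with the expression-grammar values so the block stands alone). Raising on a duplicate entry doubles as the LL(1) check discussed on the following slides:

```python
# Building the predictive parsing table M from FIRST and FOLLOW.
EPS = "eps"

FIRST = {
    "E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
    "T'": {"*", EPS}, "F": {"(", "id"},
    "+": {"+"}, "*": {"*"}, "(": {"("}, ")": {")"}, "id": {"id"},
}
FOLLOW = {
    "E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
    "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"},
}
G = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), (EPS,)],
    "T": [("F", "T'")],
    "T'": [("*", "F", "T'"), (EPS,)],
    "F": [("(", "E", ")"), ("id",)],
}

def first_of(symbols):
    out = set()
    for Y in symbols:
        out |= FIRST[Y] - {EPS}
        if EPS not in FIRST[Y]:
            return out
    out.add(EPS)
    return out

def build_table(grammar):
    M = {}
    for A, prods in grammar.items():
        for alpha in prods:
            fs = first_of(() if alpha == (EPS,) else alpha)
            targets = fs - {EPS}
            if EPS in fs:
                targets |= FOLLOW[A]   # covers the $ case too
            for a in targets:
                if (A, a) in M:        # multiply-defined: not LL(1)
                    raise ValueError(f"conflict at M[{A}, {a}]")
                M[(A, a)] = alpha
    return M

M = build_table(G)
```

The resulting M reproduces the predictive parsing table shown earlier, e.g. M[E', )] = E' → ε via the FOLLOW rule.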

42 Multiply-Defined Entries Example S → iEtSS' | a S' → eS | ε E → b

Nonterminal  a       b       i            t    e                  $
S            S → a           S → iEtSS'
S'                                              S' → ε, S' → eS   S' → ε
E                    E → b

43 LL(1) Grammars (1) Algorithm covered in class can be applied to any grammar to produce a parsing table If parsing table has no multiply-defined entries, grammar is said to be “LL(1)” –First “L”, left-to-right scanning of input –Second “L”, produces leftmost derivation –“1” refers to the number of lookahead symbols needed to make decisions

44 LL(1) Grammars (2) No ambiguous or left-recursive grammar can be LL(1) Eliminating left recursion and left factoring does not always lead to an LL(1) grammar Some grammars cannot be transformed into an LL(1) grammar at all Although the example of a non-LL(1) grammar we covered has a fix, there are no universal rules to handle cases like this

45 Shift-Reduce Parsing One simple form of bottom-up parsing is shift-reduce parsing Starts at the bottom (leaves, terminals) and works its way up to the top (root, start symbol) Each step is a “reduction”: –Substring of input matching the right side of a production is “reduced” –Replaced with the nonterminal on the left of the production If all substrings are chosen correctly, a rightmost derivation is traced in reverse

46 Shift-Reduce Parsing Example S → aABe A → Abc | b B → d Reductions: abbcde → aAbcde → aAde → aABe → S Rightmost derivation: S ⇒rm aABe ⇒rm aAde ⇒rm aAbcde ⇒rm abbcde

47 Handles (1) Informally, a “handle” of a string: –Is a substring of the string –Matches the right side of a production –Reduction to left side of production is one step along reverse of rightmost derivation Leftmost substring matching right side of production is not necessarily a handle –Might not be able to reduce resulting string to start symbol –In example from previous slide, if we reduce aAbcde to aAAcde, we cannot reduce this to S

48 Handles (2) Formally, a handle of a right-sentential form γ: –Is a production A → β and a position of γ where β may be found and replaced with A –Replacing β by A leads to the previous right-sentential form in a rightmost derivation of γ So if S ⇒*rm αAw ⇒rm αβw, then A → β in the position following α is a handle of αβw The string w to the right of the handle contains only terminals Can be more than one handle if grammar is ambiguous (more than one rightmost derivation)

49 Ambiguity and Handles Example E → E + E E → E * E E → (E) E → id

E ⇒rm E + E ⇒rm E + E * E ⇒rm E + E * id₃ ⇒rm E + id₂ * id₃ ⇒rm id₁ + id₂ * id₃
E ⇒rm E * E ⇒rm E * id₃ ⇒rm E + E * id₃ ⇒rm E + id₂ * id₃ ⇒rm id₁ + id₂ * id₃

50 Handle Pruning Repeat the following process, starting from string of tokens until obtain start symbol: –Locate handle in current right-sentential form –Replace handle with left side of appropriate production Two problems that need to be solved: –How to locate handle –How to choose appropriate production

51 Shift-Reduce Parsing Data structures include a stack and an input buffer –Stack holds grammar symbols and starts off empty –Input buffer holds the string w to be parsed Parser shifts input symbols onto stack until a handle β is on top of the stack –Handle is reduced to the left side of appropriate production –If stack contains only start symbol and input is empty, this indicates success

52 Actions of a Shift-Reduce Parser Shift – the next input symbol is shifted onto the top of the stack Reduce – The parser reduces the handle at the top of the stack to a nonterminal (the left side of the appropriate production) Accept – The parser announces success Error – The parser discovers a syntax error and calls a recovery routine

53 Shift-Reduce Parsing Example

Stack       Input           Action
$           id₁+id₂*id₃$    shift
$id₁        +id₂*id₃$       reduce by E → id
$E          +id₂*id₃$       shift
$E+         id₂*id₃$        shift
$E+id₂      *id₃$           reduce by E → id
$E+E        *id₃$           shift
$E+E*       id₃$            shift
$E+E*id₃    $               reduce by E → id
$E+E*E      $               reduce by E → E * E
$E+E        $               reduce by E → E + E
$E          $               accept

54 Viable Prefixes Two definitions of a viable prefix: –A prefix of a right sentential form that can appear on a stack during shift-reduce parsing –A prefix of a right-sentential form that does not continue past the right end of the rightmost handle Can always add tokens to the end of a viable prefix to obtain a right-sentential form

55 Conflicts in Shift-Reduce Parsing There are grammars for which shift-reduce parsing cannot be used Shift/reduce conflict: cannot decide whether to shift or reduce Reduce/reduce conflict: cannot decide which of multiple possible reductions to make Sometimes can add rule to adapt for use with ambiguous grammar

56 Operator-Precedence Parsing A form of shift-reduce parsing that can apply to certain simple grammars –No productions can have right side ε –No right side can have two adjacent nonterminals –Other essential requirements must be met Once the parser is built (often by hand), the grammar can be effectively ignored

57 Precedence Relations

Relation    Meaning
a <· b      a “yields precedence to” b
a ·= b      a “has the same precedence as” b
a ·> b      a “takes precedence over” b

58 Using Precedence Relations (1) Can be thought of as delimiting handles: –<· marks left end of handle –·= appears in the interior of handle –·> marks right end of handle Consider the right-sentential form β₀a₁β₁…aₙβₙ: –Each βᵢ is either a single nonterminal or ε –Each aᵢ is a single token –Suppose that exactly one precedence relation will hold for each aᵢ, aᵢ₊₁ pair

59 Using Precedence Relations (2) Mark beginning and end of string with $ Remove the nonterminals Insert correct precedence relation between each pair of terminals

        id    +     *     $
id            ·>    ·>    ·>
+       <·    ·>    <·    ·>
*       <·    ·>    ·>    ·>
$       <·    <·    <·

id + id * id  becomes  $ <· id ·> + <· id ·> * <· id ·> $

60 Using Precedence Relations (3) To find the current handle: –Scan the string from the left until the first ·> is encountered –Scan backwards (left) from there until a <· is encountered –Everything in between, including intervening or surrounding nonterminals, is the handle The nonterminals do not influence the parse!

61 Implementing the Algorithm

set ip to point to the first symbol in w$
initialize stack to $
repeat forever
    if $ is on top of stack and ip points to $
        return success
    else
        let a be topmost terminal on stack
        let b be symbol pointed to by ip
        if a <· b or a ·= b
            push b onto stack
            advance ip to next input symbol
        else if a ·> b
            repeat pop x until top terminal on stack is <· x
        else error()
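A runnable sketch of this loop for the id/+/*/$ table (an illustration, not from the slides). Note one simplification: reducing pops the handle but skips checking it against a production right side, so only precedence errors are caught, matching the weaker of the two error checks discussed later:

```python
# Operator-precedence driver for the id/+/*/$ relation table.
LT, EQ, GT = "<.", ".=", ".>"
REL = {
    ("id", "+"): GT, ("id", "*"): GT, ("id", "$"): GT,
    ("+", "id"): LT, ("+", "+"): GT, ("+", "*"): LT, ("+", "$"): GT,
    ("*", "id"): LT, ("*", "+"): GT, ("*", "*"): GT, ("*", "$"): GT,
    ("$", "id"): LT, ("$", "+"): LT, ("$", "*"): LT,
}

def accepts(tokens):
    stack = ["$"]
    toks = tokens + ["$"]
    i = 0
    while True:
        a, b = stack[-1], toks[i]
        if a == b == "$":
            return True                       # success
        rel = REL.get((a, b))
        if rel in (LT, EQ):                   # a <. b or a .= b: shift
            stack.append(b)
            i += 1
        elif rel == GT:                       # a .> b: reduce the handle
            x = stack.pop()
            while REL.get((stack[-1], x)) != LT:
                x = stack.pop()               # pop until top <. x
        else:
            return False                      # no relation: error
```

On `id + id * id` the first reduction happens at the first `id ·> +`, just as the relation string on the earlier slide delimits it.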

62 Precedence and Associativity (1) For grammars describing arithmetic expressions: –Can construct table of operator-precedence relations automatically –Heuristic based on precedence and associativity of operators Selects proper handles, even if grammar is ambiguous

63 Precedence and Associativity (2) If operator θ₁ has higher precedence than operator θ₂, make θ₁ ·> θ₂ and θ₂ <· θ₁ If θ₁ and θ₂ are of equal precedence: –If they are left associative, make θ₁ ·> θ₂ and θ₂ ·> θ₁ –If they are right associative, make θ₁ <· θ₂ and θ₂ <· θ₁

64 Precedence and Associativity (3) For all operators θ: –θ <· id –id ·> θ –θ <· ( –( <· θ –) ·> θ –θ ·> ) –θ ·> $ –$ <· θ Also let: –( ·= ) –( <· ( –( <· id –$ <· ( –id ·> $ –id ·> ) –$ <· id –) ·> $ –) ·> )

65 Operator Grammar Example ^ is of highest precedence and is right-associative * and / are of next highest precedence and are left-associative + and - are of lowest precedence and are left-associative E → E + E | E - E | E * E | E / E | E ^ E | (E) | -E | id

66 Computed Precedence Relations

        +    -    *    /    ^    id   (    )    $
+       ·>   ·>   <·   <·   <·   <·   <·   ·>   ·>
-       ·>   ·>   <·   <·   <·   <·   <·   ·>   ·>
*       ·>   ·>   ·>   ·>   <·   <·   <·   ·>   ·>
/       ·>   ·>   ·>   ·>   <·   <·   <·   ·>   ·>
^       ·>   ·>   ·>   ·>   <·   <·   <·   ·>   ·>
id      ·>   ·>   ·>   ·>   ·>             ·>   ·>
(       <·   <·   <·   <·   <·   <·   <·   ·=
)       ·>   ·>   ·>   ·>   ·>             ·>   ·>
$       <·   <·   <·   <·   <·   <·   <·

67 Handling Unary Operators If unary operator θ is not also a binary operator: –Incorporate θ into table –For all operators θₙ: Make θₙ <· θ no matter what If θ has higher precedence than θₙ, make θ ·> θₙ Otherwise, make θ <· θₙ Otherwise the easiest thing to do is create a second lexical symbol (token) –Lexical analyzer must distinguish one from the other –Can't use lookahead, must rely on previous token

68 Precedence Functions (1) Do not need to store entire table of precedence relations Select two precedence functions f and g: –f(a) < g(b) whenever a <· b –f(a) = g(b) whenever a ·= b –f(a) > g(b) whenever a ·> b

     +   -   *   /   ^   (   )   id   $
f    2   2   4   4   4   0   6   6    0
g    1   1   3   3   5   5   0   5    0

69 Precedence Functions (2) Precedence relation between a and b is determined by comparing f(a) to g(b) Loss of error detection capability (errors caught later when no reduction for handle is found) It is not always possible to construct valid precedence functions When it is possible, functions can be computed automatically

70 Precedence Functions Algorithm Create symbols f_a and g_a for every token a and for $ If a ·= b then f_a and g_b must be in same group Partition symbols into as many groups as possible For all cases where a <· b, draw edge from group of g_b to group of f_a For all cases where a ·> b, draw edge from group of f_a to group of g_b If graph has cycles, no precedence functions exist Otherwise: –f(a) is the length of the longest path beginning at group of f_a –g(a) is the length of the longest path beginning at group of g_a

71 Precedence Functions Example

        id    +     *     $
id            ·>    ·>    ·>
+       <·    ·>    <·    ·>
*       <·    ·>    ·>    ·>
$       <·    <·    <·

     +   *   id   $
f    2   4   4    0
g    1   3   5    0
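A sketch of the longest-path construction applied to this table (an illustration, not from the slides). This table has no ·= entries, so the grouping step is trivial, every symbol is its own group; the code also assumes the graph is acyclic, which it is here:

```python
# Precedence functions via longest paths in the f/g graph.
from functools import lru_cache

REL = {
    ("id", "+"): ">", ("id", "*"): ">", ("id", "$"): ">",
    ("+", "id"): "<", ("+", "+"): ">", ("+", "*"): "<", ("+", "$"): ">",
    ("*", "id"): "<", ("*", "+"): ">", ("*", "*"): ">", ("*", "$"): ">",
    ("$", "id"): "<", ("$", "+"): "<", ("$", "*"): "<",
}
symbols = ["+", "*", "id", "$"]

# a .> b gives edge f_a -> g_b ;  a <. b gives edge g_b -> f_a
edges = {(fn, s): set() for fn in "fg" for s in symbols}
for (a, b), rel in REL.items():
    if rel == ">":
        edges[("f", a)].add(("g", b))
    else:
        edges[("g", b)].add(("f", a))

@lru_cache(maxsize=None)
def longest(node):          # length of the longest path from node
    return max((1 + longest(m) for m in edges[node]), default=0)

f = {s: longest(("f", s)) for s in symbols}
g = {s: longest(("g", s)) for s in symbols}
```

The computed values reproduce the f and g rows shown above (f(*) = f(id) = 4, g(id) = 5, and so on).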

72 Detecting and Handling Errors Errors can occur at two points: –If no precedence relation holds between the terminal on top of stack and current input –If a handle has been found, but no production is found with this handle as right side Errors during reductions can be handled with diagnostic message Errors due to lack of precedence relation can be handled by recovery routines specified in table

73 LR Parsers LR parsers use an efficient, bottom-up parsing technique useful for a large class of CFGs Too difficult to construct by hand, but automatic generators to create them exist (e.g. Yacc) LR(k) grammars –“L” refers to left-to-right scanning of input –“R” refers to rightmost derivation (produced in reverse order) –“k” refers to the number of lookahead symbols needed for decisions (if omitted, assumed to be 1)

74 Benefits of LR Parsing Can be constructed to recognize virtually all programming language constructs for which a CFG can be written Most general non-backtracking shift-reduce parsing method known Can be implemented efficiently Handles a class of grammars that is a superset of those handled by predictive parsing Can detect syntactic errors as soon as possible with a left-to-right scan of input

75 Model of LR Parser

76 LR Parser (1) Driver program is the same for all LR parsers Stack consists of states (sᵢ) and grammar symbols (Xᵢ) –Each state summarizes information contained in stack below it –Grammar symbols do not actually need to be stored on stack in most implementations State symbol on top of stack and next input symbol used to determine shift/reduce decision

77 LR Parser (2) Parsing table includes action function and goto function Action function –Based on state and next input symbol –Actions are shift, reduce, accept or error Goto function –Based on state and grammar symbol –Produces next state

78 LR Parser (3) Configuration (s₀X₁s₁…Xₘsₘ, aᵢaᵢ₊₁…aₙ$) indicates right-sentential form X₁X₂…Xₘaᵢaᵢ₊₁…aₙ If action[sₘ, aᵢ] = shift s, enter configuration (s₀X₁s₁…Xₘsₘaᵢs, aᵢ₊₁…aₙ$) If action[sₘ, aᵢ] = reduce A → β, enter configuration (s₀X₁s₁…Xₘ₋ᵣsₘ₋ᵣAs, aᵢaᵢ₊₁…aₙ$), where s = goto[sₘ₋ᵣ, A] and r is the length of β If action[sₘ, aᵢ] = accept, signal success If action[sₘ, aᵢ] = error, try error recovery

79 LR Parsing Algorithm

set ip to point to the first symbol in w$
initialize stack to s₀
repeat forever
    let s be topmost state on stack
    let a be symbol pointed to by ip
    if action[s, a] = shift s'
        push a then s' onto stack
        advance ip to next input symbol
    else if action[s, a] = reduce A → β
        pop 2*|β| symbols off stack
        let s' be state now on top of stack
        push A then goto[s', A] onto stack
        output production A → β
    else if action[s, a] = accept
        return success
    else error()

80 LR Parsing Table Example

(1) E → E + T
(2) E → T
(3) T → T * F
(4) T → F
(5) F → (E)
(6) F → id

(columns id through $ are action entries; columns E, T, F are goto entries)

state   id   +    *    (    )    $      E   T   F
0       s5             s4               1   2   3
1            s6                  acc
2            r2   s7        r2   r2
3            r4   r4        r4   r4
4       s5             s4               8   2   3
5            r6   r6        r6   r6
6       s5             s4                   9   3
7       s5             s4                       10
8            s6             s11
9            r1   s7        r1   r1
10           r3   r3        r3   r3
11           r5   r5        r5   r5

81 LR Parsing Example

Stack                      Input        Action
(1)  s0                    id*id+id$    shift
(2)  s0 id s5              *id+id$      reduce by F → id
(3)  s0 F s3               *id+id$      reduce by T → F
(4)  s0 T s2               *id+id$      shift
(5)  s0 T s2 * s7          id+id$       shift
(6)  s0 T s2 * s7 id s5    +id$         reduce by F → id
(7)  s0 T s2 * s7 F s10    +id$         reduce by T → T * F
(8)  s0 T s2               +id$         reduce by E → T
(9)  s0 E s1               +id$         shift
(10) s0 E s1 + s6          id$          shift
(11) s0 E s1 + s6 id s5    $            reduce by F → id
(12) s0 E s1 + s6 F s3     $            reduce by T → F
(13) s0 E s1 + s6 T s9     $            reduce by E → E + T
(14) s0 E s1               $            accept
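The table and driver loop above can be sketched together in Python (an illustration; the table encoding is an assumption). Only states are kept on the stack, which the earlier slide notes is sufficient:

```python
# SLR table for the expression grammar plus the LR driver loop.
# ("s", n) = shift to state n, ("r", k) = reduce by production k.

PRODS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3), 4: ("T", 1),
         5: ("F", 3), 6: ("F", 1)}   # rule -> (left side, |right side|)

def r(k, *toks):                     # same reduce action on several tokens
    return {t: ("r", k) for t in toks}

ACTION = {
    0: {"id": ("s", 5), "(": ("s", 4)},
    1: {"+": ("s", 6), "$": ("acc",)},
    2: {"*": ("s", 7), **r(2, "+", ")", "$")},
    3: r(4, "+", "*", ")", "$"),
    4: {"id": ("s", 5), "(": ("s", 4)},
    5: r(6, "+", "*", ")", "$"),
    6: {"id": ("s", 5), "(": ("s", 4)},
    7: {"id": ("s", 5), "(": ("s", 4)},
    8: {"+": ("s", 6), ")": ("s", 11)},
    9: {"*": ("s", 7), **r(1, "+", ")", "$")},
    10: r(3, "+", "*", ")", "$"),
    11: r(5, "+", "*", ")", "$"),
}
GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3,
        (4, "E"): 8, (4, "T"): 2, (4, "F"): 3,
        (6, "T"): 9, (6, "F"): 3, (7, "F"): 10}

def lr_parse(tokens):
    """Return the sequence of reductions (production numbers),
    or None on a syntax error."""
    stack, toks, i, output = [0], tokens + ["$"], 0, []
    while True:
        act = ACTION[stack[-1]].get(toks[i])
        if act is None:
            return None                  # error entry
        if act[0] == "acc":
            return output
        if act[0] == "s":                # shift
            stack.append(act[1])
            i += 1
        else:                            # reduce by rule act[1]
            lhs, n = PRODS[act[1]]
            del stack[-n:]
            stack.append(GOTO[(stack[-1], lhs)])
            output.append(act[1])
```

On `id * id + id` the reductions come out in the order F → id, T → F, F → id, T → T * F, E → T, F → id, T → F, E → E + T, matching the trace above.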

82 Constructing LR Parsing Tables Three methods: –SLR (simple LR) Not all that simple (but simpler than other two)! Weakest of three methods, easiest to implement –Constructing canonical LR parsing tables Most general of methods Constructed tables can be quite large –LALR parsing table (lookahead LR) Tables smaller than canonical LR Most programming language constructs can be handled We will not cover any of these methods in class –Too much detail –Yacc will take care of it!

