Presentation is loading. Please wait.

Presentation is loading. Please wait.

Cse321, Programming Languages and Compilers 1 6/18/2015 Lecture #7, Feb. 5, 2007 Grammars Top down parsing Transition Diagrams Ambiguity Left recursion.

Similar presentations


Presentation on theme: "Cse321, Programming Languages and Compilers 1 6/18/2015 Lecture #7, Feb. 5, 2007 Grammars Top down parsing Transition Diagrams Ambiguity Left recursion."— Presentation transcript:

1 Cse321, Programming Languages and Compilers 1 6/18/2015 Lecture #7, Feb. 5, 2007 Grammars Top down parsing Transition Diagrams Ambiguity Left recursion Refactoring by adding levels Recursive descent parsing Predictive Parsers First and Follow Parsing tables

2 Cse321, Programming Languages and Compilers 2 6/18/2015 Assignments Reading –Chapter 3 –Page 73-106 –Quiz on Wednesday Mid Term exam –Monday. Feb 19, 2007. Time: in class. Next Homework –On the web page, and the last page of this handout –Due date to be negotiated. –Recall Project #1 is due next Wednesday. –I promised no homework Today or Wednesday. Project 1 –Recall Project #1, the scanner is due Feb. 14th

3 Cse321, Programming Languages and Compilers 3 6/18/2015 Grammar –A set of tokens (terminals): T –A set of non-terminals: N –A set of productions { lhs ::= rhs,... } »lhs in N »rhs is a sequence of N U T –A Start symbol: S (in N) Shorthands –Provide only the productions »All lhs symbols comprise N »All other sysmbols comprise T »lhs of first production is S Grammars 1

4 Cse321, Programming Languages and Compilers 4 6/18/2015 Grammars 2 Rewriting rules –Pick a non-terminal to replace. Which order? »left-to-right »right-to-left Derivations (a list if productions used to derive a string from a grammar). A sentence of G: L(G) –Start with S –only terminal symbols –all strings derivable from G in 1 or more steps

5 Cse321, Programming Languages and Compilers 5 6/18/2015 Grammars 3 Parse trees. – Graphical representations of derivations. –The leaves of a parse tree for fully filled out tree is a sentence. Context Free Grammars –how do they compare to regular expressions? –Nesting (matched ()’s) requires CFG,’s RE's are not powerful enough. Ambiguity –A string has two derivations –E ::= E + E | E * E | id » x + x * y Left-recursion –E ::= E + E | E * E | id –Makes certain top-down parsers loop

6 Cse321, Programming Languages and Compilers 6 6/18/2015 Top Down Parsing Begin with the start symbol and try and derive the parse tree from the root. Consider the grammar Exp ::= id | Exp + Exp | Exp * Exp | ( Exp ) derives x, x+x, x+x+x, x * y x + y * z...

7 Cse321, Programming Languages and Compilers 7 6/18/2015 Example Parse (top down) –stack input Exp x + y * z / | \ Exp + Exp Exp y * z / | \ Exp + Exp | id(x)

8 Cse321, Programming Languages and Compilers 8 6/18/2015 Top Down Parse (cont) Exp y * z / | \ Exp + Exp | / | \ id(x) Exp * Exp Exp z / | \ Exp + Exp | / | \ id(x) Exp * Exp | id(y)

9 Cse321, Programming Languages and Compilers 9 6/18/2015 Top Down Parse (cont.) Exp / | \ Exp + Exp | / | \ id(x) Exp * Exp | | id(y) id(z)

10 Cse321, Programming Languages and Compilers 10 6/18/2015 Transition Diagrams Transition diagrams for predictive parsers – One diagram for each Non-terminal – Shouldn't have left recursion ( left factored ) – Diagrams can (recursively) mention each other E -> T E' E' -> + T E' | T -> F T' T' -> * F T' | F -> ( E ) | id T E’ E F T’ T TE’ + FT’ * E F ( ) id

11 Cse321, Programming Languages and Compilers 11 6/18/2015 Problems with Top Down Parsing Backtracking may be necessary: –S ::= ee | bAc | bAe –A ::= d | cA try on string “bcde” Infinite loops possible from (indirect) left recursive grammars. – E ::= E + id | id Ambiguity is a problem when a unique parse is not possible. These often require extensive grammar restructuring (grammar debugging).

12 Cse321, Programming Languages and Compilers 12 6/18/2015 Grammar Transformations Removing ambiguity. Removing Left Recursion Backtracking and Factoring

13 Cse321, Programming Languages and Compilers 13 6/18/2015 Removing ambiguity. Adding levels to a grammar –E := E + E | E * E | id | ( E ) –E ::= E + T | T –T ::= T * F | F –F ::= id | ( E ) The dangling else grammar. –st ::= if exp then st else st | if exp then st | id := exp – Note that the following has two possible parses if x=2 then if x=3 then y:=2 else y := 4 if x=2 then (if x=3 then y:=2 ) else y := 4 if x=2 then (if x=3 then y:=2 else y := 4)

14 Cse321, Programming Languages and Compilers 14 6/18/2015 Adding levels (cont) Original grammar st ::= if exp then st else st | if exp then st | id := exp Assume that every st between then and else must be matched, i.e. it must have both a then and an else. New Grammar with addtional levels st -> match | unmatch match -> if exp then match else match | id := exp unmatch -> if exp then st | if exp then match else unmatch

15 Cse321, Programming Languages and Compilers 15 6/18/2015 Removing Left Recursion Top down recursive descent parsers require non-left recursive grammars Technique: Left Factoring »E := E + E | E * E | id »E ::= id E’ »E’ ::= + E E’ | * E E’ | ε General Technique to remove direct left recursion –Every Non terminal with productions »T ::= T n | T m (left recursive productions) | a | b (non-left recursive productions) –Make a new non-terminal T’ –Remove the old productions –Add the following productions » T ::= a T’ | b T’ » T’ ::= n T’ | m T’ | ε T Tn Tn T n a (a | b) (n | m) * “a” and “b” because they are the rhs of the non-left recurive productions.

16 Cse321, Programming Languages and Compilers 16 6/18/2015 Backtracking and Factoring Backtracking may be necessary: –S ::= ee | bAc | bAe –A ::= d | cA try on string “bcde” S -> bAc (by S -> bAc) -> bcAe (by A -> cA) -> bcde (by A -> d) But this is the wrong answer! Factoring a grammar –Factor common prefixes and make the different postfixes into a new non-terminal –S ::= ee | bAQ –Q ::= c | e –A ::= d | cA

17 Cse321, Programming Languages and Compilers 17 6/18/2015 Recursive Descent Parsing One procedure (function) for each non-terminal. Procedures are often (mutually) recursive. They can return a bool (the input matches that non- terminal) or more often they return a data-structure (the input builds this parse tree) Need to control the lexical analyzer (requiring it to “back-up” on occasion)

18 Cse321, Programming Languages and Compilers 18 6/18/2015 R.D. parser for R.E.’s Build an instance of the datatype: datatype Re = empty of int | simple of string * int | concat of Re * Re | closure of Re | union of Re * Re; The lexical Analyzer datatype datatype token = Done | Bar | Star | Hash | Leftparen | Rightparen | Single of string;

19 Cse321, Programming Languages and Compilers 19 6/18/2015 Ambiguous grammar 1.RE -> RE bar RE 2.RE -> RE RE 3.RE -> RE * 4.RE -> id 5.RE -> # 6.RE -> ( RE ) Alt -> Alt bar Concat Alt -> Concat Concat -> Concat Closure Concat -> Closure Closure -> simple star Closure -> simple simple -> id | (Alt ) | # Transform grammar by layering Tightest binding operators (*) at the lowest layer Layers are Alt, then Concat, then Closure, then Simple.

20 Cse321, Programming Languages and Compilers 20 6/18/2015 Left Recursive Grammar Alt -> Alt bar Concat Alt -> Concat Concat -> Concat Closure Concat -> Closure Closure -> simple star Closure -> simple simple -> id | (Alt ) | # Alt ::= Concat moreAlt moreAlt ::= Bar Concat moreAlt | ε Concat ::= Closure moreConcat moreConcat ::= Closure moreConcat | ε Closure ::= Simple Star | Simple Simple ::= Id | ( Alt ) | # 1. For every Non terminal with productions T ::= T n | T m (left recursive productions) | a | b (non-left recursive productions) 1. Make a new non-terminal T’ 2. Remove the old productions 3.Add the following productions T ::= a T’ | b T’ T’ ::= n T’ | m T’ | ε

21 Cse321, Programming Languages and Compilers 21 6/18/2015 Lookahead and the Lexer val lookahead = ref Done; val input = ref [Done]; val location = ref 0; fun nextloc () = (location := (!location) + 1; !location) fun init s = (location := 0; input := lexan s; lookahead := hd(!input); input := tl(!input)) Lex’s the whole input inputStores it in the variable input Keeps track of next token (so that backup is possible)

22 Cse321, Programming Languages and Compilers 22 6/18/2015 Matching a single Terminal fun match t = if (!lookahead) = t then (if null(!input) then lookahead := Done else (lookahead := hd(!input) ; input := tl(!input))) else error ("looking for: "^(tok2str t)^ " found: "^(tok2str (!lookahead))); Match one token Advance the input Handle the end of file correctly Report errors in a sensible way This function will be called a lot!!

23 Cse321, Programming Languages and Compilers 23 6/18/2015 moreAlt and moreConcat When we removed left recursion, we added non- terminals that that might recognize ε. i.e. moreAlt and moreConcat Observe the shape of parse trees using those productions. moreConcat ::= Closure moreConcat | moreConcat closure moreConcat... closure moreConcat ε... They always end in ε at the far right of the tree

24 Cse321, Programming Languages and Compilers 24 6/18/2015 1 Function for each NT A simple way to write a parser for a language is the technique called recursive descent parsing. Each Non-terminal is represented by a function that returns a syntax item corresponding to the element that production parses. If it can’t parse that element it raises an error. When a production might return the empty string we need to handle that by using the SML ‘a option datatype. Alt : unit -> Re moreAlt : unit -> Re option Concat : unit -> Re Closure : unit -> Re moreConcat : unit -> Re option Simple : unit -> Re Alt ::= Concat moreAlt moreAlt ::= Bar Alt moreAlt | ε Concat ::= Closure moreConcat moreConcat ::= Closure moreConcat | ε Closure ::= Simple Star | Simple Simple ::= Id | ( Alt ) | #

25 Cse321, Programming Languages and Compilers 25 6/18/2015 Alt ::= Concat moreAlt fun Alt () = let val x = Concat () val y = moreAlt () in case y of NONE => x | SOME z => union(x,z) end

26 Cse321, Programming Languages and Compilers 26 6/18/2015 moreAlt ::= Bar Alt moreAlt | ε and moreAlt () = case (!lookahead) of Bar => let val _ = match Bar val x = Alt() val y = moreAlt () in case y of NONE => SOME x | (SOME z) => SOME(union(x,z)) end | _ => NONE “and” separates mutually recursive functions

27 Cse321, Programming Languages and Compilers 27 6/18/2015 Concat ::= Closure moreConcat and Concat () = let val x = Closure () val y = moreConcat () in case y of NONE => x | SOME z => concat(x,z) end

28 Cse321, Programming Languages and Compilers 28 6/18/2015 moreConcat ::= Closure moreConcat | ε and moreConcat () = case (!lookahead) of (Single _ | Leftparen | Hash) => let val x = Closure() val y = moreConcat() in case y of NONE => SOME x | SOME z => SOME(concat(x,z)) end | _ => NONE

29 Cse321, Programming Languages and Compilers 29 6/18/2015 Closure ::= Simple Star | Simple and Closure () = let val x = Simple() in case !lookahead of Star => (match Star; closure x) | other => x end A simple form of left- factoring is used here

30 Cse321, Programming Languages and Compilers 30 6/18/2015 Simple ::= Id | ( Alt ) | # and Simple () = case !lookahead of Single c => let val _ = match (Single c) val n = nextloc() in simple(c,n) end | Leftparen => let val _ = match Leftparen val x = Alt(); val _ = match Rightparen in x end | Hash => let val _ = match Hash val n = nextloc() in empty n end | x => error ("In Simple no match: "^(tok2str x));

31 Cse321, Programming Languages and Compilers 31 6/18/2015 Top Level Parser fun parse s = let val _ = init s val ans = Alt() val _ = match Done in ans end; parse "a(b*|c)#"; concat (simple ("a",1), concat (union (closure (simple ("b",2)), simple ("c",4)), empty 8))

32 Cse321, Programming Languages and Compilers 32 6/18/2015 Predictive Parsers Using a stack to avoid recursion. Encoding the diagrams in a table The Nullable, First, and Follow functions –Nullable: Can a symbol derive the empty string. False for every terminal symbol. –First: all the terminals that a non-terminal could possibly derive as its first symbol. » term or nonterm -> set( term ) »sequence(term + nonterm) -> set( term) –Follow: all the terminals that could immediately follow the string derived from a non-terminal. »non-term -> set( term )

33 Cse321, Programming Languages and Compilers 33 6/18/2015 Example First and Follow Sets E ::= T E' $ E' ::= + T E' E’ ::= ε T ::= F T' T' ::= * F T' T’ ::= ε F ::= ( E ) F ::= id First E = { "(", "id"} Follow E = {")","$"} First F = { "(", "id"} Follow F = {"+","*",”)”,"$"} First T = { "(", "id"} Follow T = {{"+",")","$"} First E' = { "+", ε} Follow E' = {")","$"} First T' = { "*", ε} Follow T' = {"+",")","$"} First of a terminal is itself. First can be extended to sequence of symbols.

34 Cse321, Programming Languages and Compilers 34 6/18/2015 Nullable if ε is in First(symbol) then that symbol is nullable. Sometime rather than let ε be a symbol we derive an additional function nullable. Nullable (E’) = true Nullable(T’) = true Nullable for all other symbols is false E ::= T E' $ E' ::= + T E' E’ ::= ε T ::= F T' T' ::= * F T' T’ ::= ε F ::= ( E ) F ::= id

35 Cse321, Programming Languages and Compilers 35 6/18/2015 Computing First Use the following rules until no more terminals can be added to any FIRST set. 1) if X is a term. FIRST(X) = {X} X ::= ε 2) if X ::= ε is a production then add ε to FIRST(X), (Or set nullable of X to true). 3) if X is a non-term and –X ::= Y1 Y2... Yk –add a to FIRST(X) »if a in FIRST(Yi) and »for all j<i ε in FIRST(Yj) E.g.. if Y1 can derive ε then if a is in FIRST(Y2) it is surely in FIRST(X) as well.

36 Cse321, Programming Languages and Compilers 36 6/18/2015 Example First Computation Terminals –First($) = {$} First(*) = {*} First(+) = {+}... Empty Productions –add ε to First(E’), add ε to First(T’) Other NonTerminals –Computing from the lowest layer (F) up »First(F) = {id, ( } »First(T’) = { ε, * } »First(T) = First(F) = {id, ( } »First(E’) = { ε, + } »First(E) = First(T) = {id, ( } E ::= T E' $ E' ::= + T E' E’ ::= ε T ::= F T' T' ::= * F T' T’ ::= ε F ::= ( E ) F ::= id

37 Cse321, Programming Languages and Compilers 37 6/18/2015 Computing Follow Use the following rules until nothing can be added to any follow set. 1) Place $ (the end of input marker) in FOLLOW(S) where S is the start symbol. 2) If A ::= a B b then everything in FIRST(b) except ε is in FOLLOW(B) 3) If there is a production A ::= a B or A ::- a B b where FIRST(b) contains ε (i.e. b can derive the empty string) then everything in FOLLOW(A) is in FOLLOW(B)

38 Cse321, Programming Languages and Compilers 38 6/18/2015 Ex. Follow Computation Rule 1, Start symbol – Add $ to Follow(E) Rule 2, Productions with embedded nonterms – Add First( ) ) = { ) } to follow(E) –Add First($) = { $ } to Follow(E’) –Add First(E’) = {+,ε } to Follow(T) –Add First(T’) = {*,ε} to Follow(F) Rule 3, Nonterm in last position –Add follow(E’) to follow(E’) (doesn’t do much) –Add follow (T) to follow(T’) –Add follow(T) to follow(F) since T’ --> ε –Add follow(T’) to follow(F) since T’ --> ε E ::= T E' $ E' ::= + T E' E’ ::= ε T ::= F T ' T' ::= * F T' T’ ::= ε F ::= ( E ) F ::= id

39 Cse321, Programming Languages and Compilers 39 6/18/2015 1.For each production A -> alpha do 2 & 3 2.For each a in First alpha do add A -> alpha to M[A,a] 3.if ε is in First alpha, add A -> alpha to M[A,b] for each terminal b in Follow A. If ε is in First alpha and $ is in Follow A add A - > alpha to M[A,$]. First E = {"(","id"} Follow E = {")","$"} First F = {"(","id"} Follow F = {"+","*",”)”,"$"} First T = {"(","id"} Follow T = {{"+",")","$"} First E' = {"+",ε} Follow E' = {")","$"} First T' = {"*",ε} Follow T' = {"+",")","$"} Table from First and Follow M[A,t] terminals + * ) ( id $ nontermsnonterms 1 23 44 656 78 E E’ T T’ F 1 3 6 1 E ::= T E' $ 2 E' ::= + T E' 3 | ε 4 T ::= F T' 5 T' ::= * F T' 6 | ε 7 F ::= ( E ) 8 | id

40 Cse321, Programming Languages and Compilers 40 6/18/2015 Predictive Parsing Table E E’ T T’ F id + * () $ T E’ + T E’ε ε F T’ ε * F T’εε id( E )

41 Cse321, Programming Languages and Compilers 41 6/18/2015 Table Driven Algorithm push start symbol Repeat begin let X top of stack, A next input if terminal(X) then if X=A then pop X; remove A else error() else (* nonterminal(X) *) begin if M[X,A] = Y1 Y2... Yk then pop X; push Yk YK-1... Y1 else error() end until stack is empty, input = $

42 Cse321, Programming Languages and Compilers 42 6/18/2015 Example Parse Stack Input E x + y $ E’ T x + y $ E’ T’ F x + y $ E’ T’ id x + y $ E’ T’ + y $ E’ + y $ E’ T + + y $ E’ T y $ E’ T’ F y $ E’ T’ id y $ E’ T’ $ E’ $ $ E E’ T T’ F id + * () $ T E’ + T E’ε ε F T’ ε * F T’εε id( E )

43 Cse321, Programming Languages and Compilers 43 6/18/2015 CS321 Prog Lang & Compilers Assignment # 7 Assigned: Feb 5, 2007 Due: Wed. Feb 14, 2007 Cut and paste the following into your solution file. ================================================================== datatype RE = Empty | Union of RE * RE | Concat of RE * RE | Star of RE | C of char; ================================================================ The purpose of today's home work is to write functions analogous to "first" and "follow" for context free grammars from todays lecture. There are two differences, The functions you will write for homework are for regular expressions, not context free grammars, and since REs don't have non-terminal and terminal symbols, the functions are for a complete RE rather than a symbol. Write 3 ML functions 1) Write (nullable: RE -> boolean) Returns a boolean, true if the empty string is a member of the set of strings recognized by the RE, false otherwise. 2) Write (first: RE -> char list) Returns a (char list) which contains those characters which may appear as the first character in the strings recognized by that RE. 3) Write (last: RE -> char list). Returns a char list which contains those characters which may appear as the last character in the strings recognized by that RE. All these functions a simple functions defined with pattern matching. One clause for each constructor of RE.


Download ppt "Cse321, Programming Languages and Compilers 1 6/18/2015 Lecture #7, Feb. 5, 2007 Grammars Top down parsing Transition Diagrams Ambiguity Left recursion."

Similar presentations


Ads by Google