Chapter 3 Chang Chi-Chung 2015.05.18

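The Role of the Parser
[Figure: the source program enters the lexical analyzer, which hands a token to the parser each time the parser calls getNextToken; the parser passes a parse tree to the rest of the front end, which produces the intermediate representation; the phases share the symbol table.]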

How do we describe the grammar of a programming language?
- Use a context-free grammar (CFG).
- A CFG is a more powerful notation than a regular expression (RE).

Context-Free Grammar
- A context-free grammar is a 4-tuple G = (T, N, P, S) where
  - T is a finite set of tokens (terminal symbols)
  - N is a finite set of nonterminals
  - P is a finite set of productions of the form A → α, where A ∈ N and α ∈ (N ∪ T)*
  - S ∈ N is a designated start symbol

Derivations
- The one-step derivation is defined by α A β ⇒ α γ β, where A → γ is a production in the grammar.
- In addition, we define
  - ⇒ is leftmost (written ⇒lm) if α does not contain a nonterminal
  - ⇒ is rightmost (written ⇒rm) if β does not contain a nonterminal
  - Reflexive-transitive closure ⇒* (zero or more steps)
  - Positive closure ⇒+ (one or more steps)

Example of the Derivations
- Productions
    list → list + digit
    list → list - digit
    list → digit
    digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
- A leftmost derivation replaces the leftmost nonterminal in each step.
- A rightmost derivation replaces the rightmost nonterminal in each step.
- Leftmost derivation of 9 - 5 + 2
    list ⇒ list + digit
         ⇒ list - digit + digit
         ⇒ digit - digit + digit
         ⇒ 9 - digit + digit
         ⇒ 9 - 5 + digit
         ⇒ 9 - 5 + 2
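The leftmost derivation on this slide can be replayed mechanically. The following is a minimal sketch, not part of the original deck: the class name `LeftmostDerivation` and the helpers `step` and `derive` are illustrative names, and sentential forms are modelled simply as lists of symbol strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class LeftmostDerivation {
    // Nonterminals of the example grammar; every other symbol is a terminal.
    static final Set<String> NONTERMINALS = new HashSet<>(Arrays.asList("list", "digit"));

    // Apply the production head -> body to the leftmost nonterminal of the form.
    // Throws if the leftmost nonterminal is not head, since the step would not be leftmost.
    static List<String> step(List<String> form, String head, String[] body) {
        for (int i = 0; i < form.size(); i++) {
            if (NONTERMINALS.contains(form.get(i))) {
                if (!form.get(i).equals(head)) {
                    throw new IllegalArgumentException("leftmost nonterminal is " + form.get(i));
                }
                List<String> next = new ArrayList<>(form.subList(0, i));
                next.addAll(Arrays.asList(body));
                next.addAll(form.subList(i + 1, form.size()));
                return next;
            }
        }
        throw new IllegalArgumentException("no nonterminal left to replace");
    }

    // Replays the derivation of 9 - 5 + 2 and returns every sentential form in order.
    static List<String> derive() {
        String[][] productions = {           // each row: head followed by the body symbols
            {"list", "list", "+", "digit"},
            {"list", "list", "-", "digit"},
            {"list", "digit"},
            {"digit", "9"},
            {"digit", "5"},
            {"digit", "2"},
        };
        List<String> form = new ArrayList<>(Arrays.asList("list"));
        List<String> trace = new ArrayList<>();
        trace.add(String.join(" ", form));
        for (String[] p : productions) {
            form = step(form, p[0], Arrays.copyOfRange(p, 1, p.length));
            trace.add(String.join(" ", form));
        }
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" => ", derive()));
    }
}
```

Because `step` always rewrites the first nonterminal it finds, any sequence of productions it accepts is a leftmost derivation by construction.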

Example of the Parse Tree
- Parse tree of the string 9-5+2 using grammar G
[Figure: parse tree with root list and leaves 9 - 5 + 2.]
- The sequence of leaves, read from left to right, is called the yield of the parse tree.

Sentence and Language
- Sentential form
  - If S ⇒* α in the grammar G, then α is a sentential form of G.
- Sentence
  - A sentence is a sentential form of G that has no nonterminals.
- Language
  - The language generated by G is its set of sentences.
  - The language generated by G is defined by L(G) = { w ∈ T* | S ⇒* w }
  - A language that can be generated by a context-free grammar is said to be a context-free language.
  - If two grammars generate the same language, the grammars are said to be equivalent.

Ambiguity
- A grammar that produces more than one parse tree for some sentence is said to be ambiguous.
- Example grammar
    E → E + E | E * E | ( E ) | id
- The sentence id + id * id has two distinct leftmost derivations
    E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
    E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
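One way to see the ambiguity concretely is to count the parse trees of a sentence. The sketch below is an illustration under stated assumptions rather than anything from the original slides: the class `AmbiguityCount` and its methods are hypothetical names, tokens are assumed to be separated by whitespace, and the counting is hard-wired to the grammar E → E + E | E * E | ( E ) | id.

```java
import java.util.Arrays;

class AmbiguityCount {
    // Number of distinct parse trees for the whole sentence (0 if it is not in the language).
    static long countTrees(String sentence) {
        String[] tok = sentence.trim().split("\\s+");
        long[][] memo = new long[tok.length + 1][tok.length + 1];
        for (long[] row : memo) Arrays.fill(row, -1);
        return count(tok, 0, tok.length, memo);
    }

    // Number of parse trees with root E whose yield is tok[i..j).
    private static long count(String[] tok, int i, int j, long[][] memo) {
        if (i >= j) return 0;
        if (memo[i][j] >= 0) return memo[i][j];
        long n = 0;
        if (j - i == 1 && tok[i].equals("id")) {
            n = 1;                                            // E -> id
        }
        if (j - i >= 3 && tok[i].equals("(") && tok[j - 1].equals(")")) {
            n += count(tok, i + 1, j - 1, memo);              // E -> ( E )
        }
        for (int k = i + 1; k < j - 1; k++) {                 // E -> E + E | E * E
            if (tok[k].equals("+") || tok[k].equals("*")) {
                n += count(tok, i, k, memo) * count(tok, k + 1, j, memo);
            }
        }
        memo[i][j] = n;
        return n;
    }

    public static void main(String[] args) {
        System.out.println(countTrees("id + id * id"));
    }
}
```

For `id + id * id` the count is 2, one tree per choice of root operator, which matches the two leftmost derivations on the slide. A grammar is unambiguous exactly when this count never exceeds 1 for any sentence.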

Example
- Consider the following context-free grammar G = (T, N, P, S), with T = {+, -, 0, 1, ..., 9}, N = {string}, S = string, and
    P = string → string + string | string - string | 0 | 1 | ... | 9
- This grammar is ambiguous, because more than one parse tree represents the string 9-5+2.

Example
[Figure: two parse trees for the string 9-5+2, one grouping it as (9-5)+2 and the other as 9-(5+2).]

Ambiguity
- Dangling-else grammar
    stmt → if expr then stmt
         | if expr then stmt else stmt
         | other
- Example
    if E1 then S1 else if E2 then S2 else S3

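Eliminating Ambiguity (2)
- if E1 then if E2 then S1 else S2
- This statement has two parse trees under the dangling-else grammar, because the else can attach to either if.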
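The usual way to resolve the dangling else is to match each else with the closest unmatched then. The following sketch shows that rule in a small recursive-descent parser. It is an illustration only: the class `DanglingElse` and its methods are hypothetical names, expressions and "other" statements are assumed to be single tokens such as E1 or S1, and the result is a parenthesized string that makes the chosen grouping visible.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class DanglingElse {
    private static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList("if", "then", "else"));
    private final String[] tok;
    private int pos = 0;

    DanglingElse(String input) { tok = input.trim().split("\\s+"); }

    private boolean at(String t) { return pos < tok.length && tok[pos].equals(t); }

    private String next() {
        if (pos >= tok.length) throw new IllegalStateException("unexpected end of input");
        return tok[pos++];
    }

    private void expect(String t) {
        String got = next();
        if (!got.equals(t)) throw new IllegalStateException("expected " + t + " but saw " + got);
    }

    // A single non-keyword token standing for an expression or an "other" statement.
    private String atom() {
        String t = next();
        if (KEYWORDS.contains(t)) throw new IllegalStateException("unexpected keyword " + t);
        return t;
    }

    // stmt -> if expr then stmt [ else stmt ] | other
    String stmt() {
        if (!at("if")) return atom();
        expect("if");
        String cond = atom();
        expect("then");
        String thenPart = stmt();
        if (at("else")) {   // the innermost open "if" claims the else first
            expect("else");
            return "(if " + cond + " then " + thenPart + " else " + stmt() + ")";
        }
        return "(if " + cond + " then " + thenPart + ")";
    }

    static String parse(String input) {
        DanglingElse p = new DanglingElse(input);
        String tree = p.stmt();
        if (p.pos != p.tok.length) throw new IllegalStateException("trailing input");
        return tree;
    }

    public static void main(String[] args) {
        System.out.println(parse("if E1 then if E2 then S1 else S2"));
    }
}
```

Because the recursive call for the inner statement returns only after it has looked for an else, the else in `if E1 then if E2 then S1 else S2` is grouped with the inner if, giving `(if E1 then (if E2 then S1 else S2))`.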

Parsing
- The process of determining whether a string of terminals (tokens) can be generated by a grammar.
- Time complexity
  - For any CFG there is a parser that takes at most O(n^3) time to parse a string of n terminals.
  - Linear-time algorithms suffice to parse essentially all languages that arise in practice.
- Two kinds of methods
  - Top-down: constructs a parse tree from root to leaves
  - Bottom-up: constructs a parse tree from leaves to root

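Two Approaches to Syntax Analysis
- Top-down parsing
  - Leftmost derivation
  - No left recursion
  - No common left factors
  - Unambiguous grammar
- Bottom-up parsing
  - Rightmost derivation (in reverse)
  - No right recursion
  - No common right factors
  - Unambiguous grammar
[Figure: nested grammar classes, from outermost to innermost: CFG, LR(1), LL(1), RG.]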

Notational Conventions
- Terminals
  - a, b, c, … ∈ T
  - example: 0, 1, +, *, id, if
- Nonterminals
  - A, B, C, … ∈ N
  - example: expr, term, stmt
- Grammar symbols
  - X, Y, Z ∈ (N ∪ T)
- Strings of terminals
  - u, v, w, x, y, z ∈ T*
- Strings of grammar symbols (sentential forms)
  - α, β, γ ∈ (N ∪ T)*
- The head of the first production is the start symbol, unless stated otherwise.

Top-down Parsing
- Recursive-descent parsing
- LL(1)
  - Left-to-right scan, Leftmost derivation
  - Creates the nodes of the parse tree in preorder (depth-first)
- Grammar
    E → T + T
    T → ( E )
    T → - E
    T → id
- Leftmost derivation
    E ⇒lm T + T ⇒lm id + T ⇒lm id + id
[Figure: the parse tree for id + id growing step by step from the root E.]

Recursive Descent Parsing
- Every nonterminal has one (recursive) procedure responsible for parsing the nonterminal's syntactic category of input tokens.
- When a nonterminal has multiple productions, each production is implemented in a branch of a selection statement based on input lookahead information.

Recursive Descent Parsing (pseudocode for a typical procedure)

    void A() {
        Choose an A-production, A → X1 X2 … Xk;
        for (i = 1 to k) {
            if (Xi is a nonterminal)
                call procedure Xi();
            else if (Xi = current input symbol a)
                advance the input to the next symbol;
            else
                /* an error has occurred */;
        }
    }

Conclusion: Parsing and Translation Scheme
- Complete program

    import java.io.*;

    class Parser {
        // The current input character, shared by all parsing procedures.
        static int lookahead;

        public Parser() throws IOException {
            lookahead = System.in.read();
        }

        // expr -> term { (+ | -) term }, emitting each operator after its operands.
        void expr() throws IOException {
            term();
            while (true) {
                if (lookahead == '+') {
                    match('+'); term(); System.out.write('+');
                } else if (lookahead == '-') {
                    match('-'); term(); System.out.write('-');
                } else {
                    return;
                }
            }
        }

        // term -> digit, echoing the digit to the output.
        void term() throws IOException {
            if (Character.isDigit((char) lookahead)) {
                System.out.write((char) lookahead);
                match(lookahead);
            } else {
                throw new Error("syntax error");
            }
        }

        // Consume the expected character and read the next one.
        void match(int t) throws IOException {
            if (lookahead == t) lookahead = System.in.read();
            else throw new Error("syntax error");
        }
    }

LL(1) Grammar
- Predictive parsers, that is, recursive-descent parsers needing no backtracking, can be constructed for a class of grammars called LL(1).
  - The first "L" means the input is scanned from left to right.
  - The second "L" means a leftmost derivation is produced.
  - The "1" means one input symbol of lookahead is used at each step to make parsing action decisions.
- An LL(1) grammar is not left-recursive.
- An LL(1) grammar is not ambiguous.

FIRST and FOLLOW
[Figure: a parse tree rooted at S whose frontier is α A a β, with the subtree under A deriving c γ.]
- c is in FIRST(A)
- a is in FOLLOW(A)

FIRST and FOLLOW
- The construction of both top-down and bottom-up parsers is aided by two functions, FIRST and FOLLOW, associated with a grammar G.
- During top-down parsing, FIRST and FOLLOW allow us to choose which production to apply.
- During panic-mode error recovery, sets of tokens produced by FOLLOW can be used as synchronizing tokens.

FIRST
- FIRST(α)
  - The set of terminals that begin all strings derived from α
  - FIRST(a) = { a } if a ∈ T
  - FIRST(ε) = { ε }
  - FIRST(A) = ∪ FIRST(α), taken over all productions A → α ∈ P
  - FIRST(X1 X2 … Xk):
      if ε ∈ FIRST(Xj) for all j = 1, …, i-1, then add the non-ε elements of FIRST(Xi) to FIRST(X1 X2 … Xk)
      if ε ∈ FIRST(Xj) for all j = 1, …, k, then add ε to FIRST(X1 X2 … Xk)

FIRST (1)
- By the definition of FIRST, we can compute FIRST(X) as follows:
  - If X ∈ T, then FIRST(X) = { X }.
  - If X ∈ N and X → ε, then add ε to FIRST(X).
  - If X ∈ N and X → Y1 Y2 ... Yn, then add all non-ε elements of FIRST(Y1) to FIRST(X); if ε ∈ FIRST(Y1), then add all non-ε elements of FIRST(Y2) to FIRST(X); and so on. If ε is in every one of FIRST(Y1), ..., FIRST(Yn), then add ε to FIRST(X).
- Repeat these rules until no FIRST set changes.
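The rules above amount to a fixed-point iteration, which can be sketched as follows. This is an illustrative implementation, not code from the original deck: the class `FirstSets` and its methods are hypothetical names, an empty production body stands for ε, any symbol without productions is treated as a terminal, and the grammar is the expression grammar E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | id.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class FirstSets {
    static final String EPS = "\u03B5";   // the symbol ε, written as an escape to avoid encoding issues

    static List<String> body(String... symbols) { return Arrays.asList(symbols); }

    // Productions keyed by head; body() with no arguments is the ε-production.
    static Map<String, List<List<String>>> grammar() {
        Map<String, List<List<String>>> g = new LinkedHashMap<>();
        g.put("E",  Arrays.asList(body("T", "E'")));
        g.put("E'", Arrays.asList(body("+", "T", "E'"), body()));
        g.put("T",  Arrays.asList(body("F", "T'")));
        g.put("T'", Arrays.asList(body("*", "F", "T'"), body()));
        g.put("F",  Arrays.asList(body("(", "E", ")"), body("id")));
        return g;
    }

    // FIRST of a string X1 X2 ... Xk, given the current FIRST sets of the nonterminals.
    static Set<String> firstOf(List<String> symbols, Map<String, Set<String>> first) {
        Set<String> out = new TreeSet<>();
        for (String x : symbols) {
            // A symbol with no FIRST entry is a terminal: FIRST(a) = { a }.
            Set<String> fx = first.containsKey(x) ? first.get(x) : Collections.singleton(x);
            for (String s : fx) {
                if (!s.equals(EPS)) out.add(s);
            }
            if (!fx.contains(EPS)) return out;   // Xi cannot vanish, so later symbols are unreachable
        }
        out.add(EPS);                            // every Xi can derive ε (or the string is empty)
        return out;
    }

    // Repeat the rules until no FIRST set grows any further.
    static Map<String, Set<String>> compute(Map<String, List<List<String>>> g) {
        Map<String, Set<String>> first = new LinkedHashMap<>();
        for (String a : g.keySet()) first.put(a, new TreeSet<>());
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<String, List<List<String>>> e : g.entrySet()) {
                for (List<String> b : e.getValue()) {
                    if (first.get(e.getKey()).addAll(firstOf(b, first))) changed = true;
                }
            }
        }
        return first;
    }

    public static void main(String[] args) {
        System.out.println(compute(grammar()));
    }
}
```

For this grammar, `compute(grammar())` gives FIRST(E) = FIRST(T) = FIRST(F) = { (, id }, FIRST(E') = { +, ε } and FIRST(T') = { *, ε }.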

FOLLOW
- FOLLOW(A)
  - The set of terminals that can immediately follow nonterminal A in some sentential form
  - FOLLOW(A) =
      for all (B → α A β) ∈ P do add FIRST(β) - { ε } to FOLLOW(A)
      for all (B → α A β) ∈ P with ε ∈ FIRST(β) do add FOLLOW(B) to FOLLOW(A)
      for all (B → α A) ∈ P do add FOLLOW(B) to FOLLOW(A)
      if A is the start symbol S then add $ to FOLLOW(A)

FOLLOW (1)
- By the definition of FOLLOW, we can compute FOLLOW(X) as follows:
  - Put $ into FOLLOW(S).
  - For each A → α B β, add all non-ε elements of FIRST(β) to FOLLOW(B).
  - For each A → α B, or A → α B β where ε ∈ FIRST(β), add all of FOLLOW(A) to FOLLOW(B).
- Repeat these rules until no FOLLOW set changes.
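The three FOLLOW rules can be sketched as a second fixed-point loop on top of FIRST. As before, this is a hedged illustration rather than the deck's own code: `FollowSets` and its methods are hypothetical names, an empty body stands for ε, symbols without productions count as terminals, and the grammar is the expression grammar E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | id with start symbol E.

```java
import java.util.*;

class FollowSets {
    static final String EPS = "\u03B5";   // the symbol ε
    static final String END = "$";        // end-of-input marker

    static List<String> body(String... symbols) { return Arrays.asList(symbols); }

    static Map<String, List<List<String>>> grammar() {
        Map<String, List<List<String>>> g = new LinkedHashMap<>();
        g.put("E",  Arrays.asList(body("T", "E'")));
        g.put("E'", Arrays.asList(body("+", "T", "E'"), body()));
        g.put("T",  Arrays.asList(body("F", "T'")));
        g.put("T'", Arrays.asList(body("*", "F", "T'"), body()));
        g.put("F",  Arrays.asList(body("(", "E", ")"), body("id")));
        return g;
    }

    // FIRST of a symbol string; the empty string yields { ε }.
    static Set<String> firstOf(List<String> symbols, Map<String, Set<String>> first) {
        Set<String> out = new TreeSet<>();
        for (String x : symbols) {
            Set<String> fx = first.containsKey(x) ? first.get(x) : Collections.singleton(x);
            for (String s : fx) if (!s.equals(EPS)) out.add(s);
            if (!fx.contains(EPS)) return out;
        }
        out.add(EPS);
        return out;
    }

    static Map<String, Set<String>> first(Map<String, List<List<String>>> g) {
        Map<String, Set<String>> first = new LinkedHashMap<>();
        for (String a : g.keySet()) first.put(a, new TreeSet<>());
        for (boolean changed = true; changed; ) {
            changed = false;
            for (Map.Entry<String, List<List<String>>> e : g.entrySet())
                for (List<String> b : e.getValue())
                    if (first.get(e.getKey()).addAll(firstOf(b, first))) changed = true;
        }
        return first;
    }

    static Map<String, Set<String>> follow(Map<String, List<List<String>>> g, String start) {
        Map<String, Set<String>> first = first(g);
        Map<String, Set<String>> follow = new LinkedHashMap<>();
        for (String a : g.keySet()) follow.put(a, new TreeSet<>());
        follow.get(start).add(END);                                  // rule 1: $ follows the start symbol
        for (boolean changed = true; changed; ) {
            changed = false;
            for (Map.Entry<String, List<List<String>>> e : g.entrySet()) {
                for (List<String> b : e.getValue()) {
                    for (int i = 0; i < b.size(); i++) {
                        String sym = b.get(i);
                        if (!g.containsKey(sym)) continue;           // terminals have no FOLLOW set
                        Set<String> rest = firstOf(b.subList(i + 1, b.size()), first);
                        boolean restNullable = rest.remove(EPS);
                        if (follow.get(sym).addAll(rest)) changed = true;           // rule 2
                        List<String> headFollow = new ArrayList<>(follow.get(e.getKey()));
                        if (restNullable && follow.get(sym).addAll(headFollow)) changed = true; // rule 3
                    }
                }
            }
        }
        return follow;
    }

    public static void main(String[] args) {
        System.out.println(follow(grammar(), "E"));
    }
}
```

With start symbol E this yields FOLLOW(E) = FOLLOW(E') = { $, ) }, FOLLOW(T) = FOLLOW(T') = { +, $, ) } and FOLLOW(F) = { *, +, $, ) }.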

Example
- Given the grammar G
    E  → T E'
    E' → + T E' | ε
    T  → F T'
    T' → * F T' | ε
    F  → ( E ) | id
- FIRST and FOLLOW sets

    Nonterminal   FIRST    FOLLOW
    E             ( id     $ )
    E'            + ε      $ )
    T             ( id     + $ )
    T'            * ε      + $ )
    F             ( id     * + $ )

Using FIRST and FOLLOW to Write a Recursive Descent Parser
- Grammar
    expr → term rest
    rest → + term rest | - term rest | ε
    term → id
- FIRST(+ term rest) = { + }
- FIRST(- term rest) = { - }
- FOLLOW(rest) = { $ }
- Procedure for rest (pseudocode)

    rest() {
        if (lookahead in FIRST(+ term rest)) {
            match('+'); term(); rest();
        } else if (lookahead in FIRST(- term rest)) {
            match('-'); term(); rest();
        } else if (lookahead in FOLLOW(rest)) {
            return;
        } else {
            error();
        }
    }
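A runnable version of this scheme might look like the sketch below. It is an assumption-laden illustration, not the deck's own program: the class `PredictiveParser` and the helper `accepts` are hypothetical names, tokens are assumed to be separated by whitespace, and `$` is appended internally as the end-of-input marker.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class PredictiveParser {
    // FOLLOW(rest) = { $ }, as computed on the slide.
    private static final Set<String> FOLLOW_REST = new HashSet<>(Arrays.asList("$"));

    private final String[] tokens;   // the input tokens followed by the end marker "$"
    private int pos = 0;

    PredictiveParser(String input) {
        String trimmed = input.trim();
        String[] words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        tokens = Arrays.copyOf(words, words.length + 1);
        tokens[words.length] = "$";
    }

    private String lookahead() { return tokens[pos]; }

    private void match(String t) {
        if (!lookahead().equals(t)) {
            throw new IllegalStateException("expected " + t + " but saw " + lookahead());
        }
        pos++;
    }

    // expr -> term rest
    void expr() { term(); rest(); }

    // rest -> + term rest | - term rest | ε
    void rest() {
        if (lookahead().equals("+")) {                    // lookahead in FIRST(+ term rest)
            match("+"); term(); rest();
        } else if (lookahead().equals("-")) {             // lookahead in FIRST(- term rest)
            match("-"); term(); rest();
        } else if (FOLLOW_REST.contains(lookahead())) {   // choose rest -> ε
            return;
        } else {
            throw new IllegalStateException("syntax error at " + lookahead());
        }
    }

    // term -> id
    void term() { match("id"); }

    // True if the whole input is derivable from expr.
    static boolean accepts(String input) {
        try {
            PredictiveParser p = new PredictiveParser(input);
            p.expr();
            return p.pos == p.tokens.length - 1;          // stopped exactly at the end marker
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("id + id - id"));
    }
}
```

The ε-production is chosen only when the lookahead is in FOLLOW(rest), so an input such as `id id` is rejected inside `rest` instead of being silently accepted as a prefix.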
