1 Topic #4: Syntactic Analysis (Parsing) CSC 338 – Compiler Design and implementation Dr. Mohamed Ben Othman (1435-1436)

2 Lexical Analyzer and Parser

3 Parser
Accepts a string of tokens from the lexical analyzer (usually one token at a time)
Verifies whether or not the string can be generated by the grammar
Reports syntax errors (recovers if possible)

4 Errors
Lexical errors (e.g. misspelled word)
Syntax errors (e.g. unbalanced parentheses, missing semicolon)
Semantic errors (e.g. type errors)
Logical errors (e.g. infinite recursion)

5 Error Handling
Report errors clearly and accurately
Recover quickly if possible
Poor error recovery may lead to an avalanche of errors

6 Error Recovery
Panic mode: discard tokens one at a time until a synchronizing token is found
Phrase-level recovery: perform a local correction that allows parsing to continue
Error productions: augment the grammar to handle predicted, common errors
Global correction: use a complex algorithm to compute the least-cost sequence of changes leading to parseable code
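Panic mode can be sketched in a few lines of C. This is only an illustration: the token codes and the choice of synchronizing tokens (`;`, `}` and end of input) are assumptions, not something the slides specify.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token codes; a real lexer would define its own. */
enum token { TOK_ID, TOK_PLUS, TOK_SEMI, TOK_RBRACE, TOK_EOF };

/* Returns 1 if tok is a synchronizing token.
   Assumed set: ';', '}' and end of input. */
static int is_sync(enum token tok) {
    return tok == TOK_SEMI || tok == TOK_RBRACE || tok == TOK_EOF;
}

/* Panic-mode recovery: starting at index pos, discard tokens one at a
   time until a synchronizing token is found, and return its index.
   The token array must end with TOK_EOF so the loop terminates. */
size_t panic_skip(const enum token *toks, size_t pos) {
    assert(toks != NULL);
    while (!is_sync(toks[pos]))
        pos++;
    return pos;
}
```

After an error, the parser would resume at the returned index, typically at the construct that follows the synchronizing token.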

7 Context Free Grammars
CFGs can represent recursive constructs that regular expressions cannot
A CFG consists of:
– Tokens (terminals, symbols)
– Nonterminals (syntactic variables denoting sets of strings)
– Productions (rules specifying how terminals and nonterminals can combine to form strings)
– A start symbol (the set of strings it denotes is the language of the grammar)

8 Derivations (Part 1)
One definition of language: the set of strings that have valid parse trees
Another definition: the set of strings that can be derived from the start symbol
E → E + E | E * E | (E) | -E | id
E => -E (read: E derives -E)
E => -E => -(E) => -(id)

9 Derivations (Part 2)
αAβ => αγβ if A → γ is a production and α and β are arbitrary strings of grammar symbols
If α1 => α2 => … => αn, we say α1 derives αn
=> means derives in one step
=>* means derives in zero or more steps
=>+ means derives in one or more steps

10 Sentences and Languages
Let L(G) be the language generated by the grammar G with start symbol S:
– Strings in L(G) may contain only tokens of G
– A string w is in L(G) if and only if S =>+ w
– Such a string w is a sentence of G
Any language that can be generated by a CFG is said to be a context-free language
If two grammars generate the same language, they are said to be equivalent

11 Sentential Forms
If S =>* α, where α may contain nonterminals, we say that α is a sentential form of G
A sentence is a sentential form with no nonterminals

12 Leftmost Derivations
Only the leftmost nonterminal in any sentential form is replaced at each step
A leftmost step can be written as wAγ =>lm wδγ
– w consists of only terminals
– γ is a string of grammar symbols
If α derives β by a leftmost derivation, then we write α =>*lm β
If S =>*lm α then we say that α is a left-sentential form of the grammar
Analogous terms exist for rightmost derivations
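As a worked illustration (not taken from the slides), here is a leftmost derivation of -(id + id) using the expression grammar E → E + E | E * E | (E) | -E | id. At every step the leftmost nonterminal is the one that gets replaced:

```latex
\begin{align*}
E &\underset{lm}{\Rightarrow} -E \\
  &\underset{lm}{\Rightarrow} -(E) \\
  &\underset{lm}{\Rightarrow} -(E + E) \\
  &\underset{lm}{\Rightarrow} -(\mathbf{id} + E) \\
  &\underset{lm}{\Rightarrow} -(\mathbf{id} + \mathbf{id})
\end{align*}
```

Each intermediate string is a left-sentential form; only the last one is a sentence.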

13 Parse Trees
A parse tree can be viewed as a graphical representation of a derivation
Every parse tree has a unique leftmost derivation (a sentence, however, may have more than one)
An ambiguous grammar has:
– more than one parse tree for at least one sentence
– more than one leftmost derivation for at least one sentence
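To make the ambiguity concrete, consider the grammar E → E + E | E * E | (E) | -E | id and the sentence id + id * id. The sketch below (an added example, not from the slides) gives two different leftmost derivations of it, which correspond to two different parse trees:

```latex
\begin{align*}
\text{(1)}\quad E &\Rightarrow E + E \Rightarrow \mathbf{id} + E \Rightarrow \mathbf{id} + E * E
   \Rightarrow \mathbf{id} + \mathbf{id} * E \Rightarrow \mathbf{id} + \mathbf{id} * \mathbf{id} \\
\text{(2)}\quad E &\Rightarrow E * E \Rightarrow E + E * E \Rightarrow \mathbf{id} + E * E
   \Rightarrow \mathbf{id} + \mathbf{id} * E \Rightarrow \mathbf{id} + \mathbf{id} * \mathbf{id}
\end{align*}
```

Derivation (1) groups the sentence as id + (id * id), while derivation (2) groups it as (id + id) * id.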

14 Regular Expressions vs. CFGs
Every construct that can be described by an RE can also be described by a CFG
Why use REs at all?
– Lexical rules are simpler to describe this way
– REs are often easier to read
– More efficient lexical analyzers can be constructed

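15 Eliminating Ambiguity (1)
stmt → if expr then stmt | if expr then stmt else stmt | other
if E1 then if E2 then S1 else S2
This statement has two parse trees: the else can be matched with either if (the "dangling else" problem)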

16 Eliminating Ambiguity (2)

17 Eliminating Ambiguity (3)
stmt → matched | unmatched
matched → if expr then matched else matched | other
unmatched → if expr then stmt | if expr then matched else unmatched

18 Left Recursion
A grammar is left recursive if it has a nonterminal A such that there is a derivation A =>+ Aα for some string α
Most top-down parsing methods cannot handle left-recursive grammars

19 Eliminating Left Recursion (1)
A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn (where no βi begins with A)
is replaced by:
A → β1 A' | β2 A' | … | βn A'
A' → α1 A' | α2 A' | … | αm A' | ε
Harder case:
S → Aa | b
A → Ac | Sd | ε

20 Eliminating Left Recursion (2)
First arrange the nonterminals in some order A1, A2, …, An
Apply the following algorithm:
for i = 1 to n {
    for j = 1 to i-1 {
        replace each production of the form Ai → Aj γ
        by the productions Ai → δ1 γ | δ2 γ | … | δk γ,
        where Aj → δ1 | δ2 | … | δk are the current Aj productions
    }
    eliminate the immediate left recursion among the Ai productions
}
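As a worked sketch, the algorithm can be applied to the harder grammar S → Aa | b, A → Ac | Sd | ε, assuming the order S, A. The algorithm is only guaranteed to work for grammars without cycles or ε-productions; here the ε-production happens to cause no trouble.

```latex
\begin{align*}
&i = 1:\ S \to Aa \mid b \text{ has no immediate left recursion, so nothing changes.} \\
&i = 2,\ j = 1:\ \text{substitute the } S\text{-productions into } A \to Sd: \\
&\qquad A \to Ac \mid Aad \mid bd \mid \varepsilon \\
&\text{Eliminate the immediate left recursion among the } A\text{-productions:} \\
&\qquad S \to Aa \mid b \\
&\qquad A \to bdA' \mid A' \\
&\qquad A' \to cA' \mid adA' \mid \varepsilon
\end{align*}
```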

21 Left Factoring
Rewriting productions to delay decisions
Helpful for predictive parsing
Not guaranteed to remove ambiguity
A → αβ1 | αβ2
is replaced by:
A → αA'
A' → β1 | β2
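As an added illustration, left factoring the if-then-else grammar stmt → if expr then stmt | if expr then stmt else stmt | other takes α = "if expr then stmt", β1 = ε and β2 = "else stmt". The nonterminal name stmt' is introduced here only for the example.

```latex
\begin{align*}
\mathit{stmt}  &\to \textbf{if } \mathit{expr} \textbf{ then } \mathit{stmt}\ \mathit{stmt'} \mid \textbf{other} \\
\mathit{stmt'} &\to \textbf{else } \mathit{stmt} \mid \varepsilon
\end{align*}
```

The decision between the two if-forms is now delayed until after the inner stmt has been parsed, but the grammar is still ambiguous: the dangling else remains.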

22 Top Down Parsing
Can be viewed two ways:
– Attempt to find a leftmost derivation for the input string
– Attempt to create a parse tree, starting from the root, creating nodes in preorder
General form is recursive descent parsing
– May require backtracking
– Backtracking parsers are not used frequently because backtracking is rarely needed

23 Predictive Parsing
A special case of recursive-descent parsing that does not require backtracking
Must always know which production to use based on the current input symbol
Can often create an appropriate grammar by:
– removing left recursion
– left factoring the resulting grammar

24 FIRST
FIRST(α) is the set of all terminals that begin any string derived from α (plus ε if α =>* ε)
Computing FIRST:
– If X is a terminal, FIRST(X) = {X}
– If X → ε is a production, add ε to FIRST(X)
– If X is a nonterminal and X → Y1 Y2 … Yn is a production:
  For all terminals a, add a to FIRST(X) if a is a member of some FIRST(Yi) and ε is a member of all of FIRST(Y1), FIRST(Y2), …, FIRST(Yi-1)
  If ε is a member of all of FIRST(Y1), FIRST(Y2), …, FIRST(Yn), add ε to FIRST(X)

25 FOLLOW
FOLLOW(A), for any nonterminal A, is the set of terminals a that can appear immediately to the right of A in some sentential form
More formally, a is in FOLLOW(A) if and only if there exists a derivation of the form S =>* αAaβ
$ is in FOLLOW(A) if and only if there exists a derivation of the form S =>* αA

26 Computing FOLLOW
Place $ in FOLLOW(S)
If there is a production A → αBβ, then everything in FIRST(β) (except for ε) is in FOLLOW(B)
If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is also in FOLLOW(B)

27 Left Recursion Example
E → E + T | T
T → T * F | F
F → (E) | id
We can remove the left recursion:
E → T X
X → + T X | ε
T → F Y
Y → * F Y | ε
F → (E) | id

28 FIRST and FOLLOW Example
E → T X
X → + T X | ε
T → F Y
Y → * F Y | ε
F → (E) | id
First(E) = First(T) = First(F) = { (, id }
First(X) = { +, ε }
First(Y) = { *, ε }
Follow(E) = Follow(X) = { ), $ }
Follow(T) = Follow(Y) = { +, ), $ }
Follow(F) = { +, *, ), $ }
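The FIRST and FOLLOW rules can be checked mechanically. The following is a minimal sketch in C, assuming the grammar E → TX, X → +TX | ε, T → FY, Y → *FY | ε, F → (E) | id is hard-coded and that sets are stored as bitmasks. The symbol encoding and all function names are choices made for this example. Both sets are computed by repeating the rules until nothing changes.

```c
#include <assert.h>

/* Symbols: nonterminals first, then terminals. END stands for $,
   EPS for the empty string (it only ever appears inside sets). */
enum { E, X, T, Y, F, NUM_NT,
       PLUS = NUM_NT, STAR, LP, RP, ID, END, EPS };

#define BIT(s) (1u << (s))
#define MAX_RHS 3

struct prod { int lhs; int len; int rhs[MAX_RHS]; };

static const struct prod grammar[] = {
    { E, 2, { T, X } },
    { X, 3, { PLUS, T, X } }, { X, 0, { 0 } },   /* len 0 means ε */
    { T, 2, { F, Y } },
    { Y, 3, { STAR, F, Y } }, { Y, 0, { 0 } },
    { F, 3, { LP, E, RP } },  { F, 1, { ID } },
};
enum { NUM_PROD = sizeof grammar / sizeof grammar[0] };

unsigned first[NUM_NT], follow[NUM_NT];

/* FIRST of one symbol: a terminal's FIRST set is the terminal itself. */
static unsigned first_sym(int s) { return s < NUM_NT ? first[s] : BIT(s); }

/* FIRST of the string rhs[from..len-1]; includes EPS if every symbol
   in it can derive ε (in particular if the string is empty). */
static unsigned first_seq(const int *rhs, int from, int len) {
    unsigned set = 0;
    for (int i = from; i < len; i++) {
        unsigned f = first_sym(rhs[i]);
        set |= f & ~BIT(EPS);
        if (!(f & BIT(EPS))) return set;
    }
    return set | BIT(EPS);
}

void compute_sets(void) {
    int changed;
    assert(EPS < 32);  /* every symbol must fit in one bitmask */
    for (int a = 0; a < NUM_NT; a++) first[a] = follow[a] = 0;

    do {  /* FIRST: iterate to a fixed point */
        changed = 0;
        for (int p = 0; p < NUM_PROD; p++) {
            const struct prod *q = &grammar[p];
            unsigned next = first[q->lhs] | first_seq(q->rhs, 0, q->len);
            if (next != first[q->lhs]) { first[q->lhs] = next; changed = 1; }
        }
    } while (changed);

    follow[E] = BIT(END);  /* place $ in FOLLOW(start symbol) */
    do {  /* FOLLOW: iterate to a fixed point */
        changed = 0;
        for (int p = 0; p < NUM_PROD; p++) {
            const struct prod *q = &grammar[p];
            for (int i = 0; i < q->len; i++) {
                int b = q->rhs[i];
                if (b >= NUM_NT) continue;  /* only nonterminals have FOLLOW */
                unsigned rest = first_seq(q->rhs, i + 1, q->len);
                unsigned next = follow[b] | (rest & ~BIT(EPS));
                if (rest & BIT(EPS)) next |= follow[q->lhs];
                if (next != follow[b]) { follow[b] = next; changed = 1; }
            }
        }
    } while (changed);
}
```

After calling `compute_sets()`, a set such as `follow[F]` can be compared against `BIT(PLUS) | BIT(STAR) | BIT(RP) | BIT(END)` to confirm the hand-computed values.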

29 Parser
After removing the ambiguity and the left recursion, left factoring, and computing the FIRST and FOLLOW sets, we can write the parser as follows:
1- The parser's entry function is named after the start symbol.
2- For each nonterminal there is a function that takes the name of this nonterminal.
So if we have S → α:
void S() {
    T(α); // T is the transformation
}

30 T is defined as follows:
1- If "a" is a terminal with token a: T("a") = match(a)
2- For any nonterminal A: T(A) = A()
3- T(α1 α2 … αn) = T(α1); T(α2); …; T(αn);
4- T(α1 | α2 | … | αn) =
switch (lookahead) {
    case First(α1): T(α1); break;
    case First(α2): T(α2); break;
    …
    case First(αn): T(αn); break;
    default: error("syntax error");
}

31 5- S → α1 | α2 | … | αn | ε
switch (lookahead) {
    case First(α1): T(α1); break;
    case First(α2): T(α2); break;
    …
    case First(αn): T(αn); break;
    case Follow(S): break; // do nothing: S → ε
    default: error("syntax error");
}

32 Exercise
Write the parser of the last grammar with:
– plus is the token of '+'
– mult is the token of '*'
– closep is the token of ')'
– openp is the token of '('
– id is the token of identifiers

33 E() { T(); X(); }
// lookahead holds the current token (set once by lookahead = lexan() before E() is called);
// match(t) checks lookahead == t and then advances it.
// done is an assumed token name for the end of input ($).
X() {
    switch (lookahead) {
    case plus: match(plus); T(); X(); break;
    case closep: case done: break;             // Follow(X): X → ε
    default: error("syntax error");
    }
}
T() { F(); Y(); }
Y() {
    switch (lookahead) {
    case mult: match(mult); F(); Y(); break;
    case plus: case closep: case done: break;  // Follow(Y): Y → ε
    default: error("syntax error");
    }
}

34 F() {
    // lookahead already holds the current token; match() advances it
    switch (lookahead) {
    case openp: match(openp); E(); match(closep); break;
    case id: match(id); break;
    default: error("syntax error");
    }
}
