Syntax Analyzer (Parser)

Presentation on theme: "Syntax Analyzer (Parser)"— Presentation transcript:

1 Syntax Analyzer (Parser)

2 Lexical Analyzer and Parser

3 Parser
Accepts a string of tokens from the lexical analyzer (one token at a time)
Verifies whether or not the string can be generated by the grammar
Reports syntax errors (recovers if possible)

4 Errors
Lexical errors (e.g. misspelled word)
Syntax errors (e.g. unbalanced parentheses, missing semicolon)
Semantic errors (e.g. type errors)
Logical errors (e.g. infinite loop)

5 Error Handling
Report errors clearly and accurately
Recover quickly if possible
Poor error recovery may lead to a cascade of spurious errors
Source code that does not handle errors is very poor programming style

6 Context Free Grammars
CFGs can represent recursive constructs that regular expressions cannot
A CFG consists of:
- A finite set of terminals (in our case, this will be the set of tokens)
- A finite set of non-terminals (syntactic variables)
- A finite set of production rules of the form A → α, where A is a non-terminal and α is a string of terminals and non-terminals (possibly the empty string)
- A start symbol (one of the non-terminal symbols)

7 Derivations (Part 1)
One definition of language: the set of strings that have valid parse trees
Another definition: the set of strings that can be derived from the start symbol
E → E + E | E * E | (E) | -E | id
E => -E (read: E derives -E)
E => -E => -(E) => -(id) (read: E derives -(id))

8 Derivations (Part 2)
αAβ => αγβ if A → γ is a production and α and β are arbitrary strings of grammar symbols
If α1 => α2 => … => αn, we say α1 derives αn
=> means derives in one step
*=> means derives in zero or more steps
+=> means derives in one or more steps

9 Sentences and Languages
Let L(G) be the language generated by the grammar G with start symbol S:
- Strings in L(G) may contain only tokens of G
- A string w is in L(G) if and only if S +=> w
- Such a string w is a sentence of G
Any language that can be generated by a CFG is said to be a context-free language
If two grammars generate the same language, they are said to be equivalent

10 Sentential Forms and Sentences
Let S *=> α
- If α contains non-terminals, it is called a sentential form of G.
- If α does not contain non-terminals, it is called a sentence of G.

11 Leftmost and Rightmost Derivations
Leftmost derivations -- only the leftmost non-terminal in any sentential form is replaced at each step
If α derives β by a leftmost derivation, then we write α lm*=> β
Rightmost derivations -- only the rightmost non-terminal in any sentential form is replaced at each step
If α derives β by a rightmost derivation, then we write α rm*=> β

12 Example
Left-Most Derivation
E lm=> -E lm=> -(E) lm=> -(E+E) lm=> -(id+E) lm=> -(id+id)
Right-Most Derivation
E rm=> -E rm=> -(E) rm=> -(E+E) rm=> -(E+id) rm=> -(id+id)
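The two derivations on this slide can be replayed mechanically. The sketch below is only an illustration: the `GRAMMAR` dictionary, the `derive` helper, and the idea of selecting productions by index are assumptions introduced here, not part of the slides.

```python
# Hypothetical encoding of the slide grammar: E -> E + E | E * E | ( E ) | - E | id
GRAMMAR = {
    "E": [
        ["E", "+", "E"],  # index 0
        ["E", "*", "E"],  # index 1
        ["(", "E", ")"],  # index 2
        ["-", "E"],       # index 3
        ["id"],           # index 4
    ]
}


def derive(start, choices, leftmost=True):
    """Replace the leftmost (or rightmost) non-terminal once per step.

    `choices` gives, for each step, the index of the production applied to
    the non-terminal being replaced. Returns every sentential form visited.
    """
    form = [start]
    steps = ["".join(form)]
    for choice in choices:
        positions = [i for i, sym in enumerate(form) if sym in GRAMMAR]
        if not positions:
            raise ValueError("no non-terminal left to replace")
        pos = positions[0] if leftmost else positions[-1]
        form = form[:pos] + GRAMMAR[form[pos]][choice] + form[pos + 1:]
        steps.append("".join(form))
    return steps


# Same production choices, different replacement order.
choices = [3, 2, 0, 4, 4]
print(" => ".join(derive("E", choices, leftmost=True)))
print(" => ".join(derive("E", choices, leftmost=False)))
```

Both calls end at -(id+id); they differ only in the fourth sentential form, -(id+E) for the leftmost derivation and -(E+id) for the rightmost one.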

13 Parse Tree
A parse tree can be seen as a graphical representation of a derivation.
- Inner nodes are non-terminal symbols.
- The leaves of a parse tree are terminal symbols.
E => -E => -(E) => -(E+E) => -(id+E) => -(id+id)
[Figure: the parse tree grown one step at a time for the derivation above]

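14 Ambiguity
A grammar that produces more than one parse tree for some sentence is called an ambiguous grammar.
E => E+E => id+E => id+E*E => id+id*E => id+id*id
[Figure: parse tree with + at the root and id*id as its right subtree]
E => E*E => E+E*E => id+E*E => id+id*E => id+id*id
[Figure: parse tree with * at the root and id+id as its left subtree]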
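One way to see the ambiguity concretely is to count the parse trees of a token string. The function below is a minimal sketch for the grammar E → E + E | E * E | (E) | -E | id only; `count_parse_trees` is a hypothetical helper name, and the span-based counting is an illustration rather than a parsing method taken from the slides.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees deriving `tokens` from E in the ambiguous grammar
    E -> E + E | E * E | ( E ) | - E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of parse trees with root E whose leaves are tokens[i:j].
        if i >= j:
            return 0
        total = 0
        if j - i == 1 and tokens[i] == "id":          # E -> id
            total += 1
        if tokens[i] == "-":                          # E -> - E
            total += count(i + 1, j)
        if tokens[i] == "(" and tokens[j - 1] == ")":  # E -> ( E )
            total += count(i + 1, j - 1)
        for k in range(i + 1, j - 1):                 # E -> E + E | E * E
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees("id + id * id".split()))  # prints 2
```

A result greater than 1 for any sentence means the grammar is ambiguous; here the two trees correspond to the two derivations of id+id*id.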

15 Ambiguity (cont.)
For most parsers, the grammar must be unambiguous.
An unambiguous grammar gives a unique parse tree for each sentence.
We should eliminate the ambiguity in the grammar during the design phase of the compiler.
To disambiguate a grammar, we choose one of the parse trees of an ambiguous sentence and rewrite the grammar so that only this choice remains.

16 Ambiguity (cont.)
stmt → if expr then stmt | if expr then stmt else stmt | otherstmts
The statement if E1 then if E2 then S1 else S2 has two parse trees:
Parse tree 1: the else belongs to the outer if, i.e. if E1 then (if E2 then S1) else S2
Parse tree 2: the else belongs to the inner if, i.e. if E1 then (if E2 then S1 else S2)

17 Ambiguity (cont.)
We prefer the second parse tree (else matches with the closest if)
So, we have to disambiguate our grammar to reflect this choice.
The unambiguous grammar will be:
stmt → matchedstmt | unmatchedstmt
matchedstmt → if expr then matchedstmt else matchedstmt | otherstmts
unmatchedstmt → if expr then stmt | if expr then matchedstmt else unmatchedstmt

18 Ambiguity -- Operator Precedence
Ambiguous grammars (because of ambiguous operators) can be disambiguated according to precedence and associativity rules.
E → E+E | E*E | E^E | id | (E)
Precedence, from highest to lowest:
^ (right to left)
* (left to right)
+ (left to right)
Disambiguated grammar:
E → E+T | T
T → T*F | F
F → G^F | G
G → id | (E)
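The effect of the layered grammar can be checked by building a tree for a few expressions. This is a rough sketch under stated assumptions: `parse` is a hypothetical helper, trees are plain nested tuples, "$" is an assumed end marker, and the left-recursive rules E → E+T and T → T*F are implemented with loops (a substitution made here so that the code terminates), while F → G^F uses recursion to obtain right associativity.

```python
def parse(tokens):
    """Build a nested-tuple tree following E -> E+T | T, T -> T*F | F,
    F -> G^F | G, G -> id | (E). Any non-operator token counts as an id."""
    tokens = list(tokens) + ["$"]  # "$" is an assumed end-of-input marker
    pos = 0

    def peek():
        return tokens[pos]

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_E():  # left-associative +, written as a loop
        node = parse_T()
        while peek() == "+":
            advance()
            node = ("+", node, parse_T())
        return node

    def parse_T():  # left-associative *, written as a loop
        node = parse_F()
        while peek() == "*":
            advance()
            node = ("*", node, parse_F())
        return node

    def parse_F():  # right-associative ^, written as right recursion
        node = parse_G()
        if peek() == "^":
            advance()
            return ("^", node, parse_F())
        return node

    def parse_G():
        tok = advance()
        if tok == "(":
            node = parse_E()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok in ("$", ")", "+", "*", "^"):
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok

    tree = parse_E()
    if peek() != "$":
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse("a + b * c".split()))
print(parse("a ^ b ^ c".split()))
```

In the first tree * sits below +, and in the second the right-hand ^ is grouped first, which is what the precedence table on the slide asks for.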

19 Left Recursion
A grammar is left-recursive if it has a non-terminal A such that there is a derivation A +=> Aα for some string α
Most top-down parsing methods cannot handle left-recursive grammars

20 Eliminate Left-Recursion
A → Aα | β where β does not start with A
Eliminate left recursion:
A → βA’
A’ → αA’ | ε
(an equivalent grammar)
In general,
A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn
becomes
A → β1A’ | β2A’ | … | βnA’
A’ → α1A’ | α2A’ | … | αmA’ | ε
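The general rewrite rule can be expressed as a small function. This is a minimal sketch: representing productions as tuples of symbols, using the empty tuple for ε, and the name `eliminate_immediate` are all assumptions made for illustration. It also assumes there is no production A → A.

```python
EPSILON = ()  # assumed encoding: the empty string is the empty tuple


def eliminate_immediate(nonterminal, productions):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn as
    A -> b1 A' | ... | bn A'  and  A' -> a1 A' | ... | am A' | epsilon."""
    new_name = nonterminal + "'"
    # alphas: tails of the left-recursive productions A -> A alpha
    alphas = [p[1:] for p in productions if p and p[0] == nonterminal]
    # betas: productions that do not start with A
    betas = [p for p in productions if not p or p[0] != nonterminal]
    if not alphas:
        return {nonterminal: list(productions)}  # nothing to eliminate
    return {
        nonterminal: [beta + (new_name,) for beta in betas],
        new_name: [alpha + (new_name,) for alpha in alphas] + [EPSILON],
    }


# E -> E + T | T
print(eliminate_immediate("E", [("E", "+", "T"), ("T",)]))
```

For E → E+T | T this yields E → T E' and E' → + T E' | ε.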

21 Eliminate Left-Recursion -- Example
E → E+T | T
T → T*F | F
F → id | (E)
Eliminate left recursion:
E → T E’
E’ → +T E’ | ε
T → F T’
T’ → *F T’ | ε
F → id | (E)
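Once left recursion is removed, each non-terminal maps directly to one function of a recursive-descent recognizer. The class below is a sketch, not the presentation's parser: the `Parser` and `accepts` names and the "$" end marker are assumptions. It only accepts or rejects a token list and builds no tree.

```python
class Parser:
    """Recursive-descent recognizer for
    E -> T E',  E' -> + T E' | epsilon,  T -> F T',  T' -> * F T' | epsilon,
    F -> id | ( E )."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" is an assumed end marker
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.peek() != expected:
            raise SyntaxError(
                f"expected {expected!r}, found {self.peek()!r} at token {self.pos}"
            )
        self.pos += 1

    def parse_E(self):        # E -> T E'
        self.parse_T()
        self.parse_E_prime()

    def parse_E_prime(self):  # E' -> + T E' | epsilon
        if self.peek() == "+":
            self.match("+")
            self.parse_T()
            self.parse_E_prime()

    def parse_T(self):        # T -> F T'
        self.parse_F()
        self.parse_T_prime()

    def parse_T_prime(self):  # T' -> * F T' | epsilon
        if self.peek() == "*":
            self.match("*")
            self.parse_F()
            self.parse_T_prime()

    def parse_F(self):        # F -> id | ( E )
        if self.peek() == "id":
            self.match("id")
        elif self.peek() == "(":
            self.match("(")
            self.parse_E()
            self.match(")")
        else:
            raise SyntaxError(f"unexpected {self.peek()!r} at token {self.pos}")


def accepts(tokens):
    """Return True if the whole token list is derivable from E."""
    parser = Parser(tokens)
    try:
        parser.parse_E()
        parser.match("$")
        return True
    except SyntaxError:
        return False


print(accepts("id + id * id".split()))
print(accepts("id + * id".split()))
```

With the original rule E → E+T, `parse_E` would call itself before consuming any token and never terminate, which is why the transformation is needed for this style of parser.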

22 Left-Recursion -- Problem
S → Aa | b
A → Sc | d
This grammar is not immediately left-recursive, but it is still left-recursive (S => Aa => Sca).
So, we have to eliminate all left recursion from our grammar by following a systematic algorithm.

23 Eliminate Left-Recursion -- Algorithm
- Arrange non-terminals in some order: A1 ... An
- for i from 1 to n do {
    for j from 1 to i-1 do {
      replace each production Ai → Aj γ
      by Ai → α1 γ | ... | αk γ
      where Aj → α1 | ... | αk
    }
    eliminate immediate left-recursions among Ai productions
  }
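The algorithm above can be sketched in a few lines. The representation is an assumption made here (a dict from non-terminal to a list of tuple productions, with the empty tuple for ε), as is the function name `eliminate_left_recursion`. The sketch assumes the input grammar has no cycles and no ε-productions.

```python
def eliminate_left_recursion(grammar, order):
    """grammar: dict mapping each non-terminal to a list of tuple productions.
    order: the chosen ordering A1 ... An of the non-terminals."""
    result = {a: list(prods) for a, prods in grammar.items()}
    for i, a_i in enumerate(order):
        # Substitute earlier non-terminals Aj that appear first in Ai -> Aj gamma.
        for a_j in order[:i]:
            expanded = []
            for prod in result[a_i]:
                if prod and prod[0] == a_j:
                    expanded.extend(delta + prod[1:] for delta in result[a_j])
                else:
                    expanded.append(prod)
            result[a_i] = expanded
        # Eliminate immediate left recursion among the Ai productions.
        alphas = [p[1:] for p in result[a_i] if p and p[0] == a_i]
        if alphas:
            betas = [p for p in result[a_i] if not p or p[0] != a_i]
            new_name = a_i + "'"
            result[a_i] = [beta + (new_name,) for beta in betas]
            result[new_name] = [alpha + (new_name,) for alpha in alphas] + [()]
    return result


# S -> Aa | b,  A -> Ac | Sd | f
grammar = {
    "S": [("A", "a"), ("b",)],
    "A": [("A", "c"), ("S", "d"), ("f",)],
}
print(eliminate_left_recursion(grammar, ["S", "A"]))
print(eliminate_left_recursion(grammar, ["A", "S"]))
```

The two calls use the same grammar with different orderings and produce different, though equivalent, grammars, so the choice of order matters for the shape of the result.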

24 Eliminate Left-Recursion -- Example
S → Aa | b
A → Ac | Sd | f
(order of non-terminals: S, A)
- Replace A → Sd with A → Aad | bd
  So, we will have A → Ac | Aad | bd | f
- Eliminate the immediate left-recursion in A
  A → bdA’ | fA’
  A’ → cA’ | adA’ | ε
So, the resulting equivalent grammar which is not left-recursive is:
S → Aa | b
A → bdA’ | fA’
A’ → cA’ | adA’ | ε

25 Eliminate Left-Recursion -- Example 2
S → Aa | b
A → Ac | Sd | f
(order of non-terminals: A, S)
- Eliminate the immediate left-recursion in A
  A → SdA’ | fA’
  A’ → cA’ | ε
- Replace S → Aa with S → SdA’a | fA’a
  So, we will have S → SdA’a | fA’a | b
- Eliminate the immediate left-recursion in S
  S → fA’aS’ | bS’
  S’ → dA’aS’ | ε
So, the resulting equivalent grammar which is not left-recursive is:
S → fA’aS’ | bS’
S’ → dA’aS’ | ε
A → SdA’ | fA’
A’ → cA’ | ε

26 Left Factoring
stmt → if expr then stmt else stmt | if expr then stmt
When we see if, we cannot know which production rule to choose to rewrite stmt in the derivation.
In general,
A → αβ1 | αβ2
is left-factored into
A → αA’
A’ → β1 | β2
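A single left-factoring pass can be sketched as follows. The helper names `common_prefix` and `left_factor_once`, the tuple encoding of productions, and the empty tuple for ε are assumptions for illustration. The sketch groups alternatives by their first symbol and factors out the longest prefix shared by each group; a full left-factoring procedure would repeat this until no two alternatives share a prefix.

```python
def common_prefix(sequences):
    """Longest prefix shared by all the given symbol tuples."""
    prefix = []
    for symbols in zip(*sequences):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return tuple(prefix)


def left_factor_once(nonterminal, productions):
    """Factor A -> alpha beta1 | alpha beta2 into A -> alpha A', A' -> beta1 | beta2."""
    groups = {}
    for prod in productions:
        key = prod[0] if prod else None  # group alternatives by first symbol
        groups.setdefault(key, []).append(prod)

    result = {nonterminal: []}
    primes = 0
    for key, group in groups.items():
        if key is None or len(group) == 1:
            result[nonterminal].extend(group)  # nothing to factor
            continue
        prefix = common_prefix(group)
        primes += 1
        new_name = nonterminal + "'" * primes
        result[nonterminal].append(prefix + (new_name,))
        # The empty tuple stands for epsilon when an alternative equals the prefix.
        result[new_name] = [prod[len(prefix):] for prod in group]
    return result


productions = [
    ("if", "expr", "then", "stmt", "else", "stmt"),
    ("if", "expr", "then", "stmt"),
]
print(left_factor_once("stmt", productions))
```

For the slide's example this gives stmt → if expr then stmt stmt' and stmt' → else stmt | ε, so the parser can commit to the common part and decide about the else only after seeing the next token.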

27 Limitations of CFGs
Some constructs in programming languages are not context-free.
Example: L1 = {wcw | w is in (a|b)*}
A CFG cannot verify repeated strings
Example: L2 = {a^n b^m c^n d^m | n ≥ 1 and m ≥ 1}
A CFG cannot verify repeated counts
Therefore, some checks are put off until semantic analysis

