R.Rajkumar Asst.Professor CSE

Part 2: Lexical analyzer

Lexical analyzer
Lexical analysis, also called scanning, is the phase of compilation that reads the source program character by character. The higher-level parts of the compiler call the lexical analyzer with the request "get the next word from the input", and the scanner's job is to work through the input characters and find that word.
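The "get the next word" interface can be sketched as a small scanner. This is only an illustration: the token classes (NUMBER, ID, OP) and the names Scanner and next_token are assumptions made for the example, not something the slides define.

```python
import re

# Hypothetical token classes for a tiny expression language (an assumption).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/()=]"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


class Scanner:
    """Hands out one token per call, as a parser's 'get next token' request expects."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def next_token(self):
        # Consume characters until a non-whitespace token is found.
        while self.pos < len(self.text):
            m = MASTER.match(self.text, self.pos)
            if m is None:
                raise SyntaxError(
                    f"unexpected character {self.text[self.pos]!r} at {self.pos}")
            self.pos = m.end()
            if m.lastgroup != "SKIP":
                return (m.lastgroup, m.group())
        return ("EOF", "")
```

Each call returns a (token class, lexeme) pair, and ("EOF", "") once the input is exhausted.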

Tokens, Lexeme, Patterns

Regular Expressions
A regular expression (sometimes called a rational expression) is a sequence of characters that defines a search pattern, mainly for use in pattern matching with strings, i.e. "find and replace"-like operations.
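A brief sketch of both uses, matching and find-and-replace, with Python's standard re module. The helper names matches_abb and replace_numbers are made up for this example.

```python
import re

# The pattern (a|b)*abb: any string of a's and b's that ends in abb.
ABB = re.compile(r"(a|b)*abb")


def matches_abb(s):
    """True if the whole string s is matched by (a|b)*abb."""
    return ABB.fullmatch(s) is not None


def replace_numbers(s):
    """Find-and-replace: every run of digits becomes the letter N."""
    return re.sub(r"\d+", "N", s)
```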

Thompson Construction Algorithm for converting RE to NFA

Thompson’s Transition Diagram: An Example
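[Figure: NFA transition diagram for (a | b)*abb built by Thompson's construction, with a start arrow, ε-transitions, edges labelled a and b, and states numbered 1 to 10.]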
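A minimal sketch of a Thompson-style construction for (a | b)*abb follows. It makes two assumptions: concatenation links the two fragments with an extra ε-edge instead of merging states, so the state count and numbering will not match the diagram, and all names (Fragment, symbol, concat, union, star, accepts) are illustrative.

```python
from itertools import count

EPS = None          # label used for ε-transitions
_ids = count()      # source of fresh state numbers


class Fragment:
    """An NFA fragment with one start state and one accepting state."""

    def __init__(self, start, accept, edges):
        self.start, self.accept, self.edges = start, accept, edges


def symbol(c):
    s, f = next(_ids), next(_ids)
    return Fragment(s, f, [(s, c, f)])


def concat(a, b):
    # Simplification: join a and b with an ε-edge rather than merging states.
    return Fragment(a.start, b.accept,
                    a.edges + b.edges + [(a.accept, EPS, b.start)])


def union(a, b):
    s, f = next(_ids), next(_ids)
    extra = [(s, EPS, a.start), (s, EPS, b.start),
             (a.accept, EPS, f), (b.accept, EPS, f)]
    return Fragment(s, f, a.edges + b.edges + extra)


def star(a):
    s, f = next(_ids), next(_ids)
    extra = [(s, EPS, a.start), (s, EPS, f),
             (a.accept, EPS, a.start), (a.accept, EPS, f)]
    return Fragment(s, f, a.edges + extra)


def eps_closure(states, edges):
    """All states reachable from `states` using only ε-edges."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def accepts(nfa, text):
    """Simulate the NFA on `text` by tracking the set of current states."""
    current = eps_closure({nfa.start}, nfa.edges)
    for ch in text:
        moved = {dst for src, label, dst in nfa.edges
                 if src in current and label == ch}
        current = eps_closure(moved, nfa.edges)
    return nfa.accept in current


# (a | b)*abb assembled bottom-up from its sub-expressions.
NFA = concat(concat(concat(star(union(symbol("a"), symbol("b"))),
                           symbol("a")), symbol("b")), symbol("b"))
```

Calling accepts(NFA, "aababb") simulates the machine on the input without first converting it to a DFA.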

Relation of Lexical analyzer with parser
[Figure: the source program is read by the lexical analyzer; the parser calls Nexttoken() and the lexical analyzer returns a token; a symbol table is also shown.]

lec02-parserCFG December 7, 2018
Syntax Analyzer
The syntax analyzer creates the syntactic structure of the given source program. This syntactic structure is usually a parse tree.
The syntax analyzer is also known as the parser.
The syntax of a programming language is described by a context-free grammar (CFG). We will use BNF (Backus-Naur Form) notation to describe CFGs.
The syntax analyzer (parser) checks whether a given source program satisfies the rules implied by a context-free grammar.
If it does, the parser creates the parse tree of that program.
Otherwise, the parser reports error messages.

BNF 7-Dec-18

BNF
BNF stands for either Backus-Naur Form or Backus Normal Form.
BNF is a metalanguage used to describe the grammar of a programming language.
BNF is formal and precise.
BNF is a notation for context-free grammars.
BNF is essential in compiler construction.

BNF
< > indicate a nonterminal that needs to be further expanded, e.g. <variable>
Symbols not enclosed in < > are terminals; they represent themselves, e.g. if, while, (
The symbol ::= means "is defined as"
The symbol | means "or"; it separates alternatives, e.g. <addop> ::= + | -

BNF uses recursion
<integer> ::= <digit> | <integer> <digit>
or
<integer> ::= <digit> | <digit> <integer>
Recursion is all that is needed (at least, in a formal sense).
"Extended BNF" allows repetition as well as recursion.
Repetition is usually better when using BNF to construct a compiler.
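The difference between recursion and repetition can be illustrated with two recognizers for <integer>. This is a sketch; the function names are made up, and the second function assumes the Extended BNF form <integer> ::= <digit> { <digit> }.

```python
DIGITS = set("0123456789")


def is_integer_recursive(s):
    """<integer> ::= <digit> | <digit> <integer>  (right-recursive form)."""
    if len(s) == 0 or s[0] not in DIGITS:
        return False
    # Either a single digit, or a digit followed by another <integer>.
    return len(s) == 1 or is_integer_recursive(s[1:])


def is_integer_repetition(s):
    """Extended-BNF style: <integer> ::= <digit> { <digit> }."""
    return len(s) >= 1 and all(ch in DIGITS for ch in s)
```

Both accept the same strings; the repetition form maps directly onto a loop, which is one reason it tends to be more convenient in a hand-written compiler.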

BNF Examples I
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<if statement> ::= if ( <condition> ) <statement>
                 | if ( <condition> ) <statement> else <statement>

BNF Examples II
<unsigned integer> ::= <digit> | <unsigned integer> <digit>
<integer> ::= <unsigned integer> | + <unsigned integer> | - <unsigned integer>

BNF Examples III
<identifier> ::= <letter> | <identifier> <letter> | <identifier> <digit>
<block> ::= { <statement list> }
<statement list> ::= <statement> | <statement list> <statement>

Extended BNF
The following are pretty standard:
[ ] enclose an optional part of the rule
Example: <if statement> ::= if ( <condition> ) <statement> [ else <statement> ]
{ } mean the enclosed part can be repeated any number of times (including zero)
Example: <parameter list> ::= ( ) | ( { <parameter> , } <parameter> )
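The { } repetition in the <parameter list> rule corresponds to a loop in a hand-written parser. The sketch below assumes the input is already a list of token strings and that a parameter is any identifier-like token; neither assumption comes from the slides.

```python
def parse_parameter_list(tokens):
    """<parameter list> ::= ( ) | ( { <parameter> , } <parameter> )

    Returns the list of parameter names, or raises SyntaxError.
    """
    if len(tokens) < 2 or tokens[0] != "(" or tokens[-1] != ")":
        raise SyntaxError("expected a parenthesised list")
    inner = tokens[1:-1]
    params = []
    if not inner:                      # the "( )" alternative
        return params
    i = 0
    while True:                        # the { <parameter> , } <parameter> part
        if i >= len(inner) or not inner[i].isidentifier():
            raise SyntaxError("expected a parameter")
        params.append(inner[i])
        i += 1
        if i == len(inner):            # last parameter reached
            return params
        if inner[i] != ",":
            raise SyntaxError("expected ','")
        i += 1
```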

Limitations of BNF
No easy way to impose length limitations, such as the maximum length of variable names
No easy way to describe ranges, such as 1 to 31
No way at all to impose distributed requirements, such as "a variable must be declared before it is used"
Describes only syntax, not semantics

Parser
The parser works on a stream of tokens.
The smallest item is a token.
[Figure: source program → Lexical Analyzer → token → Parser → parse tree; the parser asks the lexical analyzer to "get next token".]

Parse Tree
Inner nodes of a parse tree are non-terminal symbols.
The leaves of a parse tree are terminal symbols.
A parse tree can be seen as a graphical representation of a derivation.
Example:
E → E + E | E - E | E * E | E / E | - E
E → ( E )
E → id
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
[Figure: the parse tree grown step by step alongside each sentential form of this derivation.]
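One possible in-memory form of the parse tree for -(id+id) is sketched below. The tuple layout (nonterminal, children) is an assumption chosen for brevity. Reading the leaves from left to right gives back the derived sentence.

```python
# Inner nodes are (nonterminal, children); leaves are terminal strings.
TREE = ("E", ["-",
              ("E", ["(",
                     ("E", [("E", ["id"]), "+", ("E", ["id"])]),
                     ")"])])


def leaves(node):
    """Collect the terminal symbols at the leaves, left to right."""
    if isinstance(node, str):
        return [node]
    _, children = node
    result = []
    for child in children:
        result.extend(leaves(child))
    return result
```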

Two groups of parsers
We categorize parsers into two groups:
Top-Down Parser: the parse tree is created top to bottom, starting from the root.
Bottom-Up Parser: the parse tree is created bottom to top, starting from the leaves.
Both top-down and bottom-up parsers scan the input from left to right (one symbol at a time).
Efficient top-down and bottom-up parsers can be implemented only for sub-classes of context-free grammars:
LL for top-down parsing
LR for bottom-up parsing

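Left-Most and Right-Most Derivations
Left-Most Derivation (every step, marked lm, rewrites the leftmost non-terminal)
E ⇒lm -E ⇒lm -(E) ⇒lm -(E+E) ⇒lm -(id+E) ⇒lm -(id+id)
Right-Most Derivation (every step, marked rm, rewrites the rightmost non-terminal)
E ⇒rm -E ⇒rm -(E) ⇒rm -(E+E) ⇒rm -(E+id) ⇒rm -(id+id)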
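The two derivations can be reproduced mechanically. The sketch below assumes a grammar whose only non-terminal is E and takes the list of production bodies to apply, in order; the names derive, NEG, PAREN, PLUS and ID are illustrative.

```python
def derive(start, choices, leftmost=True):
    """Apply each production body in `choices` to the leftmost
    (or rightmost) occurrence of E and record every sentential form."""
    form = [start]
    steps = ["".join(form)]
    for body in choices:
        positions = [i for i, sym in enumerate(form) if sym == "E"]
        i = positions[0] if leftmost else positions[-1]
        form[i:i + 1] = body          # replace that E by the production body
        steps.append("".join(form))
    return steps


# Production bodies for E → -E | (E) | E+E | id
NEG, PAREN, PLUS, ID = ["-", "E"], ["(", "E", ")"], ["E", "+", "E"], ["id"]
```

With the same sequence of productions, the two modes differ only in the step where two E symbols are available.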

Ambiguity

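Ambiguity
A grammar that produces more than one parse tree for some sentence is called an ambiguous grammar.
Example:
E → E + E | E * E
E → id
First derivation of id+id*id (the root of the parse tree is +):
E ⇒ E+E ⇒ id+E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
Second derivation of id+id*id (the root of the parse tree is *):
E ⇒ E*E ⇒ E+E*E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
[Figure: the two parse trees corresponding to these derivations.]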
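One way to make the ambiguity concrete is to count the parse trees a sentence has under E → E + E | E * E | id. The counting scheme below (try every operator as the root) is a sketch written for this particular grammar, not a general ambiguity test.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Number of distinct parse trees for `tokens` under
    E → E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of trees deriving tokens[i:j] from E.
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):      # tokens[k] is the root operator
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))
```

A result greater than 1 for any sentence shows that the grammar is ambiguous.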

Unambiguous grammar
An unambiguous grammar gives a unique parse tree for every sentence it generates.

Ambiguity rules
For most parsers, the grammar must be unambiguous.
We should eliminate the ambiguity in the grammar during the design phase of the compiler.
An unambiguous grammar should be written to eliminate the ambiguity.
To disambiguate a grammar, we choose one of the parse trees of each ambiguous sentence and rewrite the grammar so that it allows only that choice.

Ambiguity: which parse tree, 1 or 2?
BNF:
stmt → if expr then stmt
     | if expr then stmt else stmt
     | otherstmts
if E1 then if E2 then S1 else S2
[Figure: two candidate parse trees for this statement, labelled 1 and 2.]

Ambiguity
We prefer the second parse tree (else matches the closest if).
So we have to disambiguate our grammar to reflect this choice.
The unambiguous grammar will be:
stmt → matchedstmt | unmatchedstmt
matchedstmt → if expr then matchedstmt else matchedstmt | otherstmts
unmatchedstmt → if expr then stmt | if expr then matchedstmt else unmatchedstmt
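The "else matches the closest if" rule can also be seen in a small recursive-descent sketch. Note that this is a different technique from the grammar rewrite above: the parser simply consumes an else greedily as soon as it sees one. Conditions and other statements are assumed to be single tokens, and parse_stmt is a made-up name.

```python
def parse_stmt(tokens, pos=0):
    """Parse one statement starting at tokens[pos].
    Returns (tree, next_pos); an else is attached to the nearest open if."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] == "if":
        if pos + 2 >= len(tokens) or tokens[pos + 2] != "then":
            raise SyntaxError("expected 'if expr then'")
        cond = tokens[pos + 1]
        then_branch, pos = parse_stmt(tokens, pos + 3)
        # Greedy choice: an else right after the then-branch belongs to this if.
        if pos < len(tokens) and tokens[pos] == "else":
            else_branch, pos = parse_stmt(tokens, pos + 1)
            return ("if", cond, then_branch, else_branch), pos
        return ("if", cond, then_branch), pos
    return tokens[pos], pos + 1
```

For "if E1 then if E2 then S1 else S2" the inner if receives the else branch, which is the tree the slide prefers.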

Operator Precedence - Ambiguity
Ambiguous grammars (ambiguous because of their operators) can be disambiguated according to precedence and associativity rules.
E → E+E | E*E | E^E | id | (E)
⇓ disambiguate the grammar
precedence (highest first):
^ (right to left)
* (left to right)
+ (left to right)
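The effect of these precedence and associativity rules can be sketched with precedence climbing. This is a swapped-in technique that reads the rules from a table; it is not the grammar rewrite the slide refers to. The PREC table mirrors the three operators listed above, and all function names are illustrative.

```python
# (precedence level, associativity); a higher level binds tighter.
PREC = {"+": (1, "left"), "*": (2, "left"), "^": (3, "right")}


def parse_expr(tokens, pos=0, min_prec=1):
    """Precedence climbing. Returns (tree, next_pos); trees are
    nested tuples (operator, left, right) with operand strings as leaves."""
    left, pos = parse_primary(tokens, pos)
    while (pos < len(tokens) and tokens[pos] in PREC
           and PREC[tokens[pos]][0] >= min_prec):
        op = tokens[pos]
        prec, assoc = PREC[op]
        # Left-associative: the right operand may only contain tighter operators.
        next_min = prec + 1 if assoc == "left" else prec
        right, pos = parse_expr(tokens, pos + 1, next_min)
        left = (op, left, right)
    return left, pos


def parse_primary(tokens, pos):
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok == "(":
        node, pos = parse_expr(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return node, pos + 1
    if tok in PREC or tok == ")":
        raise SyntaxError(f"unexpected {tok!r}")
    return tok, pos + 1
```

Each input now has exactly one tree: * groups before +, and a chain of ^ groups from the right.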

Left Recursion

Left Recursion
A grammar is left recursive if it has a non-terminal A such that there is a derivation
A ⇒+ Aα for some string α
Top-down parsing techniques cannot handle left-recursive grammars.
So we have to convert our left-recursive grammar into an equivalent grammar that is not left-recursive.
The left recursion may appear in a single step of the derivation (immediate left recursion), or in more than one step of the derivation.

Immediate Left-Recursion
A → A α | β     where β does not start with A
⇓ eliminate immediate left recursion
A → β A'
A' → α A' | ε     an equivalent grammar

In general,
A → A α1 | ... | A αm | β1 | ... | βn     where β1 ... βn do not start with A
⇓ eliminate immediate left recursion
A → β1 A' | ... | βn A'
A' → α1 A' | ... | αm A' | ε     an equivalent grammar
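The general rule translates almost line for line into code. In this sketch a production body is a list of symbols, the empty list stands for ε, and the new non-terminal is named by appending an apostrophe; these representation choices are assumptions of the example.

```python
def eliminate_immediate_left_recursion(nonterminal, bodies):
    """Rewrite A → A α1 | ... | A αm | β1 | ... | βn as
    A → β1 A' | ... | βn A'   and   A' → α1 A' | ... | αm A' | ε.

    Returns a dict mapping each non-terminal to its list of bodies.
    """
    # α parts: what follows the leading A in left-recursive bodies.
    alphas = [b[1:] for b in bodies if b and b[0] == nonterminal]
    # β parts: bodies that do not start with A.
    betas = [b for b in bodies if not b or b[0] != nonterminal]
    if not alphas:                      # nothing to eliminate
        return {nonterminal: bodies}
    new = nonterminal + "'"
    return {
        nonterminal: [beta + [new] for beta in betas],
        new: [alpha + [new] for alpha in alphas] + [[]],   # [] is ε
    }
```

For E → E+T | T this returns E → T E' and E' → +T E' | ε.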

Immediate Left-Recursion: Example
E → E+T | T
T → T*F | F
F → id | (E)

After eliminating immediate left recursion
E → T E'
E' → +T E' | ε
T → F T'
T' → *F T' | ε
F → id | (E)
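Because the transformed grammar has no left recursion, it can be turned into a recursive-descent recognizer with one method per non-terminal. The sketch below assumes the input is a list of tokens drawn from id, +, *, ( and ), and uses "$" as a made-up end marker; the class and function names are illustrative.

```python
class Parser:
    """Recursive-descent recognizer for
    E → T E'   E' → + T E' | ε   T → F T'   T' → * F T' | ε   F → id | ( E )"""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks end of input (assumption)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def eat(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {self.peek()!r}")
        self.pos += 1

    def E(self):                 # E → T E'
        self.T()
        self.E_prime()

    def E_prime(self):           # E' → + T E' | ε
        if self.peek() == "+":
            self.eat("+")
            self.T()
            self.E_prime()

    def T(self):                 # T → F T'
        self.F()
        self.T_prime()

    def T_prime(self):           # T' → * F T' | ε
        if self.peek() == "*":
            self.eat("*")
            self.F()
            self.T_prime()

    def F(self):                 # F → id | ( E )
        if self.peek() == "id":
            self.eat("id")
        elif self.peek() == "(":
            self.eat("(")
            self.E()
            self.eat(")")
        else:
            raise SyntaxError(f"unexpected {self.peek()!r}")


def recognizes(tokens):
    """True if the whole token list is a sentence of the grammar."""
    parser = Parser(tokens)
    try:
        parser.E()
        parser.eat("$")
        return True
    except SyntaxError:
        return False
```

Written against the original left-recursive rule E → E+T, the method E would call itself before consuming any input and never terminate.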
