R.Rajkumar Asst.Professor CSE


Part 2: Lexical Analyzer

Lexical Analyzer
Lexical analysis, also called scanning, is the phase of the compilation process that reads the program being compiled character by character. The higher-level parts of the compiler call the lexical analyzer with the request "get the next word from the input", and it is the scanner's job to sort through the input characters and find this word.
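As a minimal sketch of the "get the next word" interface, the hypothetical `Scanner` class below walks the input one character at a time and groups the characters into words. The token names `ID`, `NUM` and `OP` are assumptions made for this example, not part of the slides.

```python
class Scanner:
    """Hand-written scanner: groups characters into words on request."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0  # index of the next unread character

    def next_token(self):
        """Return the next (kind, lexeme) pair, or None at the end of the input."""
        text, n = self.text, len(self.text)
        while self.pos < n and text[self.pos].isspace():  # skip white space
            self.pos += 1
        if self.pos >= n:
            return None
        start = self.pos
        if text[start].isalpha() or text[start] == "_":  # identifier
            while self.pos < n and (text[self.pos].isalnum() or text[self.pos] == "_"):
                self.pos += 1
            return ("ID", text[start:self.pos])
        if text[start].isdigit():  # unsigned number
            while self.pos < n and text[self.pos].isdigit():
                self.pos += 1
            return ("NUM", text[start:self.pos])
        self.pos += 1  # any other single character
        return ("OP", text[start])


scanner = Scanner("count = count + 42")
tokens = []
while (token := scanner.next_token()) is not None:
    tokens.append(token)
print(tokens)
```

Each call to `next_token` does only as much work as one word needs, which is what lets the rest of the compiler pull tokens on demand.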

Tokens, Lexemes, Patterns

Regular Expressions
A regular expression (sometimes called a rational expression) is a sequence of characters that defines a search pattern, used mainly for pattern matching on strings, e.g. "find and replace"-like operations. In a lexical analyzer, regular expressions describe the patterns of tokens.
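In a lexical analyzer, each token class is usually described by a regular expression. The sketch below uses Python's standard `re` module to combine a few hypothetical token patterns into one expression and match them from left to right; the token names and patterns are illustrative assumptions. The last line shows the "find and replace" use with `re.sub`.

```python
import re

# Hypothetical token classes, each described by a regular expression.
TOKEN_SPEC = [
    ("NUM", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("OP", r"[-+*/=()]"),
    ("SKIP", r"\s+"),
]
# One alternation of named groups; m.lastgroup tells which pattern matched.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text: str) -> list[tuple[str, str]]:
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)  # try all patterns at the current position
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":  # white space separates tokens but is not one
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


print(tokenize("x1 = (y + 42)"))
# "Find and replace": rename the whole word count to total (counter is untouched).
print(re.sub(r"\bcount\b", "total", "count = count + 1; counter = 0"))
```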

Thompson Construction: an algorithm for converting a regular expression (RE) to an NFA

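Thompson’s Transition Diagram: An Example
[Figure: transition diagram of the NFA for (a | b)*abb built by Thompson's construction, with a start state, ε-transitions, and states numbered up to 10]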
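The construction can be sketched in a few dozen lines of Python. This is an illustrative sketch, not the slides' own code: it assumes the regular expression is well formed and has already been converted to postfix form with an explicit concatenation operator `.` (so (a | b)*abb becomes `ab|*a.b.b.`), and it joins concatenated fragments with an ε-edge instead of merging two states, so its state numbers differ from the figure. The `accepts` function simulates the resulting NFA using ε-closures.

```python
from itertools import count

EPS = None  # label used for ε-transitions


def thompson(postfix: str):
    """Build an NFA (start, accept, transitions) from a postfix regular expression."""
    ids = count()
    trans: dict[int, list] = {}  # state -> list of (label, next_state)

    def new_state() -> int:
        state = next(ids)
        trans[state] = []
        return state

    stack = []  # NFA fragments as (start, accept) pairs
    for sym in postfix:
        if sym == ".":  # concatenation N1 N2
            s2, a2 = stack.pop()
            s1, a1 = stack.pop()
            trans[a1].append((EPS, s2))
            stack.append((s1, a2))
        elif sym == "|":  # union N1 | N2
            s2, a2 = stack.pop()
            s1, a1 = stack.pop()
            s, a = new_state(), new_state()
            trans[s] += [(EPS, s1), (EPS, s2)]
            trans[a1].append((EPS, a))
            trans[a2].append((EPS, a))
            stack.append((s, a))
        elif sym == "*":  # Kleene star N*
            s1, a1 = stack.pop()
            s, a = new_state(), new_state()
            trans[s] += [(EPS, s1), (EPS, a)]
            trans[a1] += [(EPS, s1), (EPS, a)]
            stack.append((s, a))
        else:  # a single input symbol
            s, a = new_state(), new_state()
            trans[s].append((sym, a))
            stack.append((s, a))
    start, accept = stack.pop()
    return start, accept, trans


def eps_closure(states, trans):
    """All states reachable from `states` using only ε-transitions."""
    closure, todo = set(states), list(states)
    while todo:
        for label, nxt in trans[todo.pop()]:
            if label is EPS and nxt not in closure:
                closure.add(nxt)
                todo.append(nxt)
    return closure


def accepts(nfa, word: str) -> bool:
    start, accept, trans = nfa
    current = eps_closure({start}, trans)
    for ch in word:
        moved = {nxt for state in current for label, nxt in trans[state] if label == ch}
        current = eps_closure(moved, trans)
    return accept in current


nfa = thompson("ab|*a.b.b.")  # (a | b)*abb in postfix
print([w for w in ["abb", "aabb", "babb", "ab", "abba"] if accepts(nfa, w)])
```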

Relation of the Lexical Analyzer with the Parser
[Figure: source program → lexical analyzer → token → parser; the parser calls Nexttoken(), and both the lexical analyzer and the parser use the symbol table]
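The sketch below illustrates this interface under stated assumptions: the parser side (here a stand-in loop called `collect_tokens`) repeatedly calls `next_token()`, and the lexical analyzer installs each identifier in a shared symbol table, returning the table entry instead of the lexeme. The token names, the `EOF` marker and the dictionary used as a symbol table are hypothetical choices for illustration.

```python
import re

# One token per match: optional leading white space, then an identifier,
# an unsigned number, or any other single non-blank character.
TOKEN_RE = re.compile(r"\s*(?:(?P<ID>[A-Za-z_]\w*)|(?P<NUM>\d+)|(?P<OP>\S))")


class Lexer:
    def __init__(self, source: str, symtab: dict[str, int]) -> None:
        self.source = source
        self.pos = 0
        self.symtab = symtab  # shared with the parser: lexeme -> entry number

    def next_token(self) -> tuple:
        m = TOKEN_RE.match(self.source, self.pos)
        if m is None:  # only white space (or nothing) is left
            return ("EOF", None)
        self.pos = m.end()
        kind = m.lastgroup
        if kind == "ID":
            # Install a new identifier once; the token carries its entry number.
            return ("ID", self.symtab.setdefault(m.group(kind), len(self.symtab)))
        return (kind, m.group(kind))


def collect_tokens(lexer: Lexer) -> list:
    """Stand-in for the parser: it keeps calling next_token() until EOF."""
    tokens = []
    while (token := lexer.next_token())[0] != "EOF":
        tokens.append(token)
    return tokens


symtab: dict[str, int] = {}
tokens = collect_tokens(Lexer("rate = rate * 60 + bonus", symtab))
print(tokens)
print(symtab)
```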

lec02-parserCFG December 7, 2018
Syntax Analyzer
The syntax analyzer creates the syntactic structure of the given source program. This syntactic structure is usually a parse tree. The syntax analyzer is also known as the parser. The syntax of a programming language is described by a context-free grammar (CFG); we will use BNF (Backus-Naur Form) notation to write CFGs. The parser checks whether a given source program satisfies the rules of the context-free grammar. If it does, the parser creates the parse tree of that program; otherwise, the parser reports error messages.

BNF 7-Dec-18

BNF
BNF stands for either Backus-Naur Form or Backus Normal Form.
BNF is a metalanguage used to describe the grammar of a programming language.
BNF is formal and precise.
BNF is a notation for context-free grammars.
BNF is essential in compiler construction.

BNF
< > enclose a nonterminal that needs to be further expanded, e.g. <variable>.
Symbols not enclosed in < > are terminals; they represent themselves, e.g. if, while, (.
The symbol ::= means "is defined as".
The symbol | means "or"; it separates alternatives, e.g. <addop> ::= + | -
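To make the notation concrete, here is a hypothetical helper that splits one BNF rule into its left-hand nonterminal and its alternatives. It assumes that symbols are separated by white space, that anything enclosed in < > is a nonterminal (spaces allowed inside, as in <if statement>), that terminals never contain < or >, and that | is never used as a terminal.

```python
import re

# A nonterminal is <...>; any other run of non-blank characters is a terminal.
SYMBOL = re.compile(r"<[^<>]+>|\S+")


def is_nonterminal(symbol: str) -> bool:
    return symbol.startswith("<") and symbol.endswith(">") and len(symbol) > 2


def parse_bnf_rule(rule: str) -> tuple[str, list[list[str]]]:
    """Split '<lhs> ::= alt1 | alt2 | ...' into (lhs, [symbols of alt1, symbols of alt2, ...])."""
    lhs, sep, rhs = rule.partition("::=")
    lhs = lhs.strip()
    if not sep or not is_nonterminal(lhs):
        raise ValueError(f"not a BNF rule: {rule!r}")
    alternatives: list[list[str]] = [[]]
    for symbol in SYMBOL.findall(rhs):
        if symbol == "|":  # '|' starts a new alternative
            alternatives.append([])
        else:
            alternatives[-1].append(symbol)
    return lhs, alternatives


print(parse_bnf_rule("<integer> ::= <digit> | <integer> <digit>"))
```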

BNF uses recursion
<integer> ::= <digit> | <integer> <digit>
or
<integer> ::= <digit> | <digit> <integer>
Recursion is all that is needed (at least, in a formal sense).
"Extended BNF" allows repetition as well as recursion.
Repetition is usually better when using BNF to construct a compiler.
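The difference between recursion and repetition shows up directly in code. The two hypothetical recognizers below accept the same language: the first follows the left-recursive rule <integer> ::= <digit> | <integer> <digit> literally, while the second implements the repetition form <integer> ::= <digit> { <digit> } as a loop. In Python the recursive version also makes one call per digit and so runs into the interpreter's recursion limit on very long inputs.

```python
DIGITS = frozenset("0123456789")


def is_integer_recursive(s: str) -> bool:
    # <integer> ::= <digit> | <integer> <digit>
    if len(s) == 1:
        return s in DIGITS
    # A longer string must be an <integer> followed by one final <digit>.
    return len(s) > 1 and s[-1] in DIGITS and is_integer_recursive(s[:-1])


def is_integer_repetition(s: str) -> bool:
    # <integer> ::= <digit> { <digit> }
    if not s or s[0] not in DIGITS:
        return False
    for ch in s[1:]:  # { <digit> }: zero or more further digits
        if ch not in DIGITS:
            return False
    return True


for text in ["7", "2024", "", "12a"]:
    print(repr(text), is_integer_recursive(text), is_integer_repetition(text))
```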

BNF Examples I
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<if statement> ::= if ( <condition> ) <statement> | if ( <condition> ) <statement> else <statement>

BNF Examples II
<unsigned integer> ::= <digit> | <unsigned integer> <digit>
<integer> ::= <unsigned integer> | + <unsigned integer> | - <unsigned integer>

BNF Examples III
<identifier> ::= <letter> | <identifier> <letter> | <identifier> <digit>
<block> ::= { <statement list> }
<statement list> ::= <statement> | <statement list> <statement>

BNF Examples IV
<statement> ::= <block> | <assignment statement> | <break statement> | <continue statement> | <do statement> | <for loop> | <goto statement> | <if statement> | . . .

Extended BNF
The following conventions are fairly standard:
[ ] enclose an optional part of the rule.
Example: <if statement> ::= if ( <condition> ) <statement> [ else <statement> ]
{ } mean that the enclosed part can be repeated any number of times (including zero).
Example: <parameter list> ::= ( ) | ( { <parameter> , } <parameter> )
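The EBNF braces map naturally onto a loop in a hand-written parser. The sketch below recognizes the <parameter list> rule over a list of token strings; it assumes, purely for illustration, that a <parameter> is a single identifier token, and it uses the equivalent form ( <parameter> { , <parameter> } ) so that the loop can decide what to do by looking at the next token.

```python
def parse_parameter_list(tokens: list[str]) -> list[str]:
    """Recognize <parameter list> ::= ( ) | ( <parameter> { , <parameter> } ).

    Assumption for this sketch: a <parameter> is one identifier token.
    Returns the parameter names or raises SyntaxError.
    """
    pos = 0

    def expect(tok: str) -> None:
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != tok:
            raise SyntaxError(f"expected {tok!r} at token {pos}")
        pos += 1

    def parameter() -> str:
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isidentifier():
            raise SyntaxError(f"expected a parameter at token {pos}")
        pos += 1
        return tokens[pos - 1]

    expect("(")
    params: list[str] = []
    if pos < len(tokens) and tokens[pos] == ")":  # the ( ) alternative
        pos += 1
    else:
        params.append(parameter())
        while pos < len(tokens) and tokens[pos] == ",":  # { , <parameter> }
            pos += 1
            params.append(parameter())
        expect(")")
    if pos != len(tokens):
        raise SyntaxError("unexpected tokens after the parameter list")
    return params


print(parse_parameter_list("( x , y )".split()))
```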

Limitations of BNF
No easy way to impose length limitations, such as a maximum length of variable names.
No easy way to describe ranges, such as 1 to 31.
No way at all to impose distributed requirements, such as "a variable must be declared before it is used".
BNF describes only syntax, not semantics.

Parser
The parser works on a stream of tokens; the smallest item it deals with is a token.
[Figure: source program → Lexical Analyzer → token → Parser → parse tree; the parser asks the lexical analyzer to "get next token"]

Parse Tree
Inner nodes of a parse tree are non-terminal symbols.
The leaves of a parse tree are terminal symbols.
A parse tree can be seen as a graphical representation of a derivation.
Example grammar:
E → E + E | E - E | E * E | E / E | - E
E → ( E )
E → id
Derivation: E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
[Figure: the parse tree after each step of this derivation]
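A parse tree is easy to represent as a small tree data structure. In the hypothetical `Node` class below, inner nodes hold nonterminals and leaves hold terminals, and reading the leaves from left to right (the yield of the tree) gives back the sentence -(id+id) from the derivation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    symbol: str  # grammar symbol at this node
    children: list["Node"] = field(default_factory=list)

    def leaves(self) -> list[str]:
        """Return the leaf symbols from left to right (the yield of the tree)."""
        if not self.children:
            return [self.symbol]
        return [leaf for child in self.children for leaf in child.leaves()]


def E(*children: Node) -> Node:
    # An inner node labelled with the nonterminal E.
    return Node("E", list(children))


def leaf(terminal: str) -> Node:
    return Node(terminal)


# Parse tree for E => -E => -(E) => -(E+E) => -(id+E) => -(id+id)
tree = E(leaf("-"), E(leaf("("), E(E(leaf("id")), leaf("+"), E(leaf("id"))), leaf(")")))
print("".join(tree.leaves()))  # -(id+id)
```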

Two Groups of Parsers
We categorize parsers into two groups:
Top-down parsers: the parse tree is created top to bottom, starting from the root.
Bottom-up parsers: the parse tree is created bottom to top, starting from the leaves.
Both top-down and bottom-up parsers scan the input from left to right, one symbol at a time.
Efficient top-down and bottom-up parsers can be implemented only for subclasses of context-free grammars:
LL grammars for top-down parsing
LR grammars for bottom-up parsing

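Left-Most and Right-Most Derivations
In a left-most derivation (⇒lm) the left-most nonterminal is replaced at each step; in a right-most derivation (⇒rm) the right-most nonterminal is replaced.
Left-most derivation: E ⇒lm -E ⇒lm -(E) ⇒lm -(E+E) ⇒lm -(id+E) ⇒lm -(id+id)
Right-most derivation: E ⇒rm -E ⇒rm -(E) ⇒rm -(E+E) ⇒rm -(E+id) ⇒rm -(id+id)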
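The two derivation orders can be replayed mechanically. In the sketch below, sentential forms are lists of symbols, and the hypothetical helper `derive` applies a given sequence of productions either to the left-most or to the right-most nonterminal. With the same five productions it reproduces both derivations of -(id+id).

```python
NONTERMINALS = {"E"}


def derive(start, productions, leftmost=True):
    """Apply each (head, body) production to the left-most or right-most nonterminal."""
    form = [start]
    steps = ["".join(form)]
    for head, body in productions:
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        if not positions:
            raise ValueError("no nonterminal left to expand")
        i = positions[0] if leftmost else positions[-1]
        if form[i] != head:
            raise ValueError(f"production for {head} cannot expand {form[i]}")
        form[i:i + 1] = body  # replace the chosen nonterminal by the body
        steps.append("".join(form))
    return steps


PRODUCTIONS = [
    ("E", ["-", "E"]),
    ("E", ["(", "E", ")"]),
    ("E", ["E", "+", "E"]),
    ("E", ["id"]),
    ("E", ["id"]),
]
print(" => ".join(derive("E", PRODUCTIONS, leftmost=True)))
# E => -E => -(E) => -(E+E) => -(id+E) => -(id+id)
print(" => ".join(derive("E", PRODUCTIONS, leftmost=False)))
# E => -E => -(E) => -(E+E) => -(E+id) => -(id+id)
```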

Ambiguity

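Ambiguity
A grammar that produces more than one parse tree for some sentence is called an ambiguous grammar.
Example:
E → E + E | E * E
E → id
The sentence id+id*id has two different left-most derivations, and hence two parse trees:
E ⇒ E+E ⇒ id+E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
E ⇒ E*E ⇒ E+E*E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
[Figure: the two parse trees, one with + at the root and one with * at the root]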
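Ambiguity can be checked for one particular sentence by counting its parse trees. The sketch below is an illustration, not a general ambiguity test (ambiguity of arbitrary context-free grammars is undecidable): it counts, with memoization, the parse trees of a token sequence under E → E + E | E * E | id. For id+id*id it finds two trees.

```python
from functools import lru_cache


def count_parse_trees(tokens: tuple[str, ...]) -> int:
    """Number of parse trees for `tokens` under E -> E + E | E * E | id."""

    @lru_cache(maxsize=None)
    def trees(i: int, j: int) -> int:
        # Ways in which E derives tokens[i:j].
        total = 1 if j - i == 1 and tokens[i] == "id" else 0  # E -> id
        for k in range(i + 1, j - 1):  # E -> E op E with the operator at k
            if tokens[k] in ("+", "*"):
                total += trees(i, k) * trees(k + 1, j)
        return total

    return trees(0, len(tokens))


print(count_parse_trees(("id", "+", "id", "*", "id")))  # 2
```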

Unambiguous Grammar
An unambiguous grammar has a unique parse tree for each sentence it generates.

Ambiguity Rules
For most parsers, the grammar must be unambiguous.
We should eliminate ambiguity in the grammar during the design phase of the compiler by writing an unambiguous grammar.
To disambiguate an ambiguous grammar, we choose one of the parse trees of each ambiguous sentence as the preferred one and rewrite the grammar so that it allows only that choice.

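Ambiguity: Which Parse Tree?
BNF: stmt → if expr then stmt | if expr then stmt else stmt | otherstmts
The statement if E1 then if E2 then S1 else S2 has two parse trees:
1. else S2 belongs to the outer if: if E1 then (if E2 then S1) else S2
2. else S2 belongs to the inner if: if E1 then (if E2 then S1 else S2)
[Figure: the two parse trees]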

Ambiguity (continued)
We prefer the second parse tree (else matches the closest if), so we disambiguate the grammar to reflect this choice. The unambiguous grammar is:
stmt → matchedstmt | unmatchedstmt
matchedstmt → if expr then matchedstmt else matchedstmt | otherstmts
unmatchedstmt → if expr then stmt | if expr then matchedstmt else unmatchedstmt
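Hand-written parsers often implement the same choice without rewriting the grammar: when a recursive-descent parser sees else, it attaches it to the innermost if that is still open. The sketch below is such a parser for the if/then/else grammar, under the simplifying assumptions that an expr is one token (E1, E2) and that any other single token (S1, S2) stands for otherstmts. For the example sentence it returns the second parse tree.

```python
def parse_stmt(tokens: list[str], pos: int = 0):
    """Parse stmt -> if expr then stmt [else stmt] | otherstmts.

    Returns (tree, next_position); an if-statement becomes a tuple
    ("if", cond, then_part) or ("if", cond, then_part, else_part).
    """
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] != "if":
        return tokens[pos], pos + 1  # otherstmts
    if pos + 2 >= len(tokens) or tokens[pos + 2] != "then":
        raise SyntaxError("expected 'if expr then'")
    cond = tokens[pos + 1]
    then_part, pos = parse_stmt(tokens, pos + 3)
    # Greedy choice: the innermost open 'if' takes the 'else'.
    if pos < len(tokens) and tokens[pos] == "else":
        else_part, pos = parse_stmt(tokens, pos + 1)
        return ("if", cond, then_part, else_part), pos
    return ("if", cond, then_part), pos


tree, _ = parse_stmt("if E1 then if E2 then S1 else S2".split())
print(tree)  # ('if', 'E1', ('if', 'E2', 'S1', 'S2'))
```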

Operator Precedence and Ambiguity
Grammars that are ambiguous because of their operators can be disambiguated according to precedence and associativity rules.
E → E+E | E*E | E^E | id | (E)
To disambiguate this grammar, use these precedences (highest first):
^ (right to left)
* (left to right)
+ (left to right)
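One common way for a hand-written parser to apply such a precedence table is precedence climbing, a technique swapped in here rather than taken from the slides. The sketch below parses token lists for E → E+E | E*E | E^E | id | (E) into nested tuples, giving ^ the highest precedence with right associativity, and * and + left associativity. The `PREC` table, the class and the `parse` function are hypothetical names.

```python
PREC = {"+": 1, "*": 2, "^": 3}  # higher number binds tighter
RIGHT_ASSOC = {"^"}


class PrecedenceParser:
    def __init__(self, tokens: list[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self) -> str:
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def primary(self):
        tok = self.advance()
        if tok == "id":
            return tok
        if tok == "(":
            tree = self.expr(1)
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return tree
        raise SyntaxError(f"unexpected token {tok!r}")

    def expr(self, min_prec: int):
        left = self.primary()
        # Keep absorbing operators that bind at least as tightly as min_prec.
        while (op := self.peek()) in PREC and PREC[op] >= min_prec:
            self.advance()
            # Right-associative: the right operand may contain the same operator.
            next_min = PREC[op] if op in RIGHT_ASSOC else PREC[op] + 1
            left = (op, left, self.expr(next_min))
        return left


def parse(text: str):
    parser = PrecedenceParser(text.split())
    tree = parser.expr(1)
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


print(parse("id + id * id ^ id ^ id"))
# ('+', 'id', ('*', 'id', ('^', 'id', ('^', 'id', 'id'))))
```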

Left Recursion

Left Recursion
A grammar is left recursive if it has a nonterminal A such that there is a derivation A ⇒+ Aα for some string α.
Top-down parsing techniques cannot handle left-recursive grammars, so we have to convert a left-recursive grammar into an equivalent grammar that is not left recursive.
Left recursion may appear in a single step of the derivation (immediate left recursion) or in more than one step of the derivation.
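Whether a grammar is left recursive can be checked mechanically. The sketch below assumes a grammar with no ε-productions (so only the first symbol of each body can lead toward A ⇒+ Aα) and represents it as a dictionary from each nonterminal to a list of bodies; it reports both immediate and multi-step left recursion. The function name and the example grammars are illustrative assumptions.

```python
def left_recursive_nonterminals(grammar: dict[str, list[list[str]]]) -> set[str]:
    """Nonterminals A with a derivation A =>+ A alpha (no ε-productions assumed)."""
    # leading[A] = nonterminals that can begin a body of A
    leading = {a: {body[0] for body in bodies if body and body[0] in grammar}
               for a, bodies in grammar.items()}
    result = set()
    for a in grammar:
        seen, todo = set(), list(leading[a])
        while todo:
            b = todo.pop()
            if b == a:  # a chain of left-most expansions leads back to A
                result.add(a)
                break
            if b not in seen:
                seen.add(b)
                todo.extend(leading[b])
    return result


EXPR = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["id"], ["(", "E", ")"]],
}
INDIRECT = {"S": [["A", "a"], ["b"]], "A": [["S", "c"], ["d"]]}  # S => Aa => Sca
print(sorted(left_recursive_nonterminals(EXPR)))      # ['E', 'T']
print(sorted(left_recursive_nonterminals(INDIRECT)))  # ['A', 'S']
```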

Immediate Left-Recursion
A → Aα | β, where β does not start with A
⇓ eliminate immediate left recursion
A → βA’
A’ → αA’ | ε
an equivalent grammar

In general:
A → Aα1 | ... | Aαm | β1 | ... | βn, where β1 ... βn do not start with A
⇓ eliminate immediate left recursion
A → β1A’ | ... | βnA’
A’ → α1A’ | ... | αmA’ | ε
an equivalent grammar
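The general rule translates directly into code. In this hypothetical helper, bodies are lists of symbols, the empty list stands for ε, and the new nonterminal is named by appending a prime (written ' here instead of ’). It handles only immediate left recursion and assumes that no body consists of A alone.

```python
def eliminate_immediate_left_recursion(
    head: str, bodies: list[list[str]]
) -> dict[str, list[list[str]]]:
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn  as
         A  -> b1 A' | ... | bn A'
         A' -> a1 A' | ... | am A' | ε      (ε is the empty list [])
    """
    alphas = [body[1:] for body in bodies if body and body[0] == head]
    betas = [body for body in bodies if not body or body[0] != head]
    if not alphas:  # no immediate left recursion: nothing to do
        return {head: bodies}
    new = head + "'"
    return {
        head: [beta + [new] for beta in betas],
        new: [alpha + [new] for alpha in alphas] + [[]],
    }


def show(grammar: dict[str, list[list[str]]]) -> None:
    for head, bodies in grammar.items():
        alternatives = [" ".join(body) if body else "ε" for body in bodies]
        print(head, "->", " | ".join(alternatives))


show(eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
# E -> T E'
# E' -> + T E' | ε
show(eliminate_immediate_left_recursion("T", [["T", "*", "F"], ["F"]]))
# T -> F T'
# T' -> * F T' | ε
```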

Immediate Left-Recursion: Example
E → E+T | T
T → T*F | F
F → id | (E)

After eliminating immediate left recursion:
E → T E’
E’ → +T E’ | ε
T → F T’
T’ → *F T’ | ε
F → id | (E)
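Once left recursion is gone, the grammar can be parsed top-down. The sketch below is a recursive-descent recognizer with one method per nonterminal of the transformed grammar; an ε-alternative becomes "consume nothing" when the next token cannot start the other alternative. The class name, the `$` end marker and the token representation are assumptions for illustration.

```python
class RecursiveDescent:
    """Recognizer for E -> T E', E' -> + T E' | ε, T -> F T', T' -> * F T' | ε, F -> id | ( E )."""

    def __init__(self, tokens: list[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> str:
        # "$" marks the end of the input.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else "$"

    def match(self, expected: str) -> None:
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.peek()!r}")
        self.pos += 1

    def E(self) -> None:  # E -> T E'
        self.T()
        self.E_prime()

    def E_prime(self) -> None:  # E' -> + T E' | ε
        if self.peek() == "+":
            self.match("+")
            self.T()
            self.E_prime()
        # otherwise use E' -> ε and consume nothing

    def T(self) -> None:  # T -> F T'
        self.F()
        self.T_prime()

    def T_prime(self) -> None:  # T' -> * F T' | ε
        if self.peek() == "*":
            self.match("*")
            self.F()
            self.T_prime()

    def F(self) -> None:  # F -> id | ( E )
        if self.peek() == "(":
            self.match("(")
            self.E()
            self.match(")")
        else:
            self.match("id")


def recognizes(tokens: list[str]) -> bool:
    parser = RecursiveDescent(tokens)
    try:
        parser.E()
    except SyntaxError:
        return False
    return parser.peek() == "$"  # the whole input must be consumed


for text in ["id + id * id", "( id + id ) * id", "id + * id"]:
    print(text, "->", recognizes(text.split()))
```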