
Overview of Previous Lesson(s)

Over View  In our compiler model, the parser obtains a string of tokens from the lexical analyzer & verifies that the string of token names can be generated by the grammar for the source language. 3

Over View..  The syntax of programming language constructs can be specified by CFG.  A grammar gives a precise syntactic specification of a programming language.  Universal parsing methods can parse any grammar. These methods are, however, too inefficient to use in production compilers. 4

Over View..  The commonly used parsing methods in compilers is either top- down or bottom-up.  Top-down methods build parse trees from the top (root) to the bottom (leaves), while bottom-up methods start from the leaves and work their way up to the root. 5

Over View...  Programming errors can occur at many different levels & can be categorized as:  Lexical errors include misspellings of identifiers, keywords, or operators - e.g., the use of an identifier elipseSize instead of ellipseSize – and missing quotes around text intended as a string.  Semantic errors include type mismatches between operators and operands. An example is a return statement in a Java method with result type void. 6

Over View..  Syntactic errors include misplaced semicolons or extra or missing braces, that is, "{" or "}" As another example, in C or Java, the appearance of a case statement without an enclosing switch is a syntactic error.  Logical errors can be anything from incorrect reasoning on the part of the programmer to the use in a C program of the assignment operator = instead of the comparison operator ==. 7

Over View...  The error handler in a parser has goals that are simple to state but challenging to realize:  Report the presence of errors clearly and accurately.  Recover from each error quickly enough to detect subsequent errors.  Add minimal overhead to the processing of correct programs. 8

Over View...  Trivial Approach: No Recovery  Print an error message when parsing cannot continue and then terminate parsing.  Panic-Mode Recovery  The parser discards input until it encounters a synchronizing token.  Phrase-Level Recovery  Locally replace some prefix of the remaining input by some string. Simple cases are exchanging ; with, and = with ==.  Error Productions  Include productions for common errors.  Global Correction  Change the input I to the closest correct input I' and produce the parse tree for I'. 9

Over View...  Grammars used to systematically describe the syntax of programming language constructs like expressions and statements. stmt --> if ( expr ) stmt else stmt  A syntactic variable stmt is used to denote statements and variable expr to denote expressions.  Other productions then define precisely what an expr is and what else a stmt can be.  A language generated by a grammar is called a context free language 10

Over View...  Grammar  Terminals: id + - * / ( )  Non-Terminals:expression, term, factor  Start Symbol:expression 11


Contents  Context-Free Grammars  Formal Definition of a CFG  Notational Conventions  Derivations  Parse Trees and Derivations  Ambiguity  Verifying the Language Generated by a Grammar  Context-Free Grammars Vs Regular Expressions  Writing a Grammar  Lexical Vs Syntactic Analysis  Eliminating Ambiguity  Elimination of Left Recursion

Parse Tree & Derivations  A parse tree is a graphical representation of a derivation that filters out the order in which productions are applied to replace non- terminals.  Each interior node of a parse tree represents the application of a production.  The interior node is labeled with the non-terminal A in the head of the production.  The children of the node are labeled, from left to right, by the symbols in the body of the production by which this A was replaced during the derivation. 14

Parse Tree & Derivations..  Ex: -(id + id)  The leaves of a parse tree are labeled by non-terminals or terminals and, read from left to right, constitute a sentential form, called the yield or frontier of the tree.

Parse Tree & Derivations…  Given a derivation starting with a single non-terminal, A ⇒ α1 ⇒ α2 ⇒ … ⇒ αn, it is easy to construct a parse tree with A as the root and αn as the leaves.  The LHS of each production is a non-terminal in the frontier of the current tree, so replace it with the RHS to get the next tree.  There can be many derivations that wind up with the same final tree.  But for any parse tree there is a unique leftmost derivation that produces that tree.  Similarly, there is a unique rightmost derivation that produces the tree.
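The replacement step described above can be made concrete with a small sketch; the helper name and the toy grammar E → E + E | id are chosen for illustration.

```python
# Sketch: replaying the leftmost derivation E => E + E => id + E => id + id
# for the toy grammar E -> E + E | id. At each step the LEFTMOST
# non-terminal in the sentential form is replaced, so the sequence of
# sentential forms is unique for a given parse tree.

def leftmost_step(sentential, nonterminals, body):
    """Replace the leftmost non-terminal with the given production body."""
    for i, sym in enumerate(sentential):
        if sym in nonterminals:
            return sentential[:i] + body + sentential[i+1:]
    return sentential  # no non-terminal left: already a sentence

nts = {'E'}
form = ['E']
for body in (['E', '+', 'E'], ['id'], ['id']):
    form = leftmost_step(form, nts, body)
    print(' '.join(form))
# E + E
# id + E
# id + id
```

Applying the same production bodies to the rightmost non-terminal instead would give a different derivation sequence but the same final parse tree.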

Ambiguity  A grammar that produces more than one parse tree for some sentence is said to be ambiguous.  Alternatively, an ambiguous grammar is one that produces more than one leftmost derivation or more than one rightmost derivation for the same sentence.  Ex Grammar E → E + E | E * E | ( E ) | id  It is ambiguous because we have seen two parse trees for id + id * id 17

Ambiguity..  There must be at least two leftmost derivations, and hence two parse trees: one grouping the sentence as (id + id) * id and the other as id + (id * id).
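Ambiguity can be made concrete by counting parse trees. This sketch assumes the grammar E → E + E | E * E | ( E ) | id from the slide above; a sentence is ambiguous exactly when it has more than one tree.

```python
# Sketch: counting parse trees for the ambiguous grammar
#   E -> E + E | E * E | ( E ) | id
# using memoized recursion over token spans.

from functools import lru_cache

def count_parses(tokens):
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):  # number of parse trees deriving toks[i:j] from E
        n = 0
        if j - i == 1 and toks[i] == 'id':
            n += 1                      # E -> id
        if j - i >= 3 and toks[i] == '(' and toks[j-1] == ')':
            n += count(i+1, j-1)        # E -> ( E )
        for k in range(i+1, j-1):       # E -> E op E, split at operator
            if toks[k] in ('+', '*'):
                n += count(i, k) * count(k+1, j)
        return n

    return count(0, len(toks))

print(count_parses(['id', '+', 'id', '*', 'id']))  # 2
print(count_parses(['id', '+', 'id']))             # 1
```

The count of 2 for id + id * id corresponds exactly to the two groupings (id + id) * id and id + (id * id).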

Language Verification  A proof that a grammar G generates a language L has two parts:  Show that every string generated by G is in L  Show that every string in L can indeed be generated by G.  Ex GrammarS → ( S ) S | ɛ  Apparently this simple grammar generates all strings of balanced parentheses, and only such strings. 19

Language Verification..  To show that every sentence derivable from S is balanced, we use an inductive proof on the number of steps n in a derivation. BASIS: The basis is n = 1. The only string of terminals derivable from S in one step is the empty string, which surely is balanced. INDUCTION: Now assume that all derivations of fewer than n steps produce balanced sentences, and consider a leftmost derivation of exactly n steps.

Language Verification...  Such a derivation must be of the form S ⇒ ( S ) S ⇒* ( x ) S ⇒* ( x ) y  The derivations of x and y from S take fewer than n steps, so by the inductive hypothesis x and y are balanced. Therefore, the string (x)y must be balanced.  That is, it has an equal number of left and right parentheses, and every prefix has at least as many left parentheses as right.

Language Verification...  Now we show that every balanced string is derivable from S.  To do so, we use induction on the length of a string. BASIS: If the string is of length 0, it must be ɛ, which is balanced. INDUCTION: First, observe that every balanced string has even length. Assume that every balanced string of length less than 2n is derivable from S, and consider a balanced string w of length 2n, n ≥ 1.

Language Verification...  Surely w begins with a left parenthesis. Let (x) be the shortest nonempty prefix of w having an equal number of left and right parentheses.  Then w can be written as w = (x)y where both x and y are balanced.  Since x and y are of length less than 2n, they are derivable from S by the inductive hypothesis.  Thus, we can find a derivation of the form S ⇒ ( S ) S ⇒* ( x ) S ⇒* ( x ) y, proving that w = (x)y is also derivable from S.
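The key step of this proof, splitting w as (x)y at the shortest balanced prefix, can be sketched directly; the function names here are illustrative.

```python
# Sketch of the proof's key step: every balanced string w splits as
# w = (x)y, where (x) is the shortest nonempty prefix with equally many
# left and right parentheses, and both x and y are again balanced.

def is_balanced(s):
    depth = 0
    for ch in s:
        depth += 1 if ch == '(' else -1
        if depth < 0:          # some prefix has more ')' than '('
            return False
    return depth == 0          # equal numbers of '(' and ')'

def split(w):
    """Return (x, y) with w == '(' + x + ')' + y, as in the proof."""
    depth = 0
    for i, ch in enumerate(w):
        depth += 1 if ch == '(' else -1
        if depth == 0:         # end of the shortest balanced prefix
            return w[1:i], w[i+1:]
    raise ValueError("not balanced")

x, y = split("(()())()")
print((x, y), is_balanced(x), is_balanced(y))  # ('()()', '()') True True
```

Since x and y are strictly shorter than w, the inductive hypothesis applies to each, which is exactly what the derivation S ⇒ (S)S needs.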

CFG Vs RE  Every construct that can be described by a regular expression can be described by a grammar, but not vice-versa.  Alternatively, every regular language is a context-free language, but not vice-versa.  Consider RE (a|b)* abb & the grammar  We can construct mechanically a grammar to recognize the same language as a nondeterministic finite automaton (NFA). 24

CFG Vs RE..  The grammar above was constructed from the NFA using the following construction: 1. For each state i of the NFA, create a non-terminal Ai. 2. If state i has a transition to state j on input a, add the production Ai → a Aj. If state i goes to state j on input ɛ, add the production Ai → Aj. 3. If i is an accepting state, add Ai → ɛ. 4. If i is the start state, make Ai the start symbol of the grammar.
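The numbered construction above can be sketched in a few lines. The transition table below is an assumption: it is the standard NFA for (a|b)*abb with states 0..3 and accepting state 3, not something given in the transcript.

```python
# Sketch of the NFA-to-grammar construction, applied to an (assumed)
# standard NFA for (a|b)*abb: states 0..3, state 3 accepting.

nfa = {  # (state, symbol) -> set of successor states
    (0, 'a'): {0, 1},
    (0, 'b'): {0},
    (1, 'b'): {2},
    (2, 'b'): {3},
}
accepting = {3}

def nfa_to_grammar(nfa, accepting):
    productions = []
    for (i, sym), targets in sorted(nfa.items()):
        for j in sorted(targets):
            productions.append(f"A{i} -> {sym} A{j}")   # rule 2
    for i in sorted(accepting):
        productions.append(f"A{i} -> ε")                # rule 3
    return productions                                  # A0 is the start symbol (rule 4)

for p in nfa_to_grammar(nfa, accepting):
    print(p)
# A0 -> a A0
# A0 -> a A1
# A0 -> b A0
# A1 -> b A2
# A2 -> b A3
# A3 -> ε
```

The output is exactly the grammar quoted on the previous slide, which shows why every regular language is context-free: the construction is purely mechanical.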

Lexical Vs Syntactic Analysis  Why use regular expressions to define the lexical syntax of a language?  Reasons:  Separating the syntactic structure of a language into lexical and non-lexical parts provides a convenient way of modularizing the front end of a compiler into two manageable-sized components.  The lexical rules of a language are frequently quite simple, and to describe them we do not need a notation as powerful as grammars.

Lexical Vs Syntactic Analysis..  Regular expressions generally provide a more concise and easier-to-understand notation for tokens than grammars.  More efficient lexical analyzers can be constructed automatically from regular expressions than from arbitrary grammars.  Regular expressions are most useful for describing the structure of constructs such as identifiers, constants, keywords, and white space.

Lexical Vs Syntactic Analysis..  Grammars, on the other hand, are most useful for describing nested structures such as balanced parentheses, matching begin-end's, corresponding if-then-else's, and so on.  These nested structures cannot be described by regular expressions.
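This limit can be demonstrated concretely. The depth-2 regular expression below is a hand-built approximation invented for this example; the counter-based check is equivalent to the grammar S → ( S ) S | ɛ.

```python
# Sketch: a regular expression can only match parentheses up to some
# FIXED nesting depth, while a simple counter (equivalent to the CFG
# S -> ( S ) S | ε) handles any depth.
import re

depth2 = re.compile(r'^(\((\(\))*\))*$')  # matches balanced strings of depth <= 2

def balanced(s):  # what the CFG can express and the regex cannot
    depth = 0
    for ch in s:
        depth += 1 if ch == '(' else -1
        if depth < 0:
            return False
    return depth == 0

for s in ['()', '(())', '((()))']:
    print(s, bool(depth2.match(s)), balanced(s))
# () True True
# (()) True True
# ((())) False True   <- the regex breaks at depth 3
```

However large a regular expression we write, some deeper nesting defeats it; the counter, which is what a grammar's recursion provides, never fails.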

Eliminating Ambiguity  An ambiguous grammar can be rewritten to eliminate the ambiguity.  Ex. Eliminating the ambiguity from the following dangling-else grammar:  Compound conditional statement if E 1 then S 1 else if E 2 then S 2 else S 3 29

Eliminating Ambiguity..  The parse tree for this compound conditional statement is unique.  The grammar is nevertheless ambiguous, since the following string has two parse trees: if E1 then if E2 then S1 else S2

Eliminating Ambiguity…  (The two parse trees for if E1 then if E2 then S1 else S2: one attaching the else to the inner if, one to the outer if.)

Eliminating Ambiguity…  We can rewrite the dangling-else grammar with the following idea:  A statement appearing between a then and an else must be "matched"; that is, the interior statement must not end with an unmatched or open then.  A matched statement is either an if-then-else statement containing no open statements or it is any other kind of unconditional statement.
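The disambiguation rule that this rewriting encodes (each else matches the nearest unmatched then) can be sketched as a recursive-descent parser. The token names 'E' and 'S' are simplified placeholders for an expression and any other statement.

```python
# Sketch of the "else matches the nearest unmatched then" rule as a
# recursive-descent parser over simplified tokens.

def parse_stmt(tokens, pos=0):
    """Parse one statement; return (tree, next_pos)."""
    if tokens[pos] == 'if':
        assert tokens[pos+1] == 'E' and tokens[pos+2] == 'then'
        then_part, pos = parse_stmt(tokens, pos+3)
        if pos < len(tokens) and tokens[pos] == 'else':
            # The else is consumed HERE, by the innermost open if.
            else_part, pos = parse_stmt(tokens, pos+1)
            return ('if-then-else', then_part, else_part), pos
        return ('if-then', then_part), pos
    return 'S', pos + 1

# if E then if E then S else S  -- the else goes with the INNER if:
tree, _ = parse_stmt(['if', 'E', 'then', 'if', 'E', 'then', 'S', 'else', 'S'])
print(tree)  # ('if-then', ('if-then-else', 'S', 'S'))
```

Because the inner call sees the else first, only one of the two parse trees from the previous slide can ever be produced, which is exactly what the matched/open rewriting of the grammar guarantees.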

Thank You