COMP 3438 – Part II - Lecture 4 Syntax Analysis I Dr. Zili Shao Department of Computing The Hong Kong Polytechnic Univ.



Overview of the Subject (COMP 3438): Course Organization
Part I: Unix System Programming (Device Driver Development)
  Overview of Unix Sys. Prog.
  Process
  File System
  Overview of Device Driver Development
  Character Device Driver Development
  Introduction to Block Device Driver
Part II: Compiler Design
  Overview of Compiler Design
  Lexical Analysis
  Syntax Analysis (HW #4) (this lecture)

Review of the Previous Lecture
Lexical Specification → Regular expressions (obtained from the specification) → NFA (Nondeterministic Finite Automata) → DFA (Deterministic Finite Automata, by conversion) → Table-driven Implementation of DFA (Lexical Analyzer)

Outline
Part I: Introduction to Syntax Analysis
  1. Input (Tokens) and Output (Parse Tree)
  2. How to specify syntax? Context Free Grammar (CFG)
  3. How to obtain the parse tree?
     CFG → remove left recursion, apply left factoring, remove ambiguity → LL (Leftmost Derivation)
     CFG → (remove ambiguity) → LR (Reverse Rightmost Derivation)
Part II: Context Free Grammar, Parse Tree and Ambiguity
Part III: Bottom-up Parsing (LR)
  SLR, Canonical LR, LALR
Part III: Top-down Parsing (LL)
  Left Recursion, Left Factoring (Tutorial)
  Recursive-Descent Parsing
  Predictive Parsing (without backtracking): HW3
  Nonrecursive Predictive Parsing
Software Tool: yacc (Lab)

Part I: Intro. to Syntax Analysis

The Phases of a Compiler
Source program → Lexical Analyzer → (tokens) → Syntax Analyzer (Parser) → (parse tree) → Semantic Analyzer → Intermediate Code Generator → Code Optimizer → Code Generator → Target program
The Symbol-table Manager and the Error Handler interact with every phase.

Syntax Analysis (Parsing) – 2nd Phase
Input: sequence of tokens from lexical analysis
Output: a parse tree (syntax tree) based on the grammar of a programming language
Comparison:
  Phase | Input | Output
  Lexical Analysis (Scanner) | Source program (string of characters) | String of tokens
  Syntax Analysis (Parser) | String of tokens | Parse/syntax tree

Example
Code: if x == y then 1 else 2 fi
Parser input (sequence of tokens): IF ID == ID THEN NUM ELSE NUM FI
Parser output (tree): IF-THEN-ELSE( ==(ID, ID), NUM, NUM )
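To make the parser's input concrete, here is a minimal Python sketch of a scanner that turns the example code into the token sequence shown on the slide. The names `KEYWORDS`, `tokenize` and `EXPECTED_TREE` are hypothetical, and the token classes are limited to the ones this one example needs; it is an illustration under those assumptions, not the course's scanner.

```python
import re

# Assumed keyword table for this one example only.
KEYWORDS = {"if": "IF", "then": "THEN", "else": "ELSE", "fi": "FI"}

def tokenize(source):
    """Map each lexeme to a token name: keyword, ==, NUM or ID."""
    tokens = []
    for lexeme in re.findall(r"==|\d+|[A-Za-z_]\w*", source):
        if lexeme in KEYWORDS:
            tokens.append(KEYWORDS[lexeme])
        elif lexeme == "==":
            tokens.append("==")
        elif lexeme.isdigit():
            tokens.append("NUM")
        else:
            tokens.append("ID")
    return tokens

# The tree the parser is expected to build, written by hand as nested tuples:
# (label, child, child, ...).
EXPECTED_TREE = ("IF-THEN-ELSE", ("==", "ID", "ID"), "NUM", "NUM")

print(tokenize("if x ==y then 1 else 2 fi"))
```

The scanner discards the lexemes themselves (x, y, 1, 2); a real compiler would keep them as token attributes.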

Syntax and Grammar
A programming language has rules that prescribe the syntax of its programs. In Pascal:
  program → block;
  block → statements;
  …
Example program: var a,b,c; begin a = b + c; end
The syntax of programming-language constructs can be described by a context-free grammar or by BNF (Backus-Naur Form).

Context-Free Grammar (CFG)
A context-free grammar is a 4-tuple G = (N, T, S, P), where:
  (1) N is a finite set of nonterminal symbols;
  (2) T is a finite set of terminal symbols;
  (3) S is the start (initial) nonterminal symbol (S ∈ N);
  (4) P is a finite set of productions (rules).
Every production in P has the form A → α, where A ∈ N and α ∈ (N ∪ T)*, i.e. α is a string of terminals and nonterminals.
For example: G = (N = {S}, T = {a, b}, S, P = {S → aSb | ab}).
S → aSb | ab denotes the language L = {a^n b^n | n > 0}.
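The 4-tuple definition maps directly onto a small data structure. The following Python sketch encodes the example grammar and derives a^n b^n from it; the names `Grammar`, `is_well_formed` and `derive_anbn` are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Grammar:
    nonterminals: frozenset   # N
    terminals: frozenset      # T
    start: str                # S
    productions: dict         # P: nonterminal -> list of right-hand sides (tuples)

G = Grammar(
    nonterminals=frozenset({"S"}),
    terminals=frozenset({"a", "b"}),
    start="S",
    productions={"S": [("a", "S", "b"), ("a", "b")]},
)

def is_well_formed(g):
    """Check S in N and that every production A -> alpha has A in N, alpha over N u T."""
    symbols = g.nonterminals | g.terminals
    return g.start in g.nonterminals and all(
        head in g.nonterminals and all(sym in symbols for sym in rhs)
        for head, alternatives in g.productions.items()
        for rhs in alternatives
    )

def derive_anbn(n):
    """Derive a^n b^n: apply S -> aSb (n - 1) times, then S -> ab once."""
    if n < 1:
        raise ValueError("n must be at least 1")
    grow, stop = G.productions["S"]
    form = [G.start]
    for step in range(n):
        pos = form.index("S")
        form[pos:pos + 1] = grow if step < n - 1 else stop
    return "".join(form)

print(derive_anbn(3))  # aaabbb
```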

Why not Regular Expressions?
An example: (x+y) * z, ((x+y) + y) * z, (((x+y)+y)+y) * z, …, (…(((x+y)+y))…) * z
How do we know that the left and right parentheses match? We need: number of “(” == number of “)”.
L = {a^n b^n | n > 0} is not a regular set, but it is context-free: S → a S b | ab

Regular Expressions, CFGs and Automata
Regular Expression ↔ Finite Automata
CFG ↔ Pushdown Automata (with a stack)
(Figure: a finite automaton reading the input i = a * b; a pushdown automaton reading the input aabb with a stack.)
With a stack, it is easy to recognize a language like L = {a^n b^n | n > 0}.
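The role of the stack can be sketched in a few lines of Python: push one symbol per a, pop one per b, and accept only if the input and the stack run out together. The function name `accepts_anbn` is hypothetical, and this is a hand-written recognizer in the spirit of a pushdown automaton rather than a formal PDA definition.

```python
def accepts_anbn(s):
    """Recognize {a^n b^n | n > 0} with an explicit stack."""
    stack = []
    i = 0
    # Phase 1: push one marker for every leading 'a'.
    while i < len(s) and s[i] == "a":
        stack.append("a")
        i += 1
    if not stack:
        return False          # n must be greater than 0
    # Phase 2: pop one marker for every 'b'.
    while i < len(s) and s[i] == "b" and stack:
        stack.pop()
        i += 1
    # Accept only if all input is consumed and every 'a' was matched.
    return i == len(s) and not stack

for word in ["ab", "aabb", "aab", "abb", "abab"]:
    print(word, accepts_anbn(word))
```

A finite automaton has no such unbounded memory, which is why it cannot count the a's for arbitrary n.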

Language Classification
Regular Set ⊂ Context-free Language ⊂ Context-sensitive Language ⊂ Recursive Language

Regular expression vs. CFG
Every language that can be described by a regular expression can also be described by a CFG. For example, for (a|b)*abb the following CFG gives the same language:
  A0 → a A0 | b A0 | a A1
  A1 → b A2
  A2 → b A3
  A3 → ε
(Figure: NFA with states 0, 1, 2, 3; state 0 loops on a and b, 0 goes to 1 on a, 1 to 2 on b, 2 to 3 on b; state 3 is accepting.)
Convert an NFA into a CFG:
  1. For each state i, create a nonterminal Ai.
  2. If state i has a transition to state j on input a, introduce Ai → a Aj.
  3. If state i goes to state j on input ε, add Ai → Aj.
  4. If i is an accepting state, introduce Ai → ε.
The start symbol is the nonterminal of the NFA's start state (here A0).
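The four conversion rules are mechanical, so they can be sketched as a short Python function. The representation is an assumption made for this example: transitions are (state, symbol, next_state) triples with `None` standing for ε, and productions are stored as a dict from nonterminal name to a list of right-hand-side tuples (the empty tuple is ε). The name `nfa_to_cfg` is hypothetical.

```python
def nfa_to_cfg(transitions, accepting):
    """Build productions Ai -> a Aj, Ai -> Aj and Ai -> epsilon from an NFA."""
    productions = {}
    for i, symbol, j in transitions:
        # Rule 2 (symbol transition) or rule 3 (epsilon transition).
        rhs = (f"A{j}",) if symbol is None else (symbol, f"A{j}")
        productions.setdefault(f"A{i}", []).append(rhs)
    for i in sorted(accepting):
        # Rule 4: an accepting state may derive the empty string.
        productions.setdefault(f"A{i}", []).append(())
    return productions

# NFA for (a|b)*abb with start state 0 and accepting state 3.
nfa = [(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "b", 2), (2, "b", 3)]
cfg = nfa_to_cfg(nfa, accepting={3})

for head, alternatives in cfg.items():
    shown = [" ".join(rhs) if rhs else "ε" for rhs in alternatives]
    print(head, "→", " | ".join(shown))
```

Running it on the NFA for (a|b)*abb reproduces the grammar on the slide.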

Derivations
Based on a grammar for a language, we can generate the sentences (strings) of the language. This is done by derivations.
Syntax analysis asks: given the input token string, can we obtain a derivation of it based on the grammar?
Grammar: S → a S b | ab
Derivation: S ⇒ a S b ⇒ aabb, where ⇒ means “derives in one step”.

Parse Trees
The derivation of a sentence can be shown pictorially by a parse tree:
  each node is labeled by a grammar symbol;
  an interior node and its children correspond to a production.
A language is the set of all sentences that can be derived from the start symbol by the grammar.
Example:
  Grammar: S → a S b | ab
  Derivation: S ⇒ a S b ⇒ aabb
  Parse tree: S( a, S( a, b ), b )

Leftmost/Rightmost Derivation
At each derivation step, we have to make two choices:
  Which nonterminal to replace?
  Which alternative to use for that nonterminal?
Leftmost derivation: only the leftmost nonterminal is replaced at each derivation step.
Grammar: E → E + E | E * E | (E) | -E | id
Leftmost derivation for the sentence -(id + id):
  E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(id + E) ⇒ -(id + id)
Similarly, in a rightmost derivation the rightmost nonterminal is replaced at each step.
(Figure: parse tree for -(id + id).)
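A leftmost derivation fixes the first choice (which nonterminal), so only the second choice (which alternative) remains. The Python sketch below replays the derivation of -(id + id) from a list of alternative indices; `GRAMMAR`, `leftmost_derivation` and the index-based encoding of choices are assumptions made for this illustration.

```python
# Alternatives of E, indexed 0..4 in the order they appear in the grammar.
GRAMMAR = {
    "E": [("E", "+", "E"), ("E", "*", "E"), ("(", "E", ")"), ("-", "E"), ("id",)],
}

def leftmost_derivation(start, choices):
    """Apply the chosen alternatives, always rewriting the leftmost nonterminal."""
    form = [start]
    steps = [" ".join(form)]
    for alternative in choices:
        # Position of the leftmost nonterminal in the current sentential form.
        pos = next(i for i, sym in enumerate(form) if sym in GRAMMAR)
        form[pos:pos + 1] = GRAMMAR[form[pos]][alternative]
        steps.append(" ".join(form))
    return steps

# E => -E => -(E) => -(E + E) => -(id + E) => -(id + id)
steps = leftmost_derivation("E", [3, 2, 0, 4, 4])
print(" ⇒ ".join(steps))
```

Replacing the `next(...)` search with a search from the right end would give a rightmost derivation instead.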

Ambiguous Grammars
Each parse tree has a unique leftmost derivation and a unique rightmost derivation (once the tree is given).
Some sentences may have more than one leftmost or rightmost derivation and therefore more than one parse tree.
A grammar that produces more than one parse tree for some sentence is said to be ambiguous.
Grammar: E → E + E | E * E | id
Sentence: id + id * id
(Figure: two parse trees for the sentence, one grouping it as id + (id * id) and the other as (id + id) * id.)

Parsing Methods
Parser (Syntax Analyzer): given a token string, generate a parse tree based on the grammar of the programming language, if the string belongs to the language.
Three parsing methods:
  Universal parsing: e.g. Earley’s algorithm, which can parse any context-free grammar but is too inefficient for practical compilers (cubic time in the worst case).
  Top-down parsing: find a leftmost derivation for an input token string.
  Bottom-up parsing: find a rightmost derivation in reverse for an input token string.
(Figure: a parse tree for id + id * id, built from the root down (top-down) or from the leaves up (bottom-up).)

Top-Down Parsing
Start at the root of the parse tree and try to reach the leaves.
Finds a leftmost derivation.
Can be written efficiently by hand.
Works only for a certain class of grammars:
  Unambiguous
  No left recursion
  Left-factored (no two alternatives of a nonterminal share a common prefix)
Homework 4 asks you to implement a parser using top-down parsing.
(Figure: parse tree built top-down.)

Bottom-Up Parsing
Start at the leaves and build the tree from the bottom up.
Finds a rightmost derivation in reverse.
Basic method: shift-reduce. Shift symbols onto the stack; when a handle (the right-hand side of a production) is identified on top of the stack, reduce it to the production's left-hand side.
Used to implement automatic parser generators such as yacc.
Works for a wider class of grammars than top-down parsing; the grammar must still be unambiguous.
Lab: use yacc.
(Figure: parse tree built bottom-up.)
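The shift and reduce moves can be traced with a toy Python sketch for the grammar S → a S b | ab. It reduces greedily whenever a right-hand side sits on top of the stack, which happens to be correct for this particular grammar; a real LR parser decides between shifting and reducing with a parse table. The function name `shift_reduce` is hypothetical.

```python
def shift_reduce(tokens):
    """Toy shift-reduce recognizer for S -> a S b | a b. Returns (accepted, trace)."""
    stack, trace = [], []
    rest = list(tokens)
    while True:
        if stack[-3:] == ["a", "S", "b"]:
            stack[-3:] = ["S"]                  # handle a S b on top: reduce
            trace.append("reduce S -> a S b")
        elif stack[-2:] == ["a", "b"]:
            stack[-2:] = ["S"]                  # handle a b on top: reduce
            trace.append("reduce S -> a b")
        elif rest:
            stack.append(rest.pop(0))           # no handle: shift next token
            trace.append("shift " + stack[-1])
        else:
            break
    # Accept when all input is consumed and only the start symbol remains.
    return stack == ["S"], trace

accepted, trace = shift_reduce("aabb")
print(accepted)
for move in trace:
    print(move)
```

Read bottom to top, the reduce steps give the rightmost derivation S ⇒ a S b ⇒ a a b b.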

Part II: Context Free Grammars, Parse Trees and Ambiguity

Context-Free Grammar (CFG)
A context-free grammar is a 4-tuple G = (N, T, S, P), where:
  (1) N is a finite set of nonterminal symbols;
  (2) T is a finite set of terminal symbols;
  (3) S is the start (initial) nonterminal symbol (S ∈ N);
  (4) P is a finite set of productions (rules).
Every production in P has the form A → α, where A ∈ N and α ∈ (N ∪ T)*, i.e. α is a string of terminals and nonterminals.
For example: G = (N = {S}, T = {a, b}, S, P = {S → aSb | ab}).
S → aSb | ab denotes the language L = {a^n b^n | n > 0}.

Examples for CFG
(c) A CFG for simple arithmetic expressions:
  expr → expr op expr | (expr) | - expr | id
  op → + | - | * | / | ↑
(d) A CFG for Pascal-like begin-end blocks:
  block_stmt → BEGIN opt_stmts END
  opt_stmts → stmt_list | ε
  stmt_list → stmt_list ; stmt | stmt
  stmt → IF (expr) stmt ELSE stmt | block_stmt | ε
In CFGs for programming languages:
  Terminal symbols of the grammar are tokens.
  Nonterminal symbols are syntactic variables representing language constructs: statements, expressions, etc.

Derivations
A grammar derives strings by beginning with the start symbol and repeatedly replacing a nonterminal by the right-hand side of a production for that nonterminal.
We say that αAβ ⇒ αγβ if
  1. A → γ is a production, and
  2. α and β are arbitrary strings of grammar symbols.
We use ⇒ to mean “derives in one step”.
Example:
  Grammar: S → a S b | ab
  Derivation: S ⇒ a S b ⇒ aabb

Example for Derivations
Consider the following grammar G_E for simple expressions:
  S → E
  E → E | E + E | E * E | id
The string id + id * id can be derived from the start symbol S by the following sequence of replacements:
  S ⇒ E ⇒ E + E ⇒ E + E * E ⇒ E + id * E ⇒ id + id * E ⇒ id + id * id
Each step is a derivation step.
At each derivation step, we have to make two choices:
  Which nonterminal to replace?
  Which alternative to use for that nonterminal?

Derivations
Often we wish to say “derives in zero or more steps”. For this purpose we use the symbol ⇒*:
  1. α ⇒* α for any string α, and
  2. if α ⇒* β and β ⇒ γ, then α ⇒* γ.
Likewise, we use the symbol ⇒+ to mean “derives in one or more steps”.
For a grammar G, if S ⇒* α, where S is the start symbol and α may contain nonterminals, then we say that α is a sentential form of G.
A sentence of the language defined by G is a sentential form with no nonterminals.
A string of terminals x is a sentence of L(G) iff S ⇒+ x.

Parse Trees
The derivation of a sentence can be shown pictorially by a parse tree:
  each node is labeled by a grammar symbol;
  an interior node and its children correspond to a production.
Example:
  Grammar: S → a S b | ab
  Derivation: S ⇒ a S b ⇒ aabb
  Parse tree: S( a, S( a, b ), b )

The Properties of a Parse Tree
A parse tree has the following properties:
  The root is labeled by the start symbol.
  A leaf node is labeled by a terminal symbol (or by ε).
  An interior node is labeled by a nonterminal symbol.
  If A is the nonterminal labeling some interior node and X1, X2, ..., Xn are the labels of the children of that node from left to right, then A → X1 X2 ... Xn is a production.
Visiting all leaves of a parse tree from left to right yields the sentence derived by the tree.
(Figure: a parse tree for id + id * id.)
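These properties can be checked mechanically. The Python sketch below represents a parse tree as nested tuples (label, child, child, ...) with plain strings as leaves, verifies that every interior node corresponds to a production, and reads the sentence off the leaves. The representation and the names `PRODUCTIONS`, `label`, `is_valid` and `leaves` are assumptions made for this example.

```python
PRODUCTIONS = {"E": {("E", "+", "E"), ("E", "*", "E"), ("id",)}}

# One parse tree for id + id * id, grouping it as id + (id * id).
tree = ("E",
        ("E", "id"),
        "+",
        ("E", ("E", "id"), "*", ("E", "id")))

def label(node):
    """Grammar symbol labeling a node."""
    return node if isinstance(node, str) else node[0]

def is_valid(node):
    """Leaves are terminals; each interior node with its children matches a production."""
    if isinstance(node, str):
        return node not in PRODUCTIONS
    head, *children = node
    rhs = tuple(label(child) for child in children)
    return rhs in PRODUCTIONS.get(head, set()) and all(is_valid(c) for c in children)

def leaves(node):
    """Collect leaf labels from left to right (the yield of the tree)."""
    if isinstance(node, str):
        return [node]
    result = []
    for child in node[1:]:
        result.extend(leaves(child))
    return result

print(is_valid(tree), " ".join(leaves(tree)))
```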

Parse Tree
(Figure: sketch of a parse tree for a complete program.)

Ambiguous Grammars
Each parse tree has a unique leftmost derivation and a unique rightmost derivation (once the tree is given).
Some sentences may have more than one leftmost or rightmost derivation and therefore more than one parse tree.
A grammar that produces more than one parse tree for some sentence is said to be ambiguous.
Grammar: E → E + E | E * E | id
Sentence: id + id * id
(Figure: two parse trees for the sentence, one grouping it as id + (id * id) and the other as (id + id) * id.)

Eliminating ambiguity
Not always possible. Why?
  First, ambiguity is undecidable: no algorithm can take an arbitrary grammar and determine, with certainty and in a finite amount of time, whether it is ambiguous.
  Second, some languages are inherently ambiguous, i.e. every grammar for them is ambiguous.
So what can we do?
  We may use certain ambiguous grammars together with disambiguating rules that "throw away" undesirable parse trees.
  For expressions, use precedence and associativity.

Eliminating ambiguity
Consider the ambiguity in the following “dangling-else” grammar G:
  stmt → if expr then stmt
       | if expr then stmt else stmt
       | other
G is ambiguous because the string “if E1 then if E2 then S1 else S2” has two parse trees:
  (1) the else belongs to the inner if: if E1 then (if E2 then S1 else S2)
  (2) the else belongs to the outer if: if E1 then (if E2 then S1) else S2

Eliminating ambiguity
The general rule for the “dangling-else” grammar: “Match each else with the closest previous unmatched then.”
Idea: a statement appearing between a then and an else must be matched.
  stmt → matched_stmt | unmatched_stmt
  matched_stmt → if expr then matched_stmt else matched_stmt | other
  unmatched_stmt → if expr then stmt | if expr then matched_stmt else unmatched_stmt
(Figure: the two candidate parse trees for “if E1 then if E2 then S1 else S2”.)
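The effect of the closest-then rule can be seen in a small recursive-descent sketch in Python. It does not implement the matched/unmatched grammar itself; instead it applies the disambiguating rule directly by letting the innermost open if consume a following else, which yields the same tree. Expressions and other statements are single tokens here, and the name `parse_stmt` is hypothetical.

```python
def parse_stmt(tokens, pos=0):
    """Parse one statement starting at tokens[pos]; return (tree, next_pos)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] != "if":
        return tokens[pos], pos + 1                      # stmt -> other
    if pos + 2 >= len(tokens) or tokens[pos + 2] != "then":
        raise SyntaxError("expected 'then'")
    condition = tokens[pos + 1]
    then_branch, pos = parse_stmt(tokens, pos + 3)
    # The innermost unfinished 'if' gets the first chance to take an 'else',
    # so each else is matched with the closest previous unmatched then.
    if pos < len(tokens) and tokens[pos] == "else":
        else_branch, pos = parse_stmt(tokens, pos + 1)
        return ("if", condition, then_branch, else_branch), pos
    return ("if", condition, then_branch), pos

tree, end = parse_stmt("if E1 then if E2 then S1 else S2".split())
print(tree)
```

The result nests the else inside the inner if, i.e. if E1 then (if E2 then S1 else S2).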

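Ambiguous Grammar
Grammar: E → E + E | E * E | id
Consider the string id * id + id * id.
It has several different parse trees; three are shown. In total the string has five parse trees under this grammar.
(Figure: three parse trees, one with E + E at the root and two with E * E at the root.)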
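The degree of ambiguity can be measured by counting parse trees. The Python sketch below counts, for the grammar E → E + E | E * E | id, how many parse trees a token string has by trying every operator as the root of the tree. The function name `count_parse_trees` is hypothetical, and the approach is a simple dynamic program written for this grammar only, not a general parsing algorithm.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees of tokens under E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways E can derive tokens[lo:hi].
        if hi - lo == 1:
            return 1 if tokens[lo] == "id" else 0
        total = 0
        for k in range(lo + 1, hi - 1):
            if tokens[k] in ("+", "*"):
                # tokens[k] is the operator at the root: E -> E op E.
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(tokens))

print(count_parse_trees("id + id * id".split()))
print(count_parse_trees("id * id + id * id".split()))
```

Any count above 1 for some sentence shows that the grammar is ambiguous.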

Specifying Precedence
Idea: build precedence and associativity into the grammar.
Use a different nonterminal for each precedence level:
  Lowest precedence level: highest in the tree
  Highest precedence level: lowest in the tree
  Same level: same precedence
Associativity:
  left recursion: left associative
  right recursion: right associative
Grammar:
  E → E + T | E - T | T
  T → T * F | T / F | F
  F → P | P ^ F
  P → ID | NUM | ( E )

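Example
Grammar:
  E → E + T | E - T | T
  T → T * F | T / F | F
  F → P | P ^ F
  P → ID | NUM | ( E )
Input: 1 + 2 + 3 + 4 ^ 5 ^ 6
(Figure: parse tree for the input.) The left-recursive rule E → E + T groups the additions as ((1 + 2) + 3) + …, and the right-recursive rule F → P ^ F groups the exponentiation as 4 ^ (5 ^ 6).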
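How the layered grammar produces this grouping can be sketched with a small hand-written parser in Python. One function per nonterminal mirrors the grammar; because a recursive-descent parser cannot call itself on a left-recursive rule, the left-recursive levels (E and T) are written as loops that build left-nested trees, while F recurses on its right operand. The names `tokenize` and `parse` and the tuple-based tree format are assumptions for this illustration, and the tokenizer silently skips characters it does not know.

```python
import re

def tokenize(text):
    """Split into numbers, identifiers and single-character operators."""
    return re.findall(r"\d+|[A-Za-z_]\w*|[-+*/^()]", text)

def parse(text):
    """Return an expression tree as nested tuples (operator, left, right)."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def parse_E():                      # E -> E + T | E - T | T (left associative)
        node = parse_T()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, parse_T())
        return node

    def parse_T():                      # T -> T * F | T / F | F (left associative)
        node = parse_F()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, parse_F())
        return node

    def parse_F():                      # F -> P | P ^ F (right associative)
        base = parse_P()
        if peek() == "^":
            advance()
            return ("^", base, parse_F())
        return base

    def parse_P():                      # P -> ID | NUM | ( E )
        tok = peek()
        if tok is None or tok in {"+", "-", "*", "/", "^", ")"}:
            raise SyntaxError("expected ID, NUM or '('")
        if tok == "(":
            advance()
            node = parse_E()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        return advance()

    tree = parse_E()
    if pos != len(tokens):
        raise SyntaxError("unexpected token " + tokens[pos])
    return tree

print(parse("1 + 2 + 3 + 4 ^ 5 ^ 6"))
```

The printed tree nests the additions to the left and the exponentiations to the right, matching the grouping ((1 + 2) + 3) + (4 ^ (5 ^ 6)).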

Summary
Introduction to syntax analysis
  Input (Tokens) and Output (Parse Tree)
  Specifying syntax: Context Free Grammar (CFG)
  Parsing methods
Context-free grammar
  CFG
  Derivation, parse tree
  Ambiguous grammars