BİL 744 Derleyici Gerçekleştirimi (Compiler Design)1 Syntax Analyzer Syntax Analyzer creates the syntactic structure of the given source program. This.

Slides:



Advertisements
Similar presentations
lec02-parserCFG March 27, 2017 Syntax Analyzer
Advertisements

1 Parsing The scanner recognizes words The parser recognizes syntactic units Parser operations: Check and verify syntax based on specified syntax rules.
LESSON 18.
Predictive Parsing l Find derivation for an input string, l Build a abstract syntax tree (AST) –a representation of the parsed program l Build a symbol.
By Neng-Fa Zhou Syntax Analysis lexical analyzer syntax analyzer semantic analyzer source program tokens parse tree parser tree.
ISBN Chapter 4 Lexical and Syntax Analysis The Parsing Problem Recursive-Descent Parsing.
Parsing — Part II (Ambiguity, Top-down parsing, Left-recursion Removal)
Compiler Constreuction 1 Chapter 4 Syntax Analysis Topics to cover: Context-Free Grammars: Concepts and Notation Writing and rewriting a grammar Syntax.
1 The Parser Its job: –Check and verify syntax based on specified syntax rules –Report errors –Build IR Good news –the process can be automated.
1 Chapter 4: Top-Down Parsing. 2 Objectives of Top-Down Parsing an attempt to find a leftmost derivation for an input string. an attempt to construct.
LR(1) Languages An Introduction Professor Yihjia Tsai Tamkang University.
BİL744 Derleyici Gerçekleştirimi (Compiler Design)1.
COP4020 Programming Languages
Parsing II : Top-down Parsing Lecture 7 CS 4318/5331 Apan Qasem Texas State University Spring 2015 *some slides adopted from Cooper and Torczon.
1 Chapter 3 Context-Free Grammars and Parsing. 2 Parsing: Syntax Analysis decides which part of the incoming token stream should be grouped together.
CPSC Compiler Tutorial 3 Parser. Parsing The syntax of most programming languages can be specified by a Context-free Grammar (CGF) Parsing: Given.
(2.1) Grammars  Definitions  Grammars  Backus-Naur Form  Derivation – terminology – trees  Grammars and ambiguity  Simple example  Grammar hierarchies.
CSE 413 Programming Languages & Implementation Hal Perkins Autumn 2012 Context-Free Grammars and Parsing 1.
CSC3315 (Spring 2009)1 CSC 3315 Lexical and Syntax Analysis Hamid Harroud School of Science and Engineering, Akhawayn University
8/19/2015© Hal Perkins & UW CSEC-1 CSE P 501 – Compilers Parsing & Context-Free Grammars Hal Perkins Winter 2008.
Formal Grammars Denning, Sections 3.3 to 3.6. Formal Grammar, Defined A formal grammar G is a four-tuple G = (N,T,P,  ), where N is a finite nonempty.
Chapter 9 Syntax Analysis Winter 2007 SEG2101 Chapter 9.
Review: –How do we define a grammar (what are the components in a grammar)? –What is a context free grammar? –What is the language defined by a grammar?
Introduction to Parsing Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
CS 321 Programming Languages and Compilers VI. Parsing.
Context-Free Grammars
CSE 3302 Programming Languages Chengkai Li, Weimin He Spring 2008 Syntax (cont.) Lecture 4 – Syntax (Cont.), Spring CSE3302 Programming Languages,
Profs. Necula CS 164 Lecture Top-Down Parsing ICOM 4036 Lecture 5.
Lecture # 9 Chap 4: Ambiguous Grammar. 2 Chomsky Hierarchy: Language Classification A grammar G is said to be – Regular if it is right linear where each.
Lesson 3 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
1 Chapter 4 Grammars and Parsing. 2 Context-Free Grammars: Concepts and Notation A context-free grammar G = (Vt, Vn, S, P) –A finite terminal vocabulary.
6/4/2016IT 3271 The most practical Parsers: Predictive parser: 1.input (token string) 2.Stacks, parsing table 3.output (syntax tree, intermediate codes)
Parsing Introduction Syntactic Analysis I. Parsing Introduction 2 The Role of the Parser The Syntactic Analyzer, or Parser, is the heart of the front.
Bernd Fischer RW713: Compiler and Software Language Engineering.
CFG1 CSC 4181Compiler Construction Context-Free Grammars Using grammars in parsers.
Parsing — Part II (Top-down parsing, left-recursion removal) Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students.
Overview of Previous Lesson(s) Over View  In our compiler model, the parser obtains a string of tokens from the lexical analyzer & verifies that the.
Syntax Analysis - Parsing Compiler Design Lecture (01/28/98) Computer Science Rensselaer Polytechnic.
Unit-3 Parsing Theory (Syntax Analyzer) PREPARED BY: PROF. HARISH I RATHOD COMPUTER ENGINEERING DEPARTMENT GUJARAT POWER ENGINEERING & RESEARCH INSTITUTE.
Top-Down Parsing.
Syntax Analyzer (Parser)
1 Pertemuan 7 & 8 Syntax Analysis (Parsing) Matakuliah: T0174 / Teknik Kompilasi Tahun: 2005 Versi: 1/6.
Programming Languages and Design Lecture 2 Syntax Specifications of Programming Languages Instructor: Li Ma Department of Computer Science Texas Southern.
1 Introduction to Parsing. 2 Outline l Regular languages revisited l Parser overview Context-free grammars (CFG ’ s) l Derivations.
Chapter 4: Syntax analysis Syntax analysis is done by the parser. –Detects whether the program is written following the grammar rules and reports syntax.
COMP 3438 – Part II-Lecture 5 Syntax Analysis II Dr. Zili Shao Department of Computing The Hong Kong Polytechnic Univ.
1 Topic #4: Syntactic Analysis (Parsing) CSC 338 – Compiler Design and implementation Dr. Mohamed Ben Othman ( )
Chapter 2 (part) + Chapter 4: Syntax Analysis S. M. Farhad 1.
Lecture # 10 Grammar Problems. Problems with grammar Ambiguity Left Recursion Left Factoring Removal of Useless Symbols These can create problems for.
Compiler Construction Lecture Five: Parsing - Part Two CSC 2103: Compiler Construction Lecture Five: Parsing - Part Two Joyce Nakatumba-Nabende 1.
COMP 3438 – Part II - Lecture 4 Syntax Analysis I Dr. Zili Shao Department of Computing The Hong Kong Polytechnic Univ.
Syntax Analysis By Noor Dhia Syntax analysis:- Syntax analysis or parsing is the most important phase of a compiler. The syntax analyzer considers.
Last Chapter Review Source code characters combination lexemes tokens pattern Non-Formalization Description Formalization Description Regular Expression.
Parsing COMP 3002 School of Computer Science. 2 The Structure of a Compiler syntactic analyzer code generator program text interm. rep. machine code tokenizer.
lec02-parserCFG May 8, 2018 Syntax Analyzer
Parsing & Context-Free Grammars
Chapter 4 - Parsing CSCE 343.
Programming Languages Translator
CS510 Compiler Lecture 4.
Syntax Specification and Analysis
lec02-parserCFG July 30, 2018 Syntax Analyzer
Compiler Construction
Parsing & Context-Free Grammars Hal Perkins Autumn 2011
Syntax Analysis source program lexical analyzer tokens syntax analyzer
Lecture 7: Introduction to Parsing (Syntax Analysis)
Bottom Up Parsing.
R.Rajkumar Asst.Professor CSE
BNF 9-Apr-19.
lec02-parserCFG May 27, 2019 Syntax Analyzer
Presentation transcript:

BİL 744 Derleyici Gerçekleştirimi (Compiler Design)1 Syntax Analyzer Syntax Analyzer creates the syntactic structure of the given source program. This syntactic structure is mostly a parse tree. Syntax Analyzer is also known as parser. The syntax of a programming is described by a context-free grammar (CFG). We will use BNF (Backus-Naur Form) notation in the description of CFGs. The syntax analyzer (parser) checks whether a given source program satisfies the rules implied by a context-free grammar or not. –If it satisfies, the parser creates the parse tree of that program. –Otherwise the parser gives the error messages. A context-free grammar –gives a precise syntactic specification of a programming language. –the design of the grammar is an initial phase of the design of a compiler. –a grammar can be directly converted into a parser by some tools.

BİL 744 Derleyici Gerçekleştirimi (Compiler Design)2 Parser Lexical Analyzer Parser source program token get next token parse tree Parser works on a stream of tokens. The smallest item is a token.

BİL 744 Derleyici Gerçekleştirimi (Compiler Design)3 Parsers (cont.) We categorize the parsers into two groups: 1.Top-Down Parser –the parse tree is created top to bottom, starting from the root. 2.Bottom-Up Parser –the parse is created bottom to top; starting from the leaves Both top-down and bottom-up parsers scan the input from left to right (one symbol at a time). Efficient top-down and bottom-up parsers can be implemented only for sub-classes of context-free grammars. –LL for top-down parsing –LR for bottom-up parsing

BİL 744 Derleyici Gerçekleştirimi (Compiler Design)4 Context-Free Grammars Inherently recursive structures of a programming language are defined by a context-free grammar. In a context-free grammar, we have: –A finite set of terminals (in our case, this will be the set of tokens) –A finite set of non-terminals (syntactic-variables) –A finite set of productions rules in the following form A   where A is a non-terminal and  is a string of terminals and non-terminals (including the empty string) –A start symbol (one of the non-terminal symbol) Example: E  E + E | E – E | E * E | E / E | - E E  ( E ) E  id

BİL 744 Derleyici Gerçekleştirimi (Compiler Design)5 Derivations E  E+E E+E derives from E –we can replace E by E+E –to able to do this, we have to have a production rule E  E+E in our grammar. E  E+E  id+E  id+id A sequence of replacements of non-terminal symbols is called a derivation of id+id from E. In general a derivation step is  A    if there is a production rule A  in our grammar where  and  are arbitrary strings of terminal and non-terminal symbols  1   2 ...   n (  n derives from  1 or  1 derives  n )  : derives in one step  : derives in zero or more steps  : derives in one or more steps * +

BİL 744 Derleyici Gerçekleştirimi (Compiler Design)6 CFG - Terminology L(G) is the language of G (the language generated by G) which is a set of sentences. A sentence of L(G) is a string of terminal symbols of G. If S is the start symbol of G then  is a sentence of L(G) iff S   where  is a string of terminals of G. If G is a context-free grammar, L(G) is a context-free language. Two grammars are equivalent if they produce the same language. S   - If  contains non-terminals, it is called as a sentential form of G. - If  does not contain non-terminals, it is called as a sentence of G. + *

BİL 744 Derleyici Gerçekleştirimi (Compiler Design)7 Derivation Example E  -E  -(E)  -(E+E)  -(id+E)  -(id+id) OR E  -E  -(E)  -(E+E)  -(E+id)  -(id+id) At each derivation step, we can choose any of the non-terminal in the sentential form of G for the replacement. If we always choose the left-most non-terminal in each derivation step, this derivation is called as left-most derivation. If we always choose the right-most non-terminal in each derivation step, this derivation is called as right-most derivation.

BİL 744 Derleyici Gerçekleştirimi (Compiler Design)8 Left-Most and Right-Most Derivations Left-Most Derivation E  -E  -(E)  -(E+E)  -(id+E)  -(id+id) Right-Most Derivation E  -E  -(E)  -(E+E)  -(E+id)  -(id+id) We will see that the top-down parsers try to find the left-most derivation of the given source program. We will see that the bottom-up parsers try to find the right-most derivation of the given source program in the reverse order. lm rm

BİL 744 Derleyici Gerçekleştirimi (Compiler Design)9 Parse Tree Inner nodes of a parse tree are non-terminal symbols. The leaves of a parse tree are terminal symbols. A parse tree can be seen as a graphical representation of a derivation. E  -E E E- E E EE E + - () E E E- () E E id E E E+ - () E E E EE+ - ()  -(E)  -(E+E)  -(id+E)  -(id+id)

BİL 744 Derleyici Gerçekleştirimi (Compiler Design)10 Ambiguity A grammar produces more than one parse tree for a sentence is called as an ambiguous grammar. E  E+E  id+E  id+E*E  id+id*E  id+id*id E  E*E  E+E*E  id+E*E  id+id*E  id+id*id E id E+ E E * E E E+ E E * E

BİL 744 Derleyici Gerçekleştirimi (Compiler Design)11 Ambiguity (cont.) For the most parsers, the grammar must be unambiguous. unambiguous grammar  unique selection of the parse tree for a sentence We should eliminate the ambiguity in the grammar during the design phase of the compiler. An unambiguous grammar should be written to eliminate the ambiguity. We have to prefer one of the parse trees of a sentence (generated by an ambiguous grammar) to disambiguate that grammar to restrict to this choice.

BİL 744 Derleyici Gerçekleştirimi (Compiler Design)12 Ambiguity (cont.) stmt  if expr then stmt | if expr then stmt else stmt | otherstmts if E 1 then if E 2 then S 1 else S 2 stmt if expr then stmt else stmt E 1 if expr then stmt S 2 E 2 S 1 stmt if expr then stmt E 1 if expr then stmt else stmt E 2 S 1 S 2 1 2

BİL 744 Derleyici Gerçekleştirimi (Compiler Design)13 Ambiguity (cont.) We prefer the second parse tree (else matches with closest if). So, we have to disambiguate our grammar to reflect this choice. The unambiguous grammar will be: stmt  matchedstmt | unmatchedstmt matchedstmt  if expr then matchedstmt else matchedstmt | otherstmts unmatchedstmt  if expr then stmt | if expr then matchedstmt else unmatchedstmt

BİL 744 Derleyici Gerçekleştirimi (Compiler Design)14 Ambiguity – Operator Precedence Ambiguous grammars (because of ambiguous operators) can be disambiguated according to the precedence and associativity rules. E  E+E | E*E | E^E | id | (E) disambiguate the grammar precedence: ^ (right to left) * (left to right) + (left to right) E  E+T | T T  T*F | F F  G^F | G G  id | (E) 

BİL 744 Derleyici Gerçekleştirimi (Compiler Design)15 Left Recursion A grammar is left recursive if it has a non-terminal A such that there is a derivation. A  A  for some string  Top-down parsing techniques cannot handle left-recursive grammars. So, we have to convert our left-recursive grammar into an equivalent grammar which is not left-recursive. The left-recursion may appear in a single step of the derivation (immediate left-recursion), or may appear in more than one step of the derivation. +

BİL 744 Derleyici Gerçekleştirimi (Compiler Design)16 Immediate Left-Recursion A  A  |  where  does not start with A  eliminate immediate left recursion A   A ’ A ’   A ’ |  an equivalent grammar A  A  1 |... | A  m |  1 |... |  n where  1...  n do not start with A  eliminate immediate left recursion A   1 A ’ |... |  n A ’ A ’   1 A ’ |... |  m A ’ |  an equivalent grammar In general,

BİL 744 Derleyici Gerçekleştirimi (Compiler Design)17 Immediate Left-Recursion -- Example E  E+T | T T  T*F | F F  id | (E) E  T E ’ E ’  +T E ’ |  T  F T ’ T ’  *F T ’ |  F  id | (E)  eliminate immediate left recursion

BİL 744 Derleyici Gerçekleştirimi (Compiler Design)18 Left-Recursion -- Problem A grammar cannot be immediately left-recursive, but it still can be left-recursive. By just eliminating the immediate left-recursion, we may not get a grammar which is not left-recursive. S  Aa | b A  Sc | dThis grammar is not immediately left-recursive, but it is still left-recursive. S  Aa  Sca or A  Sc  Aac causes to a left-recursion So, we have to eliminate all left-recursions from our grammar

BİL 744 Derleyici Gerçekleştirimi (Compiler Design)19 Eliminate Left-Recursion -- Algorithm - Arrange non-terminals in some order: A 1... A n - for i from 1 to n do { - for j from 1 to i-1 do { replace each production A i  A j  by A i   1  |... |  k  where A j   1 |... |  k } - eliminate immediate left-recursions among A i productions }

BİL 744 Derleyici Gerçekleştirimi (Compiler Design)20 Eliminate Left-Recursion -- Example S  Aa | b A  Ac | Sd | f - Order of non-terminals: S, A for S: - we do not enter the inner loop. - there is no immediate left recursion in S. for A: - Replace A  Sd with A  Aad | bd So, we will have A  Ac | Aad | bd | f - Eliminate the immediate left-recursion in A A  bdA ’ | fA ’ A ’  cA ’ | adA ’ |  So, the resulting equivalent grammar which is not left-recursive is: S  Aa | b A  bdA ’ | fA ’ A ’  cA ’ | adA ’ | 

BİL 744 Derleyici Gerçekleştirimi (Compiler Design)21 Eliminate Left-Recursion – Example2 S  Aa | b A  Ac | Sd | f - Order of non-terminals: A, S for A: - we do not enter the inner loop. - Eliminate the immediate left-recursion in A A  SdA ’ | fA ’ A ’  cA ’ |  for S: - Replace S  Aa with S  SdA ’ a | fA ’ a So, we will have S  SdA ’ a | fA ’ a | b - Eliminate the immediate left-recursion in S S  fA’aS’ | bS ’ S ’  dA ’ aS’ |  So, the resulting equivalent grammar which is not left-recursive is: S  fA’aS’ | bS ’ S ’  dA ’ aS’ |  A  SdA ’ | fA ’ A ’  cA ’ | 

BİL 744 Derleyici Gerçekleştirimi (Compiler Design)22 Left-Factoring A predictive parser (a top-down parser without backtracking) insists that the grammar must be left-factored. grammar  a new equivalent grammar suitable for predictive parsing stmt  if expr then stmt else stmt | if expr then stmt when we see if, we cannot now which production rule to choose to re-write stmt in the derivation.

BİL 744 Derleyici Gerçekleştirimi (Compiler Design)23 Left-Factoring (cont.) In general, A   1 |  2 where  is non-empty and the first symbols of  1 and  2 (if they have one)are different. when processing  we cannot know whether expand A to  1 or A to  2 But, if we re-write the grammar as follows A   A ’ A’   1 |  2 so, we can immediately expand A to  A ’

BİL 744 Derleyici Gerçekleştirimi (Compiler Design)24 Left-Factoring -- Algorithm For each non-terminal A with two or more alternatives (production rules) with a common non-empty prefix, let say A   1 |... |  n |  1 |... |  m convert it into A   A ’ |  1 |... |  m A ’   1 |... |  n

BİL 744 Derleyici Gerçekleştirimi (Compiler Design)25 Left-Factoring – Example1 A  abB | aB | cdg | cdeB | cdfB  A  aA ’ | cdg | cdeB | cdfB A ’  bB | B  A  aA ’ | cdA ’’ A ’  bB | B A ’’  g | eB | fB

BİL 744 Derleyici Gerçekleştirimi (Compiler Design)26 Left-Factoring – Example2 A  ad | a | ab | abc | b  A  aA’ | b A’  d |  | b | bc  A  aA’ | b A’  d |  | bA’’ A’’   | c

BİL 744 Derleyici Gerçekleştirimi (Compiler Design)27 Non-Context Free Language Constructs There are some language constructions in the programming languages which are not context-free. This means that, we cannot write a context- free grammar for these constructions. L1 = {  c  |  is in (a|b)*}is not context-free  declaring an identifier and checking whether it is declared or not later. We cannot do this with a context-free language. We need semantic analyzer (which is not context-free). L2 = {a n b m c n d m | n  1 and m  1 }is not context-free  declaring two functions (one with n parameters, the other one with m parameters), and then calling them with actual parameters.