Syntax Analysis – Part I EECS 483 – Lecture 4 University of Michigan Monday, September 17, 2006.

Slides:



Advertisements
Similar presentations
1 Parsing The scanner recognizes words The parser recognizes syntactic units Parser operations: Check and verify syntax based on specified syntax rules.
Advertisements

Grammars, constituency and order A grammar describes the legal strings of a language in terms of constituency and order. For example, a grammar for a fragment.
Mooly Sagiv and Roman Manevich School of Computer Science
ISBN Chapter 3 Describing Syntax and Semantics.
1 Introduction to Computability Theory Lecture5: Context Free Languages Prof. Amos Israeli.
CS 330 Programming Languages 09 / 13 / 2007 Instructor: Michael Eckmann.
Chapter 3 Describing Syntax and Semantics Sections 1-3.
Context-Free Grammars Lecture 7
Chapter 3 Describing Syntax and Semantics Sections 1-3.
1 Foundations of Software Design Lecture 23: Finite Automata and Context-Free Grammars Marti Hearst Fall 2002.
Parsing — Part II (Ambiguity, Top-down parsing, Left-recursion Removal)
Chapter 3 Describing Syntax and Semantics Sections 1-3.
1 The Parser Its job: –Check and verify syntax based on specified syntax rules –Report errors –Build IR Good news –the process can be automated.
1 CMPSC 160 Translation of Programming Languages Fall 2002 slides derived from Tevfik Bultan, Keith Cooper, and Linda Torczon Lecture-Module #5 Introduction.
Dr. Muhammed Al-Mulhem 1ICS ICS 535 Design and Implementation of Programming Languages Part 1 Fundamentals (Chapter 4) Compilers and Syntax.
COP4020 Programming Languages
(2.1) Grammars  Definitions  Grammars  Backus-Naur Form  Derivation – terminology – trees  Grammars and ambiguity  Simple example  Grammar hierarchies.
EECS 6083 Intro to Parsing Context Free Grammars
1 Syntax and Semantics The Purpose of Syntax Problem of Describing Syntax Formal Methods of Describing Syntax Derivations and Parse Trees Sebesta Chapter.
1 Introduction to Parsing Lecture 5. 2 Outline Regular languages revisited Parser overview Context-free grammars (CFG’s) Derivations.
Lecture 16 Oct 18 Context-Free Languages (CFL) - basic definitions Examples.
CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars.
BİL 744 Derleyici Gerçekleştirimi (Compiler Design)1 Syntax Analyzer Syntax Analyzer creates the syntactic structure of the given source program. This.
CS 355 – PROGRAMMING LANGUAGES Dr. X. Topics Introduction The General Problem of Describing Syntax Formal Methods of Describing Syntax.
Introduction to Parsing Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
CS Describing Syntax CS 3360 Spring 2012 Sec Adapted from Addison Wesley’s lecture notes (Copyright © 2004 Pearson Addison Wesley)
Grammars CPSC 5135.
PART I: overview material
Profs. Necula CS 164 Lecture Top-Down Parsing ICOM 4036 Lecture 5.
Lecture # 9 Chap 4: Ambiguous Grammar. 2 Chomsky Hierarchy: Language Classification A grammar G is said to be – Regular if it is right linear where each.
ISBN Chapter 3 Describing Syntax and Semantics.
TextBook Concepts of Programming Languages, Robert W. Sebesta, (10th edition), Addison-Wesley Publishing Company CSCI18 - Concepts of Programming languages.
Context Free Grammars. Context Free Languages (CFL) The pumping lemma showed there are languages that are not regular –There are many classes “larger”
1 Syntax In Text: Chapter 3. 2 Chapter 3: Syntax and Semantics Outline Syntax: Recognizer vs. generator BNF EBNF.
Parsing Introduction Syntactic Analysis I. Parsing Introduction 2 The Role of the Parser The Syntactic Analyzer, or Parser, is the heart of the front.
作者 : 陳鍾誠 單位 : 金門技術學院資管系 URL : 日期 : 2016/6/4 程式語言的語法 Grammar.
Introduction to Parsing
CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 3: Introduction to Syntactic Analysis.
ISBN Chapter 3 Describing Syntax and Semantics.
Context Free Grammars 1. Context Free Languages (CFL) The pumping lemma showed there are languages that are not regular –There are many classes “larger”
Overview of Previous Lesson(s) Over View  In our compiler model, the parser obtains a string of tokens from the lexical analyzer & verifies that the.
Syntax Analysis - Parsing Compiler Design Lecture (01/28/98) Computer Science Rensselaer Polytechnic.
Unit-3 Parsing Theory (Syntax Analyzer) PREPARED BY: PROF. HARISH I RATHOD COMPUTER ENGINEERING DEPARTMENT GUJARAT POWER ENGINEERING & RESEARCH INSTITUTE.
Chapter 3 Context-Free Grammars Dr. Frank Lee. 3.1 CFG Definition The next phase of compilation after lexical analysis is syntax analysis. This phase.
Top-Down Parsing.
Syntax Analyzer (Parser)
CSC312 Automata Theory Lecture # 26 Chapter # 12 by Cohen Context Free Grammars.
Programming Languages and Design Lecture 2 Syntax Specifications of Programming Languages Instructor: Li Ma Department of Computer Science Texas Southern.
LECTURE 4 Syntax. SPECIFYING SYNTAX Programming languages must be very well defined – there’s no room for ambiguity. Language designers must use formal.
1 Introduction to Parsing. 2 Outline l Regular languages revisited l Parser overview Context-free grammars (CFG ’ s) l Derivations.
C H A P T E R T W O Syntax and Semantic. 2 Introduction Who must use language definitions? Other language designers Implementors Programmers (the users.
Copyright © 2006 Addison-Wesley. All rights reserved.1-1 ICS 410: Programming Languages Chapter 3 : Describing Syntax and Semantics Syntax.
Compiler Construction Lecture Five: Parsing - Part Two CSC 2103: Compiler Construction Lecture Five: Parsing - Part Two Joyce Nakatumba-Nabende 1.
COMP 3438 – Part II - Lecture 4 Syntax Analysis I Dr. Zili Shao Department of Computing The Hong Kong Polytechnic Univ.
Syntax Analysis By Noor Dhia Syntax analysis:- Syntax analysis or parsing is the most important phase of a compiler. The syntax analyzer considers.
Chapter 3 – Describing Syntax CSCE 343. Syntax vs. Semantics Syntax: The form or structure of the expressions, statements, and program units. Semantics:
Compiler Design BMZ 1 Chapter4: Syntax Analysis. Compiler Design BMZ 2 Syntax Analysis Source Program Target Program Semantic Analyser Intermediate Code.
1 Context-Free Languages & Grammars (CFLs & CFGs) Reading: Chapter 5.
5. Context-Free Grammars and Languages
Chapter 3 – Describing Syntax
CS 404 Introduction to Compiler Design
CS510 Compiler Lecture 4.
Fall Compiler Principles Context-free Grammars Refresher
Introduction to Parsing (adapted from CS 164 at Berkeley)
Lexical Analysis & Syntactic Analysis
5. Context-Free Grammars and Languages
Lecture 7: Introduction to Parsing (Syntax Analysis)
R.Rajkumar Asst.Professor CSE
Theory of Computation Lecture #
Fall Compiler Principles Context-free Grammars Refresher
Presentation transcript:

Syntax Analysis – Part I EECS 483 – Lecture 4 University of Michigan Monday, September 17, 2006

- 1 - Announcements/Reading v Reading - Ch 4 »4.1 – 4.3 for today v Simon does not have office hours on Fri »Just Tue/Thu, 1:30-3:30 »My mistake on the website

- 2 - Where is Syntax Analysis Performed? Lexical Analysis or Scanner if (b == 0) a = b; if(b==0)a=b; Syntax Analysis or Parsing if === b0ab abstract syntax tree or parse tree

- 3 - Parsing Analogy sentence subjectverbindirect objectobject Igavehimnoun phrase articlenoun bookthe “I gave him the book” Syntax analysis for natural languages Recognize whether a sentence is grammatically correct Identify the function of each word

- 4 - Syntax Analysis Overview v Goal – Determine if the input token stream satisfies the syntax of the program v What do we need to do this? »An expressive way to describe the syntax »A mechanism that determines if the input token stream satisfies the syntax description v For lexical analysis »Regular expressions describe tokens »Finite automata = mechanisms to generate tokens from input stream

- 5 - Just Use Regular Expressions? v REs can expressively describe tokens »Easy to implement via DFAs v So just use them to describe the syntax of a programming language?? »NO! – They don’t have enough power to express any non-trivial syntax »Example – Nested constructs (blocks, expressions, statements) – Detect balanced braces: {{} {} {{} { }}} { {{{{ }}}}}... - We need unbounded counting! - FSAs cannot count except in a strictly modulo fashion

- 6 - Context-Free Grammars v Consist of 4 components: »Terminal symbols = token or  »Non-terminal symbols = syntactic variables »Start symbol S = special non-terminal »Productions of the form LHS  RHS  LHS = single non-terminal  RHS = string of terminals and non-terminals  Specify how non-terminals may be expanded v Language generated by a grammar is the set of strings of terminals derived from the start symbol by repeatedly applying the productions »L(G) = language generated by grammar G S  a S a S  T T  b T b T  

- 7 - CFG - Example v Grammar for balanced-parentheses language »S  ( S ) S »S    1 non-terminal: S  2 terminals: “(”, “)”  Start symbol: S  2 productions v If grammar accepts a string, there is a derivation of that string using the productions »“(())” »S = (S)  = ((S) S)  = ((  )  )  = (()) ? Why is the final S required?

- 8 - More on CFGs v Shorthand notation – vertical bar for multiple productions »S  a S a | T »T  b T b |  v CFGs powerful enough to expression the syntax in most programming languages v Derivation = successive application of productions starting from S v Acceptance? = Determine if there is a derivation for an input token stream

- 9 - A Parser Parser Context free grammar, G Token stream, s (from lexer) Yes, if s in L(G) No, otherwise Error messages Syntax analyzers (parsers) = CFG acceptors which also output the corresponding derivation when the token stream is accepted Various kinds: LL(k), LR(k), SLR, LALR

RE is a Subset of CFG Can inductively build a grammar for each RE  S   aS  a R1 R2S  S1 S2 R1 | R2S  S1 | S2 R1*S  S1 S |  Where G1 = grammar for R1, with start symbol S1 G2 = grammar for R2, with start symbol S2

Grammar for Sum Expression v Grammar »S  E + S | E »E  number | (S) v Expanded »S  E + S »S  E »E  number »E  (S) 4 productions 2 non-terminals (S,E) 4 terminals: “(“, “)”, “+”, number start symbol: S

Constructing a Derivation v Start from S (the start symbol) v Use productions to derive a sequence of tokens v For arbitrary strings α, β, γ and for a production: A  β »A single step of the derivation is »α A γ α β γ (substitute β for A) v Example »S  E + S »(S + E) + E  (E + S + E) + E

Class Problem »S  E + S | E »E  number | (S) v Derive: ( (3 + 4)) + 5

Parse Tree S E+S ( S )E E + S 5 1 2E ( S ) E + S E3 4 Parse tree = tree representation of the derivation Leaves of the tree are terminals Internal nodes are non-terminals No information about the order of the derivation steps

Parse Tree vs Abstract Syntax Tree S E+S ( S )E E + S 5 1 2E ( S ) E + S E Parse tree also called “concrete syntax” AST discards (abstracts) unneeded information – more compact format

Derivation Order v Can choose to apply productions in any order, select non-terminal and substitute RHS of production v Two standard orders: left and right-most v Leftmost derivation »In the string, find the leftmost non-terminal and apply a production to it »E + S  1 + S v Rightmost derivation »Same, but find rightmost non-terminal »E + S  E + E + S

Leftmost/Rightmost Derivation Examples » S  E + S | E » E  number | (S) » Leftmost derive: ( (3 + 4)) + 5 S  E + S  (S)+S  (E+S) + S  (1+S)+S  (1+E+S)+S  (1+2+S)+S  (1+2+E)+S  (1+2+(S))+S  (1+2+(E+S))+S  (1+2+(3+S))+S  (1+2+(3+E))+S  (1+2+(3+4))+S  (1+2+(3+4))+E  (1+2+(3+4))+5 »Now, rightmost derive the same input string Result: Same parse tree: same productions chosen, but in diff order S  E+S  E+E  E+5  (S)+5  (E+S)+5  (E+E+S)+5  (E+E+E)+5  (E+E+(S))+5  (E+E+(E+S))+5  (E+E+(E+E))+5  (E+E+(E+4))+5  (E+E+(3+4))+5  (E+2+(3+4))+5  (1+2+(3+4))+5

Class Problem »S  E + S | E »E  number | (S) | -S v Do the rightmost derivation of : 1 + (2 + -(3 + 4)) + 5

Ambiguous Grammars v In the sum expression grammar, leftmost and rightmost derivations produced identical parse trees v + operator associates to the right in parse tree regardless of derivation order (1+2+(3+4))

An Ambiguous Grammar v + associates to the right because of the right-recursive production: S  E + S v Consider another grammar »S  S + S | S * S | number v Ambiguous grammar = different derivations produce different parse trees »More specifically, G is ambiguous if there are 2 distinct leftmost (rightmost) derivations for some sentence

Ambiguous Grammar - Example S  S + S | S * S | number Consider the expression: * 3 Derivation 1: S  S+S  1+S  1+S*S  1+2*S  1+2*3 Derivation 2: S  S*S  S+S*S  1+S*S  1+2*S  1+2*3 + *1 23 * But, obviously not equal! 2 leftmost derivations

Impact of Ambiguity v Different parse trees correspond to different evaluations! v Thus, program meaning is not defined!! + *1 23 * = 7 = 9

Can We Get Rid of Ambiguity? v Ambiguity is a function of the grammar, not the language! v A context-free language L is inherently ambiguous if all grammars for L are ambiguous v Every deterministic CFL has an unambiguous grammar »So, no deterministic CFL is inherently ambiguous »No inherently ambiguous programming languages have been invented v To construct a useful parser, must devise an unambiguous grammar

Eliminating Ambiguity v Often can eliminate ambiguity by adding nonterminals and allowing recursion only on right or left »S  S + T | T »T  T * num | num »T non-terminal enforces precedence »Left-recursion; left associativity S S + T TT * 3 12

A Closer Look at Eliminating Ambiguity v Precedence enforced by »Introduce distinct non-terminals for each precedence level »Operators for a given precedence level are specified as RHS for the production »Higher precedence operators are accessed by referencing the next-higher precedence non- terminal

Associativity v An operator is either left, right or non associative »Left:a + b + c = (a + b) + c »Right:a ^ b ^ c = a ^ (b ^ c) »Non:a < b < c is illegal (thus undefined) v Position of the recursion relative to the operator dictates the associativity »Left (right) recursion  left (right) associativity »Non: Don’t be recursive, simply reference next higher precedence non-terminal on both sides of operator

Class Problem S  S + S | S – S | S * S | S / S | (S) | -S | S ^ S | num Enforce the standard arithmetic precedence rules and remove all ambiguity from the above grammar Precedence (high to low) (), unary – ^ *, / +, - Associativity ^ = right rest are left