PART I: overview material

Slides:



Advertisements
Similar presentations
Translator Architecture Code Generator ParserTokenizer string of characters (source code) string of tokens abstract program string of integers (object.
Advertisements

Session 14 (DM62) / 15 (DM63) Recursive Descendent Parsing.
ISBN Chapter 3 Describing Syntax and Semantics.
Chapter 3 Describing Syntax and Semantics Sections 1-3.
Context-Free Grammars Lecture 7
Prof. Bodik CS 164 Lecture 81 Grammars and ambiguity CS164 3:30-5:00 TT 10 Evans.
Prof. Bodik CS 164 Lecture 61 Building a Parser II CS164 3:30-5:00 TT 10 Evans.
1 Foundations of Software Design Lecture 23: Finite Automata and Context-Free Grammars Marti Hearst Fall 2002.
Parsing — Part II (Ambiguity, Top-down parsing, Left-recursion Removal)
CS 536 Spring Ambiguity Lecture 8. CS 536 Spring Announcement Reading Assignment –“Context-Free Grammars” (Sections 4.1, 4.2) Programming.
Chapter 3: Formal Translation Models
Specifying Languages CS 480/680 – Comparative Languages.
COP4020 Programming Languages
(2.1) Grammars  Definitions  Grammars  Backus-Naur Form  Derivation – terminology – trees  Grammars and ambiguity  Simple example  Grammar hierarchies.
EECS 6083 Intro to Parsing Context Free Grammars
Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth.
1 Syntax and Semantics The Purpose of Syntax Problem of Describing Syntax Formal Methods of Describing Syntax Derivations and Parse Trees Sebesta Chapter.
1 Introduction to Parsing Lecture 5. 2 Outline Regular languages revisited Parser overview Context-free grammars (CFG’s) Derivations.
CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars.
Syntax Analysis (Chapter 4) 1 Course Overview PART I: overview material 1Introduction 2Language processors (tombstone diagrams, bootstrapping) 3Architecture.
Winter 2007SEG2101 Chapter 71 Chapter 7 Introduction to Languages and Compiler.
1 Chapter 3 Describing Syntax and Semantics. 3.1 Introduction Providing a concise yet understandable description of a programming language is difficult.
A sentence (S) is composed of a noun phrase (NP) and a verb phrase (VP). A noun phrase may be composed of a determiner (D/DET) and a noun (N). A noun phrase.
Context-Free Grammars
Context-Free Grammars and Parsing
Copyright © 2006 The McGraw-Hill Companies, Inc. Programming Languages 2nd edition Tucker and Noonan Chapter 2 Syntax A language that is simple to parse.
Grammars CPSC 5135.
Profs. Necula CS 164 Lecture Top-Down Parsing ICOM 4036 Lecture 5.
Lesson 3 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
C H A P T E R TWO Syntax and Semantic.
ISBN Chapter 3 Describing Syntax and Semantics.
CMSC 330: Organization of Programming Languages Context-Free Grammars.
Bernd Fischer RW713: Compiler and Software Language Engineering.
CFG1 CSC 4181Compiler Construction Context-Free Grammars Using grammars in parsers.
1 Course Overview PART I: overview material 1Introduction 2Language processors (tombstone diagrams, bootstrapping) 3Architecture of a compiler PART II:
Introduction to Parsing
CPS 506 Comparative Programming Languages Syntax Specification.
Chapter 3 Describing Syntax and Semantics
Context Free Grammars CFGs –Add recursion to regular expressions Nested constructions –Notation expression  identifier | number | - expression | ( expression.
ISBN Chapter 3 Describing Syntax and Semantics.
1Computer Sciences Department. Book: INTRODUCTION TO THE THEORY OF COMPUTATION, SECOND EDITION, by: MICHAEL SIPSER Reference 3Computer Sciences Department.
Syntax Analysis - Parsing Compiler Design Lecture (01/28/98) Computer Science Rensselaer Polytechnic.
Unit-3 Parsing Theory (Syntax Analyzer) PREPARED BY: PROF. HARISH I RATHOD COMPUTER ENGINEERING DEPARTMENT GUJARAT POWER ENGINEERING & RESEARCH INSTITUTE.
Chapter 3 Context-Free Grammars Dr. Frank Lee. 3.1 CFG Definition The next phase of compilation after lexical analysis is syntax analysis. This phase.
Top-Down Parsing.
Syntax Analyzer (Parser)
Copyright © 2006 The McGraw-Hill Companies, Inc. Programming Languages 2nd edition Tucker and Noonan Chapter 2 Syntax A language that is simple to parse.
Programming Languages and Design Lecture 2 Syntax Specifications of Programming Languages Instructor: Li Ma Department of Computer Science Texas Southern.
LECTURE 4 Syntax. SPECIFYING SYNTAX Programming languages must be very well defined – there’s no room for ambiguity. Language designers must use formal.
1 Introduction to Parsing. 2 Outline l Regular languages revisited l Parser overview Context-free grammars (CFG ’ s) l Derivations.
Chapter 4: Syntax analysis Syntax analysis is done by the parser. –Detects whether the program is written following the grammar rules and reports syntax.
C H A P T E R T W O Syntax and Semantic. 2 Introduction Who must use language definitions? Other language designers Implementors Programmers (the users.
Copyright © 2006 Addison-Wesley. All rights reserved.1-1 ICS 410: Programming Languages Chapter 3 : Describing Syntax and Semantics Syntax.
Spring 16 CSCI 4430, A Milanova 1 Announcements HW1 will be out this evening Due Monday, 2/8 Submit in HW Server AND at start of class on 2/8 A review.
CSE 311 Foundations of Computing I Lecture 19 Recursive Definitions: Context-Free Grammars and Languages Autumn 2012 CSE
Compiler Construction Lecture Five: Parsing - Part Two CSC 2103: Compiler Construction Lecture Five: Parsing - Part Two Joyce Nakatumba-Nabende 1.
Syntax Analysis Or Parsing. A.K.A. Syntax Analysis –Recognize sentences in a language. –Discover the structure of a document/program. –Construct (implicitly.
Syntax Analysis By Noor Dhia Syntax analysis:- Syntax analysis or parsing is the most important phase of a compiler. The syntax analyzer considers.
Compiler Chapter 5. Context-free Grammar Dept. of Computer Engineering, Hansung University, Sung-Dong Kim.
Chapter 3 – Describing Syntax CSCE 343. Syntax vs. Semantics Syntax: The form or structure of the expressions, statements, and program units. Semantics:
Describing Syntax and Semantics Chapter 3: Describing Syntax and Semantics Lectures # 6.
CS510 Compiler Lecture 4.
Chapter 3 Context-Free Grammar and Parsing
Introduction to Parsing (adapted from CS 164 at Berkeley)
Compiler Construction
Lecture 7: Introduction to Parsing (Syntax Analysis)
CSC 4181Compiler Construction Context-Free Grammars
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
CSC 4181 Compiler Construction Context-Free Grammars
COMPILER CONSTRUCTION
Presentation transcript:

PART I: overview material Course Overview PART I: overview material 1 Introduction 2 Language processors (tombstone diagrams, bootstrapping) 3 Architecture of a compiler PART II: inside a compiler 4 Syntax analysis 5 Contextual analysis 6 Runtime organization 7 Code generation PART III: conclusion Interpretation 9 Review Supplementary material: Theoretical foundations (Context-free grammars)

Syntactic Analysis (a peek ahead at Chapter 4) Sub–phase Input Theoretical tools Output Scanner String of characters Regular expression, Finite–state machine Sequence of tokens Parser Context–free grammar, BNF, EBNF Syntax tree or parse tree

Example The program: x * y + z Input to parser: Output of parser: ID TIMES ID PLUS ID we’ll write tokens as follows: id * id + id Output of parser: a parse tree E E + E E * E id id id

What must parser do? Recognizer: not all sequences of tokens are programs must distinguish between valid and invalid strings of tokens Translator: must expose program structure e.g., associativity and precedence hence must return the syntax tree We need: A language for describing valid sequences of tokens context-free grammars (analogous to regular expressions in the scanner) A method for distinguishing valid from invalid strings of tokens (and for building the syntax tree) the parser (analogous to the finite state machine in the scanner)

Context-free grammars (CFGs) Example: Simple Arithmetic Expressions In English: An integer is an arithmetic expression. If exp1 and exp2 are arithmetic expressions, then so are the following: exp1 - exp2 exp1 / exp2 ( exp1 ) the corresponding CFG: we’ll write tokens as follows: exp  INTLITERAL E  intlit exp  exp MINUS exp E  E - E exp  exp DIVIDE exp E  E / E exp  LPAREN exp RPAREN E  ( E )

Reading the CFG The grammar has five terminal symbols: intlit, -, /, (, ) terminals of a grammar = tokens returned by the scanner. The grammar has one non-terminal symbol: E non-terminals describe valid sequences of tokens The grammar has four productions or rules, each of the form: E   left-hand side = a single non-terminal. right-hand side = either a sequence of one or more terminals and/or non-terminals, or  (the empty string)

Example, revisited Note: a more compact way to write previous grammar: E  intlit | E - E | E / E | ( E ) or E  intlit | E - E | E / E | ( E )

A formal definition of CFGs A CFG consists of A set of terminals T A set of non-terminals N A start symbol S (one of the non-terminals) A set of productions: If not specified, then S is the nonterminal that appears on the left-hand side of the first production.

Notational Conventions In these lecture notes Non-terminals are written in upper-case Terminals are written in lower-case The start symbol is the left-hand side of the first production (unless specified otherwise)

Derivation: Read productions as rules: The Language of a CFG The language defined by a CFG is the set of strings that can be derived from the start symbol of the grammar. Derivation: Read productions as rules: Means can be replaced by

Derivation: key idea Begin with a string consisting of the start symbol “S” Replace any non-terminal X in the string by the right-hand side of some production Repeat (2) until there are no non-terminals in the string

Derivation: an example CFG: E  id E  E + E E  E * E E  ( E ) String id * id + id is in the language defined by the grammar. Derivation:

Terminals Terminals are so called because there are no rules for replacing them Once generated, terminals are permanent Therefore, terminals are the tokens of the language

The Language of a CFG (continued) More formally, we can write if there is a production

The Language of a CFG (continued) Write if using a sequence of 0 or more replacement steps

The Language of a CFG Let G be a context-free grammar with start symbol S. Then the language of G is:

Strings of balanced parentheses The grammar: Example Strings of balanced parentheses The grammar: Which is the same as

Another Example A simple arithmetic expression grammar: Some strings in the language of this grammar:

Derivations and Parse Trees A derivation is a sequence of productions A derivation can be drawn as a tree Start symbol is the tree’s root For a production add children to node

Derivation Example Grammar String

Derivation Example (continued) + E E * E id id id

A syntax tree or parse tree has Notes on Derivations A syntax tree or parse tree has Terminals at the leaves Non-terminals at the interior nodes An in-order traversal of the leaves yields the original input string As in the preceding example, we usually show a left–most derivation, that is, replace the left–most non–terminal remaining at each step

Ambiguity Grammar String

Ambiguity (continued) This string has two parse trees E E E + E E E * E E id id E + E * id id id id

TEST YOURSELF Question 1: Question 2: for each of the two parse trees, find the corresponding left-most derivation Question 2: for each of the two parse trees, find the corresponding right-most derivation

Ambiguity (continued) A grammar is ambiguous if for at least one string: the string has more than one parse tree the string has more than one left-most derivation the string has more than one right-most derivation Note that these three conditions are equivalent Ambiguity is BAD because if the grammar is ambiguous then the meaning of some programs is not well-defined

Dealing with Ambiguity There are several ways to handle ambiguity Most direct method is to rewrite the grammar unambiguously For example, enforce precedence of * and / over + and –

Enforcing Correct Precedence Rewrite the grammar use a different nonterminal for each precedence level start with the lowest precedence (MINUS) E  E - E | E / E | ( E ) | id rewrite to E  E - E | T T  T / T | F F  id | ( E )

parse tree for id – id / id E Example parse tree for id – id / id E  E - E | T T  T / T | F F  id | ( E ) E E E - T T T / T F F F id id id

Draw two parse trees for the expression a-b-c TEST YOURSELF Question 3: Attempt to construct a parse tree for id-id/id that shows the wrong precedence. Why do you fail to construct such a parse tree? Question 4: Draw two parse trees for the expression a-b-c One should correctly group ((a-b)–c), and one should incorrectly group (a–(b-c))

Enforcing Correct Associativity The grammar captures operator precedence, but it is still ambiguous fails to express that both subtraction and division are left associative; 5-3-2 is equivalent to: ((5-3)-2) but not to: (5-(3-2)). 8/4/2 is equivalent to: ((8/4)/2) but not to: (8/(4/2)).

Recursion A grammar is recursive in nonterminal X if: X + … X … the notation + means “after one or more steps, X derives a sequence of symbols that includes another X” A grammar is left recursive in X if: X + X … after one or more steps, X derives a sequence of symbols that starts with an X A grammar is right recursive in X if: X + … X after one or more steps, X derives a sequence of symbols that ends with an X

How to fix associativity The grammar given above is both left and right recursive in non–terminals exp and term try at home: write the derivation steps that show this. To correctly express operator associativity: For left associativity, use only left recursion. For right associativity, use only right recursion. Here's the correct grammar: E  E – T | T T  T / F | F F  id | ( E )

Ambiguity: The Dangling Else Problem Consider the grammar S  if E then S | if E then S else S | a E  b This grammar is ambiguous

The Dangling Else Problem: Example The input string if b then if b then a else a has two different parse trees: S S if if E S E then S else S then b b if E then S a if E then S else S b a b a a

The Dangling Else Problem: How to Fix else should match the closest unmatched then We can enforce this in a grammar: S  M /* all then are matched */ | U /* some then are unmatched */ M  if E then M else M | a U  if E then S | if E then M else U Note: still generates the same set of strings

The Dangling Else Problem: Example Revisited Consider: if b then if b then a else a U U U U if E E E E then M M M M There is now only one possible parse tree for this string Try to draw a different parse tree and you should see why this is true b if E E E E then M else M a a b

Reg Exp are a Subset of CFG We can inductively build a grammar for each Reg Exp:  S   a S  a R1 R2 S  S1 S2 R1 | R2 S  S1 | S2 R1* S  S1 S |  Where: G1 = grammar for R1, with start symbol S1 G2 = grammar for R2, with start symbol S2

BNF is a special syntax for writing a CFG: CFG Backus–Naur Form BNF is a special syntax for writing a CFG: CFG E  E+T | E–T | T T  T*F | T/F | F F  id | (E) BNF <Expr> ::= <Expr> + <Term> | <Expr> – <Term> | <Term> <Term> ::= <Term> * <Fact> | <Term> / <Fact> | <Fact> <Fact> ::= id | ( <Expr> )

EBNF permits any Reg Exp on right side of productions: BNF Extended BNF EBNF permits any Reg Exp on right side of productions: BNF <Expr> ::= <Expr> + <Term> | <Expr> – <Term> | <Term> <Term> ::= <Term> * <Fact> | <Term> / <Fact> | <Fact> <Fact> ::= id | ( <Expr> ) EBNF <Expr> ::= <Term> (“+” <Term> | “–” <Term> )* <Term> ::= <Fact> (“*” <Fact> | “/” <Fact> )* <Fact> ::= id | “(” <Expr> “)”

Question 5: Write a CFG, BNF, and/or EBNF for each of these languages: TEST YOURSELF Question 5: Write a CFG, BNF, and/or EBNF for each of these languages: Strings of the form anbn. Example: aaabbb Strings ambn such that m>n. Example: aaaabb Strings ambn such that m<n. Example: aabbbb Strings over {a, b} such that the number of a’s equals the number of b’s. Example: baabba Strings of the form ambncp. Example: aabbbcccc Strings of the form ambncp such that either m=n or n=p. Examples: aabbcccc, aabbbbcccc