Introduction to Parsing (adapted from CS 164 at Berkeley)

Slides:



Advertisements
Similar presentations
9/27/2006Prof. Hilfinger, Lecture 141 Syntax-Directed Translation Lecture 14 (adapted from slides by R. Bodik)
Advertisements

Context-Free Grammars Lecture 7
Prof. Bodik CS 164 Lecture 81 Grammars and ambiguity CS164 3:30-5:00 TT 10 Evans.
1 Foundations of Software Design Lecture 23: Finite Automata and Context-Free Grammars Marti Hearst Fall 2002.
Parsing — Part II (Ambiguity, Top-down parsing, Left-recursion Removal)
COP4020 Programming Languages
Lecture 14 Syntax-Directed Translation Harry Potter has arrived in China, riding the biggest initial print run for a work of fiction here since the Communist.
CSE 413 Programming Languages & Implementation Hal Perkins Autumn 2012 Context-Free Grammars and Parsing 1.
EECS 6083 Intro to Parsing Context Free Grammars
1 Introduction to Parsing Lecture 5. 2 Outline Regular languages revisited Parser overview Context-free grammars (CFG’s) Derivations.
CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars.
BİL 744 Derleyici Gerçekleştirimi (Compiler Design)1 Syntax Analyzer Syntax Analyzer creates the syntactic structure of the given source program. This.
PART I: overview material
Profs. Necula CS 164 Lecture Top-Down Parsing ICOM 4036 Lecture 5.
Lesson 3 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
Introduction to Parsing
CPS 506 Comparative Programming Languages Syntax Specification.
CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 3: Introduction to Syntactic Analysis.
Syntax Analysis - Parsing Compiler Design Lecture (01/28/98) Computer Science Rensselaer Polytechnic.
Unit-3 Parsing Theory (Syntax Analyzer) PREPARED BY: PROF. HARISH I RATHOD COMPUTER ENGINEERING DEPARTMENT GUJARAT POWER ENGINEERING & RESEARCH INSTITUTE.
1 A Simple Syntax-Directed Translator CS308 Compiler Theory.
Chapter 3 Context-Free Grammars Dr. Frank Lee. 3.1 CFG Definition The next phase of compilation after lexical analysis is syntax analysis. This phase.
Top-Down Parsing.
Syntax Analyzer (Parser)
CPSC 388 – Compiler Design and Construction Parsers – Syntax Directed Translation.
1 Introduction to Parsing. 2 Outline l Regular languages revisited l Parser overview Context-free grammars (CFG ’ s) l Derivations.
Compiler Construction Lecture Five: Parsing - Part Two CSC 2103: Compiler Construction Lecture Five: Parsing - Part Two Joyce Nakatumba-Nabende 1.
Syntax Analysis Or Parsing. A.K.A. Syntax Analysis –Recognize sentences in a language. –Discover the structure of a document/program. –Construct (implicitly.
Compiler Chapter 5. Context-free Grammar Dept. of Computer Engineering, Hansung University, Sung-Dong Kim.
Chapter 3 – Describing Syntax CSCE 343. Syntax vs. Semantics Syntax: The form or structure of the expressions, statements, and program units. Semantics:
Introduction to Parsing
Comp 411 Principles of Programming Languages Lecture 3 Parsing
Chapter 3 – Describing Syntax
Parsing #1 Leonidas Fegaras.
lec02-parserCFG May 8, 2018 Syntax Analyzer
A Simple Syntax-Directed Translator
Introduction to Parsing
Parsing & Context-Free Grammars
Context-Free Grammars: an overview
CS 404 Introduction to Compiler Design
Programming Languages Translator
CS510 Compiler Lecture 4.
Fall Compiler Principles Context-free Grammars Refresher
Chapter 3 Context-Free Grammar and Parsing
Parsing IV Bottom-up Parsing
Compiler Construction
4 (c) parsing.
Parsing & Context-Free Grammars Hal Perkins Autumn 2011
(Slides copied liberally from Ruth Anderson, Hal Perkins and others)
Syntax-Directed Translation
Programming Language Syntax 2
Chapter 2: A Simple One Pass Compiler
Lecture 7: Introduction to Parsing (Syntax Analysis)
Ambiguity, Precedence, Associativity & Top-Down Parsing
CHAPTER 2 Context-Free Languages
CSC 4181Compiler Construction Context-Free Grammars
R.Rajkumar Asst.Professor CSE
Parsing IV Bottom-up Parsing
Introduction to Parsing
Introduction to Parsing
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
CSC 4181 Compiler Construction Context-Free Grammars
Fall Compiler Principles Context-free Grammars Refresher
Parsing & Context-Free Grammars Hal Perkins Summer 2004
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
lec02-parserCFG May 27, 2019 Syntax Analyzer
Parsing & Context-Free Grammars Hal Perkins Autumn 2005
COMPILER CONSTRUCTION
COP 4620 / 5625 Programming Language Translation / Compiler Writing Fall 2003 Lecture 2, 09/04/2003 Prof. Roy Levow.
Faculty of Computer Science and Information System
Presentation transcript:

Introduction to Parsing (adapted from CS 164 at Berkeley)

Outline Parser overview Context-free grammars (CFG’s) Derivations Syntax-Directed Translation

The Functionality of the Parser Input: sequence of tokens from lexer Output: abstract syntax tree of the program One-pass compiler: directly generate assembly code This is what you will do in the first assignment Bali  SaM code

Parser input: IF ID == ID : ID = INT ELSE : ID = INT Example Pyth: if x == y: z =1 else: z = 2 Parser input: IF ID == ID : ID = INT ELSE : ID = INT Parser output (abstract syntax tree): IF-THEN-ELSE == = ID INT

Each stage of the compiler has two purposes: Why A Tree? Each stage of the compiler has two purposes: Detect and filter out some class of errors Compute some new information or translate the representation of the program to make things easier for later stages Recursive structure of tree suits recursive structure of language definition With tree, later stages can easily find “the else clause”, e.g., rather than having to scan through tokens to find it.

Notation for Programming Languages Grammars: E  int E  E + E E  E * E E  ( E ) We can view these rules as rewrite rules We start with E and replace occurrences of E with some right-hand side E  E * E  ( E ) * E  ( E + E ) * E  …  (int + int) * int

Context-Free Grammars A CFG consists of A set of non-terminals N By convention, written with capital letter in these notes A set of terminals T By convention, either lower case names or punctuation A start symbol S (a non-terminal) A set of productions Assuming E  N E  e , or E  Y1 Y2 ... Yn where Yi  N  T

Simple arithmetic expressions: Examples of CFGs Simple arithmetic expressions: E  int E  E + E E  E * E E  ( E ) One non-terminal: E Several terminals: int, +, *, (, ) Called terminals because they are never replaced By convention the non-terminal for the first production is the start one

Key Idea Begin with a string consisting of the start symbol Replace any non-terminal X in the string by a right-hand side of some production X  Y1 … Yn Repeat (2) until there are only terminals in the string The successive strings created in this way are called sentential forms.

The Language of a CFG (Cont.) Write X1 … Xn * Y1 … Ym if X1 … Xn  …  …  Y1 … Ym in 0 or more steps

L(G) = { a1 … an | S * a1 … an and every ai The Language of a CFG Let G be a context-free grammar with start symbol S. Then the language of G is: L(G) = { a1 … an | S * a1 … an and every ai is a terminal }

Examples: S  0 also written as S  0 | 1 S  1 Generates the language { “0”, “1” } What about S  1 A A  0 | 1 A  0 | 1 A What about S   | ( S )

Derivations and Parse Trees A derivation is a sequence of sentential forms resulting from the application of a sequence of productions S  …  … Parse tree: summary of derivation w/o specifying completely the order in which rules were applied Start symbol is the tree’s root For a production X  Y1 … Yn add children Y1, …, Yn to node X

Derivation Example Grammar E  E + E | E * E | (E) | int String int * int + int

Derivation in Detail (1)

Derivation in Detail (2)  E + E E + E

Derivation in Detail (3)  E + E  E * E + E E + E E * E

Derivation in Detail (4)  E + E  E * E + E  int * E + E E + E E * E int

Derivation in Detail (5)  E + E  E * E + E  int * E + E  int * int + E E + E E * E int int

Derivation in Detail (6)  E + E  E * E + E  int * E + E  int * int + E  int * int + int E + E E * E int int int

A left-right traversal of the leaves is the original input Notes on Derivations A parse tree has Terminals at the leaves Non-terminals at the interior nodes A left-right traversal of the leaves is the original input The parse tree shows the association of operations, the input string does not ! There may be multiple ways to match the input Derivations (and parse trees) choose one

AST is condensed form of a parse tree AST vs. Parse Tree AST is condensed form of a parse tree operators appear at internal nodes, not at leaves. "Chains" of single productions are collapsed. Lists are "flattened". Syntactic details are omitted e.g., parentheses, commas, semi-colons AST is a better structure for later compiler stages omits details having to do with the source language, only contains information about the essential structure of the program.

Example: 2 * (4 + 5) Parse tree vs. AST F E * ) + ( int (5) int (4) * 2 + 4 5 int (2)

Summary of Derivations We are not just interested in whether s  L(G) Also need derivation (or parse tree) and AST. Parse trees slavishly reflect the grammar. Abstract syntax trees abstract from the grammar, cutting out detail that interferes with later stages. A derivation defines a parse tree But one parse tree may have many derivations Derivations drive translation (to ASTs, etc.) Leftmost and rightmost derivations most important in parser implementation

Ambiguity Grammar E  E + E | E * E | ( E ) | int Strings int + int + int int * int + int

The string int + int + int has two parse trees Ambiguity. Example The string int + int + int has two parse trees E E E + E E E + E E int int E + E + int int int int + is left-associative

The string int * int + int has two parse trees Ambiguity. Example The string int * int + int has two parse trees E E E + E E E * E E int int E + E * int int int int * has higher precedence than +

Ambiguity is common in programming languages Ambiguity (Cont.) A grammar is ambiguous if it has more than one parse tree for some string Equivalently, there is more than one rightmost or leftmost derivation for some string Ambiguity is bad Leaves meaning of some programs ill-defined Ambiguity is common in programming languages Arithmetic expressions IF-THEN-ELSE

Dealing with Ambiguity There are several ways to handle ambiguity Most direct method is to rewrite the grammar unambiguously E  E + T | T T  T * int | int | ( E ) Enforces precedence of * over + Enforces left-associativity of + and *

Ambiguity. Example The int * int + int has only one parse tree now E E E + T E E * T int int E + E int int T * int int

Ambiguity Impossible to convert automatically an ambiguous grammar to an unambiguous one Used with care, ambiguity can simplify the grammar Sometimes allows more natural definitions But we need disambiguation mechanisms Instead of rewriting the grammar Use the more natural (ambiguous) grammar Along with disambiguating declarations Most tools allow precedence and associativity declarations to disambiguate grammars Examples …

Associativity Declarations Consider the grammar E  E + E | int Ambiguous: two parse trees of int + int + int E + int E + int Left-associativity declaration: %left ‘+’

Summary Grammar is specified using a context-free language (CFL) Derivation: starting from start symbol, use grammar rules as rewrite rules to derive input string Leftmost and rightmost derivations Parse trees and abstract syntax trees Ambiguous grammars Ambiguity should be eliminated by modifying grammar, by specifying precedence rules etc. depending on how ambiguity arises in the grammar Remaining question: how do we find the derivation for a given input string?