CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 3: Introduction to Syntactic Analysis.

Slides:



Advertisements
Similar presentations
Translator Architecture Code Generator ParserTokenizer string of characters (source code) string of tokens abstract program string of integers (object.
Advertisements

1 Parsing The scanner recognizes words The parser recognizes syntactic units Parser operations: Check and verify syntax based on specified syntax rules.
ISBN Chapter 3 Describing Syntax and Semantics.
CS 330 Programming Languages 09 / 13 / 2007 Instructor: Michael Eckmann.
Chapter 3 Describing Syntax and Semantics Sections 1-3.
CS Summer 2005 Top-down and Bottom-up Parsing - a whirlwind tour June 20, 2005 Slide acknowledgment: Radu Rugina, CS 412.
Context-Free Grammars Lecture 7
Prof. Bodik CS 164 Lecture 81 Grammars and ambiguity CS164 3:30-5:00 TT 10 Evans.
1 Foundations of Software Design Lecture 23: Finite Automata and Context-Free Grammars Marti Hearst Fall 2002.
Parsing — Part II (Ambiguity, Top-down parsing, Left-recursion Removal)
Prof. Fateman CS 164 Lecture 91 Bottom-Up Parsing Lecture 9.
1 The Parser Its job: –Check and verify syntax based on specified syntax rules –Report errors –Build IR Good news –the process can be automated.
1 CMPSC 160 Translation of Programming Languages Fall 2002 slides derived from Tevfik Bultan, Keith Cooper, and Linda Torczon Lecture-Module #5 Introduction.
COP4020 Programming Languages
Winter 2003/4Pls – syntax – Catriel Beeri1 SYNTAX Syntax: form, structure The syntax of a pl: The set of its well-formed programs The rules that define.
CPSC Compiler Tutorial 3 Parser. Parsing The syntax of most programming languages can be specified by a Context-free Grammar (CGF) Parsing: Given.
S YNTAX. Outline Programming Language Specification Lexical Structure of PLs Syntactic Structure of PLs Context-Free Grammar / BNF Parse Trees Abstract.
CSC3315 (Spring 2009)1 CSC 3315 Lexical and Syntax Analysis Hamid Harroud School of Science and Engineering, Akhawayn University
1 Syntax and Semantics The Purpose of Syntax Problem of Describing Syntax Formal Methods of Describing Syntax Derivations and Parse Trees Sebesta Chapter.
1 Introduction to Parsing Lecture 5. 2 Outline Regular languages revisited Parser overview Context-free grammars (CFG’s) Derivations.
CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars.
Syntax and Semantics Structure of programming languages.
BİL 744 Derleyici Gerçekleştirimi (Compiler Design)1 Syntax Analyzer Syntax Analyzer creates the syntactic structure of the given source program. This.
CS 355 – PROGRAMMING LANGUAGES Dr. X. Topics Introduction The General Problem of Describing Syntax Formal Methods of Describing Syntax.
Introduction to Parsing Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
CS 461 – Oct. 7 Applications of CFLs: Compiling Scanning vs. parsing Expression grammars –Associativity –Precedence Programming language (handout)
CS Describing Syntax CS 3360 Spring 2012 Sec Adapted from Addison Wesley’s lecture notes (Copyright © 2004 Pearson Addison Wesley)
PART I: overview material
Profs. Necula CS 164 Lecture Top-Down Parsing ICOM 4036 Lecture 5.
Lesson 3 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
1 Syntax In Text: Chapter 3. 2 Chapter 3: Syntax and Semantics Outline Syntax: Recognizer vs. generator BNF EBNF.
Syntax and Semantics Structure of programming languages.
作者 : 陳鍾誠 單位 : 金門技術學院資管系 URL : 日期 : 2016/6/4 程式語言的語法 Grammar.
Bernd Fischer RW713: Compiler and Software Language Engineering.
Introduction to Parsing
CPS 506 Comparative Programming Languages Syntax Specification.
Syntax The Structure of a Language. Lexical Structure The structure of the tokens of a programming language The scanner takes a sequence of characters.
Overview of Previous Lesson(s) Over View  In our compiler model, the parser obtains a string of tokens from the lexical analyzer & verifies that the.
Syntax Analysis – Part I EECS 483 – Lecture 4 University of Michigan Monday, September 17, 2006.
Top-Down Parsing.
Syntax Analyzer (Parser)
1 Pertemuan 7 & 8 Syntax Analysis (Parsing) Matakuliah: T0174 / Teknik Kompilasi Tahun: 2005 Versi: 1/6.
CSC312 Automata Theory Lecture # 26 Chapter # 12 by Cohen Context Free Grammars.
Programming Languages and Design Lecture 2 Syntax Specifications of Programming Languages Instructor: Li Ma Department of Computer Science Texas Southern.
CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 2: Lexical Analysis.
Overview of Previous Lesson(s) Over View 3 Model of a Compiler Front End.
LECTURE 4 Syntax. SPECIFYING SYNTAX Programming languages must be very well defined – there’s no room for ambiguity. Language designers must use formal.
1 Introduction to Parsing. 2 Outline l Regular languages revisited l Parser overview Context-free grammars (CFG ’ s) l Derivations.
COMP 3438 – Part II-Lecture 5 Syntax Analysis II Dr. Zili Shao Department of Computing The Hong Kong Polytechnic Univ.
1 Topic #4: Syntactic Analysis (Parsing) CSC 338 – Compiler Design and implementation Dr. Mohamed Ben Othman ( )
CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 6: LR grammars and automatic parser generators.
Compiler Construction Lecture Five: Parsing - Part Two CSC 2103: Compiler Construction Lecture Five: Parsing - Part Two Joyce Nakatumba-Nabende 1.
COMP 3438 – Part II - Lecture 4 Syntax Analysis I Dr. Zili Shao Department of Computing The Hong Kong Polytechnic Univ.
Syntax Analysis By Noor Dhia Syntax analysis:- Syntax analysis or parsing is the most important phase of a compiler. The syntax analyzer considers.
Chapter 3 – Describing Syntax CSCE 343. Syntax vs. Semantics Syntax: The form or structure of the expressions, statements, and program units. Semantics:
Compiler Design BMZ 1 Chapter4: Syntax Analysis. Compiler Design BMZ 2 Syntax Analysis Source Program Target Program Semantic Analyser Intermediate Code.
1 Context-Free Languages & Grammars (CFLs & CFGs) Reading: Chapter 5.
Chapter 3 – Describing Syntax
lec02-parserCFG May 8, 2018 Syntax Analyzer
A Simple Syntax-Directed Translator
Introduction to Parsing
CS 404 Introduction to Compiler Design
Programming Languages Translator
CS510 Compiler Lecture 4.
Introduction to Parsing (adapted from CS 164 at Berkeley)
Chapter 3 – Describing Syntax
Lexical Analysis & Syntactic Analysis
Lecture 7: Introduction to Parsing (Syntax Analysis)
R.Rajkumar Asst.Professor CSE
lec02-parserCFG May 27, 2019 Syntax Analyzer
Presentation transcript:

CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 3: Introduction to Syntactic Analysis

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 2 Outline Review of lexical analysis Context-Free Grammars (CFGs) Derivations and parse trees Top-down vs. bottom-up parsing Ambiguous grammars

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 3 Administration Group preference handouts due today Must turn in questionnaire to be considered part of class! Handout: Homework 1 - due next Friday

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 4 Where we are Source code (character stream) Lexical analysis Syntactic Analysis Token stream Abstract syntax tree (AST) Semantic Analysis if (b == 0) a = b; if(b)a=b;0== if == b0 = ab if ;

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 5 What is Syntactic Analysis? Source code (token stream) { if (b == (0)) a = b; while (a != 1) { stdio.print(a); a = a - 1; } if_stmt bin_op variable b constant 0 block while_stmt bin_op == !=variableconstant block expr_stmt Abstract Syntax Tree 1acall. stdio print variable a =...

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 6 Overview of Syntactic Analysis Input: stream of tokens Output: abstract syntax tree Implementation: –Parse token stream to traverse concrete syntax –During traversal, build abstract syntax tree

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 7 Parsing Parsing: recognizing whether a program (or sentence) is grammatically well- formed & identifying the function of each component. “I gave him the book” sentence subject: Iverb:gaveobject: him indirect object noun phrase article: the noun: book

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 8 What Parsing doesn’t do Doesn’t check many things: type agreement, variables initialized int x = true; int y; z = f(y); Deferred until semantic analysis

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 9 Specifying Language Syntax First problem: how to describe language syntax precisely and conveniently Last time: can describe tokens using regular expressions Regular expressions easy to implement, efficient (by converting to DFA) Why not use regular expressions (on tokens) to specify programming language syntax?

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 10 Limits of REs Programming languages are not regular -- cannot be described by regular exprs Consider: language of all strings that contain balanced parentheses (easier than PLs) () (()) ()()() (())()((()())) (( )( ()) (()() Problem: need to keep track of number of parentheses seen so far: unbounded counting

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 11 Need more power! RE = DFA DFA has only finite number of states; cannot perform unbounded counting ( ( (( ( ))))) maximum depth: 5 parens

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 12 Context-Free Grammars A specification of the balanced- parenthesis language: S  ( S ) S S   The definition is recursive This is a context-free grammar –More expressive than regular expressions –S = (S)  = ((S) S)  = ((  )  )  = (())

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 13 RE is subset of CFG Regular Expressions digit  [0-9] posint  digit+ int  -? posint real  int (ε | (. posint)) Symbolic names are shorthand All symbol names can be fully expanded: not recursive real  -? [0-9]+ (ε | (. [0-9]+ ))

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 14 Definition of CFG Terminals –Symbols for strings or tokens (also ε) Non-terminals –Syntactic variables Start symbol –A special nonterminal is designated Production –Specifies how non-terminals may be expanded to form strings –LHS: single non-terminal, RHS: string of terminals and non-terminals S  ( S ) S | 

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 15 Sum grammar S  S + E | E E  number | ( S ) accepts (1+2+ (3+4))+5 S  S + E S  E E  number E  (S) 4 productions 2 non-terminals (S, E) 4 terminals: (, ), +, number start symbol S

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 16 Derivations S  S + E | E E  number | ( S ) If a grammar accepts a string, there is a derivation of that string using the productions of the grammar

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 17 Derivation Example S  S + E | E E  number | ( S ) Derive (1+2+ (3+4))+5: S  S + E  E + E  (S )+E  (S + E)+E  (S + E + E)+E  (E + E + E)+E  (1 + E + E)+E  ( E)+E  …  ( (3+4))+E  ( (3+4))+5 replacement string non-terminal being expanded

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 18 Constructing a derivation Start from start symbol (S) Productions are used to derive a sequence of tokens from the start symbol For arbitrary strings ,  and  and a production A   A single step of derivation is  A    –i.e., substitute  for an occurrence of A –(S + E) + E  (S + E + E)+E A = S,  = S + E

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 19 Derivation -> Parse Tree S  S+E  E+E  (S)+E  (S+E)+E  (S+E+E)+E  (E+E+E)+E  (1+E+E)+E  (1+2+E)+E  …  (1+2+(3+4))+E  (1+2+(3+4))+5 S S + E E ( S ) 5 S + E ( S ) S + E E E Tree representation of the derivation Leaves of tree are terminals; in-order traversal yields string Internal nodes: non-terminals Loses information about order of derivation steps Parse Tree Derivation

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 20 Parse Tree Also called “concrete syntax” S S + E E ( S ) 5 S + E ( S ) S + E E E parse tree/ concrete syntax abstract syntax tree (Contains information needed to evaluate program)

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 21 Derivation order Can choose to apply productions in any order; select any non-terminal A  A    Two standard orders: left- and right-most -- useful for automatic parsing Leftmost derivation –In the string, find the left-most non- terminal and apply a production to it

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 22 Example Left-most derivation S  S+E  E+E  (S)+E  (S+E)+E  (S+E+E)+E  (E+E+E)+E  (1+E+E)+E  (1+2+E)+E  (1+2+(S))+E  (1+2+(S+E))+E  (1+2+(E+E))+E  (1+2+(3+E))+E  (1+2+(3+4))+E  (1+2+(3+4))+5 Right-most derivation S  S+E  E+5  (S)+5  (S+E)+5  (S+(S))+5  (S+(S+E))+5  (S+(S+4))+5  (S+(E+4))+5  (S+(3+4))+5  (S+E+(3+4))+5  (S+2+(3+4))+5  (E+2+(3+4))+5  (1+2+(3+4))+5 Same parse tree: same productions chosen, diff. order S  S + E | E E  number | ( S )

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 23 Top-down vs. Bottom-up Parsing We normally scan in tokens from left to right Left-most derivation reflects top-down parsing –Start with the start symbol –End with the string of tokens S  S+E  E+E  (S)+E  (S+E)+E  (S+E+E)+E  (E+E+E)+E  (1+E+E)+E  (1+2+E)+E  (1+2+(S))+E  (1+2+(S+E))+E  (1+2+(E+E))+E  (1+2+(3+E))+E  (1+2+(3+4))+E  (1+2+(3+4))+5

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 24 Top-down parsing S  S+E  E+E  (S)+E  (S+E)+E  (S+E+E)+E  (E+E+E)+E  (1+E+E)+E  (1+2+E)+E... Entire tree above a token (2) has been expanded when encountered S S + E E ( S ) 5 S + E ( S ) S + E E E

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 25 Bottom-up Parsing Right-most derivation reflects bottom-up parsing –Start with the tokens –End with the start symbol (1+2+(3+4))+5  (E+2+(3+4))+5  (S+2+(3+4))+5  (S+E+(3+4))+5  (S+(3+4))+5  (S+(E+4))+5  (S+(S+4))+5  (S+(S+E))+5  (S+(S))+5  (S+E)+5  (S)+5  E+5  S+E  S

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 26 Bottom-up parsing (1+2+(3+4))+5  (E+2+(3+4))+5  (S+2+(3+4))+5  (S+E+(3+4))+5 … Advantage of bottom-up parsing: can select productions based on more information S S + E E ( S ) 5 S + E S + ES + E ( S ) S + E E E

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 27 Ambiguous Grammars In example grammar, left-most and right-most derivations produced identical parse trees + operator associates to left in parse tree regardless of derivation order (1+2+(3+4))+5

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 28 An Ambiguous Grammar + associates to left because of production S  S + E Consider another grammar: S  S + S | S * S | number Different derivations produce different parse trees: ambiguous grammar

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 29 Differing Parse Trees Consider expression * 3 Derivation 1: S  S + S  1 + S  1 + S * S  * S  * 3 Derivation 2: S  S * S  S * 3  S + S * 3  S + 2 * 3  * 3 S  S + S | S * S | number 123 * * 1 2 +

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 30 Impact of Ambiguity 123 * = 7 = 9 + Different parse trees correspond to different evaluations! Meaning of program ill-defined *

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 31 Eliminating Ambiguity Often can eliminate ambiguity by adding non-terminals S  S + T | T T  T * num | num Enforces precedence, left-associativity S S + T T * 3 T 1 2

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers 32 Summary Context-free grammars allow concise specification of programming languages More powerful than regular expressions CFG converts token stream to parse tree Read Appel 3.1, 3.2 Next time: implementing a top-down parser