COP4020 Programming Languages Syntax Prof. Robert van Engelen (modified by Prof. Em. Chris Lacher)

COP4020 Fall 2008
Overview
- Tokens and regular expressions
- Syntax and context-free grammars
- Grammar derivations
- More about parse trees
- Top-down and bottom-up parsing
- Recursive descent parsing

Tokens
- Tokens are the basic building blocks of a programming language
  - Keywords, identifiers, literal values, operators, punctuation
- We saw that the first compiler phase (scanning) splits up a character stream into tokens
- Tokens have a special role with respect to:
  - Free-format languages: source program is a sequence of tokens and the horizontal/vertical position of a token on a page is unimportant (e.g. Pascal)
  - Fixed-format languages: indentation and/or position of a token on a page is significant (early Basic, Fortran, Haskell)
  - Case-sensitive languages: upper- and lowercase are distinct (C, C++, Java)
  - Case-insensitive languages: upper- and lowercase are identical (Ada, Fortran, Pascal)

Defining Token Patterns with Regular Expressions
- The makeup of a token is described by a regular expression
- A regular expression r is one of
  - A character, e.g. a
  - Empty, denoted by ε
  - Concatenation: a sequence of regular expressions r1 r2 r3 … rn
  - Alternation: regular expressions separated by a bar r1 | r2
  - Repetition: a regular expression followed by a star (Kleene star) r*

Example Regular Definitions for Tokens
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
unsigned_integer → digit digit*
signed_integer → (+ | - | ε) unsigned_integer
letter → a | b | … | z | A | B | … | Z
identifier → letter (letter | digit)*
- Regular definitions cannot be recursive; this is illegal:
digits → digit digits | digit
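
The regular definitions on this slide can be tried out with Python's re module. This is a minimal sketch, not part of the original slides: the constant names and the matches helper are assumptions made for illustration. Each definition is built only from definitions that precede it, which mirrors the rule that regular definitions are not recursive.

```python
import re

# Regular definitions from the slide, composed by substitution (no recursion).
DIGIT = r"[0-9]"
LETTER = r"[a-zA-Z]"
UNSIGNED_INTEGER = rf"{DIGIT}{DIGIT}*"
SIGNED_INTEGER = rf"(?:\+|-|){UNSIGNED_INTEGER}"   # (+ | - | empty) unsigned_integer
IDENTIFIER = rf"{LETTER}(?:{LETTER}|{DIGIT})*"


def matches(pattern: str, text: str) -> bool:
    """Return True when the whole of text matches the regular definition."""
    return re.fullmatch(pattern, text) is not None
```

For example, matches(IDENTIFIER, "x1") holds while matches(IDENTIFIER, "1x") does not, because an identifier must start with a letter.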

Finite State Machines = Regular Expression Recognizers
[Figure: two transition diagrams. The first recognizes relational operators and ends in return(relop, LE), return(relop, NE), return(relop, LT), return(relop, EQ), return(relop, GE), or return(relop, GT). The second recognizes identifiers: it loops on letter or digit and ends in return(gettoken(), install_id()) on any other character.]
relop → < | <= | <> | > | >= | =
id → letter ( letter | digit )*
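
The relational-operator recognizer can be hand-coded as a small state machine. The sketch below assumes the Pascal-style operators <, <=, <>, >, >=, = that the token names LT, LE, NE, GT, GE, EQ suggest; the function name scan_relop and its return shape are hypothetical, chosen only for this illustration.

```python
def scan_relop(text: str, pos: int = 0):
    """Recognize a relational operator starting at text[pos].

    Returns (token_name, next_pos), or None when no relop starts there.
    """
    ch = text[pos] if pos < len(text) else ""
    nxt = text[pos + 1] if pos + 1 < len(text) else ""
    if ch == "<":
        if nxt == "=":
            return ("LE", pos + 2)
        if nxt == ">":
            return ("NE", pos + 2)
        return ("LT", pos + 1)   # any other character: leave it unread
    if ch == "=":
        return ("EQ", pos + 1)
    if ch == ">":
        if nxt == "=":
            return ("GE", pos + 2)
        return ("GT", pos + 1)   # any other character: leave it unread
    return None
```

The branches that return after seeing "any other character" advance past only the operator itself, so the character that ended the token is still available to the next call.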

Context Free Grammars: BNF
- Regular expressions cannot describe nested constructs, but context-free grammars can
- Backus-Naur Form (BNF) grammar productions are of the form
    <nonterminal> ::= sequence of (non)terminals
  where
  - A terminal of the grammar is a token
  - A <nonterminal> defines a syntactic category
  - The symbol | denotes alternative forms in a production
  - The special symbol ε denotes empty

Example
::= program ( ) ; .
::= begin end
::= , | ε
::= var : ; | ε
::= : ; | ε
::= := | if then else | while do | begin end
::= ; | ε
::= | | + | -

Extended BNF
- Extended BNF adds
  - Optional constructs with [ and ]
  - Repetitions with [ ]*
  - Some EBNF definitions also add [ ]+ for non-zero repetitions

Example
::= program ( [ , ]* ) ; .
::= [ ] begin [ ; ]* end
::= var [ [ , ]* : ; ]+
::= := | if then else | while do | begin [ ; ]* end
::= | | + | -

Derivations
- From a grammar we can derive strings by generating sequences of tokens directly from the grammar (the opposite of parsing)
- In each derivation step a nonterminal is replaced by a right-hand side of a production for that nonterminal
- The representation after each step is called a sentential form
- When the nonterminal on the far right (left) in a sentential form is replaced in each derivation step, the derivation is called right-most (left-most)
- The final form consists of terminals only and is called the yield of the derivation
- A context-free grammar is a generator of a context-free language: the language defined by the grammar is the set of all strings that can be derived

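Example
<expression> ::= identifier | unsigned_integer | - <expression> | ( <expression> ) | <expression> <operator> <expression>
<operator> ::= + | - | * | /
Right-most derivation of identifier * identifier + identifier:
<expression>
⇒ <expression> <operator> <expression>
⇒ <expression> <operator> identifier
⇒ <expression> + identifier
⇒ <expression> <operator> <expression> + identifier
⇒ <expression> <operator> identifier + identifier
⇒ <expression> * identifier + identifier
⇒ identifier * identifier + identifier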

Parse Trees
- A parse tree depicts the end result of a derivation
  - The internal nodes are the nonterminals
  - The children of a node are the symbols (terminals and nonterminals) on a right-hand side of a production
  - The leaves are the terminals
[Figure: parse tree for the input identifier * identifier + identifier]

Ambiguity
- There is another parse tree for the same grammar and input: the grammar is ambiguous
- This parse tree is not desired, since it appears that + has precedence over *
[Figure: alternative parse tree for the same input]

Ambiguous Grammars
- When more than one distinct derivation of a string exists resulting in distinct parse trees, the grammar is ambiguous
- A programming language construct should have only one parse tree to avoid misinterpretation by a compiler
- For expression grammars, associativity and precedence of operators are used to disambiguate the productions
::= |
::= identifier | unsigned_integer | - | ( )
::= + | -
::= * | /

Ambiguous if-then-else
- A classical example of an ambiguous grammar is the set of productions for if-then-else:
    <stmt> ::= if <expr> then <stmt> | if <expr> then <stmt> else <stmt>
- It is possible to hack this into unambiguous productions for the same syntax, but the fact that it is not easy indicates a problem in the programming language design
- Ada uses different syntax to avoid ambiguity:
    <stmt> ::= if <expr> then <stmt> end if | if <expr> then <stmt> else <stmt> end if

Linear-Time Top-Down and Bottom-Up Parsing
- A parser is a recognizer for a context-free language
- A string (token sequence) is accepted by the parser and a parse tree can be constructed if the string is in the language
- For an arbitrary context-free grammar, parsing can take as much as O(n^3) time, where n is the size of the input
- There are large classes of grammars for which we can construct parsers that take O(n) time:
  - Top-down LL parsers for LL grammars (LL = Left-to-right scanning of input, Left-most derivation)
  - Bottom-up LR parsers for LR grammars (LR = Left-to-right scanning of input, Right-most derivation)

Top-Down Parsers and LL Grammars
- A top-down parser is a parser for the LL class of grammars
  - Also called predictive parser
  - The LL class is a strict subset of the larger LR class of grammars
  - LL grammars cannot contain left-recursive productions (but LR can), for example:
      <X> ::= <X> <Y> …
    and
      <X> ::= <Y> <Z> …
      <Y> ::= <X> …
  - LL(k) where k is the lookahead depth; if k=1 it cannot handle alternatives in productions with common prefixes:
      <X> ::= a b … | a c …
- A top-down parser constructs a parse tree from the root down
- It is not too difficult to implement a predictive parser for an unambiguous LL(1) grammar in BNF by hand using recursive descent

Top-Down Parser in Action
<id_list> ::= id <id_list_tail>
<id_list_tail> ::= , id <id_list_tail> | ;
Input: A, B, C;

Top-Down Predictive Parsing
- Top-down parsing is called predictive parsing because the parser “predicts” what it is going to see:
  1. As root, the start symbol of the grammar <id_list> is predicted
  2. After reading A the parser predicts that <id_list_tail> must follow
  3. After reading , and B the parser predicts that <id_list_tail> must follow
  4. After reading , and C the parser predicts that <id_list_tail> must follow
  5. After reading ; the parser stops
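
The prediction steps for the input A, B, C; can be sketched as a small predictive parser. This is an illustration under assumptions: the nonterminals are taken to be id_list and id_list_tail, the input is a pre-split list of token strings, and any Python identifier counts as an id token. The function name parse_id_list is hypothetical.

```python
def parse_id_list(tokens):
    """Predictive parser for
         id_list      ::= id id_list_tail
         id_list_tail ::= , id id_list_tail | ;
    tokens is a list such as ["A", ",", "B", ";"]; returns the identifiers read.
    """
    pos = 0
    names = []

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def match_id():
        nonlocal pos
        tok = peek()
        if tok is None or not tok.isidentifier():
            raise SyntaxError(f"expected identifier at position {pos}")
        names.append(tok)
        pos += 1

    def id_list_tail():
        nonlocal pos
        if peek() == ",":        # predict: , id id_list_tail
            pos += 1
            match_id()
            id_list_tail()
        elif peek() == ";":      # predict: ;
            pos += 1
        else:
            raise SyntaxError(f"expected ',' or ';' at position {pos}")

    match_id()                   # id_list ::= id id_list_tail
    id_list_tail()
    if pos != len(tokens):
        raise SyntaxError("unexpected input after ';'")
    return names
```

At every point a single lookahead token (, or ;) is enough to choose the production, which is what makes this grammar suitable for predictive parsing.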

An Ambiguous Non-LL Grammar for Language E
- Consider a language E of simple expressions composed of +, -, *, /, (), id, and num
<expr> ::= <expr> + <expr> | <expr> - <expr> | <expr> * <expr> | <expr> / <expr> | ( <expr> ) | <id> | <num>
- Need operator precedence rules

An Unambiguous Non-LL Grammar for Language E
<expr> ::= <expr> + <term> | <expr> - <term> | <term>
<term> ::= <term> * <factor> | <term> / <factor> | <factor>
<factor> ::= ( <expr> ) | <id> | <num>

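An Unambiguous LL(1) Grammar for Language E
<expr> ::= <term> <term_tail>
<term> ::= <factor> <factor_tail>
<term_tail> ::= <add_op> <term> <term_tail> | ε
<factor> ::= ( <expr> ) | <id> | <num>
<factor_tail> ::= <mult_op> <factor> <factor_tail> | ε
<add_op> ::= + | -
<mult_op> ::= * | /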

Constructing Recursive Descent Parsers for LL(1)
- Each nonterminal has a function that implements the production(s) for that nonterminal
- The function parses only the part of the input described by the nonterminal
    <expr> ::= <term> <term_tail>
    procedure expr()
      term(); term_tail();
- When more than one alternative production exists for a nonterminal, the lookahead token should help to decide which production to apply
    <term_tail> ::= <add_op> <term> <term_tail>
                  | ε
    procedure term_tail()
      case (input_token())
        of '+' or '-': add_op(); term(); term_tail();
        otherwise: /* no op = ε */

Some Rules to Construct a Recursive Descent Parser
- For every nonterminal with more than one production, find all the tokens that each of the right-hand sides can start with:
    ::= a       starts with a
      | b a     starts with b
      |         starts with c or d
      | f       starts with e or f
    ::= c | d
    ::= e | ε
- Empty productions are coded as “skip” operations (nops)
- If a nonterminal does not have an empty production, the function should generate an error if no token matches

Example for E
procedure expr()
  term(); term_tail();
procedure term_tail()
  case (input_token())
    of '+' or '-': add_op(); term(); term_tail();
    otherwise: /* no op = ε */
procedure term()
  factor(); factor_tail();
procedure factor_tail()
  case (input_token())
    of '*' or '/': mult_op(); factor(); factor_tail();
    otherwise: /* no op = ε */
procedure factor()
  case (input_token())
    of '(': match('('); expr(); match(')');
    of identifier: match(identifier);
    of number: match(number);
    otherwise: error;
procedure add_op()
  case (input_token())
    of '+': match('+');
    of '-': match('-');
    otherwise: error;
procedure mult_op()
  case (input_token())
    of '*': match('*');
    of '/': match('/');
    otherwise: error;
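
The pseudocode for E translates almost line for line into a runnable program. The sketch below is one possible rendering, not the course's reference implementation: the tokenizer, the "EOF" end marker, the Parser class, and the accepts helper are all assumptions added so the example is self-contained. The methods keep the names of the procedures above.

```python
import re

TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(.))")


def tokenize(text):
    """Return token kinds: 'num', 'id', or the symbol itself, then 'EOF'."""
    kinds = []
    for num, ident, sym in TOKEN_RE.findall(text):
        if num:
            kinds.append("num")
        elif ident:
            kinds.append("id")
        elif sym.strip():
            kinds.append(sym)
    kinds.append("EOF")   # end marker, an addition for this sketch
    return kinds


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def input_token(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.input_token() != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.input_token()!r}")
        self.pos += 1

    def expr(self):
        self.term()
        self.term_tail()

    def term_tail(self):
        if self.input_token() in ("+", "-"):
            self.add_op()
            self.term()
            self.term_tail()
        # otherwise: empty production, nothing to do

    def term(self):
        self.factor()
        self.factor_tail()

    def factor_tail(self):
        if self.input_token() in ("*", "/"):
            self.mult_op()
            self.factor()
            self.factor_tail()
        # otherwise: empty production, nothing to do

    def factor(self):
        tok = self.input_token()
        if tok == "(":
            self.match("(")
            self.expr()
            self.match(")")
        elif tok in ("id", "num"):
            self.match(tok)
        else:
            raise SyntaxError(f"unexpected token {tok!r}")

    def add_op(self):
        if self.input_token() not in ("+", "-"):
            raise SyntaxError("expected '+' or '-'")
        self.match(self.input_token())

    def mult_op(self):
        if self.input_token() not in ("*", "/"):
            raise SyntaxError("expected '*' or '/'")
        self.match(self.input_token())


def accepts(text):
    """Return True when text is a sentence of language E."""
    parser = Parser(text)
    try:
        parser.expr()
        parser.match("EOF")
    except SyntaxError:
        return False
    return True
```

With this sketch, accepts("1+2*3") succeeds, while accepts("1+") fails inside factor() because no token can start a factor at the end of the input.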

Recursive Descent Parser’s Call Graph = Parse Tree
- The dynamic call graph of a recursive descent parser corresponds exactly to the parse tree
[Figure: call graph of input string 1+2*3]

Example
<type> ::= <simple> | ^ id | array [ <simple> ] of <type>
<simple> ::= integer | char | num dotdot num

Example (cont’d)
<type> ::= <simple> | ^ id | array [ <simple> ] of <type>
<simple> ::= integer | char | num dotdot num
- <type> starts with ^ or array or anything that <simple> starts with
- <simple> starts with integer, char, and num

Example (cont’d)
procedure match(t : token)
  if input_token() = t then nexttoken();
  else error;
procedure type()
  case (input_token())
    of 'integer' or 'char' or 'num': simple();
    of '^': match('^'); match(id);
    of 'array': match('array'); match('['); simple(); match(']'); match('of'); type();
    otherwise: error;
procedure simple()
  case (input_token())
    of 'integer': match('integer');
    of 'char': match('char');
    of 'num': match('num'); match('dotdot'); match('num');
    otherwise: error;

Steps 1 to 8: parsing the input array [ num dotdot num ] of integer
[Figures: each step shows the input with the current lookahead token and the call tree built so far.]
Step 1: type() checks the lookahead and calls match('array')
Step 2: type() calls match('[')
Step 3: type() calls simple(), which calls match('num')
Step 4: simple() calls match('dotdot')
Step 5: simple() calls match('num')
Step 6: type() calls match(']')
Step 7: type() calls match('of')
Step 8: type() calls type(), which calls simple(), which calls match('integer')
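
The type/simple parser and its step-by-step call sequence can be reproduced in a short program. This is a sketch under assumptions: the input is given as a list of token strings, the class name TypeParser and the trace list are hypothetical additions, and the methods are named parse_type and parse_simple to avoid shadowing Python's built-in type.

```python
class TypeParser:
    """Recursive descent parser for
         type   ::= simple | ^ id | array [ simple ] of type
         simple ::= integer | char | num dotdot num
    """

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["EOF"]   # end marker, an addition for this sketch
        self.pos = 0
        self.trace = []                        # calls in the order they happen

    def input_token(self):
        return self.tokens[self.pos]

    def match(self, t):
        if self.input_token() != t:
            raise SyntaxError(f"expected {t!r}, found {self.input_token()!r}")
        self.trace.append(f"match({t})")
        self.pos += 1

    def parse_type(self):
        self.trace.append("type()")
        tok = self.input_token()
        if tok in ("integer", "char", "num"):
            self.parse_simple()
        elif tok == "^":
            self.match("^")
            self.match("id")
        elif tok == "array":
            self.match("array")
            self.match("[")
            self.parse_simple()
            self.match("]")
            self.match("of")
            self.parse_type()
        else:
            raise SyntaxError(f"unexpected token {tok!r}")

    def parse_simple(self):
        self.trace.append("simple()")
        tok = self.input_token()
        if tok in ("integer", "char"):
            self.match(tok)
        elif tok == "num":
            self.match("num")
            self.match("dotdot")
            self.match("num")
        else:
            raise SyntaxError(f"unexpected token {tok!r}")


def trace_of(tokens):
    """Parse tokens as a type and return the recorded call sequence."""
    parser = TypeParser(tokens)
    parser.parse_type()
    parser.match("EOF")
    return parser.trace[:-1]   # drop the final end-marker match
```

For the token list array [ num dotdot num ] of integer, the recorded trace lists the calls in the same order as the step slides: type(), match(array), match([), simple(), and so on down to match(integer).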

Bottom-Up LR Parsing
- A bottom-up parser is a parser for the LR class of grammars
- Difficult to implement by hand
- Tools (e.g. Yacc/Bison) exist that generate bottom-up parsers for LALR grammars automatically
- LR parsing is based on shifting tokens on a stack until the parser recognizes a right-hand side of a production, which it then reduces to a left-hand side (nonterminal) to form a partial parse tree

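Bottom-Up Parser in Action
<id_list> ::= id <id_list_tail>
<id_list_tail> ::= , id <id_list_tail> | ;
Input: A, B, C;
Stack                          Remaining input
A                              , B, C;
A ,                            B, C;
A , B                          , C;
A , B ,                        C;
A , B , C                      ;
A , B , C ;
A , B , C <id_list_tail>
A , B <id_list_tail>
A <id_list_tail>
<id_list>
[Figure: the partial parse tree built after each step]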
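
The shift and reduce steps for A, B, C; can be imitated with a hand-written loop. This is only a sketch of the shift-reduce idea for this one grammar, assuming the nonterminals are named id_list and id_list_tail; it is not the table-driven LR automaton that a generator such as Yacc or Bison would produce, and the names reduce_once and shift_reduce are hypothetical.

```python
ID_LIST, TAIL = "<id_list>", "<id_list_tail>"


def is_id(symbol):
    """Treat any Python identifier as an id token (an assumption of this sketch)."""
    return symbol.isidentifier()


def reduce_once(stack):
    """Apply one reduction at the top of the stack; return True if one applied."""
    if stack[-1:] == [";"]:
        stack[-1:] = [TAIL]            # id_list_tail ::= ;
        return True
    if len(stack) >= 3 and stack[-3] == "," and is_id(stack[-2]) and stack[-1] == TAIL:
        stack[-3:] = [TAIL]            # id_list_tail ::= , id id_list_tail
        return True
    if len(stack) >= 2 and is_id(stack[-2]) and stack[-1] == TAIL:
        stack[-2:] = [ID_LIST]         # id_list ::= id id_list_tail
        return True
    return False


def shift_reduce(tokens):
    """Return (accepted, snapshots), one stack snapshot per shift or reduce."""
    stack, snapshots = [], []
    for tok in tokens:
        stack.append(tok)              # shift
        snapshots.append(list(stack))
        while reduce_once(stack):      # reduce while a right-hand side is on top
            snapshots.append(list(stack))
    return stack == [ID_LIST], snapshots
```

For the tokens A , B , C ; the parser shifts all six tokens before the first reduction becomes possible, then reduces from the right until only the start symbol remains on the stack.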