COP4020 Programming Languages

Slides:



Advertisements
Similar presentations
Lecture # 8 Chapter # 4: Syntax Analysis. Practice Context Free Grammars a) CFG generating alternating sequence of 0’s and 1’s b) CFG in which no consecutive.
Advertisements

Grammars, constituency and order A grammar describes the legal strings of a language in terms of constituency and order. For example, a grammar for a fragment.
Copyright © 2006 The McGraw-Hill Companies, Inc. Programming Languages 2nd edition Tucker and Noonan Chapter 2 Syntax A language that is simple to parse.
ISBN Chapter 3 Describing Syntax and Semantics.
Chapter 3 Describing Syntax and Semantics Sections 1-3.
PZ02A - Language translation
Context-Free Grammars Lecture 7
Chapter 3 Describing Syntax and Semantics Sections 1-3.
Chapter 3 Describing Syntax and Semantics Sections 1-3.
Chapter 3: Formal Translation Models
1 Chapter 3 Context-Free Grammars and Parsing. 2 Parsing: Syntax Analysis decides which part of the incoming token stream should be grouped together.
(2.1) Grammars  Definitions  Grammars  Backus-Naur Form  Derivation – terminology – trees  Grammars and ambiguity  Simple example  Grammar hierarchies.
EECS 6083 Intro to Parsing Context Free Grammars
1 Syntax and Semantics The Purpose of Syntax Problem of Describing Syntax Formal Methods of Describing Syntax Derivations and Parse Trees Sebesta Chapter.
1 Introduction to Parsing Lecture 5. 2 Outline Regular languages revisited Parser overview Context-free grammars (CFG’s) Derivations.
Formal Grammars Denning, Sections 3.3 to 3.6. Formal Grammar, Defined A formal grammar G is a four-tuple G = (N,T,P,  ), where N is a finite nonempty.
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 5 Mälardalen University 2005.
BİL 744 Derleyici Gerçekleştirimi (Compiler Design)1 Syntax Analyzer Syntax Analyzer creates the syntactic structure of the given source program. This.
1 Chapter 3 Describing Syntax and Semantics. 3.1 Introduction Providing a concise yet understandable description of a programming language is difficult.
Context-Free Grammars
Grammars CPSC 5135.
PART I: overview material
Profs. Necula CS 164 Lecture Top-Down Parsing ICOM 4036 Lecture 5.
Lecture # 9 Chap 4: Ambiguous Grammar. 2 Chomsky Hierarchy: Language Classification A grammar G is said to be – Regular if it is right linear where each.
ISBN Chapter 3 Describing Syntax and Semantics.
1 Syntax In Text: Chapter 3. 2 Chapter 3: Syntax and Semantics Outline Syntax: Recognizer vs. generator BNF EBNF.
COP4020 Programming Languages Syntax Prof. Robert van Engelen (modified by Prof. Em. Chris Lacher)
Parsing Introduction Syntactic Analysis I. Parsing Introduction 2 The Role of the Parser The Syntactic Analyzer, or Parser, is the heart of the front.
CFG1 CSC 4181Compiler Construction Context-Free Grammars Using grammars in parsers.
Introduction to Parsing
Chapter 3 Describing Syntax and Semantics
Context Free Grammars CFGs –Add recursion to regular expressions Nested constructions –Notation expression  identifier | number | - expression | ( expression.
CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 3: Introduction to Syntactic Analysis.
1 Language translation Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Sections
ISBN Chapter 3 Describing Syntax and Semantics.
LESSON 04.
Overview of Previous Lesson(s) Over View  In our compiler model, the parser obtains a string of tokens from the lexical analyzer & verifies that the.
Syntax Analysis - Parsing Compiler Design Lecture (01/28/98) Computer Science Rensselaer Polytechnic.
Grammars Hopcroft, Motawi, Ullman, Chap 5. Grammars Describes underlying rules (syntax) of programming languages Compilers (parsers) are based on such.
Unit-3 Parsing Theory (Syntax Analyzer) PREPARED BY: PROF. HARISH I RATHOD COMPUTER ENGINEERING DEPARTMENT GUJARAT POWER ENGINEERING & RESEARCH INSTITUTE.
Syntax Analysis – Part I EECS 483 – Lecture 4 University of Michigan Monday, September 17, 2006.
Top-Down Parsing.
Syntax Analyzer (Parser)
Copyright © 2006 The McGraw-Hill Companies, Inc. Programming Languages 2nd edition Tucker and Noonan Chapter 2 Syntax A language that is simple to parse.
1 Pertemuan 7 & 8 Syntax Analysis (Parsing) Matakuliah: T0174 / Teknik Kompilasi Tahun: 2005 Versi: 1/6.
Programming Languages and Design Lecture 2 Syntax Specifications of Programming Languages Instructor: Li Ma Department of Computer Science Texas Southern.
Chapter 4: Syntax analysis Syntax analysis is done by the parser. –Detects whether the program is written following the grammar rules and reports syntax.
Compiler Construction Lecture Five: Parsing - Part Two CSC 2103: Compiler Construction Lecture Five: Parsing - Part Two Joyce Nakatumba-Nabende 1.
COMP 3438 – Part II - Lecture 4 Syntax Analysis I Dr. Zili Shao Department of Computing The Hong Kong Polytechnic Univ.
Syntax Analysis By Noor Dhia Syntax analysis:- Syntax analysis or parsing is the most important phase of a compiler. The syntax analyzer considers.
Compiler Chapter 5. Context-free Grammar Dept. of Computer Engineering, Hansung University, Sung-Dong Kim.
Chapter 3 – Describing Syntax CSCE 343. Syntax vs. Semantics Syntax: The form or structure of the expressions, statements, and program units. Semantics:
Chapter 3 – Describing Syntax
lec02-parserCFG May 8, 2018 Syntax Analyzer
CS510 Compiler Lecture 4.
Chapter 3 Context-Free Grammar and Parsing
Introduction to Parsing (adapted from CS 164 at Berkeley)
Compiler Construction
Compiler Design 4. Language Grammars
COP4020 Programming Languages
Lecture 7: Introduction to Parsing (Syntax Analysis)
CSC 4181Compiler Construction Context-Free Grammars
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
CSC 4181 Compiler Construction Context-Free Grammars
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
lec02-parserCFG May 27, 2019 Syntax Analyzer
Programming Languages 2nd edition Tucker and Noonan
Language translation Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Sections
COMPILER CONSTRUCTION
Presentation transcript:

COP4020 Programming Languages Syntax analysis Prof. Xin Yuan

Overview Syntax analysis overview Grammar and context-free grammar Grammar derivations Parse trees 4/17/2017 COP4020 Spring 2014 COP4020 Fall 2006

Syntax analysis Syntax analysis is done by the parser. Rest of Detects whether the program is written following the grammar rules and reports syntax errors. Produces a parse tree from which intermediate code can be generated. token Rest of front end Lexical analyzer Parse tree Int. code Source program parser Request for token Symbol table

The syntax of a programming language is described by a context-free grammar (Backus-Naur Form (BNF)). Similar to the languages specified by regular expressions, but more general. A grammar gives a precise syntactic specification of a language. From some classes of grammars, tools exist that can automatically construct an efficient parser. These tools can also detect syntactic ambiguities and other problems automatically. A compiler based on a grammatical description of a language is more easily maintained and updated.

Grammars A grammar has four components G=(N, T, P, S): T is a finite set of tokens (terminal symbols) N is a finite set of nonterminals P is a finite set of productions of the form . Where and S is a special nonterminal that is a designated start symbol 4/17/2017 COP4020 Spring 2014

Example Grammar for expression (T=?, N=?, P=?, S=?) Production: E ->E+E E-> E-E E-> (E) E-> -E E->num E->id How does this correspond to a language? Informally, you can expand the non-terminals using the productions until all are expanded: the ending sentence (a sequence of tokens) is recognized by the grammar. 4/17/2017 COP4020 Spring 2014

Language recognized by a grammar We say “aAb derives awb in one step”, denoted as “aAb=>awb”, if A->w is a production and a and b are arbitrary strings of terminal or nonterminal symbols. We say a1 derives am if a1=>a2=>…=>am, written as a1=>am The languages L(G) defined by G are the set of strings of the terminals w such that S=>w. * * 4/17/2017 COP4020 Spring 2014

Example A->aA A->bA A->a A->b G=(N, T, P, S) N=? T=? P=? S=? What is the language recognized by this grammer? 4/17/2017 COP4020 Spring 2014

Chomsky Hierarchy (classification of grammars) A grammar is said to be regular if it is right-linear, where each production in P has the form, or . Here, A and B are non-terminals and w is a terminal or left-linear context-free if each production in P is of the form , where and context sensitive if each production in P is of the form where unrestricted if each production in P is of the form where All languages recognized by regular expression can be represented by a regular grammar.

A context free grammar has four components G=(N, T, P, S): T is a finite set of tokens (terminal symbols) N is a finite set of nonterminals P is a finite set of productions of the form Where and . S is a special nonterminal that is a designated start symbol. Context free grammar is more expressive than regular expression. Consider language {ab, aabb, aaabbb, …}

BNF Notation (another form of context free grammar) Backus-Naur Form (BNF) notation for productions: <nonterminal> ::= sequence of (non)terminals where Each terminal in the grammar is a token A <nonterminal> defines a syntactic category The symbol | denotes alternative forms in a production The special symbol  denotes empty 4/17/2017 COP4020 Spring 2014

Example <Program> ::= program <id> ( <id> <More_ids> ) ; <Block> . <Block> ::= <Variables> begin <Stmt> <More_Stmts> end <More_ids> ::= , <id> <More_ids> |  <Variables> ::= var <id> <More_ids> : <Type> ; <More_Variables> |  <More_Variables> ::= <id> <More_ids> : <Type> ; <More_Variables> |  <Stmt> ::= <id> := <Exp> | if <Exp> then <Stmt> else <Stmt> | while <Exp> do <Stmt> | begin <Stmt> <More_Stmts> end <More_Stmts> ::= ; <Stmt> <More_Stmts> |  <Exp> ::= <num> | <id> | <Exp> + <Exp> | <Exp> - <Exp> 4/17/2017 COP4020 Spring 2014

Derivations From a grammar we can derive strings (= sequences of tokens) The opposite process of parsing Starting with the grammar’s designated start symbol, in each derivation step a nonterminal is replaced by a right-hand side of a production for that nonterminal A sentence (in the language) is a sequence of terminals that can be derived from the start symbol. A sentential form is a sequence of terminals and nonterminals that can be derived from the start symbol. 4/17/2017 COP4020 Spring 2014

Example Derivation <expression> ::= identifier               | unsigned_integer               | - <expression>               | ( <expression> )               | <expression> <operator> <expression> <operator> ::= + | - | * | / Start symbol <expression>    <expression> <operator> <expression>    <expression> <operator> identifier    <expression> + identifier    <expression> <operator> <expression> + identifier    <expression> <operator> identifier + identifier    <expression> * identifier + identifier    identifier * identifier + identifier Replacement of nonterminal with one of its productions Sentential forms The final string is the yield 4/17/2017 COP4020 Spring 2014

Rightmost versus Leftmost Derivations When the nonterminal on the far right (left) in a sentential form is replaced in each derivation step the derivation is called right-most (left-most) Replace in rightmost derivation <expression>    <expression> <operator> <expression>    <expression> <operator> identifier   Replace in rightmost derivation Replace in leftmost derivation <expression>    <expression> <operator> <expression>    identifier <operator> <expression>   Replace in leftmost derivation 4/17/2017 COP4020 Spring 2014

A Language Generated by a Grammar A context-free grammar is a generator of a context-free language The language defined by a grammar G is the set of all strings w that can be derived from the start symbol S L(G) = { w | S * w } <S> ::= a | ‘(’ <S> ‘)’ L(G) = { set of all strings a (a) ((a)) (((a))) … } <S> ::= <B> | <C> <B> ::= <C> + <C> <C> ::= 0 | 1 L(G) = { 0+0, 0+1, 1+0, 1+1, 0, 1 } 4/17/2017 COP4020 Spring 2014

Parse Trees A parse tree depicts the end result of a derivation The internal nodes are the nonterminals The children of a node are the symbols (terminals and nonterminals) on a right-hand side of a production The leaves are the terminals <expression> <expression> <operator> <expression> <expression> <operator> <expression> identifier * identifier + identifier 4/17/2017 COP4020 Spring 2014

Parse Trees <expression>    <expression> <operator> <expression>    <expression> <operator> identifier    <expression> + identifier    <expression> <operator> <expression> + identifier    <expression> <operator> identifier + identifier    <expression> * identifier + identifier    identifier * identifier + identifier <expression> <expression> <operator> <expression> <expression> <operator> <expression> identifier * identifier + identifier 4/17/2017 COP4020 Spring 2014

Ambiguity There is another parse tree for the same grammar and input: the grammar is ambiguous This parse tree is not desired, since it appears that + has precedence over * <expression> <expression> <operator> <expression> <expression> <operator> <expression> identifier * identifier + identifier 4/17/2017 COP4020 Spring 2014

Ambiguous Grammars Ambiguous grammar: more than one distinct derivation of a string results in different parse trees A programming language construct should have only one parse tree to avoid misinterpretation by a compiler For expression grammars, associativity and precedence of operators is used to disambiguate <expression> ::= <term> | <expression> <add_op> <term> <term> ::= <factor> | <term> <mult_op> <factor> <factor> ::= identifier | unsigned_integer | - <factor> | ( <expression> ) <add_op> ::= + | - <mult_op> ::= * | / 4/17/2017 COP4020 Spring 2014

Ambiguous if-then-else: the “Dangling Else” A classical example of an ambiguous grammar are the grammar productions for if-then-else: <stmt> ::= if <expr> then <stmt> | if <expr> then <stmt> else <stmt> It is possible to hack this into unambiguous productions for the same syntax, but the fact that it is not easy indicates a problem in the programming language design Ada uses different syntax to avoid ambiguity: <stmt> ::= if <expr> then <stmt> end if | if <expr> then <stmt> else <stmt> end if 4/17/2017 COP4020 Spring 2014