LECTURE 4 Syntax. SPECIFYING SYNTAX Programming languages must be very well defined – there’s no room for ambiguity. Language designers must use formal.

LECTURE 4 Syntax

SPECIFYING SYNTAX Programming languages must be very well defined – there’s no room for ambiguity. Language designers must use formal syntactic and semantic notation to specify the rules of a language. In this lecture, we will focus on how syntax is specified.

SPECIFYING SYNTAX We know from the previous lecture that the front-end of the compiler has three main phases: scanning, parsing, and semantic analysis. Scanning and parsing together perform syntax verification.

SPECIFYING SYNTAX
Scanning: identifies the valid tokens, the basic building blocks, within a program.
Parsing: identifies the valid patterns of tokens, or constructs.
So how do we specify what a valid token is? Or what constitutes a valid construct?

REGULAR EXPRESSIONS
Tokens can be constructed from regular characters using just three rules:
1. Concatenation.
2. Alternation (choice among a finite set of alternatives).
3. Kleene closure (arbitrary repetition).
Any set of strings that can be defined by these three rules is a regular set. Regular sets are generated by regular expressions.

REGULAR EXPRESSIONS
Write a regular expression for each of the following:
1. Zero or more c’s followed by a single a or a single b: c*(a|b)
2. Binary strings starting and ending with 1: 1|1(0|1)*1
3. Binary strings containing at least 3 1’s: 0*10*10*1(0|1)*
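The three answers can be checked mechanically. The following is a minimal sketch using Python's standard re module; the dictionary keys and the helper name matches are made up for illustration and are not part of the lecture.

```python
import re

# The three answers from the exercise, written in Python's regex syntax.
# The key names are hypothetical labels chosen for this sketch.
PATTERNS = {
    "c_star_then_a_or_b": r"c*(a|b)",
    "starts_and_ends_with_1": r"1|1(0|1)*1",
    "at_least_three_ones": r"0*10*10*1(0|1)*",
}


def matches(name: str, text: str) -> bool:
    """Return True if the whole string belongs to the named regular set."""
    # fullmatch requires the pattern to cover the entire string,
    # which is what membership in a regular set means.
    return re.fullmatch(PATTERNS[name], text) is not None
```

Each pattern uses only concatenation, alternation (|) and Kleene closure (*), so it stays within the three rules that define a regular set.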

REGULAR EXPRESSIONS
Let’s look at a more practical example. Say we want to write a regular expression to identify valid numbers. Some things to consider:
1. Numbers can be any number of digits long, but must not start with 0.
2. Numbers can be positive or negative.
3. Numbers can be integers or real.
4. Numbers can be represented by scientific notation (e.g. 2.9e8).

REGULAR EXPRESSIONS

So our number tokens are well-defined by the number symbol, which makes use of the other symbols to build larger expressions. Any valid pattern generated by expanding out the number symbol is a valid number. Note: while our rules build upon one another, no symbol is defined in terms of itself, even indirectly.
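One way to see rules building on one another without self-reference is to assemble a number pattern from named pieces. This is only a sketch under assumptions: the symbol names (DIGIT, NATURAL, FRACTION and so on) and the exact rules are hypothetical and may differ from the lecture's definition; in particular, it assumes a lone 0 is a valid number.

```python
import re

# Hypothetical sub-patterns. Each one is defined only in terms of
# patterns that appear earlier, so no symbol refers to itself.
DIGIT = r"[0-9]"
NONZERO = r"[1-9]"
SIGN = r"[+-]?"
# No leading zeros; a lone 0 is allowed (an assumption of this sketch).
NATURAL = rf"(?:0|{NONZERO}{DIGIT}*)"
INTEGER = rf"{SIGN}{NATURAL}"
FRACTION = rf"\.{DIGIT}+"
EXPONENT = rf"[eE]{SIGN}{DIGIT}+"
NUMBER = rf"{INTEGER}(?:{FRACTION})?(?:{EXPONENT})?"


def is_number(text: str) -> bool:
    """Return True if the whole string is a number under the rules above."""
    return re.fullmatch(NUMBER, text) is not None
```

Expanding NUMBER by hand substitutes each name with its definition until only characters and the three regular operators remain, which is possible precisely because there is no recursion.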

CONTEXT-FREE GRAMMARS We can completely define our tokens in terms of regular expressions, but more complicated constructs necessitate the ability to self-reference. This self-referencing ability takes the form of recursion. The set of strings that can be defined by adding recursion to regular expressions is known as a Context-Free Language. Context-Free Languages are generated by Context-Free Grammars.

CONTEXT-FREE GRAMMARS
We’ve seen a little bit of context-free grammars, but let’s flesh out the details. Context-free grammars are composed of rules known as productions. The left-hand side of each production is a single symbol known as a non-terminal, or variable. On the right-hand side, a production may contain terminals (tokens) or other non-terminals. One of the non-terminals is named the start symbol.

expr → id | number | - expr | ( expr ) | expr op expr
op → + | - | * | /

This notation is known as Backus-Naur Form.
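The expression grammar can be written down as plain data. The sketch below is one possible encoding, not the lecture's: the names GRAMMAR, START, non_terminals and terminals are chosen here for illustration.

```python
# Each non-terminal maps to its list of alternatives; each alternative
# is a sequence of symbols (terminals or non-terminals).
GRAMMAR = {
    "expr": [
        ["id"],
        ["number"],
        ["-", "expr"],
        ["(", "expr", ")"],
        ["expr", "op", "expr"],
    ],
    "op": [["+"], ["-"], ["*"], ["/"]],
}
START = "expr"


def non_terminals(grammar):
    """Non-terminals are exactly the symbols that have productions."""
    return set(grammar)


def terminals(grammar):
    """Terminals are the right-hand-side symbols that have no productions."""
    symbols = {s for alts in grammar.values() for alt in alts for s in alt}
    return symbols - set(grammar)
```

Note that expr appears on its own right-hand side. That recursion is what a regular expression cannot express.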

DERIVATIONS
So, how do we use the context-free grammar to generate syntactically valid strings of terminals (or tokens)?
1. Begin with the start symbol.
2. Choose a production with the start symbol on the left side.
3. Replace the start symbol with the right side of the chosen production.
4. Choose a non-terminal A in the resulting string.
5. Replace A with the right side of a production whose left side is A.
6. Repeat 4 and 5 until no non-terminals remain.
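The six steps can be sketched as a short loop that picks non-terminals and productions at random. The function name derive and the max_steps budget are assumptions added here so that the sketch always terminates; they are not part of the procedure itself.

```python
import random

GRAMMAR = {
    "expr": [
        ["id"],
        ["number"],
        ["-", "expr"],
        ["(", "expr", ")"],
        ["expr", "op", "expr"],
    ],
    "op": [["+"], ["-"], ["*"], ["/"]],
}


def derive(grammar, start, rng, max_steps=50):
    """Return the list of sentential forms of one random derivation."""
    form = [start]                      # step 1: begin with the start symbol
    steps = [list(form)]
    while any(sym in grammar for sym in form):
        # Steps 2 and 4: choose a non-terminal in the current form.
        positions = [i for i, sym in enumerate(form) if sym in grammar]
        i = rng.choice(positions)
        alternatives = grammar[form[i]]
        if len(steps) > max_steps:
            # Budget spent (an addition of this sketch): only allow
            # alternatives made purely of terminals, so the loop ends.
            alternatives = [
                alt for alt in alternatives
                if not any(s in grammar for s in alt)
            ]
        # Steps 3 and 5: replace it with the right side of a production.
        form[i:i + 1] = rng.choice(alternatives)
        steps.append(list(form))        # step 6: repeat until done
    return steps
```

For example, derive(GRAMMAR, "expr", random.Random(0)) returns a list whose first entry is ["expr"] and whose last entry contains only terminals.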

DERIVATIONS
Let’s do a practice derivation with our grammar. We’ll derive the string “(base1 + base2) * height/2”. The start symbol is expr.

expr → id | number | - expr | ( expr ) | expr op expr
op → + | - | * | /

expr
⇒ expr op expr
⇒ expr op expr op expr
⇒ expr op expr op number
⇒ expr op expr / number
⇒ expr op id / number
⇒ expr * id / number
⇒ ( expr ) * id / number
⇒ ( expr op expr ) * id / number
⇒ ( expr op id ) * id / number
⇒ ( expr + id ) * id / number
⇒ ( id + id ) * id / number

Each string of symbols in the steps of the derivation is called a sentential form. The final sentential form is known as the yield.
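The practice derivation can be replayed to confirm that every step expands the right-most non-terminal with a production of the grammar. The helper names rightmost_step and is_rightmost_derivation are hypothetical; this is a sketch of a checker, not part of the lecture.

```python
GRAMMAR = {
    "expr": [
        ["id"],
        ["number"],
        ["-", "expr"],
        ["(", "expr", ")"],
        ["expr", "op", "expr"],
    ],
    "op": [["+"], ["-"], ["*"], ["/"]],
}

# The sentential forms of the practice derivation, one per line.
DERIVATION = [
    "expr",
    "expr op expr",
    "expr op expr op expr",
    "expr op expr op number",
    "expr op expr / number",
    "expr op id / number",
    "expr * id / number",
    "( expr ) * id / number",
    "( expr op expr ) * id / number",
    "( expr op id ) * id / number",
    "( expr + id ) * id / number",
    "( id + id ) * id / number",
]


def rightmost_step(form, grammar):
    """All forms reachable by expanding the right-most non-terminal."""
    for i in range(len(form) - 1, -1, -1):
        if form[i] in grammar:
            return [form[:i] + alt + form[i + 1:] for alt in grammar[form[i]]]
    return []  # no non-terminals left: the form is a yield


def is_rightmost_derivation(lines, grammar):
    """Check that each form follows from the previous by one right-most step."""
    forms = [line.split() for line in lines]
    return all(
        nxt in rightmost_step(cur, grammar)
        for cur, nxt in zip(forms, forms[1:])
    )
```

A derivation that expands a different non-terminal first, such as expr, expr op expr, id op expr, is rejected by this check even though it is a valid derivation in another order.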

DERIVATIONS To save a little bit of room, we can write: expr ⇒* ( id + id ) * id / number, where ⇒* means “derives after zero or more replacements”. Note that in this derivation, we always replaced the right-most non-terminal, which makes it a right-most derivation. There are alternative derivation orders, such as always replacing the left-most non-terminal.

PARSE TREES FROM DERIVATIONS
[Figure sequence: the parse tree for ( id + id ) * id / number is built one node at a time, following the steps of the derivation above. The completed tree, with children indented under their parent:]

expr
  expr
    (
    expr
      expr: id
      op: +
      expr: id
    )
  op: *
  expr
    expr: id
    op: /
    expr: number

AMBIGUOUS DERIVATIONS
Consider the following: “length * width * height”. From our grammar, we can generate two equally acceptable parse trees: one groups the string as (id * id) * id, the other as id * (id * id).
Grammars that allow more than one parse tree for the same string are said to be ambiguous. Parsers must, in practice, apply special rules for disambiguation.

expr → id | number | - expr | ( expr ) | expr op expr
op → + | - | * | /
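The ambiguity can be made concrete by counting parse trees. The sketch below hard-codes the productions of the expr/op grammar into a memoized counter over token spans; the function name count_parse_trees and the span-based approach are choices made here for illustration.

```python
from functools import lru_cache

OPS = {"+", "-", "*", "/"}


def count_parse_trees(tokens):
    """Count the parse trees that derive `tokens` from expr."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways tokens[i:j] can be derived from expr.
        if i >= j:
            return 0
        total = 0
        if j - i == 1 and tokens[i] in ("id", "number"):
            total += 1                          # expr -> id | number
        if tokens[i] == "-":
            total += count(i + 1, j)            # expr -> - expr
        if tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)        # expr -> ( expr )
        for k in range(i + 1, j - 1):
            if tokens[k] in OPS:                # expr -> expr op expr
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))
```

Calling count_parse_trees("id * id * id".split()) gives 2, matching the two trees for “length * width * height”, while a single id has exactly one tree.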

AMBIGUOUS DERIVATIONS Context-free grammars can be structured such that derivations are more efficient for the compiler. Take the example of arithmetic expressions. In most languages, multiplication and division take precedence over addition and subtraction. Also, associativity tells us how operators of equal precedence group; the usual arithmetic operators group left to right. We could allow ambiguous derivations and let the compiler sort out the precedence later, or we could just build it into the structure of the parse tree.

AMBIGUOUS DERIVATIONS
Previously, we had:

expr → id | number | - expr | ( expr ) | expr op expr
op → + | - | * | /

Building in associativity and operator precedence:

expr → term | expr add_op term
term → factor | term mult_op factor
factor → id | number | - factor | ( expr )
add_op → + | -
mult_op → * | /

Example: * 5
[Figure: parse tree for the example under the second grammar, with nodes expr, add_op (+), term, mult_op (*), factor, and number.]
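A small parser shows how the expr/term/factor grammar fixes precedence and associativity in the shape of the tree. This is a sketch under assumptions: it is a hand-written recursive-descent parser that replaces the left-recursive rules with loops (a swapped-in technique, since plain recursive descent cannot follow left recursion directly), and the tokenizer, the nested-tuple tree format and the "neg" label are invented for illustration.

```python
import re


def parse(text):
    """Parse an arithmetic expression into nested (op, left, right) tuples."""
    # Assumed token shapes: integers, identifiers, operators, parentheses.
    tokens = re.findall(r"\d+|[A-Za-z_]\w*|[-+*/()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expr():
        # expr -> term | expr add_op term, written as a loop; each new
        # operator wraps the tree built so far, giving left associativity.
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():
        # term -> factor | term mult_op factor; sits below expr, so
        # * and / bind tighter than + and -.
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():
        # factor -> id | number | - factor | ( expr )
        if peek() is None:
            raise SyntaxError("unexpected end of input")
        tok = advance()
        if tok == "-":
            return ("neg", factor())
        if tok == "(":
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if tok in ("+", "*", "/", ")"):
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree
```

With this sketch, parse("3 + 4 * 5") yields ("+", "3", ("*", "4", "5")), so the multiplication sits deeper in the tree, and parse("8 - 3 - 2") yields ("-", ("-", "8", "3"), "2"), grouping left to right. Each input has exactly one tree.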

NEXT LECTURE
Scanning
Finite Automata: NFAs and DFAs
Implementing a Scanner