CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars.


Announcements
• HW3 due via Sakai
• Solution to HW3
• Homework: HW4 assigned today, due next Friday, via Sakai
• Reading:
• Progress Report Grades due Sept 28th

Compilers
Source Code → Lexical Analyzer (Scanner) → Token Stream → Syntax Analyzer (Parser) → Abstract Syntax Tree → Semantic Analyzer → Intermediate Code Generator → Optimizer → Code Generator

Scanner
Source Code: position = initial + rate * 60 ;
Corresponding Tokens: IDENT(position) ASSIGN IDENT(initial) PLUS IDENT(rate) TIMES INT-LIT(60) SEMI-COLON
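The scanning step on this slide can be sketched in a few lines of Python. This is only an illustration, not the course's scanner: the token names come from the slide, but the regular expressions and the function name `scan` are assumptions made for the example.

```python
import re

# Token names follow the slide; the patterns themselves are assumptions.
TOKEN_SPEC = [
    (name, re.compile(pattern))
    for name, pattern in [
        ("INT-LIT", r"\d+"),
        ("IDENT", r"[A-Za-z_]\w*"),
        ("ASSIGN", r"="),
        ("PLUS", r"\+"),
        ("TIMES", r"\*"),
        ("SEMI-COLON", r";"),
        ("SKIP", r"\s+"),  # whitespace is matched but produces no token
    ]
]


def scan(source):
    """Turn a source string into a list of token strings."""
    tokens = []
    pos = 0
    while pos < len(source):
        # Try each pattern in order at the current position.
        for name, pattern in TOKEN_SPEC:
            match = pattern.match(source, pos)
            if match:
                break
        else:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        pos = match.end()
        if name == "SKIP":
            continue
        if name in ("IDENT", "INT-LIT"):
            # Identifiers and literals carry their text along with the token.
            tokens.append(f"{name}({match.group()})")
        else:
            tokens.append(name)
    return tokens


print(" ".join(scan("position = initial + rate * 60 ;")))
# IDENT(position) ASSIGN IDENT(initial) PLUS IDENT(rate) TIMES INT-LIT(60) SEMI-COLON
```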

Example Parse
Source Code: position = initial + rate * 60 ;
Abstract-Syntax Tree (children are indented under their parent):
=
  position
  +
    initial
    *
      rate
      60
• Interior nodes are operators.
• A node's children are operands.
• Each subtree forms a "logical unit". For example, the subtree with * at its root shows that, because multiplication has higher precedence than addition, rate * 60 must be performed as a unit (not initial + rate).

Limitations of RE and FSA
• Regular Expressions and Finite State Automata cannot express all languages
• For example, the language that consists of all balanced parentheses: () and ((())) and (((((())))))
• Parsers can recognize more types of languages than RE or FSA
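One way to see the limitation: recognizing balanced parentheses requires remembering how many parentheses are currently open, and that count has no upper bound, while a finite state automaton has only a fixed number of states. The sketch below (the function name `is_balanced` is made up for this example) uses an integer counter as that unbounded memory.

```python
def is_balanced(text):
    """Return True if text contains only '(' and ')' and every '(' is closed."""
    depth = 0  # number of currently open parentheses; can grow without bound
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # a ')' appeared with nothing open
        else:
            return False  # not a parenthesis at all
    return depth == 0


for sample in ["()", "((()))", "(()", ")("]:
    print(sample, is_balanced(sample))
```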

Parsers
• Input: sequence of tokens
• Output: a representation of the program (often an AST, but could be other things)
• Also find syntax errors
• CFGs (Context Free Grammars) are used to define the parser

Context Free Grammars
stmt → if ( expr ) stmt else stmt
• Terminals: if ( ) else
• Non-terminals: stmt, expr
• The whole line is a rule, or production

Context Free Grammar
CFGs consist of:
• Σ – Set of Terminals (use tokens from scanner)
• N – Set of Non-Terminals (variables)
• P – Set of Productions (also called rules)
• S – the Start Non-Terminal (the one on the left of the first rule if not specified)

Productions
stmt → if ( expr ) stmt else stmt
• Left of the arrow: a single non-terminal
• Right of the arrow: a sequence of zero or more terminals and non-terminals

Example with Boolean Expressions
• "true" and "false" are boolean expressions
• If exp1 and exp2 are boolean expressions, then so are:
    exp1 || exp2
    exp1 && exp2
    ! exp1
    ( exp1 )

Corresponding CFG
bexp → TRUE
bexp → FALSE
bexp → bexp OR bexp
bexp → bexp AND bexp
bexp → NOT bexp
bexp → LPAREN bexp RPAREN
UPPERCASE names represent tokens (thus terminals); lowercase names represent non-terminals.

CFG for Assignments
• Here is a CFG for simple assignment statements (it can only assign boolean expressions to identifiers):
stmt → ID ASSIGN bexp SEMICOLON

CFG for simple IF statements
Combine these CFGs and add 2 more rules for simple IF statements:
1. stmt → IF LPAREN bexp RPAREN stmt
2. stmt → IF LPAREN bexp RPAREN stmt ELSE stmt
3. stmt → ID ASSIGN bexp SEMICOLON
4. bexp → TRUE
5. bexp → FALSE
6. bexp → bexp OR bexp
7. bexp → bexp AND bexp
8. bexp → NOT bexp
9. bexp → LPAREN bexp RPAREN

You Try It
Write a context-free grammar for the language of very simple while loops (in which the loop body only contains one statement) by adding a new production with nonterminal stmt on the left-hand side.

CFG Languages
• The language defined by a context-free grammar is the set of strings (sequences of terminals) that can be derived from the start nonterminal.
• Think of productions as rewriting rules:
    Set cur_seq = starting non-terminal
    While (a non-terminal, X, exists in cur_seq):
        Select a production with X on the left of "→"
        Replace X with the right side of the selected production
• Try it with the given CFG

What Strings Are in the Language?
1. stmt → IF LPAREN bexp RPAREN stmt
2. stmt → IF LPAREN bexp RPAREN stmt ELSE stmt
3. stmt → ID ASSIGN bexp SEMICOLON
4. bexp → TRUE
5. bexp → FALSE
6. bexp → bexp OR bexp
7. bexp → bexp AND bexp
8. bexp → NOT bexp
9. bexp → LPAREN bexp RPAREN
Rewriting procedure:
    Set cur_seq = starting non-terminal
    While (a non-terminal, X, exists in cur_seq):
        Select a production with X on the left of "→"
        Replace X with the right side of the selected production
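The rewriting procedure can be tried out in code. The sketch below is a minimal illustration under a few assumptions: the dictionary `PRODUCTIONS` encodes the nine numbered rules, the caller supplies the rule numbers to apply, and the helper `derive` (a made-up name) always rewrites the leftmost non-terminal, which is only one of the choices the procedure allows.

```python
# The nine numbered productions from the slide: number -> (left side, right side).
PRODUCTIONS = {
    1: ("stmt", ["IF", "LPAREN", "bexp", "RPAREN", "stmt"]),
    2: ("stmt", ["IF", "LPAREN", "bexp", "RPAREN", "stmt", "ELSE", "stmt"]),
    3: ("stmt", ["ID", "ASSIGN", "bexp", "SEMICOLON"]),
    4: ("bexp", ["TRUE"]),
    5: ("bexp", ["FALSE"]),
    6: ("bexp", ["bexp", "OR", "bexp"]),
    7: ("bexp", ["bexp", "AND", "bexp"]),
    8: ("bexp", ["NOT", "bexp"]),
    9: ("bexp", ["LPAREN", "bexp", "RPAREN"]),
}
NONTERMINALS = {"stmt", "bexp"}


def derive(start, rule_numbers):
    """Apply the given rules in order, always to the leftmost non-terminal.

    Returns every intermediate sequence, starting with [start].
    """
    cur_seq = [start]
    steps = [list(cur_seq)]
    for number in rule_numbers:
        lhs, rhs = PRODUCTIONS[number]
        # Find the leftmost non-terminal in the current sequence.
        index = next((i for i, sym in enumerate(cur_seq) if sym in NONTERMINALS), None)
        if index is None:
            raise ValueError("no non-terminal left to rewrite")
        if cur_seq[index] != lhs:
            raise ValueError(f"rule {number} rewrites {lhs}, but found {cur_seq[index]}")
        cur_seq[index:index + 1] = rhs  # replace X with the right side
        steps.append(list(cur_seq))
    return steps


for step in derive("stmt", [3, 6, 4, 5]):
    print(" ".join(step))
# stmt
# ID ASSIGN bexp SEMICOLON
# ID ASSIGN bexp OR bexp SEMICOLON
# ID ASSIGN TRUE OR bexp SEMICOLON
# ID ASSIGN TRUE OR FALSE SEMICOLON
```

The last line contains only terminals, so ID ASSIGN TRUE OR FALSE SEMICOLON is one string in the language.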

Try It Again
exp → exp PLUS term
exp → exp MINUS term
exp → term
term → term TIMES factor
term → term DIVIDE factor
term → factor
factor → LPAREN exp RPAREN
factor → ID
What is the language?

Leftmost and Rightmost Derivations
• A derivation is a leftmost derivation if it is always the leftmost nonterminal that is chosen to be replaced.
• It is a rightmost derivation if it is always the rightmost one.

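Derivation Notation
• E => a (E derives a in exactly one step)
• E =>* a (E derives a in zero or more steps)
• E =>+ a (E derives a in one or more steps)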

Parse Trees
Start with the start nonterminal.
Repeat:
    choose a leaf nonterminal X
    choose a production X → alpha
    the symbols in alpha become the children of X in the tree
until there are no more leaf nonterminals left.
The derived string is formed by reading the leaf nodes from left to right.

Ambiguous Grammars
• Consider the grammar
    exp → exp PLUS exp
    exp → exp MINUS exp
    exp → exp TIMES exp
    exp → exp DIVIDE exp
    exp → INT_LIT
• Construct a parse tree for 3-4/2
• Is there more than one parse tree?
• If there is more than one parse tree for a string, then the grammar is ambiguous
• Ambiguity causes problems with parsing (what is the correct structure?)
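The question on this slide can also be checked mechanically by counting parse trees. The sketch below is an illustration, not an algorithm from the course: `count_parse_trees` is a made-up helper that tries every way of splitting the token sequence among the symbols on a production's right side. It assumes the grammar has no ε-productions and no cycles of single-symbol rules (true for this grammar), otherwise the recursion would not terminate.

```python
from functools import lru_cache

# The ambiguous grammar from the slide: non-terminal -> list of right sides.
GRAMMAR = {
    "exp": [
        ("exp", "PLUS", "exp"),
        ("exp", "MINUS", "exp"),
        ("exp", "TIMES", "exp"),
        ("exp", "DIVIDE", "exp"),
        ("INT_LIT",),
    ],
}


def count_parse_trees(grammar, start, tokens):
    """Count the distinct parse trees that derive tokens from start."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count_symbol(symbol, i, j):
        # Number of trees in which symbol derives tokens[i:j].
        if symbol not in grammar:  # terminal: must match exactly one token
            return 1 if j - i == 1 and tokens[i] == symbol else 0
        return sum(count_sequence(rhs, i, j) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def count_sequence(symbols, i, j):
        # Number of ways the symbol sequence derives tokens[i:j].
        if not symbols:
            return 1 if i == j else 0
        first, rest = symbols[0], symbols[1:]
        total = 0
        # first takes tokens[i:k]; every remaining symbol needs at least one token.
        for k in range(i + 1, j - len(rest) + 1):
            left = count_symbol(first, i, k)
            if left:
                total += left * count_sequence(rest, k, j)
        return total

    return count_symbol(start, 0, len(tokens))


# 3-4/2 as a token sequence
print(count_parse_trees(GRAMMAR, "exp", ["INT_LIT", "MINUS", "INT_LIT", "DIVIDE", "INT_LIT"]))
# 2
```

The two trees correspond to the groupings 3-(4/2) and (3-4)/2, so the grammar is ambiguous.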

Precedence in Grammars
To write a grammar whose parse trees express precedence correctly, use a different nonterminal for each precedence level. Start by writing a rule for the operator(s) with the lowest precedence ("-" in our case), then write a rule for the operator(s) with the next lowest precedence, etc.:

Precedence in Grammars
exp → exp MINUS exp | term
term → term DIVIDE term | factor
factor → INT_LIT | LPAREN exp RPAREN
• Now try constructing multiple parse trees for 3-4/2
• The grammar is still ambiguous. Look at associativity. Construct 2 parse trees for

Recursion on CFGs
• A grammar is recursive in nonterminal X if X derives a sequence of symbols that includes an X.
• A grammar is left recursive in X if X derives a sequence of symbols that starts with an X.
• A grammar is right recursive in X if X derives a sequence of symbols that ends with an X.
• For left associativity, use left recursion.
• For right associativity, use right recursion.

Ambiguity Removed in CFG
exp → exp MINUS term | term
term → term DIVIDE factor | factor
factor → INT_LIT | LPAREN exp RPAREN
• One level for each order of operation
• Left recursion for left associativity
• Now try constructing 2 parse trees for
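To see the single tree this grammar produces, here is a small hand-written parser sketch. It is not the parsing method of this lecture: a recursive-descent function cannot call itself on a left-recursive rule without looping forever, so each left-recursive rule is handled with a loop instead, which builds the same left-leaning trees. The name `parse_expression` and the use of raw strings such as "-" and "/" as tokens are assumptions for the example.

```python
def parse_expression(tokens):
    """Parse tokens such as ["3", "-", "4", "/", "2"] into a nested-tuple tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def exp():
        # exp -> exp MINUS term | term, handled as: term (MINUS term)*
        tree = term()
        while peek() == "-":
            advance()
            tree = ("-", tree, term())  # old tree becomes the LEFT child
        return tree

    def term():
        # term -> term DIVIDE factor | factor, handled as: factor (DIVIDE factor)*
        tree = factor()
        while peek() == "/":
            advance()
            tree = ("/", tree, factor())
        return tree

    def factor():
        # factor -> INT_LIT | LPAREN exp RPAREN
        token = peek()
        if token is None:
            raise ValueError("unexpected end of input")
        if token == "(":
            advance()
            tree = exp()
            if peek() != ")":
                raise ValueError("expected ')'")
            advance()
            return tree
        if token.isdigit():
            advance()
            return int(token)
        raise ValueError(f"unexpected token {token!r}")

    tree = exp()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return tree


print(parse_expression(["3", "-", "4", "/", "2"]))  # ('-', 3, ('/', 4, 2))
print(parse_expression(["8", "-", "3", "-", "2"]))  # ('-', ('-', 8, 3), 2)
```

The first result shows precedence (division groups before subtraction); the second shows left associativity (the earlier subtraction sits deeper in the tree).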

You Try It
• Construct a grammar for arithmetic expressions with addition, multiplication, exponentiation (right assoc.), subtraction, division, and unary negative.
exp → exp PLUS exp
    | exp MINUS exp
    | exp TIMES exp
    | exp DIVIDE exp
    | exp POW exp
    | MINUS exp
    | LPAREN exp RPAREN
    | INT_LIT

Solution
exp → exp PLUS term | exp MINUS term | term
term → term TIMES factor | term DIVIDE factor | factor
factor → exponent POW factor | exponent
exponent → MINUS exponent | final
final → INT_LIT | LPAREN exp RPAREN

You Try It
• Write a grammar for the language of boolean expressions, with two possible operands (true and false) and three possible operators (and, or, not). Add nonterminals so that or has the lowest precedence, then and, then not. Finally, change the grammar to reflect the fact that both and and or are left associative.
bexp → TRUE
bexp → FALSE
bexp → bexp OR bexp
bexp → bexp AND bexp
bexp → NOT bexp
bexp → LPAREN bexp RPAREN

List Grammars
• Several types of lists can be created using CFGs
• One or more x's (without any separator or terminator):
    1. xList → X | xList xList
    2. xList → X | xList X
    3. xList → X | X xList
• One or more x's, separated by commas:
    1. xList → X | xList COMMA xList
    2. xList → X | xList COMMA X
    3. xList → X | X COMMA xList

List Grammars
• One or more x's, each x terminated by a semi-colon (You Try It):
    1. xList → X SEMICOLON | xList xList
    2. xList → X SEMICOLON | xList X SEMICOLON
    3. xList → X SEMICOLON | X SEMICOLON xList
• Zero or more x's (without any separator or terminator):
    1. xList → ε | X | xList xList
    2. xList → ε | X | xList X
    3. xList → ε | X | X xList

List Grammars
• Zero or more x's, each terminated by a semi-colon:
    1. xList → ε | X SEMICOLON | xList xList
    2. xList → ε | X SEMICOLON | xList X SEMICOLON
    3. xList → ε | X SEMICOLON | X SEMICOLON xList
• Zero or more x's, separated by commas (You Try It):
    1. xList → ε | nonEmptyXList
       nonEmptyXList → X | X COMMA nonEmptyXList
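The comma-separated case needs the extra nonterminal because simply adding ε to the earlier rule (xList → ε | X | X COMMA xList) would also derive X COMMA, a list with a trailing comma. A small recognizer written directly from the two-nonterminal grammar makes this easy to check. The function names below are made up for the sketch, and tokens are plain strings.

```python
def matches_x_list(tokens):
    """xList -> ε | nonEmptyXList"""
    if not tokens:
        return True  # the ε alternative
    return matches_non_empty_x_list(tokens, 0)


def matches_non_empty_x_list(tokens, pos):
    """nonEmptyXList -> X | X COMMA nonEmptyXList, starting at tokens[pos]."""
    if pos >= len(tokens) or tokens[pos] != "X":
        return False
    if pos + 1 == len(tokens):
        return True  # a single X ends the list
    # Otherwise a COMMA must follow, and after it another non-empty list.
    return tokens[pos + 1] == "COMMA" and matches_non_empty_x_list(tokens, pos + 2)


for sample in ([], ["X"], ["X", "COMMA", "X"], ["X", "COMMA"], ["X", "X"]):
    print(sample, matches_x_list(sample))
```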

CFGs for Whole Languages
• To write a grammar for a whole programming language, break down the problem into pieces. For example, think about a Java program: a program consists of one or more classes:
    program → classlist
    classlist → class | class classlist

CFGs for Whole Languages
• A class is the word "class", optionally preceded by the word "public", followed by an identifier, followed by an open curly brace, followed by the class body, followed by a closing curly brace:
    class → PUBLIC CLASS ID LCURLY classbody RCURLY
          | CLASS ID LCURLY classbody RCURLY

CFGs for Whole Languages
• A class body is a list of zero or more field and/or method definitions:
    classbody → ε | deflist
    deflist → def | def deflist
And so on…