Compiler Design 4. Language Grammars


Kanat Bolazar, January 28, 2010

Introduction to Parsing: Language Grammars
Programming language grammars are usually written as some variation of context-free grammars (CFGs). The notation used is often BNF (Backus-Naur form):
    <block> -> { <statementlist> }
    <statementlist> -> <statement> ; <statementlist>
    <statement> -> <assignment> ;
                 | if ( <expr> ) <block> else <block>
                 | while ( <expr> ) <block>
                 ...

Example Grammar: Language 0+0
A language that we'll call "Language 0+0":
    E -> E + E | 0
Equivalently:
    E -> E + E
    E -> 0
Note that if there are multiple rules for the same left-hand side, they are alternatives.
This language only contains sentences of the form:
    0
    0+0
    0+0+0
    0+0+0+0
    ...
Derivation for 0+0+0:
    E -> E + E -> E + E + E -> 0 + 0 + 0
Note: This grammar is ambiguous. In the second step, did we expand the first or the second E to E + E? Both paths work, and they give different parse trees.
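
The ambiguity can be made concrete by counting parse trees. The following is a minimal sketch, assuming the grammar E -> E + E | 0 from this slide; the function name count_parse_trees is hypothetical and not part of the original material.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_parse_trees(zeros: int) -> int:
    """Count distinct parse trees for 0+0+...+0 with `zeros` zeros,
    under the grammar E -> E + E | 0 (hypothetical helper)."""
    if zeros < 1:
        raise ValueError("a sentence needs at least one 0")
    if zeros == 1:
        return 1  # only E -> 0 applies
    # E -> E + E: the top-level '+' splits the zeros into a left and a right part.
    return sum(
        count_parse_trees(left) * count_parse_trees(zeros - left)
        for left in range(1, zeros)
    )

for n in range(1, 5):
    print(n, count_parse_trees(n))
```

Under these assumptions, 0+0 has a single parse tree, while 0+0+0 already has two, matching the two choices described in the note above.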

Example Grammar: Arithmetic, Ambiguous
Arithmetic expressions:
    Exp -> num | Exp Op Exp
    Op -> + | - | * | / | %
The "num" here represents a token. What it corresponds to is defined in the lexical analyzer with a regular expression:
    num    [0-9]+
This language allows:
    45
    35 + 257 * 5 - 2
    ...
The grammar as defined here is ambiguous. For 2 + 5 * 7, is the top-level structure
    Exp * 7    or    2 + Exp ?
Depending on the tools you use, you may be able to just define the precedence of operators, or you may have to change the grammar.
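
The lexical side of this slide can be sketched in a few lines. This is an illustration only, assuming the num pattern [0-9]+ from the slide and the five operators of Op; the names TOKEN_RE and tokenize are hypothetical.

```python
import re

# Hypothetical token pattern: the slide's "num [0-9]+" plus the five operators.
TOKEN_RE = re.compile(r"(?P<num>[0-9]+)|(?P<op>[-+*/%])")

def tokenize(text):
    """Return a list of (kind, lexeme) pairs, where kind is "num" or "op"."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # whitespace separates tokens but is not a token itself
            continue
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        # lastgroup is the name of the alternative that matched: "num" or "op".
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

print(tokenize("35 + 257 * 5 - 2"))
```

The parser then works on the token kinds ("num", "op") rather than on raw characters, which is why the grammar can treat num as a single terminal.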

Example Language: Arithmetic, Factored
Arithmetic expressions grammar, factored for operator precedence:
    Exp -> Factor | Factor Addop Exp
    Factor -> num | num Multop Factor
    Addop -> + | -
    Multop -> * | / | %
This language also allows the same sentences:
    45
    35 + 257 * 5 - 2
    ...
This grammar is not ambiguous; it first groups factors. Derivation for 2 + 5 * 7:
    Factor Addop Exp
    num + Exp
    num + Factor
    num + num Multop Factor
    num + num * num
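
One way to see the grouping is a small recursive-descent parser with one function per nonterminal. This is a sketch under stated assumptions, not a definitive implementation: the function names, the tuple-based tree shape, and the simplified tokenizer (which silently skips characters it does not recognize) are all hypothetical.

```python
import re

def tokenize(text):
    # Hypothetical simplified lexer: num tokens and single-character operators.
    # Characters outside these two classes are silently skipped in this sketch.
    return re.findall(r"[0-9]+|[-+*/%]", text)

def parse_factor(tokens, pos):
    """Factor -> num | num Multop Factor"""
    if pos >= len(tokens) or not tokens[pos].isdigit():
        raise SyntaxError(f"expected num at token index {pos}")
    num = tokens[pos]
    pos += 1
    if pos < len(tokens) and tokens[pos] in ("*", "/", "%"):
        op = tokens[pos]
        right, pos = parse_factor(tokens, pos + 1)
        return ("Factor", num, op, right), pos
    return ("Factor", num), pos

def parse_exp(tokens, pos=0):
    """Exp -> Factor | Factor Addop Exp"""
    left, pos = parse_factor(tokens, pos)
    if pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        right, pos = parse_exp(tokens, pos + 1)
        return ("Exp", left, op, right), pos
    return ("Exp", left), pos

def parse(text):
    tokens = tokenize(text)
    tree, pos = parse_exp(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree

print(parse("2 + 5 * 7"))
```

In the resulting tree, 5 * 7 sits inside a single Factor node below the +, so multiplication is grouped before addition without any separate precedence declaration.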

Grammar Definitions
The grammar is a set of rules, sometimes called productions, that construct valid sentences in the language.
Nonterminal symbols represent constructs in the language. These would be the phrases in a natural language.
Terminal symbols are the actual words of the language. These are the tokens produced by the lexical analyzer. In a natural language, these would be the words, symbols, and spaces.
A sentence in the language contains only terminal symbols. Nonterminals are intermediate linguistic constructs that define the structure of a sentence.

Rules, Nonterminal and Terminal Symbols
Arithmetic expressions grammar, using multiplicative factors for operator precedence:
    Exp -> Factor | Factor Addop Exp
    Factor -> num | num Multop Factor
    Addop -> + | -
    Multop -> * | / | %
This grammar has four rules as written here. If we expand each option, we would have 2 + 2 + 2 + 3 = 9 rules.
There are four nonterminals:
    Exp Factor Addop Multop
There are six terminals (tokens):
    num + - * / %

Grammar Definitions: Rules
The production rules are rewrite rules. The basic CFG rule form is:
    X -> Y1 Y2 Y3 … Yn
where X is a nonterminal and the Y's may be nonterminals or terminals.
There is a special nonterminal called the start symbol.
The language is defined to be all the strings that can be generated by starting with the start symbol and repeatedly replacing a nonterminal by the right-hand side (rhs) of one of its rules until there are no more nonterminals.
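
The definition above can be turned into a bounded enumeration. The following is a minimal sketch, assuming the "Language 0+0" grammar E -> E + E | 0 and a grammar with no empty right-hand sides (so a sentential form never shrinks and can be pruned by length); the function name sentences and the dictionary encoding are hypothetical.

```python
from collections import deque

# Hypothetical encoding of "Language 0+0": E -> E + E | 0
GRAMMAR = {"E": [["E", "+", "E"], ["0"]]}
START = "E"

def sentences(grammar, start, max_symbols):
    """Return all sentences of at most max_symbols symbols, shortest first.

    Assumes no rule has an empty right-hand side, so forms longer than
    max_symbols can never lead back to a short enough sentence.
    """
    seen = {(start,)}
    queue = deque([(start,)])
    found = set()
    while queue:
        form = queue.popleft()
        # Find the leftmost nonterminal, if any.
        index = next((i for i, symbol in enumerate(form) if symbol in grammar), None)
        if index is None:
            found.add(" ".join(form))  # only terminals left: this is a sentence
            continue
        # Replace that nonterminal by the rhs of each of its rules.
        for rhs in grammar[form[index]]:
            new_form = form[:index] + tuple(rhs) + form[index + 1:]
            if len(new_form) <= max_symbols and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sorted(found, key=len)

print(sentences(GRAMMAR, START, 5))
```

Always expanding the leftmost nonterminal is enough to reach every sentence of a context-free grammar, since the order in which nonterminals are replaced does not change which strings can be generated.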

Larger Grammar Examples
We'll look at language grammar examples for MicroJava and Decaf.
Note: Decaf extends the standard notation. The very useful { X }, which means X | X, X | X, X, X | ... (a comma-separated list of one or more X), is not standard.

Parse Trees
The derivation of a sentence by the language rules can be used to construct a parse tree.
We expect parse trees to correspond to meaningful semantic phrases of the programming language.
Each node of the parse tree will represent some portion that can be implemented as one section of code.
The nonterminals expanded during the derivation are the trunk and branches of the parse tree.
The terminals at the ends of branches are the leaves of the parse tree.

Parsing
A parser:
    Uses the grammar to check whether a sentence (a program, for us) is in the language or not.
    Gives a syntax error if this is not a proper sentence/program.
    Constructs a parse tree from the derivation of the correct program from the grammar rules.
Top-down parsing:
    Starts with the start symbol and applies rules until it gets the desired input program.
Bottom-up parsing:
    Starts with the input program and applies rules in reverse until it can get back to the start symbol.
    Looks at the left part of the input program to see if it matches the right-hand side (rhs) of a rule.
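
Bottom-up parsing can be sketched as a shift-reduce loop. The following is a minimal illustration, assuming the "Language 0+0" grammar E -> E + E | 0 and a greedy policy that reduces whenever the top of the stack matches a right-hand side; the function name is hypothetical, and this is not a general LR parser.

```python
def shift_reduce_accepts(tokens):
    """Recognize sentences of E -> E + E | 0 by applying rules in reverse."""
    stack = []  # the part of the input read so far, partially reduced
    for token in tokens:
        stack.append(token)  # shift: move the next input token onto the stack
        reduced = True
        while reduced:
            reduced = False
            if stack[-1:] == ["0"]:
                stack[-1:] = ["E"]  # reduce by E -> 0
                reduced = True
            elif stack[-3:] == ["E", "+", "E"]:
                stack[-3:] = ["E"]  # reduce by E -> E + E
                reduced = True
    # Accept only if everything was reduced back to the start symbol.
    return stack == ["E"]

print(shift_reduce_accepts("0 + 0 + 0".split()))
print(shift_reduce_accepts("0 + + 0".split()))
```

Because this sketch reduces as early as possible, it always groups to the left, which is one way of settling the ambiguity of this particular grammar. For grammars where an early reduction can be wrong, a real bottom-up parser needs lookahead or backtracking.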

Parsing Issues
Derivation Paths = Choices
    Naïve top-down and bottom-up parsing may require backtracking to find a correct parse.
    Restrictions on the form of grammar rules can make parsing deterministic.
Ambiguity
    One program may have two different correct derivations (two different parse trees) from the grammar.
    This may be a problem if it implies two different semantic interpretations.
    Famous examples are arithmetic operators and the dangling else problem.

Ambiguity: Dangling Else Problem
Which if does this else associate with?
    if X if Y find() else getConfused()
The corresponding ambiguous grammar may be:
    IfSttmt -> if Cond Action
             | if Cond Action else Action
Two derivations at the top (associated with the top "if") are:
    if Cond Action
    if Cond Action else Action
Programming languages often associate else with the inner if.

Resources
Aho, Lam, Sethi, and Ullman, Compilers: Principles, Techniques, and Tools, 2nd ed. Addison-Wesley, 2006.
Compiler Construction Course Notes at Linz: http://www.ssw.uni-linz.ac.at/Misc/CC/
CS 143 Compiler Course at Stanford: http://www.stanford.edu/class/cs143/