Grammars CPSC 5135.



Formal Definitions
A symbol is a character. It represents an abstract entity that has no inherent meaning.
Examples: a, A, 3, *, -, =

Formal Definitions
An alphabet is a finite set of symbols.
Examples: A = { a, b, c }, B = { 0, 1 }

Formal Definitions
A string (or word) is a finite sequence of symbols from a given alphabet.
Examples:
S = { 0, 1 } is an alphabet; 0, 1, 11010, 101, 111 are strings from S.
A = { a, b, c, d } is an alphabet; bad, cab, dab, d, aaaaa are strings from A.

Formal Definitions
A language is a set of strings over an alphabet. The set can be finite or infinite.
Examples, for the alphabet A = { 0, 1 }:
L1 = { 00, 01, 10, 11 }
L2 = { 010, 0110, 01110, 011110, … }

Formal Definitions
A grammar is a quadruple G = (V, Σ, R, S) where
1) V is a finite set of variables (non-terminals),
2) Σ is a finite set of terminals, disjoint from V,
3) R is a finite set of rules. The left side of each rule is a string of one or more elements from V ∪ Σ, and the right side is a string of zero or more elements from V ∪ Σ,
4) S is an element of V and is called the start symbol.

Formal Definitions
Example grammar: G = (V, Σ, R, S)
V = { S, A }
Σ = { a, b }
R = { S → aA, A → bA, A → a }

Derivations
R = { S → aA, A → bA, A → a }
A derivation is a sequence of replacements, beginning with the start symbol, in which a substring matching the left side of a rule is replaced with the string on the right side of that rule.
S ⇒ aA ⇒ abA ⇒ abbA ⇒ abba
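The replacement process can be sketched in a few lines of Python. This is only an illustration: the names RULES, apply_rule, and derive are made up for this example, and rules are stored as plain (left side, right side) string pairs.

```python
# Rules of the example grammar, written as (left side, right side) pairs.
RULES = [("S", "aA"), ("A", "bA"), ("A", "a")]

def apply_rule(form, rule):
    """Replace the leftmost occurrence of the rule's left side in a sentential form."""
    left, right = rule
    if left not in form:
        raise ValueError(f"rule {left} -> {right} does not apply to {form!r}")
    return form.replace(left, right, 1)

def derive(start, rule_indices):
    """Return every sentential form produced by applying the chosen rules in order."""
    forms = [start]
    for i in rule_indices:
        forms.append(apply_rule(forms[-1], RULES[i]))
    return forms

steps = derive("S", [0, 1, 1, 2])
print(" => ".join(steps))  # S => aA => abA => abbA => abba
```

Choosing a different sequence of rule indices gives a different derivation, for example [0, 2] yields the string aa.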

Derivations
What strings can be generated from the following grammar?
S → aBa
B → aBa
B → b

Formal Definitions
The language generated by a grammar is the set of all strings of terminal symbols which are derivable from S in 0 or more steps.
What is the language generated by this grammar?
S → a
S → aB
B → aB
B → a
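One way to explore a question like this is to enumerate the short strings a grammar generates and look for a pattern. The sketch below does a breadth-first search over sentential forms. The function name generate is hypothetical, and the code assumes that uppercase letters are variables, lowercase letters are terminals, and no rule makes a sentential form shorter (so forms longer than the bound can be discarded safely).

```python
from collections import deque

def generate(rules, start, max_len):
    """Collect all terminal strings of length <= max_len derivable from start.

    Assumes uppercase letters are variables, lowercase letters are terminals,
    and no rule shrinks a sentential form (true for the grammars used here).
    """
    seen = {start}
    queue = deque([start])
    language = set()
    while queue:
        form = queue.popleft()
        if not any(ch.isupper() for ch in form):
            language.add(form)  # only terminals left: a string of the language
            continue
        for left, right in rules:
            pos = form.find(left)
            while pos != -1:
                new = form[:pos] + right + form[pos + len(left):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                pos = form.find(left, pos + 1)
    return sorted(language, key=lambda s: (len(s), s))

rules = [("S", "a"), ("S", "aB"), ("B", "aB"), ("B", "a")]
print(generate(rules, "S", 4))  # ['a', 'aa', 'aaa', 'aaaa']
```

The output suggests, without proving it, that this grammar generates every string of one or more a's.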

Kleene Closure
Let Σ be a set of strings. Σ* is called the Kleene closure of Σ and represents the set of all concatenations of 0 or more strings in Σ. Concatenating zero strings gives the empty string ε, so ε is always in Σ*.
Examples:
{ 1 }* = { ε, 1, 11, 111, 1111, … }
{ 01 }* = { ε, 01, 0101, 010101, … }
{ 0, 1 }* = the set of all possible strings of 0’s and 1’s, also written (0 + 1)*, where + means union
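A finite slice of a Kleene closure is easy to compute. In this sketch the function name kleene and the max_parts cutoff are assumptions of the example; the real closure is infinite, so the code only concatenates up to max_parts strings. The empty Python string plays the role of the empty string.

```python
from itertools import product

def kleene(strings, max_parts):
    """Concatenations of 0..max_parts strings drawn from `strings`.

    The full closure is infinite whenever `strings` holds a non-empty string,
    so this sketch stops at max_parts concatenated pieces.
    """
    result = set()
    for n in range(max_parts + 1):
        for parts in product(strings, repeat=n):
            result.add("".join(parts))
    return sorted(result, key=lambda s: (len(s), s))

print(kleene(["1"], 3))       # ['', '1', '11', '111']
print(kleene(["01"], 2))      # ['', '01', '0101']
print(kleene(["0", "1"], 2))  # ['', '0', '1', '00', '01', '10', '11']
```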

Formal Definitions
A grammar G = (V, Σ, R, S) is right-linear if all rules are of the form:
A → xB
A → x
where A, B ∈ V and x ∈ Σ*

Right-linear Grammar
G = (V, Σ, R, S)
V = { S, B }
Σ = { a, b }
R = { S → aS, S → B, B → bB, B → ε }
Here ε denotes the empty string. What language is generated?

Formal Definitions
A grammar G = (V, Σ, R, S) is left-linear if all rules are of the form:
A → Bx
A → x
where A, B ∈ V and x ∈ Σ*
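The two rule shapes can be checked mechanically. The following sketch assumes single-character symbols and uses hypothetical helper names (is_right_linear, is_left_linear); the empty Python string stands for ε. It is applied to the right-linear example grammar with rules S → aS, S → B, B → bB, B → ε.

```python
def is_right_linear(rules, variables):
    """True if every rule is A -> xB or A -> x with x a string of terminals."""
    for left, right in rules:
        if left not in variables:
            return False
        # Drop one trailing variable, if present; the rest must be terminals.
        body = right[:-1] if right and right[-1] in variables else right
        if any(ch in variables for ch in body):
            return False
    return True

def is_left_linear(rules, variables):
    """True if every rule is A -> Bx or A -> x with x a string of terminals."""
    for left, right in rules:
        if left not in variables:
            return False
        # Drop one leading variable, if present; the rest must be terminals.
        body = right[1:] if right and right[0] in variables else right
        if any(ch in variables for ch in body):
            return False
    return True

V = {"S", "B"}
R = [("S", "aS"), ("S", "B"), ("B", "bB"), ("B", "")]  # "" stands for ε
print(is_right_linear(R, V), is_left_linear(R, V))  # True False
```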

Formal Definitions
A regular grammar is one that is either right-linear or left-linear.
Let Q be a finite set and let Σ be a finite set of symbols. Also let δ be a function from Q × Σ to Q, let q0 be a state in Q, and let A be a subset of Q. We call each element of Q a state, δ the transition function, q0 the initial state, and A the set of accepting states. Then a deterministic finite automaton (DFA) is a 5-tuple < Q, Σ, q0, δ, A >.
Every regular grammar is equivalent to a DFA: the language the grammar generates is exactly the language some DFA accepts.
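A DFA given as a 5-tuple can be simulated directly from the definition. The sketch below is an illustration only: the state names q0, q1, and dead, and the function name accepts, are invented for this example. The automaton accepts strings consisting of a's followed by b's, which is the language of the right-linear example grammar S → aS, S → B, B → bB, B → ε.

```python
def accepts(delta, q0, accepting, string):
    """Run the DFA on `string` and report whether it ends in an accepting state."""
    state = q0
    for symbol in string:
        state = delta[(state, symbol)]  # δ is total, so every pair has an entry
    return state in accepting

# Hypothetical DFA for strings of a's followed by b's.
DELTA = {
    ("q0", "a"): "q0", ("q0", "b"): "q1",
    ("q1", "a"): "dead", ("q1", "b"): "q1",
    ("dead", "a"): "dead", ("dead", "b"): "dead",
}
ACCEPTING = {"q0", "q1"}

for s in ["", "aab", "bbb", "aba"]:
    print(repr(s), accepts(DELTA, "q0", ACCEPTING, s))
```

The dead state keeps δ a total function on Q × Σ, as the definition requires. A symbol outside the alphabet raises a KeyError in this sketch.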

Language Definition
Recognition: a machine is constructed that reads a string and pronounces whether the string is in the language or not. (Compiler)
Generation: a device is created to generate strings that belong to the language. (Grammar)

Chomsky Hierarchy
Noam Chomsky (1950’s) described 4 classes of grammars:
1) Type 0: unrestricted grammars
2) Type 1: context-sensitive grammars
3) Type 2: context-free grammars
4) Type 3: regular grammars

Grammars
Context-free and regular grammars have applications in computing.
Context-free grammar: each rule or production has a left side consisting of a single non-terminal.

Backus-Naur form (BNF)
BNF was used to describe programming language syntax and is similar to Chomsky’s context-free grammars.
A meta-language is a language used to describe another language.
BNF is a meta-language for computer languages.

BNF
Consists of nonterminal symbols, terminal symbols (lexemes and tokens), and rules or productions.
<if-stmt> → if <logical-expr> then <stmt>
<if-stmt> → if <logical-expr> then <stmt> else <stmt>
Rules with the same left side can be combined using |:
<if-stmt> → if <logical-expr> then <stmt> | if <logical-expr> then <stmt> else <stmt>

A Small Grammar
<program> → begin <stmt_list> end
<stmt_list> → <stmt> | <stmt> ; <stmt_list>
<stmt> → <var> = <expression>
<var> → A | B | C
<expression> → <var> + <var> | <var> - <var> | <var>

A Derivation
<program> ⇒ begin <stmt_list> end
⇒ begin <stmt> end
⇒ begin <var> = <expression> end
⇒ begin A = <expression> end
⇒ begin A = <var> + <var> end
⇒ begin A = B + <var> end
⇒ begin A = B + C end

Terms
Each of the strings in a derivation is called a sentential form.
If the leftmost non-terminal is always the one selected for replacement, the derivation is a leftmost derivation.
Derivations can be leftmost, rightmost, or neither.
Derivation order has no effect on the language generated by the grammar.
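The difference between leftmost and rightmost derivations can be seen by expanding the small begin/end grammar both ways. In this sketch, GRAMMAR and derive are hypothetical names, and each entry in choices picks which alternative of the selected non-terminal to use (numbered from 0).

```python
GRAMMAR = {
    "<program>": [["begin", "<stmt_list>", "end"]],
    "<stmt_list>": [["<stmt>"], ["<stmt>", ";", "<stmt_list>"]],
    "<stmt>": [["<var>", "=", "<expression>"]],
    "<var>": [["A"], ["B"], ["C"]],
    "<expression>": [["<var>", "+", "<var>"], ["<var>", "-", "<var>"], ["<var>"]],
}

def derive(choices, leftmost=True):
    """Expand the leftmost (or rightmost) non-terminal using the given alternatives."""
    form = ["<program>"]
    steps = [" ".join(form)]
    for choice in choices:
        positions = [i for i, sym in enumerate(form) if sym in GRAMMAR]
        i = positions[0] if leftmost else positions[-1]
        form = form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]
        steps.append(" ".join(form))
    return steps

left = derive([0, 0, 0, 0, 0, 1, 2])
right = derive([0, 0, 0, 0, 2, 1, 0], leftmost=False)
print("\n".join(left))
print()
print("\n".join(right))
```

Both derivations end in begin A = B + C end. Only the order in which the non-terminals are replaced differs.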

Derivations Yield Parse Trees
<program> ⇒ begin <stmt_list> end
⇒ begin <stmt> end
⇒ begin <var> = <expression> end
⇒ begin A = <expression> end
⇒ begin A = <var> + <var> end
⇒ begin A = B + <var> end
⇒ begin A = B + C end

Parse tree (children are indented under their parent):
<program>
  begin
  <stmt_list>
    <stmt>
      <var>
        A
      =
      <expression>
        <var>
          B
        +
        <var>
          C
  end

Parse Trees
Parse trees describe the hierarchical structure of the sentences of the language they define.
A grammar that generates a sentence for which there are two or more distinct parse trees is ambiguous.

An Ambiguous Grammar
<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <id>

Two Parse Trees – Same Sentence
Sentence: A = B + C * A

Tree 1 (+ at the top, * below it):
<assign>
  <id>
    A
  =
  <expr>
    <expr>
      <id>
        B
    +
    <expr>
      <expr>
        <id>
          C
      *
      <expr>
        <id>
          A

Tree 2 (* at the top, + below it):
<assign>
  <id>
    A
  =
  <expr>
    <expr>
      <expr>
        <id>
          B
      +
      <expr>
        <id>
          C
    *
    <expr>
      <id>
        A

Derivation 1
<assign> ⇒ <id> = <expr>
⇒ A = <expr>
⇒ A = <expr> + <expr>
⇒ A = <id> + <expr>
⇒ A = B + <expr>
⇒ A = B + <expr> * <expr>
⇒ A = B + <id> * <expr>
⇒ A = B + C * <expr>
⇒ A = B + C * <id>
⇒ A = B + C * A

Derivation 2
<assign> ⇒ <id> = <expr>
⇒ A = <expr>
⇒ A = <expr> * <expr>
⇒ A = <expr> + <expr> * <expr>
⇒ A = <id> + <expr> * <expr>
⇒ A = B + <expr> * <expr>
⇒ A = B + <id> * <expr>
⇒ A = B + C * <expr>
⇒ A = B + C * <id>
⇒ A = B + C * A
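Ambiguity can also be checked by counting parse trees. The sketch below counts the distinct parse trees of a token sequence under the ambiguous <expr> rules. The function name count_trees is hypothetical, and the code handles only the right-hand side of an assignment, since the <id> = prefix can be parsed in just one way.

```python
from functools import lru_cache

def count_trees(tokens):
    """Count parse trees under <expr> -> <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <id>."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways tokens[i:j] can be derived from <expr>.
        n = 0
        if j - i == 1 and tokens[i] in ("A", "B", "C"):
            n += 1  # <expr> -> <id>
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            n += count(i + 1, j - 1)  # <expr> -> ( <expr> )
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                # <expr> -> <expr> op <expr>, split at the operator in position k
                n += count(i, k) * count(k + 1, j)
        return n

    return count(0, len(tokens))

print(count_trees("B + C * A".split()))  # 2
```

A count above 1 for any sentence shows the grammar is ambiguous. The two trees found here correspond to Derivation 1 and Derivation 2.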

Ambiguity
Parse trees are used to determine the semantics of a sentence.
Ambiguous grammars lead to semantic ambiguity; this is intolerable in a computer language.
Often, ambiguity in a grammar can be removed.

Unambiguous Grammar
<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <expr> + <term> | <term>
<term> → <term> * <factor> | <factor>
<factor> → ( <expr> ) | <id>
This grammar makes multiplication take precedence over addition.

Associativity of Operators
<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <expr> + <term> | <term>
<term> → <term> * <factor> | <factor>
<factor> → ( <expr> ) | <id>

Parse tree for A = B + C + A:
<assign>
  <id>
    A
  =
  <expr>
    <expr>
      <expr>
        <term>
          <factor>
            <id>
              B
      +
      <term>
        <factor>
          <id>
            C
    +
    <term>
      <factor>
        <id>
          A

Addition operators associate from left to right.
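The effect of the <expr>, <term>, <factor> layering can be observed with a small parser that returns nested tuples. This is a sketch, not a standard implementation: parse_expr is a hypothetical name, and because a plain recursive-descent parser cannot call itself on a left-recursive rule, each left-recursive rule is run as a loop that builds the same left-leaning tree.

```python
def parse_expr(tokens):
    """Parse tokens with <expr>/<term>/<factor> and return a nested tuple tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        # <factor> -> ( <expr> ) | <id>
        if peek() == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected )")
            advance()
            return node
        if peek() in ("A", "B", "C"):
            return advance()
        raise SyntaxError(f"unexpected token {peek()!r}")

    def term():
        # <term> -> <term> * <factor> | <factor>, with the left recursion run as a loop
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():
        # <expr> -> <expr> + <term> | <term>, with the left recursion run as a loop
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse_expr("B + C * A".split()))  # ('+', 'B', ('*', 'C', 'A'))
print(parse_expr("B + C + A".split()))  # ('+', ('+', 'B', 'C'), 'A')
```

The first result shows * grouped below +, which is the precedence the grammar encodes. The second shows the left-to-right grouping of repeated additions.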

BNF
A BNF rule that has its left-hand side appearing at the beginning of its right-hand side is left recursive.
Left recursion specifies left associativity.
Right recursion is usually used for associating exponentiation operators:
<factor> → <exp> ** <factor> | <exp>
<exp> → ( <expr> ) | <id>
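A right-recursive rule maps directly onto a recursive function, and the recursion produces right-leaning trees. In the sketch below, parse_factor is a hypothetical name and <exp> is simplified to a bare identifier (the parenthesized alternative is left out).

```python
def parse_factor(tokens, pos=0):
    """<factor> -> <exp> ** <factor> | <exp>, with <exp> limited to identifiers here."""
    if pos >= len(tokens) or tokens[pos] not in ("A", "B", "C"):
        raise SyntaxError("expected identifier")
    base = tokens[pos]
    pos += 1
    if pos < len(tokens) and tokens[pos] == "**":
        right, pos = parse_factor(tokens, pos + 1)  # right recursion
        return ("**", base, right), pos
    return base, pos

tree, _ = parse_factor("A ** B ** C".split())
print(tree)  # ('**', 'A', ('**', 'B', 'C'))

# Python's own ** operator groups the same way.
assert 2 ** 3 ** 2 == 2 ** (3 ** 2) == 512
```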

Ambiguous If Grammar
<stmt> → <if_stmt>
<if_stmt> → if <logic_expr> then <stmt> | if <logic_expr> then <stmt> else <stmt>
Consider the sentential form:
if <logic_expr> then if <logic_expr> then <stmt> else <stmt>

Parse Trees for an If Statement
Tree 1 (else attached to the outer if):
<if_stmt>
  if
  <logic_expr>
  then
  <stmt>
    if
    <logic_expr>
    then
    <stmt>
  else
  <stmt>

Tree 2 (else attached to the inner if):
<if_stmt>
  if
  <logic_expr>
  then
  <stmt>
    if
    <logic_expr>
    then
    <stmt>
    else
    <stmt>

Unambiguous Grammar for If Statements
<stmt> → <matched> | <unmatched>
<matched> → if <logic_expr> then <matched> else <matched> | any non-if statement
<unmatched> → if <logic_expr> then <stmt> | if <logic_expr> then <matched> else <unmatched>
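The grouping this grammar is designed to enforce, each else paired with the nearest unmatched if, can be illustrated with a greedy parser. This is a sketch under simplifying assumptions rather than a direct transcription of the <matched>/<unmatched> rules: parse_stmt is a hypothetical name, a condition is a single token, and any token other than if, then, or else counts as a non-if statement.

```python
def parse_stmt(tokens, pos=0):
    """Parse if/then/else so that each else joins the nearest unmatched if."""
    if pos < len(tokens) and tokens[pos] == "if":
        if pos + 2 >= len(tokens) or tokens[pos + 2] != "then":
            raise SyntaxError("expected: if <cond> then")
        cond = tokens[pos + 1]
        then_part, pos = parse_stmt(tokens, pos + 3)
        # The innermost call sees the else first, so it claims it.
        if pos < len(tokens) and tokens[pos] == "else":
            else_part, pos = parse_stmt(tokens, pos + 1)
            return ("if", cond, then_part, else_part), pos
        return ("if", cond, then_part), pos
    if pos < len(tokens) and tokens[pos] not in ("then", "else"):
        return tokens[pos], pos + 1  # a non-if statement
    raise SyntaxError("expected statement")

tree, _ = parse_stmt("if c1 then if c2 then s1 else s2".split())
print(tree)  # ('if', 'c1', ('if', 'c2', 's1', 's2'))
```

The else branch s2 ends up inside the inner if, so only one of the two trees from the ambiguous grammar is produced.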

Extended BNF (EBNF)
An optional part is denoted by […]:
<selection> → if ( <expr> ) <stmt> [ else <stmt> ]
Braces indicate that the enclosed part can be repeated indefinitely or left out:
<ident_list> → <identifier> { , <identifier> }
Multiple-choice options are put in parentheses and separated by the or operator |:
<for_stmt> → for <var> := <expr> (to | downto) <expr> do <stmt>

BNF vs EBNF for Expressions
BNF:
<expr> → <expr> + <term> | <expr> - <term> | <term>
<term> → <term> * <factor> | <term> / <factor> | <factor>
EBNF:
<expr> → <term> { (+ | - ) <term> }
<term> → <factor> { ( * | / ) <factor> }
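The EBNF form translates almost line for line into code: each { … } becomes a while loop. The evaluator below is a minimal sketch. The name evaluate is hypothetical, and since the slide does not define <factor> here, the code assumes a <factor> is a non-negative integer literal.

```python
def evaluate(tokens):
    """Evaluate tokens using the EBNF rules; <factor> is assumed to be an integer literal."""
    pos = 0

    def factor():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise SyntaxError("expected integer")
        value = int(tokens[pos])
        pos += 1
        return value

    def term():
        # <term> -> <factor> { ( * | / ) <factor> }
        nonlocal pos
        value = factor()
        while pos < len(tokens) and tokens[pos] in ("*", "/"):
            op = tokens[pos]
            pos += 1
            rhs = factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def expr():
        # <expr> -> <term> { ( + | - ) <term> }
        nonlocal pos
        value = term()
        while pos < len(tokens) and tokens[pos] in ("+", "-"):
            op = tokens[pos]
            pos += 1
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    result = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result

print(evaluate("8 - 3 - 2".split()))  # 3
print(evaluate("2 + 3 * 4".split()))  # 14
```

Because each loop folds its operands from left to right, 8 - 3 - 2 is computed as (8 - 3) - 2, matching the left associativity that the left-recursive BNF rules specify.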