Formal Languages and Grammars

Slides:



Advertisements
Similar presentations
Natural Language Processing - Formal Language - (formal) Language (formal) Grammar.
Advertisements

Grammars, Languages and Parse Trees. Language Let V be an alphabet or vocabulary V* is set of all strings over V A language L is a subset of V*, i.e.,
COGN1001: Introduction to Cognitive Science Topics in Computer Science Formal Languages and Models of Computation Qiang HUO Department of Computer.
Chapter Chapter Summary Languages and Grammars Finite-State Machines with Output Finite-State Machines with No Output Language Recognition Turing.
CS5371 Theory of Computation
Languages, grammars, and regular expressions
Normal forms for Context-Free Grammars
Chapter 3: Formal Translation Models
CS 3240 – Chuck Allison.  A model of computation  A very simple, manual computer (we draw pictures!)  Our machines: automata  1) Finite automata (“finite-state.
COP4020 Programming Languages
Languages and Grammars MSU CSE 260. Outline Introduction: E xample Phrase-Structure Grammars: Terminology, Definition, Derivation, Language of a Grammar,
Language Translation Principles Part 1: Language Specification.
Chapter 2 Languages.
1 INFO 2950 Prof. Carla Gomes Module Modeling Computation: Languages and Grammars Rosen, Chapter 12.1.
Formal Grammars Denning, Sections 3.3 to 3.6. Formal Grammar, Defined A formal grammar G is a four-tuple G = (N,T,P,  ), where N is a finite nonempty.
Languages & Strings String Operations Language Definitions.
Modeling Computation Rosen, ch. 12.
Introduction Syntax: form of a sentence (is it valid) Semantics: meaning of a sentence Valid: the frog writes neatly Invalid: swims quickly mathematics.
Lexical Analysis CSE 340 – Principles of Programming Languages Fall 2015 Adam Doupé Arizona State University
::ICS 804:: Theory of Computation - Ibrahim Otieno SCI/ICT Building Rm. G15.
CS/IT 138 THEORY OF COMPUTATION Chapter 1 Introduction to the Theory of Computation.
Context-free Grammars Example : S   Shortened notation : S  aSaS   | aSa | bSb S  bSb Which strings can be generated from S ? [Section 6.1]
Introduction to CS Theory Lecture 3 – Regular Languages Piotr Faliszewski
A sentence (S) is composed of a noun phrase (NP) and a verb phrase (VP). A noun phrase may be composed of a determiner (D/DET) and a noun (N). A noun phrase.
Languages, Grammars, and Regular Expressions Chuck Cusack Based partly on Chapter 11 of “Discrete Mathematics and its Applications,” 5 th edition, by Kenneth.
Grammars CPSC 5135.
The CYK Algorithm Presented by Aalapee Patel Tyler Ondracek CS6800 Spring 2014.
CS 3813: Introduction to Formal Languages and Automata
CSNB143 – Discrete Structure Topic 11 – Language.
Introduction to Language Theory
Lecture # 5 Pumping Lemma & Grammar
Copyright © Curt Hill Languages and Grammars This is not English Class. But there is a resemblance.
1 Module 14 Regular languages –Inductive definitions –Regular expressions syntax semantics.
1 A well-parenthesized string is a string with the same number of (‘s as )’s which has the property that every prefix of the string has at least as many.
1Computer Sciences Department. Book: INTRODUCTION TO THE THEORY OF COMPUTATION, SECOND EDITION, by: MICHAEL SIPSER Reference 3Computer Sciences Department.
CS 3813: Introduction to Formal Languages and Automata
Introduction Finite Automata accept all regular languages and only regular languages Even very simple languages are non regular (  = {a,b}): - {a n b.
Programming Languages and Design Lecture 2 Syntax Specifications of Programming Languages Instructor: Li Ma Department of Computer Science Texas Southern.
Lecture 16: Modeling Computation Languages and Grammars Finite-State Machines with Output Finite-State Machines with No Output.
Discrete Structures ICS252 Chapter 5 Lecture 2. Languages and Grammars prepared By sabiha begum.
Mathematical Foundations of Computer Science Chapter 3: Regular Languages and Regular Grammars.
Chapter 4: Syntax analysis Syntax analysis is done by the parser. –Detects whether the program is written following the grammar rules and reports syntax.
Models of Computation by Dr. Michael P. Frank, University of Florida Modified and extended by Longin Jan Latecki, Temple University Rosen 7 th ed., Ch.
1 A well-parenthesized string is a string with the same number of (‘s as )’s which has the property that every prefix of the string has at least as many.
Mid-Terms Exam Scope and Introduction. Format Grades: 100 points -> 20% in the final grade Multiple Choice Questions –8 questions, 7 points each Short.
Lecture 6: Context-Free Languages
Formal grammars A formal grammar is a system for defining the syntax of a language by specifying sequences of symbols or sentences that are considered.
Week 14 - Friday.  What did we talk about last time?  Simplifying FSAs  Quotient automata.
Lecture 17: Theory of Automata:2014 Context Free Grammars.
Chapter 2. Formal Languages Dept. of Computer Engineering, Hansung University, Sung-Dong Kim.
Formal Languages and Automata FORMAL LANGUAGES FINITE STATE AUTOMATA.
Modeling Arithmetic, Computation, and Languages Mathematical Structures for Computer Science Chapter 8 Copyright © 2006 W.H. Freeman & Co.MSCS SlidesAlgebraic.
PROGRAMMING LANGUAGES
Theory of Computation Lecture #
BCT 2083 DISCRETE STRUCTURE AND APPLICATIONS
Discrete Mathematics and its Applications Rosen 7th ed., Ch. 13.1
Complexity and Computability Theory I
Natural Language Processing - Formal Language -
Context-Free Languages
A HIERARCHY OF FORMAL LANGUAGES AND AUTOMATA
Discrete Mathematics and its Applications Rosen 6th ed., Ch. 12.1
Models of Computation by Dr. Michael P
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
Discrete Mathematics and its Applications Rosen 7th ed., Ch. 13.1
Discrete Mathematics and its Applications Rosen 7th ed., Ch. 13.1
Models of Computation by Dr. Michael P
Discrete Mathematics and its Applications Rosen 8th ed., Ch. 13.1
Formal Languages Context free languages provide a convenient notation for recursive description of languages. The original goal of formalizing the structure.
Models of Computation by Dr. Michael P
Languages and Grammer In TCS
Presentation transcript:

Formal Languages and Grammars Xiaoyin Wang CS 5363 Spring 2016

Last Class Compilers: Introduction Why Compilers? Input and Output Structure of Compilers Compiler Design

Today’s Class Formal Languages 3 What are languages? What are grammars? Phrase-Structure Grammars Classification of Grammars (and corresponding languages) 3

Intro to Languages English grammar tells us if a given combination of words is a valid sentence. The syntax of a sentence concerns its form while the semantics concerns its meaning, e.g. the mouse wrote a poem From a syntax point of view this is a valid sentence. 4 4

Intro to Languages From a semantics point of view not so fast…perhaps in Disney land Natural languages (English, French, etc) have very complex rules of syntax and not necessarily well-defined. Long time no see. Here you are. 5 5

Formal Language Formal language The key problem: well-defined finite set of rules The key problem: How to express infinite sentences with finite set of rules? The answer is recursion Example: Natural Numbers {1, 2, 3, …} 6

Grammars A formal grammar G is compact, precise mathematical definition of a language L. Not a list of examples Finite Fixed Typically use recursion to resolve infinity

Grammars A grammar implies an algorithm that would generate all legal sentences of the language. Often, it takes the form of a set of recursive definitions. An example of recursive definition: <expression> -> <expression> + 1 <expression> -> 1 {1, 1+1, 1+1+1, 1+1+1+1, …}

What is Grammar? Answer two key questions: Is a combination of words a valid sentence in a formal language? How can we generate the valid sentences of a formal language? Grammars provide models for both natural languages and programming languages. 9

Grammars Example: A grammar that generates a subset of the English language

A derivation of “the boy sleeps”:

A derivation of “a dog runs”:

Language of the grammar L = { “a boy runs”, “a boy sleeps”, “the boy runs”, “the boy sleeps”, “a dog runs”, “a dog sleeps”, “the dog runs”, “the dog sleeps” } Problems with the Grammar?

Notation Variable Terminal or Production Symbols of Non-terminal rule the vocabulary Terminal Symbols of the vocabulary Production rule

BCT2083 DISCRETE STRUCTURE & APPLICATIONS Basic Terminology A vocabulary / alphabet, V is a finite non-empty set of elements called symbols. Example: V = {a, b, c, A, B, C, S} A word/sentence over V is a string of finite length of elements of V. Example: Aba The empty/null string, ε is the string with no symbols. CHAPTER 5 16

Basic Terminology V* is the set of all words over V. BCT2083 DISCRETE STRUCTURE & APPLICATIONS Basic Terminology V* is the set of all words over V. Example: V* = {Aba, BBa, bAA, cab …} A language over V is a subset of V*. We can give some criteria for a word to be in a language. CHAPTER 5 17

Phrase-Structure Grammars A popular way to specify a grammar recursively A phrase-structure grammar (abbr. PSG) G = (V,T,S,P) is a 4-tuple, in which: V is a vocabulary (set of symbols) The “template vocabulary” of the language. T V is a set of symbols called terminals Actual symbols of the language. Also, N :≡ V − T is a set of special “symbols” called non-terminals. (Representing concepts like “noun”)

Phrase-Structure Grammars SN is a special non-terminal, the start symbol. in our example the start symbol was “sentence”. P is a set of productions (to be defined). Rules for substituting one sentence fragment for another Every production rule must contain at least one non-terminal on its left side.

Productions A production pP is a pair p=(b,a) of sentence fragments a, b We often denote the production as b → a. Read “replace b by a” Call b the “before” string, a the “after” string. b must contain at least 1 non-terminal

Productions Examples S -> b S -> ε A -> B S -> AB S -> aSb AS -> BB aSb -> AB AaBb -> CD …

Derivation Let G=(V,T,S,P) be a phrase-structure grammar Let w0=lz0r (the concatenation of l, z0, and r) and w1=lz1r be strings over V If z0  z1 is a production of G we say that w1 is directly derivable from w0 and we write wo => w1

Derivation If w0, w1, …., wn are strings over V such that w0 =>w1,w1=>w2,…, wn-1 => wn, then we say that wn is derivable from w0, and write w0=>*wn. The sequence of steps used to obtain wn from wo is called a derivation.

Examples Examples Productions: S -> A, A -> aB, B -> b Derivations: S => A => aB => ab S =>* ab

Languages from PSGs The recursive definition of the language L defined by the PSG: G = (V, T, S, P): Rule 1: S  LT (LT is L’s template language) The start symbol is a sentence template (member of LT).

Languages from PSGs Rule 2: (b→a)P: l,rV*: lbr  LT → lar  LT Abbreviate this using lbr  lar. (read, “lar is directly derivable from lbr”). Rule 2: (b→a)P: l,rV*: lbr  LT → lar  LT Any production, after substituting in any fragment of any sentence template, yields another sentence template. Rule 3: (σ  LT: ¬nN: nσ) → σL All sentence templates that contain no non- terminal symbols are sentences in L.

Example Let G = (V, T, S, P) Process: V = {a, b, A, S} T = {a, b} BCT2083 DISCRETE STRUCTURE & APPLICATIONS Example Let G = (V, T, S, P) V = {a, b, A, S} T = {a, b} S is a start symbol P = {S → aA, A → aa, A → b} Process: L(G)T = {S} L(G)T = {S, aA} L(G)T = {S, aA, aaa, ab} L(G) = {aaa, ab} CHAPTER 5 27

Language Let G(V,T,S,P) be a phrase-structure grammar. The language generated by G (or the language of G) denoted by L(G) , is the set of all strings of terminals that are derivable from the starting state S. L(G)= {w  T* | S =>*w} 28

What sentences can be generated with this grammar? BCT2083 DISCRETE STRUCTURE & APPLICATIONS Languages of PSG EXAMPLE: Let G = (V, T, S, P), where V = {a, b, A, B, S} T = {a, b}, S is a start symbol P = {S → Ab | aB, A → aS, B → Sb, A → a, B → b}. G is a Phrase-Structure Grammar. What sentences can be generated with this grammar? CHAPTER 5 29

Languages of PSG Language of the grammar: How to prove? P = {S → Ab | aB, A → aS, B → Sb, A → a, B → b}. Language of the grammar: S => Ab => ab S => Ab => aSb => aAbb => aabb … S => aB => aSb => aaBb => aabb L = {anbn, n>=1} How to prove? Mathematical Induction

Languages of PSG P = {S → Ab | aB, A → aS, B → Sb, A → a, B → b} L1 = {anbn, n>=1} (1) Proving L1 L(P)

Languages of PSG P = {S → Ab | aB, A → aS, B → Sb, A → a, B → b} L1 = {anbn, n>=1} (2) Proving L(P) L1 Using derivation steps as N Within x+2 steps, enumerating all productions, we can only derive: awb, w belongs to L1, so awb also belongs to L1

Another PSG Example – English Fragment We have G = (V, T, S, P), where: V = {(sentence), (noun phrase), (verb phrase), (article), (adjective), (noun), (verb), (adverb), a, the, large, hungry, rabbit, mathematician, eats, hops, quickly, wildly} T = {a, the, large, hungry, rabbit, mathematician, eats, hops, quickly, wildly} S = (sentence) P = (see next slide)

Productions for our Language P = { (sentence) → (noun phrase) (verb phrase) (noun phrase) → (article) (adjective) (noun) (noun phrase) → (article) (noun) (verb phrase) → (verb) (adverb) (verb phrase) → (verb) (article) → a (article) → the (adjective) → large (adjective) → hungry (noun) → rabbit (noun) → mathematician (verb) → eats (verb) → hops (adverb) → quickly (adverb) → wildly }

A Sample Sentence Derivation On each step, we apply a production to a fragment of the previous sentence template to get a new sentence template. Finally, we end up with a sequence of terminals (real words), that is, a sentence of our language L. (sentence) (noun phrase) (verb phrase) (article) (adj.) (noun) (verb phrase) (art.) (adj.) (noun) (verb) (adverb) the (adj.) (noun) (verb) (adverb) the large (noun) (verb) (adverb) the large rabbit (verb) (adverb) the large rabbit hops (adverb) the large rabbit hops quickly

Another Example V T Let G = ({a, b, A, B, S}, {a, b}, S, {S → ABa, A → BB, B → ab, AB → b}). One possible derivation in this grammar is: S  ABa  Aaba  BBaba  Bababa  abababa. P

Defining the PSG Types Type 0: Phase-structure grammars – no restrictions on the production rules Type 1: Context-Sensitive PSG: All after fragments are either longer than the corresponding before fragments, or empty: if b → a, then |b| < |a|  a = ε .

Defining the PSG Types Type 2: Context-Free PSG: Type 3: Regular PSGs: All before fragments have length 1 and are non-terminals: if b → a, then |b| = 1 (b  N). Type 3: Regular PSGs: All before fragments have length 1 and nonterminals All after fragments are either single terminals, or a pair of a terminal followed by a nonterminal. if b → a, then a  T  a  TN.

Types of Grammars - Chomsky hierarchy of languages Venn Diagram of Grammar Types: Type 0 – Phrase-structure Grammars Type 1 – Context-Sensitive Type 2 – Context-Free Type 3 – Regular

Examples: Regular Grammars Only the last symbol at the right hand side can be non-terminal E.g. P = {S → aB, B → bB, B → a} Generates the language ab*a The restriction can be either on the right or left E.g., P = {S → Ba, B → Bb, B → a} Also called linear grammars Why regular language is not suitable for programming language?

Examples: Context Free Grammars Only one non-terminal on the left E.g. P = {S → aSa, S → b} Generates the language anban, n>=1 Widely used as programming languages Example of non-context-free language? {anbncn, n > 1}

Examples: Context Sensitive Grammars Multiple non-terminal on the left P = {S → aSBc | abc, cB → Bc, bB → bb} Generates the language anbncn, n>=1 Exemplar Process: S => aSBc => aabcBc => aabBcc => aabbcc How to generate anbncndn, n>=1? P = {S → aBSCd | abcd, Ba → aB, dC →Cd, cC → cc, Bb → bb }

Examples: Context Sensitive Grammars For the English Grammar An apple and A book <noun_phrase> → <article> <noun> <noun> → <vowel_noun> | <other_noun> <article> <vowel_noun> → an <vowel_noun> <article> <other_noun> → a <vowel_noun>

Unrestrictive grammars No restrictions on the productions Belonging-ship can be undecidable Language acceptable by Turing machine Cannot be easily described in simple mathematical formula

Property of Grammars: Regular Language Closure Properties Intersection Union Complement Concatenation Kleene Star Decidable Properties Membership Emptiness Containment

Property of Grammars: Context Free Language Closure Properties Union Concatenation Kleene Star Decidable Properties Membership (cubic to sentence length) Emptiness (linear to # productions)

Property of Grammars: Context Sensitive Language Closure Properties Union Intersection Complement Concatenation Kleene Star Decidable Properties Membership (P-Space complete)

Today’s Class Formal Languages 48 What are languages? What are grammars? Phrase-Structure Grammars Classification of Grammars (and corresponding languages) 48

Next Class Context Free Grammar 49 Grammar Norms CYK Algorithm Derivation Tree and Ambiguity 49