Introduction to Language Theory

Introduction to Language Theory Programming Language Translators Prepared by Manuel E. Bermúdez, Ph.D. Associate Professor University of Florida

Introduction to Language Theory Definition: An alphabet (or vocabulary) Σ is a finite set of symbols. Example: Alphabet of Pascal: + - * / < … (operators) begin end if var (keywords) <identifier> (identifiers) <string> (strings) <integer> (integers) ; : , ( ) [ ] (punctuators) Note: All identifiers are represented by one symbol, because Σ must be finite.

Introduction to Language Theory Definition: A sequence t = t_1 t_2 … t_n of symbols from an alphabet Σ is a string. Definition: The length of a string t = t_1 t_2 … t_n (denoted |t|) is n. If n = 0, the string is ε, the empty string. Definition: Given strings s = s_1 s_2 … s_n and t = t_1 t_2 … t_m, the concatenation of s and t, denoted st, is the string s_1 s_2 … s_n t_1 t_2 … t_m.

Introduction to Language Theory Note: εu = u = uε, uεv = uv, for any strings u,v (including ε) Definition: Σ* is the set of all strings of symbols from Σ. Note: Σ* is called the reflexive, transitive closure of Σ. Σ* is described by the graph (Σ*, ·), where “·” denotes concatenation, and there is a designated “start” node, ε.

Introduction to Language Theory Example: Σ = {a, b}. Σ* is countably infinite, so we cannot compute all of Σ*; we can only compute finite subsets of Σ*. We can, however, compute whether a given string is in Σ*. [Figure: the graph (Σ*, ·), rooted at ε, with edges labeled a and b; the nodes shown are ε, a, b, aa, ab, ba, bb, aba, abb.]
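
The point about computing only finite subsets of Σ* can be illustrated with a small Python sketch. It assumes each symbol is a single character; the names sigma_star and in_sigma_star are illustrative and do not come from the slides.

```python
from itertools import count, islice, product


def sigma_star(alphabet):
    """Lazily yield every string over `alphabet`, shortest first ('' plays the role of ε)."""
    symbols = sorted(alphabet)
    for n in count(0):
        for combo in product(symbols, repeat=n):
            yield "".join(combo)


def in_sigma_star(s, alphabet):
    """Membership in Σ* is decidable: every character must be a symbol of the alphabet."""
    return all(ch in alphabet for ch in s)


# Only a finite prefix of the infinite enumeration can ever be materialized.
first_seven = list(islice(sigma_star({"a", "b"}), 7))
print(first_seven)
```

The generator never terminates on its own, which mirrors the slide: the whole set cannot be computed, but any finite part of it can, and so can the membership test.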

Introduction to Language Theory Example: Σ = Pascal vocabulary. Σ* = all possible alleged Pascal programs, i.e. all possible inputs to a Pascal compiler. We need to specify L ⊆ Σ*, the correct Pascal programs. Definition: A language L over an alphabet Σ is a subset of Σ*.

Introduction to Language Theory Example: Σ = {a, b}.
L1 = ø is a language
L2 = {ε} is a language
L3 = {a} is a language
L4 = {a, ba, bbab} is a language
L5 = {a^n b^n | n ≥ 0} is a language, where a^n = aa…a (n times)
L6 = {a, aa, aaa, …} is a language
Note: L5 is an infinite language, but described finitely.

Introduction to Language Theory THIS IS THE MAIN GOAL OF LANGUAGE SPECIFICATION : To describe (infinite) programming languages finitely, and to provide corresponding finite inclusion-test algorithms.
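
As a minimal sketch of a finite inclusion test for an infinite language, the following Python function decides membership in L5 = {a^n b^n | n ≥ 0}. The function name in_L5 is illustrative only.

```python
def in_L5(s):
    """Return True iff s = a^n b^n for some n >= 0."""
    n, remainder = divmod(len(s), 2)
    # The length must be even, and the string must be n a's followed by n b's.
    return remainder == 0 and s == "a" * n + "b" * n


print(in_L5(""), in_L5("aabb"), in_L5("abab"))
```

The test runs in time proportional to the length of the input, even though L5 itself has infinitely many members.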

Language Constructors Definition: The catenation (or product) of two languages L1 and L2, denoted L1L2, is the set {uv | u ∈ L1, v ∈ L2}. Example: L1 = {ε, a, bb}, L2 = {ac, c}. L1L2 = {ac, c, aac, ac, bbac, bbc} = {ac, c, aac, bbac, bbc}
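
For finite languages, the catenation can be computed directly with a set comprehension. This is a minimal sketch that represents a language as a Python set of strings and ε as the empty string; the name catenate is an assumption made for illustration.

```python
def catenate(lang1, lang2):
    """Return {uv | u in lang1, v in lang2}; duplicates collapse because the result is a set."""
    return {u + v for u in lang1 for v in lang2}


L1 = {"", "a", "bb"}   # "" stands for ε
L2 = {"ac", "c"}
print(sorted(catenate(L1, L2)))
```

The string ac arises twice (as ε·ac and as a·c), and the set representation keeps it once, as in the example above.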

Language Constructors Definition: L^n = LL…L (n times), and L^0 = {ε}. Example: L = {a, bb}. L^3 = {aaa, aabb, abba, abbbb, bbaa, bbabb, bbbba, bbbbbb}
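
The n-fold product can be sketched in Python by starting from {ε} and catenating with L exactly n times. Languages are modeled as sets of strings, and the name power is illustrative.

```python
def power(lang, n):
    """Return the n-fold product of lang with itself; power(lang, 0) is {''}, i.e. {ε}."""
    result = {""}
    for _ in range(n):
        result = {u + v for u in result for v in lang}
    return result


print(sorted(power({"a", "bb"}, 3)))
```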

Language Constructors Definition: The union of two languages L1 and L2 is the set L1 ∪ L2 = {u | u ∈ L1} ∪ {v | v ∈ L2}. Definition: The Kleene star (L*) of a language L is the set L* = ∪ L^n, n ≥ 0. Example: L = {a, bb}. L* = {any string composed of a’s and bb’s}. Definition: The Transitive Closure (L+) of a language L is the set L+ = ∪ L^n, n ≥ 1.

Language Constructors Note: In general, L* = L+ ∪ {ε}, but L+ ≠ L* – {ε}. For example, consider L = {ε}. Then {ε} = L+ ≠ L* – {ε} = {ε} – {ε} = ø.
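
Because L* and L+ are usually infinite, a program can only list their members up to some length bound. The sketch below does that and then checks the counterexample L = {ε}. The names plus_upto and star_upto and the length bound max_len are assumptions made for illustration.

```python
def plus_upto(lang, max_len):
    """Members of L+ (union of L^n for n >= 1) whose length is at most max_len."""
    result = set()
    frontier = {w for w in lang if len(w) <= max_len}
    while frontier:
        result |= frontier
        # Extend each newly found string by one more factor from lang.
        frontier = {u + v for u in frontier for v in lang
                    if len(u + v) <= max_len} - result
    return result


def star_upto(lang, max_len):
    """Members of L* = L+ ∪ {ε} whose length is at most max_len."""
    return plus_upto(lang, max_len) | {""}


L = {"a", "bb"}
print(sorted(star_upto(L, 3), key=lambda w: (len(w), w)))

# The counterexample from the note: for L = {ε}, L+ = {ε} but L* - {ε} is empty.
E = {""}
print(plus_upto(E, 3), star_upto(E, 3) - {""})
```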

Grammars Goal: Providing a means for describing languages finitely. Method: Provide a subgraph (Σ*, →*) of (Σ*, ·), and a start node S, such that the nodes reachable from S are exactly the strings in the language.

Grammars Example: Σ = {a, b}, L = {a^n b^n | n ≥ 0}. [Figure: a fragment of the graph over strings of a’s and b’s, with edges labeled a and b.]

Grammars “=>” (derives) is a relation defined by a finite set of rewrite rules known as productions. Definition: Given a vocabulary V, a production is a pair (u, v) ∈ V* × V*, denoted u → v. u is called the left-part; v is called the right-part.

Grammars Example: Pseudo-English. V = {Sentence, NP, VP, Adj, N, V, boy, girl, the, tall, jealous, hit, bit}
Sentence → NP VP (one production)
NP → N
NP → Adj NP
N → boy
N → girl
Adj → the
Adj → tall
Adj → jealous
VP → V NP
V → hit
V → bit
Note: English is much too complicated to be described this way.

Grammars Definition: Given a finite set of productions P ⊆ V* × V*, the relation => is defined such that for all α, β, u, v ∈ V*, αuβ => αvβ iff u → v ∈ P is a production. Example:
Sentence → NP VP
NP → N
NP → Adj NP
N → boy
N → girl
Adj → the
Adj → tall
Adj → jealous
VP → V NP
V → hit
V → bit

Grammars Sentence => NP VP => Adj NP VP => the NP VP => the Adj NP VP => the jealous NP VP => the jealous N VP => the jealous girl VP => the jealous girl V NP => the jealous girl hit NP => the jealous girl hit Adj NP => the jealous girl hit the NP => the jealous girl hit the N => the jealous girl hit the boy
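
The derives relation can be sketched in Python as a function that returns every string reachable in one step, which then lets us check the Pseudo-English derivation step by step. Strings over V are modeled as tuples of symbols, productions are written with an ASCII arrow, and the names RULES, PRODUCTIONS and derive_step are assumptions made for illustration.

```python
RULES = """
Sentence -> NP VP
NP -> N
NP -> Adj NP
N -> boy
N -> girl
Adj -> the
Adj -> tall
Adj -> jealous
VP -> V NP
V -> hit
V -> bit
"""

# Each production is a pair (left part, right part) of symbol tuples.
PRODUCTIONS = [
    (tuple(left.split()), tuple(right.split()))
    for left, right in (line.split("->") for line in RULES.strip().splitlines())
]


def derive_step(form, productions):
    """Return every form obtained by rewriting one occurrence of a left part."""
    results = set()
    for left, right in productions:
        k = len(left)
        for i in range(len(form) - k + 1):
            if form[i:i + k] == left:
                results.add(form[:i] + right + form[i + k:])
    return results


derivation = [
    "Sentence", "NP VP", "Adj NP VP", "the NP VP", "the Adj NP VP",
    "the jealous NP VP", "the jealous N VP", "the jealous girl VP",
    "the jealous girl V NP", "the jealous girl hit NP",
    "the jealous girl hit Adj NP", "the jealous girl hit the NP",
    "the jealous girl hit the N", "the jealous girl hit the boy",
]
forms = [tuple(step.split()) for step in derivation]
# Every consecutive pair must be related by a single => step.
valid = all(b in derive_step(a, PRODUCTIONS) for a, b in zip(forms, forms[1:]))
print(valid)
```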

Grammars Definition: A grammar is a 4-tuple G = (Φ, Σ, P, S) where Φ is a finite set of nonterminals, Σ is a finite set of terminals, V = Φ ∪ Σ is the grammar’s vocabulary, S ∈ Φ is called the start or goal symbol, and P ⊆ V* × V* is a finite set of productions. Example: Grammar for {a^n b^n | n ≥ 0}. G = (Φ, Σ, P, S), where Φ = {S}, Σ = {a, b}, and P = {S → aSb, S → ε}

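Grammars Derivations: S => aSb => aaSbb => aaaSbbb => aaaaSbbbb => …
Applying S → ε to each of these forms yields a terminal string: S => ε, aSb => ab, aaSbb => aabb, aaaSbbb => aaabbb, aaaaSbbbb => aaaabbbb.
Note: Normally, grammars are given by simply listing the productions.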

Grammar Conventions TWS convention:
Upper case letter (identifier): nonterminal
Lower case letter (string): terminal
Lower case Greek letter: strings in V*
Left part of the first production is assumed to be the start symbol, e.g.
S → aSb
S → ε
Left part omitted if same as for preceding production, e.g.
S → aSb
→ ε
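
The two shorthand rules (start symbol taken from the first production, omitted left part inherited from the preceding one) can be sketched as a small Python parser. The accepted input syntax (one production per line, with either → or an ASCII -> as the arrow) and the name parse_productions are assumptions made for illustration, not part of the TWS tools.

```python
def parse_productions(text):
    """Parse one production per line; an empty left part repeats the previous one."""
    productions = []
    left = None
    for raw in text.splitlines():
        line = raw.strip().replace("→", "->")
        if not line:
            continue
        head, arrow, body = line.partition("->")
        if not arrow:
            raise ValueError(f"missing arrow in {raw!r}")
        if head.strip():
            left = head.strip()
        if left is None:
            raise ValueError("the first production needs a left part")
        body = body.strip()
        # ε is stored as the empty string.
        productions.append((left, "" if body == "ε" else body))
    return productions


rules = parse_productions("S -> aSb\n  -> ε")
start_symbol = rules[0][0]  # left part of the first production
print(rules, start_symbol)
```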

Grammars Example: Grammar for identifiers.
Identifier → Letter → Identifier Digit
Letter → ‘a’ → ‘A’ → ‘b’ → ‘B’ … → ‘z’ → ‘Z’
Digit → ‘0’ → ‘1’ … → ‘9’

Grammars Definition: The language generated by a grammar G is the set L(G) = {α ∈ Σ* | S =>* α}. Definition: A sentential form generated by a grammar G is any string α such that S =>* α. Definition: A sentence generated by a grammar G is any sentential form α such that α ∈ Σ*.

Grammars Example:
Sentential forms: S => aSb => aaSbb => aaaSbbb => aaaaSbbbb => …
Sentences: ε, ab, aabb, aaabbb, aaaabbbb, …, each obtained from the sentential form above it by one more step (S => ε, aSb => ab, and so on).
Lemma: L(G) = {α | α is a sentence}. Proof: Trivial.
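
The distinction between sentential forms and sentences can be made concrete with a bounded search in Python. This sketch assumes single-character symbols, with S as the only nonterminal, and explores derivations of at most max_steps steps; the names derive_step and reachable are illustrative.

```python
def derive_step(form, productions):
    """All forms obtained from `form` by rewriting one occurrence of a left part."""
    out = set()
    for left, right in productions:
        i = form.find(left)
        while i != -1:
            out.add(form[:i] + right + form[i + len(left):])
            i = form.find(left, i + 1)
    return out


def reachable(start, productions, max_steps):
    """Sentential forms derivable from `start` in at most max_steps steps."""
    seen = {start}
    frontier = {start}
    for _ in range(max_steps):
        frontier = {g for f in frontier for g in derive_step(f, productions)} - seen
        seen |= frontier
    return seen


P = [("S", "aSb"), ("S", "")]          # S → aSb, S → ε
forms = reachable("S", P, 4)            # sentential forms
sentences = {f for f in forms if "S" not in f}   # those containing terminals only
print(sorted(forms, key=len))
print(sorted(sentences, key=len))
```

Every sentence is a sentential form, but not the other way around: forms that still contain S are not in L(G).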

Grammars Example:
A → aABC
→ aBC
aB → ab
bB → bb
bC → bc
CB → BC
cC → cc

Grammars Derivations: A => aABC => aaABCBC => …
A => aBC => abC => abc
aABC => aaBCBC => aabCBC => aabBCC => aabbCC => aabbcC => aabbcc
aaABCBC => aaaBCBCBC =>* aaaBBCBCC => aaaBBBCCC => aaabBBCCC =>(2) aaabbbCCC => aaabbbcCC =>* aaabbbccc
(Here =>(2) marks two derivation steps.)
L(G) = {a^n b^n c^n | n ≥ 1}
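
The claim about L(G) can be checked for short strings with an exhaustive search in Python. Since no production of this grammar shortens a string, any form longer than the bound can be discarded safely. The sketch assumes single-character symbols with upper case for nonterminals; the names successors and language_upto are illustrative.

```python
P = [("A", "aABC"), ("A", "aBC"), ("aB", "ab"), ("bB", "bb"),
     ("bC", "bc"), ("CB", "BC"), ("cC", "cc")]


def successors(form, productions):
    """All forms obtained from `form` by rewriting one occurrence of a left part."""
    out = set()
    for left, right in productions:
        i = form.find(left)
        while i != -1:
            out.add(form[:i] + right + form[i + len(left):])
            i = form.find(left, i + 1)
    return out


def language_upto(start, productions, max_len):
    """Sentences of length <= max_len, assuming no production shortens a form."""
    seen = {start}
    stack = [start]
    while stack:
        form = stack.pop()
        for nxt in successors(form, productions):
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    # A sentence contains terminals (lower-case letters) only.
    return {f for f in seen if f.islower()}


print(sorted(language_upto("A", P, 9), key=len))
```

Dead-end forms such as aabcBC (no production applies to cB) are explored but never become sentences, so they do not appear in the result.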

The Chomsky Hierarchy A hierarchy of grammars, the languages they generate, and the machines that accept those languages.

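The Chomsky Hierarchy
Type 0. Language name: Recursively Enumerable. Grammar: Unrestricted re-writing system. Restrictions on grammar: None. Accepting machine: Turing Machine.
Type 1. Language name: Context-Sensitive Language. Grammar: Context-Sensitive Grammar. Restrictions on grammar: For all α → β, |α| ≤ |β|. Accepting machine: Linear Bounded Automaton.
Type 2. Language name: Context-Free Language. Grammar: Context-Free Grammar. Restrictions on grammar: For all α → β, α ∈ Φ. Accepting machine: Push-Down Automaton (parser).
Type 3. Language name: Regular Language. Grammar: Regular Grammar. Restrictions on grammar: For all α → β, α ∈ Φ, β ∈ Σ ∪ ΣΦ ∪ {ε}. Accepting machine: Finite-State Automaton.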
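
The restrictions can be turned into a rough Python classifier that reports the most restrictive type whose condition every production satisfies. This is a sketch under stated assumptions: symbols are single characters, and the type 3 condition is taken to be the right-linear form (right part is ε, one terminal, or one terminal followed by one nonterminal). The name chomsky_type is illustrative.

```python
def chomsky_type(productions, nonterminals):
    """Return 3, 2, 1 or 0 according to the restrictions the productions satisfy."""
    def single_nonterminal(left):
        return left in nonterminals

    def regular_right(right):
        # Assumed right-linear shape: ε, a terminal, or a terminal then a nonterminal.
        if right == "":
            return True
        if len(right) == 1:
            return right not in nonterminals
        return (len(right) == 2 and right[0] not in nonterminals
                and right[1] in nonterminals)

    if all(single_nonterminal(l) and regular_right(r) for l, r in productions):
        return 3
    if all(single_nonterminal(l) for l, _ in productions):
        return 2
    if all(len(l) <= len(r) for l, r in productions):
        return 1
    return 0


anbn = [("S", "aSb"), ("S", "")]
anbncn = [("A", "aABC"), ("A", "aBC"), ("aB", "ab"), ("bB", "bb"),
          ("bC", "bc"), ("CB", "BC"), ("cC", "cc")]
an = [("S", "aS"), ("S", "")]
print(chomsky_type(an, {"S"}), chomsky_type(anbn, {"S"}),
      chomsky_type(anbncn, {"A", "B", "C"}))
```

The function classifies a grammar, not a language: a language is of a given type if some grammar of that type generates it.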

24/04/2017 Language Hierarchy
0: Recursively Enumerable Languages
1: Context-Sensitive Languages, e.g. {a^n b^n c^n | n ≥ 0}
2: Context-Free Languages, e.g. {a^n b^n | n ≥ 0}
3: Regular Languages, e.g. {a^n | n ≥ 0}
English?
We will deal with type 2 (syntax) and type 3 (lexicon) languages.