Jamie Frost – Franks Society MT10

What is language?

What is a language? Wikipedia: “A set of symbols of communication and the elements used to manipulate them.” OED: “The system of spoken or written communication used by a particular country, people, community, etc., typically consisting of words used within a regular grammatical and syntactic structure.”

What is a language? An alphabet Σ is the set of possible symbols that each ‘unit’ in the language can take. For human languages, the alphabet may be at the character level, or we could choose it to be at the word level...

What is a language? Σ² = Σ × Σ gives us all the possible pairs of symbols. Σ* = { λ } ∪ Σ ∪ (Σ × Σ) ∪ (Σ × Σ × Σ) ∪ ... is known as the Kleene star, and gives us all possible strings, i.e. any combination of symbols of any length.
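To make Σ* concrete, here is a small Python sketch (my own illustration, not from the original slides) that enumerates the strings of Σ* up to a chosen length, since the full Kleene star is of course infinite. The alphabet { a, b } is just an arbitrary example.

    from itertools import product

    sigma = ["a", "b"]                      # an example alphabet

    def kleene_star(alphabet, max_len):
        """Yield every string over `alphabet` of length 0..max_len."""
        for n in range(max_len + 1):
            for combo in product(alphabet, repeat=n):
                yield "".join(combo)

    print(list(kleene_star(sigma, 2)))
    # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']   ('' is the empty string, lambda)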

What is a language? Any sensible language doesn’t allow an unrestricted combination of symbols. Human languages are bound by some grammatical structure.

Mint.

Grammars. So how do we define a grammar? Grammars limit our possible strings to certain forms. These vary in expressiveness – the more expressive they are, the harder it is to do certain common tasks with them (expressiveness trades off against task complexity). Tasks might include: “finding the grammatical structure of a string given a grammar”, “does a string satisfy a given grammar?”, or “are two grammars equivalent?”

The Chomsky Hierarchy. In 1956, Noam Chomsky characterised languages according to how ‘complex’ or expressive they are. This is known as the Chomsky Hierarchy. A language of a given type also belongs to every type above it in the table.

Grammar   Language
Type-0    Recursively Enumerable
Type-1    Context Sensitive
Type-2    Context Free
Type-3    Regular

A Formal Grammar consists of: terminal symbols (i.e. our alphabet Σ), non-terminal symbols (N), a start symbol (S ∈ N), and production rules. For example:
1. S ⟶ i love T
2. T ⟶ T and T
3. T ⟶ smurfs
4. T ⟶ smurfettes

A Formal Grammar. Think of it as a game... A) Start with the start symbol. B) We can use the production rules to replace things. C) We’re not allowed to ‘finish’ until we only have terminal symbols. Using rules 1–4 above: S ⟶ i love T ⟶ i love T and T ⟶ i love smurfs and T ⟶ i love smurfs and T and T ⟶ i love smurfs and smurfs and T ⟶ i love smurfs and smurfs and smurfettes
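As a rough illustration of the ‘game’ (a sketch of my own, assuming only the four rules above), the following Python snippet repeatedly rewrites the leftmost non-terminal at random until only terminal symbols remain:

    import random

    rules = {
        "S": [["i", "love", "T"]],
        "T": [["T", "and", "T"], ["smurfs"], ["smurfettes"]],
    }

    def derive(max_steps=50):
        symbols = ["S"]                                  # A) start with the start symbol
        for _ in range(max_steps):
            idxs = [i for i, s in enumerate(symbols) if s in rules]
            if not idxs:                                 # C) only terminals left: finished
                return " ".join(symbols)
            i = idxs[0]                                  # B) rewrite the leftmost non-terminal
            symbols[i:i + 1] = random.choice(rules[symbols[i]])
        return " ".join(symbols)                         # give up (may still contain a T)

    print(derive())   # e.g. "i love smurfs and smurfettes"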

Regular Grammars The most restrictive. The LHS of the production rules can only be a single non-terminal. The RHS of the production rules can be one of (a) a single terminal symbol (b) a single non-terminal symbol (c) a terminal followed by a non-terminal or (d) the empty symbol. The idea is that you don’t have ‘memory’ of the symbols you’ve previously emitted in the string.

Regular Grammars Example. The rules used here are S ⟶ aS, S ⟶ aT, T ⟶ bT and T ⟶ b. Example generation: S ⟶ aS ⟶ aaS ⟶ aaaT ⟶ aaabT ⟶ aaabb. Notice we’re always generating at the end of the string.
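Since a regular grammar always defines a regular language, membership can be checked with an ordinary regular expression. A small sketch (assuming the rules above, which generate one or more a’s followed by one or more b’s):

    import re

    pattern = re.compile(r"a+b+")    # the language generated by S ⟶ aS | aT, T ⟶ bT | b

    for s in ["aaabb", "ab", "ba", "aabba"]:
        print(s, bool(pattern.fullmatch(s)))
    # aaabb True, ab True, ba False, aabba False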

Regular Grammar. [Diagram: a finite state machine whose transitions are labelled a, a and b.] This kind of diagram is known as a ‘nondeterministic finite automaton’ or NFA.

Regular Grammar. We can use this picture to work out the regular grammar. [Diagram: the same NFA as above, with its a, a and b transitions shown alongside the corresponding production rules.]
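The standard way to run an NFA is to track the set of states it could currently be in. The transition table below is an assumption on my part (the original diagram isn’t reproduced in the transcript); it is chosen to accept the same a⁺b⁺ language as the grammar above:

    # States: S (reading a's), T (reading b's), F (accepting).
    transitions = {
        ("S", "a"): {"S", "T"},    # stay and read more a's, or get ready for the b's
        ("T", "b"): {"T", "F"},    # stay and read more b's, or finish
    }
    start_states, accepting = {"S"}, {"F"}

    def nfa_accepts(word):
        states = set(start_states)
        for ch in word:
            states = set().union(*(transitions.get((q, ch), set()) for q in states))
        return bool(states & accepting)

    print(nfa_accepts("aaabb"), nfa_accepts("a"), nfa_accepts("ba"))   # True False False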

It’s Voting Time... Are these languages regular? (1) The language of palindromes, i.e. strings which are the same when reversed, e.g. “madam”, “acrobats stab orca”. (2) { aⁿbⁿ | n ≥ 1 }, i.e. ab, aabb, aaabbb, aaaabbbb, ... Neither is regular. The problem is that we cannot ‘remember’ the symbols already emitted. We can use something called the pumping lemma to check whether a language is regular.
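A very informal sketch of the pumping intuition (an illustration of the idea, not the lemma itself): in a regular language, any sufficiently long string contains a segment that can be repeated while staying in the language, yet no segment of aaabbb can be repeated without breaking the equal-counts property of aⁿbⁿ.

    import re

    def in_anbn(word):
        """Membership test for { a^n b^n | n >= 1 }."""
        m = re.fullmatch(r"(a+)(b+)", word)
        return bool(m) and len(m.group(1)) == len(m.group(2))

    w = "aaabbb"
    for i in range(len(w)):
        for j in range(i + 1, len(w) + 1):
            pumped = w[:i] + w[i:j] * 2 + w[j:]    # repeat the segment w[i:j] once more
            assert not in_anbn(pumped)
    print("no segment of", w, "survives pumping")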

Context Free Grammars. The restriction on the RHS of the production rules is now loosened; we can have any combination of non-terminals and terminals. We still restrict the LHS, however, to a single non-terminal. This is why the grammar is known as “context free”: the production is not dependent on the context in which it occurs. While generating a string: ...abXd ⟶ ...abyd... The production rule which allows the X to become a y is not contingent on the context, i.e. the preceding b or the following d.

Context Free Grammars Examples. The rules used here are S ⟶ aSa, S ⟶ cSc and S ⟶ b (a palindrome grammar). Example generation: S ⟶ aSa ⟶ acSca ⟶ acbca.

Context Free Grammars Examples. The rules used here are S ⟶ aSb and S ⟶ λ, which generate strings of the form aⁿbⁿ. Example generation: S ⟶ aSb ⟶ aaSbb ⟶ aaaSbbb ⟶ aaabbb.
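A context-free language like this can be recognised with a single stack – exactly the ‘memory’ a finite automaton lacks (this is the idea behind the pushdown automata that appear later in the Chomsky Hierarchy table). A minimal sketch:

    def is_anbn(word):
        """Accept strings of the form a^n b^n with n >= 1."""
        stack, seen_b = [], False
        for ch in word:
            if ch == "a":
                if seen_b:               # an 'a' after a 'b' can never be matched
                    return False
                stack.append("a")
            elif ch == "b":
                seen_b = True
                if not stack:            # more b's than a's
                    return False
                stack.pop()
            else:
                return False
        return seen_b and not stack      # every 'a' matched by a 'b', at least one pair

    print([w for w in ["ab", "aabb", "aaabbb", "aab", "ba"] if is_anbn(w)])
    # ['ab', 'aabb', 'aaabbb']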

It’s Voting Time... Is { aⁿbⁿcⁿ | n ≥ 1 }, i.e. abc, aabbcc, aaabbbccc, ..., context free? Nope. A bit harder to see this time. We can use a variant of the pumping lemma, called the Bar-Hillel lemma, to show that it isn’t. (An informal explanation: we could have non-terminals at the a–b and b–c boundaries generating these pairs, but since our grammar is context free these non-terminals expand independently of each other, so we can only ensure that the a’s and b’s have the same count, or the b’s and c’s. And we can’t have a rule of the form S ⟶ X abc Y, because then we can’t subsequently increase the number of b’s.)

Context-Sensitive Grammars. Now an expansion of a non-terminal can depend on the context it appears in. Example generation: S ⟶ aSBC ⟶ aaBCBC ⟶ aaBHBC ⟶ aaBBCC ⟶ aabBCC ⟶ aabbCC ⟶ aabbcC ⟶ aabbcc. I.e. a ‘C’ can change into a ‘c’ only when preceded by another ‘c’ (as in the final step). Note that this context (i.e. this preceding ‘c’) must remain unchanged. Preservation of context is the only restriction in CSGs.
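To make the derivation above checkable, here is a sketch that replays it as plain string rewriting. The rule set is an assumption on my part: it is one standard noncontracting grammar for { aⁿbⁿcⁿ } that is consistent with every step shown (H is just a helper symbol used to swap a C past a B).

    rules = [
        ("S", "aSBC"), ("S", "aBC"),
        ("CB", "HB"), ("HB", "BC"),      # swap C and B via the helper symbol H
        ("aB", "ab"), ("bB", "bb"),
        ("bC", "bc"), ("cC", "cc"),
    ]

    derivation = ["S", "aSBC", "aaBCBC", "aaBHBC", "aaBBCC",
                  "aabBCC", "aabbCC", "aabbcC", "aabbcc"]

    def one_step(before, after):
        """True if `after` follows from `before` by one application of one rule."""
        for lhs, rhs in rules:
            for i in range(len(before) - len(lhs) + 1):
                if (before[i:i + len(lhs)] == lhs and
                        before[:i] + rhs + before[i + len(lhs):] == after):
                    return True
        return False

    print(all(one_step(a, b) for a, b in zip(derivation, derivation[1:])))   # True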

The Chomsky Hierarchy

Grammar   Language                  Automaton                                           Rules
Type-0    Recursively enumerable    Turing machine                                      α ⟶ β
Type-1    Context sensitive         Linear bounded non-deterministic Turing machine     αAβ ⟶ αγβ
Type-2    Context free              Non-deterministic pushdown automaton                A ⟶ γ
Type-3    Regular                   Finite state automaton                              A ⟶ a and A ⟶ aB

(The finite state automaton is the picture with circles and arrows we saw earlier.)

English as a CFG. Before we get on to classifying English according to the Chomsky Hierarchy, let’s see how English might be represented as a CFG. Our starting non-terminal S is a sentence. Since sentences operate independently syntactically, it’s sufficient to examine grammar at the sentence level. Our terminals/alphabet Σ is just a dictionary in the literal sense: Σ = { a, aardvark, ..., zebra, zoology, zyzzyva }

English as a CFG. Our non-terminals are ‘constituents’, such as noun phrases, verb phrases, verbs, determiners, prepositional phrases, etc. These can be subdivided into further constituents (e.g. NP = noun phrase), or generate terminals (e.g. V = verb).

Non-Terminal   Name                   Example
NP             Noun Phrase            the cat
VP             Verb Phrase            chastised the politician
PP             Prepositional Phrase   with the broccoli
CONJ           Conjunction            and
V              Verb                   chundered
ADV            Adverb                 everywhere

English as a CFG. We can use an American-style ‘top-down’ generative form of grammar:
S ⟶ NP VP
NP ⟶ DT N | PN | NP PP | NP CONJ NP
VP ⟶ V NP | VP PP
PP ⟶ P NP
DT ⟶ the | a
N ⟶ monkey | student | telescope
PN ⟶ Corey
P ⟶ with | over
CONJ ⟶ and | or
V ⟶ saw | ate

Ambiguity. Curiously, it’s possible to generate a sentence in multiple ways! Derivation 1: S ⟶ NP VP ⟶ PN VP ⟶ Corey VP ⟶ Corey VP PP ⟶ Corey V NP PP ⟶ Corey saw NP PP ⟶ ... ⟶ Corey saw the monkey with the telescope. Derivation 2: S ⟶ NP VP ⟶ PN VP ⟶ Corey VP ⟶ Corey V NP ⟶ Corey saw NP ⟶ Corey saw NP PP ⟶ ... ⟶ Corey saw the monkey with the telescope.
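If you have NLTK installed, the toy grammar above can be typed in directly and handed to a chart parser, which finds both parses of the ambiguous sentence (one attaches “with the telescope” to the verb phrase, the other to “the monkey”). A sketch, assuming the grammar from the previous slide:

    import nltk

    grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> DT N | PN | NP PP | NP CONJ NP
    VP -> V NP | VP PP
    PP -> P NP
    DT -> 'the' | 'a'
    N -> 'monkey' | 'student' | 'telescope'
    PN -> 'Corey'
    P -> 'with' | 'over'
    CONJ -> 'and' | 'or'
    V -> 'saw' | 'ate'
    """)

    parser = nltk.ChartParser(grammar)
    sentence = "Corey saw the monkey with the telescope".split()
    for tree in parser.parse(sentence):
        print(tree)        # prints two distinct parse trees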

Ambiguity. We say that a formal grammar that can yield the same string via multiple distinct derivations (i.e. multiple parse trees) is ‘ambiguous’.

So what kind of language...

Embedded Structures (Yngve 60) The cat likes tuna fish. The cat the dog chased likes tuna fish. The cat the dog the rat bit chased likes tuna fish. The cat the dog the rat the elephant admired bit chased likes tuna fish.

Embedded Structures. The cat the dog the rat the elephant admired bit chased likes tuna fish. If we let A = { the dog, the rat, the elephant } and B = { admired, bit, chased } then centre-embedded sentences have the form “the cat aⁿbⁿ likes tuna fish”, with the a’s drawn from A, the b’s drawn from B, and the same number of each. But we already know from earlier that aⁿbⁿ is not regular!

So what kind of language...

Swiss German. A number of languages, such as Dutch and Swiss German, allow for cross-serial dependencies. ...mer d’chind em Hans es huus haend wele laa halfe aastriiche... ...we the children/ACC Hans/DAT the house/ACC have wanted to let help paint. DAT = dative noun: the indirect object of a verb (e.g. “Mary” in “John gave Mary the book”). ACC = accusative noun: the direct object of a verb (e.g. “the book” in “John gave Mary the book”).

Swiss German Shieber (1985) notes that among such sentences, those with all accusative NPs preceding all dative NPs, and all accusative-subcategorising verbs preceding all dative-subcategorising verbs are acceptable. The number of verbs requiring dative objects (halfe) must equal the number of dative NPs (em Hans) and similarly for accusatives....mer d’chind em Hans es huus haend wele laa halfe aastriiche...we the children/ACC Hans/DAT the house/ACC have wanted to let help paint.

Swiss German. ...mer d’chind em Hans es huus haend wele laa halfe aastriiche... ...we the children/ACC Hans/DAT the house/ACC have wanted to let help paint. The dependencies cross one another: d’chind (the children) is the object of laa (let), em Hans of halfe (help), and es huus (the house) of aastriiche (paint), so the i-th noun phrase pairs with the i-th verb. This cross-serial pattern behaves like aⁿbᵐcⁿdᵐ, which a context-free grammar cannot capture.

Summary. The Chomsky Hierarchy brings together different grammar formalisms, listing them in increasing order of expressiveness. Languages don’t have to be human ones: grammars allow us to generate strings over some given alphabet Σ, subject to some grammatical constraints. The least expressive formalism is the Regular Grammar, which is insufficient to represent the English language. But even Context Free Grammars are insufficient to represent languages with cross-serial dependencies, such as Swiss German.

Questions?