Language Learning, Week 6. Pieter Adriaans, Sophia Katrenko.

Presentation transcript:

Language Learning, Week 6. Pieter Adriaans, Sophia Katrenko.

Contents Week 5
- Information theory
- Learning as Data Compression
- Learning regular languages using DFA
- Minimum Description Length Principle

The minimum description length (MDL) principle (J. Rissanen): the best theory to explain a set of data is the one which minimizes the sum of
- the length, in bits, of the description of the theory, and
- the length, in bits, of the data when encoded with the help of the theory.
Data = d
Theory = t1, encoded data = t1(d)
Theory = t2, encoded data = t2(d)
|t2(d)| + |t2| < |t1(d)| + |t1| < |d|, so t2 is the best theory.
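To make the two-part comparison concrete, here is a minimal sketch (not from the slides) that scores two hypothetical theories for a regular binary string: a literal encoding and a repetition model. The cost model is a toy assumption, chosen only to illustrate |theory| + |theory(data)|.

    import math

    data = "01" * 200  # a highly regular binary string

    def cost_literal(d):
        """Theory t1: 'the data is arbitrary bits'. The theory is (almost)
        empty, so all the cost sits in the data part: one bit per symbol."""
        theory_bits = 1.0
        data_bits = len(d)
        return theory_bits + data_bits

    def cost_repetition(d, unit="01"):
        """Theory t2: 'the data is `unit` repeated n times'. The theory
        describes the unit; the data part only needs the repeat count n."""
        if d != unit * (len(d) // len(unit)):
            return float("inf")  # the theory does not fit the data
        theory_bits = len(unit) + 8                      # unit plus fixed overhead
        data_bits = math.log2(len(d) // len(unit) + 1)   # encode the count n
        return theory_bits + data_bits

    print("two-part cost, literal theory   :", cost_literal(data))
    print("two-part cost, repetition theory:", cost_repetition(data))
    # The repetition theory wins: |t2| + |t2(d)| << |t1| + |t1(d)| < |d|.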

Local, incremental learning (different compression paths)
Data = d, Theory = t, Encoded data = t(d)
[Diagram: the data is compressed in successive steps. A first theory t1 encodes part of the data; a second theory t2 is applied on top of that result; a third theory t3 on top again. The accumulated theory after each step is t1, then t1 with t2, then t1, t2 and t3; different orders of these steps give different compression paths.]

Learnability
- Non-compressible sets
- Non-constructively compressible sets
- Constructively compressible sets
- Learnable sets = locally, efficiently, incrementally compressible sets

Regular Languages: Deterministic Finite Automata (DFA)
DFA = NFA (non-deterministic) = REG
L = { w in {a,b}* | #a(w) and #b(w) are both even }
Example strings: aa, abab, abaaaabbb
[Figure: a four-state DFA over {a,b} for this language, with a- and b-transitions toggling the two parities]
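As an illustration (not part of the slides), this language is easy to encode directly; the sketch below tracks the two parities as the DFA's state.

    # Minimal sketch: a DFA accepting strings over {a,b} with an even number
    # of a's and an even number of b's. The state is the pair of parities.
    def accepts(word):
        state = (0, 0)                      # start state: both parities even
        for ch in word:
            if ch == "a":
                state = (1 - state[0], state[1])
            elif ch == "b":
                state = (state[0], 1 - state[1])
            else:
                return False                # symbol outside the alphabet
        return state == (0, 0)              # accept iff both counts are even

    for w in ["", "aa", "abab", "abaaaabbb"]:
        print(repr(w), accepts(w))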

Learning DFA's: 2) MCA(S+)
S+ = {c, abc, ababc}
Maximal Canonical Automaton
[Figure: the MCA built from the three positive examples, one path of numbered states per example]

Learning DFA's: 4) State Merging (Oncina & Vidal 1992; Lang 1998)
S+ = {c, abc, ababc}
Evidence-driven state merging: 0, 2
[Figure: the automaton after this merge]

Learning DFA's: 4) State Merging (Oncina & Vidal 1992; Lang 1998)
S+ = {c, abc, ababc}
Evidence-driven state merging: 1, 3 and 8, 9
[Figure: the automaton after these merges]

Learning DFA's: 4) State Merging (Oncina & Vidal 1992; Lang 1998)
S+ = {c, abc, ababc}
Evidence-driven state merging: 0, 4
[Figure: the automaton after this merge]

Learning DFA's: 4) State Merging (Oncina & Vidal 1992; Lang 1998)
S+ = {c, abc, ababc}
Evidence-driven state merging: 9, 5
[Figure: the automaton after this merge]

Learning DFA's: 4) State Merging (Oncina & Vidal 1992; Lang 1998)
S+ = {c, abc, ababc}
Evidence-driven state merging: 0, 4 and 9, 5
[Figure: the resulting automaton]

Learning DFA's via evidence-driven state merging
Input: S+, S-. Output: DFA.
1) Form MCA(S+)
2) Form PTA(S+)
3) Do until no merging is possible:
   - choose a merging of two states
   - perform the cascade of forced mergings needed to obtain a deterministic automaton
   - if the resulting DFA accepts sentences of S-, backtrack and choose another pair
4) End
Drawback: we need negative examples!
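A simplified, runnable sketch of this procedure is given below, in the spirit of RPNI (Oncina & Vidal 1992). The data structures, the greedy merge order and the negative examples S- are illustrative assumptions, not the lecture's exact algorithm.

    def build_pta(positives):
        """Prefix Tree Acceptor: state 0 is the root, one state per prefix of S+."""
        trans, accepting, nxt = {0: {}}, set(), 1
        for word in positives:
            q = 0
            for ch in word:
                if ch not in trans[q]:
                    trans[q][ch] = {nxt}
                    trans[nxt] = {}
                    nxt += 1
                q = next(iter(trans[q][ch]))
            accepting.add(q)
        return trans, accepting

    def accepts(trans, accepting, word):
        q = 0
        for ch in word:
            targets = trans.get(q, {}).get(ch)
            if not targets:
                return False
            q = next(iter(targets))
        return q in accepting

    def merged(trans, accepting, a, b):
        """Copy of the automaton with state b merged into a, followed by the
        cascade of forced merges needed to keep the automaton deterministic."""
        def rename(q):
            return a if q == b else q
        new, acc = {}, {rename(q) for q in accepting}
        for q, row in trans.items():
            dest = new.setdefault(rename(q), {})
            for ch, targets in row.items():
                dest.setdefault(ch, set()).update(rename(t) for t in targets)
        for q, row in new.items():
            for ch, targets in row.items():
                if len(targets) > 1:               # nondeterminism introduced
                    x, y = sorted(targets)[:2]
                    return merged(new, acc, x, y)  # forced merge, may cascade further
        return new, acc

    def learn(positives, negatives):
        """Greedy loop: merge any pair of states as long as the result still
        rejects every negative example."""
        trans, accepting = build_pta(positives)
        changed = True
        while changed:
            changed = False
            states = sorted(trans)
            for i, a in enumerate(states):
                for b in states[i + 1:]:
                    cand_t, cand_a = merged(trans, accepting, a, b)
                    if not any(accepts(cand_t, cand_a, w) for w in negatives):
                        trans, accepting, changed = cand_t, cand_a, True
                        break
                if changed:
                    break
        return trans, accepting

    S_plus = ["c", "abc", "ababc"]            # the positive sample from the slides
    S_minus = ["", "a", "ab", "abab", "cc"]   # assumed negative examples
    trans, accepting = learn(S_plus, S_minus)
    print("states   :", sorted(trans))
    print("accepting:", sorted(accepting))
    print("accepts all of S+:", all(accepts(trans, accepting, w) for w in S_plus))
    print("rejects all of S-:", not any(accepts(trans, accepting, w) for w in S_minus))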

Learning DFA using only positive examples with MDL
S+ = {c, cab, cabab, cababab, cababababab}
[Figure: two candidate automata. L1 has a single state 0 with looping arrows for a, b and c; L2 has three states 0, 1, 2 with arrows labelled c, a and b.]
Coding in bits (in terms of the number of arrows, the number of letters plus the empty letter, and the number of states plus the outside world):
|L1| ≈ 5 · log2(3+1) · 2 · log2(1+1) = 20
|L2| ≈ 5 · log2(3+1) · 2 · log2(1+3) = 40

Learning DFA using only positive examples with MDL (continued)
S+ = {c, cab, cabab, cababab, cababababab}
Coding in bits, as before: |L1| ≈ 20, |L2| ≈ 40.
But:
- L1 has 4 choices in state 0: |L1(S+)| = 26 · log2 4 = 52
- L2 has 2 choices in state 1: |L2(S+)| = 16 · log2 2 = 16
|L2| + |L2(S+)| = 40 + 16 = 56 < |L1| + |L1(S+)| = 20 + 52 = 72
L2 is the better theory according to MDL.
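The totals can be checked in a few lines (an illustration, not from the slides; the grouping of the theory-cost terms below is an assumption chosen to reproduce the numbers above).

    # Two-part MDL comparison of the candidate automata L1 and L2.
    from math import log2

    # theory cost: arrows, alphabet size (+ empty letter), states (+ outside world)
    L1_theory = 5 * log2(3 + 1) * 2 * log2(1 + 1)   # 20 bits
    L2_theory = 5 * log2(3 + 1) * 2 * log2(1 + 3)   # 40 bits

    # data cost: number of decision points times bits per decision
    L1_data = 26 * log2(4)                          # 4 choices in state 0 -> 52 bits
    L2_data = 16 * log2(2)                          # 2 choices in state 1 -> 16 bits

    print("L1 total:", L1_theory + L1_data, "bits")   # 72.0
    print("L2 total:", L2_theory + L2_data, "bits")   # 56.0 -> L2 wins under MDL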

Learning DFA's using only positive examples with MDL
Input: S+. Output: DFA.
1) Form MCA(S+)
2) Form DFA = PTA(S+)
3) Do until no merging is possible:
   - choose a merging of two states
   - perform the cascade of forced mergings to get a deterministic automaton DFA'
   - if |DFA'| + |DFA'(S+)| >= |DFA| + |DFA(S+)|, backtrack and choose another pair
4) End
Drawback: local minima!
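In this positive-only variant, the consistency test against S- is replaced by the two-part score. The sketch below shows such a scoring function for a DFA given as a transition table; the coding scheme is a simplified assumption (it reproduces the slide's data cost of 16 bits for L2, but not its theory cost). Plugging this score into the merging loop sketched earlier, in place of the S- check, gives the MDL-driven learner.

    # Sketch of a two-part MDL score for a DFA on a positive sample S+.
    from math import log2

    def theory_cost(trans, n_states, alphabet):
        """Each arrow names a letter (or the empty letter) and a target state
        (or the outside world). Simplified assumption, not the lecture's scheme."""
        bits_per_arrow = log2(len(alphabet) + 1) + log2(n_states + 1)
        n_arrows = sum(len(row) for row in trans.values())
        return n_arrows * bits_per_arrow

    def data_cost(trans, accepting, words):
        """At every visited state, encode which of the available choices
        (outgoing arrows, plus 'stop' in an accepting state) was taken.
        Assumes every word in S+ is accepted by the automaton."""
        bits = 0.0
        for w in words:
            q = 0
            for ch in w + "$":                  # "$" marks the stop decision
                choices = len(trans.get(q, {})) + (1 if q in accepting else 0)
                bits += log2(max(choices, 1))
                if ch != "$":
                    q = trans[q][ch]
        return bits

    # the three-state automaton L2 for c(ab)*: 0 -c-> 1, 1 -a-> 2, 2 -b-> 1
    trans = {0: {"c": 1}, 1: {"a": 2}, 2: {"b": 1}}
    accepting = {1}
    S_plus = ["c", "cab", "cabab", "cababab", "cababababab"]
    print("theory bits:", theory_cost(trans, 3, "abc"))
    print("data bits  :", data_cost(trans, accepting, S_plus))  # 16.0, as on the slide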

Base case: two-part code optimization
[Diagram: the observed data is compressed, loss-free, into a learned theory consisting of a program plus an input]
|Theory| < |Data|

Paradigm case: a finite binary string
Data: 010101010101010101010101010101010101 ('01' repeated 18 times)
Theory:
  Program: for i = 1 to x print y
  Input: x = 18; y = '01'
|Theory| = |Program| + |Input| < |Data|
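A direct Python rendering of this paradigm case (an illustration, not the lecture's notation):

    # A short "theory" (program plus input) regenerates a longer string.
    x, y = 18, "01"            # the input part of the theory
    data = y * x               # the program part: print y, x times
    print(data)                # '0101...01', 36 characters
    # The theory only needs the unit '01' and the count 18 (a few symbols,
    # growing like log x), while the literal data grows linearly with x.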

Unsupervised Learning
[Diagram: an unknown system, a non-random (computational) process, turns an input into observed output; the observed output is compressed, loss-free, into a learned theory (program plus input), with |Theory| < |Data|]

Supervised Learning
[Diagram: as in unsupervised learning, but the input to the unknown system is now also observed; the learned theory (program plus input) compresses the observed input/output behaviour, with |Theory| < |Data|]

Adaptive System
[Diagram: as in supervised learning, but the learned theory now feeds back into the loop: the learner observes input and output of the unknown, non-random (computational) process and adapts]

Agent System
[Diagram: several adaptive systems coupled to an unknown, non-random (computational) process]

Scientific Text: Bitterbase (Unilever)
The bitter taste of naringin and limonin was not affected by glutamic acid [rmflav 160] Exp.Ok;; Naringin, the second of the two bitter principles in citrus, has been shown to be a depressor of limonin bitterness detection thresholds [rmflav 1591];; Florisil reduces bitterness and tartness without altering ascorbic acid and soluble solids (primarily sugars) content [rmflav 584];; Influence of pH on the system was studied. The best substrate for Rhodococcus fascians at pH 7.0 was limonoate whereas at pH 4.0 to 5.5 it appeared to be limonin. Results suggest that the citrus juice debittering process starts only once the natural precursor of limonin (limonoate A ring lactone) has been transformed into limonin, the equilibrium displacement being governed by the citrus juice pH. [rmflav 474][rmflav 504];; Limonin D-ring lactone hydrolase, the enzyme catalysing the reversible lactonization/hydrolysis of D-ring in limonin, has been purified from citrus seeds and immobilized on Q-Sepharose to produce homogeneous limonoate A-ring lactone solutions. The immobilized limonin D-ring lactone hydrolase showed a good operational stability and was stable after sixty to seventy operations and storing at 4°C for six months.

Study of Benign Distributions

Colloquial Speech: Corpus Spoken Dutch
"omdat ik altijd iets met talen wilde doen." "dat stond in elk geval uh voorop bij mij." "en Nederlands leek me leuk." "da's natuurlijk een erg afgezaagd antwoord maar dat was 't wel." "en uhm ik ben d'r maar gewoon aan begonnen aan de en ik uh heb 't met uh ggg gezondheid." "ggg." "ik heb 't met uh met veel plezier gedaan." "ja prima." "ja 'k vind 't nog steeds leuk."

Study of Benign Distributions

Motherese: Sarah-Jaqueline
*JAC: kijk, hier heb je ook puzzeltjes.
*SAR: die (i)s van mij.
*JAC: die zijn van jouw, ja.
*SAR: die (i)s +...
*JAC: kijken wat dit is.
*SAR: kijken.
*JAC: we hoeven natuurlijk niet alle zooi te bewaren.
*SAR: en die.
*SAR: die (i)s van mij, die.
*JAC: die is niet kompleet.
*JAC: die legt mamma maar terug.
*SAR: die (i)s van mij.
*SAR: xxx.
*SAR: die ga in de kast, deze.
*JAC: die ["], ja.
*JAC: molenspel.
*SAR: mole(n)spel ["].

Study of Benign Distributions

A structured high-frequency core and a heavy low-frequency tail.

Power laws
log_c y = -a · log_c x + b, i.e. y = c^b · x^(-a)
[Figure: a straight line with negative slope on log-log axes, log y against log x]

Observation
- Word frequencies in human utterances are dominated by power laws: a high-frequency core and a low-frequency heavy tail.
- Hypothesis: language is open and grammar is elastic; the occurrence of new words is a natural phenomenon, so syntactic/semantic bootstrapping must play an important role in language learning.
- Bootstrapping might be important for ontology learning as well as for child language acquisition.
- A better understanding of these distributions is necessary.
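As an illustration of how such a power law can be checked on word-frequency data (not part of the slides), the exponent can be estimated with a least-squares fit in log-log space; the toy text below is an assumption.

    # Fit a Zipf-style power law to word frequencies: log freq ~ -a * log rank + b.
    from collections import Counter
    from math import log

    text = ("the cat sat on the mat the cat saw the dog "
            "the dog sat on the mat and the cat ran")
    freqs = sorted(Counter(text.split()).values(), reverse=True)

    xs = [log(rank) for rank in range(1, len(freqs) + 1)]   # log rank
    ys = [log(f) for f in freqs]                            # log frequency

    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    print("estimated exponent a =", -slope)
    print("intercept b          =", mean_y - slope * mean_x)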

Appropriate prior distributions
- Gaussian: human life spans, durations of movies
- Power law: word-class frequencies, box-office grosses of movies, lengths of poems
- Erlang: reigns of pharaohs, number of years spent in office by members of the U.S. House of Representatives
(Griffiths & Tenenbaum, Psychological Science, 2006)

'Illusions' caused by inappropriate use of prior distributions
- Casino: we see our losses as an investment (cf. "survival of the fittest": the harder you try, the bigger your chance of success).
- Monty Hall paradox (aka Marilyn and the goats).
- A Dutch book (against an agent) is a series of bets, each acceptable to the agent, but which collectively guarantee her loss, however the world turns out.
- Harvard medical school test: there are no false negatives, 1/1000 is a false positive, 1/… has the disease.
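As an aside (not from the slides), the Monty Hall effect is easy to verify by simulation:

    # Quick simulation of the Monty Hall paradox: switching wins about 2/3
    # of the time, staying only about 1/3.
    import random

    def play(switch):
        doors = [0, 1, 2]
        car = random.choice(doors)
        pick = random.choice(doors)
        # the host opens a door that hides a goat and is not the pick
        opened = random.choice([d for d in doors if d != pick and d != car])
        if switch:
            pick = next(d for d in doors if d != pick and d != opened)
        return pick == car

    trials = 100_000
    print("stay  :", sum(play(False) for _ in range(trials)) / trials)  # ~0.33
    print("switch:", sum(play(True) for _ in range(trials)) / trials)   # ~0.67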

[Figure: images with 25 %, 50 %, 75 % and 100 % noise]

Two-part code optimization: Data = Theory + Theory(Data)
[Figure: each image decomposed into a structured part plus 25 %, 50 %, 75 % or 100 % noise]

[Figure: three JPEG images with file sizes of 7 Kb, 8 Kb and 7 Kb]

Fact: standard data compression algorithms do an excellent job when one wants to study learning as data compression.
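As a small aside (not from the slides), an off-the-shelf compressor can serve as a crude stand-in for code length: structured data compresses far better than random noise of the same length.

    # A general-purpose compressor as a rough proxy for two-part code length.
    import os, zlib

    structured = b"01" * 5000          # highly regular 10,000-byte string
    noisy = os.urandom(10_000)         # incompressible random bytes

    print("structured:", len(zlib.compress(structured, 9)), "bytes compressed")
    print("noise     :", len(zlib.compress(noisy, 9)), "bytes compressed")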

Contents Week 5
- Information theory
- Learning as Data Compression
- Learning regular languages using DFA
- Minimum Description Length Principle