600.465 - Intro to NLP - J. Eisner: Finite-State and the Noisy Channel


Intro to NLP - J. Eisner, slide 1: Finite-State and the Noisy Channel

Intro to NLP - J. Eisner, slide 2: Word Segmentation

- What does this say? theprophetsaidtothecity
- And what other words are substrings?
- Could segment with parsing (how?), but slow.
- Given L = a "lexicon" FSA that matches all English words. How to apply it to this problem? (a toy sketch follows below)
- What if the lexicon is weighted? From unigrams to bigrams?
- Smooth L to include unseen words?
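A minimal sketch of the weighted-lexicon idea, not from the slides: the lexicon and its unigram probabilities below are made up, and the dynamic program over split points stands in for composing the weighted lexicon FSA with the unsegmented string and taking the best path.

```python
# Illustrative sketch: a weighted "lexicon" as a dict of assumed unigram
# probabilities, and a Viterbi-style dynamic program over split points.
import math

lexicon = {
    "the": 0.05, "prophet": 0.001, "said": 0.01, "to": 0.04, "city": 0.002,
    "prop": 0.0005, "he": 0.02, "hets": 1e-7, "aid": 0.0008,
}

def segment(text, max_word_len=20):
    """best[i] = (log-prob, split point) of the best segmentation of text[:i]."""
    n = len(text)
    best = [(-math.inf, None)] * (n + 1)
    best[0] = (0.0, None)
    for i in range(1, n + 1):
        for j in range(max(0, i - max_word_len), i):
            word = text[j:i]
            if word in lexicon and best[j][0] > -math.inf:
                score = best[j][0] + math.log(lexicon[word])
                if score > best[i][0]:
                    best[i] = (score, j)
    if best[n][0] == -math.inf:
        return None                     # no segmentation covers the whole string
    words, i = [], n
    while i > 0:                        # trace back the best split points
        j = best[i][1]
        words.append(text[j:i])
        i = j
    return list(reversed(words))

print(segment("theprophetsaidtothecity"))   # ['the', 'prophet', 'said', 'to', 'the', 'city']
```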

Intro to NLP - J. Eisner, slide 3: Spelling correction

- Spelling correction also needs a lexicon L
- But there is distortion ...
- Let T be a transducer that models common typos and other spelling errors:
  - ance → ence (deliverance, ...)
  - e → ε (deliverance, ...)
  - ε → e // Cons _ Cons (athlete, ...)
  - rr → r (embarrass, occurrence, ...)
  - ge → dge (privilege, ...)
  - etc.
- Now what can you do with L .o. T ? (a toy corrector in this spirit is sketched below)
- Should T and L have probabilities?
- Want T to include "all possible" errors ...
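A toy sketch in the spirit of L .o. T, not the course code: the lexicon probabilities and error probabilities are assumed, only a handful of the rules above are modeled, and each candidate applies just one inverted error rule.

```python
# Toy noisy-channel corrector: score each candidate by p(word) * p(typo | word).
import re

lexicon = {"deliverance": 1e-5, "athlete": 2e-5, "occurrence": 8e-6, "privilege": 3e-5}

# Inverted forms of a few of the slide's rules: a pattern seen in the TYPO,
# the intended spelling it maps back to, and an assumed channel probability.
inverse_rules = [
    ("ence", "ance", 0.10),   # intended ance, typed ence  (deliverence)
    ("r",    "rr",   0.02),   # intended rr, typed r       (occurence)
    ("dge",  "ge",   0.05),   # intended ge, typed dge     (priviledge)
    ("e",    "",     0.02),   # intended nothing, typed e  (athelete)
]

def correct(typo):
    """Return the candidate maximizing p(word) * p(typo | word)."""
    best_score, best_word = 0.0, typo
    for pattern, replacement, p_err in inverse_rules:
        for m in re.finditer(pattern, typo):
            candidate = typo[:m.start()] + replacement + typo[m.end():]
            if candidate in lexicon:
                score = lexicon[candidate] * p_err
                if score > best_score:
                    best_score, best_word = score, candidate
    return best_word

print(correct("deliverence"))   # deliverance
print(correct("occurence"))     # occurrence
print(correct("priviledge"))    # privilege
print(correct("athelete"))      # athlete
```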

Intro to NLP - J. Eisner, slide 4: Noisy Channel Model

- noisy channel: X → Y
- real language X, yucky language Y
- want to recover X from Y

Intro to NLP - J. Eisner, slide 5: Noisy Channel Model

- noisy channel: X → Y
- real language X, yucky language Y; want to recover X from Y
- instance: X = correct spelling, channel = typos, Y = misspelling

Intro to NLP - J. Eisner, slide 6: Noisy Channel Model

- noisy channel: X → Y
- real language X, yucky language Y; want to recover X from Y
- instance: X = (lexicon space)*, channel = delete spaces, Y = text w/o spaces

Intro to NLP - J. Eisner, slide 7: Noisy Channel Model

- noisy channel: X → Y
- real language X, yucky language Y; want to recover X from Y
- instance: X = (lexicon space)* (the language model), channel = pronunciation (the acoustic model), Y = speech

Intro to NLP - J. Eisner, slide 8: Noisy Channel Model

- noisy channel: X → Y
- real language X, yucky language Y; want to recover X from Y
- instance: X = tree (a probabilistic CFG), channel = delete everything but terminals, Y = text

Intro to NLP - J. Eisner, slide 9: Noisy Channel Model

- noisy channel: X → Y; real language X, yucky language Y
- p(X) * p(Y | X) = p(X, Y)
- want to recover x ∈ X from y ∈ Y
- choose the x that maximizes p(x | y), or equivalently p(x, y) (written out below)
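Written out as an equation (the standard Bayes derivation behind the slide; p(y) can be dropped from the maximization because it does not depend on x):

```latex
\hat{x} \;=\; \arg\max_{x \in X} p(x \mid y)
        \;=\; \arg\max_{x \in X} \frac{p(x)\,p(y \mid x)}{p(y)}
        \;=\; \arg\max_{x \in X} p(x)\,p(y \mid x)
        \;=\; \arg\max_{x \in X} p(x, y)
```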

Intro to NLP - J. Eisner, slide 10: Noisy Channel Model

- p(X) * p(Y | X) = p(X, Y)
- p(X): a:a/0.7, b:b/0.3
- p(Y | X): a:D/0.9, a:C/0.1, b:C/0.8, b:D/0.2
- composed (.o.), these give p(X, Y): a:D/0.63, a:C/0.07, b:C/0.24, b:D/0.06
- Note p(x, y) sums to 1. Suppose y = "C"; what is the best "x"?

Intro to NLP - J. Eisner, slide 11: Noisy Channel Model

- p(X) * p(Y | X) = p(X, Y)
- p(X): a:a/0.7, b:b/0.3; p(Y | X): a:D/0.9, a:C/0.1, b:C/0.8, b:D/0.2
- composed (.o.), these give p(X, Y): a:D/0.63, a:C/0.07, b:C/0.24, b:D/0.06
- Suppose y = "C"; what is the best "x"?

Intro to NLP - J. Eisner, slide 12: Noisy Channel Model

- p(X) * p(Y | X), restricted to the observed output, gives p(X, y)
- composing further with C:C/1 keeps just the paths compatible with output "C": a:C/0.07, b:C/0.24
- the best path then gives the most likely x (a numeric check follows below)
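A quick numeric check of slides 10-12, using the probabilities shown on the slides; the composition is carried out here with plain dictionaries rather than transducers.

```python
# p(X) is a choice between "a" and "b"; p(Y|X) is the channel.
p_x = {"a": 0.7, "b": 0.3}
p_y_given_x = {
    ("a", "C"): 0.1, ("a", "D"): 0.9,
    ("b", "C"): 0.8, ("b", "D"): 0.2,
}

# Joint p(x, y) = p(x) * p(y | x), as produced by composing the two machines.
p_xy = {(x, y): p_x[x] * p for (x, y), p in p_y_given_x.items()}
print(p_xy)                 # {('a','C'): 0.07, ('a','D'): 0.63, ('b','C'): 0.24, ('b','D'): 0.06}
print(sum(p_xy.values()))   # ~1.0 -- the joint sums to 1

# Observe y = "C": restrict to compatible pairs and take the best x.
y = "C"
best_x = max((x for (x, yy) in p_xy if yy == y), key=lambda x: p_xy[(x, y)])
print(best_x, p_xy[(best_x, y)])   # b 0.24 -- "b" is the most likely input given "C"
```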

Intro to NLP - J. Eisner, slide 13: Morpheme Segmentation

- Let Lexicon be a machine that matches all Turkish words
- Same problem as word segmentation, just at a lower level: morpheme segmentation
- Turkish word: uygarlaştıramadıklarımızdanmışsınızcasına = uygar+laş+tır+ma+dık+ları+mız+dan+mış+sınız+ca+sı+na, "(behaving) as if you are among those whom we could not cause to become civilized"
- Some constraints on morpheme sequence: bigram probs
- Generative model: concatenate, then fix up the joints (a toy sketch follows below)
  - stop + -ing = stopping, fly + -s = flies, vowel harmony
- Use a cascade of transducers to handle all the fixups
- But this is just morphology!
- Can use probabilities here too (but people often don't)
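An illustrative sketch of "concatenate, then fix up the joints": the two regex rewrites below are simplified stand-ins for the cascade of transducers, not rules used in class.

```python
# Toy joint fix-ups for two of the slide's English examples.
import re

def join_morphemes(stem, suffix):
    word = stem + suffix
    # Fix-up 1: double a final consonant after a single vowel before -ing/-ed
    # (stop + ing -> stopping). The lookbehind keeps "seem + ing" unchanged.
    word = re.sub(r"(?<![aeiou])([aeiou])([bdgmnprt])(ing|ed)$", r"\1\2\2\3", word)
    # Fix-up 2: y + s -> ies (fly + s -> flies).
    word = re.sub(r"ys$", "ies", word)
    return word

print(join_morphemes("stop", "ing"))   # stopping
print(join_morphemes("fly", "s"))      # flies
print(join_morphemes("walk", "ed"))    # walked (no fix-up fires)
```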

Intro to NLP - J. Eisner, slide 14: Edit Distance Transducer

- Arcs over the alphabet {a, b}: a:ε, ε:a, b:ε, ε:b, a:b, b:a, a:a, b:b
- For a k-letter alphabet: O(k) deletion arcs, O(k) insertion arcs, O(k²) substitution arcs, O(k) no-change arcs

Intro to NLP - J. Eisner, slide 15: Edit Distance Transducer (stochastic)

- Same arcs: a:ε, ε:a, b:ε, ε:b, a:b, b:a, a:a, b:b
- O(k) deletion arcs, O(k) insertion arcs, O(k²) substitution arcs, O(k) identity arcs (enumerated for k = 2 below)
- Stochastic version: likely edits = high-probability arcs
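A small enumeration of the arc inventory for an edit-distance transducer like the one on the slides, over a two-letter alphabet; the costs are placeholders, not probabilities from class.

```python
alphabet = ["a", "b"]
EPS = "ε"

arcs = []
for x in alphabet:
    arcs.append((x, EPS, 1.0))              # deletion arcs  x:ε    -- O(k)
    arcs.append((EPS, x, 1.0))              # insertion arcs ε:x    -- O(k)
    arcs.append((x, x, 0.0))                # identity arcs  x:x    -- O(k)
    for y in alphabet:
        if y != x:
            arcs.append((x, y, 1.0))        # substitution arcs x:y -- O(k^2)

for arc in arcs:
    print(arc)
print(len(arcs))    # 8 arcs for k = 2: 2 + 2 + 2 + 2*(2-1)
```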

Intro to NLP - J. Eisner, slide 16: Edit Distance Transducer (applied)

- Compose: clara .o. (stochastic edit distance transducer) .o. caca
- The composed lattice contains deletion arcs (c:ε, l:ε, a:ε, r:ε), insertion arcs (ε:c, ε:a), and substitution/identity arcs (c:c, l:c, a:c, r:c, c:a, l:a, a:a, r:a)
- Find the best path, by Dijkstra's algorithm (a plain dynamic-programming version is sketched below)
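Under unit edit costs, the cost of the best path through clara .o. (edit transducer) .o. caca is just the Levenshtein distance between the two strings; the sketch below computes it by dynamic programming instead of Dijkstra on the composed lattice, an equivalent view rather than the slides' construction.

```python
def edit_distance(s, t):
    """Minimum number of insertions, deletions, and substitutions turning s into t."""
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                      # delete all of s[:i]
    for j in range(n + 1):
        d[0][j] = j                      # insert all of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion arc
                          d[i][j - 1] + 1,        # insertion arc
                          d[i - 1][j - 1] + sub)  # substitution / identity arc
    return d[m][n]

print(edit_distance("clara", "caca"))    # 2 (e.g. delete "l", substitute r -> c)
```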

Intro to NLP - J. Eisner, slide 17: Speech Recognition by FST Composition (Pereira & Riley 1996)

- Compose: trigram language model, giving p(word seq)
- .o. pronunciation model, giving p(phone seq | word seq)
- .o. acoustic model, giving p(acoustics | phone seq)
- .o. observed acoustics
- (the decoding objective is written out below)
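As an equation, the cascade decodes the following, where q ranges over phone sequences; this is the standard noisy-channel decomposition, and the max over q in the second line is the usual Viterbi approximation, an assumption rather than something stated on the slide.

```latex
\hat{w} \;=\; \arg\max_{w} \; p(w) \sum_{q} p(q \mid w)\, p(\text{acoustics} \mid q)
\;\approx\; \arg\max_{w,\,q} \; p(w)\, p(q \mid w)\, p(\text{acoustics} \mid q)
```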

Intro to NLP - J. Eisner, slide 18: Speech Recognition by FST Composition (Pereira & Riley 1996)

- The trigram language model supplies p(word seq)
- The pronunciation model supplies p(phone seq | word seq): each word expands to its phones, e.g. CAT : k æ t, with arcs sensitive to the surrounding phone context
- The acoustic model supplies p(acoustics | phone seq)