LIN3022 Natural Language Processing. Lecture 4. Albert Gatt.



SPELL CHECKING AND EDIT DISTANCE (Part 1)

3 Sequence Comparison Once we have the kinds of sequences we want, what simple things can we do with them? Compare sequences (determine similarity) – How close are a given pair of strings to each other? Alignment – What is the best way to align the bits and pieces of two sequences? Edit distance – What is the minimum edit distance between two strings?

4 Spelling Correction How do I fix “graffe”? – Search through all words in my lexicon graf craft grail giraffe – Pick the one that’s closest to graffe – What does “closest” mean? – We need a distance metric. – The simplest one: edit distance

5 Edit Distance The minimum edit distance between two strings is the minimum number of editing operations… – Insertion – Deletion – Substitution …needed to transform one string into the other

6 Minimum Edit Distance For example, take intention and execution (the example worked through below). If each operation has a cost of 1, the distance between them is 5. If substitutions cost 2 (Levenshtein), the distance between them is 8.

7 Min Edit Example

8 Min Edit As Search We can view edit distance as a search for a path (a sequence of edits) that gets us from the start string to the final string – Initial state is the word we’re transforming – Operators are insert, delete, substitute – Goal state is the word we’re trying to get to – Path cost is what we’re trying to minimize: the number of edits

9 Min Edit as Search

10 Min Edit As Search But that generates a huge search space – Imagine checking every single possible path from the source word to the destination word. – We’d have a combinatorial explosion. Also, there will be lots of ways to get from source to destination. – But we’re only interested in the shortest one. – So there’s no need to keep track of them all.

11 Defining Min Edit Distance For two strings: – S1 of length n – S2 of length m – distance(i,j) or D(i,j) is the edit distance of S1[1..i] and S2[1..j], i.e. the minimum number of edit operations needed to transform the first i characters of S1 into the first j characters of S2. The edit distance of S1 and S2 is D(n,m). We compute D(n,m) by computing D(i,j) for all i (0 ≤ i ≤ n) and j (0 ≤ j ≤ m).

12 Defining Min Edit Distance Base conditions: – D(i,0) = i (transforming a string of length i to a zero-length string involves i deletions) – D(0,j) = j (transforming a zero-length string to a string of length j involves j insertions) Recurrence relation:
D(i,j) = min of:
  D(i-1,j) + 1 (insertion)
  D(i,j-1) + 1 (deletion)
  D(i-1,j-1) + 2, if S1(i) ≠ S2(j) (substitution)
  D(i-1,j-1) + 0, if S1(i) = S2(j) (equality)
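To make the recurrence concrete, here is a minimal Python sketch of the recursive definition with memoisation (this is not the tabular algorithm built up on the following slides; the function name is mine, and it uses the Levenshtein costs from the earlier slide, i.e. substitution = 2):

from functools import lru_cache

def min_edit_distance_recursive(s1: str, s2: str) -> int:
    """Recursive formulation of D(i, j), memoised with lru_cache.
    Costs: insertion = 1, deletion = 1, substitution = 2 (Levenshtein)."""

    @lru_cache(maxsize=None)
    def D(i: int, j: int) -> int:
        if i == 0:                     # base condition: D(0, j) = j
            return j
        if j == 0:                     # base condition: D(i, 0) = i
            return i
        sub = 0 if s1[i - 1] == s2[j - 1] else 2   # equality vs substitution
        return min(
            D(i - 1, j) + 1,           # insertion (the slide's labelling)
            D(i, j - 1) + 1,           # deletion (the slide's labelling)
            D(i - 1, j - 1) + sub,     # substitution or equality
        )

    return D(len(s1), len(s2))

# print(min_edit_distance_recursive("intention", "execution"))   # -> 8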

13 Dynamic Programming A tabular computation of D(n,m) Bottom-up – We compute D(i,j) for small i,j – And compute larger D(i,j) values based on previously computed smaller values The essence of dynamic programming: – Break up the problem into small pieces – Solve the problem for the small bits. – Add the solutions up.

Initial steps Let n be the length of the target and m the length of the source. Create a matrix (table) with n+1 columns and m+1 rows. Initialise cell (row 0, col 0) to D(0,0) = 0.

15 The Edit Distance Table. The source (intention) labels the rows from the bottom up, the target (execution) labels the columns; column 0 has been filled in:

 N  9
 O  8
 I  7
 T  6
 N  5
 E  4
 T  3
 N  2
 I  1
 #  0
    #  E  X  E  C  U  T  I  O  N

Next steps For each column i = 1 to n do: – D(i,0) = D(i-1,0) + insert-cost(i) The cost at col i, row 0 is the cost at the previous column in this row + the cost of inserting character i. For each row j = 1 to m do: – D(0,j) = D(0,j-1) + delete-cost(j) The cost at col 0, row j is the cost at the previous row in this column + the cost of deleting character j.

17 The table after row 0 and column 0 have both been initialised:

 N  9
 O  8
 I  7
 T  6
 N  5
 E  4
 T  3
 N  2
 I  1
 #  0  1  2  3  4  5  6  7  8  9
    #  E  X  E  C  U  T  I  O  N

Next steps For each column i from 1 to n do: For each row j from 1 to m do: set D(i,j) to the minimum of: – The value at the previous column, same row + the cost of inserting the current character of the target – The value at the previous column and previous row + the cost of substituting the current character of the source with that of the target – The value at the current column, previous row + the cost of deleting the current character of the source.
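As a concrete illustration of the tabular computation just described, here is a minimal Python sketch (the function name and cost parameters are mine; as in the slides, i indexes the target along the columns, j indexes the source along the rows, and substitution costs 2):

def min_edit_distance(source: str, target: str,
                      ins_cost: int = 1, del_cost: int = 1, sub_cost: int = 2) -> int:
    """Tabular (dynamic programming) computation of minimum edit distance.
    D[i][j] is the distance between the first i characters of the target
    and the first j characters of the source."""
    n, m = len(target), len(source)
    D = [[0] * (m + 1) for _ in range(n + 1)]   # (n+1) x (m+1) table

    # Initialise column 0 and row 0
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + ins_cost
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + del_cost

    # Fill in each remaining cell from its three neighbours
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if target[i - 1] == source[j - 1] else sub_cost
            D[i][j] = min(
                D[i - 1][j] + ins_cost,     # previous column, same row
                D[i][j - 1] + del_cost,     # same column, previous row
                D[i - 1][j - 1] + sub,      # previous column, previous row
            )
    return D[n][m]

# print(min_edit_distance("intention", "execution"))   # -> 8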

19 Step 1: compare i=1 (E, the first character of the target) with j=1 (I, the first character of the source). Take the minimum of: D(1-1,1)+1 = D(#,I)+1 = 2 (ins); D(1,1-1)+1 = D(E,#)+1 = 2 (del); D(1-1,1-1)+2 = D(#,#)+2 = 2 (subst). The minimum is 2, so cell (1,1) of the table is filled with 2.

20 Step 2: compare i=1 (E in the target) with j=2 (N in the source). Take the minimum of: D(1-1,2)+1 = D(#,N)+1 = 3 (ins); D(1,2-1)+1 = D(E,I)+1 = 3 (del); D(1-1,2-1)+2 = D(#,I)+2 = 3 (subst). The minimum is 3, so cell (1,2) is filled with 3.

21 The remaining cells are filled in the same way. The completed table for intention (rows) and execution (columns) gives the minimum edit distance in the final cell: D(9,9) = 8.

22 Min Edit Distance Note that the result isn’t all that informative – For a pair of strings we get back a single number The min number of edits to get from here to there Like telling someone how far away their destination is, without giving them directions.

23 Alignment An alignment is a 1 to 1 pairing of each element in a sequence with a corresponding element in the other sequence or with a gap...

24 Paths/Alignments Keep a back pointer – Every time we fill a cell add a pointer back to the cell that was used to create it (the min cell that led to it) – To get the sequence of operations follow the backpointer from the final cell – That’s the same as the alignment.
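A minimal sketch of how such backpointers can be stored and then followed from the final cell, extending the min_edit_distance sketch above (the names and the operation labels are mine):

def min_edit_alignment(source: str, target: str,
                       ins_cost: int = 1, del_cost: int = 1, sub_cost: int = 2):
    """Fill the edit distance table, keep a backpointer for every cell,
    then follow the pointers from the final cell to recover the operations."""
    n, m = len(target), len(source)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    ptr = [[None] * (m + 1) for _ in range(n + 1)]     # backpointers

    for i in range(1, n + 1):
        D[i][0], ptr[i][0] = D[i - 1][0] + ins_cost, "ins"
    for j in range(1, m + 1):
        D[0][j], ptr[0][j] = D[0][j - 1] + del_cost, "del"

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = target[i - 1] == source[j - 1]
            D[i][j], ptr[i][j] = min(
                (D[i - 1][j] + ins_cost, "ins"),
                (D[i][j - 1] + del_cost, "del"),
                (D[i - 1][j - 1] + (0 if same else sub_cost), "keep" if same else "sub"),
            )

    # Follow the backpointers from the final cell back to the origin
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        op = ptr[i][j]
        if op == "ins":                      # consume a target character
            ops.append(("ins", target[i - 1]))
            i -= 1
        elif op == "del":                    # consume a source character
            ops.append(("del", source[j - 1]))
            j -= 1
        else:                                # "sub" or "keep": consume one of each
            ops.append((op, source[j - 1], target[i - 1]))
            i -= 1
            j -= 1
    return D[n][m], list(reversed(ops))

# distance, ops = min_edit_alignment("intention", "execution")
# distance == 8; ops is the sequence of edits, i.e. the alignment.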

25 Backtrace. The completed table for intention (rows) and execution (columns), with the backpointer path from the final cell marked; following it back to the origin yields the sequence of edits, i.e. the alignment.

Uses for spellchecking Given a lexicon and an input word to check, Min Edit gives us a way of finding the alternative which is closest to the input word. If the user types graffe, the closest word might be giraffe (edit cost of 1 insertion).
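A minimal sketch of this use, reusing the min_edit_distance function sketched earlier (the toy lexicon is purely illustrative):

def closest_words(word: str, lexicon: list) -> list:
    """Return the lexicon entries with the smallest edit distance to the input word."""
    distances = {w: min_edit_distance(word, w) for w in lexicon}
    best = min(distances.values())
    return [w for w, d in distances.items() if d == best]

# lexicon = ["graph", "giraffe", "gaffe", "geometry"]
# closest_words("graffe", lexicon)   # -> ['giraffe', 'gaffe'] (a tie; see the next slide)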

AN ASIDE ABOUT CONTEXTUAL SPELL CHECKING (Part 2)

The simplest kind of spellchecker. Lexicon: [... graph, giraffe, gaffe, geometry ...]. Input: graffe. Candidates: gaffe (1 deletion), giraffe (1 insertion). The candidates offered to the user are just based on edit distance: the idea is that we minimise the distance from the suggested solution to the user’s input. But sometimes we have ties.

A slight variation. Lexicon: [... graph, giraffe, gaffe, geometry ...]. Input: graffe. Candidates: gaffe (1 deletion), C(gaffe) = 200; giraffe (1 insertion), C(giraffe) = 380. The candidates offered to the user are still based on edit distance, minimising the distance from the suggested solution to the user’s input. But if we have frequencies (or, better, probabilities), we can also nudge the user’s choice in a more likely direction.
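One way to implement this nudge, reusing the closest_words sketch above (the counts for gaffe and giraffe are the toy figures from the slide; the others are invented for illustration):

def best_correction(word: str, lexicon_freq: dict) -> str:
    """Among the candidates at minimum edit distance, prefer the most frequent one."""
    candidates = closest_words(word, list(lexicon_freq))
    return max(candidates, key=lambda w: lexicon_freq[w])

# lexicon_freq = {"graph": 150, "giraffe": 380, "gaffe": 200, "geometry": 90}
# best_correction("graffe", lexicon_freq)   # -> 'giraffe' (the tie is broken by frequency)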

An even nicer variation There are lots of spelling errors that aren’t “typos”: – Actual words, just not the intended words. – Sometimes called “brainos”. How do we determine whether something is indeed a braino?

Contextual spelling correction

Even real-word errors depend on the context.

How it works This kind of speller needs a probabilistic language model. – Needs to provide the probability of a sequence of characters. – Language is modelled as a series of transitions between characters.

Frod or Frodo? F->r->o->d->o->_->B->a->g->g->i->n->s versus F->r->o->d->_->B->a->g->g->i->n->s. Think of each arrow as being “decorated” with the probability of going from the previous character to the following one. We expect the first sequence to be more probable than the second.
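A minimal sketch of scoring a character sequence with such transition probabilities (the corpus file name is a placeholder; a real speller would train on a large corpus and smooth the estimates):

from collections import Counter
import math

def char_bigram_model(text: str):
    """Estimate P(next char | previous char) from raw character bigram counts
    (unsmoothed maximum likelihood estimates)."""
    bigrams = Counter(zip(text, text[1:]))
    unigrams = Counter(text[:-1])

    def prob(prev: str, nxt: str) -> float:
        return bigrams[(prev, nxt)] / unigrams[prev] if unigrams[prev] else 0.0

    return prob

def sequence_log_prob(seq: str, prob) -> float:
    """Sum of log transition probabilities along the arrows of the sequence."""
    total = 0.0
    for prev, nxt in zip(seq, seq[1:]):
        p = prob(prev, nxt)
        if p == 0.0:                 # unseen transition: probability zero
            return float("-inf")
        total += math.log(p)
    return total

# prob = char_bigram_model(open("corpus.txt", encoding="utf-8").read())
# sequence_log_prob("Frodo Baggins", prob) > sequence_log_prob("Frod Baggins", prob)  # expected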

Which means the model now works like this. Lexicon: [... graph, giraffe, gaffe, geometry ...]. Input: I made a graffe last week in class. Candidates: gaffe (1 deletion), C(gaffe) = 200; giraffe (1 insertion), C(giraffe) = 380. We identify the closest existing words to the input word, but we also combine this with character transition probabilities, to give us the more likely solution irrespective of its overall frequency.

Which means the model now works like this. Lexicon: [... graph, giraffe, gaffe, geometry ...]. Input: I made apple desert for lunch. Candidate: dessert (1 insertion). We identify the closest existing words to the input word, but we also combine this with character transition probabilities, to give us the more likely solution irrespective of its overall frequency. This could also work with input words which aren’t typos, but make no sense in context.

INTRODUCTION TO LANGUAGE MODELS MORE GENERALLY (Part 3)

Teaser What’s the next word in: – Please turn your homework... – in? – out? – over? – ancillary?

Example task The word or letter prediction task (Shannon game) Given: – a sequence of words (or letters) -- the history – a set of choices for the next word (or letter) Predict: – the most likely next word (or letter)

Letter-based Language Models. Shannon’s Game. Guess the next letter: W... Wh... Wha... What... What d... What do... What do you think the next letter is? Then guess the next word: What... What do... What do you... What do you think... What do you think the... What do you think the next... What do you think the next word... What do you think the next word is?

Applications of the Shannon game Identifying spelling errors: – Basic idea: some letter sequences are more likely than others. Zero-order approximation – Every letter is equally likely. E.g. in English: P(e) = P(f) =... = P(z) = 1/26 – Assumes that all letters occur independently of each other and have equal frequency. » xfoml rxkhrjffjuj zlpwcwkcy ffjeyvkcqsghyd

Applications of the Shannon game Identifying spelling errors: – Basic idea: some letter sequences are more likely than others. First-order approximation – Every letter has a probability dependent on its frequency (in some corpus). – Still assumes independence of letters from each other. E.g. in English: – ocro hli rgwr nmielwis eu ll nbnesebya th eei alhenhtppa oobttva nah

Applications of the Shannon game Identifying spelling errors: – Basic idea: some letter sequences are more likely than others. Second-order approximation – Every letter has a probability dependent on the previous letter. E.g. In English: on ie antsoutinys are t inctore st bes deamy achin d ilonasive tucoowe at teasonare fuzo tizin andy tobe seace ctisbe

Applications of the Shannon game Identifying spelling errors: – Basic idea: some letter sequences are more likely than others. Third-order approximation – Every letter has a probability dependent on the previous two letters. E.g. In English: in no ist lat whey cratict froure birs grocid pondenome of demonstures of the reptagin is regoactiona of cre
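A minimal sketch of how approximations like these can be generated by sampling from character n-gram counts (the corpus file name is a placeholder; note that in the slides’ terminology an n-th order approximation conditions on the previous n-1 characters, which is the `context` parameter below):

import random
from collections import Counter, defaultdict

def sample_approximation(text: str, context: int, length: int = 60) -> str:
    """Generate text in which each character depends on the previous `context` characters.
    context=0 reproduces the first-order approximation (letter frequencies only),
    context=1 the second-order approximation, context=2 the third-order one."""
    counts = defaultdict(Counter)
    for i in range(context, len(text)):
        counts[text[i - context:i]][text[i]] += 1

    out = text[:context]                        # seed with the first characters
    for _ in range(length):
        history = out[len(out) - context:] if context else ""
        following = counts.get(history)
        if not following:                       # unseen history: stop early
            break
        chars, weights = zip(*following.items())
        out += random.choices(chars, weights=weights)[0]
    return out

# corpus = open("english_corpus.txt", encoding="utf-8").read().lower()
# print(sample_approximation(corpus, context=2))   # a third-order approximation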

Applications of the Shannon Game Language identification: – Sequences of characters (or syllables) have different frequencies/probabilities in different languages. High-frequency trigrams for different languages: – English: THE, ING, ENT, ION – German: EIN, ICH, DEN, DER – French: ENT, QUE, LES, ION – Italian: CHE, ERE, ZIO, DEL – Spanish: QUE, EST, ARA, ADO Languages in the same family tend to be more similar to each other than to languages in different families.
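A minimal sketch of language identification by comparing character trigram profiles (the toy profiles reuse the trigrams listed above purely for illustration; a real system would build full profiles from corpora in each language):

from collections import Counter

def trigram_profile(text: str, top_k: int = 300) -> set:
    """The most frequent character trigrams of a text, as a set."""
    trigrams = Counter(text[i:i + 3] for i in range(len(text) - 2))
    return {t for t, _ in trigrams.most_common(top_k)}

def identify_language(text: str, profiles: dict) -> str:
    """Pick the language whose trigram profile overlaps most with the input's."""
    text_trigrams = trigram_profile(text.upper())
    return max(profiles, key=lambda lang: len(profiles[lang] & text_trigrams))

# profiles = {"English": {"THE", "ING", "ENT", "ION"},
#             "German":  {"EIN", "ICH", "DEN", "DER"},
#             "French":  {"ENT", "QUE", "LES", "ION"}}
# identify_language("The working of the engine", profiles)   # -> 'English'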

Applications of the Shannon game with words Automatic speech recognition: – ASR systems get a noisy input signal and need to decode it to identify the words it corresponds to. – There could be many possible sequences of words corresponding to the input signal. Spoken input: “He ate two apples”. Candidate transcriptions: – He eight too apples – He ate too apples – He eight to apples – He ate two apples Which is the most probable sequence?

Applications of the Shannon Game with words Context-sensitive spelling correction: – Many spelling errors are real words: He walked for miles in the dessert. (intended: desert) – Identifying such errors requires a global estimate of the probability of a sentence.

N-gram models These are models that predict the next (n-th) word (or character) from a sequence of n-1 words (or characters). Simple example with bigrams and corpus frequencies: – he 25 – he ate 12 – he eight 1 – ate to 23 – ate too 26 – ate two 15 – eight to 3 – two apples 9 – to apples 0 – ... We can use these counts to compute the probability of he eight to apples vs he ate two apples, etc.
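A minimal sketch of turning such counts into a rough, unsmoothed sentence score by multiplying bigram probabilities (the helper names and the unigram totals are mine; the bigram counts are the toy figures from the slide):

def bigram_prob(prev: str, word: str, bigram_counts: dict, unigram_counts: dict) -> float:
    """Unsmoothed maximum likelihood estimate of P(word | prev)."""
    if unigram_counts.get(prev, 0) == 0:
        return 0.0
    return bigram_counts.get((prev, word), 0) / unigram_counts[prev]

def sentence_prob(words: list, bigram_counts: dict, unigram_counts: dict) -> float:
    """Product of bigram probabilities over the word sequence."""
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        prob *= bigram_prob(prev, word, bigram_counts, unigram_counts)
    return prob

# bigrams = {("he", "ate"): 12, ("he", "eight"): 1, ("ate", "two"): 15,
#            ("eight", "to"): 3, ("two", "apples"): 9, ("to", "apples"): 0}
# unigrams = {"he": 25, "ate": 60, "eight": 10, "two": 40, "to": 300}
# sentence_prob("he ate two apples".split(), bigrams, unigrams)    # > 0
# sentence_prob("he eight to apples".split(), bigrams, unigrams)   # 0.0: 'to apples' is unseen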

N-gram models We’ll talk about n-gram models and Markov assumptions in more detail next week...