Natural Language Processing


Natural Language Processing, Lecture 5 (1/27/2015), Susan W. Brown
(Slides adapted from Speech and Language Processing by Jurafsky and Martin)

Today
- Big picture: what do you need to know? What are finite state methods good for?
- Review morphology
- Review finite state methods: how this fits with morphology; epsilon transitions
- Begin N-grams

Words
- Finite-state methods are particularly useful in dealing with large lexicons, that is, big bunches of words (often infinitely large bunches).
- Many devices, some with limited memory resources, need access to large lists of words, and they need to perform fairly sophisticated tasks with those lists.

Word recognition/generation
- Recognize surface forms: spell checking, speech recognition
- Transform surface forms to a more abstract representation: parsing (morphological, syntactic) → input to IR, MT, or reasoning systems
- Generate surface forms from an abstract representation: summarization, question answering

FSAs and FSTs: formal definitions
- States, alphabets, transitions
- Closure: under union, inversion, composition
- Under intersection, complementation?
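
To make the formal definition concrete, here is a minimal Python sketch (not from the slides) of a deterministic FSA as states, an alphabet, a start state, accept states, and a transition function; the sheep-talk language and state names are made up for the example.

```python
# Minimal sketch of a deterministic FSA: (states, alphabet, start, accept, delta).
# The machine recognizes the toy "sheep talk" language baa+! ; names are illustrative.

SHEEP_FSA = {
    "states": {0, 1, 2, 3, 4},
    "alphabet": {"b", "a", "!"},
    "start": 0,
    "accept": {4},
    # transitions: (state, symbol) -> next state
    "delta": {(0, "b"): 1, (1, "a"): 2, (2, "a"): 3, (3, "a"): 3, (3, "!"): 4},
}

def recognize(fsa, string):
    """Return True if the deterministic FSA accepts the string."""
    state = fsa["start"]
    for symbol in string:
        if (state, symbol) not in fsa["delta"]:
            return False          # no transition: reject
        state = fsa["delta"][(state, symbol)]
    return state in fsa["accept"]

print(recognize(SHEEP_FSA, "baaa!"))  # True
print(recognize(SHEEP_FSA, "ba!"))    # False
```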

English Morphology
- Morphology is the study of the ways that words are built up from smaller units called morphemes, the minimal meaning-bearing units in a language.
- We can usefully divide morphemes into two classes:
  - Stems: the core meaning-bearing units
  - Affixes: bits and pieces that adhere to stems to change their meanings and grammatical functions

English Morphology
- We can further divide morphology into two broad classes:
  - Inflectional
  - Derivational

Word Classes
- By word class, we have in mind familiar notions like noun and verb, also referred to as parts of speech and lexical categories.
- We'll go into the gory details in Chapter 5.
- Right now we're concerned with word classes because the way that stems and affixes combine is based to a large degree on the word class of the stem.

Inflectional Morphology
- Inflectional morphology concerns the combination of stems and affixes where the resulting word:
  - has the same word class as the original, and
  - serves a grammatical/semantic purpose that is different from the original, but nevertheless transparently related to it.
- Example: "walk" + "s" = "walks"

Inflection in English
- Nouns are simple: markers for plural and possessive.
- Verbs are only slightly more complex: markers appropriate to the tense of the verb.
- That's pretty much it. Other languages can be quite a bit more complex.
- An implication of this is that hacks (approaches) that work in English will not work for many other languages.

Regulars and Irregulars
- Things are complicated by the fact that some words misbehave (refuse to follow the rules):
  - mouse/mice, goose/geese, ox/oxen
  - go/went, fly/flew, catch/caught
- The terms regular and irregular are used to refer to words that follow the rules and those that don't.

Regular and Irregular Verbs
- Regulars: walk, walks, walking, walked, walked
- Irregulars:
  - eat, eats, eating, ate, eaten
  - catch, catches, catching, caught, caught
  - cut, cuts, cutting, cut, cut

Inflectional Morphology
- So inflectional morphology in English is fairly straightforward, but it is somewhat complicated by the fact that there are irregularities.

Derivational Morphology
- Derivational morphology is the messy stuff that no one ever taught you.
- In English it is characterized by:
  - quasi-systematicity
  - irregular meaning change
  - changes of word class

Derivational Examples: Verbs and Adjectives to Nouns
- -ation: computerize → computerization
- -ee: appoint → appointee
- -er: kill → killer
- -ness: fuzzy → fuzziness

Derivational Examples: Nouns and Verbs to Adjectives
- -al: computation → computational
- -able: embrace → embraceable
- -less: clue → clueless

Example: Compute
- Many paths are possible. Starting with compute:
  - computer → computerize → computerization
  - computer → computerize → computerizable
- But not all paths/operations are equally good (allowable?). Starting with clue:
  - clue → clueless
  - clue → ?clueful
  - clue → *clueable

Morphology and FSAs
- We would like to use the machinery provided by FSAs to capture these facts about morphology:
  - accept strings that are in the language
  - reject strings that are not
- And do so in a way that doesn't require us to, in effect, list all the forms of all the words in the language.
  - Even in English this is inefficient, and in other languages it is impossible.

Start Simple
- Regular singular nouns are OK as is: they are in the language.
- Regular plural nouns have an -s on the end, so they're also in the language.
- Irregulars are OK as is.

Simple Rules
(Figure: an FSA for English nominal inflection.)

Now Plug in the Words Spelled Out
- Replace the class names like "reg-noun" with FSAs that recognize all the words in that class.
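
As a rough sketch of what "plugging in the words" might look like in code, the class names below are expanded into tiny illustrative word lists; a real system would plug in full sublexicons, and spelling changes such as fox/foxes are deliberately left to the FST spelling rules introduced later.

```python
# Sketch of the nominal-inflection automaton with word classes expanded into
# small, illustrative word lists.

REG_NOUN = {"fox", "cat", "dog"}
IRREG_SG_NOUN = {"goose", "mouse"}
IRREG_PL_NOUN = {"geese", "mice"}

def accept_noun(word):
    """Accept regular singulars, regular plurals in -s, and listed irregulars.
    Note: spelling changes (fox -> foxes) are handled later by FST spelling rules."""
    if word in REG_NOUN or word in IRREG_SG_NOUN or word in IRREG_PL_NOUN:
        return True
    if word.endswith("s") and word[:-1] in REG_NOUN:   # reg-noun + plural -s
        return True
    return False

for w in ["cat", "cats", "geese", "gooses"]:
    print(w, accept_noun(w))   # True, True, True, False
```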

An epsilon digression
- We can always create an equivalent machine with no epsilon transitions.
- Intuitive and convenient notation.

Another epsilon example: Union (Or)
(Figure: two FSAs combined by epsilon transitions from a new start state.)

Derivational Rules
- If everything is an accept state, how do things ever get rejected?

Lexicons
- So the big picture is to store a lexicon (the list of words you care about) as an FSA.
- The base lexicon is embedded in larger automata that capture the inflectional and derivational morphology of the language.
- So what? Well, the simplest thing you can do with such an FSA is spell checking:
  - if the machine rejects, the word isn't in the language
  - without listing every form of every word
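
One hedged way to picture "lexicon as FSA" in code is a character trie, which is just a deterministic automaton whose accept states mark word ends; the word list below is illustrative, and rejection doubles as a naive spell check.

```python
# Sketch: a lexicon stored as a character trie, i.e. a deterministic FSA whose
# accept states mark word ends.

def build_trie(words):
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["$"] = True          # "$" marks an accept state
    return root

def in_lexicon(trie, word):
    node = trie
    for ch in word:
        if ch not in node:
            return False          # machine rejects: not in the language
        node = node[ch]
    return "$" in node

LEXICON = build_trie(["cat", "cats", "goose", "geese"])   # illustrative word list
print(in_lexicon(LEXICON, "geese"))   # True
print(in_lexicon(LEXICON, "gooses"))  # False -> flag as a possible misspelling
```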

Parsing/Generation vs. Recognition
- We can now run strings through these machines to recognize strings in the language.
- But recognition is usually not quite what we need:
  - often, if we find some string in the language, we might like to assign a structure to it (parsing)
  - or we might start with some structure and want to produce a surface form for it (production/generation)
- For that we'll move to finite state transducers: add a second tape that can be written to.

Finite State Transducers
- The simple story: add another tape, and add extra symbols to the transitions.
- On one tape we read "cats"; on the other we write "cat +N +PL".
- +N and +PL are elements in the alphabet for one tape that represent underlying linguistic features.
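
A minimal sketch of the two-tape idea (assuming a hand-built toy lexicon): each arc carries an input:output pair, an empty input string stands in for epsilon, and a depth-first search reads the surface tape while writing the lexical tape.

```python
# Minimal sketch of a finite-state transducer with arcs labeled input:output.
# "" plays the role of epsilon (read nothing). States and entries are illustrative.

# arcs: state -> list of (input, output, next_state)
ARCS = {
    0: [("cat", "cat", 1), ("goose", "goose", 1), ("geese", "goose +N +PL", 3)],
    1: [("", " +N", 2)],
    2: [("s", " +PL", 3), ("", " +SG", 3)],
}
ACCEPT = {3}

def transduce(surface, state=0, consumed=0, output=""):
    """Depth-first search over the FST; yield every lexical-tape string that
    can be written while the surface tape is read in full."""
    if consumed == len(surface) and state in ACCEPT:
        yield output
    for inp, out, nxt in ARCS.get(state, []):
        if surface.startswith(inp, consumed):
            yield from transduce(surface, nxt, consumed + len(inp), output + out)

print(list(transduce("cats")))   # ['cat +N +PL']
print(list(transduce("goose")))  # ['goose +N +SG']
print(list(transduce("geese")))  # ['goose +N +PL']
```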

FSTs
(Figure: a transducer whose arcs are labeled with input:output pairs.)

The Gory Details
- Of course, it's not as easy as "cat +N +PL" <-> "cats".
- As we saw earlier, there are geese, mice, and oxen.
- But there is also a whole host of spelling/pronunciation changes that go along with inflectional changes:
  - cats vs. dogs ('s' sound vs. 'z' sound)
  - fox and foxes (that 'e' got inserted)
  - doubling consonants (swim, swimming), adding k's (picnic, picnicked), deleting e's, ...

Multi-Tape Machines
- To deal with these complications, we will add even more tapes and use the output of one tape machine as the input to the next.
- So, to handle irregular spelling changes, we will add intermediate tapes with intermediate symbols.

Multi-Level Tape Machines
- We use one machine (M1) to transduce between the lexical and the intermediate level, and another (M2) to handle the spelling changes down to the surface tape.
- M1 knows about the particulars of the lexicon.
- M2 knows about weird English spelling rules.

Lexical to Intermediate Level
(Figure: the M1 transducer mapping lexical forms to intermediate forms such as fox^s#.)

Intermediate to Surface
- The "add an e" English spelling rule, as in: fox^s# <-> foxes#
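
A sketch of the e-insertion rule as a single rewrite over the intermediate tape, using the lecture's ^ (morpheme boundary) and # (word boundary) notation; the set of triggering contexts is simplified.

```python
import re

def e_insertion(intermediate):
    """Insert 'e' between a sibilant-final stem and the -s suffix:
    fox^s# -> foxes#. Simplified: only x, s, z, ch, sh trigger the rule."""
    out = re.sub(r"(x|s|z|ch|sh)\^s#", r"\1es#", intermediate)
    # remaining morpheme boundaries are simply erased on the surface tape
    return out.replace("^", "")

# The rule must also do the right thing for inputs it does not apply to:
print(e_insertion("fox^s#"))    # foxes#
print(e_insertion("bird^s#"))   # birds#
print(e_insertion("cat#"))      # cat#
```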

Foxes
(Figures: step-by-step trace of fox +N +PL through the cascade, from the lexical tape to fox^s# to foxes#.)

Note
- A key feature of this lower machine is that it has to do the right thing for inputs to which it doesn't apply. So:
  - fox^s# → foxes
  - but bird^s# → birds
  - and cat# → cat

Cascading FSTs
- E-insertion rule
- Possessive rule: add 's or s'
- We want to send all our words through all the FSTs:
  - cat+N +SG → cat
  - cat+N +PL → cat^s
  - cat+N +SG +Poss → cat^'s
  - cat+N +PL +Poss → cat^s^'s
- FSTs are closed under composition.
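
A sketch of the cascade, with each level approximated by an ordinary function rather than a true transducer: M1 is a small illustrative lexical-to-intermediate table, and M2 applies the simplified spelling rule and erases the boundary symbols.

```python
import re

# Level 1 (M1): lexical tape -> intermediate tape, from an illustrative lexicon.
LEXICAL_TO_INTERMEDIATE = {
    "cat+N +SG": "cat#",
    "cat+N +PL": "cat^s#",
    "cat+N +SG +Poss": "cat^'s#",
    "fox+N +PL": "fox^s#",
}

# Level 2 (M2): intermediate tape -> surface tape (spelling rules, simplified).
def spelling_rules(intermediate):
    out = re.sub(r"(x|s|z|ch|sh)\^s#", r"\1es#", intermediate)  # e-insertion
    return out.replace("^", "").replace("#", "")

def generate(lexical):
    """Cascade M1 then M2: the output tape of one machine feeds the next."""
    return spelling_rules(LEXICAL_TO_INTERMEDIATE[lexical])

print(generate("cat+N +PL"))        # cats
print(generate("fox+N +PL"))        # foxes
print(generate("cat+N +SG +Poss"))  # cat's
```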

FST determinism
- FSTs are not necessarily deterministic, and the search algorithms for non-deterministic FSTs are inefficient.
- Sequential FSTs: deterministic, more efficient, no ambiguity.
- P-subsequential FSTs: allow some ambiguity.

HW 2 questions? (Homework 1 feedback on Thursday)

New Topic
- Statistical language modeling (Chapter 4)

Word Prediction
- Guess the next word: "So I notice three guys standing on the ???"
- What are some of the knowledge sources you used to come up with those predictions?

Word Prediction
- We can formalize this task using what are called N-gram models.
- N-grams are token sequences of length N ("-gram" means "written").
- Our earlier example contains the following 2-grams (aka bigrams): (So I), (I notice), (notice three), (three guys), (guys standing), (standing on), (on the)
- Given knowledge of counts of N-grams such as these, we can guess likely next words in a sequence.
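
A small sketch that pulls the 2-grams out of the example word sequence.

```python
def ngrams(tokens, n):
    """All token sequences of length n, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "So I notice three guys standing on the".split()
print(ngrams(tokens, 2))
# [('So', 'I'), ('I', 'notice'), ('notice', 'three'), ('three', 'guys'),
#  ('guys', 'standing'), ('standing', 'on'), ('on', 'the')]
```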

N-Gram Models
- More formally, we can use knowledge of the counts of N-grams to assess the conditional probability of candidate words as the next word in a sequence.
- Or we can use them to assess the probability of an entire sequence of words.
- Pretty much the same thing, as we'll see.

Applications
- It turns out that being able to predict the next word (or any linguistic unit) in a sequence is an extremely useful thing to be able to do.
- As we'll see, it lies at the core of the following applications:
  - automatic speech recognition
  - handwriting and character recognition
  - spelling correction
  - machine translation
  - and many more

Counting
- Simple counting lies at the core of any probabilistic approach, so let's first take a look at what we're counting.
- "He stepped out into the hall, was delighted to encounter a water brother."
  - 13 tokens, 15 if we include "," and "." as separate tokens.
- Assuming we include the comma and period as tokens, how many bigrams are there?

Counting
- Not always that simple: "I do uh main- mainly business data processing"
- Spoken language poses various challenges:
  - Should we count "uh" and other fillers as tokens?
  - What about the repetition of "mainly"? Should such do-overs count twice or just once?
- The answers depend on the application:
  - If we're focusing on something like ASR to support indexing for search, then "uh" isn't helpful (it's not likely to occur as a query).
  - But filled pauses are very useful in dialog management, so we might want them there.
- Tokenization of text raises the same kinds of issues.

Counting: Types and Tokens
- How about: "They picnicked by the pool, then lay back on the grass and looked at the stars."
  - 18 tokens (again counting punctuation)
- But we might also note that "the" is used 3 times, so there are only 16 unique types (as opposed to tokens).
- Going forward, we'll have occasion to count both types and tokens of both words and N-grams.
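
A sketch of the type/token count for this sentence; the regex tokenizer is crude and only for illustration.

```python
import re

text = "They picnicked by the pool, then lay back on the grass and looked at the stars."
# crude tokenizer: words plus punctuation as separate tokens (illustrative only)
tokens = re.findall(r"\w+|[^\w\s]", text)
types = set(tokens)
print(len(tokens), "tokens")   # 18
print(len(types), "types")     # 16 ("the" occurs 3 times)
```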

Counting: Corpora
- What happens when we look at large bodies of text instead of single utterances?
- Google Web crawl: 1,024,908,267,229 English tokens of Web text, with 13,588,391 wordform types.
- That seems like a lot of types. After all, even large dictionaries of English have only around 500k types. Why so many here?
  - numbers
  - misspellings
  - names
  - acronyms
  - etc.

Language Modeling
- Now that we know how to count, back to word prediction.
- We can model the word prediction task as the ability to assess the conditional probability of a word given the previous words in the sequence: P(w_n | w_1, w_2, ..., w_{n-1})
- We'll call a statistical model that can assess this a language model.

Language Modeling
- How might we go about calculating such a conditional probability? One way is to use the definition of conditional probability and look for counts.
- So to get P(the | its water is so transparent that), by definition that's:
  P(its water is so transparent that the) / P(its water is so transparent that)
- We can get each of those from counts in a large corpus.

Very Easy Estimate
- How to estimate P(the | its water is so transparent that)?
  P(the | its water is so transparent that) = Count(its water is so transparent that the) / Count(its water is so transparent that)

Very Easy Estimate
- According to Google, those counts are 12,000 and 19,000, so the conditional probability of interest is:
  P(the | its water is so transparent that) ≈ 12,000 / 19,000 ≈ 0.63

Language Modeling
- Unfortunately, for most sequences and for most text collections, we won't get good estimates from this method. What we're likely to get is 0. Or worse, 0/0.
- Clearly, we'll have to be a little more clever:
  - first use the chain rule of probability
  - and then apply a particularly useful independence assumption

The Chain Rule
- Recall the definition of conditional probability: P(B|A) = P(A,B) / P(A)
- Rewriting: P(A,B) = P(A) P(B|A)
- For sequences: P(A,B,C,D) = P(A) P(B|A) P(C|A,B) P(D|A,B,C)
- In general: P(x_1, x_2, x_3, ..., x_n) = P(x_1) P(x_2|x_1) P(x_3|x_1,x_2) ... P(x_n|x_1,...,x_{n-1})

The Chain Rule
- P(its water was so transparent) = P(its) * P(water | its) * P(was | its water) * P(so | its water was) * P(transparent | its water was so)

Unfortunately
- There are still a lot of possible sequences in there.
- In general, we'll never be able to get enough data to compute the statistics for those longer prefixes.
- Same problem we had for the strings themselves.

Independence Assumption
- Make the simplifying assumption: P(lizard | the, other, day, I, was, walking, along, and, saw, a) = P(lizard | a)
- Or maybe: P(lizard | the, other, day, I, was, walking, along, and, saw, a) = P(lizard | saw, a)
- That is, the probability in question is to some degree independent of its earlier history.

Independence Assumption
- This particular kind of independence assumption is called a Markov assumption, after the Russian mathematician Andrei Markov.

Markov Assumption
- So for each component in the product, replace it with the approximation (assuming a prefix of N-1 words):
  P(w_n | w_1, ..., w_{n-1}) ≈ P(w_n | w_{n-N+1}, ..., w_{n-1})
- Bigram version:
  P(w_n | w_1, ..., w_{n-1}) ≈ P(w_n | w_{n-1})

Estimating Bigram Probabilities
- The Maximum Likelihood Estimate (MLE):
  P(w_n | w_{n-1}) = C(w_{n-1} w_n) / C(w_{n-1})

An Example
- Mini-corpus:
  <s> I am Sam </s>
  <s> Sam I am </s>
  <s> I do not like green eggs and ham </s>
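
A sketch that collects unigram and bigram counts from the three sentences and applies the MLE formula; for instance, it reproduces P(I | <s>) = 2/3 and P(Sam | <s>) = 1/3.

```python
from collections import Counter

corpus = [
    "<s> I am Sam </s>",
    "<s> Sam I am </s>",
    "<s> I do not like green eggs and ham </s>",
]

unigrams = Counter()
bigrams = Counter()
for sentence in corpus:
    tokens = sentence.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def p_bigram(w, prev):
    """MLE: P(w | prev) = C(prev, w) / C(prev)."""
    return bigrams[(prev, w)] / unigrams[prev]

print(p_bigram("I", "<s>"))     # 2/3 ~ 0.67
print(p_bigram("Sam", "<s>"))   # 1/3 ~ 0.33
print(p_bigram("am", "I"))      # 2/3 ~ 0.67
print(p_bigram("</s>", "Sam"))  # 1/2 = 0.5
```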