LING 438/538 Computational Linguistics Sandiway Fong Lecture 20: 11/2.

Today’s Topics
– Conclude the n-gram section (Chapter 6)
– Start the Part-of-Speech (POS) Tagging section (Chapter 8)

N-grams for Spelling Correction
Spelling errors (see Chapter 5; Kukich, 1992):
– Non-word error detection (easiest): graffe (giraffe)
– Isolated-word (context-free) error correction: graffe (giraffe, …), graffed (gaffed, …); by definition, this cannot correct an error word that happens to be a valid word
– Context-dependent error detection and correction (hardest): your an idiot → you’re an idiot (Microsoft Word corrects this by default)

N-grams for Spelling Correction
Context-sensitive spelling correction: real-word error detection, when the mistyped word happens to be a real word
– 15% of errors: Peterson (1986)
– 25%–40% of errors: Kukich (1992)
Examples (local):
– was conducted mainly be John Black
– leaving in about fifteen minuets
– all with continue smoothly
Examples (global):
– Won’t they heave if next Monday at that time?
– the system has been operating system with all four units on-line

N-grams for Spelling Correction
Given a word sequence:
– W = w1 … wk … wn
Suppose wk is misspelled; the candidate corrections wk^1, wk^2, etc. can be generated via edit-distance operations.
Find the most likely sequence:
– w1 … wk^j … wn, i.e. maximize p(w1 … wk^j … wn)
Chain Rule:
– p(w1 w2 w3 … wn) = p(w1) p(w2|w1) p(w3|w1 w2) … p(wn|w1 … wn-1)
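The edit-distance candidate generation mentioned above can be sketched as the standard one-edit enumeration (deletes, transposes, substitutes, inserts); the dictionary here is a toy stand-in, not part of the lecture:

```python
# All words within one edit operation of the input; a sketch —
# real systems intersect the result with a large dictionary.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def edits1(word):
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {l + r[1:] for l, r in splits if r}
    transposes = {l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1}
    substitutes = {l + c + r[1:] for l, r in splits if r for c in ALPHABET}
    inserts = {l + c + r for l, r in splits for c in ALPHABET}
    return deletes | transposes | substitutes | inserts

dictionary = {"giraffe", "gaffe", "graft"}  # toy dictionary
print(sorted(edits1("graffe") & dictionary))  # ['gaffe', 'giraffe']
```

Both corrections from the slide fall out: "gaffe" is one deletion away and "giraffe" one insertion away from "graffe".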

N-grams for Spelling Correction
Use an n-gram language model for p(W):
– Bigram approximation: p(w1 w2 w3 … wn) ≈ p(w1) p(w2|w1) p(w3|w2) … p(wn|wn-1)
– Trigram approximation: p(w1 w2 w3 … wn) ≈ p(w1) p(w2|w1) p(w3|w1 w2) p(w4|w2 w3) … p(wn|wn-2 wn-1)
Problem:
– we need to estimate n-grams containing the misspelled items
– where to find data? acute sparse-data problem
Possible solution (“class-based n-grams”):
– use a part-of-speech n-gram model: more data per class
– p(c1 c2 c3 … cn) ≈ p(c1) p(c2|c1) p(c3|c2) … p(cn|cn-1) (bigram)
– ci = category label (N, V, A, P, Adv, Conj, etc.)
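As a sketch, candidate sentences can be ranked under a bigram model in log space; the counts below are invented toy data (add-one smoothed to handle unseen bigrams), not estimates from a real corpus:

```python
import math

# Toy bigram/unigram counts; in practice these come from a large corpus.
bigram_counts = {
    ("in", "about"): 8, ("about", "fifteen"): 5,
    ("fifteen", "minutes"): 6, ("fifteen", "minuets"): 0,
    ("minutes", "</s>"): 6, ("minuets", "</s>"): 1,
}
unigram_counts = {"in": 20, "about": 10, "fifteen": 7,
                  "minutes": 6, "minuets": 1, "</s>": 30}
V = len(unigram_counts)  # vocabulary size for add-one smoothing

def bigram_logprob(w1, w2):
    # Add-one smoothing: unseen bigrams still get nonzero probability.
    return math.log((bigram_counts.get((w1, w2), 0) + 1) /
                    (unigram_counts.get(w1, 0) + V))

def score(sentence):
    words = sentence.split() + ["</s>"]
    return sum(bigram_logprob(a, b) for a, b in zip(words, words[1:]))

# The real-word error "minuets" should score below the correction.
print(score("in about fifteen minutes") > score("in about fifteen minuets"))  # True
```

This is exactly the case a dictionary lookup cannot catch: "minuets" is a valid word, and only the sequence probability reveals the error.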

Entropy
Uncertainty measure (Shannon):
– given a random variable x with r outcomes, p_i = probability that the event is i
– entropy = H(x) = Shannon uncertainty = -Σ (i = 1..r) p_i lg p_i, where lg = log2 (log base 2)
– Biased coin (r = 2): -0.8 lg 0.8 - 0.2 lg 0.2 ≈ 0.722
– Unbiased coin: -2 × 0.5 lg 0.5 = 1
Perplexity:
– (weighted) average branching factor: the average number of choices a random variable has to make
– Formula: 2^H, directly related to the entropy value H
Examples:
– Biased coin: 2^0.722 ≈ 1.65
– Unbiased coin: 2^1 = 2
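The coin numbers above can be checked directly from the slide’s definitions (plain Python, no extra assumptions):

```python
import math

def entropy(probs):
    """Shannon entropy H = -sum p_i * log2 p_i, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def perplexity(probs):
    """Perplexity = 2**H, the weighted average branching factor."""
    return 2 ** entropy(probs)

print(round(entropy([0.5, 0.5]), 4))     # unbiased coin: 1.0 bit
print(round(entropy([0.8, 0.2]), 4))     # biased coin: 0.7219 bits
print(round(perplexity([0.5, 0.5]), 2))  # 2.0
print(round(perplexity([0.8, 0.2]), 2))  # 1.65
```

Note the biased coin is less uncertain than the fair one (H < 1), so its perplexity is below 2: fewer effective choices per flip.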

Entropy and Word Sequences
Given a word sequence W = w1 … wn.
Entropy for word sequences of length n in language L:
– H(w1 … wn) = -Σ p(w1 … wn) log p(w1 … wn), summed over all sequences of length n in L
Entropy rate for word sequences of length n:
– (1/n) H(w1 … wn) = -(1/n) Σ p(w1 … wn) log p(w1 … wn)
Entropy rate of the language:
– H(L) = lim n→∞ -(1/n) Σ p(w1 … wn) log p(w1 … wn), where n is the number of words in the sequence
Shannon-McMillan-Breiman theorem:
– H(L) = lim n→∞ -(1/n) log p(w1 … wn)
– for sufficiently large n, we can take a single sequence instead of summing over all possible w1 … wn: a long sequence contains many shorter sequences

Cross-Entropy
Used to evaluate competing models: compare probabilistic approximations m1, m2, … of the true distribution p.
Compute the cross-entropy of mi on p:
– H(p, mi) = lim n→∞ -(1/n) Σ p(w1 … wn) log mi(w1 … wn)
Shannon-McMillan-Breiman version:
– H(p, mi) = lim n→∞ -(1/n) log mi(w1 … wn)
Properties:
– H(p) ≤ H(p, mi): the true entropy is a lower bound
– the mi with the lowest H(p, mi) is the more accurate model
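A minimal sketch of the Shannon-McMillan-Breiman estimate: score one long test sequence under a model m and divide by its length. The unigram model here is hypothetical, purely for illustration:

```python
import math

def cross_entropy(model_logprob, words):
    # H(p, m) ≈ -(1/n) log2 m(w1 ... wn), estimated on a single test sequence
    return -model_logprob(words) / len(words)

# Hypothetical unigram model m (illustrative probabilities).
unigram = {"the": 0.5, "cat": 0.25, "sat": 0.25}

def logprob(words):
    return sum(math.log2(unigram[w]) for w in words)

H = cross_entropy(logprob, ["the", "cat"])
print(H)       # 1.5 bits per word
print(2 ** H)  # perplexity ≈ 2.83
```

In practice the test sequence is millions of words, and competing models m1, m2 are compared by which yields the lower H(p, mi) on the same held-out data.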

Perplexity of Language Models [see p. 228, section 6.7]
n-gram models:
– corpus: 38 million words from the WSJ (Wall Street Journal)
– compute the perplexity of each model on a test set of 1.5 million words
– perplexity defined as 2^H(p, mi)
Results:
– unigram: 962
– bigram: 170
– trigram: 109
The lower the perplexity, the more closely the trained model follows the data.

Entropy of English
Shannon Experiment:
– given a (hidden) sequence of characters, ask a speaker of the language to predict what the next character might be
– record the number of guesses taken to get the right character
– H(English) = -Σ p(guess = character) log p(guess = character), where guess ranges over all n = 27 characters (26 letters and space)
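One crude way to turn the guessing-game records into an entropy estimate is to take the entropy of the guess-count distribution itself; the counts below are illustrative, not Shannon’s actual data:

```python
import math

# Hypothetical tally of "number of guesses needed per character"
# from a guessing-game session (invented counts for illustration).
guess_counts = {1: 790, 2: 80, 3: 40, 4: 40, 5: 50}
total = sum(guess_counts.values())

# Entropy of the guess-rank distribution, in bits per character.
H = -sum((c / total) * math.log2(c / total)
         for c in guess_counts.values())
print(round(H, 2))  # 1.15
```

With a distribution this skewed toward first-guess successes, the estimate lands near Shannon’s recorded figure of about 1.3 bits per character.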

Entropy of English
Shannon Experiment:
– 1.3 bits (recorded)
– URL:
– “We should also mention that in a classroom of about 60 students, with everybody venturing guesses for each next letter, we consistently obtained a value of about 1.6 bits for the estimate of the entropy.”

Entropy of English
Word-based method:
– train a very good stochastic model m of English on a large corpus
– use it to assign a log-probability to a very long sequence
– Shannon-McMillan-Breiman formula: H(p, m) = lim n→∞ -(1/n) log m(w1 … wn)
– H(English) ≤ H(p, m)
Result (Brown et al.): 1.75 bits per character
– 583-million-word corpus used to train model m
– test sequence was the Brown Corpus (1 million words)

Next Topic
Chapter 8: Word Classes and Part-of-Speech Tagging

Parts-of-Speech
Divide words into classes based on grammatical function.
Nouns (open class: unlimited set)
– referential items (denoting objects/concepts, etc.)
  – proper nouns: John
  – pronouns: he, him, she, her, it
  – anaphors: himself, herself (reflexives)
  – common nouns: dog, dogs, water
    » number: dog (singular), dogs (plural)
    » count-mass distinction: many dogs, *many waters
  – eventive nouns: dismissal, concert, playback, destruction (deverbal)
– nonreferential items
  – it, as in “it is important to study”
  – there, as in “there seems to be a problem”
  – some languages don’t have these, e.g. Japanese
– open class: factoid, bush-ism

Parts-of-Speech
Pronouns, by frequency (figure 8.4):
1. it  2. I  3. he  4. you  5. his  6. they  7. this  8. that  9. she  10. her  11. we  12. all  13. which  14. their  15. what

Parts-of-Speech
Divide words into classes based on grammatical function.
Verbs: auxiliaries (closed class: fixed set)
– be (passive, progressive)
– have (pluperfect tense)
– do (What did John buy?, Did Mary win?)
– modals: can, could, would, will, may
– irregular forms: is, was, were, does, did
(figure 8.5)

Parts-of-Speech
Divide words into classes based on grammatical function.
Verbs (open class: unlimited set)
– Intransitive
  – unaccusatives: arrive (achievement)
  – unergatives: run, jog (activities)
– Transitive
  – actions: hit (semelfactive: hit the ball for an hour)
  – actions: eat, destroy (accomplishment)
  – psych verbs: frighten (x frightens y), fear (y fears x)
– Ditransitive
  – put (x put y on z, *x put y)
  – give (x gave y z, *x gave y, x gave z to y)
  – load (x loaded y (on z), x loaded z (with y))
– open class: reaganize, fax

Parts-of-Speech
Divide words into classes based on grammatical function.
Adjectives (open class: unlimited set): modify nouns
– black, white, open, closed, sick, well
– attributive: black (black car, the car is black), main (main street, *the street is main), atomic
– predicative: afraid (*afraid child, the child is afraid)
– stage-level: drunk (there is a man drunk in the pub)
– individual-level: clever, short, tall (*there is a man tall in the bar)
– object-taking: proud (proud of him, *well of him)
– intersective: red (red car: intersection of the set of red things and the set of cars)
– non-intersective: former (former architect), atomic (atomic scientist)
– comparative, superlative: blacker, blackest, *opener, *openest
– open class: hackable, spammable

Parts-of-Speech
Divide words into classes based on grammatical function.
Adverbs (open class: unlimited set): modify verbs (also adjectives and other adverbs)
– manner: slowly (moved slowly)
– degree: slightly, more (more clearly), very (very bad), almost
– sentential: unfortunately, suddenly
– question: how
– temporal: when, soon, yesterday (noun?)
– location: sideways, here (John is here)
– open class: spam-wise

Parts-of-Speech
Divide words into classes based on grammatical function.
Prepositions (closed class: fixed set)
– come before an object and assign it a semantic function (from Mars, *Mars from)
– head-final languages have postpositions instead (Japanese: amerika-kara)
– location: on, in, by
– temporal: by, until
(figure 8.1)

Parts-of-Speech
Divide words into classes based on grammatical function.
Particles (closed class: fixed set)
– resemble a preposition or adverb; often combine with a verb to form a phrasal verb
– went on, finish up
– throw sleep off (throw off sleep)
– single-word particles (Quirk, 1985): figure 8.2

Parts-of-Speech
Divide words into classes based on grammatical function.
Conjunctions (closed class: fixed set)
– used to join two phrases, clauses, or sentences
– coordinating conjunctions: and, or, but
– subordinating conjunctions: that (complementizer)
(figure 8.3)

Part-of-Speech (POS) Tagging
Idea:
– assign the right part-of-speech tag, e.g. noun, verb, conjunction, to each word
– useful for shallow parsing, or as the first stage of a deeper/more sophisticated system
Question:
– Is it a hard task? i.e., can’t we just look the words up in a dictionary?
Answer:
– Yes, it is hard: ambiguity.
– No, not that hard in practice: POS taggers typically claim 95%+ accuracy.

Part-of-Speech (POS) Tagging
Example: walk (noun, verb)
– the walk: noun (I took …)
– I walk: verb (… 2 miles every day)
– as a shallow parsing tool: can we resolve this without fully parsing the sentence?
Example: still (noun, adjective, adverb, verb)
– noun: the still of the night, a glass still
– adjective: still waters
– adverb: stand still, still struggling, Still, I didn’t give way
– verb: still your fear of the dark (transitive), the bubbling waters stilled (intransitive)

POS Tagging
Task:
– assign the right part-of-speech tag, e.g. noun, verb, conjunction, to a word in context
POS taggers:
– need to be fast in order to process large corpora: should take no more than time linear in the size of the corpus
– full parsing is slow: e.g. context-free grammar parsing is O(n³), where n is the length of the sentence
– POS taggers try to assign the correct tag without actually parsing the sentence

POS Tagging
Components:
– Dictionary of words
  – exhaustive list of closed-class items, e.g.:
    » the, a, an: determiner
    » from, to, of, by: preposition
    » and, or: coordinating conjunction
  – large set of open-class items (nouns, verbs, adjectives) with frequency information

POS Tagging
Components:
– Mechanism to assign tags
  – context-free: by frequency
  – context-dependent: bigram, trigram, HMM, hand-coded rules
  – example: Det Noun/*Verb — the walk …
– Mechanism to handle unknown words (extra-dictionary)
  – capitalization
  – morphology: -ed, -tion
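The unknown-word heuristics named above (capitalization, morphology) can be sketched as a fallback guesser; the rules and tag names (Penn-style) are illustrative choices, not a method from the lecture:

```python
import re

# Minimal extra-dictionary tag guesser for unknown words.
def guess_tag(word, sentence_initial=False):
    if word[0].isupper() and not sentence_initial:
        return "NNP"   # capitalized mid-sentence: likely a proper noun
    if word.endswith("ed"):
        return "VBD"   # past-tense morphology
    if word.endswith("tion"):
        return "NN"    # derivational noun suffix
    if re.fullmatch(r"[0-9,.]+", word):
        return "CD"    # numbers
    return "NN"        # default: common noun

print(guess_tag("Detroit"))        # NNP
print(guess_tag("reaganized"))     # VBD
print(guess_tag("lemmatization"))  # NN
```

Real taggers weight such features probabilistically rather than applying them as hard rules, but the signal exploited is the same.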

How Hard is Tagging?
Brown Corpus (Francis & Kucera, 1982):
– 1 million words, 39K distinct words
– 35K words with only 1 tag
– 4K with multiple tags (DeRose, 1988)
(figure 8.7)

How Hard is Tagging?
Easy task to do well on:
– naïve algorithm: assign each word its most frequent (unigram) tag
– 90% accuracy (Charniak et al., 1993)
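The naïve algorithm fits in a few lines: count per-word tag frequencies in training data and always emit the most frequent tag. The mini training set below is invented for illustration:

```python
from collections import Counter, defaultdict

# Toy training data: (word, tag) pairs; a real system would use
# a tagged corpus such as the Brown Corpus.
train = [("the", "DT"), ("walk", "NN"), ("I", "PRP"),
         ("walk", "VBP"), ("walk", "NN"), ("took", "VBD")]

tag_counts = defaultdict(Counter)
for word, tag_ in train:
    tag_counts[word][tag_] += 1

def tag(word):
    if word in tag_counts:
        return tag_counts[word].most_common(1)[0][0]
    return "NN"  # default tag for unknown words

print(tag("walk"))  # NN  (seen 2x as NN vs 1x as VBP)
print(tag("took"))  # VBD
```

Ignoring context entirely, this baseline already reaches about 90% on real corpora because most word tokens are unambiguous, which is why the interesting work in tagging is the remaining ambiguous 10%.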

Penn TreeBank Tagset
48-tag simplification of the Brown Corpus tagset. Examples:
1. CC — coordinating conjunction
3. DT — determiner
7. JJ — adjective
11. MD — modal
12. NN — noun (singular, mass)
13. NNS — noun (plural)
27. VB — verb (base form)
28. VBD — verb (past)

Penn TreeBank Tagset
How many tags? Criteria for choosing a tagset:
– distinctness with respect to grammatical behavior?
– make tagging easier?
Punctuation tags:
– the Penn Treebank also tags punctuation and numbers, a trivial computational task

Penn TreeBank Tagset
Simplifications:
– Tag TO covers both the infinitival marker and the preposition:
  – I want to win
  – I went to the store
– Tag IN covers prepositions plus that, when, although:
  – I know that I should have stopped, although…
  – I stopped when I saw Bill

Penn TreeBank Tagset
Simplifications:
– Tag DT (determiner): any, some, these, those
  – any man; these *man/men
– Tag VBP (verb, present): am, are, walk
  – Am I here? *Walked I here?/Did I walk here?

Hard to Tag Items
Syntactic function:
– example: resultative — I saw the man tired from running
Examples (from the Brown Corpus Manual):
– hyphenation: long-range, high-energy, shirt-sleeved, signal-to-noise
– foreign words: mens sana in corpore sano