LING 438/538 Computational Linguistics Sandiway Fong Lecture 22: 11/9

POS Tagging
Task:
– assign the right part-of-speech tag, e.g. noun, verb, conjunction, to a word in context
POS taggers:
– need to be fast in order to process large corpora: time taken should be no more than proportional to the size of the corpus
– try to assign the correct tag without actually (fully) parsing the sentence
  e.g. the walk (noun): "I took the walk …" vs. I walk (verb): "I walk 2 miles every day"

How Hard is Tagging?
It is an easy task to do well on:
– naïve algorithm: assign each word its most frequent (unigram) tag
– 90% accuracy (Charniak et al., 1993)
Brown Corpus (Francis & Kucera, 1982):
– 1 million words
– 39K distinct words
– 35K words with only 1 tag
– 4K with multiple tags (DeRose, 1988)
That's 89.7% of the word types covered just by the single-tag words, even without getting any of the multiple-tag words right
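
To make the naïve baseline concrete, here is a minimal sketch (not from the lecture; the corpus format and the NN fallback for unknown words are my assumptions) of a most-frequent-tag unigram tagger:

```python
from collections import Counter, defaultdict

def train_unigram(tagged_corpus):
    """tagged_corpus: a list of (word, tag) pairs, e.g. [("the", "DT"), ("walk", "NN")]."""
    counts = defaultdict(Counter)
    for word, tag in tagged_corpus:
        counts[word][tag] += 1
    # keep only the single most frequent tag for each word
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def tag_unigram(words, model, default="NN"):
    # unknown words fall back to a default tag (NN is an assumption, not the lecture's choice)
    return [(w, model.get(w, default)) for w in words]

model = train_unigram([("the", "DT"), ("walk", "NN"), ("walk", "NN"), ("walk", "VBP"), ("I", "PRP")])
print(tag_unigram(["the", "walk"], model))   # [('the', 'DT'), ('walk', 'NN')]
```

Trained on a corpus like Brown, a tagger of exactly this kind is what the ~90% figure above refers to.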

Penn TreeBank Tagset
A standard tagset (for English):
– 48-tag subset of the Brown Corpus tagset
Simplifications:
– Tag TO: both infinitival marker and preposition
  "I want to win"
  "I went to the store"
– Tag IN: preposition, also used for subordinators: that, when, although
  "I know that I should have stopped, although …"
  "I stopped when I saw Bill"

Penn TreeBank Tagset
Simplifications:
– Tag DT: determiner: any, some, these, those
  "any man"
  "these *man/men"
– Tag VBP: verb, present (not 3rd person): am, are, walk
  "Am I here?"
  "*Walked I here?" / "Did I walk here?"

Rule-Based POS Tagging
ENGTWOL:
– English morphological analyzer based on two-level morphology (Chapter 3)
– 56K word stems
– processing:
  – apply the morphological engine: get all possible tags for each word
  – apply rules (1,100 of them) to eliminate candidate tags

Rule-Based POS Tagging
see section 8.4
ENGTWOL tagger (now ENGCG-2):
– the link seems to be down

Rule-Based POS Tagging
The example in the textbook is:
– Pavlov had shown that salivation …
– the elided (…) material is crucial

Rule-Based POS Tagging
Examples of tags:
– PCP2: past participle
– SV: subject verb
– SVOO: subject verb object object
(figure 8.8)

Rule-Based POS Tagging
example:
– it isn't that:adv odd
rule (from pg. 302), given input "that":
– if (+1 A/ADV/QUANT)     the next word (+1) is an adjective, adverb, or quantifier
     (+2 SENT-LIM)         the 2nd word ahead (+2) is a sentence boundary
     (NOT -1 SVOC/A)       the previous word (-1) is not a verb like consider (cf. "I consider that odd")
– then eliminate non-ADV tags
– else eliminate the ADV tag
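
As an illustration only (ENGTWOL's real engine and tagset differ), here is a sketch of constraint-based tag elimination that hard-codes just this adverbial-"that" rule over a lattice of candidate tags; the tag names and the list of "consider"-class verbs are simplified assumptions:

```python
# each word starts with the set of tags the morphological analyzer allows
SVOC_A_VERBS = {"consider", "deem"}   # assumed stand-in for the SVOC/A verb class

def adverbial_that_rule(words, candidates, i):
    """Constraint for 'that' at position i, modeled on the pg. 302 rule."""
    if words[i] != "that":
        return
    nxt_ok  = i + 1 < len(words) and candidates[i + 1] & {"ADJ", "ADV", "QUANT"}
    at_end  = i + 2 >= len(words)                 # (+2 SENT-LIM), simplified
    prev_ok = i == 0 or words[i - 1] not in SVOC_A_VERBS
    if nxt_ok and at_end and prev_ok:
        candidates[i] &= {"ADV"}                  # eliminate non-ADV tags
    else:
        candidates[i] -= {"ADV"}                  # eliminate the ADV tag

words = ["it", "is", "n't", "that", "odd"]
candidates = [{"PRON"}, {"V"}, {"NEG"}, {"ADV", "DET", "COMP"}, {"ADJ"}]
adverbial_that_rule(words, candidates, 3)
print(candidates[3])   # {'ADV'}
```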

Rule-Based POS Tagging
Now ENGCG-2 (4,000 rules):
– the demo no longer seems to be online

Rule-Based POS Tagging
best claimed performance of any system: 99.7%
– no performance figures are mentioned in the textbook
– (the statistical/linguistic divide)

HMM POS Tagging
from section 8.5
in general, HMM taggers maximize the quantity:
– p(word|tag) * p(tag|previous n tags)
bigram HMM tagger:
– let w_i = the ith word, and t_i = the tag of the ith word
– then t_i = argmax_j p(t_j | t_{i-1}, w_i)
– restated as: t_i = argmax_j p(t_j | t_{i-1}) * p(w_i | t_j)

HMM POS Tagging
example:
– Secretariat/NNP is/VBZ expected/VBN to/TO race/VB tomorrow/NN
– People/NNS continue/VBP to/TO inquire/VB the/DT reason/NN for/IN the/DT race/NN for/IN outer/JJ space/NN
tags (Penn):
– NNP: proper noun (sg)
– NN: noun (sg or mass)
– NNS: noun (pl)
– VB: verb (base)
– VBZ: verb (3rd pers, present)
– VBP: verb (not 3rd pers, present)
– VBN: verb (past participle)
– DT: determiner, IN: preposition, JJ: adjective, TO: to

HMM POS Tagging
1st example:
– … to/TO race/??
– suppose race can have tag VB or NN only
– the formula says we should compare
  p(VB|TO) * p(race|VB)
  with p(NN|TO) * p(race|NN)
– i.e. tag sequence probability * probability of the word given the selected tag
tag sequence probability:
– p(NN|TO) = 0.021
– p(VB|TO) = 0.34
– i.e. a verb is more than ten times as likely to follow TO as a noun
lexical likelihood:
– p(race|NN) = 0.00041
– p(race|VB) = 0.00003
– i.e. race as a noun is more than ten times as frequent as race as a verb
calculation:
– p(VB|TO) * p(race|VB) = 0.34 * 0.00003 = 0.0000102
– p(NN|TO) * p(race|NN) = 0.021 * 0.00041 = 0.0000086 (textbook says: 0.000007)
– very close: choose to/TO race/VB
bigram formula: t_i = argmax_j p(t_j | t_{i-1}) * p(w_i | t_j)
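
The same comparison as a few lines of code, with the textbook's probability estimates hard-coded rather than estimated from a corpus:

```python
# textbook estimates (J&M, section 8.5)
tp = {("TO", "VB"): 0.34, ("TO", "NN"): 0.021}           # p(tag | previous tag)
ep = {("VB", "race"): 0.00003, ("NN", "race"): 0.00041}  # p(word | tag)

def best_tag(prev_tag, word, tags):
    # bigram decision rule: argmax_t p(t | prev_tag) * p(word | t)
    return max(tags, key=lambda t: tp[(prev_tag, t)] * ep[(t, word)])

print(best_tag("TO", "race", ["VB", "NN"]))   # VB (0.0000102 beats 0.0000086)
```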

HMM POS Tagging
given:
– a word sequence W = w_1 w_2 … w_n
– let T = t_1 t_2 … t_n be a tag sequence
compute:
– T* = argmax_T p(T|W), maximizing over the set of all possible tag sequences T
using Bayes' Law, p(x|y) = p(y|x) p(x) / p(y):
– T* = argmax_T p(T) p(W|T) / p(W)
– T* = argmax_T p(T) p(W|T)   (p(W) is a constant here)
– T* = argmax_T p(t_1 t_2 … t_n) p(w_1 w_2 … w_n | t_1 t_2 … t_n)
Chain Rule:
– p(t_1 t_2 … t_n) = p(t_1) p(t_2|t_1) p(t_3|t_1 t_2) … p(t_n|t_1 … t_{n-1})
– interleaving the words (expanding the joint in the order t_1, w_1, t_2, w_2, …): p(t_1) p(t_2|w_1 t_1) p(t_3|w_1 t_1 w_2 t_2) … p(t_n|w_1 t_1 … w_{n-1} t_{n-1})
– p(w_1 w_2 … w_n | t_1 t_2 … t_n) = p(w_1|t_1) p(w_2|w_1 t_1 t_2) p(w_3|w_1 t_1 w_2 t_2 t_3) … p(w_n|w_1 t_1 … w_{n-1} t_{n-1} t_n)
hence:
– T* = argmax_T p(t_1) p(w_1|t_1) * p(t_2|w_1 t_1) p(w_2|w_1 t_1 t_2) * … * p(t_n|w_1 t_1 … w_{n-1} t_{n-1}) p(w_n|w_1 t_1 … w_{n-1} t_{n-1} t_n)
Math details: see section 8.5 (pgs. 305–307)

HMM POS Tagging
simplify:
– T* = argmax_T p(t_1) p(w_1|t_1) * p(t_2|w_1 t_1) p(w_2|w_1 t_1 t_2) * … * p(t_n|w_1 t_1 … w_{n-1} t_{n-1}) p(w_n|w_1 t_1 … w_{n-1} t_{n-1} t_n)
assume the probability of a word depends only on its tag:
– p(w_1|t_1) p(w_2|w_1 t_1 t_2) … p(w_n|w_1 t_1 … w_{n-1} t_{n-1} t_n)
– becomes p(w_1|t_1) p(w_2|t_2) … p(w_n|t_n)
assume a trigram approximation for the tag history:
– p(t_1) p(t_2|w_1 t_1) … p(t_n|w_1 t_1 … w_{n-1} t_{n-1})
– becomes p(t_1) p(t_2|t_1) … p(t_n|t_{n-2} t_{n-1})
the formula becomes:
– T* = argmax_T p(t_1) p(t_2|t_1) … p(t_n|t_{n-2} t_{n-1}) * p(w_1|t_1) p(w_2|t_2) … p(w_n|t_n)

HMM POS Tagging
formula:
– T* = argmax_T p(t_1) p(t_2|t_1) … p(t_n|t_{n-2} t_{n-1}) * p(w_1|t_1) p(w_2|t_2) … p(w_n|t_n)
corpus frequencies:
– p(t_n|t_{n-2} t_{n-1}) = f(t_{n-2} t_{n-1} t_n) / f(t_{n-2} t_{n-1})
– p(w_n|t_n) = f(w_n, t_n) / f(t_n)
assume:
– the training corpus is tagged (manually)
we can use:
– Viterbi (see chapter 7) to evaluate the formula for T*
– smoothing (earlier lectures) to deal with zero frequencies in the training corpus
results:
– > 96% (Weischedel et al., 1993), (DeRose, 1988)
– baseline: naïve unigram frequency algorithm, 90% accuracy (Charniak et al., 1993)
– rule-based tagger: ENGCG-2 (4,000 rules), 99.7%
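
A compact sketch (mine, not the textbook's code) of Viterbi decoding for the bigram version of this model; the trigram case works the same way with tag pairs as states. The transition and emission tables are assumed to hold relative-frequency estimates like those above:

```python
def viterbi(words, tags, tp, ep, start="<s>"):
    """tp[prev][t] = p(t | prev); ep[t][w] = p(w | t); returns the best tag sequence."""
    # best[t] = (probability, tag path) of the best path ending in tag t
    best = {t: (tp[start].get(t, 0.0) * ep[t].get(words[0], 0.0), [t]) for t in tags}
    for w in words[1:]:
        best = {
            t: max(
                ((p * tp[prev].get(t, 0.0) * ep[t].get(w, 0.0), path + [t])
                 for prev, (p, path) in best.items()),
                key=lambda pair: pair[0],
            )
            for t in tags
        }
    return max(best.values(), key=lambda pair: pair[0])[1]

# toy run on the race example, reusing the textbook estimates
tp = {"<s>": {"TO": 1.0}, "TO": {"VB": 0.34, "NN": 0.021}, "VB": {}, "NN": {}}
ep = {"TO": {"to": 1.0}, "VB": {"race": 0.00003}, "NN": {"race": 0.00041}}
print(viterbi(["to", "race"], ["TO", "VB", "NN"], tp, ep))   # ['TO', 'VB']
```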

Transformation-Based POS Tagging (TBT)
section 8.6
basic idea (Brill, 1995):
– tag transformation rules: change one tag to another by inspecting the local context, e.g. the tag before or after
– initially, use the naïve (unigram frequency) algorithm to assign tags
– then train a system to find these rules, within a finite search space of possible rules
error-driven procedure:
– repeat until errors are eliminated as far as possible
– the training corpus must already be tagged, because the training procedure is error-driven
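
A skeletal sketch of Brill's error-driven loop, under two simplifying assumptions of mine: candidate rules only inspect the previous tag, and learning stops when no rule reduces errors:

```python
from itertools import product

def tbl_learn(gold, current, tags, max_rules=10):
    """Learn (prev_tag, from_tag, to_tag) rules; current is the naive tagging, updated in place."""
    rules = []
    for _ in range(max_rules):
        best_rule, best_gain = None, 0
        # candidate space: "change from_tag to to_tag when the previous tag is prev"
        for prev, frm, to in product(tags, tags, tags):
            if frm == to:
                continue
            gain = sum(
                (gold[i] == to) - (gold[i] == frm)   # +1 error fixed, -1 error introduced
                for i in range(1, len(current))
                if current[i] == frm and current[i - 1] == prev
            )
            if gain > best_gain:
                best_rule, best_gain = (prev, frm, to), gain
        if best_rule is None:            # no remaining rule reduces errors
            break
        prev, frm, to = best_rule        # apply the winner across the whole corpus
        snapshot = current[:]
        for i in range(1, len(current)):
            if snapshot[i] == frm and snapshot[i - 1] == prev:
                current[i] = to
        rules.append(best_rule)
    return rules

gold    = ["TO", "VB", "DT", "NN"]
current = ["TO", "NN", "DT", "NN"]       # the naive tagger called "race" a noun
print(tbl_learn(gold, current, ["TO", "VB", "DT", "NN"]))   # [('TO', 'NN', 'VB')]
```

The learned triple ('TO', 'NN', 'VB') corresponds to a rule of the kind shown on the next slides: NN > VB when the preceding tag is TO.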

TBT: Space of Possible Rules
Fixed window around the current tag:
– t-3 t-2 t-1 t0 t1 t2 t3
Prolog-based µ-TBL notation (Lager, 1999):
– current tag > new tag <- condition
– "change current tag to new tag if the tag at position +/-N meets the condition"

TBT: Rules Learned
Examples of rules learned (Manning & Schütze, 1999), in µ-TBL-style format:
– NN > VB <- … to walk …
– VBP > VB <- … could have put …
– JJR > RBR <- … more valuable player …
– VBP > VB <- … did n't cut … (n't is a separate word in the corpus)
tags:
– NN = noun, sg. or mass
– VB = verb, base form
– VBP = verb, pres. (not 3rd person)
– JJR = adjective, comparative
– RBR = adverb, comparative

The µ-TBL System
Implements Transformation-Based Learning:
– can be used for POS tagging as well as other applications
Implemented in Prolog:
– code and data downloadable from the µ-TBL home page
– full system for Windows (based on SICStus Prolog)
– includes tagged Wall Street Journal corpora

The µ-TBL System
Tagged corpus (for training and evaluation)
Format:
– wd(P,W): P = index of word W in the corpus
– tag(P,T): T = current tag of the word at index P
– tag(T1,T2,P): T1 = current tag of the word at index P, T2 = the correct tag
(the redundant encoding gives efficient access via Prolog first-argument indexing)

The µ-TBL System
Example of tagged WSJ corpus:
– wd(63,'Longer'). tag(63,'JJR'). tag('JJR','JJR',63).
– wd(64,maturities). tag(64,'NNS'). tag('NNS','NNS',64).
– wd(65,are). tag(65,'VBP'). tag('VBP','VBP',65).
– wd(66,thought). tag(66,'VBN'). tag('VBN','VBN',66).
– wd(67,to). tag(67,'TO'). tag('TO','TO',67).
– wd(68,indicate). tag(68,'VBP'). tag('VBP','VB',68).
– wd(69,declining). tag(69,'VBG'). tag('VBG','VBG',69).
– wd(70,interest). tag(70,'NN'). tag('NN','NN',70).
– wd(71,rates). tag(71,'NNS'). tag('NNS','NNS',71).
– wd(72,because). tag(72,'IN'). tag('IN','IN',72).
– wd(73,they). tag(73,'PP'). tag('PP','PP',73).
– wd(74,permit). tag(74,'VB'). tag('VB','VBP',74).
– wd(75,portfolio). tag(75,'NN'). tag('NN','NN',75).
– wd(76,managers). tag(76,'NNS'). tag('NNS','NNS',76).
– wd(77,to). tag(77,'TO'). tag('TO','TO',77).
– wd(78,retain). tag(78,'VB'). tag('VB','VB',78).
– wd(79,relatively). tag(79,'RB'). tag('RB','RB',79).
– wd(80,higher). tag(80,'JJR'). tag('JJR','JJR',80).
– wd(81,rates). tag(81,'NNS'). tag('NNS','NNS',81).
– wd(82,for). tag(82,'IN'). tag('IN','IN',82).
– wd(83,a). tag(83,'DT'). tag('DT','DT',83).
– wd(84,longer). tag(84,'RB'). tag('RB','JJR',84).
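
For concreteness, a small sketch of how facts in this format could be generated from a tagged sentence (the quoting heuristic is a rough assumption, not µ-TBL's actual preprocessing):

```python
def to_mutbl_facts(start_index, tokens):
    """tokens: list of (word, current_tag, gold_tag) triples."""
    facts = []
    for i, (w, cur, gold) in enumerate(tokens, start=start_index):
        # quote words that are not plain lower-case atoms, roughly as in the corpus above
        word = f"'{w}'" if not w.isalpha() or w[0].isupper() else w
        facts.append(f"wd({i},{word}). tag({i},'{cur}'). tag('{cur}','{gold}',{i}).")
    return facts

for line in to_mutbl_facts(68, [("indicate", "VBP", "VB"), ("declining", "VBG", "VBG")]):
    print(line)
# wd(68,indicate). tag(68,'VBP'). tag('VBP','VB',68).
# wd(69,declining). tag(69,'VBG'). tag('VBG','VBG',69).
```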

Recall:
– percentage of the correct (gold-standard) tags that the tagger actually assigns
Precision:
– percentage of words that are tagged correctly
(for a tagger that assigns exactly one tag to every word, the two coincide)
F-score:
– combined weighted average of precision and recall
– equally weighted: 2 * Precision * Recall / (Precision + Recall)
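
The equally weighted combination as a trivial sketch:

```python
def f_score(precision, recall):
    # equally weighted (F1) combination of precision and recall
    return 2 * precision * recall / (precision + recall)

print(f_score(0.96, 0.94))   # ~0.95
```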

The µ-TBL System
see demo … (off the webpage)
tag transformation rules are:
– human readable
– more powerful than simple bigrams
– take less "effort" to train

Next Time
Chapter 9: Context-Free Grammars for English
Also: chapters for 538 presentations