Fall 2005 Lecture Notes #8 EECS 595 / LING 541 / SI 661 Natural Language Processing

Evaluation of NLP systems

The classical pipeline (for supervised learning)
– Training set / development set / test set
– Dumb baseline
– Intelligent baseline
– Your algorithm
– Human ceiling
– Accuracy / precision / recall (see the sketch below)
– Multiple references
– Statistical significance
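Precision and recall recur throughout these evaluations, so here is a minimal sketch of how they are typically computed over set-valued predictions; the function and item names are illustrative, not from the lecture.

```python
def precision_recall(predicted, gold):
    """Precision and recall for set-valued predictions
    (e.g., retrieved documents or labeled constituents)."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Example: 3 of 4 predictions are correct, and 3 of 5 gold items are found.
p, r = precision_recall({"d1", "d2", "d3", "d4"},
                        {"d1", "d2", "d3", "d5", "d6"})
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.60
```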

Special cases
– Document retrieval systems
– Part-of-speech tagging
– Parsing
  – Labeled recall
  – Labeled precision
  – Crossing brackets

Word classes and part-of-speech tagging

Part-of-speech tagging
– Ambiguous words: transport, object, discount, address (each can be a noun or a verb)
– More problems: content (noun or adjective)
– Ambiguity in French: est, président, fils
– "Book that flight": what is the part of speech associated with "book"?
– POS tagging: assigning a part of speech to each word in a text
– Three main techniques: rule-based tagging, stochastic tagging, transformation-based tagging

Rule-based POS tagging
– Use a dictionary or FST to find all possible parts of speech for each word
– Use hand-written disambiguation rules to eliminate impossible tag sequences (e.g., an article followed by a verb, ART+V)
– Typically hundreds of such constraints, designed manually

Example in French

The sentence is "La teneur moyenne en uranium des rivières, bien que délicate à calculer…" ("The average uranium content of the rivers, though tricky to calculate…"). For each word, the dictionary lists the candidate tags; the correct reading appears on the right.

^          (beginning of sentence)
La         rf b nms u            article
teneur     nfs nms               noun, feminine singular
moyenne    jfs nfs v1s v2s v3s   adjective, feminine singular
en         p a b                 preposition
uranium    nms                   noun, masculine singular
des        p r                   preposition
rivières   nfp                   noun, feminine plural
,          x                     punctuation
bien_que   cs                    subordinating conjunction
délicate   jfs                   adjective, feminine singular
à          p                     preposition
calculer   v                     verb

Sample rules

BS3 BI1: A BS3 (3rd person subject personal pronoun) cannot be followed by a BI1 (1st person indirect personal pronoun). In the example "il nous faut" ("we need"), "il" has the tag BS3MS and "nous" has the tags [BD1P BI1P BJ1P BR1P BS1P]. The negative constraint "BS3 BI1" rules out "BI1P", and thus leaves only 4 alternatives for the word "nous".

N K: The tag N (noun) cannot be followed by a tag K (interrogative pronoun); an example in the test corpus would be "… fleuve qui …" ("… river, that …"). Since "qui" can be tagged both as an E (relative pronoun) and a K (interrogative pronoun), the E will be chosen by the tagger, since an interrogative pronoun cannot follow a noun (N).

R V: A word tagged with R (article) cannot be followed by a word tagged with V (verb): for example "l'appelle" ("calls him/her"). The word "appelle" can only be a verb, but "l'" can be either an article or a personal pronoun. Thus, the rule will eliminate the article tag, giving preference to the pronoun.
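As a toy illustration of how such negative constraints prune the candidate tags, here is a minimal sketch of the "il nous" example; the lexicon entries and the constraint format are simplified assumptions, not an actual tagset or rule syntax.

```python
# Candidate tags per word, as in the BS3 BI1 example above.
LEXICON = {
    "il":   ["BS3MS"],
    "nous": ["BD1P", "BI1P", "BJ1P", "BR1P", "BS1P"],
}
# A negative constraint (X, Y) means: a tag starting with Y
# may not follow a tag starting with X.
CONSTRAINTS = {("BS3", "BI1")}

def filter_tags(words):
    """Drop every candidate tag that a negative constraint rules out,
    given the tags still possible for the preceding word."""
    candidates = [list(LEXICON[w]) for w in words]
    for i in range(1, len(candidates)):
        prev = candidates[i - 1]
        candidates[i] = [t for t in candidates[i]
                         if not any((p[:3], t[:3]) in CONSTRAINTS
                                    for p in prev)]
    return candidates

print(filter_tags(["il", "nous"]))
# [['BS3MS'], ['BD1P', 'BJ1P', 'BR1P', 'BS1P']] -- BI1P is eliminated,
# leaving the 4 alternatives mentioned above.
```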

Confusion matrix

[Confusion matrix over the tags IN, JJ, NN, NNP, RB, VBD, and VBN; the cell values were lost in conversion.]

Most confusing: NN vs. NNP vs. JJ; VBD vs. VBN vs. JJ

HMM Tagging

T̂ = argmax_T P(T|W), where T = t_1 t_2 … t_n

By Bayes' theorem: P(T|W) = P(T) P(W|T) / P(W)

Thus we choose the tag sequence T that maximizes the right-hand side. P(W) is the same for every candidate T, so it can be ignored, leaving P(T) P(W|T). P(T) is called the prior; P(W|T) is called the likelihood.

HMM tagging (cont'd)

P(T) P(W|T) = ∏_{i=1}^{n} P(w_i | w_1 t_1 … w_{i-1} t_{i-1} t_i) P(t_i | t_1 … t_{i-2} t_{i-1})

Simplification 1: P(W|T) ≈ ∏_{i=1}^{n} P(w_i | t_i)
Simplification 2: P(T) ≈ ∏_{i=1}^{n} P(t_i | t_{i-1})

T̂ = argmax_T P(T|W) ≈ argmax_T ∏_{i=1}^{n} P(w_i | t_i) P(t_i | t_{i-1})

Estimates

These probabilities are estimated by maximum likelihood from counts in a tagged corpus:

P(NN|DT) = C(DT,NN) / C(DT) = .49 (with C(DT,NN) = 56,509)
P(is|VBZ) = C(VBZ,is) / C(VBZ) = 10,073 / 21,627 = .47
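A minimal sketch of computing both tables from a tagged corpus, assuming the corpus is given as a list of sentences, each a list of (word, tag) pairs:

```python
from collections import Counter

def train_hmm(tagged_sentences):
    """Maximum-likelihood transition and emission estimates."""
    tag_counts = Counter()     # C(t)
    bigram_counts = Counter()  # C(t_{i-1}, t_i)
    emit_counts = Counter()    # C(t, w)
    for sentence in tagged_sentences:
        prev = "<s>"           # sentence-initial pseudo-tag
        tag_counts[prev] += 1
        for word, tag in sentence:
            bigram_counts[(prev, tag)] += 1
            emit_counts[(tag, word)] += 1
            tag_counts[tag] += 1
            prev = tag
    # P(t_i | t_{i-1}) = C(t_{i-1}, t_i) / C(t_{i-1})
    trans = {bg: n / tag_counts[bg[0]] for bg, n in bigram_counts.items()}
    # P(w_i | t_i) = C(t_i, w_i) / C(t_i)
    emit = {tw: n / tag_counts[tw[0]] for tw, n in emit_counts.items()}
    return trans, emit
```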

Example

Secretariat/NNP is/VBZ expected/VBN to/TO race/VB tomorrow/NR
People/NNS continue/VBP to/TO inquire/VB the/AT reason/NN for/IN the/AT race/NN for/IN outer/JJ space/NN

TO can precede a verb or a noun: to + VB ("to sleep"), to + NN ("to school")

Example

The two competing tag sequences for the first sentence differ only in the tag for "race":

Secretariat/NNP is/VBZ expected/VBN to/TO race/VB tomorrow/NR
Secretariat/NNP is/VBZ expected/VBN to/TO race/NN tomorrow/NR

Example (cont'd)

P(NN|TO) = .00047      P(VB|TO) = .83
P(race|NN) = .00057    P(race|VB) = .00012
P(NR|NN) = .0012       P(NR|VB) = .0027

P(VB|TO) P(NR|VB) P(race|VB) = .00000027
P(NN|TO) P(NR|NN) P(race|NN) = .00000000032

The VB path is roughly 800 times more probable, so "race" is tagged VB.
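As a quick arithmetic check, the two products can be reproduced directly from the probability values above:

```python
# P(VB|TO) * P(NR|VB) * P(race|VB)
vb_path = 0.83 * 0.0027 * 0.00012
# P(NN|TO) * P(NR|NN) * P(race|NN)
nn_path = 0.00047 * 0.0012 * 0.00057
print(f"{vb_path:.2e} vs {nn_path:.2e}")  # 2.69e-07 vs 3.21e-10
```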

Decoding

Decoding: finding the sequence of hidden states that most likely produced a given sequence of observations.
Viterbi decoding (dynamic programming) finds the optimal sequence of tags.
Input: an HMM and a sequence of words; output: the most probable sequence of states (tags).
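A compact sketch of Viterbi decoding for the bigram tagger above; the table layout (trans[(t_prev, t)], emit[(t, w)]) matches the train_hmm sketch and is an assumption, not the lecture's notation.

```python
def viterbi(words, tags, trans, emit, start="<s>"):
    """Most probable tag sequence for `words` under a bigram HMM.
    Missing table entries are treated as probability 0."""
    # Initialization: best probability of each tag for the first word.
    best = {t: trans.get((start, t), 0.0) * emit.get((t, words[0]), 0.0)
            for t in tags}
    backpointers = []
    # Recursion: extend the best path ending in each possible tag.
    for w in words[1:]:
        new_best, pointers = {}, {}
        for t in tags:
            prob, prev = max((best[p] * trans.get((p, t), 0.0), p)
                             for p in tags)
            new_best[t] = prob * emit.get((t, w), 0.0)
            pointers[t] = prev
        best = new_best
        backpointers.append(pointers)
    # Termination: trace back from the best final tag.
    path = [max(best, key=best.get)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return list(reversed(path))
```

In practice, log probabilities are summed rather than raw probabilities multiplied (to avoid underflow), and unseen events are smoothed rather than given probability 0.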

Transformation-based learning

P(NN|race) = .98, P(VB|race) = .02
Sample transformation: change NN to VB when the previous tag is TO (applied in the sketch below)
Types of rules:
– The preceding (following) word is tagged z
– The word two before (after) is tagged z
– One of the two preceding (following) words is tagged z
– One of the three preceding (following) words is tagged z
– The preceding word is tagged z and the following word is tagged w
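A minimal sketch of applying that sample transformation to an initial most-likely-tag tagging; the rule representation is an illustrative assumption, not Brill's actual template syntax.

```python
def apply_rule(tags, from_tag, to_tag, prev_tag):
    """Change from_tag to to_tag wherever the previous tag is prev_tag."""
    out = list(tags)
    for i in range(1, len(out)):
        if out[i] == from_tag and out[i - 1] == prev_tag:
            out[i] = to_tag
    return out

# Initial tagging from unigram probabilities: "race" gets NN (P = .98).
initial = ["NNP", "VBZ", "VBN", "TO", "NN", "NR"]
print(apply_rule(initial, "NN", "VB", "TO"))
# ['NNP', 'VBZ', 'VBN', 'TO', 'VB', 'NR']
```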