Part of speech (POS) tagging


Part of speech (POS) tagging
- Tagging of words in a corpus with the correct part of speech, drawn from some tagset.
- Early automatic POS taggers were rule-based.
- Stochastic POS taggers are reasonably accurate.

Applications of POS tagging
- Parsing
  - recovering syntactic structure requires correct POS tags
  - partial parsing refers to any syntactic analysis which does not result in a full syntactic parse (e.g. finding noun phrases) - "parsing by chunks"

Applications of POS tagging
- Information extraction
  - fill slots in predefined templates with information
  - a full parse is not needed for this task, but partial parsing results (phrases) can be very helpful
  - information extraction uses the grammatical categories (tags) to help find semantic categories

Applications of POS tagging
- Question answering
  - system responds to a user question with a noun phrase
  - Who shot JR? (Kristen Shepard)
  - Where is Starbucks? (UB Commons)
  - What is good to eat here? (pizza)

Background on POS tagging
- How hard is tagging?
  - most words have just a single tag: easy
  - some words have more than one possible tag: harder
  - many common words are ambiguous
- Brown corpus: 10.4% of word types are ambiguous, but 40%+ of word tokens are ambiguous

Disambiguation approaches
- Rule-based
  - rely on a large set of rules to disambiguate in context
  - rules are mostly hand-written
- Stochastic
  - rely on probabilities of words having certain tags in context
  - probabilities are derived from a training corpus
- Combined
  - transformation-based tagger: uses a stochastic approach to determine an initial tagging, then uses a rule-based approach to "clean up" the tags (see the sketch below)
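As a rough illustration of the combined approach, here is a minimal sketch of transformation-based tagging (not Brill's actual system): a stochastic first pass assigns each word its most frequent tag, and clean-up rules of the form "change tag A to tag B when the previous tag is C" then correct it. The lexicon and rule are invented for this example.

```python
# Minimal transformation-based tagging sketch (illustrative only).

# Stochastic first pass: each word gets its most frequent tag
# (toy lexicon, invented for this example).
MOST_FREQUENT_TAG = {"to": "TO", "race": "NN", "the": "DT"}

# Clean-up rules: (from_tag, to_tag, previous_tag) --
# "change from_tag to to_tag when the previous tag is previous_tag".
RULES = [("NN", "VB", "TO")]  # e.g. in "to race", race is a verb

def tag(words):
    # First pass: most-frequent-tag baseline.
    tags = [MOST_FREQUENT_TAG.get(w.lower(), "NN") for w in words]
    # Second pass: apply each transformation rule left to right.
    for from_tag, to_tag, prev_tag in RULES:
        for i in range(1, len(tags)):
            if tags[i] == from_tag and tags[i - 1] == prev_tag:
                tags[i] = to_tag
    return list(zip(words, tags))

print(tag(["to", "race"]))   # [('to', 'TO'), ('race', 'VB')]
print(tag(["the", "race"]))  # [('the', 'DT'), ('race', 'NN')]
```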

Determining the appropriate tag for an untagged word
- Two types of information can be used:
- syntagmatic information
  - consider the tags of other words in the surrounding context
  - a tagger using such information correctly tagged approx. 77% of words
  - problem: content words (which are the ones most likely to be ambiguous) typically have many parts of speech, via productive rules (e.g. N → V)

Determining the appropriate tag for an untagged word
- lexical information
  - use information about the word itself (e.g. usage probability)
  - a baseline for tagger performance is given by a tagger that simply assigns the most common tag to ambiguous words: it correctly tags 90% of words (sketched below)
- modern taggers use a variety of information sources
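A minimal sketch of that most-common-tag baseline, assuming a small training corpus in (word, tag) form; the corpus here is invented for illustration:

```python
from collections import Counter, defaultdict

# Toy tagged corpus, invented for illustration.
TRAINING = [("the", "DT"), ("race", "NN"), ("is", "VBZ"),
            ("to", "TO"), ("race", "VB"), ("the", "DT"), ("race", "NN")]

# Count how often each word appears with each tag.
tag_counts = defaultdict(Counter)
for word, tag in TRAINING:
    tag_counts[word][tag] += 1

def baseline_tag(word):
    """Assign the word's most common tag in the training corpus."""
    if word in tag_counts:
        return tag_counts[word].most_common(1)[0][0]
    return "NN"  # crude guess for unknown words

print(baseline_tag("race"))  # NN (seen twice as NN, once as VB)
```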

Note about accuracy measures
- Modern taggers claim accuracy rates of around 96% to 97%.
- This sounds impressive, but how good are they really?
- This is a measure of correctness at the level of individual words, not whole corpora.
- With 96% accuracy, 1 word out of 25 is tagged incorrectly; this represents roughly one tagging error per sentence (see the arithmetic below).
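The per-sentence arithmetic, assuming an average sentence length of about 25 words (an assumption behind the "one error per sentence" figure):

```python
accuracy = 0.96
error_rate = 1 - accuracy           # 0.04 -> 1 word in 25 is wrong
avg_sentence_length = 25            # assumed average, for illustration
errors_per_sentence = error_rate * avg_sentence_length
print(errors_per_sentence)          # 1.0 -> roughly one error per sentence
```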

Rule-based POS tagging
- Two-stage design (sketched below):
  - first stage looks up individual words in a dictionary and tags words with sets of possible tags
  - second stage uses rules to disambiguate, resulting in singleton tag sets
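A minimal sketch of the two-stage design, with an invented dictionary and a single invented disambiguation rule:

```python
# Stage 1 resource: a dictionary mapping each word to its set of
# possible tags (toy data, invented for illustration).
DICTIONARY = {"to": {"TO"}, "race": {"NN", "VB"}, "the": {"DT"}}

def rule_based_tag(words):
    # Stage 1: tag each word with its set of possible tags.
    tag_sets = [set(DICTIONARY.get(w, {"NN"})) for w in words]
    # Stage 2: hand-written rules reduce each set to a singleton.
    for i, tags in enumerate(tag_sets):
        if len(tags) > 1:
            # Invented rule: after infinitival "to", prefer the verb reading.
            if i > 0 and tag_sets[i - 1] == {"TO"} and "VB" in tags:
                tag_sets[i] = {"VB"}
            else:
                tag_sets[i] = {sorted(tags)[0]}  # arbitrary fallback
    return [(w, next(iter(t))) for w, t in zip(words, tag_sets)]

print(rule_based_tag(["to", "race"]))   # [('to', 'TO'), ('race', 'VB')]
print(rule_based_tag(["the", "race"]))  # [('the', 'DT'), ('race', 'NN')]
```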

Stochastic POS tagging
- Stochastic taggers choose tags that result in the highest probability: P(word | tag) * P(tag | previous n tags)
- Stochastic taggers generally maximize the probability of the tag sequence for a whole sentence.

Bigram stochastic tagger
- This kind of tagger "…chooses tag t_i for word w_i that is most probable given the previous tag t_{i-1} and the current word w_i:
    t_i = argmax_j P(t_j | t_{i-1}, w_i)   (8.2)" [page 303]
- Bayes' law says: P(T | W) = P(T) P(W | T) / P(W), so
    P(t_j | t_{i-1}, w_i) = P(t_j) P(t_{i-1}, w_i | t_j) / P(t_{i-1}, w_i)
- Since we take the argmax of this over the t_j, the denominator is constant, and the result is the same as maximizing:
    P(t_j) P(t_{i-1}, w_i | t_j)
- Rewriting (see the sketch below):
    t_i = argmax_j P(t_j | t_{i-1}) P(w_i | t_j)
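A minimal sketch of the final equation, using a greedy left-to-right decoder (a simplification of full sentence-level maximization). The probability tables are tiny and invented, except for the TO/race entries, which follow the figures quoted from Jurafsky & Martin on the next slides:

```python
# Greedy bigram tagger sketch: t_i = argmax_t P(t | t_prev) * P(w_i | t).
TAG_BIGRAM = {("TO", "NN"): 0.021, ("TO", "VB"): 0.34,
              ("<s>", "TO"): 0.1, ("<s>", "DT"): 0.3}   # P(tag | prev_tag)
LEXICAL = {("race", "NN"): 0.00041, ("race", "VB"): 0.00003,
           ("to", "TO"): 0.8}                            # P(word | tag)
TAGS = ["TO", "NN", "VB", "DT"]

def bigram_tag(words):
    prev, result = "<s>", []
    for w in words:
        # Choose the tag maximizing P(tag | prev) * P(word | tag).
        best = max(TAGS, key=lambda t: TAG_BIGRAM.get((prev, t), 0.0)
                                       * LEXICAL.get((w, t), 0.0))
        result.append((w, best))
        prev = best
    return result

print(bigram_tag(["to", "race"]))  # [('to', 'TO'), ('race', 'VB')]
```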

Example (page 304)
- What tag do we assign to race?
  - to/TO race/??
  - the/DT race/??
- If we are choosing between NN and VB as tags for race, the relevant quantities are:
  - P(VB | TO) P(race | VB)
  - P(NN | TO) P(race | NN)
- The tagger will choose whichever tag for race maximizes the probability.

Example
- For the first part, look at the tag sequence probabilities:
  - P(NN | TO) = 0.021
  - P(VB | TO) = 0.34
- For the second part, look at the lexical likelihoods:
  - P(race | NN) = 0.00041
  - P(race | VB) = 0.00003
- Combining these:
  - P(VB | TO) P(race | VB) = 0.00001
  - P(NN | TO) P(race | NN) = 0.000007
- Since 0.00001 > 0.000007, the tagger chooses VB for race.
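A quick arithmetic check of the combination (the raw NN product of the quoted probabilities is about 8.6e-6, slightly above the slide's rounded 0.000007, but VB wins either way):

```python
# Worked check of the race example (probabilities as quoted above).
p_nn_given_to = 0.021      # P(NN | TO)
p_vb_given_to = 0.34       # P(VB | TO)
p_race_given_nn = 0.00041  # P(race | NN)
p_race_given_vb = 0.00003  # P(race | VB)

score_vb = p_vb_given_to * p_race_given_vb  # 1.02e-05, rounds to 0.00001
score_nn = p_nn_given_to * p_race_given_nn  # 8.61e-06

print("VB" if score_vb > score_nn else "NN")  # VB
```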