
Part-of-Speech Tagging (Foundations of Statistical NLP, Chapter 10)

2 Contents
- Markov Model Taggers
- Hidden Markov Model Taggers
- Transformation-Based Learning of Tags
- Tagging Accuracy and Uses of Taggers

3 Markov Model Taggers
Markov properties:
- Limited horizon: the next tag depends only on the current tag (or a fixed window of preceding tags).
- Time invariant: this dependency is the same at every position in the sentence.
cf. Wh-extraction (Chomsky) creates long-distance dependencies that violate the limited-horizon assumption, yet Markov taggers still work well in practice:
a. Should Peter buy a book?
b. Which book should Peter buy?
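In symbols, the two properties are the standard definitions from the Markov-models chapter:

\[
P(X_{t+1} = s^k \mid X_1, \ldots, X_t) = P(X_{t+1} = s^k \mid X_t) \quad \text{(limited horizon)}
\]
\[
P(X_{t+1} = s^k \mid X_t) = P(X_2 = s^k \mid X_1) \quad \text{(time invariance)}
\]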

4 Markov Model Taggers
The probabilistic model: find the best tagging t_{1,n} for a sentence w_{1,n}, i.e. arg max_{t_{1,n}} P(t_{1,n} | w_{1,n}).
ex: P(AT NN BEZ IN AT VB | The bear is on the move)

5 Assumptions:
- words are independent of each other
- a word's identity depends only on its tag
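Under these assumptions, Bayes' rule reduces the model to the familiar bigram form:

\[
\hat{t}_{1,n} = \arg\max_{t_{1,n}} P(t_{1,n} \mid w_{1,n})
             = \arg\max_{t_{1,n}} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})
\]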

6 Markov Model Taggers
Training (maximum likelihood estimates from a tagged corpus):
for all tags t^j do
    for all tags t^k do
        P(t^k | t^j) = C(t^j, t^k) / C(t^j)
    end
end
for all tags t^j do
    for all words w^l do
        P(w^l | t^j) = C(w^l : t^j) / C(t^j)
    end
end
Here C(t^j, t^k) is the number of times t^k follows t^j, and C(w^l : t^j) the number of times w^l is tagged t^j.
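The same procedure as a runnable Python sketch (function and variable names are illustrative, and sentence boundaries are simplified to a single pseudo-tag):

```python
from collections import Counter

def train_mm_tagger(tagged_corpus):
    """MLE training of a bigram Markov model tagger.

    tagged_corpus: list of sentences, each a list of (word, tag) pairs.
    Returns dictionaries for P(t | t_prev) and P(w | t).
    """
    tag_count = Counter()    # C(t^j)
    trans_count = Counter()  # C(t^j, t^k)
    emit_count = Counter()   # C(w^l : t^j)
    for sentence in tagged_corpus:
        prev = "<s>"         # sentence-boundary pseudo-tag
        tag_count[prev] += 1
        for word, tag in sentence:
            trans_count[(prev, tag)] += 1
            emit_count[(word, tag)] += 1
            tag_count[tag] += 1
            prev = tag
    trans_prob = {(tp, t): n / tag_count[tp] for (tp, t), n in trans_count.items()}
    emit_prob = {(w, t): n / tag_count[t] for (w, t), n in emit_count.items()}
    return trans_prob, emit_prob
```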

7 [Two tables from the chapter: tag transition counts, first tag by second tag, over AT, BEZ, IN, NN, VB, PERIOD; and per-tag counts for the words bear, is, move, on, president, progress, the. The numeric entries are not recoverable from this transcript.]

8 Markov Model Taggers
Tagging: find the most probable tag sequence with the Viterbi algorithm (dynamic programming over the best partial tag sequence ending in each tag at each position).
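A Python sketch of the decoder (it reuses the trans_prob / emit_prob dictionaries from the training sketch above; unseen events get probability zero here, so real use needs the smoothing discussed below):

```python
import math

def viterbi(words, tags, trans_prob, emit_prob):
    """Most probable tag sequence under a bigram Markov model."""
    def lp(p):                         # log probability, -inf for unseen events
        return math.log(p) if p else float("-inf")
    # Initialize with the sentence-boundary pseudo-tag.
    delta = {t: lp(trans_prob.get(("<s>", t), 0.0)) +
                lp(emit_prob.get((words[0], t), 0.0)) for t in tags}
    backptr = []
    for word in words[1:]:
        prev_delta, delta, back = delta, {}, {}
        for t in tags:
            best_prev = max(tags, key=lambda tp: prev_delta[tp] +
                            lp(trans_prob.get((tp, t), 0.0)))
            delta[t] = (prev_delta[best_prev] +
                        lp(trans_prob.get((best_prev, t), 0.0)) +
                        lp(emit_prob.get((word, t), 0.0)))
            back[t] = best_prev
        backptr.append(back)
    # Trace back the best path from the best final tag.
    path = [max(tags, key=lambda t: delta[t])]
    for back in reversed(backptr):
        path.append(back[path[-1]])
    return list(reversed(path))
```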

9 Variations
Models for unknown words:
1. assume they can be any part of speech
2. use morphological cues to infer the possible parts of speech

10 A feature-based model for unknown words combines morphological cues such as capitalization and word endings or hyphenation:

P(w^l | t^j) = (1/Z) P(unknown word | t^j) P(capitalized | t^j) P(endings/hyphenation | t^j)

Z: normalization constant

11 Variations
- Trigram taggers: condition each tag on the two preceding tags, P(t_i | t_{i-1}, t_{i-2})
- Interpolation: mix unigram, bigram, and trigram estimates to compensate for sparse trigram counts
- Variable Memory Markov Model (VMMM): let the amount of conditioning context vary with the particular tag sequence
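The interpolated estimate is the usual weighted mixture, with the weights summing to one and typically set by EM on held-out data:

\[
P(t_i \mid t_{i-1}, t_{i-2}) = \lambda_1 P_1(t_i) + \lambda_2 P_2(t_i \mid t_{i-1}) + \lambda_3 P_3(t_i \mid t_{i-1}, t_{i-2}),
\qquad \lambda_1 + \lambda_2 + \lambda_3 = 1
\]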

12 Variations
- Smoothing: avoid zero probabilities for unseen word/tag events; K^l is the number of possible parts of speech of w^l
- Reversibility: the model assigns the same probability to a tag sequence whether it is decoded left-to-right or right-to-left, so tagging from the end of the sentence gives the same result
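One add-one scheme consistent with that definition of K^l (a hedged reconstruction; the chapter's exact equation may differ) smooths the tag distribution of each word:

\[
P(t^j \mid w^l) = \frac{C(t^j : w^l) + 1}{C(w^l) + K^l}
\]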

13 Variations
Sequence vs. tag-by-tag maximization:
Time flies like an arrow.
a. NN VBZ RB AT NN .   P(a) = 0.01
b. NN NNS VB AT NN .   P(b) = 0.01
Sequence maximization picks the arg max over whole tag sequences (Viterbi); tag-by-tag maximization picks the most probable tag at each position independently. In practice there is no large difference in accuracy between the two.

14 Hidden Markov Model Taggers
When we have no tagged training data:
- initialize all parameters from dictionary information
- Jelinek's method
- Kupiec's method

15 Hidden Markov Model Taggers
Jelinek's method:
- initialize the HMM emission probabilities P(w^k | t^i) with MLE-style estimates
- assume that each word occurs equally likely with each of its possible tags
T(w^j): the number of tags allowed for w^j
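In the chapter's notation, the initial emission probability of word w^l from tag t^j is then (reconstructed from the definition of T(w^j) above):

\[
b_{j,l} = \frac{b^{*}_{j,l}\, C(w^l)}{\sum_{w^m} b^{*}_{j,m}\, C(w^m)},
\qquad
b^{*}_{j,l} =
\begin{cases}
0 & \text{if } t^j \text{ is not allowed for } w^l \\
1 / T(w^l) & \text{otherwise}
\end{cases}
\]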

16 Hidden Markov Model Taggers
Kupiec's method:
- group all words with the same set of possible parts of speech into 'metawords' u_L
- parameters are then estimated per metaword rather than fine-tuned for each word, so rare words share statistics with others in the same ambiguity class
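A minimal sketch of the grouping step (the lexicon contents below are illustrative assumptions, not data from the chapter):

```python
from collections import defaultdict

# Hypothetical dictionary: word -> set of tags it may take.
lexicon = {
    "the": {"AT"},
    "is": {"BEZ"},
    "bear": {"NN", "VB"},
    "move": {"NN", "VB"},
}

# A metaword (ambiguity class) collects all words sharing a tag set,
# so e.g. "bear" and "move" share one set of emission parameters.
metawords = defaultdict(set)
for word, tags in lexicon.items():
    metawords[frozenset(tags)].add(word)
```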

17 Hidden Markov Model Taggers
Training: after initialization, the HMM is trained using the Forward-Backward (Baum-Welch) algorithm.
Tagging: identical to the visible Markov model case. The difference between VMM tagging and HMM tagging is in how we train the model, not in how we tag.
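A compact sketch of one Forward-Backward (EM) update for a discrete HMM, using NumPy and the generic matrix names A (transitions), B (emissions), and pi (initial probabilities) rather than the slides' notation; rescaling is omitted, so this is only numerically safe for short sequences:

```python
import numpy as np

def baum_welch_step(A, B, pi, obs):
    """One Forward-Backward (EM) update.

    A: (S, S) tag-transition matrix, B: (S, V) emission matrix,
    pi: (S,) initial tag probabilities, obs: list of word indices.
    A real implementation rescales alpha/beta or works in log space.
    """
    S, T = A.shape[0], len(obs)
    alpha = np.zeros((T, S))                  # forward probabilities
    beta = np.zeros((T, S))                   # backward probabilities
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    gamma = alpha * beta                      # P(state at t | obs), unnormalized
    gamma /= gamma.sum(axis=1, keepdims=True)
    xi = (alpha[:-1, :, None] * A[None, :, :] *
          (B[:, obs[1:]].T * beta[1:])[:, None, :])  # P(state_t, state_{t+1} | obs)
    xi /= xi.sum(axis=(1, 2), keepdims=True)
    # Re-estimate parameters from expected counts.
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for t in range(T):
        new_B[:, obs[t]] += gamma[t]
    new_B /= gamma.sum(axis=0)[:, None]
    return new_A, new_B, new_pi
```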

18 Hidden Markov Model Taggers
The effect of initialization on the HMM (beware the overtraining problem):
Lexical probabilities:
- D0: maximum likelihood estimates from a tagged training corpus
- D1: correct ordering only of lexical probabilities
- D2: lexical probabilities proportional to overall tag probabilities
- D3: equal lexical probabilities for all tags admissible for a word
Transition probabilities:
- T0: maximum likelihood estimates from a tagged training corpus
- T1: equal probabilities for all transitions

19 Choosing a training regime:
- Use the Visible Markov Model (supervised training) when a sufficiently large tagged training text exists that is similar to the intended text of application.
- Run Forward-Backward for a few iterations when there is no tagged training text, or training and test text are very different, but at least some lexical information is available.
- Run Forward-Backward for a larger number of iterations when there is no lexical information.