Part II. Statistical NLP Advanced Artificial Intelligence Part of Speech Tagging Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Lars Schmidt-Thieme Most.

Slides:



Advertisements
Similar presentations
Three Basic Problems Compute the probability of a text: P m (W 1,N ) Compute maximum probability tag sequence: arg max T 1,N P m (T 1,N | W 1,N ) Compute.
Advertisements

School of something FACULTY OF OTHER School of Computing FACULTY OF ENGINEERING Machine Learning PoS-Taggers COMP3310 Natural Language Processing Eric.
School of something FACULTY OF OTHER School of Computing FACULTY OF ENGINEERING PoS-Tagging theory and terminology COMP3310 Natural Language Processing.
Machine Learning Approaches to the Analysis of Large Corpora : A Survey Xunlei Rose Hu and Eric Atwell University of Leeds.
Three Basic Problems 1.Compute the probability of a text (observation) language modeling – evaluate alternative texts and models P m (W 1,N ) 2.Compute.
CS460/IT632 Natural Language Processing/Language Technology for the Web Lecture 2 (06/01/06) Prof. Pushpak Bhattacharyya IIT Bombay Part of Speech (PoS)
CPSC 422, Lecture 16Slide 1 Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 16 Feb, 11, 2015.
Part-Of-Speech Tagging and Chunking using CRF & TBL
BİL711 Natural Language Processing
Part of Speech Tagging Importance Resolving ambiguities by assigning lower probabilities to words that don’t fit Applying to language grammatical rules.
Natural Language Processing Lecture 8—9/24/2013 Jim Martin.
1 Statistical NLP: Lecture 12 Probabilistic Context Free Grammars.
Hidden Markov Models Bonnie Dorr Christof Monz CMSC 723: Introduction to Computational Linguistics Lecture 5 October 6, 2004.
Chapter 6: HIDDEN MARKOV AND MAXIMUM ENTROPY Heshaam Faili University of Tehran.
Tagging with Hidden Markov Models. Viterbi Algorithm. Forward-backward algorithm Reading: Chap 6, Jurafsky & Martin Instructor: Paul Tarau, based on Rada.
Part II. Statistical NLP Advanced Artificial Intelligence (Hidden) Markov Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Lars Schmidt-Thieme Most.
Advanced AI - Part II Luc De Raedt University of Freiburg WS 2004/2005 Many slides taken from Helmut Schmid.
Part II. Statistical NLP Advanced Artificial Intelligence Probabilistic Context Free Grammars Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Lars Schmidt-Thieme.
Part II. Statistical NLP Advanced Artificial Intelligence Hidden Markov Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Lars Schmidt-Thieme Most.
Ch 10 Part-of-Speech Tagging Edited from: L. Venkata Subramaniam February 28, 2002.
Part-of-Speech (POS) tagging See Eric Brill “Part-of-speech tagging”. Chapter 17 of R Dale, H Moisl & H Somers (eds) Handbook of Natural Language Processing,
POS based on Jurafsky and Martin Ch. 8 Miriam Butt October 2003.
Tagging – more details Reading: D Jurafsky & J H Martin (2000) Speech and Language Processing, Ch 8 R Dale et al (2000) Handbook of Natural Language Processing,
Statistical techniques in NLP Vasileios Hatzivassiloglou University of Texas at Dallas.
Syllabus Text Books Classes Reading Material Assignments Grades Links Forum Text Books עיבוד שפות טבעיות - שיעור חמישי POS Tagging Algorithms עידו.
Machine Learning in Natural Language Processing Noriko Tomuro November 16, 2006.
Statistical Natural Language Processing Advanced AI - Part II Luc De Raedt University of Freiburg WS 2005/2006 Many slides taken from Helmut Schmid.
Word classes and part of speech tagging Chapter 5.
BIOI 7791 Projects in bioinformatics Spring 2005 March 22 © Kevin B. Cohen.
Statistical Natural Language Processing. What is NLP?  Natural Language Processing (NLP), or Computational Linguistics, is concerned with theoretical.
Albert Gatt Corpora and Statistical Methods Lecture 9.
Lecture 1, 7/21/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2005 Lecture 1 21 July 2005.
Part-of-Speech Tagging
Lemmatization Tagging LELA /20 Lemmatization Basic form of annotation involving identification of underlying lemmas (lexemes) of the words in.
Part II. Statistical NLP Advanced Artificial Intelligence Applications of HMMs and PCFGs in NLP Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Lars Schmidt-Thieme.
Some Advances in Transformation-Based Part of Speech Tagging
1 Statistical NLP: Lecture 10 Lexical Acquisition.
Dept. of Computer Science & Engg. Indian Institute of Technology Kharagpur Part-of-Speech Tagging for Bengali with Hidden Markov Model Sandipan Dandapat,
인공지능 연구실 정 성 원 Part-of-Speech Tagging. 2 The beginning The task of labeling (or tagging) each word in a sentence with its appropriate part of speech.
CSA2050: Introduction to Computational Linguistics Part of Speech (POS) Tagging II Transformation Based Tagging Brill (1995)
Part-of-Speech Tagging Foundation of Statistical NLP CHAPTER 10.
S1: Chapter 1 Mathematical Models Dr J Frost Last modified: 6 th September 2015.
11 Chapter 14 Part 1 Statistical Parsing Based on slides by Ray Mooney.
10/30/2015CPSC503 Winter CPSC 503 Computational Linguistics Lecture 7 Giuseppe Carenini.
13-1 Chapter 13 Part-of-Speech Tagging POS Tagging + HMMs Part of Speech Tagging –What and Why? What Information is Available? Visible Markov Models.
Word classes and part of speech tagging Chapter 5.
Tokenization & POS-Tagging
CSA2050: Introduction to Computational Linguistics Part of Speech (POS) Tagging I Introduction Tagsets Approaches.
Albert Gatt LIN3022 Natural Language Processing Lecture 7.
CSA3202 Human Language Technology HMMs for POS Tagging.
CPSC 422, Lecture 15Slide 1 Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 15 Oct, 14, 2015.
Auckland 2012Kilgarriff: NLP and Corpus Processing1 The contribution of NLP: corpus processing.
Hybrid Method for Tagging Arabic Text Written By: Yamina Tlili-Guiassa University Badji Mokhtar Annaba, Algeria Presented By: Ahmed Bukhamsin.
Word classes and part of speech tagging. Slide 1 Outline Why part of speech tagging? Word classes Tag sets and problem definition Automatic approaches.
POS Tagging1 POS Tagging 1 POS Tagging Rule-based taggers Statistical taggers Hybrid approaches.
CSA2050: Introduction to Computational Linguistics Part of Speech (POS) Tagging II Transformation Based Tagging Brill (1995)
Stochastic Methods for NLP Probabilistic Context-Free Parsers Probabilistic Lexicalized Context-Free Parsers Hidden Markov Models – Viterbi Algorithm Statistical.
Word classes and part of speech tagging Chapter 5.
Part-of-Speech Tagging CSCI-GA.2590 – Lecture 4 Ralph Grishman NYU.
Graphical Models for Segmenting and Labeling Sequence Data Manoj Kumar Chinnakotla NLP-AI Seminar.
ECE 8443 – Pattern Recognition ECE 8527 – Introduction to Machine Learning and Pattern Recognition Objectives: Bayes Rule Mutual Information Conditional.
Dan Roth University of Illinois, Urbana-Champaign 7 Sequential Models Tutorial on Machine Learning in Natural.
Tasneem Ghnaimat. Language Model An abstract representation of a (natural) language. An approximation to real language Assume we have a set of sentences,
Part-Of-Speech Tagging Radhika Mamidi. POS tagging Tagging means automatic assignment of descriptors, or tags, to input tokens. Example: “Computational.
Dr. Pushpak Bhattacharyya
Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 15
CSCI 5832 Natural Language Processing
Machine Learning in Natural Language Processing
Classical Part of Speech (PoS) Tagging
Natural Language Processing
Presentation transcript:

Part II. Statistical NLP Advanced Artificial Intelligence Part of Speech Tagging Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Lars Schmidt-Thieme Most slides taken (or adapted) from Adam Przepiorkowski (Poland) Figures by Manning and Schuetze

Contents  Part of Speech Tagging Task Why  Approaches Naïve VMM HMM Transformation Based Learning Parts of chapter 10 of Statistical NLP, Manning and Schuetze, and Chapter 8 of Jurafsky and Martin, Speech and Language Processing.

Motivations and Applications  Part-of-speech tagging The representative put chairs on the table AT NN VBD NNS IN AT NN AT JJ NN VBZ IN AT NN  Some tags : AT: article, NN: singular or mass noun, VBD: verb, past tense, NNS: plural noun, IN: preposition, JJ: adjective

Table 10.1

Why pos-tagging ?  First step in parsing  More tractable than full parsing, intermediate representation  Useful as a step for several other, more complex NLP tasks, e.g. Information extraction Word sense disambiguation Speech Synthesis  Oldest task in Statistical NLP  Easy to evaluate  Inherently sequential

Different approaches  Start from tagged training corpus And learn  Simplest approach For each word, predict the most frequent tag  0-th order Markov Model  Gets 90% accuracy at word level (English)  Best taggers 96-97% accuracy at word level (English) At sentence level : e.g. 20 words per sentence, on average one tagging error per sentence Unsure how much better one can do (human error)

Notation / Table 10.2

Visual Markov Model  Assume the VMM of last lecture  We are representing  Lexical (word) information implicit

Table 10.3

Hidden Markov Model  Make the lexical information explicit and use HMMs  State values correspond to possible tags  Observations to possible words  So, we have

Estimating the parameters  From a tagged corpus, maximum likelihood estimation  So, even though a hidden markov model is learning, everything is visible during learning !  Possibly apply smoothing (cf. N-gramms)

Table 10.4

Tagging with HMM  For an unknown sentence, employ now the Viterbi algorithm to tag  Similar techniques employed for protein secondary structure prediction  Problems The need for a large corpus Unknown words (cf. Zipf’s law)

Unknown words Two classes of part of speech : open and closed (e.g. articles) for closed classes all words are known Z: normalization constant

What if no corpus available ?  Use traditional HMM (Baum-Welch) but Assume dictionary (lexicon) that lists the possible tags for each word  One possibility : initialize the word generation (symbol emmision) probabilities

Transformation Based Learning (Eric Brill)  Observation : Predicting the most frequent tag already results in excellent behaviour  Why not try to correct the mistakes that are made ? Apply transformation rules  IF conditions THEN replace tag_j by tag_I  Which transformations / corrections admissible ?  How to learn these ?

Table 10.7/10.8

The learning algorithm

Remarks  Other machine learning methods could be applied as well (e.g. decision trees, rule learning …)

Rule-based tagging  Oldest method, hand-crafted rules  Start by assigning all potential tags to each word  Disambiguate using manually created rules  E.g. for the word that If  The next word is an adjective, an adverb or a quantifier,  And the further symbol is a sentence boundary  And the previous word is not a consider-type verb Then erase all tags apart from the adverbial tag Else erase the adverbial tag

Conclusions  Pos-tagging as an application of SNLP  VMM, HMMs, TBL  Statistical tagggers Good results for positional languages (English) Relatively cheap to build Overfitting avoidance needed Difficult to interpret (black box) Linguistically naïve

Conclusions  Rule-based taggers Very good results Expensive to build Presumably better for free word order languages Interpretable  Transformation based learning A good compromise ?