Part-of-Speech Tagging & Sequence Labeling Hongning Wang


What is POS tagging?
A POS tagger maps raw text to tagged text.
– Raw text: Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.
– Tagged text: Pierre_NNP Vinken_NNP ,_, 61_CD years_NNS old_JJ ,_, will_MD join_VB the_DT board_NN as_IN a_DT nonexecutive_JJ director_NN Nov._NNP 29_CD ._.
– Tag set (excerpt): NNP: proper noun; CD: numeral; JJ: adjective
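As a hedged illustration (not part of the original slides), the sketch below produces this kind of Penn Treebank-style output with NLTK's pretrained tagger; it assumes NLTK is installed and its punkt and averaged_perceptron_tagger resources have been downloaded.

```python
# Minimal sketch: POS tagging with NLTK's pretrained Penn Treebank tagger.
# Assumes the 'punkt' and 'averaged_perceptron_tagger' resources are available.
import nltk

sentence = ("Pierre Vinken, 61 years old, will join the board "
            "as a nonexecutive director Nov. 29.")
tokens = nltk.word_tokenize(sentence)   # ['Pierre', 'Vinken', ',', '61', ...]
tagged = nltk.pos_tag(tokens)           # [('Pierre', 'NNP'), ('Vinken', 'NNP'), ...]
print(tagged)
```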

Why POS tagging?
POS tagging is a prerequisite for further NLP analysis:
– Syntax parsing: POS tags are the basic units for parsing
– Information extraction: tags indicate names and relations
– Machine translation: the meaning of a particular word depends on its POS tag
– Sentiment analysis: adjectives are the major opinion holders (good vs. bad, excellent vs. terrible)

Challenges in POS tagging
Words often have more than one POS tag:
– The back door (adjective)
– On my back (noun)
– Promised to back the bill (verb)
A simple solution with dictionary look-up does not work in practice:
– One needs to determine the POS tag for a particular instance of a word from its context

Define a tagset
We have to agree on a standard inventory of word classes:
– Taggers are trained on labeled corpora
– The tagset needs to capture semantically or syntactically important distinctions that can easily be made by trained human annotators

Word classes
Open classes:
– Nouns, verbs, adjectives, adverbs
Closed classes:
– Auxiliaries and modal verbs
– Prepositions, conjunctions
– Pronouns, determiners
– Particles, numerals

Public tagsets in NLP
Brown corpus (Francis and Kucera, 1961):
– 500 samples, distributed across 15 genres in rough proportion to the amount published in 1961 in each of those genres
– 87 tags
Penn Treebank (Marcus et al., 1993):
– Hand-annotated corpus of the Wall Street Journal, 1M words
– 45 tags, a simplified version of the Brown tag set
– The standard for English now; most statistical POS taggers are trained on this tagset

How much ambiguity is there?
Statistics of word–tag pairs in the Brown Corpus and the Penn Treebank [table not reproduced in this transcript; the figures highlighted on the slide were 11% and 18%]

Is POS tagging a solved problem?

Building a POS tagger
Rule-based solution:
1. Take a dictionary that lists all possible tags for each word
2. Assign to every word all its possible tags
3. Apply rules that eliminate impossible/unlikely tag sequences, leaving only one tag per word
Example: "she promised to back the bill"
– she: PRP
– promised: VBN, VBD
– to: TO
– back: VB, JJ, RB, NN
– the: DT
– bill: NN, VB
– R1: a pronoun should be followed by a past tense verb (keeps VBD for "promised")
– R2: a verb cannot follow a determiner (eliminates VB for "bill")
Rules can be learned via inductive learning.
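As an illustration only (not code from the course), here is a minimal sketch of the dictionary-plus-rules idea; the tiny lexicon and the two rules R1/R2 are hard-coded for the one example sentence above.

```python
# Minimal sketch of rule-based tagging: dictionary lookup, then rule-based pruning.
# The lexicon and rules below are toy examples for the sentence on the slide.
lexicon = {
    "she": {"PRP"},
    "promised": {"VBN", "VBD"},
    "to": {"TO"},
    "back": {"VB", "JJ", "RB", "NN"},
    "the": {"DT"},
    "bill": {"NN", "VB"},
}

def rule_based_tag(words):
    # Steps 1-2: assign every word all of its possible tags.
    candidates = [set(lexicon[w]) for w in words]
    # Step 3: apply pruning rules on adjacent positions.
    for i in range(1, len(words)):
        prev, cur = candidates[i - 1], candidates[i]
        if prev == {"PRP"}:                  # R1: pronoun is followed by a past-tense verb
            cur.intersection_update({"VBD"})
        if prev == {"DT"}:                   # R2: a verb cannot follow a determiner
            cur.discard("VB")
    return candidates

tags = rule_based_tag("she promised to back the bill".split())
# promised -> {'VBD'}, bill -> {'NN'}; "back" stays ambiguous: {'VB', 'JJ', 'RB', 'NN'}
```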

Building a POS tagger

POS tagging with generative models
Bayes rule:
– Joint distribution of tags and words
– Generative model: a stochastic process that first generates the tags, and then generates the words based on these tags
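The formulas on this slide are not in the transcript; the following is a standard reconstruction of the Bayes-rule decomposition used for generative tagging (notation: w_{1:n} for the word sequence, t_{1:n} for the tag sequence).

```latex
\hat{t}_{1:n} = \arg\max_{t_{1:n}} P(t_{1:n} \mid w_{1:n})
             = \arg\max_{t_{1:n}} \frac{P(w_{1:n} \mid t_{1:n})\,P(t_{1:n})}{P(w_{1:n})}
             = \arg\max_{t_{1:n}} P(w_{1:n} \mid t_{1:n})\,P(t_{1:n})
```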

Hidden Markov models

Graphical representation of HMMs
– Light circle: latent random variable (a tag, ranging over all the tags in the tagset)
– Dark circle: observed random variable (a word, ranging over all the words in the vocabulary)
– Arrow: probabilistic dependency (transition probability between adjacent tags, emission probability from a tag to a word)
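The figure itself is not reproduced here; the joint distribution it depicts is the standard first-order HMM factorization (a reconstruction, not copied from the slide), where P(t_i | t_{i-1}) is the transition probability, P(w_i | t_i) is the emission probability, and t_0 is a special start symbol.

```latex
P(w_{1:n}, t_{1:n}) = \prod_{i=1}^{n} P(t_i \mid t_{i-1})\, P(w_i \mid t_i)
```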

Finding the most probable tag sequence

Trellis: a special structure for HMMs
Computation can be reused!

Viterbi algorithm
The recursion combines three pieces: the score of the best tag sequence ending at position i-1, the transition from the previous best ending tag, and the emission that generates the current observation.
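The equation on the slide is not in the transcript; the standard Viterbi recursion it annotates looks like this (a reconstruction using the transition/emission notation above), where V_i(t) is the probability of the best tag sequence for w_{1:i} that ends in tag t.

```latex
V_i(t) = \max_{t'} \; V_{i-1}(t')\, P(t \mid t')\, P(w_i \mid t)
```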

Viterbi algorithm
Order of computation: the trellis is filled column by column, from left to right.

Take the highest-scoring entry in the last column of the trellis.
Keep backpointers in each trellis cell to keep track of the most probable sequence.
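Putting the last few slides together, here is a minimal Viterbi sketch in Python (an illustration, not the course's reference implementation); the start, transition, and emission probabilities are assumed to be given as plain dictionaries.

```python
# Viterbi decoding for an HMM tagger: dynamic programming over the trellis,
# with backpointers for recovering the best tag sequence.
# `start`, `trans`, and `emit` are assumed probability dictionaries,
# e.g. start['DT'], trans['DT']['NN'], emit['NN']['bill'].
def viterbi(words, tags, start, trans, emit):
    V = [{}]      # V[i][t]: probability of the best tag sequence for words[:i+1] ending in tag t
    back = [{}]   # back[i][t]: previous tag on that best sequence
    for t in tags:
        V[0][t] = start.get(t, 0.0) * emit[t].get(words[0], 0.0)
        back[0][t] = None
    for i in range(1, len(words)):        # fill the trellis column by column
        V.append({})
        back.append({})
        for t in tags:
            best_prev, best_score = None, -1.0
            for tp in tags:               # best previous ending tag
                score = V[i - 1][tp] * trans[tp].get(t, 0.0)
                if score > best_score:
                    best_prev, best_score = tp, score
            V[i][t] = best_score * emit[t].get(words[i], 0.0)
            back[i][t] = best_prev
    # Take the highest-scoring entry in the last column, then follow backpointers.
    last = max(tags, key=lambda t: V[-1][t])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))
```

In practice the products are computed in log space to avoid numerical underflow.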

Train an HMM tagger
For the first tag in a sentence, an initial tag distribution is estimated (or a special start symbol is used).

Train an HMM tagger
The transition and emission probabilities are estimated from tagged training data by counting. Proper smoothing is necessary!
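The estimation formulas on the slide are not in the transcript; the usual maximum-likelihood estimates with add-one (Laplace) smoothing, shown here as a reconstruction, are the following, where c(·) are counts in the training corpus, |T| is the tagset size, and |V| is the vocabulary size. Add-one smoothing is just one option.

```latex
P(t_i \mid t_{i-1}) = \frac{c(t_{i-1}, t_i) + 1}{c(t_{i-1}) + |T|}, \qquad
P(w_i \mid t_i) = \frac{c(t_i, w_i) + 1}{c(t_i) + |V|}
```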

Public POS taggers
– Brill’s tagger
– TnT tagger
– Stanford tagger
– SVMTool
– GENIA tagger
– A more complete list is available online

Let’s take a look at other NLP tasks
Noun phrase (NP) chunking
– Task: identify all non-recursive NP chunks

The BIO encoding
Define three new tags:
– B-NP: beginning of a noun phrase chunk
– I-NP: inside of a noun phrase chunk
– O: outside of a noun phrase chunk
POS tagging with a restricted tagset?
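As a small illustration (an assumed example sentence, not taken from the slide), an NP-chunked sentence under this encoding would look roughly like: He/B-NP reckons/O the/B-NP current/I-NP account/I-NP deficit/I-NP will/O narrow/O …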

Another NLP task
Shallow parsing
– Task: identify all non-recursive NP, verb (“VP”) and preposition (“PP”) chunks

BIO Encoding for Shallow Parsing
Define several new tags:
– B-NP, B-VP, B-PP: beginning of an “NP”, “VP”, “PP” chunk
– I-NP, I-VP, I-PP: inside of an “NP”, “VP”, “PP” chunk
– O: outside of any chunk
POS tagging with a restricted tagset?

Yet another NLP task
Named Entity Recognition
– Task: identify all mentions of named entities (people, organizations, locations, dates)

BIO Encoding for NER
Define many new tags:
– B-PERS, B-DATE, …: beginning of a mention of a person/date…
– I-PERS, I-DATE, …: inside of a mention of a person/date…
– O: outside of any mention of a named entity
POS tagging with a restricted tagset?

Sequence labeling

Comparing to the traditional classification problem
– Sequence labeling: the labels t_i for the words w_i are predicted jointly, with dependencies between neighboring positions
– Traditional classification: each instance x_i receives its label y_i independently of every other pair (x_j, y_j)
[the side-by-side graphical models from the slide are not reproduced here]

Two modeling perspectives

Generative vs. discriminative models
Binary classification as an example:
– Generative model’s view
– Discriminative model’s view

Generative vs. discriminative models
Generative vs. discriminative (the comparison table on the slide is not reproduced in this transcript)
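At a minimum, the contrast is what each family models (a reconstruction of the standard comparison, not copied from the slide):

```latex
\text{Generative: model } P(x, y) = P(y)\,P(x \mid y), \qquad \hat{y} = \arg\max_y P(y)\,P(x \mid y)
```
```latex
\text{Discriminative: model } P(y \mid x) \text{ directly}, \qquad \hat{y} = \arg\max_y P(y \mid x)
```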

Maximum entropy Markov models

Design features
Emission-like features:
– Binary feature functions: f_first-letter-capitalized-NNP(China) = 1, f_first-letter-capitalized-VB(know) = 0
– Integer (or real-valued) feature functions: f_number-of-vowels-NNP(China) = 2
Transition-like features:
– Binary feature functions: f_first-letter-capitalized-VB-NNP(China) = 1
Features are not necessarily independent!
(The slide's figure shows the two-word fragment know/VB China/NNP.)
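A minimal sketch (assumed code, not the course's) of how such binary and transition-like feature functions might be written; the function names and signatures are made up for the illustration.

```python
# Illustrative MEMM-style feature functions for a word with candidate tag `tag`
# and previous tag `prev_tag`. Names and signatures are hypothetical.
def f_first_letter_capitalized(word, tag, target_tag):
    """Binary emission-like feature: word is capitalized AND tag equals target_tag."""
    return 1 if word[0].isupper() and tag == target_tag else 0

def f_number_of_vowels(word, tag, target_tag):
    """Integer emission-like feature: vowel count, active only when tag equals target_tag."""
    return sum(c in "aeiou" for c in word.lower()) if tag == target_tag else 0

def f_cap_after_tag(word, tag, prev_tag, target_prev, target_tag):
    """Binary transition-like feature: capitalized word tagged target_tag after target_prev."""
    return 1 if (word[0].isupper() and tag == target_tag and prev_tag == target_prev) else 0

# Reproducing the slide's examples:
print(f_first_letter_capitalized("China", "NNP", "NNP"))    # 1
print(f_first_letter_capitalized("know", "VB", "VB"))       # 0
print(f_number_of_vowels("China", "NNP", "NNP"))            # 2  ('i' and 'a')
print(f_cap_after_tag("China", "NNP", "VB", "VB", "NNP"))   # 1
```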

Parameterization of MEMMs
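The parameterization on the slide is not in the transcript; the standard MEMM form, given here as a reconstruction, is a locally normalized log-linear model over each tag decision, where the f_k are feature functions like those on the previous slide and the λ_k are learned weights.

```latex
P(t_i \mid t_{i-1}, w_i) = \frac{\exp\!\big(\sum_k \lambda_k f_k(t_i, t_{i-1}, w_i)\big)}
                                {\sum_{t'} \exp\!\big(\sum_k \lambda_k f_k(t', t_{i-1}, w_i)\big)},
\qquad
P(t_{1:n} \mid w_{1:n}) = \prod_{i=1}^{n} P(t_i \mid t_{i-1}, w_i)
```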

Parameter estimation
Decompose the training data into such units (one local tagging decision per position).

Why maximum entropy?
We will explain this in detail when discussing logistic regression models.

A little bit more about MEMMs

Conditional random field
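The CRF definition on the slide is not captured in the transcript; the standard linear-chain CRF, given here as a reconstruction, replaces the MEMM's per-position normalization with a single global normalization over whole tag sequences.

```latex
P(t_{1:n} \mid w_{1:n}) = \frac{1}{Z(w_{1:n})}
  \exp\!\Big(\sum_{i=1}^{n} \sum_k \lambda_k f_k(t_i, t_{i-1}, w_{1:n}, i)\Big),
\qquad
Z(w_{1:n}) = \sum_{t'_{1:n}} \exp\!\Big(\sum_{i=1}^{n} \sum_k \lambda_k f_k(t'_i, t'_{i-1}, w_{1:n}, i)\Big)
```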

What you should know
– Definition of the POS tagging problem: properties & challenges
– Public tag sets
– Generative model for POS tagging: HMMs
– The general sequence labeling problem
– Discriminative models for sequence labeling: MEMMs

Today’s reading
Speech and Language Processing
– Chapter 5: Part-of-Speech Tagging
– Chapter 6: Hidden Markov and Maximum Entropy Models
– Chapter 22: Information Extraction (optional)