Part-of-speech tagging


1 Part-of-speech tagging
A simple but useful form of linguistic analysis Christopher Manning

2 Parts of Speech
Perhaps starting with Aristotle in the West (384–322 BCE), there was the idea of having parts of speech, a.k.a. lexical categories, word classes, “tags”, POS.
The idea, still with us today, that there are 8 parts of speech comes from Dionysius Thrax of Alexandria (c. 100 BCE), who described them for Greek.
But his 8 aren’t exactly the ones we are taught today:
- Thrax: noun, verb, article, adverb, preposition, conjunction, participle, pronoun
- School English grammar: noun, verb, adjective, adverb, preposition, conjunction, pronoun, interjection

3 Why is POS important?
Parts of speech are useful because of the large amount of information they give about a word and its neighbors. For instance:
- Knowing whether a word is a noun or a verb tells us a lot about likely neighboring words (nouns are preceded by determiners and adjectives, verbs by nouns) and about the syntactic structure around the word (nouns are generally part of noun phrases). This makes part-of-speech tagging an important component of syntactic parsing (ch. 12).
- Parts of speech are useful features for finding named entities like people or organizations in text, and for other information extraction tasks.
- Parts of speech are useful in summarization for improving the selection of nouns or other important words from a document.

4 Open class (lexical) words
  Nouns: Proper (IBM, Italy), Common (cat / cats, snow)
  Verbs: Main (see, registered)
  Adjectives: old, older, oldest
  Adverbs: slowly
  Numbers: 122,312, one
  … more
Closed class (functional) words
  Auxiliary: can, had
  Determiners: the, some
  Prepositions: to, with
  Conjunctions: and, or
  Particles: off, up
  Pronouns: he, its
  Interjections: Ow, Eh
  … more

5 Open vs. Closed classes
Parts of speech can be divided into two broad supercategories.
Closed classes are those with relatively fixed membership, such as:
- Prepositions: on, under, over, near, by, at, from, to, with
- Determiners: a, an, the
- Pronouns: she, who, I
- Conjunctions: and, but, or, as, if, when
- Auxiliary verbs: can, may, should, are
- Particles: up, down, on, off, in, out, at, by
- Numerals: one, two, three, first, second, third
Why “closed”? Their membership is relatively fixed.
Open classes: nouns, main verbs, adjectives, adverbs.
Why “open”? New nouns and verbs like iPhone or to fax are continually being created or borrowed.

6 Penn Treebank POS (45 tags now used instead of 8)

7 POS Tagging
The POS tagging problem is to determine the POS tag for a particular instance of a word. This is needed because words are ambiguous: many have more than one possible POS.
Example of a word with more than one POS: “back”
- The back door = JJ (adjective)
- On my back = NN (noun)
- Win the voters back = RB (adverb)
- Promised to back the bill = VB (verb, base form)

8 Deciding on the correct part of speech can be difficult even for people
Mrs/NNP Shaefer/NNP never/RB got/VBD around/RP to/TO joining/VBG
All/DT we/PRP gotta/VBN do/VB is/VBZ go/VB around/IN the/DT corner/NN
Chateau/NNP Petrus/NNP costs/VBZ around/RB 250/CD

9 How hard is the tagging problem? And how common is tag ambiguity?
Fig. 9.2 shows the answer for the Brown and WSJ corpora, tagged with the 45-tag Penn Treebank tagset.
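
Tag ambiguity can be measured directly from any tagged corpus by counting how many word types, and how many running tokens, occur with more than one tag. The sketch below uses a tiny hand-made corpus (hypothetical data, not Brown or WSJ); the function name `ambiguity_stats` is an assumption for illustration.

```python
from collections import defaultdict

# Tiny hand-made tagged corpus (hypothetical, for illustration only).
tagged_tokens = [
    ("the", "DT"), ("back", "JJ"), ("door", "NN"),
    ("on", "IN"), ("my", "PRP$"), ("back", "NN"),
    ("promised", "VBD"), ("to", "TO"), ("back", "VB"),
    ("the", "DT"), ("bill", "NN"),
]

def ambiguity_stats(tokens):
    """Return (share of ambiguous word types, share of ambiguous tokens)."""
    tags_for = defaultdict(set)
    for word, tag in tokens:
        tags_for[word.lower()].add(tag)
    # A word type is ambiguous if it was seen with two or more tags.
    ambiguous = {w for w, tags in tags_for.items() if len(tags) > 1}
    type_rate = len(ambiguous) / len(tags_for)
    token_rate = sum(w.lower() in ambiguous for w, _ in tokens) / len(tokens)
    return type_rate, token_rate

type_rate, token_rate = ambiguity_stats(tagged_tokens)
print(f"ambiguous types: {type_rate:.1%}, ambiguous tokens: {token_rate:.1%}")
```

Here only “back” is ambiguous, so few types are ambiguous but a larger share of tokens is, because the ambiguous word is frequent.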

10 POS tagging performance
How many tags are correct? This is called tag accuracy.
The baseline already reaches about 90% tag accuracy. The baseline algorithm resolves ambiguity in the simplest possible way:
- Tag every word with its most frequent tag
- Tag unknown words as nouns
Why is the baseline so high? Partly because much of the task is easy:
- Many words are unambiguous (85%-86%)
- You get points for them (the, a, etc.) and for punctuation marks
- Many ambiguous words are easy to disambiguate. For example, “a” can be a determiner or the letter a (perhaps as part of an acronym or an initial), but the determiner sense is much more likely.
Current tag accuracy is about 97%, reached by HMM, MEMM and other models.
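
The baseline described above can be sketched in a few lines. This is a minimal illustration, assuming training data is a list of sentences of (word, tag) pairs; the names `train_baseline` and `tag_baseline` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sentences):
    """Map each training word to the tag it was seen with most often."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def tag_baseline(model, words, unknown_tag="NN"):
    """Tag known words with their most frequent tag, unknown words as nouns."""
    return [(w, model.get(w, unknown_tag)) for w in words]

# Toy training data (hypothetical).
train = [
    [("the", "DT"), ("back", "NN"), ("door", "NN")],
    [("on", "IN"), ("my", "PRP$"), ("back", "NN")],
    [("to", "TO"), ("back", "VB"), ("the", "DT"), ("bill", "NN")],
]
model = train_baseline(train)
tagged = tag_baseline(model, ["the", "back", "fence"])
print(tagged)
```

“back” gets NN because that is its most frequent tag in the toy data, and the unseen word “fence” falls back to NN.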

12 Part-of-speech tagging revisited
A simple but useful form of linguistic analysis Christopher Manning

13 Sources of information
What are the main sources of information for POS tagging?
- Knowledge of neighboring words. Each word is shown with its possible tags:
    Bill   saw     that   man   yesterday
    NNP    NN      DT     NN    NN
    VB     VB(D)   IN     VB    NN
- Knowledge of word probabilities (the most frequent tag for a word): man is rarely used as a verb…
These were the features used by HMM tagging models.

14 More and Better Features → Feature-based tagger
Can do surprisingly well just looking at a word by itself:
  Word              the: the → DT
  Lowercased word   Importantly: importantly → RB
  Prefixes          unfathomable: un- → JJ
  Suffixes          Importantly: -ly → RB
  Capitalization    Meridian: CAP → NNP
  Word shapes       35-year: d-x → JJ
Then build a maxent (or whatever) model to predict tag
Maxent P(t|w): 93.7% overall / 82.6% unknown
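
The word-internal features listed on this slide could be extracted roughly as follows. This is a sketch under assumptions: the function names, the fixed affix length of 2, and the exact shape rule (uppercase to X, lowercase to x, digit to d, repeated symbols collapsed) are illustrative choices, not the feature set of any particular tagger.

```python
import re

def word_shape(word):
    """Collapse a word to a coarse shape, e.g. '35-year' -> 'd-x'."""
    shape = re.sub(r"[A-Z]", "X", word)
    shape = re.sub(r"[a-z]", "x", shape)
    shape = re.sub(r"[0-9]", "d", shape)
    # Collapse runs of the same symbol into a single symbol.
    return re.sub(r"(.)\1+", r"\1", shape)

def word_features(word, affix_len=2):
    """Features computed from the word alone, with no context."""
    return {
        "word": word,
        "lower": word.lower(),
        "prefix": word[:affix_len].lower(),
        "suffix": word[-affix_len:].lower(),
        "is_capitalized": word[:1].isupper(),
        "shape": word_shape(word),
    }

for w in ["the", "Importantly", "unfathomable", "Meridian", "35-year"]:
    print(w, word_features(w))
```

A dictionary like this can be fed to any classifier that accepts sparse named features; the maxent model mentioned on the slide is one such choice.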

15 Overview: POS Tagging Accuracies
Rough accuracies (overall / unknown words):
  Most freq tag:               ~90% / ~50%
  Trigram HMM:                 ~95% / ~55%
  Maxent P(t|w):               93.7% / 82.6%
  TnT (HMM++):                 …% / 86.0%
  MEMM tagger:                 …% / 86.9%
  Bidirectional dependencies:  97.2% / 90.0%
  Upper bound:                 ~98% (human agreement)
Most errors on unknown words

16 How to improve supervised results?
Build better features!
Example 1: the first “as” is tagged IN but should be RB. We could fix this with a feature that looked at the next word.
  They  left  as  soon  as  he   arrived  .
  PRP   VBD   IN  RB    IN  PRP  VBD      .
Example 2: “Intrinsic” is tagged NNP but should be JJ. We could fix this by linking capitalized words to their lowercase versions.
  Intrinsic  flaws  remained  undetected  .
  NNP        NNS    VBD       VBN         .
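
The two fixes suggested on this slide amount to adding two features: the next word, and the lowercased form of the current word. A minimal sketch, assuming a sentence is a list of word strings; `context_features` and the `<END>` padding symbol are hypothetical names.

```python
def context_features(words, i):
    """Features for position i: the word, its lowercase form, and the next word."""
    word = words[i]
    # Use a padding symbol when there is no next word.
    next_word = words[i + 1].lower() if i + 1 < len(words) else "<END>"
    return {
        "word": word,
        "lower": word.lower(),   # links 'Intrinsic' to 'intrinsic'
        "next_word": next_word,  # lets the first 'as' see 'soon'
    }

sent1 = "They left as soon as he arrived .".split()
sent2 = "Intrinsic flaws remained undetected .".split()
print(context_features(sent1, 2))
print(context_features(sent2, 0))
```

With these features, a classifier can learn that “as” followed by “soon” behaves differently from “as” followed by a pronoun, and that a sentence-initial capitalized word shares statistics with its lowercase form.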

17 Tagging Without Sequence Information
Two models that predict the tag t0 without using neighboring tags:
  Baseline:     t0 predicted from w0
  Three Words:  t0 predicted from w-1, w0, w1

  Model     Features  Token   Unknown  Sentence
  Baseline  56,805    93.69%  82.61%   26.74%
  3Words    239,767   96.57%  86.78%   48.27%

Using words only in a straight classifier works as well as a basic (HMM or discriminative) sequence model!!
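
The three accuracy columns in the table (token, unknown word, whole sentence) can be computed as sketched below. This assumes gold data as sentences of (word, tag) pairs, predictions as parallel lists of tags, a non-empty test set, and that “unknown” means the word was not seen in training; the function name and toy data are hypothetical.

```python
def accuracies(gold_sentences, predicted_sentences, known_words):
    """Return token, unknown-word, and whole-sentence accuracy."""
    token_total = token_correct = 0
    unk_total = unk_correct = 0
    sent_correct = 0
    for gold, pred in zip(gold_sentences, predicted_sentences):
        all_right = True
        for (word, gold_tag), pred_tag in zip(gold, pred):
            hit = gold_tag == pred_tag
            token_total += 1
            token_correct += hit
            if word not in known_words:
                unk_total += 1
                unk_correct += hit
            all_right = all_right and hit
        # A sentence counts only if every one of its tags is correct.
        sent_correct += all_right
    return {
        "token": token_correct / token_total,
        "unknown": unk_correct / unk_total if unk_total else None,
        "sentence": sent_correct / len(gold_sentences),
    }

# Toy example (hypothetical data).
gold = [[("the", "DT"), ("fence", "NN")], [("they", "PRP"), ("zorbed", "VBD")]]
pred = [["DT", "NN"], ["PRP", "NN"]]
scores = accuracies(gold, pred, known_words={"the", "they"})
print(scores)
```

Sentence accuracy is much lower than token accuracy because a single wrong tag makes the whole sentence count as wrong, which matches the gap between the Token and Sentence columns above.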

