1 Parts of Speech
Generally speaking, the “grammatical type” of a word:
–Verb, Noun, Adjective, Adverb, Article, …
We can also include inflection:
–Verbs: tense, number, …
–Nouns: number, proper/common, …
–Adjectives: comparative, superlative, …
–…
Commonly used POS tag sets for English differ in which tags they distinguish

2 BNC Parts of Speech
Nouns:
NN0 Common noun, neutral for number (e.g. aircraft)
NN1 Singular common noun (e.g. pencil, goose, time)
NN2 Plural common noun (e.g. pencils, geese, times)
NP0 Proper noun (e.g. London, Michael, Mars, IBM)
Pronouns:
PNI Indefinite pronoun (e.g. none, everything, one)
PNP Personal pronoun (e.g. I, you, them, ours)
PNQ Wh-pronoun (e.g. who, whoever, whom)
PNX Reflexive pronoun (e.g. myself, itself, ourselves)

3 Verbs:
VVB finite base form of lexical verbs (e.g. forget, send, live, return)
VVD past tense form of lexical verbs (e.g. forgot, sent, lived)
VVG -ing form of lexical verbs (e.g. forgetting, sending, living)
VVI infinitive form of lexical verbs (e.g. forget, send, live, return)
VVN past participle form of lexical verbs (e.g. forgotten, sent, lived)
VVZ -s form of lexical verbs (e.g. forgets, sends, lives, returns)
VBB present tense of BE, except for is
…and so on: VBD VBG VBI VBN VBZ
VDB finite base form of DO: do
…and so on: VDD VDG VDI VDN VDZ
VHB finite base form of HAVE: have, 've
…and so on: VHD VHG VHI VHN VHZ
VM0 Modal auxiliary verb (e.g. will, would, can, could, 'll, 'd)

4 Articles
AT0 Article (e.g. the, a, an, no)
DPS Possessive determiner (e.g. your, their, his)
DT0 General determiner (e.g. this, that)
DTQ Wh-determiner (e.g. which, what, whose, whichever)
EX0 Existential there, i.e. occurring in “there is…” or “there are…”
Adjectives
AJ0 Adjective (general or positive) (e.g. good, old, beautiful)
AJC Comparative adjective (e.g. better, older)
AJS Superlative adjective (e.g. best, oldest)
Adverbs
AV0 General adverb (e.g. often, well, longer (adv.), furthest)
AVP Adverb particle (e.g. up, off, out)
AVQ Wh-adverb (e.g. when, where, how, why, wherever)

5 Miscellaneous:
CJC Coordinating conjunction (e.g. and, or, but)
CJS Subordinating conjunction (e.g. although, when)
CJT The subordinating conjunction that
CRD Cardinal number (e.g. one, 3, fifty-five, 3609)
ORD Ordinal numeral (e.g. first, sixth, 77th, last)
ITJ Interjection or other isolate (e.g. oh, yes, mhm, wow)
POS The possessive or genitive marker 's or '
TO0 Infinitive marker to
PUL Punctuation: left bracket - i.e. ( or [
PUN Punctuation: general separating mark - i.e. . , ! : ; - or ?
PUQ Punctuation: quotation mark - i.e. ' or "
PUR Punctuation: right bracket - i.e. ) or ]
XX0 The negative particle not or n't
ZZ0 Alphabetical symbols (e.g. A, a, B, b, c, d)

6 Task: Part-of-Speech Tagging
Goal: assign the correct part of speech to each word (and punctuation mark) in a text.
Learn a local model of POS dependencies, usually from pre-tagged data – no parsing.
Example:
Two old men bet on the game .
CRD AJ0 NN2 VVD PRP AT0 NN1 PUN

7 Hidden Markov Models
Assume: the POS (state) sequence is generated as a time-invariant random process, and each POS randomly generates a word (output symbol).
[Diagram: HMM with states such as AT0, NN1, NN2, AJ0, emitting words such as “the”, “a”, “cat”, “bet”, “cats”, “men”]
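The generative story on this slide – walk the tag chain, emit a word at each state – can be sketched directly. This is a minimal toy model; the tags, words, and probabilities below are invented for illustration, as are the `<s>`/`</s>` boundary markers:

```python
import random

random.seed(0)

# Toy HMM: tags generate tags (time-invariant chain), each tag emits a word.
# All probabilities are invented; <s> and </s> mark sentence boundaries.
transition = {
    "<s>": {"AT0": 1.0},
    "AT0": {"NN1": 1.0},
    "NN1": {"VVZ": 0.5, "</s>": 0.5},
    "VVZ": {"</s>": 1.0},
}
emission = {
    "AT0": ["the", "a"],
    "NN1": ["cat", "men"],
    "VVZ": ["bet", "runs"],
}

def sample_sentence():
    """Generate one (words, tags) pair from the toy HMM."""
    tag = "<s>"
    words, tags = [], []
    while True:
        # Pick the next tag according to the transition distribution
        nxt = random.choices(list(transition[tag]),
                             list(transition[tag].values()))[0]
        if nxt == "</s>":
            return words, tags
        tags.append(nxt)
        words.append(random.choice(emission[nxt]))  # the state emits a word
        tag = nxt

words, tags = sample_sentence()
```

Sampling like this makes the two independence assumptions visible: each tag is chosen from the previous tag alone, and each word is chosen from its tag alone.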

8 Definition of HMM for Tagging
Set of states – all possible tags
Output alphabet – all words in the language
State/tag transition probabilities
Initial state probabilities: the probability of beginning a sentence with tag t, P(t_0 → t)
Output probabilities – probability of producing word w at state t
Output sequence – observed word sequence
State sequence – underlying tag sequence
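The components listed above can be written down as three tables. A minimal sketch – the tags come from the BNC set shown earlier, but every probability here is made up for the example:

```python
# Initial state probabilities: P(t0 -> t), chance of starting a sentence with tag t
initial = {"AT0": 0.6, "NN1": 0.3, "VVZ": 0.1}

# State/tag transition probabilities: P(t -> t')
transition = {
    "AT0": {"NN1": 0.9, "VVZ": 0.1},
    "NN1": {"VVZ": 0.7, "NN1": 0.2, "AT0": 0.1},
    "VVZ": {"AT0": 0.6, "NN1": 0.4},
}

# Output probabilities: P(w | t), producing word w at state t
emission = {
    "AT0": {"the": 0.7, "a": 0.3},
    "NN1": {"cat": 0.5, "dog": 0.5},
    "VVZ": {"runs": 0.6, "sleeps": 0.4},
}

# Sanity check: each distribution sums to 1
assert abs(sum(initial.values()) - 1.0) < 1e-9
```

The observed word sequence is the output; the tag sequence that produced it is hidden, which is what makes the model a *hidden* Markov model.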

9 HMMs for Tagging
First-order (bigram) Markov assumptions:
–Limited horizon: a tag depends only on the previous tag
P(t_i+1 = t_k | t_1 = t_j1, …, t_i = t_ji) = P(t_i+1 = t_k | t_i = t_j)
–Time invariance: no change over time
P(t_i+1 = t_k | t_i = t_j) = P(t_2 = t_k | t_1 = t_j) = P(t_j → t_k)
Output probabilities:
–Probability of getting word w_k for tag t_j: P(w_k | t_j)
–Assumption: not dependent on other tags or words!

10 Combining Probabilities
Probability of a tag sequence:
P(t_1 t_2 … t_n) = P(t_1) P(t_1 → t_2) P(t_2 → t_3) … P(t_n-1 → t_n)
Assume t_0 is a designated starting tag:
= P(t_0 → t_1) P(t_1 → t_2) P(t_2 → t_3) … P(t_n-1 → t_n)
Probability of word sequence and tag sequence together:
P(W, T) = ∏_i P(t_i-1 → t_i) P(w_i | t_i)

11 Training from a Labeled Corpus
Labeled training data = each word has a POS tag. Thus:
P_MLE(t_j) = C(t_j) / N
P_MLE(t_j → t_k) = C(t_j, t_k) / C(t_j)
P_MLE(w_k | t_j) = C(t_j : w_k) / C(t_j)
Smoothing applies as usual
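These MLE estimates are just count ratios. A minimal, unsmoothed sketch over a tiny hand-made tagged corpus (the two sentences and the `"<s>"` start marker are assumptions of this example):

```python
from collections import Counter

# Tiny labeled corpus: each sentence is a list of (word, tag) pairs.
corpus = [
    [("the", "AT0"), ("cat", "NN1"), ("sleeps", "VVZ")],
    [("a", "AT0"), ("dog", "NN1"), ("runs", "VVZ")],
]

tag_count = Counter()        # C(t_j)
bigram_count = Counter()     # C(t_j, t_k), with t_0 = "<s>"
word_tag_count = Counter()   # C(t_j : w_k)

for sent in corpus:
    prev = "<s>"
    for word, tag in sent:
        tag_count[tag] += 1
        bigram_count[(prev, tag)] += 1
        word_tag_count[(tag, word)] += 1
        prev = tag
tag_count["<s>"] = len(corpus)   # each sentence contributes one start position

def p_trans(tj, tk):
    """P_MLE(t_j -> t_k) = C(t_j, t_k) / C(t_j)."""
    return bigram_count[(tj, tk)] / tag_count[tj]

def p_emit(w, tj):
    """P_MLE(w | t_j) = C(t_j : w) / C(t_j)."""
    return word_tag_count[(tj, w)] / tag_count[tj]
```

On real data many counts are zero, which is why the slide notes that smoothing applies as usual.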

12 Viterbi Tagging
Most probable tag sequence given the text:
T* = arg max_T P_m(T | W)
= arg max_T P_m(W | T) P_m(T) / P_m(W)  (Bayes’ theorem)
= arg max_T P_m(W | T) P_m(T)  (P_m(W) is constant for all T)
= arg max_T ∏_i [ m(t_i-1 → t_i) m(w_i | t_i) ]
= arg max_T Σ_i log [ m(t_i-1 → t_i) m(w_i | t_i) ]
There are exponentially many possible tag sequences – use dynamic programming for efficient computation

13 [Figure: worked Viterbi trellis – tables of -log m transition costs (t0 → t1 → t2 → t3) and -log m output costs for words w1, w2, w3 under tags t1, t2, t3, with the best path through the trellis marked]

14 Viterbi Algorithm
1. D(0, START) = 0
2. for each tag t ≠ START do: D(0, t) = -∞
3. for i ← 1 to N do:
a. for each tag t_j do:
D(i, t_j) ← max_k [ D(i-1, t_k) + lm(t_k → t_j) + lm(w_i | t_j) ]
Record best(i, j) = k which yielded the max
4. log P(W, T*) = max_j D(N, t_j)
5. Reconstruct the path from max_j backwards
Where: lm(·) = log m(·), and D(i, t_j) is the maximum joint log probability of the state and word sequences up to position i, ending at t_j.
Complexity: O(N_t^2 · N)
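Steps 1–5 fit in a compact log-space implementation. This is a sketch, not the slide author's code: the toy model at the bottom is invented, the START state is folded into the first column via the initial probabilities, and unseen transitions/emissions get probability 0 (log probability -∞):

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return (log P(W, T*), best tag sequence) by dynamic programming."""
    def lm(p):                          # lm(.) = log m(.), with log 0 = -inf
        return math.log(p) if p > 0 else float("-inf")

    # Steps 1-2: first column D(1, t) from the START row plus first emission
    D = [{t: lm(start_p.get(t, 0.0)) + lm(emit_p[t].get(words[0], 0.0))
          for t in tags}]
    best = [{}]

    # Step 3: fill the trellis left to right, recording the argmax predecessor
    for i in range(1, len(words)):
        D.append({})
        best.append({})
        for tj in tags:
            score, k = max(
                (D[i - 1][tk] + lm(trans_p[tk].get(tj, 0.0)), tk) for tk in tags
            )
            D[i][tj] = score + lm(emit_p[tj].get(words[i], 0.0))
            best[i][tj] = k

    # Steps 4-5: best final state, then reconstruct the path backwards
    last = max(tags, key=lambda t: D[-1][t])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]])
    return D[-1][last], path[::-1]

# Toy model; probabilities are invented for illustration.
tags = ["AT0", "NN1", "VVZ"]
start = {"AT0": 0.9, "NN1": 0.1}
trans = {"AT0": {"NN1": 1.0}, "NN1": {"VVZ": 1.0}, "VVZ": {}}
emit = {"AT0": {"the": 1.0}, "NN1": {"cat": 1.0}, "VVZ": {"runs": 1.0}}
logp, path = viterbi(["the", "cat", "runs"], tags, start, trans, emit)
# path == ["AT0", "NN1", "VVZ"]
```

The two nested loops over tags inside the loop over positions give exactly the O(N_t² · N) cost stated on the slide.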

