1 COMP790: Statistical NLP POS Tagging Chap. 10

2 POS tagging
Goal: assign the right part of speech (noun, verb, …) to each word in a text
  "The/AT representative/NN put/VBD chairs/NNS on/IN the/AT table/NN."
Terminology:
- POS tag / part-of-speech tag
- word class
- morphological class
- lexical tag
- grammatical tag

3 Why do POS Tagging?
Purpose:
- a 1st step towards NLU
- easier than full NLU (results > 95% accuracy)
Useful for:
- speech recognition / synthesis (better accuracy): how to recognize/pronounce a word, e.g. CONtent/noun vs. conTENT/adj
- stemming in IR: which morphological affixes the word can take, e.g. adjective - ly = noun (friendly - ly = friend)
- IR and QA: pick out nouns, which may be more important than other words when indexing documents or analyzing queries
- partial parsing/chunking (for IE): to find noun phrases / verb phrases

4 Tag sets
Different tag sets exist, depending on the purpose of the application:
- 45 tags in the Penn Treebank
- 62 tags in CLAWS with the BNC corpus
- 79 tags in Church (1991)
- 87 tags in the Brown corpus
- 147 tags in the C7 tagset
- 258 tags in Tzoukermann and Radev (1995)

5 Tag set: Penn Treebank (45 tags)
IN   preposition or subordinating conjunction
JJ   adjective or numeral, ordinal
JJR  adjective, comparative
NN   noun, common, singular or mass
NNP  noun, proper, singular
NNS  noun, common, plural
TO   "to" as preposition or infinitive marker
VB   verb, base form
VBD  verb, past tense
VBG  verb, present participle or gerund
VBN  verb, past participle
VBP  verb, present tense, not 3rd person singular
VBZ  verb, present tense, 3rd person singular
…

6 Most word types are not ambiguous, but…
Brown corpus (Francis & Kucera, 1982):
- 11.5% of word types are ambiguous (>1 tag)
- 40% of word tokens are ambiguous (>1 tag)
Most word types are rare and unambiguous, but the ambiguous types include many of the most frequent words, so ambiguity is common at the token level.

7 Techniques for POS tagging
- rule-based tagging: uses hand-written rules
- stochastic tagging: uses probabilities computed from a training corpus
- transformation-based tagging: uses rules learned automatically

8 Information sources for tagging
All techniques are based on the same observations…
Syntagmatic information:
- some tag sequences are more probable than others: ART+ADJ+N is more probable than ART+ADJ+VB
Lexical information:
- knowing the word to be tagged gives a lot of information about the correct tag
  "table": {noun, verb} but not {adj, prep, …}
  "rose": {noun, adj, verb} but not {prep, …}

9 Naïve POS tagging
Using only syntagmatic patterns:
- Green & Rubin (1971): accuracy of 77%
Using the most-likely tag for each word:
- Charniak et al. (1993): accuracy of 90%
- much better, but still not very good… 1 mistake every 10 words
- used as a baseline for evaluation

10 Techniques for POS tagging
--> rule-based tagging: uses hand-written rules
    stochastic tagging: uses probabilities computed from a training corpus
    transformation-based tagging: uses rules learned automatically

11 Rule-based POS tagging
Step 1: Assign each word all of its possible tags (using a dictionary)
Step 2: Use if-then rules to identify the correct tag in context (disambiguation rules)

12 Sample rules
N-IP rule: a tag N (noun) cannot be followed by a tag IP (interrogative pronoun)
  "... man who ..."
  man: {N}
  who: {RP, IP} --> {RP} (relative pronoun)
ART-V rule: a tag ART (article) cannot be followed by a tag V (verb)
  "... the book ..."
  the: {ART}
  book: {N, V} --> {N}
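As an illustration of steps 1 and 2, here is a minimal, hypothetical sketch in Python. The lexicon, tag names, and rule encoding are invented for the example (they are not the slides' actual system), and the rules only fire when the preceding word is already unambiguous.

```python
# Minimal sketch of rule-based disambiguation: assign all dictionary tags,
# then prune tags that violate hand-written constraints on adjacent tags.
# The lexicon and rule set here are illustrative, not a real tagger.

LEXICON = {            # word -> set of possible tags (from a dictionary)
    "the": {"ART"},
    "man": {"N"},
    "who": {"RP", "IP"},
    "book": {"N", "V"},
}

# Constraint rules: (previous_tag, forbidden_tag_for_current_word)
RULES = [
    ("N", "IP"),       # a noun cannot be followed by an interrogative pronoun
    ("ART", "V"),      # an article cannot be followed by a verb
]

def tag(words):
    """Return, for each word, the set of tags that survive the constraints."""
    candidates = [set(LEXICON.get(w, {"N"})) for w in words]   # step 1 (default N for unknown words)
    for i in range(1, len(words)):                             # step 2
        for prev_tag, forbidden in RULES:
            if candidates[i - 1] == {prev_tag} and forbidden in candidates[i]:
                if len(candidates[i]) > 1:                     # never remove the last remaining tag
                    candidates[i].discard(forbidden)
    return candidates

print(tag(["the", "book"]))   # [{'ART'}, {'N'}]
print(tag(["man", "who"]))    # [{'N'}, {'RP'}]
```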

13 Techniques for POS tagging
    rule-based tagging: uses hand-written rules
--> stochastic tagging: uses probabilities computed from a training corpus
    transformation-based tagging: uses rules learned automatically

14 Stochastic POS tagging
Assume that a word's tag depends only on the previous tags (not the following ones).
Use a training set (a manually tagged corpus) to:
- learn the regularities of tag sequences
- learn the possible tags for a word
- model this information through a language model (n-gram)

15 Hidden Markov Model (HMM) Taggers
Goal: maximize P(word|tag) x P(tag|previous n tags)
P(word|tag) (lexical information):
- word/lexical likelihood
- probability that, given this tag, we have this word
- NOT the probability that this word has this tag
- modeled through a language model (word-tag matrix)
P(tag|previous n tags) (syntagmatic information):
- tag sequence likelihood
- probability that this tag follows these previous tags
- modeled through a language model (tag-tag matrix)

16 Tag sequence probability
P(tag|previous n tags): if we look at the (n-1) previous tags to find the current tag --> n-gram model
- trigram model: chooses the most probable tag t_i for word w_i given the previous 2 tags t_i-2 & t_i-1 and the current word w_i
- bigram model: chooses the most probable tag t_i for word w_i given the previous tag t_i-1 and the current word w_i
- unigram model (just the most-likely tag): chooses the most probable tag t_i for word w_i given only the current word w_i

17 Example
"race" can be VB or NN:
- "Secretariat/NNP is/VBZ expected/VBN to/TO race/VB tomorrow/ADV"
- "People/NNS continue/VBP to/TO inquire/VB the/DT reason/NN for/IN the/DT race/NN for/IN outer/JJ space/NN"
Let's tag the word "race" in the 1st sentence with a bigram model.

18 Example (con't)
Assuming the previous words have been tagged, we have:
  "Secretariat/NNP is/VBZ expected/VBN to/TO race/?? tomorrow"
- P(race|VB) x P(VB|TO) ?
  given that we have a VB, how likely is the current word to be "race"?
  given that the previous tag is TO, how likely is the current tag to be VB?
- P(race|NN) x P(NN|TO) ?
  given that we have an NN, how likely is the current word to be "race"?
  given that the previous tag is TO, how likely is the current tag to be NN?

19 Example (con't)
From the training corpus, we found that:
- P(NN|TO) = .021      // given that the previous tag is TO, 2.1% chance that the current tag is NN
- P(VB|TO) = .34       // given that the previous tag is TO, 34% chance that the current tag is VB
- P(race|NN) = .00041  // given that we have an NN, 0.041% chance that this word is "race"
- P(race|VB) = .00003  // given that we have a VB, 0.003% chance that this word is "race"
So:
- P(VB|TO) x P(race|VB) = .34 x .00003 ≈ .00001
- P(NN|TO) x P(race|NN) = .021 x .00041 ≈ .000009
So: VB is more probable!
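The same comparison, redone in a few lines of code. The probability values are the ones quoted on this slide, and the table and function names are just for illustration.

```python
# Re-doing the slide's bigram comparison for "race" after to/TO.
# Probabilities are the ones quoted on this slide (estimated from a tagged corpus).

p_tag_given_prev = {("NN", "TO"): 0.021, ("VB", "TO"): 0.34}            # P(tag | previous tag)
p_word_given_tag = {("race", "NN"): 0.00041, ("race", "VB"): 0.00003}   # P(word | tag)

def bigram_score(word, tag, prev_tag):
    """Score of choosing `tag` for `word` when the previous tag is `prev_tag`."""
    return p_tag_given_prev[(tag, prev_tag)] * p_word_given_tag[(word, tag)]

for tag in ("VB", "NN"):
    print(tag, bigram_score("race", tag, "TO"))
# prints ~1.0e-05 for VB and ~8.6e-06 for NN --> VB is the more probable tag here
```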

20 Example (con't)
And by the way: "race" is 98% of the time an NN!
- P(VB|race) = 0.02
- P(NN|race) = 0.98
How are the probabilities found?
- using a training corpus of hand-tagged text
- long & meticulous work done by linguists

21 HMM Tagging
But HMM tagging tries to find:
- the best sequence of tags for a sentence
- not just the best tag for a single word
Goal: maximize the probability of a tag sequence given a word sequence,
i.e. choose the sequence of tags that maximizes P(tag sequence | word sequence)

22 HMM Tagging (con't)
By Bayes' law:
  P(tagSeq | wordSeq) = P(tagSeq) x P(wordSeq | tagSeq) / P(wordSeq)
wordSeq is given…
- so P(wordSeq) will be the same for all tagSeq
- so we can drop it from the equation:
  bestTagSeq = argmax over tagSeq of P(tagSeq) x P(wordSeq | tagSeq)

23 Assumptions in HMM Tagging
1. Words are independent of each other.
2. Markov assumption (approximate the history by a short one)
   e.g. with the bigram approximation: P(t_i | t_1, …, t_i-1) ≈ P(t_i | t_i-1)   (state transition probability)
3. The probability of a word depends only on its tag: P(w_i | tags, other words) ≈ P(w_i | t_i)   (emission probability)

24 The derivation
bestTagSeq = argmax P(tagSeq) x P(wordSeq | tagSeq)
(t_1 … t_n)* = argmax P(t_1, …, t_n) x P(w_1, …, w_n | t_1, …, t_n)

Assumption 1 (independence) + chain rule:
P(t_1, …, t_n) x P(w_1, …, w_n | t_1, …, t_n)
  = P(t_n | t_1, …, t_n-1) x P(t_n-1 | t_1, …, t_n-2) x … x P(t_1)
    x P(w_1 | t_1, …, t_n) x P(w_2 | t_1, …, t_n) x … x P(w_n | t_1, …, t_n)

Assumption 2 (Markov assumption: only look at a short history, e.g. bigram):
  = P(t_n | t_n-1) x P(t_n-1 | t_n-2) x … x P(t_1)
    x P(w_1 | t_1, …, t_n) x P(w_2 | t_1, …, t_n) x … x P(w_n | t_1, …, t_n)

Assumption 3 (a word's identity depends only on its tag):
  = P(t_n | t_n-1) x P(t_n-1 | t_n-2) x … x P(t_1)
    x P(w_1 | t_1) x P(w_2 | t_2) x … x P(w_n | t_n)

25 Emission & transition probabilities
Let:
- N: number of possible tags (size of the tag set)
- V: number of word types (vocabulary)
From a tagged training corpus, we estimate:
- Emission probabilities P(w_i | t_i), stored in an N x V matrix:
  emission[i,j] = probability that tag i is the correct tag for word j
- Transition probabilities P(t_i | t_i-1), stored in an N x N matrix:
  transition[i,j] = probability that tag i follows tag j
In practice, these matrices are very sparse, so the models are smoothed to avoid zero probabilities.
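As a concrete illustration, here is a minimal sketch of how the two tables can be estimated from a tagged corpus. The corpus format (a list of sentences of (word, tag) pairs) and the add-one smoothing are assumptions of this sketch, not the slides' exact recipe.

```python
# Minimal sketch of estimating emission and transition probabilities
# from a tagged corpus, with add-one smoothing against sparseness.
from collections import defaultdict

corpus = [
    [("the", "AT"), ("representative", "NN"), ("put", "VBD"),
     ("chairs", "NNS"), ("on", "IN"), ("the", "AT"), ("table", "NN")],
]

emission_counts = defaultdict(lambda: defaultdict(int))    # tag -> word -> count
transition_counts = defaultdict(lambda: defaultdict(int))  # previous tag -> tag -> count

for sentence in corpus:
    prev = "<s>"                      # pseudo-tag marking the sentence start
    for word, tag in sentence:
        emission_counts[tag][word] += 1
        transition_counts[prev][tag] += 1
        prev = tag

vocab = {w for s in corpus for w, _ in s}
tags = set(emission_counts)

def p_word_given_tag(word, tag):
    """P(word | tag) with add-one smoothing over the vocabulary."""
    counts = emission_counts[tag]
    return (counts[word] + 1) / (sum(counts.values()) + len(vocab))

def p_tag_given_prev(tag, prev):
    """P(tag | previous tag) with add-one smoothing over the tag set."""
    counts = transition_counts[prev]
    return (counts[tag] + 1) / (sum(counts.values()) + len(tags))

print(p_word_given_tag("table", "NN"), p_tag_given_prev("NN", "AT"))
```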

26 Emission probabilities
P(w_i | t_i), stored in an N x V matrix:
  emission[i,j] = probability/frequency that tag i is the correct tag for word j

27 Transition probabilities
P(t_i | t_i-1), stored in an N x N matrix:
  transition[i,j] = probability/frequency that tag i follows tag j

28 Efficiency issues
Finding the most probable tag sequence by enumerating all sequences is exponential in time.
For efficiency, we usually use the Viterbi algorithm:
- for global maximisation
- i.e. the best tag sequence

29 An example
Emission probabilities: a vocabulary x tag matrix (words: John, likes, to, fish, in, the, sea; tags: PN, VB, TO, IN, AT, NN).
Transition probabilities: a first-tag x second-tag matrix over the same tags, plus None for the sentence start/end.
[the numeric entries of the two tables were lost in extraction]

30 State Transition Diagram (VMM)
[diagram: the transition probabilities drawn as a graph over the states start, PN, AT, NN, IN, VB, TO, end]

31 State Transition Diagram (HMM)
Same diagram, but the states are "invisible": we only see the words, which each state emits with its emission probability.
[diagram: the transition graph of slide 30, with word emissions (John, likes, to, fish, in, the, sea, …) attached to the states]

32 The Viterbi Algorithm
Best tag sequence for "John likes to fish in the sea"?
- efficiently computes the most likely state sequence given a particular output sequence
- based on dynamic programming

33 A smaller example
[diagram: a small HMM with two states q and r (plus start and end), emitting the symbols a and b with the probabilities shown]
What is the best sequence of states for the input string "bbba"?
Computing all possible paths and picking the one with the maximum probability is exponential.

34 A smaller example (con't)
For each state, store the most likely sequence that could lead to it (and its probability).
Path probability matrix:
- an array of states versus time (tags versus words)
- stores the probability of being in each state at each time, in terms of the probabilities of being in each state at the preceding time
[table: columns are the input prefixes ε, b, bb, bbb, bbba; rows are "leading to q / r", each split into "coming from q / r"; each cell holds the best path so far and its probability, e.g. ε --> q: 0.6 (1.0 x 0.6); most numeric entries were lost in extraction]

35 Viterbi for POS tagging
Let:
- n = number of words in the sentence to tag (number of input tokens)
- T = number of tags in the tag set (number of states)
- vit = path probability matrix (Viterbi): vit[i,j] = probability of the best path that ends in state (tag) j at word i
- path = matrix to recover the nodes of the best path (best tag sequence): path[i+1,j] = the state (tag) of the incoming arc that led to this most probable state j at word i+1

// Initialization
vit[1,PERIOD] := 1.0   // pretend that there is a period before our sentence (start tag = PERIOD)
vit[1,t] := 0.0 for t ≠ PERIOD

36 Viterbi for POS tagging (con't)
// Induction (build the path probability matrix)
for i := 1 to n step 1 do                  // for all words in the sentence
  for all tags t_j do                      // for all possible tags
    // store the max probability of the path:
    //   vit[i,t_k] = probability of the best path leading to state t_k at word i
    //   P(w_i+1 | t_j) = emission probability, P(t_j | t_k) = state transition probability
    vit[i+1,t_j]  := max_{1≤k≤T} ( vit[i,t_k] x P(w_i+1 | t_j) x P(t_j | t_k) )
    // store the actual state
    path[i+1,t_j] := argmax_{1≤k≤T} ( vit[i,t_k] x P(w_i+1 | t_j) x P(t_j | t_k) )
  end
end

// Termination and path read-out
bestState_n+1 := argmax_{1≤j≤T} vit[n+1,j]
for j := n to 1 step -1 do                 // for all the words in the sentence
  bestState_j := path[j+1, bestState_j+1]
end
P(bestState_1, …, bestState_n) := max_{1≤j≤T} vit[n+1,j]
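Below is a compact, runnable rendering of the pseudocode on slides 35-36. It is a sketch, not the course's reference code: the probability tables are plain dictionaries defaulting to 0.0 for unseen pairs, and the toy tags and probabilities at the end are made up for illustration.

```python
# Runnable sketch of the Viterbi tagger from slides 35-36.
# Assumes the sentence has at least one tag path with non-zero probability;
# a real tagger would use smoothed estimates (see slide 25) and log probabilities.

def viterbi(words, tags, emission, transition, start_tag="PERIOD"):
    """emission[(word, tag)] = P(word|tag); transition[(tag, prev)] = P(tag|prev)."""
    n = len(words)
    vit = [dict() for _ in range(n + 1)]    # vit[i][tag] = best path prob ending in tag at word i
    path = [dict() for _ in range(n + 1)]   # back-pointers to recover the best tag sequence
    vit[0] = {t: (1.0 if t == start_tag else 0.0) for t in tags}   # initialization

    for i in range(n):                      # induction
        for tj in tags:
            best_prev, best_prob = None, 0.0
            for tk in tags:
                p = vit[i][tk] * emission.get((words[i], tj), 0.0) * transition.get((tj, tk), 0.0)
                if p > best_prob:
                    best_prev, best_prob = tk, p
            vit[i + 1][tj] = best_prob
            path[i + 1][tj] = best_prev

    best_last = max(tags, key=lambda t: vit[n][t])   # termination
    seq = [best_last]
    for i in range(n, 1, -1):               # path read-out, from the last word back to the first
        seq.append(path[i][seq[-1]])
    return list(reversed(seq)), vit[n][best_last]

# Toy usage with made-up numbers:
tags = ["PERIOD", "AT", "NN", "VB"]
emission = {("the", "AT"): 1.0, ("dog", "NN"): 0.6, ("barks", "VB"): 0.5, ("barks", "NN"): 0.1}
transition = {("AT", "PERIOD"): 0.8, ("NN", "AT"): 0.9, ("VB", "NN"): 0.7, ("NN", "NN"): 0.2}
print(viterbi(["the", "dog", "barks"], tags, emission, transition))
# (['AT', 'NN', 'VB'], 0.1512)
```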

37 Possible improvements
In bigram POS tagging, we condition a tag only on the preceding tag. Why not…
- use more context (e.g. a trigram model)? more precise:
  "is clearly marked" --> verb, past participle
  "he clearly marked" --> verb, past tense
- combine trigram, bigram and unigram models?
- condition on words too?
But with an n-gram approach this is too costly (too many parameters to model)
--> transformation-based tagging…

38 Techniques for POS tagging
    rule-based tagging: uses hand-written rules
    stochastic tagging: uses probabilities computed from a training corpus
--> transformation-based tagging: uses rules learned automatically

39 Transformation-based tagging
Due to Eric Brill (1995)
Basic idea:
- take a non-optimal sequence of tags and
- improve it successively by applying a series of well-ordered re-write rules
Rule-based, but the rules are learned automatically by training on a pre-tagged corpus.

40 An example
1. Assign to each word its most likely tag:
   P(NN|race) = .98
   P(VB|race) = .02
2. Change some tags by applying transformation rules.

41 Types of context
Lots of latitude… the triggering context can be:
- tag-triggered transformation
  The preceding/following word is tagged this way.
  The word two before/after is tagged this way.
  …
- word-triggered transformation
  The preceding/following word is this word.
  …
- morphology-triggered transformation
  The preceding/following word ends with an s.
  …
- a combination of the above
  The preceding word is tagged this way AND the following word is this word.

42 Learning the transformation rules
Input: a corpus with each word:
- correctly tagged (for reference)
- tagged with its most frequent tag (C_0)
Output: a bag of transformation rules
Algorithm: instantiate a small set of hand-written templates (generic rules) by comparing the reference corpus to C_0.
Templates: "Change tag a to tag b when…"
- the preceding/following word is tagged z
- the word two before/after is tagged z
- one of the 2 preceding/following words is tagged z
- one of the 2 preceding words is z
- …

43 Learning the transformation rules (con't)
- Run the initial tagger and compile the types of errors.
- For each error type, instantiate all templates to generate candidate transformations.
- Apply each candidate transformation to the corpus and count the number of corrections and errors that it produces.
- Save the transformation that yields the greatest improvement.
- Stop when no transformation can reduce the error rate by a predetermined threshold.

44 Example
If the initial tagger mistags 159 words as verbs instead of nouns:
- create the error triple <verb, noun, 159>
Suppose template #3 is instantiated as the rule:
- Change the tag from verb to noun if one of the two preceding words is tagged as a determiner.
When this template is applied to the corpus:
- it corrects 98 of the 159 errors
- but it also creates 18 new errors
Error reduction is 98 - 18 = 80.

45 Learning the best transformations
Input:
- a corpus with each word: correctly tagged (for reference), and tagged with its most frequent tag (C_0)
- a bag of unordered transformation rules
Output:
- an ordering of the best transformation rules

46 Learning the best transformations (con't)
Let:
- E(C_k) = number of words incorrectly tagged in the corpus at iteration k
- v(C) = the corpus obtained after applying rule v to the corpus C
- ε = minimum error reduction required to continue

for k := 0 step 1 do
  bt := argmin_t E(t(C_k))              // find the transformation t that minimizes the error rate
  if (E(C_k) - E(bt(C_k))) < ε          // if bt does not improve the tagging significantly
    then goto finished
  C_k+1 := bt(C_k)                      // apply rule bt to the current corpus
  T_k+1 := bt                           // bt is kept as the current transformation rule
end
finished: the sequence T_1 T_2 … T_k is the ordered list of transformation rules
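Here is a sketch of this greedy selection loop in Python. Representing each candidate rule as a function from a tag sequence to a new tag sequence is an illustrative choice of this sketch; the candidate rules themselves would come from the template instantiation step of slide 43.

```python
# Sketch of the greedy rule-selection loop: repeatedly pick the candidate rule
# with the largest error reduction, apply it, and stop when no rule helps enough.

def errors(predicted_tags, gold_tags):
    """E(C): number of words whose current tag disagrees with the reference."""
    return sum(p != g for p, g in zip(predicted_tags, gold_tags))

def learn_transformations(current_tags, gold_tags, candidate_rules, epsilon=1):
    """Return the ordered list of rules, greedily chosen by error reduction."""
    ordered_rules = []
    while True:
        # score every candidate rule by the error count after applying it
        scored = [(errors(rule(current_tags), gold_tags), rule) for rule in candidate_rules]
        best_error, best_rule = min(scored, key=lambda x: x[0])
        if errors(current_tags, gold_tags) - best_error < epsilon:
            break                                   # no rule improves the tagging enough
        current_tags = best_rule(current_tags)      # apply the chosen rule to the corpus
        ordered_rules.append(best_rule)             # keep it as the next transformation
    return ordered_rules
```

Because every accepted rule must reduce the error count by at least epsilon, the loop is guaranteed to terminate.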

47 Strengths of transformation-based tagging
- exploits a wider range of lexical and syntactic regularities
- can look at a wider context:
  - conditions the tags on preceding/next words, not just preceding tags
  - can use more context than a bigram or trigram
- transformation rules are easier to understand than matrices of probabilities

48 Evaluation of POS taggers
Compared with a gold standard of human annotation.
Metric:
- accuracy = % of tags that are identical to the gold standard
- most taggers: ~96-97% accuracy
Accuracy must be compared to:
- a ceiling (best possible result): how do human annotators score against each other? (96-97%), so systems are not bad at all!
- a baseline (worst acceptable result): what if we take the most-likely tag (unigram model) regardless of previous tags? (90-91%), so anything less is really bad
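For concreteness, a tiny sketch of the accuracy metric and the most-likely-tag baseline. The data structures are illustrative; a real evaluation would train on one corpus and score on a held-out test corpus.

```python
# Accuracy against a gold standard, and the unigram (most-likely-tag) baseline.
from collections import Counter, defaultdict

def accuracy(predicted, gold):
    """Percentage of tags identical to the gold standard."""
    return 100.0 * sum(p == g for p, g in zip(predicted, gold)) / len(gold)

def most_likely_tag_baseline(train, test_words):
    """Tag every test word with its most frequent tag in the training data."""
    counts = defaultdict(Counter)                  # word -> Counter of tags
    for word, tag in train:
        counts[word][tag] += 1
    overall = Counter(tag for _, tag in train)     # fallback tag for unknown words
    default = overall.most_common(1)[0][0]
    return [counts[w].most_common(1)[0][0] if w in counts else default for w in test_words]
```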

49 More on tagger accuracy
Is 95% good?
- that's 5 mistakes every 100 words
- if a sentence is 20 words on average, that's 1 mistake per sentence
When comparing tagger accuracy, beware of:
- size of the training corpus: the bigger, the better the results
- difference between training & testing corpora (genre, domain…): the closer, the better the results
- size of the tag set: prediction versus classification
- unknown words: the more unknown words (not in the dictionary), the worse the results

50 Error analysis of POS taggers
Where did the tagger go wrong? Use a confusion matrix / contingency table.
Most confused:
- NN (noun) vs. NNP (proper noun) vs. JJ (adjective)
- VBD (verb, past tense) vs. VBN (past participle) vs. JJ (adjective)
  "he chopped carrots", "the carrots were chopped", "the chopped carrots"
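A minimal sketch of building such a confusion matrix from gold and predicted tag sequences; the example tags are made up, and the print loop reports only the off-diagonal (error) cells.

```python
# Confusion matrix for error analysis: count (gold tag, predicted tag) pairs.
from collections import Counter

def confusion_matrix(predicted, gold):
    """Return a Counter keyed by (gold_tag, predicted_tag)."""
    return Counter(zip(gold, predicted))

gold      = ["NN", "VBD", "JJ", "NN", "VBN"]
predicted = ["NN", "VBN", "JJ", "NNP", "VBD"]
for (g, p), n in confusion_matrix(predicted, gold).most_common():
    if g != p:
        print(f"gold {g} tagged as {p}: {n}")   # e.g. gold VBD tagged as VBN: 1
```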

51 Major difficulties in POS tagging
Unknown words (e.g. proper names):
- because we do not know the set of tags they can take
- and knowing this takes you a long way (cf. the baseline POS tagger)
- possible solutions:
  - assign all possible tags, with a probability distribution identical to that of the lexicon as a whole
  - use morphological cues to infer possible tags, e.g. words ending in -ed are likely to be past tense verbs or past participles
Frequently confused tag pairs:
- preposition vs. particle: "ran up a hill" (preposition) / "ran up a bill" (particle)
- verb, past tense vs. past participle vs. adjective