
CSA2050: Introduction to Computational Linguistics Part of Speech (POS) Tagging II Transformation Based Tagging Brill (1995)


1 CSA2050: Introduction to Computational Linguistics Part of Speech (POS) Tagging II Transformation Based Tagging Brill (1995)

2 April 2005, CLINT Lecture IV. Transformation-Based Tagging: a combination of rule-based and stochastic tagging methodologies. It is like rule-based tagging because rules are used to specify tags in a certain environment, and like stochastic tagging because machine learning is used. It uses Transformation-Based Learning (TBL). Input: a tagged corpus and a dictionary (with most frequent tags).

3 Transformation-Based Error-Driven Learning. [Diagram, after Brill (1996), with the labels: unannotated text, initial state, annotated text, truth, learner, transformation rules.]

4 TBL Requirements: an initial state annotator; a list of allowable transformations; a scoring function; a search strategy.

5 The Basic Algorithm: (1) Label every word with its most likely tag. (2) Repeat the following until a stopping condition is reached: (a) examine every possible transformation and select the one that results in the most improved tagging; (b) retag the data according to this rule; (c) append this rule to the output list. (3) Return the output list.
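
The loop above can be sketched in a few lines of Python. This is a minimal illustration and not Brill's implementation: it assumes a single hypothetical template ("change tag old to new when the previous tag is prev"), scores a rule by the net number of errors it removes, and stops when no rule gains at least `min_gain`. The names `Rule`, `apply_rule`, `candidate_rules` and `learn` are invented for this sketch.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    """Hypothetical template: change `old` to `new` if the previous tag is `prev`."""
    old: str
    new: str
    prev: str


def apply_rule(rule, tags):
    """Return a retagged copy; triggers are checked against the tags before retagging."""
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == rule.old and tags[i - 1] == rule.prev:
            out[i] = rule.new
    return out


def errors(tags, gold):
    """Scoring function: number of positions that disagree with the truth."""
    return sum(t != g for t, g in zip(tags, gold))


def candidate_rules(tags, gold):
    """Search strategy: instantiate the template only where the tagging is wrong."""
    return {Rule(tags[i], gold[i], tags[i - 1])
            for i in range(1, len(tags)) if tags[i] != gold[i]}


def learn(initial_tags, gold, min_gain=1):
    """Greedy TBL loop: pick the best rule, retag, append, repeat."""
    tags, learned = list(initial_tags), []
    while True:
        best, best_gain = None, 0
        for rule in sorted(candidate_rules(tags, gold)):
            gain = errors(tags, gold) - errors(apply_rule(rule, tags), gold)
            if gain > best_gain:
                best, best_gain = rule, gain
        if best is None or best_gain < min_gain:
            return learned, tags
        tags = apply_rule(best, tags)
        learned.append(best)


# Toy data for "the race to race": the initial annotator tags both "race" as NN.
initial = ["DT", "NN", "TO", "NN"]
gold = ["DT", "NN", "TO", "VB"]
learned, final = learn(initial, gold)
print(learned)
```

The output list is ordered: at tagging time the rules would be applied in the order in which they were learned.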

6 Transformation-Based Tagging, Basic Process: Set the most probable tag for each word as a start value, e.g. tag all occurrences of “race” as NN when P(NN|race) = .98 and P(VB|race) = .02. The set of possible transformations is limited by using a fixed number of rule templates, which contain slots and allow a fixed number of fillers to fill the slots.
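
As a rough sketch of the start value, the most probable tag per word can be estimated by counting tags in a tagged corpus. The toy corpus below is made up so that it reproduces the slide's figures for “race”; the function names are assumptions of this example.

```python
from collections import Counter, defaultdict


def count_tags(tagged_tokens):
    """Count how often each word form carries each tag."""
    counts = defaultdict(Counter)
    for word, tag in tagged_tokens:
        counts[word][tag] += 1
    return counts


def tag_probability(counts, tag, word):
    """Relative-frequency estimate of P(tag | word)."""
    return counts[word][tag] / sum(counts[word].values())


def most_likely_tag(counts, word):
    """Start value used by the initial state annotator."""
    return counts[word].most_common(1)[0][0]


# Invented corpus: 98 noun readings and 2 verb readings of "race".
corpus = [("race", "NN")] * 98 + [("race", "VB")] * 2
counts = count_tags(corpus)
print(most_likely_tag(counts, "race"), tag_probability(counts, "NN", "race"))
```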

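7 Rule Templates: triggering environments. [Table: nine schemata (rows 1 to 9) against the columns t(i-3), t(i-2), t(i-1), t(i), t(i+1), t(i+2), t(i+3); an asterisk marks each position a schema inspects. The positions of the asterisks are not recoverable from this transcript.]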

8 Rule Types and Instances: Brill’s templates. Each rule begins with “change tag a to tag b”. The variables a, b, z, w range over POS tags. All possible variable substitutions are considered.
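
To illustrate what "all possible variable substitutions" means, the sketch below fills the slots a, b and z of two templates of the "neighbouring tag is z" kind with every combination from a toy tagset. The two templates, the three-tag tagset and the tuple layout are assumptions made for this example, not a list taken from the slide.

```python
from itertools import product

TAGSET = ["NN", "VB", "TO"]  # toy tagset, assumed for the example

# Hypothetical templates: name -> offset of the tag that must equal z.
TEMPLATES = {"prev_tag_is": -1, "next_tag_is": +1}


def instantiate(tagset, templates):
    """Yield one rule (a, b, template, z, offset) per substitution with a != b."""
    for name, offset in templates.items():
        for a, b, z in product(tagset, repeat=3):
            if a != b:
                yield (a, b, name, z, offset)


def fires(rule, tags, i):
    """True if the rule's triggering environment holds at position i."""
    a, _b, _name, z, offset = rule
    j = i + offset
    return tags[i] == a and 0 <= j < len(tags) and tags[j] == z


rules = list(instantiate(TAGSET, TEMPLATES))
print(len(rules))  # 2 templates * 3 * 2 * 3 substitutions
```

Even this tiny setting yields 36 candidate rules, which is why the number of templates and fillers is kept fixed and small.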

9 Examples of learned rules

10 TBL: Remarks. Execution speed: a TBL tagger is slower than the HMM approach. Learning speed is slow: Brill’s implementation takes over a day (600k tokens). BUT it learns a small number of simple, non-stochastic rules, and it can be made to work faster with finite state transducers.

11 Tagging Unknown Words: New words are added to (newspaper) language at a rate of 20+ per month, plus many proper names. Unknown words increase error rates by 1-2%. Methods: (1) Assume the unknowns are nouns. (2) Assume the unknowns have a probability distribution similar to words occurring once in the training set. (3) Use morphological information, e.g. words ending with -ed tend to be tagged VBN.
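
Two of these methods (the noun default and the -ed heuristic) can be combined in a very small fallback function. This is only a sketch: `guess_tag` and the dictionary-based lexicon are hypothetical, and a real tagger would use a larger set of morphological cues.

```python
def guess_tag(word, lexicon, default="NN"):
    """Return the lexicon tag if the word is known, else guess from its form."""
    if word in lexicon:
        return lexicon[word]
    if word.endswith("ed"):
        return "VBN"  # morphological cue from the slide: -ed tends to be VBN
    return default    # otherwise assume the unknown word is a noun


lexicon = {"race": "NN"}  # toy lexicon, assumed for the example
for w in ["race", "blogged", "blog"]:
    print(w, guess_tag(w, lexicon))
```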

12 Evaluation: The result is compared with a manually coded “Gold Standard”. Typically accuracy reaches 95-97%. This may be compared with the result for a baseline tagger (one that uses no context). Important: 100% accuracy is impossible even for human annotators.
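
The comparison against the gold standard amounts to token-level accuracy. The sketch below computes it for a context-free baseline and for a second tagger on a made-up five-token sentence; the tag sequences are invented for illustration and are not results from the lecture.

```python
def accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag matches the gold standard."""
    if len(predicted) != len(gold):
        raise ValueError("tag sequences must have the same length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


gold_tags = ["DT", "NN", "TO", "VB", "RB"]
baseline_tags = ["DT", "NN", "TO", "NN", "RB"]  # most likely tag, no context
tagger_tags = ["DT", "NN", "TO", "VB", "RB"]    # output of a context-aware tagger

print(accuracy(baseline_tags, gold_tags), accuracy(tagger_tags, gold_tags))
```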

13 A word of caution. 95% accuracy: every 20th token wrong. 96% accuracy: every 25th token wrong. Is the step from 95% to 96% an improvement of 25%? 97% accuracy: roughly every 33rd token wrong. 98% accuracy: every 50th token wrong.
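
The arithmetic behind these figures is short enough to check directly. The sketch below shows two ways of reading the step from 95% to 96%: the distance between errors grows by 25% (from 20 to 25 tokens), while the number of errors falls by 20% (from 5 to 4 per 100 tokens). The function names are invented for this example.

```python
def tokens_per_error(accuracy_pct):
    """Average number of tokens per wrong tag at the given accuracy (in percent)."""
    return 100 / (100 - accuracy_pct)


def relative_error_reduction(old_pct, new_pct):
    """Share of the old errors that disappears when accuracy rises."""
    old_err = 100 - old_pct
    new_err = 100 - new_pct
    return (old_err - new_err) / old_err


for acc in (95, 96, 97, 98):
    print(acc, round(tokens_per_error(acc), 1))

interval_growth = tokens_per_error(96) / tokens_per_error(95) - 1
print(interval_growth, relative_error_reduction(95, 96))
```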

14 How much training data is needed? When working with the STTS (50 tags) we observed a strong increase in accuracy when training on 10,000, 20,000, …, 50,000 tokens, a slight increase in accuracy when training on up to 100,000 tokens, and hardly any increase thereafter.

15 Summary: Tagging decisions are conditioned on a wider range of events than in the HMM models mentioned earlier; for example, left and right context can be used simultaneously. Learning and tagging are simple, intuitive and understandable. Transformation-based learning has also been applied to sentence parsing.

16 The Three Approaches Compared. Rule-based: hand-crafted rules; it takes too long to come up with good rules; portability problems. Stochastic: find the sequence with the highest probability (Viterbi algorithm); the result of training is not accessible to humans; large storage requirements for intermediate results whilst training. Transformation-based: rules are learned; small number of rules; rules can be inspected and modified by humans.

