Human Language Technology Part of Speech (POS) Tagging II Rule-based Tagging.

1 Human Language Technology Part of Speech (POS) Tagging II Rule-based Tagging

2 April 2005, CLINT Lecture IV. Acknowledgment: Most slides are taken from Bonnie Dorr's course notes (www.umiacs.umd.edu/~bonnie/courses/cmsc723-03) and Jurafsky & Martin, Chapter 5.

3 Bibliography: A. Voutilainen, "Morphological disambiguation", in Karlsson, Voutilainen, Heikkila, Anttila (eds.), Constraint Grammar, pp. 165-284, Mouton de Gruyter, 1995. See e-book.

4 EngCG Rule-Based Tagger (Voutilainen 1995): Rules based on English Constraint Grammar. Two-stage design. Uses the ENGTWOL lexicon. Hand-written disambiguation rules.

5 ENGTWOL Lexicon: Based on TWO-Level morphology of English (hence the name). 56,000 entries for English word stems. Each entry is annotated with morphological and syntactic features.

6 Sample ENGTWOL Lexicon

7 Examples of constraints (informal): Discard all verb readings if, to the left, there is an unambiguous determiner and there are no nominals (nouns, abbreviations, etc.) between that determiner and the ambiguous word itself. Discard all finite verb readings if the immediately preceding word is "to". Discard all subjunctive readings if, to the left, there are no instances of the subordinating conjunction "that" or "lest". The first constraint would discard the verb reading of "tables" in "the tables" (slide 9). There are about 1,100 constraints altogether.
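The three informal constraints can be sketched in Python as follows. This is a minimal illustration, not EngCG's rule formalism: it assumes a toy representation in which each word (a "cohort") carries a list of readings and each reading is a set of tags. The `Cohort` class, the rule functions, and the tag names `ABBR` and `SUBJUNCTIVE` are hypothetical. Following the usual Constraint Grammar convention, a rule never removes a word's last remaining reading, and rules are re-applied until nothing changes.

```python
from dataclasses import dataclass


@dataclass
class Cohort:
    """A word form plus every reading the morphological analyzer proposed."""
    word: str
    readings: list  # list of frozensets of tags, e.g. frozenset({"V", "PRES", "VFIN"})


# Hypothetical tag names for the word classes the informal rules mention.
NOMINAL = {"N", "ABBR"}
SUBJUNCTIVE = "SUBJUNCTIVE"


def discard(cohort, pred):
    """Remove readings matching pred, but never the last one. True if anything changed."""
    kept = [r for r in cohort.readings if not pred(r)]
    if kept and len(kept) < len(cohort.readings):
        cohort.readings = kept
        return True
    return False


def no_verb_after_det(sent, i):
    # Verb readings go if an unambiguous determiner lies to the left with no
    # nominal in between. Assumption: a word counts as nominal if ANY of its
    # readings is nominal.
    for left in reversed(sent[:i]):
        if len(left.readings) == 1 and "DET" in left.readings[0]:
            return discard(sent[i], lambda r: "V" in r)
        if any(r & NOMINAL for r in left.readings):
            return False
    return False


def no_finite_after_to(sent, i):
    # Finite verb readings go if the immediately preceding word is "to".
    if i > 0 and sent[i - 1].word.lower() == "to":
        return discard(sent[i], lambda r: "VFIN" in r)
    return False


def no_subjunctive_without_that(sent, i):
    # Subjunctive readings go unless a subordinating "that"/"lest" (tag CS) is to the left.
    licensed = any(
        left.word.lower() in {"that", "lest"} and any("CS" in r for r in left.readings)
        for left in sent[:i]
    )
    return False if licensed else discard(sent[i], lambda r: SUBJUNCTIVE in r)


RULES = [no_verb_after_det, no_finite_after_to, no_subjunctive_without_that]


def disambiguate(sent):
    """Apply every rule at every position until no rule changes anything."""
    changed = True
    while changed:
        changed = False
        for i in range(len(sent)):
            for rule in RULES:
                if rule(sent, i):
                    changed = True
    return sent


sent = [
    Cohort("the", [frozenset({"DET", "CENTRAL", "ART", "SG/PL"})]),
    Cohort("tables", [frozenset({"N", "NOM", "PL"}), frozenset({"V", "PRES", "SG3", "VFIN"})]),
]
disambiguate(sent)
print([sorted(r) for r in sent[1].readings])  # [['N', 'NOM', 'PL']]
```

Treating each rule as a function that reports whether it removed anything makes the "repeat until stable" loop straightforward; since every successful application removes at least one reading, the loop always terminates.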

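8 Actual Constraint Syntax
Given input: "that"
If (+1 A/ADV/QUANT)
   (+2 SENT-LIM)
   (NOT -1 SVOC/A)
Then eliminate non-ADV tags
Else eliminate ADV tag
That is, if the next word is an adjective, adverb, or quantifier, the word after it is a sentence boundary, and the previous word is not a verb such as "consider" that takes an object plus an adjectival complement (SVOC/A), only the adverbial reading of "that" is kept; otherwise the adverbial reading is eliminated. This rule therefore selects the adverbial sense of "that" in "it isn't that odd" and eliminates it elsewhere, e.g. in "I consider that odd".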
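A minimal Python sketch of this rule, under stated assumptions: readings are sets of tags, a context test such as (+1 A/ADV/QUANT) succeeds if any reading at that position carries one of the tags, and running off the end of the sentence counts as SENT-LIM. The `Cohort` class and helper names are hypothetical, and the tag sets for "it", "isn't", "I" and "consider" are made up for illustration; the four readings of "that" are those of the Pavlov example.

```python
from dataclasses import dataclass


@dataclass
class Cohort:
    word: str
    readings: list  # list of frozensets of tags


def has_tag(sent, j, tags):
    """True if position j exists and at least one of its readings carries a tag in tags."""
    return 0 <= j < len(sent) and any(r & tags for r in sent[j].readings)


def is_sent_limit(sent, j):
    # Assumption: the end of the token list, or a token tagged SENT-LIM, is a sentence limit.
    return j >= len(sent) or has_tag(sent, j, {"SENT-LIM"})


def adverbial_that(sent, i):
    """Apply the rule to the cohort at position i if that word is "that"."""
    if sent[i].word.lower() != "that":
        return
    if (has_tag(sent, i + 1, {"A", "ADV", "QUANT"})        # (+1 A/ADV/QUANT)
            and is_sent_limit(sent, i + 2)                 # (+2 SENT-LIM)
            and not has_tag(sent, i - 1, {"SVOC/A"})):     # (NOT -1 SVOC/A)
        keep = [r for r in sent[i].readings if "ADV" in r]      # eliminate non-ADV tags
    else:
        keep = [r for r in sent[i].readings if "ADV" not in r]  # eliminate ADV tag
    if keep:  # never leave a word without any reading
        sent[i].readings = keep


def that_cohort():
    # The four readings of "that" from the Pavlov example.
    return Cohort("that", [frozenset({"ADV"}), frozenset({"PRON", "DEM", "SG"}),
                           frozenset({"DET", "CENTRAL", "DEM", "SG"}), frozenset({"CS"})])


odd = frozenset({"A"})
s1 = [Cohort("it", [frozenset({"PRON"})]), Cohort("isn't", [frozenset({"V", "VFIN"})]),
      that_cohort(), Cohort("odd", [odd])]
s2 = [Cohort("I", [frozenset({"PRON"})]), Cohort("consider", [frozenset({"V", "VFIN", "SVOC/A"})]),
      that_cohort(), Cohort("odd", [odd])]
adverbial_that(s1, 2)
adverbial_that(s2, 2)
print(s1[2].readings)       # [frozenset({'ADV'})]
print(len(s2[2].readings))  # 3
```

In "Pavlov had shown that salivation ..." the next word is a noun, so the Else branch applies and "that" keeps its PRON, DET and CS readings; other constraints must finish the job.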

9 ENGCG Tagger
Stage 1: Run words through the morphological analyzer to get all possible parts of speech. E.g. for the phrase "the tables", we get the following output:
"<the>"
    "the" DET CENTRAL ART SG/PL
"<tables>"
    "table" N NOM PL
    "table" V PRES SG3 VFIN
Stage 2: Apply constraints to rule out incorrect POS tags.
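The two-stage design can be sketched as a small pipeline. This is a toy, not ENGTWOL: a hypothetical full-form dictionary (`ANALYSES`) stands in for the two-level morphological analyzer, the fallback tag `UNKNOWN` is made up, and Stage 2 uses a single simplified constraint that only inspects the immediately preceding word. The output layout is modeled on the slide's example.

```python
# Hypothetical full-form dictionary standing in for ENGTWOL's two-level analyzer.
# Each analysis is (lemma, tags); the entries reproduce the slide's output.
ANALYSES = {
    "the": [("the", ("DET", "CENTRAL", "ART", "SG/PL"))],
    "tables": [("table", ("N", "NOM", "PL")), ("table", ("V", "PRES", "SG3", "VFIN"))],
}


def stage1(words):
    """Stage 1: look up each word and keep *all* of its readings."""
    return [(w, list(ANALYSES.get(w.lower(), [(w, ("UNKNOWN",))]))) for w in words]


def drop_verb_after_det(cohorts):
    """Sample constraint (simplified: checks only the immediately preceding word)."""
    out = list(cohorts)  # copy, so the Stage 1 result is left untouched
    for i in range(1, len(out)):
        prev_readings = out[i - 1][1]
        word, readings = out[i]
        if len(prev_readings) == 1 and "DET" in prev_readings[0][1]:
            kept = [r for r in readings if "V" not in r[1]]
            if kept:  # never remove the last reading
                out[i] = (word, kept)
    return out


def stage2(cohorts, rules=(drop_verb_after_det,)):
    """Stage 2: apply each constraint in turn to rule out incorrect readings."""
    for rule in rules:
        cohorts = rule(cohorts)
    return cohorts


def format_cg(cohorts):
    """Render '"<wordform>"' lines followed by indented '"lemma" TAGS' readings."""
    lines = []
    for word, readings in cohorts:
        lines.append(f'"<{word}>"')
        lines.extend(f'\t"{lemma}" ' + " ".join(tags) for lemma, tags in readings)
    return "\n".join(lines)


ambiguous = stage1(["the", "tables"])
print(format_cg(ambiguous))          # both readings of "tables"
print(format_cg(stage2(ambiguous)))  # only the noun reading survives
```

Keeping Stage 1 free of any disambiguation means every choice is made by explicit constraints in Stage 2, which is what lets rule writers add or refine constraints without touching the lexicon.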

10 Example:
Word        Tags
Pavlov      PAVLOV N NOM SG PROPER
had         HAVE V PAST VFIN SVO
            HAVE PCP2 SVO
shown       SHOW PCP2 SVOO SVO SV
that        ADV
            PRON DEM SG
            DET CENTRAL DEM SG
            CS (subordinating conjunction)
salivation  N NOM SG

11 Performance: Tested on examples from the Wall Street Journal, the Brown Corpus, and the Lancaster-Oslo-Bergen (LOB) Corpus. After application of the rules, 93-97% of all words are fully disambiguated (left with a single reading), and 99.7% of all words retain the correct reading. At the time, this performance was superior to that of other taggers. However, one should not discount the amount of effort needed to create this system.
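The two figures measure different things, and a small sketch makes the distinction concrete. The function name `evaluate` and the toy data below are hypothetical illustrations (simplified tags, assuming the rules left "that" three-ways ambiguous); they are not the reported evaluation data.

```python
def evaluate(output, gold):
    """Return (share of words left with exactly one reading,
    share of words whose surviving readings still include the correct one)."""
    if len(output) != len(gold):
        raise ValueError("output and gold must have the same length")
    n = len(gold)
    fully = sum(1 for readings in output if len(readings) == 1)
    retained = sum(1 for readings, tag in zip(output, gold) if tag in readings)
    return fully / n, retained / n


# Toy data for "Pavlov had shown that salivation" (not the reported results).
output = [{"N"}, {"V"}, {"PCP2"}, {"PRON", "DET", "CS"}, {"N"}]
gold = ["N", "V", "PCP2", "CS", "N"]
print(evaluate(output, gold))  # (0.8, 1.0)
```

Both numbers have to be reported together: keeping one arbitrary reading per word trivially makes every word "fully disambiguated", while keeping every reading trivially "retains the correct reading" for every word.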

