Evaluating an ‘off-the-shelf’ POS-tagger on Early Modern German text
Silke Scheible, Richard Jason Whitt, Martin Durrell, and Paul Bennett
The GerManC project
School of Languages, Linguistics, and Cultures, University of Manchester (UK)

Overview
- Motivation
- The GerManC corpus
- POS-tagger and tagset
- Challenges
- Results

Motivation
- Goal:
  - POS-tagged version of the GerManC corpus
- Problems:
  - No specialised tagger available for Early Modern German (EMG)
  - Limited funds: manual annotation not feasible for the whole corpus
- Question:
  - How well does an ‘off-the-shelf’ tagger for modern German perform on Early Modern German data?

Motivation
- Tagger evaluation requires gold-standard data
- Idea:
  - Develop gold-standard subcorpus of GerManC
  - Use subcorpus to test and adapt modern NLP tools
  - Create historical text processing pipeline
- Results useful for other small humanities-based projects wishing to add POS annotations to EMG data

The GerManC corpus
- Purpose: studies of the development and standardisation of the German language
- Texts published between 1650 and 1800
- Sample corpus (2,000 words per text)
- Total corpus size: ca. 1 million words
- Aims to be “representative”

The GerManC corpus: eight genres
- Orally-oriented: dramas, newspapers, letters, sermons
- Print-oriented: narrative prose, humanities texts, science & medicine texts, legal texts

The GerManC corpus: three periods

The GerManC corpus: five regions
- North German
- West Central German
- East Central German
- West Upper German
- East Upper German

The GerManC corpus
- Three 2,000-word files per genre/period/region
- Total size: ca. 1 million words

Gold-standard subcorpus: GerManC-GS
- One 2,000-word file per genre and period from the North German region
- 24 files, > 50,000 tokens
- Annotated by two historical linguists
- Gold-standard POS tags, lemmas, and normalised word forms

POS-tagger
- TreeTagger (Schmid, 1994)
- Statistical, decision-tree-based POS tagger
- Parameter file for modern German supplied with the tagger
- Trained on German newspaper corpus
- STTS tagset

STTS-EMG
1. PIAT (merged with PIDAT): Indefinite determiner, as in ‘viele solche Bemerkungen’ (‘many such remarks’)
2. NA: Adjectives used as nouns, as in ‘der Gesandte’ (‘the ambassador’)
3. PAVREL: Pronominal adverb used as relative, as in ‘die Puppe, damit sie spielt’ (‘the doll with which she plays’)
4. PTKREL: Indeclinable relative particle, as in ‘die Fälle, so aus Schwachheit entstehen’ (‘the cases which arise from weakness’)
5. PWAVREL: Interrogative adverb used as relative, as in ‘der Zaun, worüber sie springt’ (‘the fence over which she jumps’)
6. PWREL: Interrogative pronoun used as relative, as in ‘etwas, was er sieht’ (‘something which he sees’)

POS-tagging in GerManC-GS
- New categories account for 2% of all tokens
- Inter-annotator agreement (IAA) on POS-tagging task: 91.6%

Challenges: Tokenisation issues
- Clitics:
  - hastu, tokenised as has|tu: hast du (‘have you’)
  - wirstu, tokenised as wirs|tu: wirst du (‘will you’)
- Multi-word tokens:
  - obgleich/KOUS vs. ob/KOUS gleich/ADV (‘even though’)

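The clitic examples on this slide can be illustrated with a small sketch. It is not the project's tokeniser: the regular expression below is a hypothetical rule invented for this illustration (a form ending in "s" followed by the enclitic "tu"), and it would over-generate on real data, so a list of known verb forms or manual checking would still be needed.

```python
import re

# Hypothetical rule (an assumption for this sketch, not taken from the slides):
# a form ending in "s" followed by the enclitic "tu", a reduced form of "du".
CLITIC_TU = re.compile(r"^(?P<verb>\w+s)(?P<clitic>tu)$", re.IGNORECASE)


def split_clitic(token: str) -> list[str]:
    """Split e.g. 'hastu' into ['has', 'tu']; return [token] if the rule does not apply."""
    match = CLITIC_TU.match(token)
    if match is None:
        return [token]
    return [match.group("verb"), match.group("clitic")]


def split_all(tokens: list[str]) -> list[str]:
    """Apply split_clitic to every token and flatten the result."""
    result: list[str] = []
    for token in tokens:
        result.extend(split_clitic(token))
    return result


if __name__ == "__main__":
    print(split_all(["was", "hastu", "gesehen"]))
```

Splitting before tagging lets each part receive its own POS tag, which is the same motivation as for treating ‘ob gleich’ as two tokens.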
Challenges: Spelling variation
- Spelling not standardised:
  - Comet → Komet
  - auff → auf
  - nachdeme → nachdem
  - kompt → kommt
  - Bothenbrodt → Botenbrot
  - differiret → differiert
  - beßer → besser
  - kehme → käme
  - trucken → trockenen
  - gepressett → gepreßt
  - büxen → Büchsen

Challenges: Spelling variation
- All spelling variants in GerManC-GS normalised to a modern standard
- Assess what effect spelling variation has on the performance of automatic tools
- Help improve automated processing?
- Important for:
  - Automatic tools (POS tagger!)
  - Accurate corpus search

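A normalisation layer of this kind can be pictured as a mapping from historical spellings to modern forms. The sketch below is only an illustration: the table contains a handful of the variant pairs listed on the spelling-variation slide, and the function names are hypothetical. In GerManC-GS the normalised forms were assigned by annotators, not by a lookup table.

```python
# Variant -> modern form pairs taken from the spelling-variation examples.
# The table and the function names are illustrative assumptions only.
NORMALISATION_TABLE = {
    "Comet": "Komet",
    "auff": "auf",
    "nachdeme": "nachdem",
    "kompt": "kommt",
    "beßer": "besser",
}


def normalise(tokens, table=NORMALISATION_TABLE):
    """Replace each token by its modern form if the table lists one."""
    return [table.get(token, token) for token in tokens]


def proportion_normalised(tokens, table=NORMALISATION_TABLE):
    """Share of tokens whose normalised form differs from the original."""
    if not tokens:
        return 0.0
    changed = sum(1 for token in tokens if table.get(token, token) != token)
    return changed / len(tokens)


if __name__ == "__main__":
    sample = ["Der", "Comet", "kompt", "auf"]
    print(normalise(sample))
    print(proportion_normalised(sample))
```

The second function corresponds to the kind of per-text proportion of normalised tokens that can be plotted against publication date.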
Challenges: Spelling variation
Figure: Proportion of normalised word tokens plotted against time

Questions
- What is the “off-the-shelf” performance of the TreeTagger on historical data from the EMG period?
- Can the results be improved by running the tool on normalised data?

Results
TreeTagger accuracy on original vs. normalised input

            Original data    Normalised data
Accuracy    69.6%            79.7%

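Accuracy figures like these are usually obtained by comparing the tagger output with the gold-standard tags token by token. The following is a minimal sketch of that calculation, assuming both sequences are already aligned one-to-one; the function name and the toy tag sequences are invented for illustration and are not project data.

```python
def tagging_accuracy(gold_tags, predicted_tags):
    """Fraction of tokens whose predicted tag equals the gold-standard tag."""
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("gold and predicted sequences must be aligned")
    if not gold_tags:
        raise ValueError("cannot compute accuracy on an empty sequence")
    correct = sum(1 for g, p in zip(gold_tags, predicted_tags) if g == p)
    return correct / len(gold_tags)


if __name__ == "__main__":
    # Toy example with STTS tags; the values are made up for this sketch.
    gold = ["ART", "NN", "VVFIN", "ADV"]
    predicted = ["ART", "NN", "VVFIN", "ADJD"]
    print(f"{tagging_accuracy(gold, predicted):.1%}")
```

Running the same function once on tags produced from the original tokens and once on tags produced from the normalised tokens gives the two columns of the table.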
Improvement through normalisation over time
Figure: Tagger performance plotted against publication date

Effects of spelling normalisation on POS tagger performance
- For normalised tokens: effect of using original (O) / normalised (N) input on tagger accuracy
- +: correctly tagged; -: incorrectly tagged

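The O/N breakdown described on this slide can be computed by cross-tabulating, for every normalised token, whether the tagger was right on the original form and whether it was right on the normalised form. The sketch below shows one way to do this; the function name, the input layout and the toy data are assumptions made for illustration.

```python
from collections import Counter


def normalisation_effect(gold, tags_original, tags_normalised, was_normalised):
    """Count O+/N+, O+/N-, O-/N+ and O-/N- cases over normalised tokens only."""
    lengths = {len(gold), len(tags_original), len(tags_normalised), len(was_normalised)}
    if len(lengths) != 1:
        raise ValueError("all input sequences must have the same length")
    counts = Counter()
    for g, o, n, changed in zip(gold, tags_original, tags_normalised, was_normalised):
        if not changed:
            continue  # tokens whose spelling was not normalised are ignored
        key = ("O+" if o == g else "O-") + "/" + ("N+" if n == g else "N-")
        counts[key] += 1
    return counts


if __name__ == "__main__":
    # Toy data, invented for this sketch.
    gold = ["NN", "VVFIN", "APPR", "ADJA"]
    on_original = ["NE", "VVFIN", "APPR", "NN"]
    on_normalised = ["NN", "VVFIN", "ADV", "NN"]
    changed = [True, True, True, False]
    print(normalisation_effect(gold, on_original, on_normalised, changed))
```

The O-/N+ cell counts the cases where normalisation helps, and O+/N- the cases where it hurts.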
Comparison with “modern” results
- Performance of TreeTagger on modern data: ca. 97% (Schmid, 1995)
- Current results seem low
- But:
  - Modern accuracy figure: evaluation of tagger on the text type it was developed on (newspaper text)
  - IAA higher for modern German (98.6%)

Conclusion
- Substantial amount of manual post-editing required
- Normalisation layer can improve results by about 10 percentage points (69.6% to 79.7%), but so far only half of all normalisation annotations have a positive effect

Future work
- Adapt normalisation scheme to account for more cases
- Automate normalisation (Jurish, 2010)
- Retrain state-of-the-art POS taggers
  - Evaluation?
- Provide detailed information about annotation quality to research community

Thank you!