Morphological Processing & Stemming Using FSAs/FSTs.


1 Morphological Processing & Stemming Using FSAs/FSTs

2 FSAs and Morphology
FSAs can be used to validate (recognize) an input string. For example, consider the Spanish conjugation of amar in J&M p. 64. What would an FSA that recognizes these forms look like?
[Figure: FSA diagram with states 1 to 6 and arcs labeled am, a, e, s, m, …]
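The recognizer idea on this slide can be sketched as a small table-driven FSA. This is a minimal sketch, not the automaton from the slide or from J&M: the state names, the transition table, and the choice to cover only unaccented present indicative and present subjunctive forms of amar are all assumptions made for illustration.

```python
# Toy deterministic FSA for a few forms of Spanish "amar".
# State names and the set of covered endings are illustrative assumptions.
TRANSITIONS = {
    ("q0", "a"): "q1",
    ("q1", "m"): "q2",   # stem "am" has been read
    ("q2", "o"): "q4",   # amo
    ("q2", "a"): "q3",   # ama (indicative theme vowel)
    ("q2", "e"): "q3",   # ame (subjunctive theme vowel)
    ("q3", "s"): "q4",   # amas, ames
    ("q3", "n"): "q4",   # aman, amen
    ("q3", "m"): "q5",   # start of -mos
    ("q5", "o"): "q6",
    ("q6", "s"): "q4",   # amamos, amemos
}
ACCEPTING = {"q3", "q4"}


def accepts(word: str) -> bool:
    """Return True if the FSA ends in an accepting state after reading word."""
    state = "q0"
    for ch in word:
        state = TRANSITIONS.get((state, ch))
        if state is None:  # no arc for this symbol: reject
            return False
    return state in ACCEPTING


if __name__ == "__main__":
    for w in ["amo", "amas", "amemos", "am", "amos"]:
        print(w, accepts(w))
```

The automaton only answers yes or no; it says nothing about what the recognized form means.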

3 FSTs and Morphology
An FST can also output information about the input, such as a translation or grammatical information.
[Figure: FST diagram with arcs labeled am:love, a:ε, o:ε, ε:impf, …]
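To make the input:output arcs concrete, here is a minimal sketch of a transducer that maps a few forms of amar to a gloss plus a grammatical tag. The arc table, the state names, and the restriction to the three forms amo, ame, and amaba are assumptions for illustration; the slide's actual diagram is not recoverable from the transcript.

```python
from typing import Optional

EPS = ""  # epsilon output: the arc consumes input but emits nothing

# (state, input symbol) -> (next state, output). Hypothetical arc table.
ARCS = {
    ("q0", "a"): ("q1", EPS),
    ("q1", "m"): ("q2", "love"),      # stem am : love
    ("q2", "o"): ("qf", "pres ind"),  # amo
    ("q2", "e"): ("qf", "pres subj"), # ame
    ("q2", "a"): ("q3", EPS),
    ("q3", "b"): ("q4", EPS),
    ("q4", "a"): ("qf", "impf"),      # amaba
}
FINAL = {"qf"}


def transduce(word: str) -> Optional[str]:
    """Return the output string for word, or None if the FST rejects it."""
    state, out = "q0", []
    for ch in word:
        arc = ARCS.get((state, ch))
        if arc is None:
            return None
        state, emitted = arc
        if emitted:
            out.append(emitted)
    return " ".join(out) if state in FINAL else None


if __name__ == "__main__":
    for w in ["amo", "ame", "amaba", "amx"]:
        print(w, "->", transduce(w))
```

Compared with a plain recognizer, the only structural change is that each arc carries an output alongside its input symbol.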

4 FSAs and NLP
Why even use FSAs in NLP? Memory and storage are cheap, so one alternative is to:
– Build one large lexicon
– List all entries and their required output
  amo: love, pres ind
  amas: love, pres impf
  ames: love, pres subj
Some NLP applications do this (e.g., the AZ Noun Phraser (Tolle 2001)).

5 FSAs and NLP
For more morphologically complex languages, one big lexicon is not feasible. Consider Hungarian and Finnish:
– One verbal form has hundreds of possible inflections, giving millions of resulting forms
– A complete “word” lexicon is not feasible
– Morphological processing is essential

6 Hungarian
Consider one concept/‘word’ in Hungarian:
  haz: house
  hazat: house (object)
  haznak: of the house
  hazzal: with the house
  hazza: into a house
  hazba: into the house
  hazra: to the house
  …

7 Hungarian
Now consider plural inflections:
  hazak: houses
  hazakat: houses (object)
  hazaknak: of the houses
  hazakkal: with the houses
  hazakka: into houses
  hazakba: into the houses
  hazakra: to the houses
  …

8 Hungarian
And possessives:
  hazaim: my houses
  hazaimat: my houses (object)
  hazaimnak: of my houses
  hazaimmal: with my houses
  hazaimma: into my houses
  hazaimba: into my houses
  hazaimra: to my houses
  …
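The growth across the three Hungarian slides can be sketched by generating forms from a stem, a number/possessor marker, and a case suffix. This is a rough sketch under loud assumptions: it uses the diacritic-free spelling from the slides, ignores vowel harmony, covers only the markers shown here, and models just one sound rule (the v of -val and -va assimilating to a preceding consonant). The names `MARKERS`, `CASES`, `attach`, and `inflect` are hypothetical.

```python
from itertools import product

VOWELS = set("aeiou")
MARKERS = ["", "ak", "aim"]  # singular, plural, "my" + plural
CASES = ["", "at", "nak", "val", "va", "ba", "ra"]


def attach(base: str, suffix: str) -> str:
    """Concatenate, assimilating a suffix-initial v to a base-final consonant."""
    if suffix.startswith("v") and base[-1] not in VOWELS:
        suffix = base[-1] + suffix[1:]
    return base + suffix


def inflect(stem: str) -> list[str]:
    """Every marker x case combination for one stem."""
    return [attach(stem + marker, case) for marker, case in product(MARKERS, CASES)]


if __name__ == "__main__":
    forms = inflect("haz")
    print(len(forms), forms)
```

Three markers and seven cases already give 21 forms for one stem; each further independent suffix slot multiplies that count, which is why listing every surface form in a lexicon does not scale.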

9 Stop

10 Stemming
Used in many IR applications for building equivalence classes:
  Connect, Connected, Connecting, Connection, Connections (same class; suffixes irrelevant)
Porter Stemmer: simple and efficient
Website: http://www.tartarus.org/~martin/PorterStemmer
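The equivalence-class idea can be illustrated with a toy suffix stripper. This is explicitly not the Porter algorithm, which applies ordered rule phases with conditions on the stem; the suffix list, the `min_stem` threshold, and the function names below are assumptions chosen only so that the five example words collapse to one class.

```python
from collections import defaultdict

# Checked in order, so longer suffixes come first. Illustrative list only.
SUFFIXES = ["ions", "ion", "ing", "ed", "s"]


def toy_stem(word: str, min_stem: int = 3) -> str:
    """Strip the first matching suffix, keeping at least min_stem characters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= min_stem:
            return w[: -len(suffix)]
    return w


def equivalence_classes(words: list[str]) -> dict[str, list[str]]:
    """Group words that share the same toy stem."""
    classes = defaultdict(list)
    for word in words:
        classes[toy_stem(word)].append(word)
    return dict(classes)


if __name__ == "__main__":
    words = ["Connect", "Connected", "Connecting", "Connection", "Connections"]
    print(equivalence_classes(words))
```

An IR system would index and query on the class key rather than the surface form, so a query for one variant retrieves documents containing any of them.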

11 Stop

12 Stemming and Performance
Does stemming help IR performance?
Harman 91 indicated that it hurt as much as it helped
Krovetz 93 shows that stemming does help
– Porter-like algorithms work well with smaller documents
– Krovetz proposes that stemming loses information
– Derivational morphemes tell us something that helps identify word senses (and helps in IR)
– Stemming them = information loss

13 Evaluating Performance
Measures of stemming performance rely on metrics similar to those used in IR:
– Precision: the proportion of selected items that the system got right; precision = tp / (tp + fp)
– Recall: the proportion of the target items that the system selected; recall = tp / (tp + fn)
– Rule of thumb: as precision increases, recall drops, and vice versa
These metrics are widely adopted in statistical NLP

14 Precision and Recall
Take a given stemming task:
– Suppose there are 100 words that could be stemmed
– A stemmer gets 52 of these right (tp), so it misses the other 48 (fn)
– But it inadvertently stems 10 others (fp)
Precision = 52 / (52 + 10) ≈ 0.84
Recall = 52 / (52 + 48) = 0.52
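The arithmetic on this slide can be checked with a few lines of code. The function names are hypothetical; the counts are the ones given on the slide, with fn derived as the 100 stemmable words minus the 52 true positives.

```python
def precision(tp: int, fp: int) -> float:
    """Proportion of selected items that were correct."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Proportion of target items that were selected."""
    return tp / (tp + fn)


# Counts from the slide: 100 stemmable words, 52 stemmed correctly,
# 10 words stemmed that should not have been.
tp, fp = 52, 10
fn = 100 - tp  # the 48 stemmable words the stemmer missed

print(f"precision = {precision(tp, fp):.2f}")  # 52 / 62
print(f"recall    = {recall(tp, fn):.2f}")     # 52 / 100
```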

