ParaMor & Morpho Challenge 2008. Christian Monson, Jaime Carbonell, Alon Lavie, Lori Levin.



2 Turkish Morphology – Beads on a String: one Turkish word, götürülmüyorsun = götür + ül + m + üyor + sun (take + passive + negative + present progressive + 2nd person singular), ‘You are not being taken’.

3 Computational Morphology Improves: Machine Translation: Turkish-English (Oflazer, 2007), Czech-English (Goldwater and McClosky, 2005); Information Retrieval: English, German, Finnish (Kurimo et al., 2008); Speech Recognition: Finnish (Creutz, 2006); Grapheme-to-Phoneme Conversion: German (Demberg, 2007).

4 Morphology is Complex – Operations: Prefixation, Suffixation

5 Morphology is Complex – Operations: Prefixation, Reduplication, Suffixation

6 Morphology is Complex – Operations: Prefixation, Reduplication, Infixation, Suffixation


9 Morphology is Complex – Morphophonology: götürülmüyorsun = götür + ül + m + üyor + sun (take + passive + negative + present progressive + 2nd person singular), ‘You are not being taken’

10 Morphology is Complex – Morphophonology: götür + ül + m + yecek + sun (take + passive + negative + future + 2nd person singular), ‘You will not be taken’


12 Morphology is Complex – Morphophonology: götür + ül + me + yecek + sun (the negative m surfaces as me before yecek), ‘You will not be taken’

13 Morphology is Complex – Morphophonology: götür + ül + me + yecek + sin (sun surfaces as sin by vowel harmony), giving götürülmeyeceksin, ‘You will not be taken’

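The morphophonology slides can be reproduced with a toy rule set. The sketch below is ours and is not part of ParaMor. It uses archiphonemic suffix spellings of our own choosing: A stands for a/e, I stands for ı/i/u/ü, and (y) is a buffer consonant that appears only after a vowel. It implements just enough Turkish vowel harmony and vowel deletion to derive the two words on these slides.

```python
# Toy Turkish morphophonology: a sketch, not ParaMor code.
# Assumed archiphoneme conventions: A -> a/e, I -> ı/i/u/ü,
# and a leading "(y)" is a buffer consonant used only after a vowel.

FRONT = set("eiöü")
BACK = set("aıou")
ROUNDED = set("oöuü")
VOWELS = FRONT | BACK


def last_vowel(text: str) -> str:
    """Return the rightmost vowel in text; harmony is conditioned on it."""
    for ch in reversed(text):
        if ch in VOWELS:
            return ch
    raise ValueError(f"no vowel in {text!r}")


def resolve(archi: str, context: str) -> str:
    """Spell out A and I in a suffix, left to right, from the preceding vowel."""
    out = ""
    for ch in archi:
        if ch == "A":  # two-way harmony: front vs. back
            out += "e" if last_vowel(context + out) in FRONT else "a"
        elif ch == "I":  # four-way harmony: front/back x rounded/unrounded
            v = last_vowel(context + out)
            if v in FRONT:
                out += "ü" if v in ROUNDED else "i"
            else:
                out += "u" if v in ROUNDED else "ı"
        else:
            out += ch
    return out


def attach(surface: str, suffix: str) -> str:
    """Attach one archiphonemic suffix to an already spelled-out word."""
    if suffix.startswith("(y)"):
        suffix = suffix[3:]
        if surface[-1] in VOWELS:
            suffix = "y" + suffix
    # Toy rule: a suffix starting with I (here only -Iyor) deletes a
    # preceding vowel, so the negative -mA surfaces as bare m before -Iyor.
    if suffix.startswith("I") and surface[-1] in VOWELS:
        surface = surface[:-1]
    return surface + resolve(suffix, surface)


def inflect(stem: str, suffixes: list[str]) -> str:
    word = stem
    for suffix in suffixes:
        word = attach(word, suffix)
    return word


if __name__ == "__main__":
    print(inflect("götür", ["Il", "mA", "Iyor", "sIn"]))     # götürülmüyorsun
    print(inflect("götür", ["Il", "mA", "(y)AcAk", "sIn"]))  # götürülmeyeceksin
```

Naively concatenating the surface pieces (götürülm + yecek + sun) does not produce a word. The harmony step is what turns sun into sin after the front vowel e. The rules are deliberately incomplete: there is no k→ğ alternation and no general vowel-deletion rule.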

15 Morphology is Complex – Ambiguity: Hungarian mentek = men + tek (go + Present.2nd.Plural), ‘yinz go’

16 Morphology is Complex – Ambiguity: Hungarian mentek = men + tek (go + Present.2nd.Plural), ‘yinz go’, or men + t + ek (go + PastParticiple + Plural), ‘those who have gone’
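A short Python sketch shows why mentek is ambiguous for any segmenter: it enumerates every analysis over a hand-built lexicon. The lexicon (STEMS, SUFFIXES) is a hypothetical toy that contains only the morphs on the slide. An unsupervised system has no such lexicon, yet it still has to choose between the two analyses or keep both.

```python
# Hypothetical toy lexicon holding only the morphs shown on the slide.
STEMS = {"men": "go"}
SUFFIXES = {"tek": "Present.2nd.Plural", "t": "PastParticiple", "ek": "Plural"}


def suffix_splits(rest: str):
    """Yield every way to cover `rest` with a sequence of known suffixes."""
    if not rest:
        yield []
        return
    for end in range(1, len(rest) + 1):
        head = rest[:end]
        if head in SUFFIXES:
            for tail in suffix_splits(rest[end:]):
                yield [head] + tail


def analyses(word: str) -> list[list[str]]:
    """All stem + suffix-sequence analyses of `word` under the toy lexicon."""
    results = []
    for cut in range(1, len(word) + 1):
        stem = word[:cut]
        if stem in STEMS:
            for suffixes in suffix_splits(word[cut:]):
                results.append([stem] + suffixes)
    return results


if __name__ == "__main__":
    for parts in analyses("mentek"):
        glosses = [STEMS[parts[0]]] + [SUFFIXES[s] for s in parts[1:]]
        print(" + ".join(parts), "=", " + ".join(glosses))
```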

17 In Morphology Systems for New Languages, Complexity Costs Time + Expertise

18 In Morphology Systems for New Languages, Complexity Costs Time + Expertise. Expertise: Kemal Oflazer, expert on Turkish computational morphology. Time: 3–4 months to manually build a basic Turkish analyzer, plus lexicon development and maintenance.

19 The Solution: Unsupervised Morphology Induction from Raw Text

20 The Solution: Raw Text → ?

21 The Solution: Raw Text → Language Structure

22 Techniques for Unsupervised Morphology Induction: Transition Likelihood: Harris (1955) – Finite State Automata, Bernhard (2007)

23 Techniques for Unsupervised Morphology Induction: Transition Likelihood: Harris (1955) – Finite State Automata, Bernhard (2007); Minimum Description Length: Goldsmith (2001, 2006), Creutz’s Morfessor (2006)

24 Techniques for Unsupervised Morphology Induction: Contextual Similarity: Wicentowski (2002), Schone (2002)

25 Techniques for Unsupervised Morphology Induction: Contextual Similarity: Wicentowski (2002), Schone (2002); The Paradigm: Snover (2002), ParaMor (2007)

26 What is a Paradigm? götür + ül + m + üyor + sun (take + passive + negative + present progressive + 2nd person singular)

27 Paradigms Structure Inflectional Morphology: in götür + ül + m + üyor + sun (take + passive + negative + present progressive + 2nd person singular), sun marks Person & Number

28 Paradigms Structure Inflectional Morphology: götür + ül + m + üyor + um (take + passive + negative + present progressive + 1st person singular); Person & Number: um

29 Paradigms Structure Inflectional Morphology: götür + ül + m + üyor + Ø (take + passive + negative + present progressive + 3rd person singular); Person & Number: um, Ø

30 Paradigms Structure Inflectional Morphology: the Person & Number suffixes um, Ø, uz all attach to götür + ül + m + üyor (take + passive + negative + present progressive)

31 Paradigms Structure Inflectional Morphology: a Paradigm is a set of mutually substitutable morphological operations, here the Person & Number suffixes um, Ø, uz after götür + ül + m + üyor

32 Paradigms Structure Inflectional Morphology: Voice: ül; Polarity: m; Tense & Aspect: üyor, yecek; Person & Number: um, Ø, uz

33 Paradigms Structure Inflectional Morphology: a Paradigm is a set of mutually substitutable morphological operations. Voice: ül; Polarity: m; Tense & Aspect: üyor, yecek; Person & Number: um, Ø, uz

34 The ParaMor Algorithm: a Paradigm is a set of mutually substitutable strings (e.g., üyor / yecek; um / Ø / uz)

35 The ParaMor Algorithm: a candidate paradigm pairs a set of mutually substitutable suffixes (e.g., um, Ø, uz) with its Candidate Stems, and so models just 1 Morpheme Boundary

36 The ParaMor Algorithm – Simplifying Assumptions: Suffixes only (70% of the world’s languages are suffixing; Dryer, 2005); Strict Concatenation

37 The ParaMor Algorithm – Simplifying Assumptions: Suffixes only (70% of the world’s languages are suffixing; Dryer, 2005); Strict Concatenation. Only a high-level overview.

38 The ParaMor Algorithm: Identify Paradigms in 3 Steps

39 The ParaMor Algorithm: Identify Paradigms in 3 Steps: 1. Search for candidate paradigms

40 The ParaMor Algorithm: Identify Paradigms in 3 Steps: 1. Search for candidate paradigms; 2. Cluster candidates modeling the same paradigm

41 The ParaMor Algorithm: Identify Paradigms in 3 Steps: 1. Search for candidate paradigms; 2. Cluster candidates modeling the same paradigm; 3. Filter the least likely candidates

42 The ParaMor Algorithm: Identify Paradigms in 3 Steps: 1. Search for candidate paradigms; 2. Cluster candidates modeling the same paradigm; 3. Filter the least likely candidates. Then Segment Words using the discovered paradigms.

43 The ParaMor Algorithm: Identify Paradigms in 3 Steps (1. Search, 2. Cluster, 3. Filter), then Segment Words using the discovered paradigms. Evaluation: results today.


45 Search for Candidate Paradigms (Spanish example): from a word list of 50,000 types (autorizaciones, buscabamos, costas, importadoras, vallas, …), propose a morpheme boundary at every character boundary in every word. Then consolidate identical candidate suffixes into paradigm seeds, e.g., the seed s with 10697 candidate stems.

46 Search for Candidate Paradigms (Spanish example): identify the most frequent mutually replaceable candidate suffix, because stems that occur with one suffix in a paradigm will likely occur with the other suffixes in that paradigm. For the seed s (10697 stems) that suffix is Ø (costa/costas, importadora/importadoras, valla/vallas, …), giving Ø s with 5513 stems.
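The first two Search steps can be sketched in Python. This is a minimal illustration, not ParaMor's code. `paradigm_seeds` proposes a boundary at every character boundary and consolidates identical suffixes. `best_partner`, a hypothetical helper name, finds the candidate suffix that replaces a seed on the most stems. The eight-word list used in the check is a toy built from the words on these slides.

```python
from collections import defaultdict

NULL = "Ø"  # the empty suffix, written Ø on the slides


def paradigm_seeds(words):
    """Propose a boundary at every character boundary of every word and
    consolidate identical candidate suffixes: suffix -> set of stems."""
    stems_by_suffix = defaultdict(set)
    for word in words:
        for cut in range(1, len(word) + 1):  # keep stems non-empty
            stems_by_suffix[word[cut:] or NULL].add(word[:cut])
    return stems_by_suffix


def best_partner(seed, stems_by_suffix):
    """Return the candidate suffix that replaces `seed` on the most stems,
    together with the stems they share."""
    seed_stems = stems_by_suffix[seed]
    partner = max(
        (s for s in stems_by_suffix if s != seed),
        key=lambda s: len(seed_stems & stems_by_suffix[s]),
    )
    return partner, seed_stems & stems_by_suffix[partner]


if __name__ == "__main__":
    words = ["autorizaciones", "buscabamos", "costa", "costas",
             "importadora", "importadoras", "valla", "vallas"]
    partner, shared = best_partner("s", paradigm_seeds(words))
    print(partner, sorted(shared))
```

On this toy list the seed s has five stems. Only three of them (costa, importadora, valla) are words by themselves, so Ø wins with three shared stems.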

47 Search for Candidate Paradigms: a parameter halts the introduction of suffixes when the most frequent mutually replaceable candidate suffix severely decreases the stem count: s (10697 stems) → Ø s (5513) → Ø r s (281), e.g., costa, costar, costas.

48 Search for Candidate Paradigms: the parameters were set to produce high-recall Spanish paradigms, and then frozen. s (10697) → Ø s (5513) → Ø r s (281)

49 Search for Candidate Paradigms: move on to the next most frequent paradigm seed, a (9020 stems).

50 Search for Candidate Paradigms: a (9020) → a o (2325) → a o os (1418) → a as o os (899)

51 Search for Candidate Paradigms: n (6039) → Ø n (1863) → Ø n r (512) → Ø do n r (357) → Ø da das do dos n ndo r ron (115)

52 Search for Candidate Paradigms: es (2750) → Ø es (845)

53 Search for Candidate Paradigms: an (1784) → a an (1045) → a an ar (417) → a an ar ó (355) → a ada adas ado ados an ar aron ó (148)

54 Search for Candidate Paradigms: rado (167) → rada rado (89) → rada rado rados (67) → rada radas rado rados (53) → ra rada radas rado rados ran rar raron ró (23); strado (15) → strada strado (12) → strada strado stró (9) → strada strado strar stró (8) → strada stradas strado strar stró (7); …

55 Search for Candidate Paradigms: the full search space is huge, 2^|candidate suffixes|. Most candidate suffixes have no common stems, but the space is still exponential. The greedily searched space is only O(|candidate suffixes|), and this example is just 0.1% of the searched space.
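The greedy upward search can be sketched in Python under stated assumptions. Each step scans every candidate suffix once and adds the suffix that keeps the most stems, so the search never enumerates the 2^|candidate suffixes| subsets. The halting rule stops when the best step keeps fewer than `min_stem_ratio` of the current stems. That rule and its default are hypothetical stand-ins for ParaMor's parameter, and WORDS is a toy list.

```python
from collections import defaultdict

NULL = "Ø"  # the empty suffix

# Toy Spanish word list (an assumption), not the 50,000-type Challenge data.
WORDS = ["habla", "hablas", "hablar", "canta", "cantas", "cantar",
         "toma", "tomas", "tomar", "casa", "casas"]


def paradigm_seeds(words):
    """Candidate suffix -> set of candidate stems (every character boundary)."""
    stems_by_suffix = defaultdict(set)
    for word in words:
        for cut in range(1, len(word) + 1):
            stems_by_suffix[word[cut:] or NULL].add(word[:cut])
    return dict(stems_by_suffix)


def greedy_search(seed, stems_by_suffix, min_stem_ratio=0.1):
    """Grow one candidate paradigm upward from a single-suffix seed.

    Returns the chain of (suffixes, stems) visited. `min_stem_ratio` is a
    hypothetical halting parameter, not ParaMor's documented one.
    """
    suffixes = {seed}
    stems = set(stems_by_suffix[seed])
    chain = [(frozenset(suffixes), frozenset(stems))]
    while True:
        # One pass over every candidate suffix: which keeps the most stems?
        best, best_stems = None, set()
        for suffix, suffix_stems in stems_by_suffix.items():
            if suffix not in suffixes:
                shared = stems & suffix_stems
                if len(shared) > len(best_stems):
                    best, best_stems = suffix, shared
        # Halt when nothing is shared or the stem count drops too sharply.
        if best is None or len(best_stems) < min_stem_ratio * len(stems):
            return chain
        suffixes.add(best)
        stems = best_stems
        chain.append((frozenset(suffixes), frozenset(stems)))


def search_all(words, min_stem_ratio=0.1):
    """Run one greedy search per seed, most frequent seeds first."""
    seeds = paradigm_seeds(words)
    order = sorted(seeds, key=lambda s: (-len(seeds[s]), s))
    return {seed: greedy_search(seed, seeds, min_stem_ratio) for seed in order}


if __name__ == "__main__":
    for suffixes, stems in greedy_search("s", paradigm_seeds(WORDS), 0.5):
        print(sorted(suffixes), len(stems))
```

With `min_stem_ratio=0.5`, the seed s grows to Ø r s over the stems habla, canta and toma. Raising the ratio to 0.9 halts the search at Ø s. This shows how a single frozen parameter controls how far each search climbs.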

56 Step 2: Clustering. Identify Paradigms in 3 Steps: 1. Search for candidate paradigms; 2. Cluster candidates modeling the same paradigm; 3. Filter. Then Segment Words using the discovered paradigms. Clustering is bottom-up and agglomerative.
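Bottom-up agglomerative clustering can be sketched as repeatedly merging the two most similar clusters. The slide does not give ParaMor's similarity measure or stopping rule, so both are assumptions here. Similarity is the Jaccard overlap of the word forms two candidates cover, and merging stops once the best pair falls below a hypothetical `threshold`.

```python
from itertools import combinations

NULL = "Ø"


def covered_words(suffixes, stems):
    """Surface words licensed by a candidate paradigm (Ø adds nothing)."""
    return {stem + ("" if suf == NULL else suf) for stem in stems for suf in suffixes}


def jaccard(a, b):
    return len(a & b) / len(a | b)


def cluster(candidates, threshold=0.5):
    """Bottom-up agglomerative clustering of (suffixes, stems) candidates.

    Every candidate starts as its own cluster; the most similar pair is merged
    until no pair reaches `threshold` (a hypothetical stopping rule).
    Returns one suffix set per final cluster.
    """
    clusters = [(set(sfx), covered_words(sfx, stm)) for sfx, stm in candidates]
    while len(clusters) > 1:
        (i, j), score = max(
            (((i, j), jaccard(clusters[i][1], clusters[j][1]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if score < threshold:
            break
        sfx_j, words_j = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i][0].update(sfx_j)
        clusters[i][1].update(words_j)
    return [sfx for sfx, _ in clusters]


CANDIDATES = [
    ({"a", "as", "o", "os"}, {"nuev", "viej", "buen"}),
    ({"a", "o", "os"}, {"nuev", "viej", "buen", "alt"}),
    ({NULL, "s"}, {"casa", "mesa"}),
]

if __name__ == "__main__":
    print(cluster(CANDIDATES))
```

The two adjective candidates share 9 of their 15 covered forms (Jaccard 0.6) and merge. The noun candidate overlaps with neither, so it stays separate.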

57 Step 3: Filtering. Identify Paradigms in 3 Steps: 1. Search for candidate paradigms; 2. Cluster candidates modeling the same paradigm; 3. Filter the least likely candidates. Then Segment Words using the discovered paradigms. The filters are adapted from Harris (1955) and Goldsmith (2006) and improved over the 2007 Challenge.
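One way to turn Harris's (1955) idea into a filter is letter successor variety: at a true morpheme boundary, many different letters can follow the stem. The sketch below is our illustration, not ParaMor's actual filter. The name `passes_boundary_filter`, the median rule, and `min_variety` are all hypothetical.

```python
from statistics import median

END = "#"  # end-of-word marker, counted as one possible successor


def successor_variety(prefix, vocabulary):
    """Harris-style count of distinct symbols that follow `prefix`."""
    followers = set()
    for word in vocabulary:
        if word.startswith(prefix):
            followers.add(word[len(prefix)] if len(word) > len(prefix) else END)
    return len(followers)


def passes_boundary_filter(stems, vocabulary, min_variety=2):
    """Keep a candidate paradigm only if, at the median stem, the proposed
    boundary sits where at least `min_variety` continuations branch."""
    return median(successor_variety(s, vocabulary) for s in stems) >= min_variety


VOCAB = ["habla", "hablan", "hablar", "canta", "cantan", "cantar"]

if __name__ == "__main__":
    print(passes_boundary_filter({"habla", "canta"}, VOCAB))  # stems of Ø n r
    print(passes_boundary_filter({"habl", "cant"}, VOCAB))    # stems of a an ar
```

On this toy vocabulary, habla and canta are each followed by end-of-word, n or r, so they pass. Only a ever follows habl and cant, so they fail. Here the filter therefore prefers the boundary placement Ø n r over a an ar.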

58 A Few of the 42 Final Paradigms:
4 suffixes: Ø menente mente s
11 suffixes: a amente as illa illas o or ora oras ores os
41 suffixes: a aba aban acion aciones ación ada adas ado ador adora adoras adores ados amos an ando ante antes ar ara aran aremos arla arlas arlo arlos arme aron arse ará arán aré aría arían ase e en ándose é ó
29 suffixes: e edor edora edoras edores en er erlo erlos erse erá erán ería erían ida idas ido idos iendo iera ieran ieron imiento imientos iéndose ió í ía ían
20 suffixes: ida idas ido idor idores idos imos ir iremos irle irlo irlos irse irá irán iré iría irían ía ían
29 suffixes: ce cedores cemos cen cer cerlo cerlos cerse cerá cerán cería cida cidas cido cidos ciendo ciera cieran cieron cimiento cimientos cimos ció cí cía cían zca zcan zco
6 suffixes: Ø es idad idades mente ísima

59 A Few of the 42 Final Paradigms: Number on Nouns

60 A Few of the 42 Final Paradigms: Number & Gender on Adjectives

61 A Few of the 42 Final Paradigms: Verbal Suffixes

62 The ParaMor Algorithm: Identify Paradigms in 3 Steps (1. Search for candidate paradigms; 2. Cluster candidates modeling the same paradigm; 3. Filter), then Segment Words using the discovered paradigms. Segmentation is improved over the 2007 Challenge.

63 Segment Words Using the Paradigms: administradas, ‘Feminine gender nouns under administration’

64 Segment Words Using the Paradigms: administradas = administr + ad + a + s (Past Participle + Feminine + Plural)

65 Segment Words Using the Paradigms: administradas

66 Segment Words Using the Paradigms: administradas; administrada is also in the corpus

67 Segment Words Using the Paradigms: administradas, administrada: Morpheme Boundary


69 Segment Words Using the Paradigms: administradas and administrada + Ø share the stem administrada: Morpheme Boundary

70 Segment Words Using the Paradigms: administr + ad + a + s. ParaMor recovers multiple morpheme boundaries from candidate paradigms, each of which proposes a single morpheme boundary.
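The segmentation idea can be sketched as follows. Every paradigm whose suffix ends the word contributes one cut, provided the same stem also occurs with another suffix of that paradigm. The word is then split at all the cuts together. The evidence rule is our assumption. The suffix sets are abridged from three of the final paradigms on slide 58, and the vocabulary is a toy.

```python
NULL = "Ø"

# Suffix sets abridged from three of the final Spanish paradigms (slide 58).
PARADIGMS = [
    {NULL, "menente", "mente", "s"},
    {"a", "amente", "as", "illa", "illas", "o", "or", "ora", "oras", "ores", "os"},
    {"a", "ada", "adas", "ado", "ados", "an", "ar", "aron", "ó"},  # abridged
]
# Toy vocabulary (an assumption), standing in for the corpus word list.
VOCAB = {"administradas", "administrada", "administrado", "administrar"}


def boundaries(word, paradigms, vocabulary):
    """Cut positions proposed by paradigms whose suffix ends `word` and whose
    other suffixes also attach to the same stem in the vocabulary."""
    cuts = set()
    for cut in range(1, len(word) + 1):
        stem, suffix = word[:cut], word[cut:] or NULL
        for paradigm in paradigms:
            if suffix not in paradigm:
                continue
            others = paradigm - {suffix}
            if any(stem + ("" if o == NULL else o) in vocabulary for o in others):
                cuts.add(cut)
    cuts.discard(len(word))  # a Ø suffix adds no visible boundary
    return sorted(cuts)


def segment(word, paradigms, vocabulary):
    """Split `word` at every boundary any paradigm proposes."""
    pieces, start = [], 0
    for cut in boundaries(word, paradigms, vocabulary):
        pieces.append(word[start:cut])
        start = cut
    pieces.append(word[start:])
    return pieces


if __name__ == "__main__":
    print(" + ".join(segment("administradas", PARADIGMS, VOCAB)))
```

Three candidate paradigms each contribute one boundary: Ø/s gives administrada + s, a/as gives administrad + as, and ada/adas gives administr + adas. Together they yield administr + ad + a + s.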

71–85 Linguistic Evaluation (F1). Morfessor is the baseline system for the Challenge. It is freely available and based on Minimum Description Length. To join ParaMor and Morfessor, submit 2 analyses for each word: a ParaMor analysis and a Morfessor analysis. The effect is oracle recall with averaged precision.

System                         English  German  Finnish  Turkish  Arabic
Morfessor                         47.2    47.8     40.6     37.1    34.0
ParaMor                           52.8    44.5     39.5     46.5    15.4
ParaMor & Morfessor               56.3    54.1     48.5     52.0    40.9
Bernhard (Zeman for Arabic)       60.8    52.9     48.2     24.7    21.9
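A simplified per-word score shows why submitting two analyses helps when the systems are complementary. Recall takes the better analysis (oracle recall), while precision averages both. This is not the official Morpho Challenge scoring procedure. Scoring analyses as sets of boundary positions is an assumption made for illustration, and the two-word gold standard (GOLD, PARAMOR, MORFESSOR) is invented toy data.

```python
def f1(p, r):
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


def prf(proposed, gold):
    """Precision and recall of one analysis, as sets of boundary positions."""
    hits = len(proposed & gold)
    p = hits / len(proposed) if proposed else 0.0
    r = hits / len(gold) if gold else 0.0
    return p, r


def combined(alternatives, gold):
    """Score alternative analyses of one word: oracle recall, averaged precision."""
    scores = [prf(a, gold) for a in alternatives]
    precision = sum(p for p, _ in scores) / len(scores)
    recall = max(r for _, r in scores)
    return precision, recall


def evaluate(systems, gold):
    """Macro-average over words; each system maps word -> set of boundaries."""
    per_word = [combined([s[w] for s in systems], gold[w]) for w in gold]
    precision = sum(p for p, _ in per_word) / len(per_word)
    recall = sum(r for _, r in per_word) / len(per_word)
    return precision, recall, f1(precision, recall)


# Invented toy data: each system gets a different word right.
GOLD = {"w1": {3}, "w2": {4}}
PARAMOR = {"w1": {3}, "w2": {2}}
MORFESSOR = {"w1": {1}, "w2": {4}}

if __name__ == "__main__":
    print(evaluate([PARAMOR], GOLD))
    print(evaluate([PARAMOR, MORFESSOR], GOLD))
```

Each system alone scores F1 = 0.5 on the toy data. When they are combined, recall rises to 1.0 while precision stays at 0.5, so F1 rises to about 0.67.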

86 Linguistic Evaluation: Sometimes Morfessor wins (German, Finnish, Arabic).

87 Linguistic Evaluation: Sometimes ParaMor wins (English, Turkish).

88 Linguistic Evaluation: ParaMor and Morfessor are Complementary.

89–92 IR Evaluation of Morfessor, ParaMor, ParaMor & Morfessor, and Bernhard. Bar values per language: English {39.4, 36.4, 39.3, 39.9}; German {47.3, 46.7, 36.3, 47.3}; Finnish {49.2, 46.8, 39.7, 46.7}.

93 ParaMor: a State-of-the-Art Unsupervised Morphology Induction System. ParaMor identifies paradigms, the organizing structure of inflectional morphology, and segments words as the discovered paradigms suggest. Combined with Morfessor, it is among the best in Morpho Challenge, consistently across languages.

94 The Next Steps for ParaMor: Beyond suffixes: a straightforward extension of ParaMor to prefixes; more challenging: reduplication, infixation, etc. Morphophonology. Incorporate contextual information when clustering. Improve system combination: true merging of analyses; combine more systems.

95 Thank You!
