A knowledge rich morph analyzer for Marathi derived forms Ashwini Vaidya IIIT Hyderabad.


1 A knowledge rich morph analyzer for Marathi derived forms Ashwini Vaidya IIIT Hyderabad

2 Need for Morphological analysis
- Morph analysis provides basic information about a word's category, gender, number, etc.
- Required for Machine Translation tasks
- Necessary for building part-of-speech taggers
- Accurate tools are especially required for languages that are morphologically rich

3 Inflectional and Derivational forms
- To begin with, morph analysis concentrates on inflectional forms.
- Inflection is more regular and productive. E.g., a plural affix would attach to almost all nouns, but a derivational affix like -ness attaches only to a few.
- The criteria for attachment are more difficult to determine for a derivational affix.

4 Computational analysis of derived forms
Previous approaches have used strategies such as:
- Creation of a suffix table (Hoeppner, 1982)
- Identifying morphologically 'active' bases (Byrd, 1986)
- Using an extensive semantic ontology (Woods, 2000)
Statistical approaches have focused on automatic acquisition of morphology (e.g. Sharma et al. for Assamese)

5 Productivity of Derivational suffixes
A survey of some noun-forming affixes in the CIIL Marathi corpus showed that some occur more frequently than others. Analysis of such suffixes would capture some linguistic knowledge:
- -pəɳa, -ɪkə, -t̪a, -iː attach more freely
- Suffixes like -ɪkərəɳə, -gɪri, -əɳə are less frequent

6 Marathi morph analysis
- Existing morph analyzer by Akshar Bharati: 114 paradigms for nouns, verbs, pronouns, adjectives
- Derivational and inflectional processes operate together, hence both kinds of knowledge are needed
- The open source tool Lttoolbox allows for easy conversion/creation of new paradigms

7 Building a morphological dictionary
The Lttoolbox tool requires the creation of a set of correspondences between Surface Forms and Lexical Forms:
- Surface forms (SF): forms that have undergone some morphological process
- Lexical forms (LF): base forms of the words, entered in the dictionary
Regularities in these correspondences form paradigms.
Morph analysis takes an SF as input and returns the LF as output.
Generation, i.e. the reverse direction, is also possible.
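The SF/LF correspondence can be illustrated with a minimal Python sketch. It is not Lttoolbox's implementation or its dictionary format: the data layout, the paradigm name `kacar/A`, and the function names are assumptions made for illustration. The stem `kacar` and the endings `A` and `yAlA` are taken from the sample paradigm slide.

```python
# A paradigm is modelled as a list of (surface ending, lexical ending) pairs.
# The layout and the paradigm name are hypothetical, not Lttoolbox's format.
PARADIGMS = {
    "kacar/A": [("A", "A"), ("yAlA", "A")],
}

# Dictionary entries map a stem to the paradigm it follows.
ENTRIES = {"kacar": "kacar/A"}


def expand():
    """Yield every (surface form, lexical form) pair the dictionary licenses."""
    for stem, paradigm in ENTRIES.items():
        for sf_ending, lf_ending in PARADIGMS[paradigm]:
            yield stem + sf_ending, stem + lf_ending


def analyse(surface):
    """Analysis: surface form in, lexical form(s) out."""
    return sorted({lf for sf, lf in expand() if sf == surface})


def generate(lexical):
    """Generation: lexical form in, surface form(s) out."""
    return sorted({sf for sf, lf in expand() if lf == lexical})
```

With this toy dictionary, `analyse("kacaryAlA")` returns `["kacarA"]`, and `generate("kacarA")` returns both surface forms. The same table serves both directions, which is why generation comes for free once the correspondences exist.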

8 Sample paradigm
A yAlA A
Dictionary entry: kacar

9 Adding knowledge about derivational suffixes
The sample paradigm given below is used to call another paradigm containing information about the derivational suffix
[lahAna = ləhanə, small, adj]

10 Nested paradigm
The paNA paradigm is 'called' from the previous one:
paNA paNA
paNAne paNA

11 Sample Output
lahAna/lahAna
lahAnapaNA/lahAnapaNA
lahAnapaNAne/lahAnapaNA

12 More features
- Possible to call more than one paradigm at a time.
- Example: lahAna can take -paNA or -paNa
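The nested-paradigm idea from slides 9 to 12 can be sketched in Python as follows. This is an illustrative model only, not the Lttoolbox format. The paradigm name `adj`, the `endings`/`calls` layout, and the single form listed for `paNa` are assumptions; the transcript names -paNa but gives no forms for it. The `paNA` endings follow the nested paradigm slide.

```python
# Each paradigm has its own endings plus a list of paradigms it calls.
PARADIGMS = {
    # Endings from the nested paradigm slide: (surface, lexical).
    "paNA": {"endings": [("paNA", "paNA"), ("paNAne", "paNA")], "calls": []},
    # Hypothetical content: only the bare derived form is assumed here.
    "paNa": {"endings": [("paNa", "paNa")], "calls": []},
    # Hypothetical adjective paradigm: bare form, then two called paradigms.
    "adj": {"endings": [("", "")], "calls": ["paNA", "paNa"]},
}

ENTRIES = {"lahAna": "adj"}


def pairs(name):
    """Yield a paradigm's own ending pairs, then those of every called paradigm."""
    paradigm = PARADIGMS[name]
    yield from paradigm["endings"]
    for called in paradigm["calls"]:
        yield from pairs(called)


def analyse(surface):
    """Return the lexical forms whose surface realisation matches the input."""
    found = set()
    for stem, name in ENTRIES.items():
        for sf_ending, lf_ending in pairs(name):
            if surface == stem + sf_ending:
                found.add(stem + lf_ending)
    return sorted(found)
```

Under these assumptions, `analyse("lahAnapaNAne")` returns `["lahAnapaNA"]`, matching the sample output, and adding another derivational suffix to every adjective means adding one name to the `calls` list rather than editing each entry.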

13 Present Work
The morphological dictionary consists of:
- 10 derivational suffixes in Marathi
- 38 derivational paradigms
- Total number of forms generated: 450,000
Preliminary evaluation over a set of 200 derived forms taken from a corpus shows 32% coverage
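As a rough illustration of the evaluation figure, coverage can be read as the share of test forms that receive at least one analysis; on that reading, 32% of 200 forms is 64 forms. The sketch below assumes this definition, which the slides do not spell out, and uses a hypothetical toy analyser in place of the real dictionary.

```python
def coverage(forms, analyse):
    """Fraction of forms for which the analyser returns at least one analysis."""
    if not forms:
        return 0.0
    covered = sum(1 for form in forms if analyse(form))
    return covered / len(forms)


# Hypothetical stand-in for the real analyser: a lookup in a tiny table.
TOY_ANALYSES = {"lahAnapaNA": ["lahAnapaNA"], "lahAnapaNAne": ["lahAnapaNA"]}


def toy_analyse(form):
    """Return the analyses for a form, or an empty list if it is unknown."""
    return TOY_ANALYSES.get(form, [])


# Number of covered forms implied by 32% coverage over 200 test forms.
covered_forms = round(0.32 * 200)
```

Here `coverage(["lahAnapaNA", "lahAnapaNAne", "xyz", "abc"], toy_analyse)` gives `0.5`, since two of the four forms are analysed.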

14 Problems
Coverage can be improved if the following issues can be handled:
- Prefixes: need further processing
- Cases of 'Vriddhi' cannot be handled well using paradigms. Example: pəʋit̪rə + yə = paʋit̪ryə (pure + suf = purity)
- Emphatic particles like -hI and -ca
Some noun-forming suffixes like -Ne or -ArI are highly regular, hence better handled using an inflectional paradigm

15 Future work
- Aim at increasing coverage by the addition of more suffixes
- Test the possibility of using 'Metadix' for handling cases of vowel lengthening

16 Download and documentation for Lttoolbox: SourceForge

