A review for Information Retrieval Subject. Group members: AHMAD KAMAL HARIDAN JAJULI (P61037), NADIA BINTI KAMARUDIN (P61026), ZURINA BINTI ZOLKAFFLY (P61066).


1 A review for Information Retrieval Subject :

2 GROUP MEMBERS: AHMAD KAMAL HARIDAN JAJULI (P61037); NADIA BINTI KAMARUDIN (P61026); ZURINA BINTI ZOLKAFFLY (P61066)

3 Stemming algorithm: a computational procedure that reduces all inflectional and derivational variants of a word to a common form, called the stem, by removing all or some of the affixes attached to the word. Example: group, groups, grouped -> group

4 Developed based on the Rules Application Order (RAO) approach, with these changes: adding a few appropriate affixes to the list of rules; modifying the spelling variation rules; adding a few missing words to the root word dictionary; sorting the rules in decreasing order of how frequently each rule was used in previous stemming.

5 Infix: el, em, er, in. Prefix-suffix: ber…an, ber…kan, di…i, ke…an, men…i, men…kan, memper…i, pen…an, per…an, se…nya. Suffix: i, an, kan, lah, nya. Prefix: di, ke, se, ber, men, pen, per, ter.

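6 Affix patterns (the root word shown in the slide graphic did not survive extraction): PREFIX + root; root + SUFFIX; PREFIX + root + SUFFIX; INFIX inserted inside the root.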

7 Experiment, Evaluate, Summary. Source of translation: Quranic Collection

8 RFO: list of affix rules; spelling variation rules (Ahmad); root word dictionary (SISDOM98); stop words (Ahmad)

9 Test 1 = pr - ps - su - in; Test 2 = pr - su - ps - in; Test 3 = ps - pr - su - in; Test 4 = ps - su - pr - in; Test 5 = su - pr - ps - in; Test 6 = su - ps - pr - in; Test 7 = alphabetical. Legend: pr = Prefix; ps = Prefix-Suffix; su = Suffix; in = Infix; alphabetical = the alphabetical order of all rules.
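
The seven test orderings can be sketched in code. This is only an illustration: the rule records below are hypothetical examples, not the presenters' rule list, and the type codes are taken from the legend on this slide.

```python
# Hypothetical rule records: (affix type, rule text). Types follow the legend:
# pr = prefix, ps = prefix-suffix, su = suffix, in = infix.
RULES = [
    ("su", "kan"), ("pr", "men"), ("in", "em"),
    ("ps", "ke...an"), ("pr", "di"), ("su", "an"),
]

# Tests 1 to 6 fix the order in which the affix types are tried.
TESTS = {
    "Test 1": ["pr", "ps", "su", "in"],
    "Test 2": ["pr", "su", "ps", "in"],
    "Test 3": ["ps", "pr", "su", "in"],
    "Test 4": ["ps", "su", "pr", "in"],
    "Test 5": ["su", "pr", "ps", "in"],
    "Test 6": ["su", "ps", "pr", "in"],
}


def order_by_type(rules, type_order):
    """Group rules by affix type in the given order (stable within a type)."""
    rank = {affix_type: i for i, affix_type in enumerate(type_order)}
    return sorted(rules, key=lambda rule: rank[rule[0]])


def order_alphabetically(rules):
    """Test 7: ignore affix types and sort all rules by their text."""
    return sorted(rules, key=lambda rule: rule[1])


if __name__ == "__main__":
    for name, type_order in TESTS.items():
        print(name, [text for _, text in order_by_type(RULES, type_order)])
    print("Test 7", [text for _, text in order_alphabetically(RULES)])
```

Because Python's sort is stable, rules of the same type keep their original relative order inside each group.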

10 RAO: Ahmad algorithm; root word dictionary; spelling variation; list of affixes. RAO2: Ahmad algorithm; modified root word dictionary; modified spelling variation; modified list of rules. NRAO: modified algorithm; RAO2 root word dictionary; RAO2 spelling variation; RAO2 list of rules. RFO: NRAO2 functionality; rules sorted in decreasing order of usage frequency.

11

12

13

14 Quantitative: minimum stem length after the removal of an affix: prefix and suffix, min. length = 2; prefix-suffix and infix, min. length = 3. Spelling rules (spelling exceptions and variations): handled by the program because of their complexity; they apply to prefix, prefix-suffix and suffix rules. Recoding: the first letters of root words need to be dropped when combined with these affixes.
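
The minimum stem length condition could be checked with a small lookup, as in the sketch below. The function and table names are hypothetical; only the two thresholds (2 and 3) come from the slide.

```python
# Minimum stem length after affix removal, per affix type (values from the slide).
# pr = prefix, su = suffix, ps = prefix-suffix, in = infix.
MIN_STEM_LENGTH = {"pr": 2, "su": 2, "ps": 3, "in": 3}


def stem_is_long_enough(stem, affix_type):
    """Return True if the stem left after removing the affix is acceptable."""
    return len(stem) >= MIN_STEM_LENGTH[affix_type]
```

A candidate stem that fails this check would simply be rejected, and the stemmer would move on to the next rule.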

15 Prefixes, Suffix. * Sample rule notation: men + c, d, sy, t, z
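
One possible reading of the sample notation is that the prefix "men" may be stripped only when the remaining stem begins with one of the listed letters. The sketch below assumes that reading; the rule representation and the function name are hypothetical.

```python
# Assumed reading of "men + c, d, sy, t, z": the prefix, then the letters
# the remaining stem is allowed to start with.
SAMPLE_RULE = ("men", ("c", "d", "sy", "t", "z"))


def strip_prefix(word, rule):
    """Strip the rule's prefix if the rule applies; otherwise return None."""
    prefix, allowed_starts = rule
    if not word.startswith(prefix):
        return None
    stem = word[len(prefix):]
    # str.startswith accepts a tuple and matches any of its members.
    if stem.startswith(allowed_starts):
        return stem
    return None
```

For example, `strip_prefix("mencari", SAMPLE_RULE)` gives `"cari"`, while a word that does not start with "men" gives `None`.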

16 Step 1: Get the next word, until the last word. Step 2: Check the word against the dictionary; if it appears in the dictionary, the word is a root word, so go to Step 1. Step 3: Get the next rule; if no more rules are available, the word is considered a root word, so go to Step 1.

17 Step 4: Apply the rule to the word to get a stem. Step 5: Perform recoding for prefix spelling exceptions and check the dictionary. Step 6: If the stem appears in the dictionary, the stem is the root of the word, so go to Step 1; otherwise go to Step 7.

18 Step 7: Check the stem from Step 4 for spelling variations and check the dictionary. Step 8: If the stem appears in the dictionary, the stem is the root of the word, so go to Step 1; otherwise go to Step 9. Step 9: Perform recoding for suffix spelling exceptions and check the dictionary. Step 10: If the stem appears in the dictionary, the stem is the root of the word, so go to Step 1; otherwise go to Step 3.
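
Steps 1 to 10 can be sketched as a loop over words and rules. This is a minimal illustration, not the presenters' implementation: the dictionary, the rules, and the three helper functions (`recode_prefix`, `spelling_variants`, `recode_suffix`) are hypothetical stand-ins, and checking the unmodified stem together with the prefix-recoded candidates in Steps 5 and 6 is an assumption.

```python
# Toy data, hypothetical and for illustration only.
DICTIONARY = {"makan", "tulis", "ajar", "baca"}
# Each rule is (prefix, suffix); "" means no affix on that side.
RULES = [("di", ""), ("", "an"), ("men", ""), ("", "kan"), ("ke", "an")]


def apply_rule(word, rule):
    """Step 4: strip the rule's affixes; return None if the rule does not fit."""
    prefix, suffix = rule
    if len(word) <= len(prefix) + len(suffix):
        return None
    if not (word.startswith(prefix) and word.endswith(suffix)):
        return None
    return word[len(prefix):len(word) - len(suffix)]


def recode_prefix(stem, rule):
    """Hypothetical prefix recoding: restore a first letter the prefix dropped."""
    restored = {"men": "t"}  # assumed example, e.g. men + tulis -> menulis
    letter = restored.get(rule[0])
    return [letter + stem] if letter else []


def spelling_variants(stem):
    """Hypothetical placeholder for the spelling variation rules."""
    return []


def recode_suffix(stem, rule):
    """Hypothetical placeholder for suffix spelling exceptions."""
    return []


def find_root(word):
    if word in DICTIONARY:                        # Step 2
        return word
    for rule in RULES:                            # Step 3: next rule
        stem = apply_rule(word, rule)             # Step 4
        if stem is None:
            continue
        candidate_groups = (
            [stem] + recode_prefix(stem, rule),   # Steps 5-6
            spelling_variants(stem),              # Steps 7-8
            recode_suffix(stem, rule),            # Steps 9-10
        )
        for candidates in candidate_groups:
            for candidate in candidates:
                if candidate in DICTIONARY:
                    return candidate
    return word                                   # Step 3: no rules left


def stem_words(words):
    """Step 1: process every word in turn."""
    return [find_root(word) for word in words]
```

With this toy data, `stem_words(["makanan", "menulis"])` yields `["makan", "tulis"]`; the second word succeeds only because the hypothetical prefix recoding restores the dropped "t".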

19 Compression achieved
Stemmer   Distinct words   Compression
RAO       2667             61.4%
RFO       2602             62.3%
RFO is an improvement because it returns fewer distinct words and a higher compression percentage.
Error reduction
Stemmer   Number of errors   Percentage of errors
RAO       93                 4.4%
RFO       30                 1.4%
RFO also recorded the fewest errors.
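
The slide does not state how the two percentages were computed. A common definition, assumed here, is the relative reduction in the number of distinct terms for compression, and errors divided by words checked for the error rate. The function names and the sample numbers in the comments are hypothetical.

```python
def compression_percent(distinct_words, distinct_stems):
    """Assumed definition: relative reduction in the number of distinct terms."""
    return 100.0 * (distinct_words - distinct_stems) / distinct_words


def compression_from_lists(words, stems):
    """Same measure, computed from the word list and its stemmed counterpart."""
    return compression_percent(len(set(words)), len(set(stems)))


def error_percent(errors, words_checked):
    """Assumed definition: share of checked words that were stemmed wrongly."""
    return 100.0 * errors / words_checked


# Hypothetical figures, not taken from the experiment:
# 1000 distinct words reduced to 400 distinct stems is 60% compression.
```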

20 From the experiments performed, it was found that: - The rules do not need to be applied in any particular order of affix types. - For the first pass, sort the rules in alphabetical order; for the second pass, sort the rules by the usage frequency of each rule. - The experiments showed that the new approach to stemming performs better than other Malay stemmers such as RAO by Ahmad.
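
The two-pass idea can be sketched as follows: run once with the rules in alphabetical order while counting which rule produced each root, then reorder the rules by those counts. The suffix-only rules, the dictionary, and the word list are hypothetical, chosen only to keep the example short.

```python
from collections import Counter

# Toy data, hypothetical and for illustration only (suffix rules only).
DICTIONARY = {"makan", "baca", "ajar"}
RULES = ["kan", "i", "an"]
WORDS = ["makanan", "bacaan", "ajaran", "bacakan", "ajarkan", "ajari"]


def strip_suffix(word, suffix):
    """Return the word without the suffix, or None if it does not end with it."""
    if len(word) > len(suffix) and word.endswith(suffix):
        return word[:-len(suffix)]
    return None


def count_rule_usage(words, rules):
    """First pass: count how often each rule is the one that finds a root."""
    usage = Counter()
    for word in words:
        if word in DICTIONARY:
            continue
        for rule in rules:
            if strip_suffix(word, rule) in DICTIONARY:
                usage[rule] += 1
                break
    return usage


def order_by_frequency(rules, usage):
    """Second pass order: most frequently used rules first."""
    return sorted(rules, key=lambda rule: -usage[rule])


first_pass_rules = sorted(RULES)                      # alphabetical order
usage = count_rule_usage(WORDS, first_pass_rules)
second_pass_rules = order_by_frequency(first_pass_rules, usage)
```

Placing frequently used rules first means that most words are resolved after trying only a few rules.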

21

