USING WORDNET TO RETRIEVE WORDS FROM THEIR MEANINGS İlknur Durgar El-Kahlout and Kemal Oflazer Sabancı University İstanbul, Turkey.

1 USING WORDNET TO RETRIEVE WORDS FROM THEIR MEANINGS İlknur Durgar El-Kahlout and Kemal Oflazer Sabancı University İstanbul, Turkey

2 Problem For a given definition, find the appropriate word (or words). A traditional dictionary is of no use. From a dictionary, find an appropriate word that has a “similar” definition.

3 Examples User definition: Akımı ölçmek için kullanılan alet (A device that is used to measure the current) In the dictionary: akımölçer: elektrik akımının şiddetini ölçmeye yarayan araç, ampermetre (ammeter: a device that measures the intensity of electrical current, amperemeter)

4 Applications: Computer-assisted language learning; Solving crossword puzzles; Reverse dictionary

5 Outline: Problem statement; Meaning-to-Word System (MTW); Our Approach; Methods; Results; Result Summary; Conclusion

6 Problem Statement Find the “similarity” between two definitions Akımı ölçmek için kullanılan alet (A device that is used to measure the current) Elektrik akımının şiddetini ölçmeye yarayan araç, ampermetre (a device that measures the intensity of electrical current, amperemeter)

7 Meaning-to-Word (MTW) Addresses the problem of finding the appropriate word (or words) whose meaning “matches” the given definition. Two subproblems: (1) finding words whose definitions are "similar" to the query in some sense; (2) ranking the candidate words in a variety of ways.

8 Information Flow in MTW: User Definition → (query) → Search in Dictionary → (candidates) → Rank Candidates → List of words

9 Available Resources Turkish Monolingual Dictionary: about 50,000 entries. Turkish WordNet: about 11,000 synsets.

10 Normalization (flow diagram: User Definition → query → Search in Dictionary → candidates → Rank Candidates → List of words)

11 Tokenization Stemming Stop Word Elimination
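The three normalization steps listed above can be sketched in Python. This is only an illustrative sketch: the stop-word list and the suffix-stripping function toy_stem are hypothetical stand-ins, not the morphological analyzer or the resources actually used in MTW.

```python
import re

# Hypothetical resources, for illustration only. A real system would use a
# Turkish morphological analyzer and a curated stop-word list.
STOP_WORDS = {"daha", "hiç"}
TOY_SUFFIXES = ("memiş", "mek", "mak")

def tokenize(text):
    """Split a definition into word tokens (\\w is Unicode-aware in Python 3)."""
    return re.findall(r"\w+", text)

def toy_stem(token):
    """Strip the first matching suffix from the toy list, if any."""
    for suffix in TOY_SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix):
            return token[: -len(suffix)]
    return token

def normalize(text):
    """Tokenize, stem, then drop stop words, in the order given on the slide."""
    stems = [toy_stem(tok) for tok in tokenize(text)]
    return [s for s in stems if s not in STOP_WORDS]

print(normalize("daha önce hiç evlenmemiş kişi"))
```

With these toy resources, the query from the subset-generation slide reduces to the three stems önce, evlen, kişi.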

12 Query Processing (flow diagram: User Definition → query → Search in Dictionary → candidates → Rank Candidates → List of words)

13 Subset Generation Search with different sets of words. Select informative words from the user’s query. Query: daha önce hiç evlenmemiş kişi (a person who has never been married) Subsets: {önce, evlen, kişi} (before, marry, person); {evlen, kişi} (marry, person), {önce, kişi} (before, person), {önce, evlen} (before, marry); {evlen} (marry), {önce} (before), {kişi} (person)
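Subset generation as shown on this slide amounts to enumerating every non-empty subset of the query stems. A minimal Python sketch follows; the function name generate_subsets is an assumption made for illustration, not a name from the original system.

```python
from itertools import combinations

def generate_subsets(stems):
    """Return every non-empty subset of the stems, largest subsets first."""
    subsets = []
    for size in range(len(stems), 0, -1):
        subsets.extend(combinations(stems, size))
    return subsets

for subset in generate_subsets(["önce", "evlen", "kişi"]):
    print(subset)
```

For three stems this yields seven subsets: one of size three, three of size two and three of size one, matching the groups on the slide.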

14 Query Processing: Subset Sorting An unordered list of subsets is insufficient; rank the generated subsets 1) by the number of words: {önce, evlen, kişi} (before, marry, person), {evlen, kişi} (marry, person) 2) by the sum of frequency logarithms: {evlen, kişi} (marry, person), {önce, kişi} (before, person)
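The two sorting criteria can be combined into a single sort key. The sketch below rests on two assumptions that the slides do not state: the frequency values are made up, and subsets with a lower sum of log frequencies (rarer, presumably more informative words) are placed first among subsets of equal size.

```python
import math

# Hypothetical stem frequencies (made-up numbers, for illustration only).
FREQ = {"önce": 900, "evlen": 40, "kişi": 600}

def sort_key(subset):
    """Larger subsets first; ties broken by ascending sum of log frequencies."""
    log_freq_sum = sum(math.log(FREQ[word]) for word in subset)
    return (-len(subset), log_freq_sum)

subsets = [
    ("önce",), ("evlen",), ("kişi",),
    ("önce", "kişi"), ("önce", "evlen"), ("evlen", "kişi"),
    ("önce", "evlen", "kişi"),
]
ranked = sorted(subsets, key=sort_key)
for subset in ranked:
    print(subset)
```

With these numbers, {evlen, kişi} is ranked ahead of {önce, kişi}, the same relative order as on the slide.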

15 Searching for Meanings (flow diagram: User Definition → query → Search in Dictionary → candidates → Rank Candidates → List of words)

16 Two methods: Stem Matching; Query Expansion (using WordNet)

17 Stem Matching Morphological normalization of words Find meanings that contain morphological variants of the original definition

18 Stem Matching (Ex.) Query: akımı ölçmek için kullanılan alet (A device that is used to measure the current) Candidate stems per word: akımı: ak (white), akım (current), akı (flux); ölçmek: ölç (measure); için: için (to), iç (drink); kullanılan: kullan (use), kul (slave); alet: alet (device) Colored stems are the matching ones

19 Stem Matching (A device that is used to measure the current) akımı ölçmek için kullanılan alet elektrik akımının şiddetini ölçmeye yarayan araç, ampermetre (a device that measures the intensity of electrical current, amperemeter)

20 Stem Matching (A device that is used to measure the current) akımı ölçmek için kullanılan alet elektrik akımının şiddetini ölçmeye yarayan araç, ampermetre (a device that measures the intensity of electrical current, amperemeter)
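The stem-matching example can be sketched as a set intersection between each query word's candidate stems and the stems of a dictionary definition. The candidate stems are taken from the example slide; the stemmed form of the dictionary definition and the function name matched_words are assumptions made for illustration.

```python
# Candidate stems for each word of the query
# "akımı ölçmek için kullanılan alet", as listed in the example.
query_stems = {
    "akımı": {"ak", "akım", "akı"},
    "ölçmek": {"ölç"},
    "için": {"için", "iç"},
    "kullanılan": {"kullan", "kul"},
    "alet": {"alet"},
}

# Assumed stems of the dictionary definition
# "elektrik akımının şiddetini ölçmeye yarayan araç, ampermetre".
definition_stems = {"elektrik", "akım", "şiddet", "ölç", "yara", "araç", "ampermetre"}

def matched_words(query, definition):
    """Return the query words with at least one candidate stem in the definition."""
    return [word for word, stems in query.items() if stems & definition]

print(matched_words(query_stems, definition_stems))
```

Under these assumptions only akımı and ölçmek match. The word alet (device) does not match araç (device), which is the kind of gap the drawbacks slide describes.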

21 Stem Matching: Drawbacks Generate noisy stems: ilim (science, my city) → ilim (science), il (city) Conflate two words with very different meanings to the same stem: ilim (science, my city), ilde (in the city) → il (city) Cannot find relations between similar words: kimse (someone) / kişi (person), bölüm (part) / kısım (portion)

22 Using Query Expansion Two different approaches: (1) expand the query with relations (synonyms, specializations, generalizations); (2) expand the query with the unexpanded query’s relevant answers. WordNet synonyms are used in MTW: {besin, gıda} (food, nourishment); {iyileş, düzel} (to get better) / {iyileş, geliş} (to improve)

23 Query Expansion (Ex.) (A device that is used to measure the current) { akımı ölçmek için kullanılan alet } ak (white) ölç (measure) için (to) kullan (use) alet (device) akım (current) iç (drink) kul (slave) akı (flux) beyaz faydalan araç debi yararlan gereç akış köle

24 Query Expansion (Ex.) (A device that is used to measure the current) akımı ölçmek için kullanılan alet elektrik akımının şiddetini ölçmeye yarayan araç, ampermetre (a device that measures the intensity of electrical current, amperemeter)

25 Query Expansion (Ex.) (A device that is used to measure the current) akımı ölçmek için kullanılan alet elektrik akımının şiddetini ölçmeye yarayan araç, ampermetre (a device that measures the intensity of electrical current, amperemeter)
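Query expansion with synonyms can be sketched by adding each stem's synonyms before matching. The synonym table below is a hypothetical stand-in for WordNet synsets. Its entries are a reading of the flattened example slide, so the assignment of each synonym to a stem is an assumption, as is the stemmed form of the dictionary definition.

```python
# Hypothetical synonym table standing in for WordNet synsets.
SYNONYMS = {
    "alet": {"araç", "gereç"},
    "kullan": {"faydalan", "yararlan"},
    "ak": {"beyaz"},
    "kul": {"köle"},
}

def expand(stems):
    """Return the stems together with all of their listed synonyms."""
    expanded = set(stems)
    for stem in stems:
        expanded |= SYNONYMS.get(stem, set())
    return expanded

query_stems = {
    "akımı": {"ak", "akım", "akı"},
    "ölçmek": {"ölç"},
    "için": {"için", "iç"},
    "kullanılan": {"kullan", "kul"},
    "alet": {"alet"},
}
# Assumed stems of the dictionary definition of akımölçer.
definition_stems = {"elektrik", "akım", "şiddet", "ölç", "yara", "araç", "ampermetre"}

matched = [w for w, stems in query_stems.items() if expand(stems) & definition_stems]
print(matched)
```

With expansion, alet now matches through its synonym araç, so three query words match instead of two.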

26 Ranking (flow diagram: User Definition → query → Search in Dictionary → candidates → Rank Candidates → List of words)

27 Ranking is a very important part of MTW. Having the right answer in the retrieved set is not enough. The aim is to have the right answer near the top of the retrieved set (e.g., among the top 50 answers).

28 Ranking Simple but effective methods: (1) number of matched words; (2) subset informativeness (frequency of words in the subset); (3) ratio of the number of matched words to the number of words in the candidate dictionary definition; (4) Longest Common Subsequence (order of the matched words)
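Three of these ranking signals can be sketched in a few lines of Python. The slides do not say how the signals are combined, so returning them as a tuple (to be compared lexicographically) is purely an assumption, as are the function names and the stemmed inputs.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def score(query, definition):
    """Return (matched words, matched/definition-length ratio, LCS length)."""
    matched = sum(1 for word in query if word in definition)
    ratio = matched / len(definition)
    return (matched, ratio, lcs_length(query, definition))

# Assumed stemmed forms of the example query and dictionary definition.
query = ["akım", "ölç", "kullan", "alet"]
definition = ["elektrik", "akım", "şiddet", "ölç", "yara", "araç", "ampermetre"]
print(score(query, definition))
```

Candidates could then be sorted by this tuple in descending order, so that definitions with more matched words, a higher match ratio and better-preserved word order come first.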

29 Some Statistics Training sets: 50 queries from users, 50 queries from a dictionary. Test sets: 50 queries from users, 50 queries from a separate dictionary.
                         Test set 1 (user)   Training set 1   Test set 2 (dict.)   Training set 2
# of queries             50                  50               50                   50
Avg. # of query words    5.66                4.64             9.24                 13.98
Max. # of query words    17                  12               23                   45
Min. # of query words    2                   1                1                    6

30 Stem Matching (all stems included)
Rank        Test set 1   Training set 1   Test set 2   Training set 2
1-10        13 (26%)     18 (36%)         45 (90%)     41 (82%)
11-50       7 (14%)      12 (24%)         2 (4%)       5 (10%)
>50         19 (38%)     10 (20%)         3 (6%)       4 (8%)
Not found   11 (22%)     10 (20%)         0 (0%)       0 (0%)
Low % in top 10 in user queries but very high results in dictionary queries

31 Stem Matching (longest stem included, heuristics)
Rank        Test set 1   Training set 1   Test set 2   Training set 2
1-10        14 (28%)     21 (42%)         46 (92%)     43 (86%)
11-50       5 (10%)      9 (18%)          1 (2%)       5 (10%)
>50         18 (36%)     9 (18%)          3 (6%)       2 (4%)
Not found   13 (26%)     11 (22%)         0 (0%)       0 (0%)
Improvement in user queries, slightly better performance in dictionary queries

32 Query Expansion (WordNet) (all stems included)
Rank        Test set 1   Training set 1   Test set 2   Training set 2
1-10        14 (28%)     24 (48%)         45 (90%)     41 (82%)
11-50       9 (18%)      9 (18%)          2 (4%)       5 (10%)
>50         18 (36%)     12 (24%)         3 (6%)       4 (8%)
Not found   9 (18%)      5 (10%)          0 (0%)       0 (0%)
Better results in user queries, no change in dictionary queries

33 Query Expansion (WordNet) (longest stem included, heuristics)
Rank        Test set 1   Training set 1   Test set 2   Training set 2
1-10        14 (28%)     24 (48%)         41 (82%)     39 (78%)
11-50       6 (12%)      8 (16%)          5 (10%)      6 (12%)
>50         21 (42%)     13 (26%)         1 (2%)       5 (10%)
Not found   9 (18%)      5 (10%)          0 (0%)       0 (0%)
Better performance than ‘longest stem matching’ in user queries, but worse performance in dictionary queries

34 Result Summary Stem Matching (longest stem included): 60% success in real user queries, 96% success in dictionary queries. Query Expansion (all stems included): 68% success in real user queries, 92% success in dictionary queries.

35 Conclusion We have implemented a ‘Meaning to Word’ system for Turkish. Results on unseen data are rather satisfactory. Query expansion performs better, although it cannot find the words for all queries. 68% of real user queries and 90% of dictionary queries are found in the first 50 results.

36 THANK YOU !

