Related terms search based on WordNet / Wiktionary and its application in ontology matching RCDL'2009 St. Petersburg Institute for Informatics and Automation.

1 Related terms search based on WordNet / Wiktionary and its application in ontology matching
RCDL'2009
St. Petersburg Institute for Informatics and Automation of RAS
Feiyu Lin, A. Krizhanovsky (andrew.krizhanovsky at gmail.com)
Jönköping University, Sweden

2 Contents
– Wiki and Wiktionary intro
– MRD, parser and Wiktionaries comparison
– Correlation of relatedness measures
– Experiment scheme
– Results and comparison
– Results, applications and future work

3 Goal
Is it possible to find related terms using the current version of Wiktionary as successfully as using WordNet (for ontology matching, text search systems, etc.)? What are the advantages?

4 Wiki resources
– Distributed users and authors (users edit pages)
– Centralized storage and server software (e.g. MySQL, Apache, PHP)
– A set of hyperlinked articles
– Each article belongs to one or more categories (organized as a tree)
– Example: http://en.wikipedia.org
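The wiki structure listed on this slide (hyperlinked articles, each assigned to one or more categories arranged in a tree) can be modelled with a few maps. The following is a minimal sketch, assuming a hypothetical in-memory model: `WikiModel`, `addArticle`, `addSubcategory` and `articlesUnder` are illustrative names, not part of MediaWiki or any project API. It collects every article in a category and in all of its subcategories.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory model of a wiki: articles, hyperlinks and a category tree. */
final class WikiModel {
    // article title -> titles of the articles it links to
    final Map<String, Set<String>> links = new HashMap<>();
    // category -> articles directly assigned to it
    final Map<String, Set<String>> categoryArticles = new HashMap<>();
    // category -> its direct subcategories
    final Map<String, Set<String>> subcategories = new HashMap<>();

    void addArticle(String title, List<String> linksTo, List<String> categories) {
        links.computeIfAbsent(title, k -> new LinkedHashSet<>()).addAll(linksTo);
        for (String c : categories) {
            categoryArticles.computeIfAbsent(c, k -> new LinkedHashSet<>()).add(title);
        }
    }

    void addSubcategory(String parent, String child) {
        subcategories.computeIfAbsent(parent, k -> new LinkedHashSet<>()).add(child);
    }

    /** Articles in the given category or any of its descendants (iterative depth-first walk). */
    Set<String> articlesUnder(String root) {
        Set<String> result = new LinkedHashSet<>();
        Set<String> seen = new HashSet<>();   // protects against cycles if the "tree" is not strict
        Deque<String> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            String cat = stack.pop();
            if (!seen.add(cat)) continue;
            result.addAll(categoryArticles.getOrDefault(cat, Set.of()));
            for (String sub : subcategories.getOrDefault(cat, Set.of())) stack.push(sub);
        }
        return result;
    }
}
```

The `seen` set is a design choice: real wiki category structures are not always strict trees, so the walk should not loop forever on a cycle.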

5 Wiktionary is a free-content multilingual dictionary

6 Wiktionary data: pros (+) and cons (−), simplicity and complexity
− Different Wiktionaries have different levels of standardization.
− The data grow fast and are created by a huge community, so the parser must be very robust.
+ Rich data:
  + thesaurus (synonyms, antonyms)
  + phrase books
  + etymologies
  + pronunciations
  + sample quotations
  + translations
+ Fast-growing data
+ Interwiki links (additional data)
+ GNU FDL license

7 Wiktionary machine-readable dictionary database scheme
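The database scheme itself appears only as a diagram on this slide. As a rough illustration of the kind of data an MRD entry holds, here is a hypothetical, simplified Java model (Java 16+ records). The names `Entry`, `Meaning` and `RelationType` and the chosen fields are assumptions for illustration, not the project's actual tables or classes.

```java
import java.util.List;
import java.util.Map;

/** Illustrative subset of semantic relation types found in Wiktionary entries. */
enum RelationType { SYNONYM, ANTONYM, HYPERNYM, HYPONYM }

/** One sense of a word: a definition plus relations to other words (hypothetical model). */
record Meaning(String definition, Map<RelationType, List<String>> relations) {
    List<String> related(RelationType type) {
        return relations.getOrDefault(type, List.of());
    }
}

/** Hypothetical simplified MRD entry: a word in one language with one part of speech. */
record Entry(String word, String lang, String pos, List<Meaning> meanings,
             Map<String, List<String>> translations) {

    /** All words linked by the given relation type, across all meanings, without duplicates. */
    List<String> allRelated(RelationType type) {
        return meanings.stream().flatMap(m -> m.related(type).stream()).distinct().toList();
    }
}
```

Attaching relations to a meaning rather than to the whole word mirrors how a dictionary lists synonyms per sense.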

8 Size of Wiktionaries
For comparison, WordNet (2006): 150,000 words, 115,000 synsets

10 A shortest path in Russian Wiktionary
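A shortest path between two words can be found by treating words as graph nodes and thesaurus relations (e.g. synonymy) as edges. The sketch below assumes an undirected, unweighted graph and uses breadth-first search. The `WordGraph` class and its method names are hypothetical, and how the project actually builds its graph from Wiktionary is not shown on the slide.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Undirected, unweighted graph of words linked by thesaurus relations (hypothetical). */
final class WordGraph {
    private final Map<String, Set<String>> adj = new HashMap<>();

    void addRelation(String a, String b) {
        adj.computeIfAbsent(a, k -> new LinkedHashSet<>()).add(b);
        adj.computeIfAbsent(b, k -> new LinkedHashSet<>()).add(a);
    }

    /** Breadth-first search; returns the path from source to target, or an empty list. */
    List<String> shortestPath(String source, String target) {
        if (!adj.containsKey(source) || !adj.containsKey(target)) return List.of();
        Map<String, String> parent = new HashMap<>();   // visited word -> predecessor
        parent.put(source, null);
        Deque<String> queue = new ArrayDeque<>();
        queue.add(source);
        while (!queue.isEmpty()) {
            String word = queue.poll();
            if (word.equals(target)) {
                // walk predecessors back to the source, then reverse
                List<String> path = new ArrayList<>();
                for (String w = target; w != null; w = parent.get(w)) path.add(w);
                Collections.reverse(path);
                return path;
            }
            for (String next : adj.get(word)) {
                if (!parent.containsKey(next)) {
                    parent.put(next, word);
                    queue.add(next);
                }
            }
        }
        return List.of();
    }
}
```

Because every edge has the same weight, BFS is sufficient; weighted relations (e.g. synonyms closer than hypernyms) would call for Dijkstra's algorithm instead.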

11 Correlation of relatedness measures
Correlation of relatedness measures based on WordNet, English Wikipedia and Russian Wiktionary with human judgments from the 353-TC test collection
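The slide does not say which correlation coefficient is used. The sketch below assumes Spearman's rank correlation, a common choice for 353-TC-style evaluations (Pearson's r is also used in the literature). It compares the human relatedness scores of word pairs with the scores produced by a measure. The `Spearman` class is a hypothetical helper.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Spearman rank correlation between human scores and computed relatedness scores. */
final class Spearman {

    /** 1-based ranks; tied values receive the average of their ranks. */
    static double[] ranks(double[] x) {
        Integer[] idx = new Integer[x.length];
        for (int i = 0; i < x.length; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingDouble(p -> x[p]));
        double[] r = new double[x.length];
        int i = 0;
        while (i < idx.length) {
            int j = i;
            while (j + 1 < idx.length && x[idx[j + 1]] == x[idx[i]]) j++;
            double avg = (i + j) / 2.0 + 1;   // mean of 1-based positions i+1 .. j+1
            for (int k = i; k <= j; k++) r[idx[k]] = avg;
            i = j + 1;
        }
        return r;
    }

    /** Pearson correlation coefficient (NaN if either input has zero variance). */
    static double pearson(double[] a, double[] b) {
        double ma = Arrays.stream(a).average().orElse(0);
        double mb = Arrays.stream(b).average().orElse(0);
        double cov = 0, va = 0, vb = 0;
        for (int i = 0; i < a.length; i++) {
            cov += (a[i] - ma) * (b[i] - mb);
            va += (a[i] - ma) * (a[i] - ma);
            vb += (b[i] - mb) * (b[i] - mb);
        }
        return cov / Math.sqrt(va * vb);
    }

    /** Spearman's rho is the Pearson correlation of the two rank vectors. */
    static double rho(double[] human, double[] measure) {
        if (human.length != measure.length) throw new IllegalArgumentException("length mismatch");
        return pearson(ranks(human), ranks(measure));
    }
}
```

A rank-based coefficient only asks whether a measure orders word pairs the way humans do. This suits measures such as path lengths, whose raw scale differs from the human rating scale.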

12 The eight largest Wiktionary editions (March 2008)

13 Applications of a machine-readable dictionary (MRD)
Thesaurus data:
– related terms search
– search query expansion (by synonyms) / query reformulation (in search systems)
– query recognition in question-answering systems
– word sense disambiguation
Media data (audio + pictures):
– language learning
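Query expansion by synonyms can be sketched as follows. This assumes a synonym map has already been extracted from the dictionary. The `QueryExpander` class and the boolean `AND`/`OR` output syntax are hypothetical and do not match the query API of any particular search engine.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Query expansion by synonyms from a thesaurus (e.g. one extracted from Wiktionary). */
final class QueryExpander {
    private final Map<String, List<String>> synonyms;   // word -> synonyms (hypothetical source)

    QueryExpander(Map<String, List<String>> synonyms) {
        this.synonyms = synonyms;
    }

    /** Each query word becomes an OR-group of the word and its synonyms; groups are joined by AND. */
    String expand(String query) {
        return Arrays.stream(query.trim().toLowerCase(Locale.ROOT).split("\\s+"))
                .map(word -> {
                    Set<String> group = new LinkedHashSet<>();   // keeps the original word first
                    group.add(word);
                    group.addAll(synonyms.getOrDefault(word, List.of()));
                    return group.size() == 1 ? word : "(" + String.join(" OR ", group) + ")";
                })
                .collect(Collectors.joining(" AND "));
    }
}
```

For example, with `car` mapped to `automobile` and `auto`, the query `Red car` becomes `red AND (car OR automobile OR auto)`. Expanding every sense's synonyms indiscriminately can hurt precision, which is where the word sense disambiguation listed above comes in.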

14 Work plan: done and to do
Russian Wiktionary
– Extraction (by regular expressions, RE): definition, relations (synonyms…), translation, audio, graphics
– Database API
– Visualization (MRD browser)
– Quiz & tests (test application)
Russian Wiktionary
– Database scheme: definition, relations (synonyms…), translation, audio, graphics
– Database API
English Wiktionary
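Extraction by regular expressions can be illustrated on a simplified piece of wiki markup. This sketch assumes sections introduced by headers such as `==== Synonyms ====`, followed by `#`-prefixed lines containing `[[links]]`. The real Russian Wiktionary uses Russian section titles and templates, and the project's parser may work differently. `SectionExtractor` is a hypothetical name, and the section title is passed in as a parameter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Extracts linked words from one section of simplified (hypothetical) wiki markup. */
final class SectionExtractor {
    // a wiki link [[target]] or [[target|label]]; group 1 captures the target
    private static final Pattern LINK = Pattern.compile("\\[\\[([^\\]|]+)(?:\\|[^\\]]*)?\\]\\]");
    // a section header such as "==== Synonyms ===="; group 1 captures the title
    private static final Pattern HEADER = Pattern.compile("=+\\s*(.+?)\\s*=+\\s*");

    /** Returns link targets found on "#"-lines inside the section named sectionTitle. */
    static List<String> extract(String wikitext, String sectionTitle) {
        List<String> words = new ArrayList<>();
        boolean inSection = false;
        for (String line : wikitext.split("\\R")) {   // \R matches any line break
            Matcher h = HEADER.matcher(line);
            if (h.matches()) {
                // any new header ends the previous section
                inSection = h.group(1).equals(sectionTitle);
                continue;
            }
            if (inSection && line.startsWith("#")) {
                Matcher m = LINK.matcher(line);
                while (m.find()) words.add(m.group(1).trim());
            }
        }
        return words;
    }
}
```

Keeping the patterns as precompiled constants and processing the text line by line keeps each rule small and testable, which matters because the slides note that the community-edited data require a very robust parser.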

15 Implementation
– Software based on the Synarcher code
– Java
– MySQL or SQLite database
– JUnit test framework

16 Results
– Designed the scheme of the experiment for calculating a semantic relatedness measure based on Russian Wiktionary data
– Developed the parser of Russian Wiktionary
– Designed the database scheme
– Implemented the database API in Java
– Compared the results of related terms search based on Wiktionary and WordNet
– Project site (Wiki tool kit): http://code.google.com/p/wikokit/

17 Future work
– Finish creating the MRD database and software for the Russian Wiktionary and the English Wiktionary
– Visualization (JavaFX): MRD browser
– Quiz & tests (learning application)
– Online application (Java Web Start)

18 Thank you!