Learning to Link with Wikipedia David Milne and Ian H. Witten Department of Computer Science, University of Waikato CIKM 2008 (Best Paper Award) Presented.


1 Learning to Link with Wikipedia
David Milne and Ian H. Witten, Department of Computer Science, University of Waikato
CIKM 2008 (Best Paper Award)
Presented by Dongjoo Lee, IDS Lab., CSE, SNU

2 Introduction
- Wikification: find the significant topics in a document and link them to the corresponding Wikipedia articles.
Copyright © 2009 by CEBT. IDS Lab. 2009 Spring Seminar

3 Related Work
- Not restricting the destination documents of automatically identified links
  - Smart-Tag Service (Microsoft), AutoLink (Google)
  - Many were concerned that pages were being “surreptitiously” modified for commercial purposes
  - Automatic linking is most successful when restricted to safe domains such as cinema (Drenner et al. 2006)
- Using Wikipedia as a destination for links
  - Wikify (Mihalcea and Csomai, 2007)
    - Detection involves identifying the terms and phrases from which links should be made.
    - Disambiguation ensures that the detected phrases link to the appropriate article.
- Topic indexing
  - Identifying the most significant topics; those which the document was written about
  - Maron, 1977; Medelyan et al., 2008

4 Learning to Link with Wikipedia
- Learning to disambiguate links
- Learning to detect links
- Wikification in the wild
- Examples and implications
- Conclusions

5 Learning to disambiguate links - commonness
- Disambiguation balances the commonness of a sense with its relatedness to the surrounding context
- Commonness (prior probability): the number of times a Wikipedia article is used as a link destination in Wikipedia
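A minimal sketch of how the commonness prior on this slide could be estimated, assuming the link destinations for one anchor text have already been collected. The function name `commonness`, the input format, and the example counts are illustrative assumptions, not taken from the slides.

```python
from collections import Counter

def commonness(anchor_destinations):
    """Estimate the prior probability of each sense of one anchor text.

    anchor_destinations: list of article titles that the anchor links to
    across Wikipedia, one entry per link occurrence (assumed input format).
    """
    counts = Counter(anchor_destinations)
    total = sum(counts.values())
    # Share of links with this anchor text that point to each article.
    return {article: n / total for article, n in counts.items()}

# Made-up link counts for the anchor "tree", for illustration only.
links = ["Tree"] * 8 + ["Tree (data structure)"] * 2
priors = commonness(links)
```

With these made-up counts, the plant sense gets a prior of 0.8 and the data-structure sense 0.2; the context-based relatedness is what can override such a prior.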

6 Learning to disambiguate links - relatedness
- Compare each possible sense with its surrounding context
  - The words that make up the context may themselves be ambiguous
  - Use only unambiguous context terms, i.e. those that have a single sense
    - e.g. algorithm, uninformed search, LIFO stack
  - The problem reduces to selecting the sense article that has the most in common with all of the context articles
- Notation for the relatedness measure
  - a, b: the articles of interest
  - A, B: the sets of all articles that link to a and b, respectively
  - W: the set of all articles in Wikipedia
- Some context terms are better than others
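The formula itself did not survive in this transcript. The sketch below assumes it is the link-based measure modeled on the Normalized Google Distance, computed from the in-link sets A and B and the total article count |W|: (log max(|A|,|B|) - log |A ∩ B|) / (log |W| - log min(|A|,|B|)). Converting that distance to a relatedness score as `1 - distance`, clipped at 0, is a further assumption, and the function names are hypothetical.

```python
import math

def link_distance(A, B, num_articles):
    """Distance between two articles from their in-link sets (lower = closer).

    A, B: sets of article ids that link to articles a and b.
    num_articles: |W|, the total number of articles in Wikipedia.
    """
    common = len(A & B)
    if common == 0:
        return math.inf  # no shared in-links: treat as maximally distant
    larger, smaller = max(len(A), len(B)), min(len(A), len(B))
    denominator = math.log(num_articles) - math.log(smaller)
    if denominator <= 0:
        return 0.0  # degenerate case: every article links to both
    return (math.log(larger) - math.log(common)) / denominator

def relatedness(A, B, num_articles):
    """Assumed conversion of the distance into a score in [0, 1]."""
    return max(0.0, 1.0 - link_distance(A, B, num_articles))

# Toy in-link sets, for illustration only.
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
score = relatedness(A, B, num_articles=100)
```

Under this sketch, a candidate sense would be scored by its relatedness to each unambiguous context article, and the sense with the most in common with the context would be preferred.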

7 Training - Configuration - Test
- Data sets: training set (500), configuration set (500), test set (100)
- Phases: training, configuration, test
- Goal: find an optimal classifier and variables
- Evaluation measures: precision, recall, f-measure
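The three evaluation measures named on this slide can be sketched as follows, assuming the predicted links and the ground-truth links are each available as a set of comparable items. The function name and the toy sets are illustrative assumptions.

```python
def precision_recall_f(predicted, gold):
    """Return (precision, recall, f-measure) for two collections of links."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure: harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Toy example: 4 predicted links, 3 correct links, 2 in common.
p, r, f = precision_recall_f({"a", "b", "c", "d"}, {"a", "b", "e"})
```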

8 Learning to disambiguate links - configuration and attribute selection
- Identify the most suitable classification algorithm
- Set the minimum probability of the senses that the algorithm considers
  - This reduces the time required to compare relatedness between the context and the candidate senses
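A small sketch of the minimum-probability cutoff, assuming each candidate sense already has a prior probability. The threshold value and the sense priors below are made up for illustration; the slide does not state the value that was chosen.

```python
def prune_senses(sense_priors, min_probability):
    """Keep only the senses whose prior probability reaches the cutoff."""
    return {
        sense: prior
        for sense, prior in sense_priors.items()
        if prior >= min_probability
    }

# Hypothetical priors for one ambiguous term.
sense_priors = {"Tree": 0.90, "Tree (data structure)": 0.08, "Tree (film)": 0.02}
candidates = prune_senses(sense_priors, min_probability=0.05)
```

Fewer candidate senses means fewer relatedness comparisons against the context, which is the time saving the slide refers to.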

9 Learning to disambiguate links - evaluation

10 Learning to detect links
- Naïve approach (Mihalcea and Csomai 2007)
  - If the probability that a word or phrase has been linked to an article exceeds a certain threshold, a link is attached to it
- Presented approach
  - A machine-learned link detector that uses various features
    - Link probability
    - Relatedness
    - Disambiguation confidence
    - Generality: the minimum depth at which the topic is located in Wikipedia’s category tree
    - Location and spread: first occurrence, last occurrence, spread (distance between them)
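Two of the ideas on this slide can be sketched briefly: the threshold-based naïve detector, and the location and spread features. The function names, the count-based estimate of link probability, the threshold value, and the normalization of positions by document length are all assumptions made for illustration.

```python
def link_probability(times_linked, times_occurring):
    """Fraction of a phrase's occurrences in which it is used as a link."""
    return times_linked / times_occurring if times_occurring else 0.0

def naive_detect(phrase_counts, threshold):
    """Naive detector: link every phrase whose link probability exceeds the threshold.

    phrase_counts: {phrase: (times_linked, times_occurring)} (assumed format).
    """
    return [
        phrase
        for phrase, (linked, occurring) in phrase_counts.items()
        if link_probability(linked, occurring) > threshold
    ]

def location_features(positions, doc_length):
    """First occurrence, last occurrence and spread, as fractions of the document."""
    first = min(positions) / doc_length
    last = max(positions) / doc_length
    return {"first_occurrence": first, "last_occurrence": last, "spread": last - first}

# Made-up counts and word offsets, for illustration only.
counts = {"depth-first search": (90, 100), "the": (1, 100000)}
detected = naive_detect(counts, threshold=0.5)
features = location_features([10, 50, 90], doc_length=100)
```

In the learned detector, values like these would become columns of a feature vector alongside relatedness, disambiguation confidence, and generality, rather than being thresholded individually.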

11 Learning to detect links (cont’d)

12 Learning to detect links - training, configuration, and evaluation

13 Wikification in the wild
- Experimental data
  - A subset of 50 documents from the AQUAINT corpus
- Participants and tasks
  - Mechanical Turk (Barr and Cabrera, 2006): a crowdsourcing service hosted by Amazon that provides a way for human judgment to be easily incorporated into software applications
- Results

14 Examples and implications

15 Conclusion
- The present paper’s contribution is a proven method of extracting key concepts from plain text that has been evaluated against an extensive body of human performance

16 Discussion
- Well written
  - Clear motivation and contribution
  - Clear presentation of the method used to accomplish the goal
- But not many new ideas
  - A combination of existing features that are frequently used for text classification and similar tasks

