Exploiting Wikipedia Inlinks for Linking Entities in Queries. Entity Recognition and Disambiguation Challenge, ACM SIGIR 2014, July 6-11, 2014, The 37th Annual.


1 Exploiting Wikipedia Inlinks for Linking Entities in Queries
Entity Recognition and Disambiguation Challenge, ACM SIGIR 2014, July 6-11, 2014, The 37th Annual International ACM SIGIR Conference
Priya Radhakrishnan, Romil Bansal, Manish Gupta, Vasudeva Varma
International Institute of Information Technology, Hyderabad, India
priya.r@research.iiit.ac.in, romil.bansal@research.iiit.ac.in, gmanish@microsoft.com, vv@iiit.ac.in
SIEL@ERD

2 ERD Challenge
SIEL@ERD team from IIIT, Hyderabad.
The objective of an Entity Recognition and Disambiguation (ERD) system is to recognize mentions of entities in a given text, disambiguate them, and map them to the entities in a given entity collection [1] or knowledge base.
Example Freebase entity: http://www.freebase.com/m/046yc7

3 SIEL@ERD system
A TAGME [2] system with time and performance optimizations:
1. Mention detection: reduce the number of DB look-ups.
2. Disambiguation: use (1-δ) instead of δ; prominent senses restriction.
3. Pruning.

4 Data Preprocessing and Measures
Inlinks are links from an anchor text to a Wikipedia article.
Indexes: we process the English Wikipedia dump to create three indexes:
1. In-Link Graph Index
2. Anchor Dictionary
3. WikiTitlePageId Index
Measures:
1. Link frequency, link(a)
2. Total frequency, freq(a)
3. Pages linked to by anchor a, Pg(a)
4. Prior probability, Pr(p/a)
5. Link probability, lp(a)
6. Wikipedia Link-based Measure (δ) [3]
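The anchor measures above can be sketched as follows. This is a minimal illustration, not the team's code: the toy Anchor Dictionary (ANCHOR_LINKS), the plain-text frequency table (TEXT_FREQ) and all function names are hypothetical, and lp(a) = link(a) / freq(a) and Pr(p/a) = (links from a to p) / link(a) follow the usual TAGME definitions.

```python
from collections import Counter

# Hypothetical toy stand-in for the Anchor Dictionary built from the Wikipedia dump:
# anchor text -> Counter of the pages it links to.
ANCHOR_LINKS = {
    "jaguar": Counter({"Jaguar": 30, "Jaguar_Cars": 60, "Jacksonville_Jaguars": 10}),
}
# Assumed precomputed: how often each anchor text occurs anywhere, linked or not.
TEXT_FREQ = {"jaguar": 400}


def link(a):
    """link(a): number of times a occurs as the anchor of a link."""
    return sum(ANCHOR_LINKS.get(a, Counter()).values())


def freq(a):
    """freq(a): total number of occurrences of a, linked or not."""
    return TEXT_FREQ.get(a, 0)


def pg(a):
    """Pg(a): the set of pages that anchor a links to."""
    return set(ANCHOR_LINKS.get(a, Counter()))


def prior(p, a):
    """Pr(p/a): fraction of the links with anchor a that point to page p."""
    total = link(a)
    return ANCHOR_LINKS[a][p] / total if total else 0.0


def lp(a):
    """lp(a) = link(a) / freq(a): how often a is linked when it occurs."""
    f = freq(a)
    return link(a) / f if f else 0.0


print(link("jaguar"), lp("jaguar"), prior("Jaguar_Cars", "jaguar"))  # 100 0.25 0.6
```

Keeping link counts per (anchor, page) pair lets link(a), Pg(a) and Pr(p/a) all be answered from a single dictionary entry.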

5 Optimization - Mention detection
A mention is any word or group of words that can potentially identify an entity. Checking every word (and word group) for presence in the DB increases the number of DB look-ups. We therefore reduce the number of mention candidates with mention filtering methods:
1. Stopword filtering
2. Twitter POS filtering
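To see why unfiltered mention detection is expensive, the sketch below enumerates every contiguous word group of a query as a candidate mention. A query of n words yields n(n+1)/2 candidates, each of which would cost one DB look-up. The function name and the optional max_len cap are hypothetical, introduced only for illustration.

```python
def candidate_mentions(query, max_len=None):
    """Enumerate every contiguous word group (n-gram) of the query as a candidate mention.

    max_len is a hypothetical cap on mention length; None means no cap.
    """
    words = query.split()
    n = len(words)
    limit = n if max_len is None else max_len
    spans = []
    for i in range(n):
        for j in range(i + 1, min(n, i + limit) + 1):
            spans.append(" ".join(words[i:j]))
    return spans


cands = candidate_mentions("new york yankees tickets")
print(len(cands))  # 10 = 4 * 5 / 2 look-ups without filtering
```

The filtering methods then shrink this list before any look-up is issued.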

6 Optimization - Mention detection
Mention filtering methods:
1. Stopword filtering: if a mention identified in the query text contains only stopwords, we ignore it. We use the standard JMLR stopword list.
2. Twitter POS filtering: the query text is Part-Of-Speech (POS) tagged with a tweet POS tagger [12]. Mentions that do not contain at least one word with POS tag NN (noun) are ignored.
RUNS: Run7 (stopword filtering) and Run5 (TPOS filtering). Stopword filtering gave better results (F1 = 0.53) than TPOS filtering (F1 = 0.48).
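Both filters can be sketched as simple predicates over candidate mentions. This is an illustrative sketch under stated assumptions: STOPWORDS is a tiny hypothetical subset (not the JMLR list), the word-to-tag map stands in for the output of a tweet POS tagger run beforehand, and counting every tag that starts with "NN" as a noun is an assumption.

```python
# Tiny illustrative subset; the actual runs use the full JMLR stopword list.
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "the", "to"}


def stopword_filter(mentions):
    """Drop mentions consisting only of stopwords."""
    return [m for m in mentions
            if any(w.lower() not in STOPWORDS for w in m.split())]


def pos_filter(mentions, tags):
    """Keep mentions with at least one noun-tagged word.

    `tags` maps each word to a POS tag and is assumed to come from a tweet
    POS tagger run beforehand; treating every tag that starts with "NN"
    (NN, NNS, NNP, NNPS) as a noun is an assumption of this sketch.
    """
    return [m for m in mentions
            if any(tags.get(w, "").startswith("NN") for w in m.split())]


mentions = ["the", "of the", "the beatles", "beatles", "abbey road"]
print(stopword_filter(mentions))  # ['the beatles', 'beatles', 'abbey road']
tags = {"the": "DT", "of": "IN", "beatles": "NNPS", "abbey": "NNP", "road": "NN"}
print(pos_filter(mentions, tags))  # ['the beatles', 'beatles', 'abbey road']
```

The stopword filter needs no external model, while the POS filter depends on tagger quality for short queries, which may be one reason it can discard useful mentions.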

7 Optimization - Disambiguation
Identify all senses of the mention and choose the right one.
1. Relatedness: for identical pages, relatedness should be 1, but δ is 0. So we measured relatedness between pages as 1-δ.
2. Prominent senses restriction
3. Disambiguation score for a mention a from candidate sense pa
RUNS: Run3 achieved an F1 of 0.483.
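The disambiguation steps can be illustrated with a TAGME-style voting sketch. Relatedness is computed as 1-δ from the Milne-Witten measure over in-link sets; each other anchor b votes for a candidate sense pa with its Pr(p/b)-weighted relatedness, optionally normalized by the number of b's senses (compare "Normalized vote" in the results). The threshold tau, the clamp at 0, the tie-breaking rule, the toy in-link data and all names are hypothetical; this is not the exact disambiguation score used in the runs.

```python
import math


def delta(in_a, in_b, num_pages):
    """Milne-Witten measure [3] from the in-link sets of two pages.

    It behaves as a distance: 0 for identical pages. num_pages is the total
    number of Wikipedia pages, assumed much larger than any in-link set.
    """
    common = len(in_a & in_b)
    if common == 0:
        return 1.0  # assumption: no shared inlinks means unrelated
    big = max(len(in_a), len(in_b))
    small = min(len(in_a), len(in_b))
    return (math.log(big) - math.log(common)) / (math.log(num_pages) - math.log(small))


def rel(in_a, in_b, num_pages):
    """Relatedness as 1 - delta, clamped at 0 (the clamp is an assumption)."""
    return max(0.0, 1.0 - delta(in_a, in_b, num_pages))


def prominent(senses, tau=0.02):
    """Prominent senses restriction: drop senses with Pr(p/a) < tau (tau is hypothetical)."""
    return {p: pr for p, pr in senses.items() if pr >= tau}


def vote(b_senses, pa, inlinks, num_pages, normalized=True):
    """TAGME-style vote of anchor b for sense pa: Pr-weighted relatedness."""
    if not b_senses:
        return 0.0
    total = sum(rel(inlinks[pb], inlinks[pa], num_pages) * pr
                for pb, pr in b_senses.items())
    return total / len(b_senses) if normalized else total


def disambiguate(anchors, inlinks, num_pages):
    """anchors maps anchor -> {sense: Pr(sense/anchor)}; returns anchor -> chosen sense."""
    anchors = {a: prominent(s) for a, s in anchors.items()}
    chosen = {}
    for a, senses in anchors.items():
        if not senses:
            continue
        score = {pa: sum(vote(anchors[b], pa, inlinks, num_pages)
                         for b in anchors if b != a)
                 for pa in senses}
        # Highest total vote wins; ties broken by the prior Pr(pa/a).
        chosen[a] = max(senses, key=lambda pa: (score[pa], senses[pa]))
    return chosen


# Hypothetical toy In-Link Graph Index: page -> set of ids of pages linking to it.
INLINKS = {
    "Jaguar_Cars": {1, 2, 3, 4},
    "Jaguar": {5, 6, 7},
    "Land_Rover": {1, 2, 3, 8},
    "Rover_(dog)": {9},
}
ANCHORS = {
    "jaguar": {"Jaguar_Cars": 0.6, "Jaguar": 0.4},
    "rover": {"Land_Rover": 0.7, "Rover_(dog)": 0.3},
}
print(disambiguate(ANCHORS, INLINKS, num_pages=1000))
# {'jaguar': 'Jaguar_Cars', 'rover': 'Land_Rover'}
```

Using 1-δ makes identical pages score 1 and pages with no shared inlinks score 0, so the vote rewards senses that are linked from the same pages as the senses of neighbouring anchors.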

8 Optimization - Pruning
Identify and discard senses that are not semantically coherent.
Coherence is defined as the average relatedness between the given sense pa and the senses assigned to all other anchors. The pruning score combines coherence and link probability.
RUNS: Run6
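Pruning can be sketched as follows, assuming (as in TAGME) that the pruning score is the plain average of lp(a) and coherence. The threshold value, the toy relatedness table and the function names are hypothetical; any relatedness function, such as the 1-δ measure, can be passed in as rel_fn.

```python
def coherence(pa, other_senses, rel_fn):
    """Average relatedness between pa and the senses assigned to all other anchors."""
    if not other_senses:
        return 0.0
    return sum(rel_fn(pa, pb) for pb in other_senses) / len(other_senses)


def prune(assignments, lp, rel_fn, threshold=0.2):
    """Keep anchor -> sense pairs whose pruning score reaches threshold.

    The score is assumed to be the plain average of lp(a) and coherence,
    and threshold is a hypothetical value.
    """
    kept = {}
    for a, pa in assignments.items():
        others = [pb for b, pb in assignments.items() if b != a]
        rho = (lp[a] + coherence(pa, others, rel_fn)) / 2
        if rho >= threshold:
            kept[a] = pa
    return kept


# Hypothetical toy relatedness values and link probabilities.
REL = {frozenset({"Jaguar_Cars", "Land_Rover"}): 0.9}


def toy_rel(x, y):
    return 1.0 if x == y else REL.get(frozenset({x, y}), 0.0)


assignments = {"jaguar": "Jaguar_Cars", "rover": "Land_Rover", "the who": "The_Who"}
lp = {"jaguar": 0.25, "rover": 0.1, "the who": 0.05}
print(prune(assignments, lp, toy_rel))  # {'jaguar': 'Jaguar_Cars', 'rover': 'Land_Rover'}
```

Here "the who" is dropped because its sense is unrelated to the other assigned senses and its link probability is low.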

9 Results
Run | Description | F1
1 | Base System* | 0.53**
2 | Disambiguation score uses Pr(p/a) instead of lp(a) | 0.50**
3 | Threshold Combination + Stopword Filtering + Prominent senses restriction | 0.483
4 | Linear Combination + Non-normalized vote + single-row anchor index + Singleton Object | 0.472
5 | TPOS Filtering | 0.483
6 | Pruning score uses lp(a) instead of Pr(p/a) | 0.44
7 | Stopword Filtering | 0.53
* Base System: Linear Combination + TPOS Filtering + Normalized vote + Multi-row anchor index
** Evaluated on the 100-query set

10 Please visit our poster
Source code and datasets: https://github.com/priyaradhakrishnan0/Entity-Recognition-and-Disambiguation-Challenge

11 References
[1] D. Carmel, M.-W. Chang, E. Gabrilovich, B.-J. P. Hsu, K. Wang. ERD 2014: Entity Recognition and Disambiguation Challenge. SIGIR Forum, 2014.
[2] P. Ferragina, U. Scaiella. TAGME: On-the-fly Annotation of Short Text Fragments. CIKM 2010.
[3] D. Milne, I. H. Witten. An Effective, Low-Cost Measure of Semantic Relatedness Obtained from Wikipedia Links. In Proc. of the AAAI Workshop on Wikipedia and Artificial Intelligence: An Evolving Synergy, 2008.

