Intelligent Database Systems Lab. Presenter: BEI-YI JIANG. Authors: UNIVERSITÉ CATHOLIQUE DE LOUVAIN, BELGIUM. 2012, ASSOCIATION FOR COMPUTING MACHINERY.


1 A Study of Hybrid Similarity Measures for Semantic Relation Extraction. Intelligent Database Systems Lab. Presenter: BEI-YI JIANG. Authors: UNIVERSITÉ CATHOLIQUE DE LOUVAIN, BELGIUM. 2012, ASSOCIATION FOR COMPUTING MACHINERY.

2 Outline
– Motivation
– Objectives
– Methodology
– Experiments
– Conclusions
– Comments

3 Motivation
– The quality of the relations provided by existing extractors is still lower than the quality of manually constructed relations.
– Most studies still do not take the whole range of existing measures into account and combine different methods only sporadically.

4 Objectives
– To develop new relation extraction methods.
– The method is based on a systematic analysis of 16 baseline measures and their combinations, using 8 fusion methods and 3 techniques for selecting the combination set.

5 Methodology
Pipeline diagram: norm function, similarity scores, knn function.
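Slide 5 names only a norm function, similarity scores, and a knn function. A minimal sketch of how such a pipeline could fit together, assuming min-max normalization and a per-term k-nearest-neighbour cut. Both choices are assumptions, since the slide gives no definitions, and `normalize`, `knn_relations`, and the toy scores are hypothetical:

```python
def normalize(scores):
    """Min-max rescale a dict of pairwise scores to [0, 1] (assumed norm function)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {pair: 0.0 for pair in scores}
    return {pair: (s - lo) / (hi - lo) for pair, s in scores.items()}


def knn_relations(scores, k):
    """For each term, keep its k most similar terms as extracted relations."""
    neighbours = {}
    for (a, b), s in scores.items():
        neighbours.setdefault(a, []).append((s, b))
        neighbours.setdefault(b, []).append((s, a))
    relations = set()
    for term, candidates in neighbours.items():
        for _, other in sorted(candidates, reverse=True)[:k]:
            relations.add((term, other))
    return relations


# Toy, made-up raw scores from one hypothetical measure.
raw = {("car", "vehicle"): 8.0, ("car", "banana"): 1.0, ("vehicle", "banana"): 2.0}
norm = normalize(raw)
relations = knn_relations(norm, k=1)
```

With `k=1`, each term keeps only its single nearest neighbour, so `("car", "vehicle")` is extracted and `("car", "banana")` is not.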

6 Methodology - Single Similarity Measures
Measures Based on a Semantic Network (5)
– exploit the lengths of the shortest paths between terms in a network
– probability of terms derived from a corpus
– Wu and Palmer, Leacock and Chodorow, Resnik, Jiang and Conrath, and Lin
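As an illustration of a path-based network measure, here is a small sketch of the Wu and Palmer score, 2 * depth(lcs) / (depth(a) + depth(b)), where lcs is the deepest common ancestor. The taxonomy below is a hypothetical toy, not WordNet, and the root is assumed to have depth 1:

```python
# Toy is-a taxonomy (hypothetical): child -> parent; the root has parent None.
PARENT = {
    "entity": None,
    "vehicle": "entity",
    "car": "vehicle",
    "truck": "vehicle",
    "fruit": "entity",
    "banana": "fruit",
}


def path_to_root(term):
    """Return the chain of ancestors, starting at the term and ending at the root."""
    path = []
    while term is not None:
        path.append(term)
        term = PARENT[term]
    return path


def depth(term):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(term))


def wu_palmer(a, b):
    """Wu and Palmer similarity on the toy taxonomy."""
    ancestors_b = set(path_to_root(b))
    # The first shared node on a's path is the deepest common ancestor.
    lcs = next(t for t in path_to_root(a) if t in ancestors_b)
    return 2 * depth(lcs) / (depth(a) + depth(b))
```

On this toy tree, `wu_palmer("car", "truck")` is higher than `wu_palmer("car", "banana")` because the first pair shares the deeper ancestor `vehicle`.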

7 Methodology - Single Similarity Measures
Web-based Measures (3)
– Web search engines
– rely on the number of times the terms co-occur in the documents
– Normalized Google Distance (NGD)
– Measures of Semantic Relatedness (MSR)
– YAHOO!, BING, GOOGLE over the domain wikipedia.org
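The Normalized Google Distance can be sketched from hit counts alone. The function below follows the standard NGD formula; the counts in the usage lines are invented for illustration, and how the slides' authors turn this distance into a similarity score is not stated, so that step is left out:

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from hit counts.

    fx, fy: number of documents containing each term
    fxy:    number of documents containing both terms
    n:      total number of indexed documents
    Lower values mean the terms are more closely related.
    """
    if fxy == 0:
        return float("inf")  # the terms never co-occur
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


# Made-up hit counts, for illustration only.
always_together = ngd(1000, 1000, 1000, 10**6)
rarely_together = ngd(1000, 100, 10, 10**6)
```

Terms that always co-occur get a distance of 0, and the distance grows as co-occurrence becomes rarer relative to the individual counts.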

8 Methodology - Single Similarity Measures
Corpus-based Measures (5)
– Distributional Measures
›Bag-of-words Distributional Analysis (BDA)
›Syntactic Distributional Analysis (SDA)
– Pattern-based Measure
›PatternWiki
– Other Corpus-based Measures
›Latent Semantic Analysis (LSA)
›Normalized Google Distance (NGD)

9 Methodology - Single Similarity Measures
Definition-based Measures (3)
– WktWiki
– Gloss Vectors
– Extended Lesk

10 Methodology - Hybrid Similarity Measures
Combination Methods
– Input: a set of similarity matrices {S1, ..., SK} produced by K single measures
– Output: a combined similarity matrix Scmb
›1. Mean
›2. Mean-Nnz
›3. Mean-Zscore
›4. Median
›5. Max
›6. Rank Fusion
›7. Relation Fusion
›8. Logit

11 Methodology - Hybrid Similarity Measures
Combination Methods
– Mean. A mean of K pairwise similarity scores.
– Mean-Nnz. A mean of those pairwise similarity scores which have a non-zero value.
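The formulas for these two methods did not survive in the transcript. A minimal sketch of what the descriptions imply, with matrices as nested lists; the function names and toy matrices are hypothetical, and returning 0 when every score is zero is an assumption:

```python
def combine_mean(matrices):
    """Element-wise mean of K similarity matrices."""
    k = len(matrices)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(m[i][j] for m in matrices) / k for j in range(cols)]
            for i in range(rows)]


def combine_mean_nnz(matrices):
    """Element-wise mean over the non-zero scores only."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    combined = []
    for i in range(rows):
        row = []
        for j in range(cols):
            nonzero = [m[i][j] for m in matrices if m[i][j] != 0]
            # Assumption: a pair that no measure scores gets 0.
            row.append(sum(nonzero) / len(nonzero) if nonzero else 0.0)
        combined.append(row)
    return combined


# Toy matrices from three hypothetical measures; the second has no score for the pair.
s1 = [[1.0, 0.6], [0.6, 1.0]]
s2 = [[1.0, 0.0], [0.0, 1.0]]
s3 = [[1.0, 0.3], [0.3, 1.0]]
mean_cmb = combine_mean([s1, s2, s3])
nnz_cmb = combine_mean_nnz([s1, s2, s3])
```

Mean-Nnz avoids penalizing a pair just because one measure has no coverage for it: the off-diagonal score is about 0.3 under Mean but about 0.45 under Mean-Nnz.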

12 Methodology - Hybrid Similarity Measures
Combination Methods
– Mean-Zscore. A mean of K similarity scores transformed into Z-scores.
– Median. A median of K pairwise similarities.
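A sketch of these two methods under stated assumptions: each measure's Z-scores are computed from the mean and population standard deviation of all entries in that measure's own matrix (the transcript does not say which statistics are used), and all names and numbers are hypothetical:

```python
import statistics


def zscores(matrix):
    """Standardize one measure's scores using its own mean and standard deviation."""
    values = [v for row in matrix for v in row]
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        return [[0.0 for _ in row] for row in matrix]
    return [[(v - mu) / sigma for v in row] for row in matrix]


def combine_mean_zscore(matrices):
    """Element-wise mean of the Z-score-transformed matrices."""
    z = [zscores(m) for m in matrices]
    k = len(z)
    rows, cols = len(z[0]), len(z[0][0])
    return [[sum(m[i][j] for m in z) / k for j in range(cols)]
            for i in range(rows)]


def combine_median(matrices):
    """Element-wise median of K similarity matrices."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[statistics.median([m[i][j] for m in matrices]) for j in range(cols)]
            for i in range(rows)]


# Two toy measures on very different scales, plus a third for the median.
a = [[0.0, 2.0], [2.0, 4.0]]
b = [[10.0, 20.0], [20.0, 30.0]]
c = [[1.0, 3.0], [3.0, 5.0]]
zscore_cmb = combine_mean_zscore([a, b])
median_cmb = combine_median([a, b, c])
```

The Z-score step puts `a` and `b` on a common scale before averaging, so the measure with the larger raw values does not dominate the result.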

13 Methodology - Hybrid Similarity Measures
Combination Methods
– Max. A maximum of K pairwise similarities.
– Rank Fusion.
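Max follows directly from its description. For Rank Fusion the transcript gives only the name, so the sketch below is one plausible reading and an assumption: each measure's scores are converted to per-term neighbour ranks (rank 1 is the nearest neighbour) and the ranks are averaged. Function names and matrices are hypothetical:

```python
def combine_max(matrices):
    """Element-wise maximum of K similarity matrices."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[max(m[i][j] for m in matrices) for j in range(cols)]
            for i in range(rows)]


def row_ranks(row):
    """Rank 1 = most similar entry in the row; ties are broken by column order."""
    order = sorted(range(len(row)), key=lambda j: -row[j])
    ranks = [0] * len(row)
    for rank, j in enumerate(order, start=1):
        ranks[j] = rank
    return ranks


def combine_rank_fusion(matrices):
    """Mean of per-row ranks (assumed reading). Lower value = closer terms."""
    ranked = [[row_ranks(row) for row in m] for m in matrices]
    k = len(ranked)
    rows, cols = len(ranked[0]), len(ranked[0][0])
    return [[sum(r[i][j] for r in ranked) / k for j in range(cols)]
            for i in range(rows)]


# Toy matrices from two hypothetical measures over three terms.
s1 = [[1.0, 0.2, 0.7], [0.2, 1.0, 0.1], [0.7, 0.1, 1.0]]
s2 = [[1.0, 0.5, 0.4], [0.5, 1.0, 0.3], [0.4, 0.3, 1.0]]
max_cmb = combine_max([s1, s2])
rank_cmb = combine_rank_fusion([s1, s2])
```

Note that under this reading the fused value behaves like a distance: the two measures disagree on the ordering of the second and third terms, so both end up with the same mean rank of 2.5.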

14 Methodology - Hybrid Similarity Measures
Combination Methods
– Relation Fusion.
– Logit.
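The transcript gives no detail for either method. For Logit, a generic logistic-regression scoring step can be sketched as follows; the weights and bias are hypothetical placeholders (in practice they would be fitted on labelled term pairs), and nothing here is taken from the slides beyond the method name:

```python
import math


def logit_combine(scores, weights, bias):
    """Map K single-measure scores for one term pair to a combined score in (0, 1)."""
    z = bias + sum(w * s for w, s in zip(weights, scores))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights for three measures; not fitted on any real data.
weights = [2.0, 1.0, -0.5]
bias = -1.0
strong_pair = logit_combine([0.9, 0.8, 0.1], weights, bias)
weak_pair = logit_combine([0.1, 0.2, 0.1], weights, bias)
```

Unlike the unweighted methods, this form lets informative measures count for more and can down-weight noisy ones.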

15 Methodology - Hybrid Similarity Measures
Combination Sets
– Expert choice of measures
– Forward stepwise procedure
– Logistic regression
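A forward stepwise procedure is usually a greedy search: start from an empty set and repeatedly add the measure that improves an evaluation score the most, stopping when nothing helps. The sketch below shows that generic idea only. The `evaluate` callback, the measure names, and the toy scoring are hypothetical, and the slides do not state the stopping rule actually used:

```python
def forward_stepwise(measures, evaluate):
    """Greedily grow a set of measures while the evaluation score improves."""
    selected, best = [], float("-inf")
    remaining = list(measures)
    while remaining:
        # Score every one-measure extension of the current set.
        score, candidate = max(
            ((evaluate(selected + [m]), m) for m in remaining),
            key=lambda pair: pair[0],
        )
        if score <= best:
            break  # no candidate improves the current set
        selected.append(candidate)
        remaining.remove(candidate)
        best = score
    return selected, best


# Toy scoring (made up): each measure adds some quality, each extra measure costs 2.
QUALITY = {"m1": 6, "m2": 5, "m3": 1}


def toy_evaluate(subset):
    return sum(QUALITY[m] for m in subset) - 2 * len(subset)


chosen, chosen_score = forward_stepwise(["m1", "m2", "m3"], toy_evaluate)
```

In the toy run, `m1` and `m2` are selected and `m3` is rejected because adding it lowers the score.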

16 Experiments
Evaluation
– Human Judgements Datasets
›MC, RG, WordSim353
– Semantic Relations Datasets
›BLESS, SN

17 Experiments

18 Experiments

19 Conclusions
– The results show that the hybrid measures outperform the single measures on all datasets.
– A combination of 15 baseline corpus-, web-, network-, and dictionary-based measures with Logistic Regression provided the best results.

20 Comments
Advantages
– higher performance
Applications

