Intelligent Database Systems Lab Presenter: WU, MIN-CONG Authors: Zhiyuan Liu, Wenyi Huang, Yabin Zheng and Maosong Sun 2010, ACM Automatic Keyphrase Extraction.

1 Intelligent Database Systems Lab Presenter: WU, MIN-CONG Authors: Zhiyuan Liu, Wenyi Huang, Yabin Zheng and Maosong Sun 2010, ACM Automatic Keyphrase Extraction via Topic Decomposition

2 Outline: Motivation, Objectives, Methodology, Experiments, Conclusions, Comments

3 Motivation: Existing graph-based ranking methods for keyphrase extraction compute only a single importance score for each word, using a single random walk. This work is motivated by the fact that both documents and words can be represented as mixtures of semantic topics.

4 Objectives: Build a Topical PageRank (TPR) on the word graph to measure word importance with respect to different topics. Then calculate the ranking scores of words and extract the top-ranked ones as keyphrases.

5 Methodology - Building Topic Interpreters: LDA with hyperparameters α, β; estimation by, e.g., Gibbs sampling. LDA output:
Topic-word distributions: Pr(w|z), given by ϕ(z) ∈ ϕ
Document-topic distributions: Pr(z|d), given by θ(d) ∈ θ

6 Methodology - Topical PageRank for Keyphrase Extraction

7 Methodology - Constructing Word Graph: The document is regarded as a word sequence. Sliding window size = 3.
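The word-graph construction on this slide can be sketched in a few lines of Python. The slide only states that the document is a word sequence scanned with a window of size 3. The linking rule used here (the first word in each window points to the other words in it, and repeated co-occurrences add to the link weight) is an assumption for illustration, and `build_word_graph` is a hypothetical helper name.

```python
from collections import defaultdict


def build_word_graph(words, window=3):
    """Return {source: {target: weight}} for a word sequence.

    Assumption (not stated on the slide): at each window position the
    first word links to every other word inside the window, and each
    repeated co-occurrence adds 1 to the link weight.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for i, source in enumerate(words):
        # The window covers `source` plus the next (window - 1) words.
        for target in words[i + 1 : i + window]:
            if target != source:  # skip self-links
                graph[source][target] += 1.0
    return {w: dict(nbrs) for w, nbrs in graph.items()}


tokens = "topic model ranks topic words".split()
graph = build_word_graph(tokens, window=3)
for source, nbrs in graph.items():
    print(source, "->", nbrs)
```

A larger window links words that are further apart, which makes the graph denser.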

8 Methodology - Topical PageRank (PageRank): Define the weight of link (w_i, w_j) as e(w_i, w_j).

9 Methodology - Topical PageRank (PageRank): The out-degree of a vertex is the sum of the weights of its outgoing links. PageRank assigns equal probabilities of random jump to all vertices.
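The equation from these two slides did not survive in the transcript. A reconstruction from the definitions given here (link weight e, out-degree, equal random-jump probabilities) and the damping factor λ named later in the deck would be the standard weighted PageRank update below. The symbols O(w_j) for out-degree, R(w_i) for the score, and |V| for the number of vertices are assumed notation.

```latex
% Out-degree of a vertex: total weight of its outgoing links
O(w_j) = \sum_{k:\, w_j \to w_k} e(w_j, w_k)

% Weighted PageRank: follow a link with probability \lambda,
% otherwise jump to any vertex with equal probability 1/|V|
R(w_i) = \lambda \sum_{j:\, w_j \to w_i} \frac{e(w_j, w_i)}{O(w_j)}\, R(w_j)
         + (1 - \lambda)\, \frac{1}{|V|}
```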

10 Methodology - Topical PageRank: Preference values from LDA:
pr(w|z) = pr(w, z)/pr(z) focuses on the word.
pr(z|w) = pr(w, z)/pr(w) focuses on the topic.
pr(w|z) × pr(z|w) combines the two (Cohn and Chang, 2000).
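A minimal sketch of one topic-specific random walk follows, assuming that TPR keeps the PageRank link-following step and replaces the equal random-jump probabilities with the topic's normalized preference values. The function name `topical_pagerank`, the toy graph, the toy preference values, and the damping value 0.5 are all illustrative assumptions, not values from the slides.

```python
def topical_pagerank(graph, preference, damping=0.5, iterations=100):
    """Run one topic-biased random walk on a weighted, directed word graph.

    graph:      {source: {target: weight}}
    preference: {word: non-negative preference value for ONE topic};
                assumed to have a positive total over the graph's words.
    Note: vertices without outgoing links leak their score in this sketch.
    """
    vertices = set(graph) | {t for nbrs in graph.values() for t in nbrs}
    total = sum(preference.get(w, 0.0) for w in vertices)
    # Normalize preference values so the random-jump probabilities sum to 1.
    jump = {w: preference.get(w, 0.0) / total for w in vertices}
    out_weight = {w: sum(graph.get(w, {}).values()) for w in vertices}
    scores = {w: 1.0 / len(vertices) for w in vertices}
    for _ in range(iterations):
        # Random-jump part: biased toward words preferred by the topic.
        new_scores = {w: (1.0 - damping) * jump[w] for w in vertices}
        # Link-following part: spread each score along outgoing links.
        for source, nbrs in graph.items():
            for target, weight in nbrs.items():
                share = weight / out_weight[source]
                new_scores[target] += damping * scores[source] * share
        scores = new_scores
    return scores


# Toy graph and toy preference values for a single topic (illustrative only).
graph = {"a": {"b": 1.0}, "b": {"a": 1.0, "c": 1.0}, "c": {"b": 1.0}}
preference = {"a": 0.6, "b": 0.3, "c": 0.1}
scores = topical_pagerank(graph, preference, damping=0.5)
print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

Running the same function once per topic, each time with that topic's preference values, yields one ranking of words per topic.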

11 Methodology - Extract Keyphrases Using Ranking Scores
Step 1. Annotate the document with POS tags.
Step 2. Select noun phrases.
Step 3. Compute the ranking scores of candidate keyphrases separately for each topic.
[Figure labels: PageRank, Topical PageRank]
Step 4. Integrate the topic-specific rankings of candidate keyphrases into a final ranking.
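Steps 3 and 4 above can be sketched as follows, under two assumptions that the slide does not spell out: a candidate phrase's score for a topic is the sum of its words' scores for that topic, and the final score weights each topic-specific score by the document's topic probability Pr(z|d). The function name `rank_phrases` and all numbers are hypothetical.

```python
def rank_phrases(candidates, topic_word_scores, doc_topic_probs):
    """Rank candidate phrases by their topic-weighted scores.

    candidates:        list of phrases, each a tuple of words
    topic_word_scores: {topic: {word: score of the word for that topic}}
    doc_topic_probs:   {topic: Pr(topic | document)}
    """
    final = {}
    for phrase in candidates:
        final[phrase] = sum(
            # Step 3 (assumed): phrase score = sum of its word scores.
            # Step 4 (assumed): weight each topic by Pr(z|d) and add up.
            doc_topic_probs[z] * sum(word_scores.get(w, 0.0) for w in phrase)
            for z, word_scores in topic_word_scores.items()
        )
    return sorted(final.items(), key=lambda kv: kv[1], reverse=True)


# Toy inputs: two topics, three words, two candidate phrases.
topic_word_scores = {
    0: {"topic": 0.5, "model": 0.25, "walk": 0.25},
    1: {"topic": 0.25, "model": 0.25, "walk": 0.5},
}
doc_topic_probs = {0: 0.75, 1: 0.25}
candidates = [("topic", "model"), ("walk",)]
ranked = rank_phrases(candidates, topic_word_scores, doc_topic_probs)
print(ranked)  # ("topic", "model") scores 0.6875, ("walk",) scores 0.3125
```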

12 Experiment - Datasets
Dataset     Articles   Keyphrases
NEWS        308        2,488
RESEARCH    2,000      19,254
Topic model: build topic interpreters with LDA.
Corpus                              Web pages    Words     Topics
Wikipedia snapshot at March 2008    2,122,618    20,000    50 to 1,500

13 Experiment - Evaluation Metrics: Precision, recall, and F-measure do not take the order of extracted keyphrases into account. For the order-aware metrics, values lie between 0 and 1, and larger values are better.
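The point about ordering can be illustrated with a short sketch. Precision, recall, and F-measure are computed in the usual set-based way. The order-aware metric shown is reciprocal rank, chosen here as an assumed example of such a metric, since the slide's own formulas did not survive in the transcript. Both function names are hypothetical.

```python
def precision_recall_f1(extracted, gold):
    """Set-based scores; the order of `extracted` has no effect."""
    correct = len(set(extracted) & set(gold))
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


def reciprocal_rank(extracted, gold):
    """1 / rank of the first correct keyphrase, or 0.0 if none is correct."""
    gold_set = set(gold)
    for rank, phrase in enumerate(extracted, start=1):
        if phrase in gold_set:
            return 1.0 / rank
    return 0.0


gold = ["b", "d"]
first = ["a", "b", "c", "d"]   # first correct phrase at rank 2
second = ["b", "d", "a", "c"]  # same phrases, correct ones ranked first

print(precision_recall_f1(first, gold) == precision_recall_f1(second, gold))
print(reciprocal_rank(first, gold), reciprocal_rank(second, gold))
```

Both orderings get identical precision, recall, and F-measure, while the reciprocal rank rewards the list that places a correct keyphrase first.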

14 Experiment - Influences of Parameters on TPR: Window Size W; Number of Topics K

15 Experiment - Influences of Parameters on TPR: Damping Factor λ; Preference Values
pr(w|z) = pr(w, z)/pr(z) focuses on the word.
pr(z|w) = pr(w, z)/pr(w) focuses on the topic.
Ex. he, she

16 Experiment - Comparing with Baseline Methods: TFIDF and PageRank do not use topic information. TPR enjoys the advantages of both LDA and TFIDF/PageRank.

17 Experiment - Extracting Example

18 Conclusions: Experiments on two datasets show that TPR achieves better performance than the baseline methods.

19 Comments
Advantages: TPR incorporates topic information within the random walk for keyphrase extraction.
Applications: automatic keyphrase extraction.

