1 Word AdHoc Network: Using Google Core Distance to extract the most relevant information
Presenter: Wei-Hao Huang. Authors: Ping-I Chen, Shi-Jen Lin. KBS 2010

2 Outline
Motivation, Objectives, Methodology, Experiments, Conclusions, Comments

3 Motivation
Most previous methods need predictive models built from training data or from Web logs of users' browsing behavior. Such models are complex, and the keyword extraction methods are limited to certain domains.

4 Objectives
To present a new algorithm called "Word AdHoc Network" (WANET). The method needs no pre-processing, and all execution happens in real time.
To extract keyword sequences from documents in various knowledge domains.
Figure: Document → WANET System → Relevant Documents

5 Methodology
Word AdHoc Network: system architecture
1-gram filtering method: part-of-speech, length of the words, number of Google search results
Google Core Distance
Hop-by-Hop Routing algorithm: PageRank algorithm, BB's graph-based clustering algorithm

6 WANET System Architecture

7 1-gram filtering method
Part-of-speech: NN (common noun, singular), NP (proper noun), DT (determiner), or JJ (adjective)
Length of the words: at least 3 characters
Number of Google search results
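The first two filter criteria on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `filter_unigrams` and the `(word, tag)` input format are assumptions, word length is assumed to be counted in characters, and the third criterion (number of Google search results) is left out because the transcript gives no threshold for it.

```python
# Part-of-speech tags listed on the slide.
ALLOWED_TAGS = {"NN", "NP", "DT", "JJ"}


def filter_unigrams(tagged_words, min_length=3):
    """Keep words whose tag is allowed and whose length is at least min_length.

    tagged_words is an iterable of (word, tag) pairs produced by any
    part-of-speech tagger (hypothetical input format).
    """
    return [
        word
        for word, tag in tagged_words
        if tag in ALLOWED_TAGS and len(word) >= min_length
    ]


if __name__ == "__main__":
    sample = [("network", "NN"), ("of", "IN"), ("an", "DT"), ("Google", "NP")]
    print(filter_unigrams(sample))
```

Taking pre-tagged pairs as input keeps the sketch independent of any particular tagger.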

8 Google Core Distance
The original algorithm: NGD (Normalized Google Distance)
The new algorithm: GCD (Google Core Distance)
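The formulas on this slide did not survive in the transcript. For orientation, the sketch below computes the standard Normalized Google Distance as defined by Cilibrasi and Vitányi: the larger of log f(x) and log f(y), minus log f(x, y), divided by log N minus the smaller of log f(x) and log f(y). The function and parameter names are assumptions, and the hit counts are supplied by the caller rather than fetched from a search engine. The GCD formula cannot be recovered from the transcript, so it is not shown.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """Standard NGD from search hit counts.

    fx, fy: hit counts for each term alone
    fxy:    hit count for both terms together
    n:      total number of indexed pages (assumed larger than fx and fy)
    """
    if min(fx, fy, fxy) <= 0:
        # Terms that never occur (together) are treated as maximally distant.
        return float("inf")
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))
```

A value near 0 means the two terms almost always appear together; larger values mean they co-occur less often than their individual frequencies would suggest.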

9 Hop-by-Hop Routing Algorithm
PageRank algorithm
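The transcript does not show how WANET builds its word graph or applies PageRank to it. As a generic reference only, here is a minimal power-iteration sketch of PageRank on a directed graph; the function name, the adjacency-dict input format, and the default damping factor of 0.85 are assumptions rather than details from the slides.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    graph maps each node to the list of nodes it links to.
    Returns a dict of node -> rank; the ranks sum to 1.
    """
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not graph.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for node, targets in graph.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical word graph: an edge points from a word to a related word.
    words = {"network": ["routing"], "google": ["routing"], "routing": ["network"]}
    for word, score in sorted(pagerank(words).items(), key=lambda kv: -kv[1]):
        print(f"{word}: {score:.3f}")
```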

10 Hop-by-Hop Routing Algorithm
BB’s graph-based clustering algorithm BB score = 1 6

11 Hop-by-Hop Routing Algorithm

12 Experiments
Time variance effect of the Google search results
Execution time
Precision and recall rate
Top-k search results analysis
Dataset: four knowledge domains were selected from the Elsevier Web site, and the top 25 most-downloaded papers in each journal were chosen.

13 Time variance effect of the Google search results
Spearman's footrule is used to compare the keyword sequences extracted by the two algorithms.
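Spearman's footrule is the sum, over all items, of the absolute difference between an item's position in one ordering and its position in the other. A minimal sketch follows, assuming both sequences contain the same distinct keywords; the function name is illustrative, and the transcript does not say whether the authors normalized the score.

```python
def spearman_footrule(seq_a, seq_b):
    """Sum of absolute position differences between two orderings."""
    if (
        len(seq_a) != len(seq_b)
        or len(set(seq_a)) != len(seq_a)
        or set(seq_a) != set(seq_b)
    ):
        raise ValueError("both sequences must contain the same distinct items")
    position_b = {item: rank for rank, item in enumerate(seq_b)}
    return sum(abs(rank - position_b[item]) for rank, item in enumerate(seq_a))
```

Identical orderings score 0, and the score grows as keywords move further from their original positions.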

14 Execution time

15 Precision and recall rate
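The results figure for this slide is not part of the transcript. For reference, precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. The sketch below uses hypothetical document identifiers and does not reproduce any of the paper's data.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    print(precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"]))
```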

16 Top-k search results analysis

17 Conclusions
The authors propose a new system that extracts the most important keyword sequence to represent a document, helping users automatically find relevant documents or Web pages.
Future work: the authors hope the system can be used on a mobile device or in an e-book.

18 Comments
Advantages: extracts the most important keyword sequence.
Applications: information retrieval.

