
1 Intelligent Database Systems Lab. Advisor: Dr. Hsu. Graduate: Chien-Shing Chen. Authors: Satoshi Oyama, Takashi Kokubo, Toru Ishida. 國立雲林科技大學 National Yunlin University of Science and Technology. Domain-Specific Web Search with Keyword Spices. IEEE Transactions on Knowledge and Data Engineering, Jan. 2004.

2 Outline: Motivation; Objective; Introduction; Domain-specific web search with keyword spices; Algorithm for extracting keyword spices; Experiments; Conclusions; Opinion. N.Y.U.S.T. I.M.

3 Motivation: Naïve queries may return many irrelevant pages, and obtaining more relevant pages depends on considerable experience and skill in formulating queries. Previous domain-specific search engines collect and index relevant pages themselves, but they are constructed manually, which is costly and hard to scale.

4 Objective: Build domain-specific search engines that return pages relevant to a certain domain and filter out irrelevant web pages.

5 1-1. Introduction: Domain-specific web search engines. Example: when looking for a recipe, entering only ‘beef’ finds few recipes, while entering ‘beef pepper’ finds other recipes.

6 Example queries: ‘beef’ and ‘beef, pepper’

7 1-2. Introduction

8 1-3. Introduction

9 1-4. Introduction: Domain-specific search engines return pages relevant to certain domains and filter out irrelevant web pages. Approach: download both relevant and irrelevant pages and classify them with a decision tree.

10 2-1. Domain-specific web search with keyword spices: Domain-specific web search is formulated as a text classification problem. Sample web pages for the domain are collected from keywords supplied by the user.

11 2-1. Domain-specific web search as a text classification problem: D is the set of all web documents, and Dt ⊆ D is the set of documents relevant to a certain domain.

12 2-1. Domain-specific web search as a text classification problem: The hypothesis space is composed of all Boolean expressions over the set of all keywords in the domain. Each keyword is regarded as a Boolean variable that takes the value 1 if the keyword is contained in the document and 0 otherwise, so a Boolean expression of keywords can be regarded as a function from D to {0, 1}. Example (1 = the word occurs in the document; the last column is the domain-specific output):
doc | words     | output
d1  | 1 1 0 0 0 | 1
d2  | 0 1 0 1 1 | 0
d3  | 0 1 1 0 0 | 1
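A minimal Python sketch of the Boolean view above, assuming a document is represented by the set of words it contains and a Boolean expression is kept in disjunctive normal form (a list of conjunctions of keyword literals). The names `keyword_value` and `evaluate` and the example expression are hypothetical illustrations, not taken from the paper.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical encoding: a literal is (keyword, must_occur); a conjunction is a
# list of literals; a DNF expression is a list of conjunctions.
Literal = Tuple[str, bool]
Conjunction = List[Literal]
DNF = List[Conjunction]


def keyword_value(keyword: str, doc: FrozenSet[str]) -> int:
    """Boolean variable for a keyword: 1 if it occurs in the document, else 0."""
    return 1 if keyword in doc else 0


def evaluate(h: DNF, doc: FrozenSet[str]) -> int:
    """Evaluate the DNF expression h as a function from D to {0, 1}."""
    for conj in h:
        # A conjunction holds when every literal has its required value.
        if all(keyword_value(k, doc) == int(present) for k, present in conj):
            return 1
    return 0


# Hypothetical expression: (recipe AND NOT home) OR tablespoon
h = [[("recipe", True), ("home", False)], [("tablespoon", True)]]
print(evaluate(h, frozenset({"beef", "recipe"})))          # 1
print(evaluate(h, frozenset({"beef", "home", "recipe"})))  # 0
```

Keeping the expression in DNF matches the later simplification step, which removes literals from conjunctions and conjunctions from the disjunction.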

13 2-1. Domain-specific web search as a text classification problem: The goal is to find a hypothesis h that minimizes the error rate, i.e., the probability that h(d) differs from the true class of a document d (1 if d ∈ Dt, 0 otherwise).
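As a sketch, assuming the error rate over all of D is estimated on a hand-labeled sample of pages (label 1 for pages in Dt, 0 otherwise), the empirical error rate of a candidate hypothesis and the choice of the best candidate could look like this. The sample pages and the two candidate hypotheses are toy data invented for illustration.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

Doc = FrozenSet[str]
Hypothesis = Callable[[Doc], int]


def error_rate(h: Hypothesis, sample: Sequence[Tuple[Doc, int]]) -> float:
    """Fraction of labeled pages (doc, label) on which h(doc) != label."""
    mistakes = sum(1 for doc, label in sample if h(doc) != label)
    return mistakes / len(sample)


def best_hypothesis(candidates: List[Hypothesis],
                    sample: Sequence[Tuple[Doc, int]]) -> Hypothesis:
    """Pick the candidate with the lowest empirical error rate."""
    return min(candidates, key=lambda h: error_rate(h, sample))


# Toy labeled sample: 1 = relevant to the recipe domain, 0 = irrelevant.
sample = [
    (frozenset({"beef", "recipe", "tablespoon"}), 1),
    (frozenset({"beef", "recipe", "home"}), 0),
    (frozenset({"beef", "price"}), 0),
    (frozenset({"salmon", "tablespoon"}), 1),
]


def has_recipe(d: Doc) -> int:
    return int("recipe" in d)


def has_tablespoon(d: Doc) -> int:
    return int("tablespoon" in d)


print(error_rate(has_recipe, sample))      # 0.5
print(error_rate(has_tablespoon, sample))  # 0.0
```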

14 2-2. Collecting sample web pages by user’s input: Random sampling of web pages is difficult. Instead, assume that all candidate keywords have the same probability of occurrence in the domain. For the “recipe” domain, for example, enter “beef,” “salmon,” “potato,” etc. as sample keywords and download the same number of web pages for each keyword.
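The sampling step could be sketched as follows. `search` is a hypothetical stand-in for a general-purpose search engine (any function mapping a query to a ranked list of URLs), and `pages_per_keyword` is an assumed parameter; taking the same number of top-ranked pages per keyword is what approximates the equal-probability assumption.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical search interface: query string -> ranked list of page URLs.
SearchFn = Callable[[str], List[str]]


def collect_samples(keywords: Sequence[str], search: SearchFn,
                    pages_per_keyword: int) -> Dict[str, List[str]]:
    """Download the same number of top-ranked pages for every sample keyword,
    so that no single keyword dominates the sample."""
    samples: Dict[str, List[str]] = {}
    for kw in keywords:
        samples[kw] = search(kw)[:pages_per_keyword]
    return samples


def fake_search(query: str) -> List[str]:
    """Stub engine used only for this example."""
    return [f"http://example.com/{query}/{i}" for i in range(10)]


pages = collect_samples(["beef", "salmon", "potato"], fake_search, pages_per_keyword=3)
print({k: len(v) for k, v in pages.items()})  # {'beef': 3, 'salmon': 3, 'potato': 3}
```

The downloaded pages would then be labeled by hand as T or F before learning.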

15 2-2. Collecting sample web pages by user’s input

16 3-1. Identifying keyword spices: Sample pages are classified by hand into two classes, T (relevant) and F (irrelevant). A decision-tree learning algorithm is then used to discover keyword spices: each internal node is an attribute (a keyword), each branch indicates a value of that attribute (whether the keyword occurs), and each leaf is a class. Example path: no “tablespoon”, has “recipe”, no “home”, no “top” → class T.
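A sketch of how a learned tree becomes a Boolean expression, assuming each internal node tests whether one keyword occurs: every root-to-leaf path ending in class T becomes a conjunction, and their disjunction is the keyword spice before simplification. The tree encoding is hypothetical, and apart from the path quoted on the slide the toy tree is invented.

```python
from typing import List, Tuple, Union

# Hypothetical encoding: a leaf is "T" or "F"; an internal node is
# (keyword, subtree_if_absent, subtree_if_present).
Tree = Union[str, Tuple[str, "Tree", "Tree"]]
Literal = Tuple[str, bool]  # (keyword, True = present, False = absent)


def tree_to_dnf(tree: Tree, path: Tuple[Literal, ...] = ()) -> List[List[Literal]]:
    """Turn every root-to-leaf path that ends in class T into one conjunction."""
    if isinstance(tree, str):
        return [list(path)] if tree == "T" else []
    keyword, if_absent, if_present = tree
    return (tree_to_dnf(if_absent, path + ((keyword, False),))
            + tree_to_dnf(if_present, path + ((keyword, True),)))


def to_query(dnf: List[List[Literal]]) -> str:
    """Render the DNF as a readable Boolean query string."""
    def lit(k: str, present: bool) -> str:
        return k if present else f"NOT {k}"
    return " OR ".join("(" + " AND ".join(lit(k, p) for k, p in conj) + ")"
                       for conj in dnf)


# Toy tree containing the slide's path:
# no "tablespoon" -> has "recipe" -> no "home" -> no "top" -> T
tree = ("tablespoon",
        ("recipe", "F", ("home", ("top", "T", "F"), "F")),
        "T")
print(to_query(tree_to_dnf(tree)))
# (NOT tablespoon AND recipe AND NOT home AND NOT top) OR (tablespoon)
```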

17 3-1. Extracting keyword spices: Web pages collected with the user’s input keywords are classified by humans, giving a table of word occurrences (1 = the word occurs in the page) and the domain-specific output:
doc | words     | output
d1  | 1 1 0 0 0 | 1
d2  | 0 1 0 1 1 | 0
d3  | 0 1 1 0 0 | 1

18 3-1. Identifying keyword spices

19 3-2. Simplifying keyword spices: The induced decision trees are very large, and search engines cannot accept queries that are too complex. Large trees also suffer from overfitting.

20 3-2. Simplifying keyword spices: The induced Boolean expression h is simplified in two steps. 1. For each conjunction c in h, remove keywords (Boolean literals) from c. 2. Remove conjunctions from the disjunctive normal form of h.
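The two simplification steps might be implemented greedily, as below. The greedy order and the rule "remove an element whenever F on the validation pages does not decrease" are assumptions for illustration; the slides name only the two steps and the use of F. The validation pages are toy data.

```python
from typing import FrozenSet, List, Sequence, Tuple

Literal = Tuple[str, bool]  # (keyword, must_occur)
DNF = List[List[Literal]]
Doc = FrozenSet[str]
Labeled = Sequence[Tuple[Doc, int]]


def matches(h: DNF, doc: Doc) -> bool:
    return any(all((k in doc) == present for k, present in conj) for conj in h)


def f_measure(h: DNF, validation: Labeled) -> float:
    """Harmonic mean of precision and recall of h on labeled validation pages."""
    tp = sum(1 for d, y in validation if y == 1 and matches(h, d))
    if tp == 0:
        return 0.0
    retrieved = sum(1 for d, _ in validation if matches(h, d))
    relevant = sum(1 for _, y in validation if y == 1)
    p, r = tp / retrieved, tp / relevant
    return 2 * p * r / (p + r)


def simplify(h: DNF, validation: Labeled) -> DNF:
    """Greedily drop literals, then whole conjunctions, while F does not drop."""
    h = [list(c) for c in h]
    # Step 1: remove keywords (literals) from each conjunction.
    for i in range(len(h)):
        changed = True
        while changed and len(h[i]) > 1:
            changed = False
            for lit in list(h[i]):
                trial = h[:i] + [[x for x in h[i] if x != lit]] + h[i + 1:]
                if f_measure(trial, validation) >= f_measure(h, validation):
                    h, changed = trial, True
                    break
    # Step 2: remove whole conjunctions from the disjunction.
    changed = True
    while changed and len(h) > 1:
        changed = False
        for j in range(len(h)):
            trial = h[:j] + h[j + 1:]
            if f_measure(trial, validation) >= f_measure(h, validation):
                h, changed = trial, True
                break
    return h


validation = [
    (frozenset({"recipe", "tablespoon"}), 1),
    (frozenset({"recipe", "cup"}), 1),
    (frozenset({"recipe", "home"}), 0),
    (frozenset({"news", "beef"}), 0),
]
h = [[("recipe", True), ("home", False), ("tablespoon", True)],
     [("cup", True), ("news", False)]]
print(simplify(h, validation))  # [[('tablespoon', True)], [('cup', True)]]
```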

21 3-2. Simplifying keyword spices: Precision P and recall R are defined over the validation set: P = (relevant pages retrieved) / (pages retrieved) and R = (relevant pages retrieved) / (relevant pages). F is their harmonic mean, F = 2PR / (P + R).

22 3-2. Simplifying keyword spices: Elements with a greater contribution to F are preferred, and a weighted harmonic mean of P and R is used as a weighted form of F.
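A small sketch of these measures, assuming pages are identified by IDs. `weighted_f` uses the common weighted harmonic mean 1 / (α/P + (1 − α)/R), which reduces to F = 2PR / (P + R) at α = 0.5; the parameter name `alpha` and this exact weighting are assumptions, since the slide does not show the formula.

```python
from typing import Set, Tuple


def precision_recall(retrieved: Set[int], relevant: Set[int]) -> Tuple[float, float]:
    """P = hits / |retrieved|, R = hits / |relevant| (0.0 for empty sets)."""
    hits = len(retrieved & relevant)
    p = hits / len(retrieved) if retrieved else 0.0
    r = hits / len(relevant) if relevant else 0.0
    return p, r


def weighted_f(p: float, r: float, alpha: float = 0.5) -> float:
    """Weighted harmonic mean 1 / (alpha/P + (1 - alpha)/R).
    alpha = 0.5 gives 2PR / (P + R); a larger alpha weights precision more."""
    if p == 0 or r == 0:
        return 0.0
    return 1.0 / (alpha / p + (1 - alpha) / r)


p, r = precision_recall({1, 2, 3, 4}, {2, 3, 5})
print(p, round(r, 4))                          # 0.5 0.6667
print(round(weighted_f(p, r), 4))              # 0.5714
print(round(weighted_f(p, r, alpha=0.8), 4))   # 0.5263
```

Moving α away from 0.5 is one way to express the precision/recall trade-off discussed in the experiments.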

23 4. Experiments

24 4-1. Experiments: extracting keyword spices

25 4-1. Experiments: extracting keyword spices

26 4-1. Extracting keyword spices: sample pages in the recipe domain were split randomly.

27 4-1. Extracting keyword spices: keyword spices discovered for a recipe search engine

28 4-1. Extracting keyword spices: trade-off between precision and recall

29 4-1. Extracting keyword spices: keyword spices extracted for the domain of …

30 4-2. Evaluation using a general-purpose search engine

31 4-2. Evaluation using a general-purpose search engine: test queries in each domain

32 4-2. Evaluation using a general-purpose search engine

33 4-2. Evaluation using a general-purpose search engine: precision values of the sample queries conjoined with the keyword “recipe”. A query with only “recipe” added, for example “beef recipe”, finds fewer relevant pages than the query with the keyword spice.
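For illustration, the query sent to a general-purpose engine conjoins the user's keywords with the keyword spice. The AND/OR/NOT syntax, the function name `spiced_query`, and the spice below are hypothetical; real engines use different operators, and this is not the spice reported in the paper.

```python
from typing import List, Sequence, Tuple

Literal = Tuple[str, bool]  # (keyword, must_occur)


def spiced_query(user_keywords: Sequence[str], spice: List[List[Literal]]) -> str:
    """Conjoin the user's keywords with a keyword spice given in DNF.
    The operator syntax is a generic assumption, not a specific engine's API."""
    def lit(k: str, present: bool) -> str:
        return k if present else f"NOT {k}"
    spice_str = " OR ".join("(" + " AND ".join(lit(k, p) for k, p in conj) + ")"
                            for conj in spice)
    return " AND ".join(list(user_keywords) + [f"({spice_str})"])


# Hypothetical spice for illustration only.
spice = [[("recipe", True), ("home", False)], [("tablespoon", True)]]
print(spiced_query(["beef"], spice))
# beef AND ((recipe AND NOT home) OR (tablespoon))
```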

34 4-3. Comparison to the Filtering model

35 4-3. Comparison to the Filtering model: precision values of the sample queries in the filtering model

36 4-3. Comparison to the Filtering model: numbers of relevant pages returned by the …

37 4-3. Comparison to the Filtering model: for example, for “shrimp” the filtering model must download 5 pages to obtain one result, which is quite inefficient.
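The inefficiency is simple arithmetic: if only a fraction p of the downloaded pages passes the filter, about 1/p pages must be downloaded per result, so 5 pages per result corresponds to p = 0.2. A sketch, assuming pass/fail is independent across pages (the function name is hypothetical):

```python
import math


def downloads_per_result(pass_rate: float) -> float:
    """Expected number of pages a filtering model downloads per kept page,
    when a fraction `pass_rate` of downloaded pages passes the filter."""
    if pass_rate <= 0:
        return math.inf
    return 1.0 / pass_rate


print(downloads_per_result(0.2))  # 5.0
print(downloads_per_result(0.5))  # 2.0
```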

38 5. Future work: training examples must be classified by humans, which is costly.

39 5. Future work: 1. Using a Web directory as a source of training examples: Web directories such as Yahoo, Open Directory, …, … could supply training examples, but their bias must be estimated.

40 5. Future work: 2. Learning classifiers from partially labeled data: an algorithm has been proposed that augments a small set of labeled examples with a huge set of unlabeled data.

41 6. Conclusion: keyword spices make domain-specific web search effective at a low human cost.

42 Opinion: the method depends heavily on humans, and it assumes that all candidate keywords have the same probability of occurrence ……

43 Opinion: Pr(TL)? Pr(TL’)?

44 Opinion: Posterior Probability Rule X; the method assumes that all candidate keywords have the same probability of occurrence.

55 Keyword Spices Modified

56 Information Retrieval

57 Machine Learning (cluster, classify)

58 Content Web Mining

59 Dictionary that can represent a distance between words
