EASE: An Effective 3-in-1 Keyword Search Method for Unstructured, Semi-structured and Structured Data Guoliang Li et al.


1 EASE: An Effective 3-in-1 Keyword Search Method for Unstructured, Semi-structured and Structured Data Guoliang Li et al.

2 The Problem
- Keyword search introduces false positives, e.g. for the query “Conference 2008 Canada Data Integration”

3 The Problem
- Websites are organized through content
- Example: “Dr Pain, Math 343, Linear Algebra”

4 The Solution
- Combine linked pages for search, ordered by ranking

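5 The Solution: r-Radius Steiner Graph Problem
r-Radius Graph
- Centric distance of a node: the largest shortest-path distance from that node to any other node
- Radius of a graph: the minimal centric distance over its nodes
(Figure: example graph with nodes r, s, t, u, v)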
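The two definitions on this slide can be sketched in a few lines. This is a minimal illustration, assuming an undirected, unweighted graph stored as an adjacency dictionary, with the centric distance of a node taken as its largest shortest-path distance to any other node. The edge layout of `example` is hypothetical (the slide's figure did not survive), and the function names are not from the paper.

```python
from collections import deque

def shortest_distances(adj, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def centric_distance(adj, node):
    """Largest shortest-path distance from node to any other node."""
    dist = shortest_distances(adj, node)
    if len(dist) < len(adj):
        return float("inf")  # some node is unreachable
    return max(dist.values())

def radius(adj):
    """Minimal centric distance over all nodes of the graph."""
    return min(centric_distance(adj, n) for n in adj)

# Hypothetical path-shaped graph: r - s - t - u - v
example = {
    "r": ["s"],
    "s": ["r", "t"],
    "t": ["s", "u"],
    "u": ["t", "v"],
    "v": ["u"],
}
```

With this layout, the middle node `t` has the smallest centric distance, so it determines the radius of the graph.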

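6 The Solution: r-Radius Steiner Graph Problem
- Content node: contains a query keyword
- Steiner node: lies on a path between two content nodes
(Figure: example graph with nodes r, s, t, u, v; two nodes are labeled “Dr Pain” and “Math 343”)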
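To make the content/Steiner distinction concrete, here is a rough sketch. It makes a simplifying assumption that is not taken from the slides: a node counts as a Steiner node only if it lies on a shortest path between two content nodes, which is easier to test than "any path". The graph is assumed undirected, and the edge layout and the choice of content nodes in `example` are hypothetical.

```python
from collections import deque

def bfs(adj, source):
    """Hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def steiner_nodes(adj, content_nodes):
    """Nodes lying on a shortest path between two distinct content nodes.

    Simplification (assumption): only shortest paths are considered.
    """
    dist = {c: bfs(adj, c) for c in content_nodes}
    keep = set()
    for node in adj:
        for a in content_nodes:
            for b in content_nodes:
                if a == b or b not in dist[a]:
                    continue
                if node in dist[a] and node in dist[b]:
                    # node is on a shortest a-b path iff the detour costs nothing
                    if dist[a][node] + dist[b][node] == dist[a][b]:
                        keep.add(node)
    return keep

# Hypothetical layout: r - u - t - v, with s hanging off t.
example = {
    "r": ["u"],
    "u": ["r", "t"],
    "t": ["u", "v", "s"],
    "v": ["t"],
    "s": ["t"],
}
# Assume u contains "Dr Pain" and v contains "Math 343".
result = steiner_nodes(example, ["u", "v"])
```

In this example the dangling nodes `r` and `s` are dropped, leaving only the connection between the two keyword-bearing nodes.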

7 r-Radius Steiner Graph on search
(Example figure)

8 r-Radius Steiner Graph on search

9 The graph model for the publication database

10 Adjacency Matrix

11 Finding r-Radius Graphs
- Query: “Shanmugasundaram, Guo, XRANK”
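The adjacency matrix slide suggests how candidate graphs can be found: if self-loops are added to the adjacency matrix, entry (i, j) of its r-th power is non-zero exactly when node j is within r hops of node i. The sketch below assumes an undirected, unweighted graph and uses boolean matrix products in plain Python. The function names are illustrative, and the node set it returns for each centre node is only a candidate whose radius is at most r.

```python
def bool_mat_mult(a, b):
    """Boolean product of two square 0/1 matrices."""
    n = len(a)
    return [
        [1 if any(a[i][k] and b[k][j] for k in range(n)) else 0 for j in range(n)]
        for i in range(n)
    ]

def within_r_hops(adj_matrix, r):
    """Matrix whose (i, j) entry is 1 iff j is at most r hops from i."""
    n = len(adj_matrix)
    # Self-loops let shorter paths "wait" so that powers cover <= r hops.
    base = [[1 if i == j else adj_matrix[i][j] for j in range(n)] for i in range(n)]
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(r):
        result = bool_mat_mult(result, base)
    return result

def candidate_graphs(adj_matrix, r):
    """For each centre node i, the set of nodes within r hops of i."""
    return [{j for j, bit in enumerate(row) if bit} for row in within_r_hops(adj_matrix, r)]

# Hypothetical path graph 0 - 1 - 2 - 3 - 4
path = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
```

A query would then only need to inspect the candidate graphs that contain its keywords.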

12 Avoiding Overlapping
- Maximal r-Radius Graph: it is not contained in another r-Radius subgraph
- But wait! There is still overlap
- No problem: Graph Clustering, Graph Partitioning
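The "maximal" filter can be illustrated on plain node sets. This is a minimal sketch, assuming each candidate graph is represented only by its set of node ids; it drops every set that is strictly contained in another and does not attempt the clustering or partitioning step.

```python
def maximal_sets(node_sets):
    """Keep only the node sets that are not strictly contained in another set."""
    unique = {frozenset(s) for s in node_sets}
    kept = [set(s) for s in unique if not any(s < other for other in unique)]
    # Sort for a deterministic order.
    return sorted(kept, key=lambda s: sorted(s))

candidates = [{0, 1}, {0, 1, 2}, {1, 2, 3}, {2, 3}]
survivors = maximal_sets(candidates)
```

The two surviving sets still share nodes 1 and 2, which is the remaining overlap the slide refers to.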

13 Graph Clustering

14 Ranking
- TF-IDF-based IR ranking (tf, idf, ndl) is ok
- Better yet: structural compactness-based DB ranking (SIM)
- The more compact a result, the more relevant it is
- Ranking is inversely related to path length
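The two ranking signals can be sketched as follows. The formulas here are assumptions chosen for illustration, not the paper's exact definitions: a textbook-style TF-IDF score normalized by document length, a compactness term that decays with path length, and a simple product to combine them.

```python
import math

def ir_score(tf, df, num_docs, doc_len, avg_doc_len):
    """Textbook-style TF-IDF with length normalization (assumed form)."""
    idf = math.log((num_docs + 1) / (df + 1))
    ndl = doc_len / avg_doc_len  # normalized document length
    return tf * idf / ndl

def compactness(path_length):
    """Structural score that shrinks as the connecting path grows (assumed decay)."""
    return 1.0 / (path_length + 1) ** 2

def combined_score(ir, sim):
    """Assumed combination: multiply the IR and structural scores."""
    return ir * sim
```

Under these assumptions, two results with the same IR score are ordered by how short the path between their keyword-bearing nodes is.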

15 Indexing
- The IR score and the Sim score are combined
- An inverted index (EI-Index) is created
- The inverted index stores keyword pairs and scores
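An inverted index keyed by keyword pairs might look like the sketch below. The layout is an assumption for illustration: each posting holds a graph id and a precomputed combined score, and a query sums the scores of all keyword pairs it contains. The graph ids and scores are made-up sample data, not results from the paper.

```python
from collections import defaultdict
from itertools import combinations

def pair_key(a, b):
    """Order-independent key for a keyword pair."""
    return tuple(sorted((a, b)))

def build_index(entries):
    """entries: iterable of (graph_id, keyword_a, keyword_b, score)."""
    index = defaultdict(list)
    for graph_id, a, b, score in entries:
        index[pair_key(a, b)].append((graph_id, score))
    for postings in index.values():
        postings.sort(key=lambda p: p[1], reverse=True)  # best score first
    return dict(index)

def search(index, keywords):
    """Sum pair scores per graph and return graphs by descending total."""
    totals = defaultdict(float)
    for a, b in combinations(sorted(set(keywords)), 2):
        for graph_id, score in index.get(pair_key(a, b), []):
            totals[graph_id] += score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Hypothetical precomputed scores.
index = build_index([
    ("g1", "guo", "xrank", 0.75),
    ("g2", "guo", "xrank", 0.25),
    ("g1", "guo", "shanmugasundaram", 0.5),
])
hits = search(index, ["shanmugasundaram", "guo", "xrank"])
```

Because scores are stored per pair, a query is answered from lookups alone, without traversing the graph at search time.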

16 Experiments

17 Results

18 Results

19 Results

20 Results

21 Strengths of the Paper
- Very well written paper
- Deep research on the topic
- Mathematically grounded, with proofs
- Compared against current methods as baselines
- Good results

22 Weakness and Future Work
- It might be too complex
- Could work on ways to find Steiner graphs faster
- It doesn’t consider cases of farming sites or bogus sites

23 Questions?

