EASE: An Effective 3-in-1 Keyword Search Method for Unstructured, Semi-structured and Structured Data Guoliang Li, Beng Chin Ooi, Jianhua Feng, Jianyong.

EASE: An Effective 3-in-1 Keyword Search Method for Unstructured, Semi-structured and Structured Data
Guoliang Li, Beng Chin Ooi, Jianhua Feng, Jianyong Wang, Lizhu Zhou
Tsinghua University
SIGMOD
Summarized and presented by Jaehui Park, IDS Lab., Seoul National University

Copyright  2008 by CEBT INTRODUCTION  Keyword search capability into text documents, XML documents, and relational databases  Graph index Instead of traditional inverted index – Effective for unstructured data – Inadequate for complex structural information.  EASE (Efficient and Adaptive keyword Search method) Efficient algorithmic basis for scalable top-k-style processing of large amounts of heterogeneous data – Employing and adaptive, efficient and novel index 2

Copyright  2008 by CEBT Contributions  Model for unstructured, semi-structured and structured data as graphs  Effective graph index as opposed to the inverted index  Novel ranking mechanism for both DB and IR viewpoint  Extensive performance study 3

Copyright  2008 by CEBT Motivation  Unstructured Link awareness – Relevant data may be separated into different pages but linked through hyperlinks  (Semi-) Structured LCA (Lowest common ancestors) – Connected tree with minimal cost Ex) Steiner trees 4

Copyright  2008 by CEBT r-Radius Steiner Graph Problem  Meaningful Steiner graphs with acceptable sizes  Several concepts Centric distance Radius r-Radius Steiner tree – Radius of a Steiner graph cannot be larger than r 5

Copyright  2008 by CEBT Example  DBLP example 6

Copyright  2008 by CEBT The r-Radius Seiner Graph Problem  Given a graph and an input keyword query K, the r-Radius Seiner Graph Problem is to find all the r-radius Steiner graphs in, which contain all or a portion of the input keywords in K, ranked by relevancy with K. 7

Copyright  2008 by CEBT EASE: An adaptive search method  Inverted indices are not effective for discovering the much richer structural relationships existing in databases with complicated structured [10]. Index r-radius Steiner graphs for each combination – Very expensive  Proposed method 1. Discover r-radius graphs (indexing) 2. Extracting r-radius Steiner graphs (on the fly) – By removing non-Steiner nodes 8

Copyright  2008 by CEBT EASE: An adaptive search method  Adjacency Matrix Extracting r-radius graphs effectively 9

Copyright  2008 by CEBT EASE: An adaptive search method  Determining the subgraph that are r-radius graphs By Lemma 1. For efficient retrieval of r-radius graphs – Graph index r-radius graph that contain query keywords k  Extracting r-radius Steiner graphs By Theorem 1. 10

Copyright  2008 by CEBT EASE: An adaptive search method  Computing the Steiner nodes 11

Copyright  2008 by CEBT EASE: An adaptive search method  Maximal r-Radius Graph Avoid redundancy – Keep the maximal r-radius graphs in the graph index Overlapping graphs  Graph partitioning Avoid the incurrence of huge storage Only need to retrieve the corresponding relevant graph partitions Graph similarity – Bigger overlap -> higher similarity 12

Copyright  2008 by CEBT Summary  1. Obtain adjacency matrix M  2. Compute M r  3. Extract the maximal r-radius graphs  4. Cluster the graphs by employing the existing K-means algorithm and partition the graph  5. Construct the graph index to materialize the maximal r-radius graphs 13

Copyright  2008 by CEBT Others  Ranking Functions TF-IDF based IR-ranking Structural Compactness-based DB Ranking – Intuitively, when an r-radius Steiner graph SG is more compact, SG is more likely to be meaningful and relevant.  Indexing 14

Copyright  2008 by CEBT Experimental study  Dataset: DBLife, DBLP and IMDB  Comparison Unstructured – InfoUnit [18] Semi-structured – SLCA [28] Structured – DPBF [6] 15

Copyright  2008 by CEBT Experimental study 16

Copyright  2008 by CEBT Experimental study 17

Copyright  2008 by CEBT Conclusion  Proposed an efficient and adaptive keyword search method EASE – Keyword queries over unstructured, semi-structured and structure data  Examined the issues of indexing and ranking By taking into account both the structural compactness  Experimental results shows that EASE achieves both high search efficiency and quality for keyword search over heterogeneous data. 18