SPARK: Top-k Keyword Query in Relational Databases Yi Luo, Xuemin Lin, Wei Wang, Xiaofang Zhou Univ. of New South Wales, Univ. of Queensland SIGMOD 2007.

Presentation transcript:

SPARK: Top-k Keyword Query in Relational Databases
Yi Luo, Xuemin Lin, Wei Wang, Xiaofang Zhou (Univ. of New South Wales, Univ. of Queensland), SIGMOD 2007
Summarized and presented by Jaehui Park, IDS Lab., Seoul National University

Copyright © 2009 by CEBT

Introduction
• Demand for RDBs to support effective and efficient IR-style keyword queries
  - Features
    - Assembling data collectively
    - Supporting casual users
    - Revealing unexpected relationships among entities
    - More flexible search for back-end databases than pre-built template querying
• Issues
  - Search results contradictory to human perception (in previous work)
  - Technical challenges
    - Aggregating the final score of an answer
    - Previous work relies on the monotonicity of the rank aggregation function
• Contributions
  - New ranking function
    - Non-monotonic nature of ranking methods
  - Techniques for avoiding unnecessary DB accesses
    - Skyline sweeping algorithm
    - Block pipeline algorithm

Preliminaries
• Keyword queries on a set of relations
• Joined Tuple Tree (JTT)
  - Tree of tuples
    - Top-k results
  - Foreign key to primary key relationships
  - Candidate Network (CN)
  - Relevance score
    - How relevant the JTT is to the query
• Example query: "maxtor netvista"
  [Figure: JTTs for the example query, labeled "Top-3 JTTs": c3, c3->p2, c1->p1, c2->p2, c2->p2<-c3]
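
As a rough illustration of the JTT notion above, the following Python sketch stores a JTT as a set of tuples plus foreign-key edges and checks which query keywords it matches. The class name `JoinedTupleTree`, the tuple texts, and the edge between `c2` and `p2` are hypothetical and chosen only to echo the "maxtor netvista" example; none of them come from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class JoinedTupleTree:
    """Toy JTT: tuple id -> text, plus (referencing id, referenced id) edges."""
    tuples: dict
    edges: list = field(default_factory=list)

    def is_tree(self):
        # A tree over n nodes has exactly n - 1 edges and is connected.
        if len(self.edges) != len(self.tuples) - 1:
            return False
        adjacent = {tid: set() for tid in self.tuples}
        for a, b in self.edges:
            adjacent[a].add(b)
            adjacent[b].add(a)
        start = next(iter(self.tuples))
        seen, stack = {start}, [start]
        while stack:
            for neighbor in adjacent[stack.pop()]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        return len(seen) == len(self.tuples)

    def matched(self, query):
        # Query keywords found in at least one tuple of the tree.
        words = set()
        for text in self.tuples.values():
            words.update(text.lower().split())
        return [k for k in query if k in words]


# Hypothetical data: a complaint tuple c2 referencing a product tuple p2.
query = ["maxtor", "netvista"]
joined = JoinedTupleTree(
    {"c2": "maxtor drive is noisy", "p2": "ibm netvista desktop"},
    [("c2", "p2")],
)
single = JoinedTupleTree({"p2": "ibm netvista desktop"})
print(joined.is_tree(), joined.matched(query))
print(single.is_tree(), single.matched(query))
```

The joined tree covers both keywords while the single tuple covers only one, which is the kind of difference a relevance score is expected to reflect.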

Preliminaries: existing solutions (DISCOVER 2002, 2003)
• Enumerating (union) all possible CNs
  - C^Q -> P^Q: valid
  - C^Q -> U: not valid
  - C^Q -> U <- C^Q: may be valid
• Example (cont.)
  - Rules
    - Prune duplicate CNs
    - Prune non-minimal CNs
    - Prune CNs of type: R^Q R^Q
  [Figure label: DISCOVER (2003)]

Preliminaries: existing solutions (DISCOVER 2002, 2003)
• Upper bounding functions
  - Bound the scores of potential answers from each CN
    - Stop query execution earlier
    - Examples: Sparse algorithm, Global pipeline algorithm
• Focus of this paper
  - How to score a JTT: ranking function
  - How to generate and order the SQL queries for the CNs: top-k join query
    - Minimal DB accesses are required before the top-k results are returned.

  id   score        id   score
  t1   50           I1   70
  t2   40           I2   60
  t3   30           I3   40
  t4   20           I4   20
  (the two score lists are aggregated)
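
To make the early-stopping idea concrete, here is a minimal Python sketch that uses the two score lists from this slide with SUM as the monotonic aggregation function. It is not the Sparse or Global pipeline algorithm; the function name `topk_sum` and the set of joinable pairs are assumptions made up for illustration.

```python
def topk_sum(left, right, joins, k):
    """left/right: (id, score) lists sorted by descending score.
    joins: set of (left id, right id) pairs that actually join (hypothetical)."""
    if not left or not right:
        return []
    results = []
    best_right = right[0][1]
    for left_id, left_score in left:
        # No answer using this or any later left tuple can exceed this bound.
        bound = left_score + best_right
        if len(results) >= k and results[k - 1][0] >= bound:
            break  # the current top-k can no longer change
        for right_id, right_score in right:
            if (left_id, right_id) in joins:
                results.append((left_score + right_score, left_id, right_id))
        results.sort(reverse=True)
    return results[:k]


left = [("t1", 50), ("t2", 40), ("t3", 30), ("t4", 20)]
right = [("I1", 70), ("I2", 60), ("I3", 40), ("I4", 20)]
# Assumed join pairs; the slide does not say which tuples join.
joins = {("t1", "I3"), ("t2", "I1"), ("t3", "I2"), ("t4", "I4")}
print(topk_sum(left, right, joins, 1))
```

With these assumed joins the scan stops at t3: the best answer found so far scores 110, while anything involving t3 or t4 is bounded by 30 + 70 = 100.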

Ranking Function
• Problems with existing ranking functions
  - Monotonic aggregation functions have been considered.
    - SUM
• Discordance with human perception
  - Side effect: overly rewarding contributions of the same keyword in different tuples in the same JTT
  [Figure: C^Q -> P^Q]

Ranking Function
• Modeling a JTT as a virtual document
• Attenuating: same keyword in different relations
• Technical issues
  - Expensive to compute
• Completeness score and size normalization score
  [Figure: tuples C(t1) and P(t1) with keywords K1 and K2]
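
The attenuation effect can be illustrated with a toy comparison between summing per-tuple scores and scoring the JTT as one virtual document. The damping function `1 + ln(1 + ln(tf))` is the familiar pivoted-normalization form and is an assumption here; the slide does not give the formula, and idf, length normalization, the completeness score, and the size normalization score are all left out.

```python
import math


def damped_tf(tf):
    """1 + ln(1 + ln(tf)) for tf >= 1, and 0 when the keyword is absent."""
    return 1 + math.log(1 + math.log(tf)) if tf > 0 else 0.0


def sum_of_tuple_scores(jtt, query):
    # SUM-style aggregation: every tuple is scored separately, then added.
    return sum(damped_tf(tokens.count(k)) for tokens in jtt for k in query)


def virtual_document_score(jtt, query):
    # Virtual document: term frequencies are pooled over the JTT before damping.
    merged = [word for tokens in jtt for word in tokens]
    return sum(damped_tf(merged.count(k)) for k in query)


# Hypothetical JTT whose two tuples both mention the same keyword.
jtt = [["netvista", "review"], ["ibm", "netvista"]]
query = ["maxtor", "netvista"]
print(sum_of_tuple_scores(jtt, query))
print(virtual_document_score(jtt, query))
```

Under SUM the repeated keyword counts twice in full, whereas pooling the term frequency first dampens the second occurrence, so the JTT that never mentions "maxtor" is rewarded less.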

Top-k Join algorithm
• None of the existing top-k query processing methods deals with a non-monotonic scoring function
  - With a monotonic scoring function, the candidates after c[i]->p[j] are bounded by max(score(c[i+1], p[1]), score(c[1], p[j+1]))
• Monotonic, upper bounding function for the actual function
  - Lemma 1. score(T,Q) can be bounded by a function uscore(T,Q) = 1/(1-s) * min(A,B)
  - The bound becomes max(uscore(c[i+1], p[1]), uscore(c[1], p[j+1]))
  [Figure: tuples C(t1) and P(t2) with keywords K1 and K2, marked X]

Top-k Join algorithm
• Skyline Sweeping Algorithm
  - Avoids unnecessary join checking -> minimal number of accesses to the database
  - Dominance relationship among candidates
    - Check candidates with a higher upper bound first
    - Priority queue: descending order of the upper bound scores
  - Technical point
    - Duplicate checking
  [Figure: axis labeled uscore]
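
The sweep described above can be sketched in Python as follows. This is a simplified two-relation version written under stated assumptions, not the paper's implementation: the upper bound of a pair is taken to be the sum of two per-tuple values, `joins` stands in for the database probe, and `real_score` is any function that never exceeds that bound. All names and numbers are hypothetical.

```python
import heapq


def skyline_sweep(left, right, joins, real_score, k):
    """left/right: (id, bound part) lists sorted in descending order.
    Returns (top-k results, number of join probes)."""
    if not left or not right:
        return [], 0
    # Max-heap on the upper bound (negated for heapq), starting at the best pair.
    heap = [(-(left[0][1] + right[0][1]), 0, 0)]
    seen = {(0, 0)}  # duplicate checking: each pair is enqueued at most once
    results, probes = [], 0
    while heap:
        neg_bound, i, j = heapq.heappop(heap)
        if len(results) >= k and results[k - 1][0] >= -neg_bound:
            break  # no remaining candidate can beat the current top-k
        left_id, right_id = left[i][0], right[j][0]
        probes += 1
        if joins(left_id, right_id):
            results.append((real_score(left_id, right_id), left_id, right_id))
            results.sort(reverse=True)
        # Only the two candidates directly dominated by (i, j) are added.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(left) and nj < len(right) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (-(left[ni][1] + right[nj][1]), ni, nj))
    return results[:k], probes


left = [("c1", 5), ("c2", 3), ("c3", 1)]
right = [("p1", 4), ("p2", 2)]
# Assumed real scores for the pairs that join; each is below its upper bound.
scores = {("c1", "p2"): 6, ("c2", "p1"): 5, ("c3", "p2"): 2}
top, probes = skyline_sweep(
    left, right,
    joins=lambda a, b: (a, b) in scores,
    real_score=lambda a, b: scores[(a, b)],
    k=1,
)
print(top, probes)
```

In this toy run the top-1 answer is confirmed after three probes out of six possible pairs, because the next candidate in the queue has an upper bound of 5, below the real score of 6 already found.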

Top-k Join algorithm
• Large gaps between the upper bound scores and the corresponding real scores
  - Harder to stop early
    - Upper bound of unprocessed candidates >> real score
• Block Pipeline Algorithm
  - Employs a local non-monotonic upper bounding function that bounds the real score of JTTs more accurately
  - Tighter upper bounding: bscore < uscore
  - Signature
    - An ordered sequence of term frequencies for all the query keywords
    - Signature of the block
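
The signature definition above lends itself to a short sketch: tuples with the same term-frequency sequence fall into the same block. The helper names and the tuple contents below are hypothetical, and how bscore is then computed per block is not shown on the slide, so it is not attempted here.

```python
from collections import defaultdict


def signature(tokens, query):
    """Ordered sequence of term frequencies, one entry per query keyword."""
    return tuple(tokens.count(keyword) for keyword in query)


def partition_into_blocks(tuples, query):
    """Group tuple ids by signature; each group is one block."""
    blocks = defaultdict(list)
    for tuple_id, tokens in tuples.items():
        blocks[signature(tokens, query)].append(tuple_id)
    return dict(blocks)


query = ["maxtor", "netvista"]
# Hypothetical tokenized tuples.
tuples = {
    "c1": ["maxtor", "drive"],
    "c2": ["maxtor", "disk"],
    "c3": ["maxtor", "netvista", "netvista"],
}
print(partition_into_blocks(tuples, query))
```

Because every tuple in a block has identical term frequencies for the query keywords, a bound computed once from the signature applies to the whole block.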

Experiments
• Datasets: IMDB, DBLP, and Mondial
• Oracle 10g, MySQL, JDK 1.5
• Implementations: Sparse, Global pipeline (GP), Skyline sweep (SS), Block pipeline (BP)
• Metrics
  - Number of top-1 answers (#Rel)
  - Reciprocal rank (R-Rank)
• Relevant answer
  - It must match all the search keywords
  - Its size must be the smallest
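
For readers unfamiliar with the two metrics, the sketch below computes them over a handful of made-up query runs. It assumes that #Rel counts the queries whose top-1 answer is relevant and that R-Rank is averaged over queries; the function names and the data are illustrative only.

```python
def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / position of the first relevant answer, or 0.0 if none is returned."""
    for position, answer_id in enumerate(ranked_ids, start=1):
        if answer_id in relevant_ids:
            return 1.0 / position
    return 0.0


def evaluate(runs):
    """runs: list of (ranked answer ids, set of relevant ids), one per query."""
    if not runs:
        return 0, 0.0
    ranks = [reciprocal_rank(ranked, relevant) for ranked, relevant in runs]
    top1_relevant = sum(1 for value in ranks if value == 1.0)
    return top1_relevant, sum(ranks) / len(ranks)


# Hypothetical results for three queries.
runs = [
    (["a", "b"], {"a"}),        # relevant answer at rank 1
    (["x", "y", "z"], {"y"}),   # relevant answer at rank 2
    (["m"], {"n"}),             # no relevant answer returned
]
print(evaluate(runs))
```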

Experiments
• Effectiveness
• Efficiency
  - Observations
    - Fastest: BP
    - SS outperforms Sparse and GP
    - Sparse and GP perform similarly (GP outperforms Sparse for small k or easy queries)
    - All algorithms are more responsive for smaller k values


Conclusion
• New ranking method
  - Adapts the state-of-the-art IR ranking function and principles
• Query processing method
  - Tailored for the non-monotonic ranking function
• Extensive experiments on large-scale real databases
  - High precision with high efficiency

Reviews
• Good
  - Detailed explanation of background and existing approaches
  - Good paper organization and good examples
• Short on rationale for the new algorithms
• Non-monotonicity of the Block pipeline algorithm