Keyword Proximity Search on XML Graphs
Vagelis Hristidis, Yannis Papakonstantinou, Andrey
Presenter: Feng Shao


Outline
- Introduction
- Proximity Keyword Query Semantics
- Architecture
- XML Decompositions
- Execution
- Experiments
- Conclusion

Introduction
- Keyword search is easy to use
  - No need to know the structure or a query language
- XML: a labeled graph representing semistructured, self-describing data
- Feb. 10: 5th birthday of XML

Problem: keyword proximity query
- Input: a set of keywords
- Results: trees of XML fragments (called target objects) that contain all the keywords, ranked according to their size
- Assumes the existence of a schema, which facilitates the presentation of the results and is used to optimize the performance of the system

Name[John] → person → supplier → lineitem → linepart → product → descr[set of VCR and DVD], size 6
Name[John] → person → supplier → lineitem → linepart → part → subpart → part → name[VCR], size 8

Challenges
- Presentation of result graphs:
  - Semantically meaningful
  - Avoid a huge number of trivial results
- Providing fast response time:
  - Efficient storage of data
  - On-demand execution, guided by the user's navigation

Semantics
- XML graph: a labeled graph
  - Node v: id(v), label λ(v), value val(v)
  - Edge: containment and reference edges
- Schema graph: a directed graph
  - Node v_s: label λ(v_s), content type type(v_s) (all or choice)
  - Edge e_s: containment or reference, annotated with a maximum occurrence occ(e_s)
- An XML graph conforms to a schema graph

[Figure: schema graph and XML graph]

Query semantics
- Result: the set of all possible Minimal Total Target Object Networks (MTTONs)
- What is an MTTON?
  - Node network j: an uncycled subgraph of the XML graph G, such that each edge in j is an edge in G
  - Total node network j of keywords {k1, …, km}: a node network in which every keyword is contained in at least one node n of j
  - Minimal Total Node Network (MTNN): a total node network j from which no node can be removed while j remains a total node network. Score: number of edges
  - Target object of node n: a segment of the XML graph, large enough to be meaningful and to semantically identify node n, and as small as possible
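
The node-network definitions can be checked on a toy example. This is a minimal sketch, not the system's implementation: it assumes a node network is connected as well as uncycled (i.e. a tree when edge direction is ignored), and the node ids, the `keywords_of` map, and the query are hypothetical.

```python
def is_tree(nodes, edges):
    """True if (nodes, edges) is connected and acyclic, ignoring edge direction."""
    nodes = set(nodes)
    if not nodes or len(edges) != len(nodes) - 1:
        return False
    adj = {n: set() for n in nodes}
    for u, v in edges:
        if u not in adj or v not in adj:
            return False
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(adj[n] - seen)
    return seen == nodes

def is_total(nodes, keywords_of, query):
    """True if every query keyword is contained in at least one node."""
    covered = set()
    for n in nodes:
        covered |= keywords_of.get(n, set())
    return set(query) <= covered

def is_mtnn(nodes, edges, keywords_of, query):
    """Total node network from which no single node can be removed."""
    if not (is_tree(nodes, edges) and is_total(nodes, keywords_of, query)):
        return False
    for n in nodes:
        rest = [m for m in nodes if m != n]
        rest_edges = [(u, v) for u, v in edges if n not in (u, v)]
        if is_tree(rest, rest_edges) and is_total(rest, keywords_of, query):
            return False  # n was removable, so the network is not minimal
    return True

def score(edges):
    """Score of a node network: its number of edges."""
    return len(edges)

# Hypothetical data: name -> person -> nation, with keywords on the two ends.
nodes = ["name", "person", "nation"]
edges = [("name", "person"), ("person", "nation")]
keywords_of = {"name": {"john"}, "nation": {"us"}}
query = ["john", "us"]

# Adding a keyword-free leaf keeps the network total but breaks minimality.
bigger_nodes = nodes + ["address"]
bigger_edges = edges + [("person", "address")]

print(is_mtnn(nodes, edges, keywords_of, query), score(edges))
print(is_mtnn(bigger_nodes, bigger_edges, keywords_of, query))
```

Removing an inner node disconnects the network, so in practice only the leaves decide minimality; the loop checks every node anyway to stay close to the wording of the definition.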

MTTON (cont.)
- Given an MTNN j with nodes v1, ..., vn, there is a corresponding MTTON t, which is a tree whose nodes are a minimal set of target objects {t1, ..., tm} such that for every node nk ∈ j there is a tl ∈ t with target(nk) = tl.
- There is an edge from a target object ti to a target object tj if there is an edge (or a path) from a node that belongs to ti to a node that belongs to tj.
- The score of an MTTON t is the score of its corresponding MTNN.
- Figure labels: MTNN: name; MTNN: name → person → nation
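
The correspondence between an MTNN and its MTTON can be sketched as collapsing each node into its target object. This is an illustrative sketch under a simplifying assumption: every MTNN node is mapped to some target object, so only direct edges need to be considered (the "or a path" case is not handled). The node ids and the `target_of` mapping are hypothetical.

```python
def mtton_from_mtnn(nodes, edges, target_of):
    """Collapse MTNN nodes into target objects.

    nodes:     MTNN node ids
    edges:     MTNN edges as (u, v) pairs
    target_of: hypothetical map from node id to target object id
    Returns (target_objects, target_edges, score).
    """
    target_objects = {target_of[n] for n in nodes}
    target_edges = set()
    for u, v in edges:
        tu, tv = target_of[u], target_of[v]
        if tu != tv:  # edges inside one target object disappear
            target_edges.add((tu, tv))
    # The MTTON inherits the score of its MTNN: the number of MTNN edges.
    return target_objects, target_edges, len(edges)

# Hypothetical MTNN: name1 -> person -> lineitem -> part -> name2
nodes = ["name1", "person", "lineitem", "part", "name2"]
edges = list(zip(nodes, nodes[1:]))
target_of = {
    "name1": "person#1", "person": "person#1",
    "lineitem": "lineitem#7",
    "part": "part#3", "name2": "part#3",
}

objects, object_edges, mtton_score = mtton_from_mtnn(nodes, edges, target_of)
print(sorted(objects), sorted(object_edges), mtton_score)
```

Here five MTNN nodes collapse into three target objects connected by two edges, while the score stays at four, the edge count of the underlying MTNN.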

MTNN & MTTON
Name[John] → person → supplier → lineitem → linepart → part → subpart → part → name[VCR]

Target object
- Defined by an administrator using the Target Schema Segment (TSS) graph
- TSS graph G_TSS: derived from a partial mapping of schema-graph nodes to TSS nodes
  - A node t_S is created in G_TSS for each set S = {s1, ..., sw} of schema nodes that are mapped to t_S.
  - An edge (t_S, t_S') is created in G_TSS if the schema graph has nodes s ∈ S and s' ∈ S' that are connected directly through an edge (s, s') or indirectly through a path of dummy schema nodes.
- Target decomposition: given the TSS graph, decompose the XML graph into target objects that are connected to each other
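
The TSS graph construction can be sketched as a reachability search that passes only through dummy (unmapped) schema nodes. This is a minimal sketch: the schema, the node names, and the mapping `tss_of` are hypothetical, and the sketch makes the simplifying assumption that connections between two nodes of the same TSS do not produce an edge.

```python
def build_tss_graph(schema_edges, tss_of):
    """Derive TSS nodes and edges from a schema graph.

    schema_edges: iterable of directed schema edges (s, s2)
    tss_of:       partial map schema node -> TSS name; unmapped nodes are dummies
    """
    succ = {}
    for s, s2 in schema_edges:
        succ.setdefault(s, set()).add(s2)

    tss_nodes = set(tss_of.values())
    tss_edges = set()
    for s, t in tss_of.items():
        # Walk forward from s, continuing only through dummy schema nodes.
        stack, seen = list(succ.get(s, ())), set()
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            if n in tss_of:
                # Assumption of this sketch: skip pairs inside the same TSS.
                if tss_of[n] != t:
                    tss_edges.add((t, tss_of[n]))
            else:
                stack.extend(succ.get(n, ()))
    return tss_nodes, tss_edges

# Hypothetical schema fragment; "supplier" and "linepart" are left unmapped.
schema_edges = [
    ("person", "name"), ("person", "supplier"), ("supplier", "lineitem"),
    ("lineitem", "linepart"), ("linepart", "part"), ("part", "pname"),
]
tss_of = {
    "person": "Person", "name": "Person",
    "lineitem": "Lineitem",
    "part": "Part", "pname": "Part",
}

tss_nodes, tss_edges = build_tss_graph(schema_edges, tss_of)
print(sorted(tss_nodes), sorted(tss_edges))
```

In this toy schema the dummy nodes vanish from the result and only the edges Person → Lineitem and Lineitem → Part remain.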

Example

Presentation Graph
- Naïve method: multiple threads evaluate various plans for producing MTTONs and output the results as they arrive
  - Pro: fast response time
  - Con: many trivial results
- Interactive interface: allows navigation and hides the trivial results

Presentation Graph

Architecture

Load Stage
- Keyword:
- The number of nodes of each type, etc.
- Given an object id, instantly return the whole target object
- A decomposition of the TSS graph into fragments, which correspond to connection relations that allow efficient retrieval of MTTONs

Example of decomposition

Query processing
Keywords: TV, VCR

Execution Plan
[Figure: diagram with the labels schema graph, TSS graph, connection relations, connection relations schema, candidate network, candidate TSS network, and execution plan]

XML Decomposition
- Decompose the TSS graph into fragments
- Determines how the connections are stored in the database
- Can dramatically change performance
- Example:

Decomposition Tradeoff
- Number of fragments vs. performance
- Minimal decomposition
  - A fragment is built for each edge of the TSS graph
  - A candidate TSS network C of size S requires S-1 joins
- Maximal decomposition
  - A fragment F is built for every possible candidate TSS network C
  - C requires zero joins
  - Not feasible in practice
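
The join counts at the two extremes are simple arithmetic, sketched below. The first two functions restate the slide; `joins_lower_bound` is an added illustration, not something stated in the source: it assumes size is counted in TSS edges and that each fragment covers at most L of them, which gives only a lower bound for decompositions in between.

```python
import math

def joins_minimal(size: int) -> int:
    """Minimal decomposition: one fragment per TSS edge, so S fragments, S - 1 joins."""
    return max(size - 1, 0)

def joins_maximal(size: int) -> int:
    """Maximal decomposition: one fragment per candidate TSS network, so no joins."""
    return 0

def joins_lower_bound(size: int, max_fragment_size: int) -> int:
    """Assumed bound: covering S edges takes at least ceil(S / L) fragments."""
    return max(math.ceil(size / max_fragment_size) - 1, 0)

# Candidate TSS network of size 6, fragments of size at most 2.
print(joins_minimal(6), joins_lower_bound(6, 2), joins_maximal(6))
```

For a network of size 6, the minimal decomposition needs 5 joins, fragments of size up to 2 need at least 2, and the maximal decomposition needs none.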

Tradeoff (cont.)
- Clustering and indexing are critical
  - Maximal decomposition: multi-attribute indices
  - Non-maximal decomposition: a connection relation R is clustered on the direction in which R is used
  - Example
- Classify the TSS graph based on the storage redundancy in the corresponding connection relations
  - 4NF, inlined (non-MVD, non-4NF)
- Decomposition algorithm
  - See paper

Execution
- Goal: fast response time
- Web search engine-like presentation
  - Use inlined decomposition
  - Use a thread pool
  - Use nested-loop joins
  - Example: outermost loop over TSS part VCR, name
  - Optimization: store partial results
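
The idea of nested-loop joins that store partial results can be illustrated with plain memoization. This is a generic sketch, not XKeyword's actual execution algorithm: connection relations are modeled as hypothetical in-memory dictionaries, and the object ids are made up.

```python
def expand(level, obj, relations, is_match, cache, stats):
    """Return all path completions that start at obj on the given level."""
    key = (level, obj)
    if cache is not None and key in cache:
        return cache[key]  # reuse a stored partial result
    stats["evaluations"] += 1
    if level == len(relations):
        result = [(obj,)] if is_match(obj) else []
    else:
        # Inner loop: follow the connection relation for this level.
        result = [
            (obj,) + rest
            for nxt in relations[level].get(obj, [])
            for rest in expand(level + 1, nxt, relations, is_match, cache, stats)
        ]
    if cache is not None:
        cache[key] = result
    return result

def evaluate(outer, relations, is_match, use_cache):
    """Outermost loop over the objects that match the first keyword."""
    cache = {} if use_cache else None
    stats = {"evaluations": 0}
    results = [r for obj in outer
               for r in expand(0, obj, relations, is_match, cache, stats)]
    return results, stats["evaluations"]

# Hypothetical data for a network Author -> Paper -> Author.
authors_k1 = ["a1", "a1b"]
relations = [
    {"a1": ["p1", "p2"], "a1b": ["p1", "p2"]},  # author -> papers
    {"p1": ["a2"], "p2": ["a2", "a3"]},         # paper -> authors
]
is_k2 = lambda author: author == "a2"

cached_results, cached_evals = evaluate(authors_k1, relations, is_k2, True)
naive_results, naive_evals = evaluate(authors_k1, relations, is_k2, False)
print(len(cached_results), cached_evals, naive_evals)
```

Both runs return the same paths; the cached run evaluates fewer inner lookups because the papers shared by the two outer authors are expanded only once.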

Execution
- Presentation graphs (on-demand)
  - Initially, the XKeyword decomposition is used to retrieve the top result of each candidate network (CN).
  - Then a combination of decompositions is used to find the minimal connection of the expanded nodes.

Experiments
- Measure various decompositions, for top-K and full results
- Evaluate the performance of the algorithm for the search engine-like presentation method and the on-demand expansion method
- Data: DBLP XML database, 2 keywords
  - Maximum size of CTSSN: M = 6
  - Maximum size of fragments: L = 2

Decompositions

Execution algorithm
Speedup = optimized algorithm / naïve, non-caching algorithm

Execution algorithm
- Keyword queries: the names of two authors, k1 and k2
- Candidate network: Author k1 → Paper → Author k2
- Time measured: average time to expand a Paper node

Conclusion
- XKeyword is built on a relational database and hence can accommodate very large graphs.
- Presents keyword proximity search semantics, extended to capture the novel result presentation method.
- Presents an architecture that allows choosing which connections are precomputed.
- Addresses the on-demand performance requirement.
- Demo: