Query Caching and View Selection for XML Databases Bhushan Mandhani Dan Suciu University of Washington Seattle, USA.

Presentation transcript:

Caching in Query Processing Buffer Cache: in-memory pool of disk pages. Semantic Cache: stores query results instead; can be in-memory or on disk; proposed in [Franklin et al., VLDB 96] for use in client-server systems.

Semantic XPath Cache A cached query result is a materialized view. View V answers query Q if there exists a query C such that C ∘ V = Q, i.e. applying C to the result of V yields the result of Q. C, V and Q are all XPath expressions here. The cache contains some views {V_1, …, V_n}. Query Q is a hit if some V_i answers it.

Motivation for the Semantic Cache C ∘ V = Q: a cache hit results in a simpler query, on a much smaller XML fragment. Cache hits were processed two orders of magnitude faster than misses. The cache can also be maintained outside the database, e.g. in the application tier. Application tier caching for database-driven websites has become increasingly popular [Mohan et al., SIGMOD 02].

Example Illustrating C ∘ V = Q [V and Q were shown as tree-pattern figures and are not in the transcript] C = /b[x/y//z]/c

The Query/View Answerability Problem Given view V and query Q, does V answer Q, and if yes, then what should C be?

XPath Tree Patterns XPath queries have a natural representation as tree patterns [the example query Q was shown as a figure]. The Query Axis is the path from the root node to the result node. The Query Depth is the number of axis nodes. Prefix(Q, k) is the query obtained by truncating Q at its k-th axis node, e.g. Prefix(Q, 2) = a[v]/b. Preds(Q, k) is the set of predicates of the k-th axis node of Q, e.g. Preds(Q, 2) = {[x[.//y]]}.
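The definitions above can be sketched in a few lines of Python. This is only an illustration: the `Step` class, the helper names and the query `/a[v]/b[x[.//y]]/c` are hypothetical, chosen so that the outputs agree with the two examples on the slide. The sketch assumes that Prefix(Q, k) keeps the k-th axis node but drops that node's predicates, which is what the slide's examples suggest (Prefix(Q, 2) = a[v]/b while Preds(Q, 2) is non-empty).

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    """One axis node of a query (hypothetical representation)."""
    axis: str                                   # "/" (child) or "//" (descendant)
    label: str                                  # element name
    preds: list = field(default_factory=list)   # predicate strings, e.g. "[v]"

def depth_of(query):
    """Query depth: the number of axis nodes."""
    return len(query)

def prefix_of(query, k):
    """Truncate at the k-th axis node (1-based); assumed to drop that node's predicates."""
    parts = []
    for i, step in enumerate(query[:k], start=1):
        kept = "" if i == k else "".join(step.preds)
        parts.append(step.axis + step.label + kept)
    return "".join(parts)

def preds_of(query, k):
    """Set of predicates attached to the k-th axis node."""
    return set(query[k - 1].preds)

# Hypothetical query consistent with the slide's examples: /a[v]/b[x[.//y]]/c
Q = [Step("/", "a", ["[v]"]), Step("/", "b", ["[x[.//y]]"]), Step("/", "c")]
print(prefix_of(Q, 2), preds_of(Q, 2), depth_of(Q))
```

The sketch prints the leading slash of the absolute path, which the slide omits.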

XPath Containment A ⊆ B if the result of A will always be a subset of the result of B. Checking XPath containment is coNP-complete [Miklau & Suciu, PODS 02]. A containment mapping maps nodes in B’s tree pattern to those in A’s, and establishes A ⊆ B. It is a sound but incomplete condition, and it can be determined in polynomial time. Containment is different from answerability. E.g. let A = /a[x]/b, B = /a/b
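A containment mapping check can be sketched as a recursive search for a label- and edge-preserving mapping from B's pattern into A's. This is a minimal illustration, not the authors' implementation: the `Node` class, the `is_result` flag and the treatment of `*` as a wildcard are assumptions, and both patterns are assumed to be absolute paths with the same root axis. A `True` result establishes A ⊆ B; a `False` result proves nothing, since the condition is incomplete.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Tree-pattern node (hypothetical representation)."""
    label: str
    axis: str = "/"              # edge from the parent: "/" (child) or "//" (descendant)
    children: list = field(default_factory=list)
    is_result: bool = False      # marks the pattern's result node

def descendants(node):
    """Yield every proper descendant of node."""
    for child in node.children:
        yield child
        yield from descendants(child)

def maps_to(b, a):
    """True if the subtree of B rooted at b can be mapped into A with b sent to a."""
    if b.label not in ("*", a.label):
        return False
    if b.is_result and not a.is_result:
        return False             # the result node must map to the result node
    for c in b.children:
        if c.axis == "/":
            # a child edge must map onto a child edge
            candidates = [d for d in a.children if d.axis == "/"]
        else:
            # a descendant edge may map onto any downward path
            candidates = list(descendants(a))
        if not any(maps_to(c, d) for d in candidates):
            return False
    return True

# The slide's patterns: A = /a[x]/b and B = /a/b
A = Node("a", children=[Node("x"), Node("b", is_result=True)])
B = Node("a", children=[Node("b", is_result=True)])
print(maps_to(B, A), maps_to(A, B))
```

Memoizing `maps_to` on (b, a) pairs would bound the work by the product of the two pattern sizes; the plain recursion is kept here for brevity.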

Criteria for Query/View Answerability “Rewriting XPath Queries using Materialized Views”, Xu and Ozsoyoglu, VLDB 2005

An Example of Answerability Checking [V, Q, Prefix(V, 2) and Prefix(Q, 2) were shown as tree-pattern figures and are not in the transcript] The first condition is satisfied. Preds(V, 2) = {[x//y]} and Preds(Q, 2) = {[x/y//z]}. The second condition is also satisfied.

Two Theoretical Results Theorem: If two tree patterns are minimal, and containment mappings exist both ways, then they are isomorphic. Theorem: If some view V answers query Q, then C can be set to the subtree of Q rooted at its k-th axis node, where k is the query depth of V.
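The second theorem gives a direct recipe for C, which can be sketched as follows. The step representation and the function name are hypothetical, and the query Q below is an assumed example whose tail matches the C = /b[x/y//z]/c shown earlier in the deck. The sketch also assumes that C's first step is written as a child step, as in that example.

```python
def compensating_query(q_steps, k):
    """Build C from the subtree of Q rooted at its k-th axis node.

    q_steps: list of (axis, label, [predicate strings]); k: query depth of the view V.
    """
    tail = q_steps[k - 1:]
    out = []
    for i, (axis, label, preds) in enumerate(tail):
        # Assumption: the first step of C is a child step, as in C = /b[x/y//z]/c
        step_axis = "/" if i == 0 else axis
        out.append(step_axis + label + "".join(preds))
    return "".join(out)

# Hypothetical query; only its tail from the 2nd axis node onward matters here
Q = [("/", "a", []), ("/", "b", ["[x/y//z]"]), ("/", "c", [])]
print(compensating_query(Q, 2))
```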

The Cost of Answerability Checking Answerability checking involves tree operations: checking isomorphism between trees, and looking for containment mappings between trees. Cache lookup by checking each view could give a high lookup overhead. Our Solution: Check answerability by string matching. This will be cheaper, and amenable to indexing in a standard RDBMS.

Checking Tree Isomorphism We represent each tree pattern with its “normalized” query string. To obtain the normalized query: (1) at each axis node, do a DFS into each predicate subtree; (2) before returning from a node, append its children's node labels to its label, in lexicographic order; (3) concatenate all axis node labels to get the normalized query.
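The normalization steps can be sketched as a post-order traversal that sorts each node's rendered children before appending them. The bracketed output syntax, the `Node` class and the function names are assumptions made for this illustration; the slides do not specify the exact string format. The point is only that two isomorphic patterns produce the same string, so isomorphism reduces to string equality.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Tree-pattern node (hypothetical representation)."""
    label: str
    axis: str = "/"              # edge from the parent: "/" or "//"
    children: list = field(default_factory=list)

def render_edge(child):
    """Render a predicate child, prefixing descendant edges with './/'."""
    return (".//" if child.axis == "//" else "") + normalize(child)

def normalize(node):
    """Label followed by its children's normalized strings in lexicographic order."""
    kids = sorted(render_edge(c) for c in node.children)
    return node.label + "".join("[" + k + "]" for k in kids)

def normalize_query(axis_nodes):
    """Concatenate the normalized labels of all axis nodes.

    Each axis node's children are its predicate subtrees only.
    """
    return "".join(n.axis + normalize(n) for n in axis_nodes)

# Two hypothetical spellings of the same pattern, predicates in different orders
q1 = [Node("a", "/", [Node("v"), Node("u")]),
      Node("b", "/", [Node("x", "/", [Node("z", "//"), Node("y")])])]
q2 = [Node("a", "/", [Node("u"), Node("v")]),
      Node("b", "/", [Node("x", "/", [Node("y"), Node("z", "//")])])]
print(normalize_query(q1))
print(normalize_query(q1) == normalize_query(q2))
```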

Checking Predicate Containment For the tree of each predicate in Preds(Q, k), generate all trees that “containment map” to it. ConPreds(Q, k) is the set of normalized predicates obtained from these generated trees.

String-Based Criteria for Answerability The second condition is tweaked to correctly support comparison predicates.

Cache Organization Let V = Insert into XmlData (‘ ’). Record viewId. Insert into Prefix (‘/a[u]/b’). Record prefixId. Insert into View (viewId, prefixId, ’y/z=“str”|v’).

Cache Lookup Prefix(V, k) = P.prefix = Prefix(Q, k). V.pred ∈ Preds(V, k) and V.pred ∈ ConPreds(Q, k).
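The lookup conditions can be sketched against an in-memory SQLite database using Python's `sqlite3` module. This is a rough illustration under several assumptions: the table and column names follow the slides, but the exact schema, the choice to split the pipe-separated predicate string in Python, and the requirement that every predicate of V appear in ConPreds(Q, k) are guesses, not the authors' design. Prefix(Q, k) and ConPreds(Q, k) are supplied by the caller, and the values below are hand-written, hypothetical examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Prefix (prefixId INTEGER PRIMARY KEY, prefix TEXT UNIQUE);
    CREATE TABLE View (viewId INTEGER, prefixId INTEGER, pred TEXT);
""")

def add_view(view_id, prefix, preds):
    """Store a view under its normalized prefix; predicates are pipe-separated."""
    conn.execute("INSERT OR IGNORE INTO Prefix (prefix) VALUES (?)", (prefix,))
    (prefix_id,) = conn.execute(
        "SELECT prefixId FROM Prefix WHERE prefix = ?", (prefix,)).fetchone()
    conn.execute("INSERT INTO View VALUES (?, ?, ?)",
                 (view_id, prefix_id, "|".join(sorted(preds))))

def lookup(query_prefixes, con_preds):
    """Return (viewId, k) pairs that satisfy both string conditions.

    query_prefixes: {k: Prefix(Q, k)}; con_preds: {k: ConPreds(Q, k)}.
    """
    hits = []
    for k, prefix in query_prefixes.items():
        rows = conn.execute(
            "SELECT V.viewId, V.pred FROM View V "
            "JOIN Prefix P ON V.prefixId = P.prefixId WHERE P.prefix = ?",
            (prefix,)).fetchall()
        for view_id, pred in rows:
            view_preds = set(pred.split("|")) if pred else set()
            if view_preds <= con_preds[k]:   # every view predicate is in ConPreds(Q, k)
                hits.append((view_id, k))
    return hits

# Hypothetical view matching the prefix and predicates shown on the earlier slide
add_view(1, "/a[u]/b", ['y/z="str"', "v"])
prefixes = {1: "/a", 2: "/a[u]/b"}
print(lookup(prefixes, {1: set(), 2: {"v", 'y/z="str"', "w"}}))
```

An index on `Prefix.prefix` (implied here by the `UNIQUE` constraint) is what makes the prefix match cheap, which is the property the slides highlight.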

Cache Warm-up as View Selection Given the warm-up workload W, we generate another workload S. S is likely to have overlap with the test workload. Problem: Choose some m views from S which together answer a maximal subset of S. This is a variant of the set cover problem, which is NP-complete. We use a variant of the greedy approximation algorithm for set cover.
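The selection step can be sketched with the textbook greedy heuristic for maximum coverage: repeatedly pick the view that answers the most not-yet-covered queries. The slides only say that a variant of the greedy algorithm is used, so this is the standard version rather than the authors' variant. The `answers` mapping and the example views and query names are hypothetical.

```python
def greedy_select(answers, m):
    """Pick at most m views that together answer as many workload queries as possible.

    answers: {view: set of workload queries that the view answers}.
    Returns (chosen views in pick order, set of covered queries).
    """
    chosen, covered = [], set()
    for _ in range(m):
        # View with the largest marginal gain; ties go to the first one inserted
        best = max(answers, key=lambda v: len(answers[v] - covered), default=None)
        if best is None or not (answers[best] - covered):
            break                       # nothing left to gain
        chosen.append(best)
        covered |= answers[best]
    return chosen, covered

# Hypothetical candidate views and the workload queries each one answers
answers = {
    "/a/b": {"q1", "q2", "q3"},
    "/a/b[x]": {"q2"},
    "/a//c": {"q4", "q5"},
    "/a/d": {"q5"},
}
chosen, covered = greedy_select(answers, 2)
print(chosen, sorted(covered))
```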

Experimental Setup The data instance was a 300 MB XML document created using the XMark generator. Experiments were run on a Pentium 4 machine with 512 MB RAM. SQL Server 2005 beta was used for storing the XML data as well as the cache. We compare our cache with a naïve cache which stores (query string, result XML) pairs.

Query Workloads Used We implemented our own XPath query generator for creating workloads for our experiments. Query structure is generated randomly. The value for each predicate is obtained by sampling from the set of values it takes, using a Zipf distribution. Queries with depths 3, 4 and 5 were given 2, 2 and 3 predicates respectively.
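Zipf-distributed sampling of predicate values can be sketched as weighted sampling where the value at rank r gets weight 1/r^s. The exponent `s = 1.0`, the ranking by list order, and the example values are assumptions; the slides do not give the parameters used by the authors' generator.

```python
import random

def zipf_sampler(values, s=1.0, seed=None):
    """Return a function that samples from values with Zipf weights 1 / rank**s.

    values: candidate values, most popular first (rank 1).
    """
    rng = random.Random(seed)
    weights = [1.0 / (rank ** s) for rank in range(1, len(values) + 1)]

    def sample():
        return rng.choices(values, weights=weights, k=1)[0]

    return sample

# Hypothetical value set for one predicate
sample = zipf_sampler(["str1", "str2", "str3"], seed=0)
draws = [sample() for _ in range(1000)]
print({v: draws.count(v) for v in ["str1", "str2", "str3"]})
```

A skewed distribution like this makes some predicate values recur often across the workload, which is the kind of overlap a cache can exploit.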

Conclusions We defined a notion of query/view answerability for XPath. We showed how an efficient semantic cache based on these ideas can be employed. We demonstrated the scalability of cache lookup, and performance gains in query processing.