Covering Indexes for Branching Path Queries
Raghav Kaushik, Philip Bohannon, Jeffrey F. Naughton and Henry F. Korth
Presented by Abdullah Mueen


XML as Graph Data
[Figure: example data graph; each node is drawn with its label and oid, e.g. label(3).]
– Non-tree edges model IDREF relationships in the document.
– Leaf nodes with attributes are suppressed.

Branching Path Expression
ROOT/metro/neighborhoods/neighborhood[/business=>cinema-hall]/cultural=>museum

Example (1)
//hotel[/star][ museum[\art]]]

Covering Index
A covering index can answer any query from a given set of queries without consulting the original document. The goal of this paper is to find a covering index for branching path queries.

k-bisimilarity
[Figure: example data graph with nodes labeled R, A, B, C and D.]
Two nodes u and v are called k-bisimilar (u ≈k v) if
1. label(u) = label(v), and
2. every incoming label path of length ≤ k to u matches at least one incoming label path of length ≤ k to v, and vice versa.
In the example graph, nodes 2 and 4 are 0-bisimilar, 5 and 7 are 1-bisimilar, 6 and 8 are 1-bisimilar, and 8 and 9 are 2-bisimilar.
– ≈k defines an equivalence relation over the set of nodes in G.
– The algorithm for computing k-bisimulation is shown later.
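The path-based definition of k-bisimilarity can be tried out on a small graph. The following is a minimal sketch, not the authors' algorithm: it groups nodes by their set of incoming label paths of at most k edges. The graph `LABEL`/`PARENTS` and the function names are hypothetical and are not the graph drawn on the slide.

```python
from collections import defaultdict

# Hypothetical data graph (assumption, not the slide's figure): node id -> label.
LABEL = {1: "R", 2: "A", 3: "B", 4: "C", 5: "C", 6: "D", 7: "D"}
# Node id -> list of parent ids (sources of incoming edges).
PARENTS = {2: [1], 3: [1], 4: [2], 5: [3], 6: [4], 7: [5]}


def incoming_label_paths(node, k, parents, label):
    """Return all label paths with at most k edges that end at `node`."""
    paths = {(label[node],)}
    if k > 0:
        for p in parents.get(node, ()):
            for prefix in incoming_label_paths(p, k - 1, parents, label):
                paths.add(prefix + (label[node],))
    return frozenset(paths)


def k_bisimilar_blocks(k, parents, label):
    """Group nodes whose incoming label paths of length <= k coincide."""
    blocks = defaultdict(list)
    for n in sorted(label):
        blocks[incoming_label_paths(n, k, parents, label)].append(n)
    return sorted(blocks.values())


print(k_bisimilar_blocks(0, PARENTS, LABEL))  # [[1], [2], [3], [4, 5], [6, 7]]
print(k_bisimilar_blocks(1, PARENTS, LABEL))  # [[1], [2], [3], [4], [5], [6, 7]]
```

In this toy graph the two D nodes stay together for k = 1 because both are reached through a C node, and they separate at k = 2 once the grandparent labels A and B become visible.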

1-index: Covering Index for Simple Path Expressions
[Figure: data graph G and its index graphs A(0), A(1), A(2) and A(3) = 1-index; each index node is annotated with its extent, e.g. {6,8,9}. One panel is marked "SuccStable".]

Inverse edges
[Figure: the example graph shown twice, one copy captioned "5,7 are not 1-bisimilar" and the other "5,7 are 1-bisimilar".]

The F&B index
Repeat until the partition no longer changes:
– Reverse all edges
– Compute the forward bisimilarity partition
– Reverse all edges again
– Compute the backward bisimilarity partition
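The alternating loop can be sketched as a naive fixpoint computation. This is an illustration under assumptions, not the paper's implementation: instead of physically reversing edges it refines the partition once by parents (incoming paths) and once by children (outgoing paths) per round, and stops when the number of blocks stops growing. The graph and all names are hypothetical.

```python
def refine(block_of, neighbors):
    """Split blocks by (own block, set of blocks of the given neighbors)."""
    sig = {
        n: (block_of[n], frozenset(block_of[m] for m in neighbors.get(n, ())))
        for n in block_of
    }
    ids = {}
    return {n: ids.setdefault(sig[n], len(ids)) for n in sorted(block_of)}


def fb_partition(label, edges):
    """Refine the label grouping until it is stable in both edge directions."""
    children, parents = {}, {}
    for u, v in edges:
        children.setdefault(u, []).append(v)
        parents.setdefault(v, []).append(u)
    block_of = dict(label)  # start from the label grouping
    while True:
        before = len(set(block_of.values()))
        block_of = refine(block_of, parents)   # incoming paths
        block_of = refine(block_of, children)  # outgoing paths
        # Refinement only splits blocks, so an equal count means no change.
        if len(set(block_of.values())) == before:
            break
    blocks = {}
    for n, b in block_of.items():
        blocks.setdefault(b, []).append(n)
    return sorted(sorted(b) for b in blocks.values())


# Hypothetical graph: R(1) has three A children; only node 3 also has a C child.
LABEL = {1: "R", 2: "A", 3: "A", 4: "B", 5: "B", 6: "C", 7: "A", 8: "B"}
EDGES = [(1, 2), (1, 3), (1, 7), (2, 4), (3, 5), (3, 6), (7, 8)]
print(fb_partition(LABEL, EDGES))  # [[1], [2, 7], [3], [4, 8], [5], [6]]
```

Nodes 2 and 7 remain in one block because they agree on both incoming and outgoing structure, while node 3 is split off by its extra C child, which in turn separates node 5 from nodes 4 and 8.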

Forward Bisimulation
[Figure: four successive panels of the example graph illustrating forward bisimulation.]

Backward Bisimulation
[Figure: three successive panels of the example graph illustrating backward bisimulation.]

Properties of the F&B index
– The F&B index over a data graph G covers all branching path expressions.
– The F&B index is the smallest index that covers all branching path queries.
– In practice, the F&B index is large for most real documents.

1. Tags to be indexed
– Some tags are not used in queries, for example formatting tags such as bold and emph.
– We specify the set of tags to be indexed.
– For a 100MB document, the F&B index on all tags has 436,000 nodes, while ignoring formatting tags it has 18,000 nodes.

2. IDREF edges to be indexed
– IDREF edges are not followed by the // operation.
– IDREF edges are named explicitly in a path expression with the => operator.
– We specify the set of IDREF edges to be indexed.
– For the 100MB document, the index has 1.3 million nodes with all IDREF edges, while it has 18,000 nodes without any IDREF edges and formatting tags.

3. Exploiting Local Similarity
– Long queries are infrequent and rarely of interest.
– If we restrict the length of the possible queries, we can get a much smaller index than the F&B index.
– We specify the length of the local path by using k-bisimilarity instead of bisimilarity while computing the F&B index.

4. Restricting Tree Depth
– Deeply nested conditions are less likely to occur.
– We specify the maximum depth of the conditional path expression by the tree depth (defined next).

Tree depth
//museums/history/museum[/featured and museum[\art]]]

Definition of an Index
– A set of tags T
– Sets of IDREF edges to be indexed in each direction, ref_fwd and ref_bwd
– Two parameters, k_bwd and k_fwd, to restrict the length of the path queries
– One parameter, td, to restrict the depth of nested conditional expressions

The BPCI index
– Remove all tags not in T, provided the removal does not cut off a tag in T.
– Start with the label grouping as the current partition P.
– For i = 0, while i ≤ td:
  – Reverse all edges in G, retaining IDREF edges only in ref_fwd.
  – P ← forward k_fwd-bisimilar partition of P; increment i.
  – Reverse all edges in G again, retaining IDREF edges only in ref_bwd.
  – P ← backward k_bwd-bisimilar partition of P; increment i.
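The bounded loop of this construction can be sketched as follows. This is a simplified illustration under stated assumptions: tag removal and IDREF filtering are omitted, "forward" is taken to mean refinement by children and "backward" refinement by parents, and each k-bisimilar step is approximated by k rounds of one-step refinement. The graph and all function names are hypothetical.

```python
def refine(block_of, neighbors):
    """Split blocks by (own block, set of blocks of the given neighbors)."""
    sig = {
        n: (block_of[n], frozenset(block_of[m] for m in neighbors.get(n, ())))
        for n in block_of
    }
    ids = {}
    return {n: ids.setdefault(sig[n], len(ids)) for n in sorted(block_of)}


def bpci_partition(label, edges, k_fwd, k_bwd, td):
    """Alternate bounded forward/backward refinement while i <= td."""
    children, parents = {}, {}
    for u, v in edges:
        children.setdefault(u, []).append(v)
        parents.setdefault(v, []).append(u)
    block_of = dict(label)  # label grouping as the initial partition P
    i = 0
    while i <= td:
        for _ in range(k_fwd):  # assumed: forward = outgoing paths
            block_of = refine(block_of, children)
        i += 1
        for _ in range(k_bwd):  # assumed: backward = incoming paths
            block_of = refine(block_of, parents)
        i += 1
    blocks = {}
    for n, b in block_of.items():
        blocks.setdefault(b, []).append(n)
    return sorted(sorted(b) for b in blocks.values())


# Hypothetical graph: R(1) has three A children; only node 3 also has a C child.
LABEL = {1: "R", 2: "A", 3: "A", 4: "B", 5: "B", 6: "C", 7: "A", 8: "B"}
EDGES = [(1, 2), (1, 3), (1, 7), (2, 4), (3, 5), (3, 6), (7, 8)]
print(bpci_partition(LABEL, EDGES, k_fwd=0, k_bwd=1, td=0))
print(bpci_partition(LABEL, EDGES, k_fwd=1, k_bwd=1, td=0))
```

With k_fwd = 0 the three A nodes stay merged, which gives a smaller index that can no longer distinguish them by their children; raising k_fwd to 1 splits node 3 off and yields a finer partition.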

Variations of BPCI

Testing if an Index covers a Query
– Build the query graph.
– Check that all tags in the query are in T and all IDREF edges in the query are in ref_bwd ∪ ref_fwd.
– Check that the tree depth of the query is less than td of the index.
– Check that all paths in the query with even tree depth have length < k_bwd.
– Check that all paths in the query with odd tree depth have length < k_fwd.
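These checks translate directly into a predicate. The sketch below assumes the query graph has already been summarized into its tags, its IDREF edges, its tree depth, and a list of (tree depth, length) pairs for its paths; the classes `IndexDef` and `QuerySummary` and their field names are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexDef:
    tags: frozenset      # T
    ref_fwd: frozenset   # IDREF edges indexed in the forward direction
    ref_bwd: frozenset   # IDREF edges indexed in the backward direction
    k_fwd: int
    k_bwd: int
    td: int


@dataclass(frozen=True)
class QuerySummary:
    tags: frozenset
    idref_edges: frozenset
    tree_depth: int
    paths: tuple         # (tree depth of the path, path length) pairs


def covers(index, query):
    """Apply the slide's four checks; return True if all of them pass."""
    if not query.tags <= index.tags:
        return False
    if not query.idref_edges <= (index.ref_bwd | index.ref_fwd):
        return False
    if not query.tree_depth < index.td:
        return False
    for depth, length in query.paths:
        bound = index.k_bwd if depth % 2 == 0 else index.k_fwd
        if not length < bound:
            return False
    return True


idx = IndexDef(frozenset({"hotel", "star", "museum"}), frozenset(), frozenset(),
               k_fwd=2, k_bwd=3, td=2)
q_ok = QuerySummary(frozenset({"hotel", "star"}), frozenset(), 1, ((0, 2), (1, 1)))
q_long = QuerySummary(frozenset({"hotel", "star"}), frozenset(), 1, ((0, 3),))
print(covers(idx, q_ok), covers(idx, q_long))  # True False
```

The second query fails only because its depth-0 path has length 3, which is not strictly less than k_bwd = 3.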

Results on the XMark benchmark
1. I_all is the F&B index.
2. I_almost-all is the F&B index with k_fwd = 1.
3. I_specific is built for the specific query.

Result

Conclusion
– BPCI is a covering index for branching path queries.
– By setting the parameters appropriately, BPCI can cover a wide range of queries suited to various applications.
– Extensions:
  – Updating and bulk loading
  – Integration with value indexes