BLAS: An Efficient XPath Processing System
Zhimin Song
Advanced Database System
Professor: Dr. Mengchi Liu


Outline
- Introduction
- BLAS System
- Experimental Results
- Conclusions

Figure 1: Sample XML protein repository. (The element tags were lost in extraction; the surviving text values are "cytochrome c [validated]", "cytochrome c", "Evans, M.J.", "2001" and "The human somatic cytochrome c gene …".)

Introduction
- XML has a complex, tree-like structure of nodes.
- Languages for querying XML are based on path navigation (XPath [1]):
  - Child axis: from a given node to a child node.
  - Descendant axis: from a given node to a descendant node.

Introduction (cont.)
- Several techniques have already been proposed to improve XPath processing. For example, D-labeling efficiently handles descendant-axis traversal.
- What about complex queries that include child-axis steps and branches?
- For these, the paper proposes P-labeling, which optimizes an important class of queries called suffix path queries.

BLAS (Bi-LAbeling based System)
- Basic definitions
- The labeling scheme (index generator)
- Query translator

Basic definitions
- BLAS: a system for efficiently processing complex queries, based on D-labeling and P-labeling.
- BLAS handles a subset of XPath queries consisting of:
  - Child axis navigation ( / )
  - Descendant axis navigation ( // )
  - Branches ( [...] )
- The evaluation of a path expression P, written [P], returns the set of nodes in an XML tree T that are reachable by P starting from the root of T.
- Since P can be evaluated to retrieve a set of XML nodes, "path expression" and "query" are used interchangeably.
- P ⊆ Q if and only if [P] ⊆ [Q].
- P ∩ Q = ∅ if and only if [P] ∩ [Q] = ∅.

Basic definitions (cont.)
- Suffix path expression: a path expression P that optionally begins with a descendant axis step (//), followed by zero or more child axis steps (/).
  Example: //protein/name
  Another: /proteinDatabase/proteinEntry/protein/name
- SP(n): the unique simple path P from the root to the node n.
- Evaluating a suffix path expression Q therefore means finding all nodes n such that SP(n) ⊆ Q.
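The suffix path definition and the test SP(n) ⊆ Q can be sketched in a few lines. This is only an illustration: the function names `steps_of`, `is_suffix_path` and `sp_matches` are hypothetical, and paths are handled as plain strings rather than parsed XPath.

```python
def steps_of(path: str) -> list[str]:
    """Split a path such as '/a/b' or '//a/b' into its tag steps."""
    return [s for s in path.lstrip("/").split("/") if s]


def is_suffix_path(query: str) -> bool:
    """True if the query is an optional leading // followed only by / steps."""
    if not query.startswith("/"):
        return False
    rest = query[2:] if query.startswith("//") else query[1:]
    # No further //, no branches, and no empty first step.
    return "//" not in rest and "[" not in rest and not rest.startswith("/")


def sp_matches(sp: str, query: str) -> bool:
    """True if the root-to-node path sp is contained in the suffix path query."""
    node_steps, query_steps = steps_of(sp), steps_of(query)
    if query.startswith("//"):
        # Descendant start: the query steps must be a suffix of the node's path.
        k = len(query_steps)
        return k <= len(node_steps) and node_steps[len(node_steps) - k:] == query_steps
    # Absolute query: the whole path must match.
    return node_steps == query_steps
```

With these helpers, the node path /proteinDatabase/proteinEntry/protein/name matches //protein/name, while /proteinDatabase/proteinEntry/name does not.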

Architecture of BLAS (diagram; recoverable labels only)
- Index generator: XML -> SAX parser -> events -> P-labeling generator and D-labeling generator -> data loader -> storage (P-labelings, D-labelings, data values).
- Query translator: XPath query -> query decomposition -> suffix path queries -> subquery generator (based on P-labeling) -> subquery composition (based on D-labeling, using the ancestor-descendant relationship between the results of the suffix path queries).
- Query engine: query -> query result.

The labeling scheme (index generator)
- D-labeling scheme: each XML node is assigned a triplet <d1, d2, d3>. For any two nodes n and m (with n.d1 <= n.d2 and m.d1 <= m.d2):
  - m is a descendant of n if and only if n.d1 < m.d1 and n.d2 > m.d2.
  - m is a child of n if and only if m is a descendant of n and n.d3 + 1 = m.d3.
- For a node n, let d1 and d2 be the positions of its start tag and end tag. d3 is the level of n in the XML tree, i.e. the length of the path from the root to n.
- A D-label is therefore represented as <start, end, level>.
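A minimal sketch of how such <start, end, level> labels could be produced and tested. Several things here are assumptions rather than part of the slides: only element tags are counted as positions (text is ignored), the root is placed at level 1, the helper names are hypothetical, and the one-line sample document merely reuses tag names from the protein example.

```python
import xml.etree.ElementTree as ET


def d_labels(root):
    """Map each element to (start, end, level) by counting tags in document order."""
    labels, counter = {}, 0

    def visit(elem, level):
        nonlocal counter
        counter += 1                      # position of the start tag
        start = counter
        for child in elem:
            visit(child, level + 1)
        counter += 1                      # position of the end tag
        labels[elem] = (start, counter, level)

    visit(root, 1)                        # assumption: the root is at level 1
    return labels


def is_descendant(n, m):
    """m is a descendant of n iff n.d1 < m.d1 and n.d2 > m.d2."""
    return n[0] < m[0] and n[1] > m[1]


def is_child(n, m):
    """m is a child of n iff it is a descendant one level below n."""
    return is_descendant(n, m) and n[2] + 1 == m[2]


doc = ET.fromstring(
    "<proteinDatabase><proteinEntry><protein><name>cytochrome c</name>"
    "</protein></proteinEntry></proteinDatabase>"
)
by_tag = {elem.tag: label for elem, label in d_labels(doc).items()}
```

Here `by_tag["proteinDatabase"]` is (1, 8, 1) and `by_tag["name"]` is (4, 5, 4), so name is a descendant of proteinDatabase but a child only of protein.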

Example: using D-labeling
Query: //proteinDatabase//refinfo
- First retrieve all the nodes reachable by proteinDatabase and by refinfo.
- Let pDB and refinfo be two relations that store these nodes, then D-join them:
    SELECT pDB.start, pDB.end, refinfo.start, refinfo.end
    FROM pDB, refinfo
    WHERE pDB.start < refinfo.start AND pDB.end > refinfo.end
(Figure: query tree with the node labels proteinDatabase, proteinEntry, protein, superfamily, "cytochrome c", reference, refinfo, author, "Evans, M.J.", year, "2001" and title; descendant-axis edges are marked //.)
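The D-join can be tried out against an in-memory SQLite database. This is a sketch under stated assumptions: the columns are named `start_pos` and `end_pos` rather than start and end because END is a keyword in SQLite, and the three rows are made-up labels chosen so that one refinfo node lies inside the proteinDatabase interval and one lies outside.

```python
import sqlite3


def d_join(conn):
    """Return (ancestor, descendant) label pairs for //proteinDatabase//refinfo."""
    return conn.execute(
        "SELECT pDB.start_pos, pDB.end_pos, refinfo.start_pos, refinfo.end_pos "
        "FROM pDB, refinfo "
        "WHERE pDB.start_pos < refinfo.start_pos AND pDB.end_pos > refinfo.end_pos"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pDB (start_pos INTEGER, end_pos INTEGER);
    CREATE TABLE refinfo (start_pos INTEGER, end_pos INTEGER);
    INSERT INTO pDB VALUES (1, 100);
    INSERT INTO refinfo VALUES (10, 20);    -- inside <1, 100>
    INSERT INTO refinfo VALUES (150, 160);  -- outside, must not join
""")
rows = d_join(conn)
```

Only the refinfo row with labels (10, 20) survives the join, because its interval is strictly nested in the proteinDatabase interval.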

P-labeling scheme
- It is also important to implement child axis navigation efficiently, e.g. /proteinDatabase/proteinEntry/protein/name.
- Target: improve "/" evaluation.
- Focus on suffix path queries, e.g. //protein/name.

Assign each node a number and each suffix path an interval <p1, p2> such that:
- For any two suffix paths Q1 and Q2, Q1 is contained in Q2 if Q2.p1 <= Q1.p1 and Q1.p2 <= Q2.p2.
- A node n is contained in the suffix path Q if Q.p1 <= SP(n).p1 <= Q.p2.
- Let Q be a suffix path query. Then [Q] = {n | Q.p1 <= n.plabel <= Q.p2}, where n.plabel = SP(n).p1.

P-labeling construction (algorithm)
- Suppose there are n distinct tags (t1, t2, ..., tn).
- Assign "/" a ratio r0 and each tag ti a ratio ri such that r0 + r1 + r2 + ... + rn = 1. Let ri = 1/(n+1).
- Define the domain of the numbers in a P-label to be the integers in [0, m-1], where m is chosen such that m >= (n+1)^h and h is the length of the longest path in the XML tree.
- Algorithm:
  - Path // is assigned the interval (P-label) <0, m-1>.
  - Partition the interval <0, m-1> in tag order, in proportion to the ratio ri of each tag ti (for each path //ti) and the ratio r0 of child axis navigation.
  - That is, allocate the interval <0, p1 - 1> to "/" and <pi, p(i+1) - 1> to each ti such that (p(i+1) - pi)/m = ri and p1/m = r0.
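The partitioning can be sketched as a short function that applies the same split recursively, from the last step of a suffix path back to the first. Assumptions in this sketch: all ratios are equal to 1/(n+1), the tag order is the order of the `tags` list, and m is taken as (n+1)^(h+1), which satisfies the bound on m while keeping every interval integral. The names `p_label` and `contains` are hypothetical.

```python
def p_label(path, tags, h):
    """Return the inclusive interval <p1, p2> of a suffix path."""
    base = len(tags) + 1               # n tags plus one slot for "/"
    m = base ** (h + 1)                # assumption: one spare level beyond (n+1)^h
    absolute = not path.startswith("//")
    steps = [s for s in path.strip("/").split("/") if s]
    lo, size = 0, m                    # "//" owns <0, m-1>
    for tag in reversed(steps):        # the last step selects the outermost slot
        size //= base
        lo += (tags.index(tag) + 1) * size
    if absolute:
        size //= base                  # slot 0 of the current interval belongs to "/"
    return lo, lo + size - 1


def contains(outer, inner):
    """Interval containment, i.e. the suffix path containment test."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


TAGS = ["proteinDatabase", "proteinEntry", "protein", "name"]
name_any = p_label("//name", TAGS, 4)
protein_name = p_label("//protein/name", TAGS, 4)
full_path = p_label("/proteinDatabase/proteinEntry/protein/name", TAGS, 4)
```

With four tags and h = 4, m is 5^5 = 3125. //name receives <2500, 3124>, //protein/name the nested interval <2875, 2999>, and the absolute path the single point <2930, 2930>, which would be the plabel of a name node on that path.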

P-labeling construction (example)
Query: //protein/name
(Figure: a tree of suffix paths with their P-label intervals. Paths shown: /, //proteinDatabase, //proteinEntry, //protein, //name, /name, //proteinDatabase/name, //proteinEntry/name, //protein/name, /protein/name. Surviving interval bounds include 2*10^10, 4*10^10, 5*10^10, 4.01*10^10 and 4.04*10^10. Parameters: ri = 0.01; the values of m and the number of tags were lost in extraction.)

Query translator: translates an input XPath query into standard SQL.
- Query decomposition: splits the query into a set of suffix path queries and records the ancestor-descendant relationships between them.
- SQL generation: computes each suffix path query's P-label and generates a corresponding SQL subquery.
- SQL composition: combines the subqueries into a single SQL query based on D-labeling and the ancestor-descendant relationships.
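The generation and composition steps might look roughly like the following for a query of the form //a//b. Everything concrete here is assumed for illustration: the single `nodes` table, its column names, the helper names `suffix_subquery` and `compose`, and the label values in the sample rows. The slides do not show the actual SQL that BLAS emits.

```python
import sqlite3


def suffix_subquery(p1, p2):
    """SQL generation: a suffix path query becomes a range selection on plabel."""
    return ("SELECT start_pos, end_pos, level FROM nodes "
            f"WHERE plabel BETWEEN {p1} AND {p2}")


def compose(ancestor_sql, descendant_sql):
    """SQL composition: D-join two subqueries that are related by //."""
    return ("SELECT d.start_pos, d.end_pos, d.level "
            f"FROM ({ancestor_sql}) AS a, ({descendant_sql}) AS d "
            "WHERE a.start_pos < d.start_pos AND a.end_pos > d.end_pos")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (tag TEXT, plabel INTEGER, "
             "start_pos INTEGER, end_pos INTEGER, level INTEGER)")
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?, ?, ?, ?)",
    [   # hypothetical labels for a four-node chain
        ("proteinDatabase", 625, 1, 8, 1),
        ("proteinEntry", 1375, 2, 7, 2),
        ("protein", 2150, 3, 6, 3),
        ("name", 2930, 4, 5, 4),
    ],
)
# //proteinDatabase//name, using assumed P-label intervals for the two suffix paths
sql = compose(suffix_subquery(625, 1249), suffix_subquery(2500, 3124))
result = conn.execute(sql).fetchall()
```

Each suffix path costs one range scan, and only the // boundary between the two pieces needs a join.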

Split algorithm: D-elimination (query tree Q)
- Traverse the query tree depth-first.
- Split p//q into p and //q.
- Invoke B-elimination if there are branches in Q; otherwise, evaluate Q using P-labels.
- Join the intermediate results by their D-labels.
Rule: p//q -> p and //q
(Figure: the query tree rooted at proteinDatabase is split at its // edges into subqueries Q1, Q2 and Q3.)
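For a branch-free path, the rule p//q -> p and //q amounts to cutting the path string at every interior //. The sketch below works on strings instead of a query tree and ignores branches entirely; `d_eliminate` is a hypothetical name.

```python
def d_eliminate(path):
    """Split a branch-free path at each // into suffix path queries.

    Returns the pieces plus the (ancestor, descendant) pairs that must
    later be joined on their D-labels.
    """
    head, *rest = path.split("//")
    pieces = [head] if head else []          # empty head: path started with //
    pieces += ["//" + segment for segment in rest]
    pairs = list(zip(pieces, pieces[1:]))    # consecutive pieces are related by //
    return pieces, pairs


pieces, pairs = d_eliminate("//proteinDatabase//refinfo")
```

For //proteinDatabase//refinfo this yields the two suffix paths //proteinDatabase and //refinfo and one pair to D-join.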

B-elimination (query tree Q1)
Rule: p[q1, q2, ..., qi]/r -> p, //q1, //q2, ..., //qi, //r
(Figure: Q1, with the nodes proteinDatabase, proteinEntry, protein, reference, refinfo, title and year "2001", is split at its branching points into subqueries Q4, Q5 and Q6.)

B-elimination (cont.)
(Figure: the decomposition continues, yielding subqueries Q4, Q5, Q7, Q8 and Q9 over the nodes proteinDatabase, proteinEntry, protein, reference, refinfo, title and year "2001".)

Push up algorithm: optimizes branch elimination (B-elimination).
Rule: p[q1, q2, ..., qi]/r -> p, p/q1, p/q2, ..., p/qi, p/r
The subqueries p/qi and p/r are more specific than //qi and //r.
(Figure: the subqueries Q4 and Q5 and their successors, each now prefixed with proteinDatabase/proteinEntry: protein; reference; reference/refinfo; reference/refinfo/title; reference/refinfo/year "2001".)
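The difference between plain B-elimination (p, //q1, ..., //qi, //r) and push up (p, p/q1, ..., p/qi, p/r) is easy to see side by side. In this sketch a branching query p[q1, ..., qi]/r is passed in already separated into its three parts, and the sample query is a hypothetical one built from tag names in the example tree.

```python
def b_eliminate(p, branches, r):
    """Plain B-elimination: every branch becomes a free-floating // query."""
    return [p] + ["//" + q for q in branches] + ["//" + r]


def push_up(p, branches, r):
    """Push up: every branch keeps the prefix p, so it is a more specific suffix path."""
    return [p] + [f"{p}/{q}" for q in branches] + [f"{p}/{r}"]


p = "/proteinDatabase/proteinEntry"
plain = b_eliminate(p, ["protein"], "reference")
pushed = push_up(p, ["protein"], "reference")
```

`plain` contains //protein and //reference, whereas `pushed` contains /proteinDatabase/proteinEntry/protein and /proteinDatabase/proteinEntry/reference, whose P-label intervals are narrower and so select fewer nodes before the join.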

Unfold algorithm: a further optimization of descendant-axis elimination (D-elimination).
Rule: p//q -> p/r1/q, p/r2/q, ..., p/ri/q
Example:
  Q2  = /ProteinDatabase/ProteinEntry/protein//superfamily="cytochrome c"
  Q21 = /ProteinDatabase/ProteinEntry/protein/classification/superfamily="cytochrome c"
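Unfolding needs to know which concrete paths exist between p and q. The sketch below assumes that knowledge is available as a list of root-to-node paths (`known_paths`, a hypothetical stand-in for schema or path-summary information) and leaves the value predicate aside.

```python
def unfold(p, q, known_paths):
    """Expand p//q into the child-axis-only paths p/.../q found in known_paths."""
    p_steps = p.strip("/").split("/")
    q_steps = q.strip("/").split("/")
    result = []
    for path in known_paths:
        steps = path.strip("/").split("/")
        if (len(steps) >= len(p_steps) + len(q_steps)
                and steps[:len(p_steps)] == p_steps      # starts with p
                and steps[-len(q_steps):] == q_steps):   # ends with q
            result.append(path)
    return result


known_paths = [  # hypothetical path summary of the protein repository
    "/ProteinDatabase/ProteinEntry/protein/name",
    "/ProteinDatabase/ProteinEntry/protein/classification/superfamily",
    "/ProteinDatabase/ProteinEntry/reference/refinfo/title",
]
unfolded = unfold("/ProteinDatabase/ProteinEntry/protein", "superfamily", known_paths)
```

The result is the single path through classification, matching Q21; it is a suffix path query and can be answered with one P-label range selection and no D-join.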

Experimental Results
- Data sets
- Query sets
  - Suffix path queries
  - Path queries
  - XPath queries
- Query engine: RDBMS or file system

Query Execution Time
(Figure: query time for the Shakespeare, Protein and Auction data sets. Legend: A = Auction, P = Protein, S = Shakespeare; 1 = suffix path query, 2 = path query, 3 = XPath query.)

Scalability
(Figure: the performance of D-labeling, Split and Push up for the suffix path query.)

Conclusion
- The P-labeling scheme is proposed to evaluate suffix path queries efficiently.
- BLAS combines P-labeling and D-labeling to evaluate XPath queries.
- BLAS is more efficient because the queries translated from XPath queries require:
  - fewer disk accesses
  - fewer joins
- Experiments show the effectiveness of BLAS.

References
[1] J. Clark and S. DeRose. XML Path language (XPath), November
[13] D. DeHaan, D. Toman, M. Consens, and M. T. Ozsu. A comprehensive XQuery to SQL translation using dynamic interval encoding. In Proceedings of SIGMOD,
[26] J.-K. Min, M.-J. Park, and C.-W. Chung. XPRESS: A queriable compression for XML data. In Proceedings of SIGMOD, 2003.

Thank you! Questions?