BLAS: An Efficient XPath Processing System Zhimin Song Advanced Database System Professor: Dr. Mengchi Liu.


Outline
• Introduction
• BLAS System
• Experimental Results
• Conclusions

[Figure 1: Sample XML protein repository. Visible element values include "cytochrome c [validated]", "cytochrome c", "Evans, M.J.", "2001" and "The human somatic cytochrome c gene …".]

Introduction
• XML has a complex, tree-like structure of nodes.
• Languages for querying XML are based on path navigation (XPath [1]):
  – from a given node to its child nodes (child axis, /)
  – from a given node to its descendant nodes (descendant axis, //)
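To see the two axes side by side, here is a small sketch using Python's standard xml.etree.ElementTree module, whose findall() accepts a limited subset of XPath. The XML fragment is hypothetical: its element nesting is loosely modeled on Figure 1 and is an assumption, not the actual repository.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment, shaped loosely like the protein repository of Figure 1.
DOC = """
<proteinDatabase>
  <proteinEntry>
    <protein>
      <name>cytochrome c [validated]</name>
      <classification><superfamily>cytochrome c</superfamily></classification>
    </protein>
    <reference>
      <refinfo>
        <authors><author>Evans, M.J.</author></authors>
        <year>2001</year>
      </refinfo>
    </reference>
  </proteinEntry>
</proteinDatabase>
"""

root = ET.fromstring(DOC.strip())  # root is the <proteinDatabase> element

# Child axis: every step follows exactly one parent-child edge.
child_hits = [e.text for e in root.findall("proteinEntry/protein/name")]

# The child axis only reaches direct children, so this finds nothing.
no_hits = root.findall("proteinEntry/superfamily")

# Descendant axis: .//superfamily matches at any depth below the root.
desc_hits = [e.text for e in root.findall(".//superfamily")]

print(child_hits)  # ['cytochrome c [validated]']
print(no_hits)     # []
print(desc_hits)   # ['cytochrome c']
```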

Introduction (cont.)
• Several techniques have already been proposed to speed up XPath processing. For example, D-labeling efficiently handles descendant-axis traversal.
• But what about complex queries that include child axes and branches?
• For these, this paper proposes P-labeling, which optimizes an important class of queries called suffix path queries.

BLAS (Bi-LAbeling based System)
• Basic definitions
• The labeling scheme (index generator)
• Query translator

• Basic definitions:
  – BLAS is a system for efficiently processing complex queries based on D-labeling and P-labeling.
  – BLAS handles a subset of XPath queries consisting of: child axis navigation ( / ), descendant axis navigation ( // ) and branches ( [...] ).
  – The evaluation of a path expression P, written [P], returns the set of nodes in an XML tree T that are reachable by P starting from the root of T.
  – Since P can be evaluated to retrieve a set of XML nodes, we use "path expression" and "query" interchangeably.
  – $P \subseteq Q$ if and only if $[P] \subseteq [Q]$.
  – $P \cap Q = \emptyset$ if and only if $[P] \cap [Q] = \emptyset$.

• Basic definitions (cont.):
  – Suffix path expression: a path expression P that optionally begins with a descendant-axis step (//), followed by zero or more child-axis steps (/). Examples: //protein/name and /proteinDatabase/proteinEntry/protein/name.
  – SP(n): the unique simple path P from the root to the node n.
  – Evaluating a suffix path expression Q therefore means finding all nodes n such that $SP(n) \subseteq Q$.
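For a single simple path, the test SP(n) ⊆ Q boils down to comparing tag sequences: a query starting with // accepts any SP(n) that ends with the query's tags, and a query starting with / must equal SP(n). The sketch below assumes plain tag-only paths (no wildcards, branches or value tests), and sp_contained_in is a hypothetical name. This string check only illustrates the definition; BLAS itself answers such queries with P-label intervals.

```python
def sp_contained_in(sp: str, q: str) -> bool:
    """True if the simple path sp (e.g. '/a/b/c') satisfies SP(n) ⊆ Q for the
    suffix path query q (e.g. '//b/c', '/a/b/c' or '//')."""
    sp_tags = [t for t in sp.split("/") if t]
    if q.startswith("//"):
        q_tags = [t for t in q[2:].split("/") if t]
        # '//t1/.../tk' accepts every simple path ending with t1/.../tk;
        # '//' on its own accepts every path.
        k = len(q_tags)
        return k == 0 or (k <= len(sp_tags) and sp_tags[-k:] == q_tags)
    # Without a leading '//', q is itself a simple path: containment is equality.
    return sp_tags == [t for t in q.split("/") if t]


SP = "/proteinDatabase/proteinEntry/protein/name"
print(sp_contained_in(SP, "//protein/name"))  # True
print(sp_contained_in(SP, "//refinfo/name"))  # False
print(sp_contained_in(SP, "/proteinDatabase/proteinEntry/protein/name"))  # True
print(sp_contained_in(SP, "/protein/name"))   # False
```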

Architecture of BLAS
[Figure: BLAS architecture. Index generator: the XML document is read by a SAX parser whose events feed the P-labeling generator and the D-labeling generator; a data loader writes the P-labelings, D-labelings and data values to storage. Query translator: an XPath query goes through query decomposition into suffix path queries (plus the ancestor-descendant relationships between their results), a subquery generator based on P-labeling, and subquery composition based on D-labeling; the resulting query is run by the query engine, which returns the query result.]

• The labeling scheme (index generator)
  – D-labeling scheme: each XML node n gets a triplet <d1, d2, d3>, where d1 and d2 are the positions of n's start tag and end tag (so n.d1 <= n.d2), and d3 is the level of n in the XML tree, i.e. the length of the path from the root to n.
  – For two nodes n and m, m is a descendant of n if and only if $n.d_1 < m.d_1$ and $n.d_2 > m.d_2$.
  – m is a child of n if and only if m is a descendant of n and $n.d_3 + 1 = m.d_3$.
  – A D-label will be represented as <start, end, level>.
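A D-label generator fits naturally into a single SAX pass, matching the SAX parser in the architecture diagram. The sketch below is one assumed way to do it with Python's xml.sax: positions come from one counter bumped at every start and end tag (tag ordinals, not byte offsets), the root is placed at level 1, and DLabel, DLabeler, is_descendant and is_child are hypothetical names.

```python
import xml.sax
from typing import NamedTuple


class DLabel(NamedTuple):
    tag: str
    start: int  # d1: position of the start tag (tag ordinal, an assumption)
    end: int    # d2: position of the end tag
    level: int  # d3: depth in the tree (root = 1 here, an assumption)


class DLabeler(xml.sax.ContentHandler):
    """Computes <start, end, level> for every element in one SAX pass."""

    def __init__(self):
        super().__init__()
        self.pos = 0      # one counter shared by start and end tags
        self.open = []    # (tag, start, level) of elements not yet closed
        self.labels = []  # finished DLabels, in end-tag order

    def startElement(self, name, attrs):
        self.pos += 1
        self.open.append((name, self.pos, len(self.open) + 1))

    def endElement(self, name):
        self.pos += 1
        tag, start, level = self.open.pop()
        self.labels.append(DLabel(tag, start, self.pos, level))


def is_descendant(m: DLabel, n: DLabel) -> bool:
    # n's interval strictly encloses m's interval.
    return n.start < m.start and n.end > m.end


def is_child(m: DLabel, n: DLabel) -> bool:
    return is_descendant(m, n) and n.level + 1 == m.level


handler = DLabeler()
xml.sax.parseString(b"<a><b><c/></b><d/></a>", handler)
by_tag = {lab.tag: lab for lab in handler.labels}
print(by_tag["c"])                             # DLabel(tag='c', start=3, end=4, level=3)
print(is_descendant(by_tag["c"], by_tag["a"]))  # True
print(is_child(by_tag["c"], by_tag["a"]))       # False
print(is_child(by_tag["c"], by_tag["b"]))       # True
```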

Example: using D-labeling
[Figure: the example query tree, with nodes proteinDatabase, proteinEntry, protein, // superfamily = "cytochrome c", reference, refinfo, // author = "Evans, M.J.", year = "2001" and title.]
• Query: //proteinDatabase//refinfo
• First retrieve all the nodes reachable by //proteinDatabase and by //refinfo.
• Let pDB and refinfo be two relations that store these nodes, then D-join them:
    SELECT pDB.start, pDB.end, refinfo.start, refinfo.end
    FROM pDB, refinfo
    WHERE pDB.start < refinfo.start AND pDB.end > refinfo.end
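The D-join can be tried out on an in-memory SQLite database. The relation names pDB and refinfo follow the slide; the D-label values are made up, with one refinfo row deliberately placed outside the proteinDatabase interval so the WHERE clause has something to reject. The start and end columns are quoted because END is an SQL keyword.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# "start"/"end" are quoted because END is an SQL keyword.
con.execute('CREATE TABLE pDB ("start" INTEGER, "end" INTEGER)')
con.execute('CREATE TABLE refinfo ("start" INTEGER, "end" INTEGER)')

# Made-up D-labels: one proteinDatabase node and three refinfo nodes;
# the last refinfo row lies outside pDB's interval on purpose.
con.execute("INSERT INTO pDB VALUES (1, 100)")
con.executemany("INSERT INTO refinfo VALUES (?, ?)", [(10, 20), (40, 55), (120, 130)])

# D-join: keep pairs where the pDB interval strictly encloses the refinfo interval.
rows = con.execute(
    'SELECT pDB."start", pDB."end", refinfo."start", refinfo."end" '
    "FROM pDB, refinfo "
    'WHERE pDB."start" < refinfo."start" AND pDB."end" > refinfo."end" '
    'ORDER BY refinfo."start"'
).fetchall()
print(rows)  # [(1, 100, 10, 20), (1, 100, 40, 55)]
```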

P-labeling scheme
• It is also important to implement child-axis navigation efficiently, e.g. /proteinDatabase/proteinEntry/protein/name; with D-labels alone, every / step costs another D-join.
• Target: improve the evaluation of "/".
• Focus on suffix path queries, e.g. //protein/name.

• Assign each node a number, and each suffix path an interval [p1, p2], such that:
  – for any two suffix paths Q1 and Q2, $Q_1 \subseteq Q_2$ if and only if $Q_2.p_1 \le Q_1.p_1$ and $Q_1.p_2 \le Q_2.p_2$;
  – a node n is contained in the suffix path Q if and only if $Q.p_1 \le SP(n).p_1 \le Q.p_2$.
• Let Q be a suffix path query and define $n.plabel = SP(n).p_1$. Then $[Q] = \{\, n \mid Q.p_1 \le n.plabel \le Q.p_2 \,\}$.
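Because [Q] is defined by a single interval, a suffix path query becomes one range selection over stored P-labels, with no joins. This sketch assumes a hypothetical node(id, tag, plabel) table in SQLite and a made-up interval for Q; evaluate_suffix_path is a hypothetical name. With the index on plabel, the selection can be served by an index range scan.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical schema: each node stores its P-label, n.plabel = SP(n).p1.
con.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, tag TEXT, plabel INTEGER)")
con.execute("CREATE INDEX node_plabel ON node (plabel)")
con.executemany(
    "INSERT INTO node (id, tag, plabel) VALUES (?, ?, ?)",
    [(1, "name", 403), (2, "name", 420), (3, "name", 250), (4, "title", 610)],
)


def evaluate_suffix_path(p1: int, p2: int) -> list:
    """[Q] = { n | Q.p1 <= n.plabel <= Q.p2 }: one range selection, no joins."""
    rows = con.execute(
        "SELECT id FROM node WHERE plabel BETWEEN ? AND ? ORDER BY id", (p1, p2)
    )
    return [r[0] for r in rows]


# Made-up interval standing in for some query Q with Q.p1 = 400, Q.p2 = 499.
print(evaluate_suffix_path(400, 499))  # [1, 2]
```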

P-labeling construction (algorithm)
• Suppose there are n distinct tags t1, t2, ..., tn. Assign "/" a ratio r0 and each tag ti a ratio ri such that $r_0 + r_1 + \dots + r_n = 1$; here all ratios are equal, $r_i = 1/(n+1)$.
• Define the domain of the numbers in a P-label to be the integers in [0, m-1], where m is chosen such that $m \ge (n+1)^h$ and h is the length of the longest path in the XML tree.
• Algorithm:
  – The path // is assigned the interval (P-label) [0, m-1].
  – Partition this interval in tag order: the path / gets the share r0 and each path //ti gets the share ri.
  – That is, the interval is allocated to "/" and to each ti so that $p_1/m = r_0$ and $(p_{i+1} - p_i)/m = r_i$, where [p_i, p_{i+1} - 1] is the interval of //ti.

P-labeling construction (example): m = 10^12, 99 tags, so ri = 0.01; query //protein/name
[Figure: nested P-label intervals. // owns [0, 10^12). It is split into / = [0, 10^10), //proteinDatabase = [10^10, 2·10^10), //proteinEntry = [2·10^10, 3·10^10), //protein = [3·10^10, 4·10^10), //name = [4·10^10, 5·10^10), and so on. Within //name: /name = [4·10^10, 4.01·10^10), //proteinDatabase/name = [4.01·10^10, 4.02·10^10), //proteinEntry/name = [4.02·10^10, 4.03·10^10), //protein/name = [4.03·10^10, 4.04·10^10), and so on. Within //protein/name: /protein/name = [4.03·10^10, 4.0301·10^10).]
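The construction can be sketched in a few lines under these assumptions: all ratios equal 1/(n+1), the share for / comes first, tags follow the order of a hypothetical tags list, and the partition recurses as the example suggests (the interval of //X is split into /X followed by //t1/X, ..., //tn/X). Integer division keeps every label an integer, and the zero-width check stands in for choosing m large enough. plabel_interval is a hypothetical name.

```python
def plabel_interval(path, tags, m):
    """Return the P-label interval (p1, p2) of a suffix path such as
    '//protein/name' or '/proteinDatabase/proteinEntry/protein/name'.

    Sketch of the equal-ratio construction: every ratio is 1/(n+1), the '/'
    share comes first, then the tags in the order given by `tags`."""
    n = len(tags)
    rank = {t: i + 1 for i, t in enumerate(tags)}  # t1 -> 1, ..., tn -> n
    absolute = not path.startswith("//")
    steps = [s for s in path.split("/") if s]
    lo, width = 0, m  # '//' owns [0, m-1]
    # //protein/name lies inside //name, so refine from the last step backwards.
    for tag in reversed(steps):
        width //= n + 1
        if width == 0:
            raise ValueError("m is too small for a path this long")
        lo += rank[tag] * width  # skip the '/' share and the shares of t1..t(i-1)
    if absolute:
        # A path starting with a single '/' takes the leading '/' share.
        width //= n + 1
        if width == 0:
            raise ValueError("m is too small for a path this long")
    return lo, lo + width - 1


# Hypothetical tag order: the four tags of the example, then 95 placeholders.
TAGS = ["proteinDatabase", "proteinEntry", "protein", "name"] + [f"t{i}" for i in range(5, 100)]
M = 10**12
print(plabel_interval("//protein/name", TAGS, M))  # (40300000000, 40399999999)
print(plabel_interval("/protein/name", TAGS, M))   # (40300000000, 40300999999)
plabel = plabel_interval("/proteinDatabase/proteinEntry/protein/name", TAGS, M)[0]
print(plabel)  # 40302010000
```

The node on the simple path /proteinDatabase/proteinEntry/protein/name gets the P-label 40302010000, which falls inside the interval of //protein/name, so a single range selection finds it.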

• Query translator: translates an input XPath query into standard SQL.
  – Query decomposition: splits the query into a set of suffix path queries and records the ancestor-descendant relationships between them.
  – SQL generation: computes the P-label of each suffix path query and generates a corresponding SQL subquery.
  – SQL composition: combines the subqueries into a single SQL query based on D-labeling and the recorded ancestor-descendant relationships.
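The generation and composition steps can be mimicked by a small string builder: each suffix path query contributes a P-label range condition, and each recorded ancestor-descendant pair contributes a D-join condition. Everything here is an assumption for illustration: the node table layout, the alias names, the compose_sql helper and the made-up rows. Values are inlined with f-strings only because they are integers produced by the labeler; parameter binding would be the safer general choice.

```python
import sqlite3


def compose_sql(subqueries, ad_edges, output):
    """Compose one SQL query from suffix path subqueries (a sketch).

    subqueries: alias -> (p1, p2), the P-label interval of one suffix path query
    ad_edges:   (ancestor_alias, descendant_alias) pairs recorded by decomposition
    output:     alias whose nodes form the answer
    Assumes a hypothetical table node(plabel, "start", "end", level)."""
    from_clause = ", ".join(f"node AS {a}" for a in subqueries)
    # SQL generation: one P-label range selection per suffix path query.
    conds = [f"{a}.plabel BETWEEN {p1} AND {p2}" for a, (p1, p2) in subqueries.items()]
    # SQL composition: D-join every recorded ancestor-descendant pair.
    conds += [f'{x}."start" < {y}."start" AND {x}."end" > {y}."end"' for x, y in ad_edges]
    return (f'SELECT DISTINCT {output}."start" FROM {from_clause} '
            f"WHERE {' AND '.join(conds)}")


con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE node (plabel INTEGER, "start" INTEGER, "end" INTEGER, level INTEGER)')
# Made-up rows: (plabel, start, end, level).
con.executemany("INSERT INTO node VALUES (?, ?, ?, ?)",
                [(12, 1, 10, 1), (33, 2, 3, 2), (35, 20, 21, 1), (50, 4, 5, 2)])

sql = compose_sql({"q1": (10, 19), "q2": (30, 39)}, [("q1", "q2")], "q2")
result = [row[0] for row in con.execute(sql)]
print(result)  # [2]
```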

Split algorithm: D-elimination(query tree Q)
• Traverse Q depth-first and split every descendant edge: p//q → p and //q.
• If a resulting subquery still contains branches, invoke B-elimination on it; otherwise, evaluate it directly using P-labels.
• Join the intermediate results by their D-labels.
[Figure: the example query tree is cut at its two // edges into Q1, rooted at proteinDatabase (with proteinEntry, protein, reference, refinfo, title and year = "2001"), and Q2 and Q3, the //superfamily = "cytochrome c" and //author = "Evans, M.J." subqueries.]
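The splitting step of D-elimination can be sketched over a simple query-tree type. QNode, d_eliminate and the example tree are hypothetical, and the tree is only an assumption about the shape of the example query in the figure. The sketch cuts the // edges and records which pieces must be D-joined back; B-elimination and P-label evaluation of the pieces are left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class QNode:
    tag: str
    axis: str = "/"              # axis on the edge from the parent: "/" or "//"
    value: Optional[str] = None  # optional value test, e.g. year = "2001"
    children: List["QNode"] = field(default_factory=list)


def d_eliminate(root: QNode) -> Tuple[List[QNode], List[Tuple[QNode, int]]]:
    """Cut every '//' edge depth-first, turning p//q into p and //q.

    Returns the subquery roots and (p_node, piece_index) pairs whose results
    must later be D-joined (p_node ancestor, piece descendant).
    The tree is modified in place."""
    pieces: List[QNode] = [root]
    joins: List[Tuple[QNode, int]] = []

    def visit(node: QNode) -> None:
        kept = []
        for child in node.children:
            if child.axis == "//":
                pieces.append(child)  # //q becomes its own subquery
                joins.append((node, len(pieces) - 1))
            else:
                kept.append(child)
            visit(child)
        node.children = kept

    visit(root)
    return pieces, joins


Q = QNode("proteinDatabase", children=[
    QNode("proteinEntry", children=[
        QNode("protein", children=[QNode("superfamily", "//", "cytochrome c")]),
        QNode("reference", children=[
            QNode("refinfo", children=[
                QNode("author", "//", "Evans, M.J."),
                QNode("year", value="2001"),
                QNode("title"),
            ]),
        ]),
    ]),
])
pieces, joins = d_eliminate(Q)
print([p.tag for p in pieces])         # ['proteinDatabase', 'superfamily', 'author']
print([(n.tag, i) for n, i in joins])  # [('protein', 1), ('refinfo', 2)]
```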

B-elimination(query tree Q1)
• p[q1, q2, ..., qi]/r → p, //q1, //q2, ..., //qi, //r
[Figure: Q1 (proteinDatabase, proteinEntry, protein, reference, refinfo, title, year = "2001") before and after B-elimination; cutting the branches at proteinEntry yields the subqueries Q4, Q5 and Q6.]

B-elimination (cont.)
[Figure: the branch at refinfo is eliminated in the same way, leaving the subqueries Q4, Q5, Q7, Q8 and Q9.]
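B-elimination can be sketched the same way: wherever a node branches, all of its children are detached as // subqueries and the branch point is remembered for a later D-join. QNode, b_eliminate and chain are hypothetical names, and recording each branch's original axis (so the join can also test n.d3 + 1 = m.d3 for former / edges) is a design choice of this sketch, not something the slides specify. On a Q1-like tree it yields five branch-free pieces.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class QNode:
    tag: str
    axis: str = "/"              # axis on the edge from the parent
    value: Optional[str] = None  # optional value test
    children: List["QNode"] = field(default_factory=list)


def b_eliminate(root: QNode) -> Tuple[List[QNode], List[Tuple[QNode, int, str]]]:
    """Apply p[q1,...,qi]/r -> p, //q1, ..., //qi, //r at every branching node.

    Returns branch-free pieces and (branch_node, piece_index, original_axis)
    triples; original_axis == "/" means the later D-join should also check
    the child condition (level difference of 1). Mutates the tree in place."""
    pieces: List[QNode] = [root]
    joins: List[Tuple[QNode, int, str]] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if len(node.children) > 1:
            branches, node.children = node.children, []
            for child in branches:
                joins.append((node, len(pieces), child.axis))
                child.axis = "//"  # the branch becomes a // subquery
                pieces.append(child)
            stack.extend(reversed(branches))  # visit branches left to right
        else:
            stack.extend(node.children)
    return pieces, joins


def chain(node: QNode) -> str:
    """Render a branch-free piece as a path string."""
    parts = []
    while True:
        parts.append(node.axis + node.tag + (f'="{node.value}"' if node.value is not None else ""))
        if not node.children:
            return "".join(parts)
        node = node.children[0]


Q1 = QNode("proteinDatabase", children=[
    QNode("proteinEntry", children=[
        QNode("protein"),
        QNode("reference", children=[
            QNode("refinfo", children=[QNode("year", value="2001"), QNode("title")]),
        ]),
    ]),
])
pieces, joins = b_eliminate(Q1)
for p in pieces:
    print(chain(p))
```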

Push-up algorithm: an optimization of branch elimination (B-elimination)
• Instead, split p[q1, q2, ..., qi]/r → p, p/q1, p/q2, ..., p/qi, p/r.
• Since p/qi and p/r are more specific than //qi and //r, $[p/q_i] \subseteq [//q_i]$ and $[p/r] \subseteq [//r]$, so these subqueries return fewer nodes to join.
[Figure: with push-up, the subqueries of Q1 (including Q4 and Q5) keep their full prefix from proteinDatabase, e.g. proteinDatabase/proteinEntry/protein and proteinDatabase/proteinEntry/reference/refinfo/year = "2001".]
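Push-up differs from the B-elimination rule only in what each branch is prefixed with: the full path p of its branching node instead of //. Here is a minimal sketch over a hypothetical QNode tree that builds path strings; value tests on the prefix are left out of the copied prefix (an assumption of this sketch), and the joins between pieces are omitted for brevity. step and push_up are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class QNode:
    tag: str
    axis: str = "/"
    value: Optional[str] = None
    children: List["QNode"] = field(default_factory=list)


def step(n: QNode) -> str:
    return n.axis + n.tag + (f'="{n.value}"' if n.value is not None else "")


def push_up(root: QNode) -> List[str]:
    """p[q1,...,qi]/r -> p, p/q1, ..., p/qi, p/r: every branch keeps the full
    path p of its branching node as a prefix instead of starting with //."""
    pieces: List[str] = []

    def walk(node: QNode, piece: str, prefix: str) -> None:
        # piece:  the subquery being built, ending at node
        # prefix: the structural root-to-node path (value tests omitted)
        if len(node.children) == 1:
            c = node.children[0]
            walk(c, piece + step(c), prefix + c.axis + c.tag)
        else:
            pieces.append(piece)  # a leaf or a branching node ends the piece
            for c in node.children:
                walk(c, prefix + step(c), prefix + c.axis + c.tag)

    walk(root, step(root), root.axis + root.tag)
    return pieces


Q1 = QNode("proteinDatabase", children=[
    QNode("proteinEntry", children=[
        QNode("protein"),
        QNode("reference", children=[
            QNode("refinfo", children=[QNode("year", value="2001"), QNode("title")]),
        ]),
    ]),
])
pieces = push_up(Q1)
for p in pieces:
    print(p)
```

Every piece now starts with a single /, so each is a simple path whose P-label interval is much narrower than that of the corresponding // subquery.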

Unfold algorithm: a further optimization of descendant-axis elimination (D-elimination)
• Example: Q2 = /ProteinDatabase/ProteinEntry/protein//superfamily = "cytochrome c" is unfolded into Q21 = /ProteinDatabase/ProteinEntry/protein/classification/superfamily = "cytochrome c".
• In general, p//q → p/r1/q, p/r2/q, ..., p/ri/q, where r1, ..., ri are the possible intermediate paths between p and q.
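Unfolding needs to know which paths r1, ..., ri can actually occur between p and q. The slides do not say where that information comes from; this sketch assumes a hypothetical summary of all distinct root-to-node simple paths in the data, and unfold is a hypothetical name. An empty ri (q directly below p) is kept as p/q, since // also covers the child case. The value test (= "cytochrome c" in the example) would be applied unchanged to each unfolded path.

```python
from typing import Iterable, List


def unfold(p: str, q: str, simple_paths: Iterable[str]) -> List[str]:
    """p//q -> p/r1/q, p/r2/q, ..., p/ri/q.

    p is an absolute path such as '/a/b', q a single tag. The ri are read off
    `simple_paths`, a hypothetical summary of every distinct root-to-node path
    in the data; an empty ri (q directly below p) yields p/q."""
    prefix = p.rstrip("/") + "/"
    out = set()
    for sp in simple_paths:
        if sp.startswith(prefix) and sp.endswith("/" + q):
            middle = sp[len(prefix):len(sp) - len(q) - 1]  # r, possibly several tags
            out.add(f"{p}/{middle}/{q}" if middle else f"{p}/{q}")
    return sorted(out)


# Hypothetical path summary for the protein repository.
PATHS = {
    "/ProteinDatabase",
    "/ProteinDatabase/ProteinEntry",
    "/ProteinDatabase/ProteinEntry/protein",
    "/ProteinDatabase/ProteinEntry/protein/name",
    "/ProteinDatabase/ProteinEntry/protein/classification",
    "/ProteinDatabase/ProteinEntry/protein/classification/superfamily",
    "/ProteinDatabase/ProteinEntry/reference",
}
print(unfold("/ProteinDatabase/ProteinEntry/protein", "superfamily", PATHS))
# ['/ProteinDatabase/ProteinEntry/protein/classification/superfamily']
```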

Experimental Results
• Data sets
• Query sets: suffix path queries, path queries, XPath queries
• Query engine: RDBMS or file system

Query Execution Time
[Figure: query time for the Shakespeare, Protein and Auction data sets. Legend: A = Auction, P = Protein, S = Shakespeare; 1 = suffix path query, 2 = path query, 3 = XPath query.]

Scalability
[Figure: performance of D-labeling, Split and Push up for the suffix path query.]

Conclusion
• A P-labeling scheme is proposed to evaluate suffix path queries efficiently.
• BLAS combines P-labeling and D-labeling to evaluate XPath queries.
• BLAS is more efficient because the SQL queries translated from XPath queries require fewer disk accesses and fewer joins.
• Experiments show the effectiveness of BLAS.

References
[1] J. Clark and S. DeRose. XML Path Language (XPath) Version 1.0. W3C Recommendation, November 1999. http://www.w3.org/TR/xpath
[13] D. DeHaan, D. Toman, M. Consens, and M. T. Özsu. A comprehensive XQuery to SQL translation using dynamic interval encoding. In Proceedings of SIGMOD, 2003.
[26] J.-K. Min, M.-J. Park, and C.-W. Chung. XPRESS: A queriable compression for XML data. In Proceedings of SIGMOD, 2003.

Thank you! Questions?
