A Framework for Using Materialized XPath Views in XML Query Processing Dapeng He Wei Jin.


1 A Framework for Using Materialized XPath Views in XML Query Processing Dapeng He Wei Jin

2 Introduction  XML languages, such as XQuery, XSLT and SQL/XML, employ XPath as the search and extraction language. XPath expressions often define complicated navigation, resulting in expensive query processing, especially when executed over large collections of documents. As a result, optimization of XPath expressions is vital to efficiently process XML queries.  This paper proposes a framework for exploiting materialized XPath views to process XML queries. It develops an XPath matching algorithm to determine when such views can be used to answer a user query containing XPath expressions.

3 Introduction  There are two main problems associated with answering XML queries using materialized XPath views. First, an XPath query containment check is required to make sure that a view can be used to answer a query. Second, a compensation expression needs to be constructed that computes the query result from the information available in the view.

4 Introduction  We address the XPath query containment problem with an XPath matching algorithm. The containment problem was shown to be coNP-complete for a restricted subset of XPath. We propose an efficient polynomial-time matching algorithm which is sound and works in most practical cases.  The algorithm is based on the observation that a total node mapping from view nodes to query nodes implies containment for conjunctive XPath expressions. We build on the same observation, but extend it to a more functional subset of XPath that includes value predicates, disjunction and the axes allowed in XQuery.

5 XPath Matching Algorithm  Here we present an algorithm to decide if a given XPath view can be utilized in a user query. The algorithm finds tree mappings between the view and the query expression trees, and records them in a match structure. If a mapping exists then the view can potentially be used to evaluate the XPath expression in the user query.  In the remainder of this presentation, we first introduce our XPath representation, then describe the basic algorithm, followed by an extension to handle comparison predicates.

6 XPath Representation  We represent XPath expressions as labeled binary trees, called XPS trees. An XPS node is labeled with its axis and test, where axis is either the special "root", or one of the 6 axes allowed in XQuery: "child", "descendant", "self", "attribute", "descendant-or-self", or "parent". The test is either a name test, a wildcard test, or a kind test.  The first child of an XPS node is called predicate, and it can be a conjunction (and), a disjunction (or), a comparison operator (<, ≤, >, ≥, =, ≠, eq, ne, lt, le, gt, ge), a constant, or an XPath Step (XPS) node. The second child, called next, points to the next step, and is always an XPS node.
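A minimal sketch of the node layout described on this slide. The slides mention a Java implementation; Python is used here only for brevity, and the class and field names (`Xps`, `Op`, `Const`, `steps`) are illustrative assumptions rather than names from the original work.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass
class Const:
    value: Union[int, float, str]          # constant operand, e.g. 130

@dataclass
class Op:
    op: str                                # "and", "or", or a comparison such as ">"
    left: "Node"
    right: "Node"

@dataclass
class Xps:
    axis: str                              # "root", "child", "descendant", "attribute", ...
    test: str                              # name test, "*" wildcard, or kind test
    predicate: Optional["Node"] = None     # first child
    next: Optional["Xps"] = None           # second child: the next step

Node = Union[Xps, Op, Const]

def steps(node):
    """Yield the chain of steps reached by following `next`."""
    while node is not None:
        yield node
        node = node.next

# Q1 = //order/lineitem[@price and discount]
q1 = Xps("root", "root", next=Xps("descendant", "order", next=Xps(
    "child", "lineitem",
    predicate=Op("and", Xps("attribute", "price"), Xps("child", "discount")))))

print([f"{s.axis}::{s.test}" for s in steps(q1)])
```

Operator nodes are given explicit `left` and `right` operands here for readability; the slides instead reuse the predicate/next slots of a single node type.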

7 Examples of XPath and XPS Tree

8 XPS Tree Construction  To meet the specific needs of XPS tree construction, we define the XPS node structure, with Axis, Test, and Sequence Number fields, in Java from scratch, without any auxiliary tools. The same node structure is also used to express predicates: a conjunction (and), a disjunction (or), a comparison operator (<, ≤, >, ≥, =, ≠, eq, ne, lt, le, gt, ge), or a constant.  To deal with complex XPath expressions, we parse the expression recursively and build subtrees that can represent complicated predicate conditions. For example, the predicate of an XPath step may contain a nested XPath expression, or multiple conjunction, disjunction or comparison operators may appear in a predicate condition.

9 Example of XPS Tree Structure  View = //order[lineitem/@price>130 and @count>100 and itemNum=10]  XPS tree nodes in the order listed on the slide, written as (axis, test, sequence number): (root, root, 1), (descendant, order, 2), (predicate, AND, 0), (predicate, >, 0), (child, lineitem, 5), (attribute, price, 6), (predicate, 130, 0), (predicate, AND, 0), (predicate, >, 0), (attribute, count, 11), (predicate, 100, 0), (predicate, =, 0), (child, itemNum, 15), (predicate, 10, 0).  This example shows how multiple conjunctions and a predicate containing a nested XPath expression are handled.

10 Example of XPS Tree Structure  View = //order[@price>150 and discount[count>10 and itemNum=100] and ordeNum=101]  XPS tree nodes in the order listed on the slide, written as (axis, test, sequence number): (root, root, 1), (descendant, order, 2), (predicate, AND, 0), (predicate, >, 0), (attribute, price, 5), (predicate, 150, 0), (predicate, AND, 0), (child, discount, 9), (predicate, AND, 0), (predicate, >, 0), (child, count, 12), (predicate, 10, 0), (predicate, =, 0), (child, itemNum, 16), (predicate, 100, 0), (predicate, =, 0), (child, ordeNum, 21), (predicate, 101, 0).  This example shows how a nested predicate condition and multiple "and" operators are handled.
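The recursive construction described on the preceding slides can be sketched as a small recursive-descent parser. This is an illustration in Python, not the authors' Java code: it covers only `/`, `//`, `@name`, `*`, bracketed predicates, `and`, `or`, and comparisons against integer constants, and the dictionary layout and function names are assumptions. Repeated `and` operators nest to the left here, which may differ from the nesting shown on the slides.

```python
import re

TOKEN = re.compile(r"\s*(//|/|\[|\]|@|\*|<=|>=|!=|<|>|=|\d+|[A-Za-z_][\w-]*)")

def tokenize(text):
    text, tokens, pos = text.strip(), [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"unexpected input: {text[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

class Parser:
    def __init__(self, text):
        self.toks, self.i = tokenize(text), 0

    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else None

    def take(self):
        tok = self.peek()
        self.i += 1
        return tok

    def path(self):
        axis = "child"                       # relative paths start with a child step
        if self.peek() in ("/", "//"):
            axis = "descendant" if self.take() == "//" else "child"
        head = cur = self.step(axis)
        while self.peek() in ("/", "//"):
            axis = "descendant" if self.take() == "//" else "child"
            cur["next"] = self.step(axis)
            cur = cur["next"]
        return head

    def step(self, axis):
        if self.peek() == "@":
            self.take()
            axis = "attribute"
        node = {"axis": axis, "test": self.take(), "predicate": None, "next": None}
        if self.peek() == "[":
            self.take()
            node["predicate"] = self.expr()  # recursion: predicates may nest paths
            if self.take() != "]":
                raise ValueError("expected ]")
        return node

    def expr(self):                          # disjunction
        node = self.conj()
        while self.peek() == "or":
            self.take()
            node = {"op": "or", "left": node, "right": self.conj()}
        return node

    def conj(self):                          # conjunction
        node = self.cmp()
        while self.peek() == "and":
            self.take()
            node = {"op": "and", "left": node, "right": self.cmp()}
        return node

    def cmp(self):                           # optional comparison
        left = self.operand()
        if self.peek() in ("<", "<=", ">", ">=", "=", "!="):
            op = self.take()
            return {"op": op, "left": left, "right": self.operand()}
        return left

    def operand(self):
        tok = self.peek()
        if tok is not None and tok.isdigit():
            return {"const": int(self.take())}
        return self.path()                   # nested relative XPath expression

def parse(xpath):
    p = Parser(xpath)
    root = {"axis": "root", "test": "root", "predicate": None, "next": p.path()}
    if p.peek() is not None:
        raise ValueError(f"trailing input: {p.peek()!r}")
    return root

tree = parse("//order[lineitem/@price>130 and @count>100 and itemNum=10]")
print(tree["next"]["test"], tree["next"]["predicate"]["op"])
```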

11 Basic Matching Algorithm  The algorithm described here traverses both the view and the query expression trees and computes all possible mappings from XPS nodes of the view to XPS nodes of the query expression, in a single top-down pass of the view tree.  The table below summarizes the basic algorithm in terms of the four functions used. Every function of the table evaluates to Boolean. The algorithm is invoked by the initial call matchStep(v.root, q.root), and there exists a match if this call evaluates to true. The first rule whose condition is satisfied is fired for each function.

12 Basic Matching Algorithm

13  Using this algorithm, we can handle the situation where the query expression is more restrictive than the view definition.  For example, the view V = //*[@*], which contains all XML element nodes that have an attribute, can be used to evaluate Q1 = //order/lineitem[@price and discount] as shown in the figure. Dotted lines denote the mapping.  Rule 1.2 says that if a view node v maps to a node in one disjunct of a query predicate, then v must also map to some node in the other disjunct. For example, the same V cannot be used to evaluate the expression Q = //order/lineitem[@price or price], which asks for lineitem nodes that have either a price attribute or a price element: the view step @* maps to @price in the first disjunct but to nothing in the second.

14 Basic Matching Algorithm  When the view node contains a "descendant" axis, we need to keep looking for matches further down the tree, even if the current query expression node matches (rule 1.3). For example, in Figure 2, we will try to map XPS2 (//*) to XPS5 (//order), XPS6 (/lineitem), and XPS9 (/discount).

15 Basic Matching Algorithm
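The rule table itself is not part of this transcript, so the following is only a simplified sketch of the underlying idea (a total mapping from view nodes to query nodes), not the four-function algorithm of the slides. It assumes conjunctive expressions only, the axes root/child/descendant/attribute, and name or wildcard tests; `Step`, `name_ok`, `candidates` and `match` are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    axis: str                                  # "root", "child", "descendant" or "attribute"
    test: str                                  # a name, or "*" for the wildcard
    kids: list = field(default_factory=list)   # predicate and next steps, all conjunctive

def name_ok(v, q):
    """A view test matches if it is a wildcard or the same name."""
    return v.test == "*" or v.test == q.test

def descendants(q):
    """Element steps strictly below q; attribute steps are skipped."""
    for k in q.kids:
        if k.axis != "attribute":
            yield k
            yield from descendants(k)

def candidates(vk, q):
    """Query nodes that view step vk may map to when its parent maps to q."""
    if vk.axis == "descendant":
        return descendants(q)                  # keep looking further down the tree
    return (k for k in q.kids if k.axis == vk.axis)

def match(v, q):
    """True if the view subtree rooted at v can be mapped into the query with v -> q."""
    return name_ok(v, q) and all(
        any(match(vk, qc) for qc in candidates(vk, q)) for vk in v.kids)

# V = //*[@*]   and   Q1 = //order/lineitem[@price and discount]
V = Step("root", "root", [Step("descendant", "*", [Step("attribute", "*")])])
Q1 = Step("root", "root", [Step("descendant", "order", [
    Step("child", "lineitem", [Step("attribute", "price"), Step("child", "discount")])])])
print(match(V, Q1))
```

In this sketch `//*` fails to map to `//order` (no attribute step below it) but succeeds on `/lineitem`, which is why candidates below the first matching node are still tried.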

16 Recording the Match  Why do we need to record matches? The basic matching algorithm may generate an exponential number of tree mappings. Example: for View: //a//a…//a and Query: /a/a…/a there can be exponentially many distinct tree mappings. Much of this information is redundant: the matchStep() function would be called multiple times with the same parameters.
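A toy model of this blow-up, under the assumption that the view has k steps `//a`, the query has n steps `/a`, and each view step may map to any query step strictly below the previous one. In this model the number of mappings is the binomial coefficient C(n, k); the function names are hypothetical. The second function records each pair of positions once, which is the idea behind recording matches.

```python
from math import comb

def count_mappings(k, n):
    """Count order-preserving mappings of k view steps onto n query steps, no recording."""
    calls = 0
    def go(i, j):                 # view steps i.. unmapped, query steps j.. available
        nonlocal calls
        calls += 1
        if i == k:
            return 1
        return sum(go(i + 1, t + 1) for t in range(j, n))
    return go(0, 0), calls

def exists_mapping(k, n):
    """Same search, but each (i, j) pair is evaluated once and its result recorded."""
    seen = {}
    def go(i, j):
        if (i, j) not in seen:
            seen[(i, j)] = i == k or any(go(i + 1, t + 1) for t in range(j, n))
        return seen[(i, j)]
    return go(0, 0), len(seen)

total, calls = count_mappings(10, 20)
found, cells = exists_mapping(10, 20)
print(total, calls, found, cells)
```

The recorded version can never evaluate more than (k + 1) * (n + 1) distinct pairs, whereas the unrecorded version makes at least one call per mapping.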

17 Recording the Match  What to record? A match matrix structure with one axis for the XPS nodes of the query and the other for the XPS nodes of the view. Each cell corresponds to a pair of view and query XPS tree nodes, with possible values "empty", "true", "false".  Directed edges between cells represent the context of a mapping: an edge (i,j) → (k,l) means that matchStep(v_k, q_l) was called from matchStep(v_i, q_j). The cells and edges form a DAG (Directed Acyclic Graph), because the matching process is top-down.

18 Recording the Match  Benefits: it reduces the run time to polynomial, and it makes it possible to handle comparison predicates.

19 Example of Match Matrix  view = //order/*[@price]; query = //order[LineItem/@price].  View tree: root (1), //order (2), /* (3), @price (4).  Query tree: root (5), //order (6), /LineItem (7), @price (8).

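20 Example of Match Matrix  Rows are view (V) nodes and columns are query (Q) nodes: root 5, //order 6, /LineItem 7, @price 8. Recorded cells, row by row: root 1: True; //order 2: True, False; /* 3: True; @price 4: True. All other cells are empty.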
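A sketch of how such a matrix and its edges could be filled in for this example. It is a simplified matcher (conjunctive expressions, name and wildcard tests only), not the rule table of the slides, and all names are hypothetical. Every candidate is evaluated even after one succeeds, mirroring the "keep looking" behaviour described for descendant steps.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    num: int                                   # sequence number used as matrix index
    axis: str
    test: str
    kids: list = field(default_factory=list)

def descendants(q):
    for k in q.kids:
        if k.axis != "attribute":
            yield k
            yield from descendants(k)

def candidates(vk, q):
    if vk.axis == "descendant":
        return list(descendants(q))
    return [k for k in q.kids if k.axis == vk.axis]

def match(v, q, matrix, edges):
    """Fill matrix[(view num, query num)] with True/False; missing keys mean 'empty'."""
    key = (v.num, q.num)
    if key not in matrix:                      # each pair is evaluated only once
        ok = v.test in ("*", q.test)
        if ok:
            for vk in v.kids:
                found = False
                for qc in candidates(vk, q):
                    edges.add((key, (vk.num, qc.num)))   # records the calling context
                    found = match(vk, qc, matrix, edges) or found
                ok = ok and found
        matrix[key] = ok
    return matrix[key]

view = Step(1, "root", "root", [Step(2, "descendant", "order", [
    Step(3, "child", "*", [Step(4, "attribute", "price")])])])
query = Step(5, "root", "root", [Step(6, "descendant", "order", [
    Step(7, "child", "LineItem", [Step(8, "attribute", "price")])])])

matrix, edges = {}, set()
print(match(view, query, matrix, edges))
print(sorted(matrix.items()))
```

For this input the sketch records the same five cells as the slide: four true cells along the mapping and one false cell for //order (2) against /LineItem (7).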

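21 Handling Comparison Predicates  Comparison predicates have the format L op R, where L and R can each be either an XPS node or a constant, and op can be <, <=, >, >=, =.  Some logical constraints apply. For V = //order/*[@price > 60] and Q = //order[lineitem/@price > 30], the view cannot be used to answer the query, because the view does not contain the nodes with a price between 30 and 60.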

22  Example: V = //order/*[@price > 60]. View tree: root (1), //order (2), /* (3), > (4), @price (5), 60 (6).

23 Handling Comparison Predicates  Two types of comparisons: local predicates, of the form n op constant (e.g. @price > 60), and intra-document joins, of the form n op m (e.g. @price > @salary).  Normalization: for a local predicate, replace the comparison operator with the sub-tree from n and add the comparison to the filter list; for an intra-document join, replace the comparison operator with "and".

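24  Example of a local predicate: V = //order/*[@price > 60]. View tree before normalization: root (1), //order (2), /* (3), > (4), @price (5), 60 (6). After normalization the comparison node > (4) is replaced by the sub-tree @price (5), and the filter ("5", ">", "60") is recorded.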

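25 Handling Comparison Predicates  Example of an intra-document join: V = //order/*[@price > @salary]. View tree before normalization: root (1), //order (2), /* (3), > (4), @price (5), @salary (6). After normalization the comparison node > (4) is replaced by AND (4) over @price (5) and @salary (6), and the filter ("5", ">", "6") is recorded.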
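The two normalization cases can be sketched as one recursive function. This is an illustrative Python sketch with an assumed dictionary layout (`id`, `op`, `left`, `right`, `const`); it assumes each comparison operand is a single step or a constant on the right-hand side, and it records the filter for both cases, as the two examples above do.

```python
COMPARISONS = {"<", "<=", ">", ">=", "=", "!="}

def normalize(node, filters):
    """Return the predicate with comparisons removed; append them to `filters`."""
    if "op" not in node:
        return node                                    # a step or a constant: unchanged
    left, right = node["left"], node["right"]
    if node["op"] in COMPARISONS:
        if "const" in right:                           # local predicate: n op constant
            filters.append((left["id"], node["op"], right["const"]))
            return left                                # keep only the sub-tree from n
        filters.append((left["id"], node["op"], right["id"]))   # intra-document join
        return {"op": "and", "left": left, "right": right}
    # "and" / "or": normalize both operands
    return {"op": node["op"],
            "left": normalize(left, filters),
            "right": normalize(right, filters)}

price = {"id": 5, "axis": "attribute", "test": "price"}
salary = {"id": 6, "axis": "attribute", "test": "salary"}

local_filters = []
local = normalize({"op": ">", "left": price, "right": {"id": 6, "const": 60}}, local_filters)

join_filters = []
join = normalize({"op": ">", "left": price, "right": salary}, join_filters)

print(local_filters, join_filters)
```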

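26 Handling Comparison Predicates  Restriction check for local predicates: the query predicate must be at least as restrictive as the view predicate. V: …@price>60… Q: …@price>40… fails the restriction check, because the query needs prices between 40 and 60 that the view does not contain.  Restriction check for intra-document joins: V: …salary <= bonus[christmas] Q: …salary and bonus[christmas] fails the restriction check, because the query does not impose the join condition that the view does.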
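One possible way to implement the restriction check for local predicates with numeric constants is an interval comparison. This is an assumption made for illustration, not necessarily how the original system does it: each predicate `x op c` is turned into an interval, and the check passes when the query interval lies inside the view interval. The names `interval` and `implies` are hypothetical.

```python
import math

def interval(op, c):
    """Values x satisfying 'x op c' as (low, low_closed, high, high_closed)."""
    if op == ">":
        return (c, False, math.inf, False)
    if op == ">=":
        return (c, True, math.inf, False)
    if op == "<":
        return (-math.inf, False, c, False)
    if op == "<=":
        return (-math.inf, False, c, True)
    if op == "=":
        return (c, True, c, True)
    raise ValueError(f"unsupported operator: {op}")

def implies(query, view):
    """True if every value accepted by the query predicate is accepted by the view's."""
    ql, qlc, qh, qhc = interval(*query)
    vl, vlc, vh, vhc = interval(*view)
    low_ok = ql > vl or (ql == vl and (vlc or not qlc))
    high_ok = qh < vh or (qh == vh and (vhc or not qhc))
    return low_ok and high_ok

# V: @price > 60, Q: @price > 40  -> the check fails
print(implies((">", 40), (">", 60)))
# V: @price > 60, Q: @price > 80  -> the check passes
print(implies((">", 80), (">", 60)))
```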

27 Matching Intra-document Joins  Clean-up for intra-document joins: remove all dangling edges, i.e. edges whose source or target matrix cell is not set to true. Remove orphan node matches: matrix cells with value true that do not have at least one incoming edge are set to false.
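A sketch of the two clean-up steps on a dictionary of cells and a set of edges. Two points are assumptions not stated on the slide: the cell for the pair of roots is exempt from the orphan rule (it never has an incoming edge), and the two steps are repeated until nothing changes, since removing an orphan can leave new dangling edges. The function name is hypothetical.

```python
def clean_up(cells, edges, root):
    """cells: {(i, j): bool}; edges: {((i, j), (k, l))}; root: the root-pair cell."""
    cells, edges = dict(cells), set(edges)
    changed = True
    while changed:
        changed = False
        # 1. Drop dangling edges: source or target cell is not set to true.
        dangling = {(s, t) for (s, t) in edges if not (cells.get(s) and cells.get(t))}
        if dangling:
            edges -= dangling
            changed = True
        # 2. Orphan matches: true cells without an incoming edge become false.
        has_incoming = {t for (_, t) in edges}
        for cell, value in cells.items():
            if value and cell != root and cell not in has_incoming:
                cells[cell] = False
                changed = True
    return cells, edges

cells = {(1, 1): True, (2, 2): True, (2, 5): False, (3, 3): True, (3, 4): True}
edges = {((1, 1), (2, 2)), ((2, 2), (3, 3)), ((2, 5), (3, 4))}
cleaned_cells, cleaned_edges = clean_up(cells, edges, root=(1, 1))
print(cleaned_cells[(3, 4)], len(cleaned_edges))
```

In this toy input the cell (3, 4) was only reached from the false cell (2, 5), so its edge is dangling and the cell itself becomes an orphan.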

28 Matching Intra-document Joins  Clean-up example: V = //a[@b > @c]; Q = //a[@b > @c]/a[@b and @c].  View tree: root (1), //a (2), AND (3), @b (4), @c (5); filter: (4, >, 5).  Query tree: root (6), //a (7), AND (8), @b (9), @c (10), /a (11), AND (12), @b (13), @c (14); filter: (9, >, 10).

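29 Matching Intra-document Joins  Clean-up example, continued. Match matrix: rows are view (V) nodes and columns are query (Q) nodes: root 6, //a 7, @b 9, @c 10, /a 11, @b 13, @c 14. Recorded cells, row by row: root 1: T; //a 2: T, T; @b 4: T, T; @c 5: T, T.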

30 Complexity of the Algorithm  The size of the match matrix is O(|V| * |Q|), where |V| and |Q| are the numbers of XPS nodes in the view and query expressions respectively.  The number of edges in the DAG is O(|V| * |Q|^2): each matrix cell can have at most |Q| incoming edges, because by construction an edge (i, j) → (l, k) may exist only if v_i is the parent of v_l. With O(|V| * |Q|) cells, the number of edges in the DAG is therefore O(|V| * |Q|^2).

31 Complexity of the Algorithm  The cost of constructing the matrix is also polynomial. The matchStep function has only |V| * |Q| distinct sets of parameters, and by definition of a match matrix the same pair of nodes cannot be matched more than once. In the worst case (rule 1.3) a function call may expand into |Q| function calls. Thus the algorithm runs in O(|V| * |Q|^2) time.

32 References  Andrey Balmin, Fatma Özcan, Kevin S. Beyer, Roberta J. Cochrane, and Hamid Pirahesh. A Framework for Using Materialized XPath Views in XML Query Processing. IBM Almaden Research Center, San Jose, CA.  S. Chaudhuri, R. Krishnamurthy, S. Potamianos, and K. Shim. Optimizing queries with materialized views. In Proceedings of ICDE, pages 190-200, 1995.  A. Deutsch and V. Tannen. Containment and integrity constraints for XPath. In Proceedings of KRDB, 2001.  J. Goldstein and P. Larson. Optimizing queries using materialized views: A practical, scalable solution. In Proceedings of SIGMOD, Santa Barbara, CA, 2001.

33 References  A. Y. Levy, A. O. Mendelzon, Y. Sagiv, and D. Srivastava. Answering queries using views. In Proceedings of PODS, pages 95-104, 1995.  G. Miklau and D. Suciu. Containment and equivalence for an XPath fragment. In Proceedings of PODS, pages 65-76, 2002.

34 Questions? & Thank you

