Presentation is loading. Please wait.

Presentation is loading. Please wait.

SilkRoute: A Framework for Publishing Rational Data in XML

Similar presentations


Presentation on theme: "SilkRoute: A Framework for Publishing Rational Data in XML"— Presentation transcript:

1 SilkRoute: A Framework for Publishing Rational Data in XML
Mary F. Fernandez Yana Kadiyska Dan Suciu Atsuyuki Morishima Wang Chiew Tan ACM Transactions on Database System 2002, volume 27, no. 4

2 Outline Introduction System Architecture XQuery to SQL Translation
XML Publishing Optimization Conclusion

3 Introduction

4 Introduction XML as language for data exchange between applications
Platform neutral Storage independent Semantics rich But most existing data are stored in RDBMS Conversion from RDBMS to XML should be General Selective Efficient

5 Design Goals General Two different data representations
Inherent complex conversion No public XML schema matches exactly relational counterparts Need for mapping multiple XML to same relational source under different situations XML Relational Nested Flat Unnormalized Normalized Public Proprietary

6 Design Goals Selective Efficient XML schema as virtual view
XQuery may only extract parts of the view Only materialize the data fragment needed Efficient Reduce computational overhead as much as possible Exploit optimization and evaluation engines provided by DB

7 Contributions XQuery to SQL Translation
Separate structure of XML output from computation which constructs it Introduce view forest which nodes are mapping from XML to SQL Optimization Problem for XML Publishing XQuery is mapped to one or more SQL Selecting the most efficient set

8 System Architecture

9 System Architecture 3-tier architecture Backend database for storage
SilkRoute as middleware Third party application (Client)

10 System Architecture SilkRoute transforms relational schema to canonical view forest

11 System Architecture Administrator uses XQuery to extract public XML view over canonical view, this gives public view forest

12 System Architecture Client application uses XQuery to data in interest, this gives application view forest

13 System Architecture Planner inside SilkRoute gets application view forest to produce SQLQuery (which then enters RDBMS) XMLTemplate (which then enters XML Generator)

14 System Architecture Tuples from RDBMS RDBMS is fed into the XMLTemplate inside generator Final XML output is then returned to the client

15 XQuery to SQL Translation

16 View Forest Internal representation of XML, a critical role in SilkRoute Each node carries XML label SQL query fragment For internal nodes, XML label is either element or attribute names; SQL fragment contains FROM and WHERE clauses For leaf nodes, XML label is atomic type; SQL fragment consists of SELECT clause

17 View Forest – SQL Association
A complete SQL query Cn is associated with each node n SELECT – if n is leaf node, use its SELECT clause; otherwise SELECT * FROM – concat all FROM clauses of n and its ancestor WHERE– concat all WHERE clauses of n and its ancestor

18 An Example To view product code, call CN1.2.6.1.1= SELECT p.code
FROM Clothing c, Problems p WHERE c.category = “outerwear” AND p.pid = c.pid

19 VFCA View Forest Composition Algorithm
Translate XQuery into view forest Used in view forest composer Operate on a subset of XQuery syntax, XQueryCore, which includes everything but NOT Recursive functions and operators (eg “before” , “after”, “is”, “isnot” Functions depending on XML document order

20 Q(V1, …, Vm) = VQ(I) for any I of the schema
VFCA Suppose I – instance of relational schema V(I) – view forest corresponding to the instance I Q(V(I)) – XQuery over the view forest V(I), the result return from this query is again a view forest, denoted by VQ(I) Input: An XQueryCore expression Q over view forests V1, …, Vm Output: A view forest VQ = VFCA(V1, …, Vm) such that Q(V1, …, Vm) = VQ(I) for any I of the schema

21 VFCA Traditional XQuery Evaluation VFCA
Inner XQuery is evaluated first to return results for outer XQuery Producing a number of intermediate XML outputs VFCA Only view forest is produced, eliminating intermediate results Data extraction is done once using the set of SQL realized from forest

22 XML Publishing Optimization

23 View Forest Evaluation
Steps in evaluating view forest Planner adds a key to each SQL fragment Planner partitions each view tree into one or more subtrees Each subtree is realized to a SQL The set of all SQL is submitted to relational engine XML generator consumes tuple streams and construct one single integrated relation XML output is produced by nesting tuples integrated relation and tagging each element

24 View Forest Evaluation

25 View Forest Evaluation
The set of SQL emitted from view forest evaluation is called a plan Given a view forest , there can be many different plans (exponential to the number of edges) Two extreme cases Emit a SQL for each node (fully partitioned strategy) Emit a SQL for each tree in the forest (unified strategy)

26 View Forest Evaluation
LEFT OUTER JOIN to connect a parent and child nodes’ SQL UNION to bind together sibling nodes’ SQL ORDER BY is needed to allow XML generator takes only one single pass over the tuple streams The tree can also be split into connected components SQL is then generated for each component

27 View Forest Evaluation
SELECT 1 AS L1, c.pid, L2, (c.price * Q.discount) as sale, Q.code, Q.comments FROM Clothing c LEFT OUTER JOIN ((SELECT 1 AS L2, d.pid AS pid, d.discount AS discount, null AS code, null AS comments FROM Discount d) UNION (SELECT 2 AS L2, p.pid AS pid, null AS discount, p.code AS code, p.comments AS comments FROM Problems p)) AS Q ON c.pid = Q.pid ORDER BY L1, c.pid, L2, Q.code

28 View Forest Evaluation
SELECT 1 as L1, c.pid, (d.discount * c.price) FROM Cloothing c LEFT OUTER JOIN (SELECT d.pid, d.discount FROM Discount d) ON c.pid = d.pid ORDER BY c.pid SELECT 2 as L1, c.pid, p.comments FROM Clothing c, Problems p WHERE c.pid = p.pid ORDER BY c.pid, p.code

29 Choosing Efficient Plan
As there exists exponential number of plans, SilkRoute resorts to heuristcs UNION and OUTER JOIN are costly operations Only choosing edges whose two associated queries are less expensive to evaluate together than separately Cost Difference= cost(qc) – (cost(q1) + cost(q2)) Where qc is the combined query of parent and child nodes And q1 and q2 are the parent query and child query respectively

30 Choosing Efficient Plan
Seek for cost estimates from relational source engine The estimate cost of a query q is obtained by cost(q, a, b) = a * evaluation_cost(q) + b * data_size(q) AND data_size(q) = f(|attrs(q)| * cardinality(q)) Relational engine provides evaluation_cost and cardinality a and b are weights

31 Results

32 Conclusion Proprietary commercial products have prospered in this field DB2 SQL Server Oracle Native XML database SilkRoute – free!! (work with MySQL/PostgreSQL)

33 Thank you!!


Download ppt "SilkRoute: A Framework for Publishing Rational Data in XML"

Similar presentations


Ads by Google