Beyond the Rainbow: A Pot of Gold ala XML Database Projects. WPI DSRG Group. 2003, DSRG, Worcester Polytechnic Institute.


1 Beyond the Rainbow: A Pot of Gold ala XML Database Projects (WPI DSRG Group)

2 Motivation
XML is new, and here to stay …
- Universal, flexible representation of data
- De facto standard for information exchange
XQuery is useful, and here to stay …
- Powerful query language for XML
- De facto standard for XML querying
Plenty of relevant new issues …

3 XML Paradigm
[Diagram: sources XML 1, XML 2, XML 3, RDB 4, XML 5, RDBL 6, ..., XML n connected over the Internet through EVE-Middleware]
WWW: a global-scale distributed information system for sharing data.
XML queries and updates:
- searching
- querying
- integrating
- restructuring
- updating

4 What We Aim For…
[Diagram: sources XML 1, XML 2, RDB 3, XML 4, RDB 5, XML 6, ..., XML n connected over the Internet through EVE-Middleware]
XML data management middleware technology:
- efficient
- flexible
- scalable
- lightweight
- resource-sensitive
- adaptive

5 WPI Project Directions
- RAINBOW (exploiting RDB for XML management): algebraic XQuery processing
- XCube (flexible XML mapping tool): flexible loading of XML into, and extraction from, an RDB via XQuery
- Updating Virtual XML Views: update decomposition and trigger propagation
- MASS (native XML query engine): multi-axis, compressed, order-preserving XML storage

6 WPI Project Directions (continued)
- XCache (XML query caching): cache containment and query rewriting
- Materialized XML View Maintenance: incremental algebraic maintenance strategy
- SAXE (XML incremental updating and evolution): lightweight updating by update query rewriting
- RAINDROP (XQuery-based stream processing): adaptive on-the-fly multi-subscription optimization

7 THE RAINBOW PROJECT

8 XML Meets Relational DBs
XML:
1) Emerging web standard
2) Flexible data representation
3) Powerful query language
+
Relational database:
1) Widely used to store business data
2) Efficient, reliable, secure DBMS
3) Mature query processing techniques
Result: the look and feel of an XML query system, with the maturity and technology support of an RDB.

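9 Running Example
Relational tables:
  book:    Title              | Bid
           TCP/IP Illustrated | 001
           Data on the Web    | 002
  prices:  Price | Bid
           65.95 | 001
           34.95 | 002
Default XML view (dxv.xml): one row per tuple, i.e. prices rows (001, 65.95) and (002, 34.95), and book rows (001, TCP/IP Illustrated) and (002, Data on the Web).
View query over the default view, defining the virtual view prices.xml:
  FOR $book IN document("dxv.xml")/book/row,
      $prices IN document("dxv.xml")/prices/row
  WHERE $book/bid = $prices/bid
  RETURN $book/title, $prices/price
Virtual view content: (TCP/IP Illustrated, 65.95), (Data on the Web, 34.95)
User query over the virtual view:
  FOR $t IN document("prices.xml")/book/title
  RETURN $t
User query result: TCP/IP Illustrated, Data on the Web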
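The running example can be mimicked in a few lines of Python to make the data flow concrete. This is only a sketch: the function names `view_query` and `user_query` are hypothetical, and plain lists of dictionaries stand in for the relational tables and the virtual XML view.

```python
# Relational tables from the running example (values taken from the slide).
book = [
    {"bid": "001", "title": "TCP/IP Illustrated"},
    {"bid": "002", "title": "Data on the Web"},
]
prices = [
    {"bid": "001", "price": "65.95"},
    {"bid": "002", "price": "34.95"},
]


def view_query(book_rows, price_rows):
    """Mimic the view query: pair book and prices rows that share a bid."""
    return [
        {"title": b["title"], "price": p["price"]}
        for b in book_rows
        for p in price_rows
        if b["bid"] == p["bid"]
    ]


def user_query(view_rows):
    """Mimic the user query: return every title found in the virtual view."""
    return [row["title"] for row in view_rows]


virtual_view = view_query(book, prices)
titles = user_query(virtual_view)
print(titles)
```

The user query only touches titles, which is why the later optimization steps can drop the price navigation and the join altogether.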

10 XML Default View
Fixed and straightforward mapping scheme.
[Figure: relational rows (Paperback, Texas Holdem', David Sklansky, Straight Flush; Paperback, Dracula, Bram Stoker; …) mapped to the XML default view]
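A fixed table-to-XML mapping of this kind might look like the sketch below. It assumes one element per table, one `row` element per tuple, and one child element per column, which matches the paths `/book/row` and `$book/bid` used in the running example. The wrapper element `DB` and the function name `default_xml_view` are assumptions made for illustration, not part of the slides.

```python
import xml.etree.ElementTree as ET


def default_xml_view(tables, root_tag="DB"):
    """Build a default XML view: table element -> row elements -> column elements.

    root_tag is a hypothetical wrapper so that several tables fit in one document.
    """
    root = ET.Element(root_tag)
    for table_name, rows in tables.items():
        table_el = ET.SubElement(root, table_name)
        for row in rows:
            row_el = ET.SubElement(table_el, "row")
            for column, value in row.items():
                ET.SubElement(row_el, column).text = str(value)
    return root


tables = {
    "book": [
        {"bid": "001", "title": "TCP/IP Illustrated"},
        {"bid": "002", "title": "Data on the Web"},
    ],
    "prices": [
        {"bid": "001", "price": "65.95"},
        {"bid": "002", "price": "34.95"},
    ],
}

view = default_xml_view(tables)
xml_text = ET.tostring(view, encoding="unicode")
print(xml_text)
```

Because the mapping never depends on the table contents, the same function works for any relational schema, which is what makes the default view "fixed".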

11 Generic Loading
  FUNCTION Q1($root){
    LET $maintag := gettag($root)
    RETURN
      FOR $actual IN $root/*
      LET $innertag := gettag($actual)
      RETURN
        IF ($actual/element()) THEN Q1($actual)
        ELSE IF ($actual/text()) THEN
        ELSE ""
  }
Knowing the schema of the XML document to be loaded helps remove the unnecessary parts.
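The recursion pattern of the generic loader can be sketched in Python as follows. The slide's element constructors did not survive, so the output shape used here, a list of `(parent tag, tag, text)` tuples, is an assumption, as are the function name `generic_load` and the sample tags `BOOKS` and `TITLE`.

```python
import xml.etree.ElementTree as ET


def generic_load(element):
    """Walk any XML tree recursively and emit one tuple per text-bearing leaf."""
    rows = []
    for child in element:
        if len(child) > 0:
            # The child has element children: recurse, like Q1($actual).
            rows.extend(generic_load(child))
        elif child.text and child.text.strip():
            # The child holds text: emit a tuple (assumed output shape).
            rows.append((element.tag, child.tag, child.text.strip()))
        # Otherwise nothing is emitted, like the final ELSE "" branch.
    return rows


doc = ET.fromstring(
    "<BOOKS><BOOK><TITLE>Dracula</TITLE>"
    "<AUTHOR><NAME>Bram Stoker</NAME></AUTHOR></BOOK></BOOKS>"
)
rows = generic_load(doc)
print(rows)
```

The function needs no schema knowledge, but it visits every element and tests every branch, which is the overhead the instantiation step is meant to remove.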

12 Instantiation
[Diagram: XML Schema and XQuery expression (recursive) -> Instantiator -> XQuery expression (flat)]
The generic loading XQuery expression is recursive.
+ It works for every XML document.
- Many recursive calls return no value.
- Unnecessary FOR loops, IF clauses, and getName() function calls.

13 Instantiation (Example)
Instantiated loading query:
  FUNCTION Q1($root){
    FOR $book IN $root/BOOK
    RETURN
      FOR $name IN $book/AUTHOR/NAME
      RETURN
  }
Short, non-recursive, more efficient … but XML-schema dependent!
(First step of the CLOCK mapping scheme.)
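For comparison with the recursive loader, a schema-specific version reduces to two fixed loops. The sketch below assumes the `BOOK/AUTHOR/NAME` structure named on the slide; the function name `instantiated_load`, the wrapper tag `BOOKS`, and the returned list of names are illustrative assumptions, since the slide's RETURN clause was lost.

```python
import xml.etree.ElementTree as ET


def instantiated_load(root):
    """Flat, schema-dependent loader: only visits BOOK/AUTHOR/NAME paths."""
    return [
        name.text
        for book in root.findall("BOOK")          # FOR $book IN $root/BOOK
        for name in book.findall("AUTHOR/NAME")   # FOR $name IN $book/AUTHOR/NAME
    ]


doc = ET.fromstring(
    "<BOOKS>"
    "<BOOK><AUTHOR><NAME>David Sklansky</NAME></AUTHOR></BOOK>"
    "<BOOK><AUTHOR><NAME>Bram Stoker</NAME></AUTHOR></BOOK>"
    "</BOOKS>"
)
names = instantiated_load(doc)
print(names)
```

There is no recursion and no tag inspection, but the function silently returns nothing for a document with a different schema.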

14 Flexible Mapping Management
[Diagram labels: XML, XML', Relation, Relation', XQuery (Load), XQuery (Extract), RDB Default View, RDB Default View Reverser; mappings F, f, G, g, H; steps 1 and 2]

15 XCube in a Nutshell
- Easy to use (no new transformation language).
- Flexible (interchangeable XQuery expressions).
- Adaptable (to workload, data specifics, …).
- General (schema independent).
- Extendable (with new mapping schemes).
- Tunable (loading manager).
1. Generic XQuery loading expressions
2. XQuery load expression instantiation

16 Architecture
XAT: XML Algebra Tree.
[Diagram components: XAT Generator, XAT Decorrelator, XAT Merger, XAT Optimizer, SQL Generator, XAT Executor, RDBMS. Inputs: User XQuery, View XQuery. Intermediate artifacts: User XAT, View XAT, merged XAT, SQL, tuples. Output: user query results in XML. Data labels: Virtual XML Document, XML Document.]

17 XQuery-Level Optimization
- XAT: XML Algebra Tree model
- XAT algebraic query plan optimization
- XAT query plan reduction

18 User Query: User XML Algebra Tree (XAT)
  FOR $t IN document("prices.xml")/book/title
  RETURN $t
XAT nodes (operator, inputs -> output column):
  1: Expose col3
  2: T (Tagger) $t -> col3
  3: Agg
  6: Navigate R0, book/title -> $t
  7: S (Source) "prices.xml" -> R0

19 View Query: View XML Algebra Tree (XAT)
  FOR $book IN document("dxv.xml")/book/row,
      $prices IN document("dxv.xml")/prices/row
  WHERE $book/bid = $prices/bid
  RETURN $book/title, $prices/price
XAT nodes (operator, inputs -> output column):
  11: T (Tagger) col5 -> col4
  12: Agg
  22: T (Tagger) [col10][col12] -> col5
  23: Navigate $book, title -> col10
  25: Navigate $prices, price -> col12
  26: Select col6=col7
  27: Navigate $book, bid -> col6
  28: Navigate $prices, bid -> col7
  31: Cartesian product
  14: Navigate R1, /book/row -> $book
  15: S (Source) "dxv.xml" -> R1
  20: Navigate R3, /prices/row -> $prices
  21: S (Source) "dxv.xml" -> R3

20 Merged XML Algebra Tree (XAT)
User query nodes:
  1: Expose col3
  2: T (Tagger) $t -> col3
  3: Agg
  6: Navigate R0, book/title -> $t
  7: (operator symbol lost) col4 -> R0, replacing the source of "prices.xml"
View query nodes:
  11: T (Tagger) col5 -> col4
  12: Agg
  22: T (Tagger) [col10][col12] -> col5
  23: Navigate $book, title -> col10
  25: Navigate $prices, price -> col12
  26: Select col6=col7
  27: Navigate $book, bid -> col6
  28: Navigate $prices, bid -> col7
  31: Cartesian product
  14: Navigate R1, /book/row -> $book
  15: S (Source) "dxv.xml" -> R1
  20: Navigate R3, /prices/row -> $prices
  21: S (Source) "dxv.xml" -> R3

21 XQuery-Level Optimization
- XML algebra representation: XAT
- XAT query plan rewriting
- XAT query plan reduction

22 XAT Rewrite
Query optimization at the logical algebra level.
Goals:
- Redundancy elimination.
- Computation pushdown.
Technique: equivalence rewrite rules.
Heuristics:
- Push down navigations.
- Remove construction of intermediate results.
- Combine multiple operators.

23 Before Navigation Pushdown
[Diagram: the merged XAT of slide 20, with user query nodes 1, 2, 3, 6, 7 on top of view query nodes 11, 12, 22, 23, 25, 26, 27, 28, 31, 14, 15, 20, 21]

24 After Navigation Pushdown
User query nodes:
  1: Expose col3
  2: T (Tagger) $t -> col3
  3: Agg
  6: Navigate R0, book/title -> $t
View query nodes:
  11: T (Tagger) col5 -> R0
  12: Agg
  22: T (Tagger) [col10][col12] -> col5
  26: Select col6=col7
  31: Cartesian product
  Book branch:
    23: Navigate $book, title -> col10
    27: Navigate $book, bid -> col6
    14: Navigate R1, /book/row -> $book
    15: S (Source) "dxv.xml" -> R1
  Prices branch:
    25: Navigate $prices, price -> col12
    28: Navigate $prices, bid -> col7
    20: Navigate R3, /prices/row -> $prices
    21: S (Source) "dxv.xml" -> R3

25 Remove any Taggers?
[Diagram: the same XAT as after navigation pushdown (slide 24); the candidates are the view query Taggers 11 and 22 and the user query navigation 6 that reads their output]

26 After Tagger Cancel Out
User query nodes:
  1: Expose col3
  2: T (Tagger) $t -> col3
  3: Agg
View query nodes:
  26: Select col6=col7
  31: Cartesian product
  Book branch:
    23: Navigate $book, title -> $t
    27: Navigate $book, bid -> col6
    14: Navigate R1, /book/row -> $book
    15: S (Source) "dxv.xml" -> R1
  Prices branch:
    25: Navigate $prices, price -> col12
    28: Navigate $prices, bid -> col7
    20: Navigate R3, /prices/row -> $prices
    21: S (Source) "dxv.xml" -> R3

27 After Making Join
User query nodes:
  1: Expose col3
  2: T (Tagger) $t -> col3
  3: Agg
View query nodes:
  31: JOIN col6=col7
  Book branch:
    23: Navigate $book, title -> $t
    27: Navigate $book, bid -> col6
    14: Navigate R1, /book/row -> $book
    15: S (Source) "dxv.xml" -> R1
  Prices branch:
    25: Navigate $prices, price -> col12
    28: Navigate $prices, bid -> col7
    20: Navigate R3, /prices/row -> $prices
    21: S (Source) "dxv.xml" -> R3
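The "making join" step is an instance of the "combine multiple operators" heuristic. A minimal sketch of such an equivalence rewrite rule is shown below, assuming the plan is a small immutable tree. The `Op` class, the operator names `Select` and `Product`, and the function `make_join` are illustrative assumptions rather than Rainbow's actual data structures.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Op:
    """A hypothetical algebra tree node: operator name, arguments, children."""
    name: str
    args: str = ""
    children: Tuple["Op", ...] = ()


def make_join(op):
    """Rewrite Select(cond) over Product(a, b) into Join(cond)(a, b), bottom-up."""
    children = tuple(make_join(child) for child in op.children)
    if op.name == "Select" and len(children) == 1 and children[0].name == "Product":
        return Op("Join", op.args, children[0].children)
    return Op(op.name, op.args, children)


plan = Op("Agg", "", (
    Op("Select", "col6=col7", (
        Op("Product", "", (
            Op("Navigate", "$book, bid -> col6"),
            Op("Navigate", "$prices, bid -> col7"),
        )),
    )),
))
rewritten = make_join(plan)
print(rewritten.children[0].name, rewritten.children[0].args)
```

The rule fires only when the selection sits directly on top of the product, so unrelated parts of the plan are copied unchanged.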

28 XQuery-Level Optimization
- XML algebra representation: XAT
- XAT query plan rewriting
- XAT query plan reduction

29 XAT Cleanup
Why: the SQL engine cannot reduce redundancy in XQuery.
How:
- Data redundancy is removed by schema cleanup.
  - Each operator produces, consumes, and modifies some columns.
  - The minimum schema is then computed.
- Tree redundancy is removed by unused-operator cutting.
  - Cutting matrix generation.
  - Required columns analysis.
  - Operator cutting.

30 XAT Operator Properties
- Produced: new columns generated by the operator. Examples: S, T, and one operator whose symbol is not recoverable.
- Consumed: columns required by the operator. Examples: two operators whose symbols are not recoverable.
- Modified: columns modified by the operator. Examples: three operators whose symbols are not recoverable.

31 Schema Computation
Node | Parent | Produced  | Consumed     | Old Schema
1    |        | {}        | {col3}       | {col3, R1, $book, col6, $t, R3, $prices, col7, col12}
2    | 1      | {col3}    | {$t}         | {col3, R1, $book, col6, $t, R3, $prices, col7, col12}
3    | 2      | {}        |              | {R1, $book, col6, $t, R3, $prices, col7, col12}
31   | 3      | {}        | {col6, col7} | {R1, $book, col6, $t, R3, $prices, col7, col12}
23   | 31     | {$t}      | {$book}      | {R1, $book, col6, $t}
27   | 23     | {col6}    | {$book}      | {R1, $book, col6}
14   | 27     | {$book}   | {R1}         | {R1, $book}
15   | 14     | {R1}      | {}           | {R1}
25   | 31     | {col12}   | {$prices}    | {R3, $prices, col7, col12}
28   | 25     | {col7}    | {$prices}    | {R3, $prices, col7}
20   | 28     | {$prices} | {R3}         | {R3, $prices}
21   | 20     | {R3}      | {}           | {R3}
[Diagram: the XAT after making join, nodes 1, 2, 3, 31, 23, 27, 14, 15, 25, 28, 20, 21]

32 Schema Computation
Node | Parent | Produced  | Consumed     | Minimum Schema
1    |        | {}        | {col3}       | {col3}
2    | 1      | {col3}    | {$t}         | {col3}
3    | 2      | {}        |              | {$t}
31   | 3      | {}        | {col6, col7} | {$t}
23   | 31     | {$t}      | {$book}      | {col6, $t}
27   | 23     | {col6}    | {$book}      | {$book, col6}
14   | 27     | {$book}   | {R1}         | {$book}
15   | 14     | {R1}      | {}           | {R1}
25   | 31     | {col12}   | {$prices}    | {col7, col12}
28   | 25     | {col7}    | {$prices}    | {$prices, col7}
20   | 28     | {$prices} | {R3}         | {$prices}
21   | 20     | {R3}      | {}           | {R3}
[Diagram: the XAT after making join, nodes 1, 2, 3, 31, 23, 27, 14, 15, 25, 28, 20, 21]

33 Schema Computation
P = column produced by the node, C = column consumed by the node.
#   | Parent() | col3 | $t | col6 | col7 | $book | R1 | col12 | $prices | R3 | New Schema
1   |          | C    |    |      |      |       |    |       |         |    |
2   | 1        | P    | C  |      |      |       |    |       |         |    | {col3}
3   | 2        |      |    |      |      |       |    |       |         |    | {$t}
31* | 3        |      |    | C    | C    |       |    |       |         |    | {$t}
23  | 31       |      | P  |      |      | C     |    |       |         |    | {col6, $t}
27  | 23       |      |    | P    |      | C     |    |       |         |    | {$book, col6}
14  | 27       |      |    |      |      | P     | C  |       |         |    | {$book}
15  | 14       |      |    |      |      |       | P  |       |         |    | {R1}
25  | 31       |      |    |      |      |       |    | P     | C       |    | {col7, col12}
28  | 25       |      |    |      | P    |       |    |       | C       |    | {$prices, col7}
20  | 28       |      |    |      |      |       |    |       | P       | C  | {$prices}
21  | 20       |      |    |      |      |       |    |       |         | P  | {R3}
*We assume the join did not modify $t. Otherwise, only node 25 will be deleted.
Intuition: don't keep anything that is not used later.
[Diagram: the XAT after making join, nodes 1, 2, 3, 31, 23, 27, 14, 15, 25, 28, 20, 21]

34 Schema Cleanup Result
Node | Original Schema                                       | Minimum Schema
1    | {col3, R1, $book, col6, $t, R3, $prices, col7, col12} | {col3}
2    | {col3, R1, $book, col6, $t, R3, $prices, col7, col12} | {col3}
3    | {R1, $book, col6, $t, R3, $prices, col7, col12}       | {$t}
31   | {R1, $book, col6, $t, R3, $prices, col7, col12}       | {$t}
23   | {R1, $book, col6, $t}                                 | {col6, $t}
27   | {R1, $book, col6}                                     | {$book, col6}
14   | {R1, $book}                                           | {$book}
15   | {R1}                                                  | {R1}
25   | {R3, $prices, col7, col12}                            | {col7, col12}
28   | {R3, $prices, col7}                                   | {$prices, col7}
20   | {R3, $prices}                                         | {$prices}
21   | {R3}                                                  | {R3}
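One way to reproduce the two schema columns above is sketched below. The rule used is an assumption reverse-engineered from the tables, not necessarily the authors' algorithm: the original schema is computed bottom-up as everything produced in the subtree, and the minimum schema top-down as the columns still needed above the node plus the node's own produced columns. Node 3's consumed set is blank on the slides and is taken as empty here.

```python
# node id -> (parent id, produced columns, consumed columns), from the slide tables
NODES = {
    1: (None, set(), {"col3"}),
    2: (1, {"col3"}, {"$t"}),
    3: (2, set(), set()),  # consumed is blank on the slide; assumed empty
    31: (3, set(), {"col6", "col7"}),
    23: (31, {"$t"}, {"$book"}),
    27: (23, {"col6"}, {"$book"}),
    14: (27, {"$book"}, {"R1"}),
    15: (14, {"R1"}, set()),
    25: (31, {"col12"}, {"$prices"}),
    28: (25, {"col7"}, {"$prices"}),
    20: (28, {"$prices"}, {"R3"}),
    21: (20, {"R3"}, set()),
}


def children_of(nodes):
    """Invert the parent pointers into a parent -> children map."""
    kids = {}
    for node_id, (parent, _, _) in nodes.items():
        if parent is not None:
            kids.setdefault(parent, []).append(node_id)
    return kids


def root_of(nodes):
    return next(n for n, (parent, _, _) in nodes.items() if parent is None)


def original_schemas(nodes):
    """Bottom-up: a node's schema is everything produced in its subtree."""
    kids, schemas = children_of(nodes), {}

    def visit(node_id):
        schema = set(nodes[node_id][1])
        for child in kids.get(node_id, []):
            schema |= visit(child)
        schemas[node_id] = schema
        return schema

    visit(root_of(nodes))
    return schemas


def minimum_schemas(nodes):
    """Top-down: keep only columns needed above, plus the node's own output."""
    kids, old, minimum = children_of(nodes), original_schemas(nodes), {}

    def visit(node_id, required):
        _, produced, consumed = nodes[node_id]
        minimum[node_id] = (required & old[node_id]) | produced
        needed_below = (minimum[node_id] - produced) | consumed
        for child in kids.get(node_id, []):
            visit(child, needed_below)

    root = root_of(nodes)
    visit(root, set(nodes[root][2]))  # the final result needs what the root consumes
    return minimum


minimum = minimum_schemas(NODES)
print(minimum[23], minimum[25])
```

On the running example this rule yields the same Minimum Schema column as the slide, including the detail that node 25 keeps col12 even though nothing above it uses that column; removing such nodes is left to operator cutting.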

35 XAT Cleanup
- Schema cleanup
  - Each operator produces, consumes, and modifies some columns.
  - The minimum schema is then computed.
- Unused-operator cutting
  - Cutting matrix generation.
  - Required columns analysis.
  - Operator cutting.

36 Cutting Matrix
Purpose: get rid of unused operators.
Equations:
- Propagation of modified columns.
- Propagation of required columns.
- Identification of cuttable nodes.

37 Matrix Computation
P = produced, C = consumed.
#   | Parent() | col3 | $t | col6 | col7 | $book | R1 | col12 | $prices | R3 | Cut?
1   |          | C    |    |      |      |       |    |       |         |    |
2   | 1        | P    | C  |      |      |       |    |       |         |    |
3   | 2        | -    | -  | -    | -    | -     | -  | -     | -       | -  |
31* | 3        |      |    | C    | C    |       |    |       |         |    |
23  | 31       |      | P  |      |      | C     |    |       |         |    |
27  | 23       |      |    | P    |      | C     |    |       |         |    |
14  | 27       |      |    |      |      | P     | C  |       |         |    |
15  | 14       |      |    |      |      |       | P  |       |         |    |
25  | 31       |      |    |      |      |       |    | P     | C       |    |
28  | 25       |      |    |      | P    |       |    |       | C       |    |
20  | 28       |      |    |      |      |       |    |       | P       | C  |
21  | 20       |      |    |      |      |       |    |       |         | P  |
*We assume the join did not modify $t. Otherwise, only node 25 will be deleted.
[Diagram: the XAT after making join, nodes 1, 2, 3, 31, 23, 27, 14, 15, 25, 28, 20, 21]

38 Matrix Computation (Cont. 1)
P = produced, C = consumed, M = modified, R = required.
#   | Parent() | col3 | $t | col6 | col7 | $book | R1 | col12 | $prices | R3 | Cut?
1   |          | R    | R  |      |      | R     | R  |       |         |    |
2   | 1        | P    | C  |      |      |       |    |       |         |    |
3   | 2        | -    | M  | -    | -    | -     | -  | -     | -       | -  |
31* | 3        |      |    | C    | C    |       |    |       |         |    |
23  | 31       |      | P  |      |      | C     |    |       |         |    |
27  | 23       |      |    | P    |      | C     |    |       |         |    |
14  | 27       |      |    |      |      | P     | C  |       |         |    |
15  | 14       |      |    |      |      |       | P  |       |         |    |
25  | 31       |      |    |      |      |       |    | P     | C       |    |
28  | 25       |      |    |      | P    |       |    |       | C       |    |
20  | 28       |      |    |      |      |       |    |       | P       | C  |
21  | 20       |      |    |      |      |       |    |       |         | P  |
*We assume the join did not modify $t. Otherwise, only node 25 will be deleted.
Intuition: give me only the columns required to get the final result.
[Diagram: the XAT after making join, nodes 1, 2, 3, 31, 23, 27, 14, 15, 25, 28, 20, 21]

39 Matrix Computation (Cont. 2)
P = produced, C = consumed, M = modified, R = required, X = node can be cut.
#   | Parent() | col3 | $t | col6 | col7 | $book | R1 | col12 | $prices | R3 | Cut?
1   |          | R    | R  |      |      | R     | R  |       |         |    |
2   | 1        | P    | C  |      |      |       |    |       |         |    |
3   | 2        | -    | M  | -    | -    | -     | -  | -     | -       | -  |
31* | 3        |      |    | C    | C    |       |    |       |         |    | X
23  | 31       |      | P  |      |      | C     |    |       |         |    |
27  | 23       |      |    | P    |      | C     |    |       |         |    | X
14  | 27       |      |    |      |      | P     | C  |       |         |    |
15  | 14       |      |    |      |      |       | P  |       |         |    |
25  | 31       |      |    |      |      |       |    | P     | C       |    | X
28  | 25       |      |    |      | P    |       |    |       | C       |    | X
20  | 28       |      |    |      |      |       |    |       | P       | C  | X
21  | 20       |      |    |      |      |       |    |       |         | P  | X
*We assume the join did not modify $t. Otherwise, only node 25 will be deleted.
[Diagram: the XAT after making join, nodes 1, 2, 3, 31, 23, 27, 14, 15, 25, 28, 20, 21]

40 XAT after Cutting
The 12-node XAT after making join (nodes 1, 2, 3, 31, 23, 27, 14, 15, 25, 28, 20, 21) is reduced to:
  1: Expose col3
  2: T (Tagger) $t -> col3
  3: Agg
  23: Navigate $book, title -> $t
  14: Navigate R1, /book/row -> $book
  15: S (Source) "dxv.xml" -> R1
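The cutting decision can be sketched as a top-down propagation of required columns. The rule below is an assumption chosen because it reproduces the slide's result: a node is kept when it produces or modifies a column that is already required, and only kept nodes add their consumed columns to the required set. The function name `cut_unused_operators` and the `modified` map are illustrative.

```python
# node id -> (parent id, produced columns, consumed columns), from the slide tables
NODES = {
    1: (None, set(), {"col3"}),
    2: (1, {"col3"}, {"$t"}),
    3: (2, set(), set()),
    31: (3, set(), {"col6", "col7"}),
    23: (31, {"$t"}, {"$book"}),
    27: (23, {"col6"}, {"$book"}),
    14: (27, {"$book"}, {"R1"}),
    15: (14, {"R1"}, set()),
    25: (31, {"col12"}, {"$prices"}),
    28: (25, {"col7"}, {"$prices"}),
    20: (28, {"$prices"}, {"R3"}),
    21: (20, {"R3"}, set()),
}


def cut_unused_operators(nodes, modified):
    """Return the sorted ids of the operators that survive cutting.

    modified maps a node id to the columns that operator modifies.
    """
    kids, root = {}, None
    for node_id, (parent, _, _) in nodes.items():
        if parent is None:
            root = node_id
        else:
            kids.setdefault(parent, []).append(node_id)

    required = set(nodes[root][2])  # the final result needs what the root consumes
    kept = [root]
    pending = list(kids.get(root, []))
    while pending:
        # Children are queued only after their parent, so every ancestor
        # has already contributed its requirements when a node is examined.
        node_id = pending.pop()
        _, produced, consumed = nodes[node_id]
        if (produced | modified.get(node_id, set())) & required:
            kept.append(node_id)
            required |= consumed
        pending.extend(kids.get(node_id, []))
    return sorted(kept)


# Agg (node 3) modifies $t, as marked by M in the matrix; the join modifies nothing.
kept_default = cut_unused_operators(NODES, {3: {"$t"}})
# Alternative from the footnote: the join (node 31) also modifies $t.
kept_if_join_modifies = cut_unused_operators(NODES, {3: {"$t"}, 31: {"$t"}})
print(kept_default, kept_if_join_modifies)
```

With the slide's assumption the surviving nodes are 1, 2, 3, 14, 15, and 23; if the join is treated as modifying $t, only node 25 is cut, which agrees with the footnote.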

41 SQL Generated
SQL for the XAT before cutting:
  SELECT "$book".title AS "$t", "$book".bid AS "col6",
         "$prices".price AS "col12", "$prices".bid AS "col7"
  FROM book "$book", prices "$prices"
  WHERE "col6" = "col7"
SQL for the XAT after cutting:
  SELECT "$book".title AS "$t"
  FROM book "$book"
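The effect of the reduction can be checked on the running example with an in-memory SQLite database. This is a sketch under two assumptions: the table layouts (`bid`, `title`, `price` as text columns) are inferred from the example, and the join condition is written with qualified column names instead of the aliases "col6" and "col7", because referencing SELECT aliases in WHERE is not portable SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE book (bid TEXT, title TEXT);
    CREATE TABLE prices (bid TEXT, price TEXT);
    INSERT INTO book VALUES ('001', 'TCP/IP Illustrated');
    INSERT INTO book VALUES ('002', 'Data on the Web');
    INSERT INTO prices VALUES ('001', '65.95');
    INSERT INTO prices VALUES ('002', '34.95');
    """
)

# Query for the XAT before cutting (join condition spelled with qualified names).
UNOPTIMIZED = """
SELECT "$book".title AS "$t", "$book".bid AS "col6",
       "$prices".price AS "col12", "$prices".bid AS "col7"
FROM book "$book", prices "$prices"
WHERE "$book".bid = "$prices".bid
"""

# Query for the XAT after cutting: one table, one column.
REDUCED = 'SELECT "$book".title AS "$t" FROM book "$book"'

titles_before = sorted(row[0] for row in conn.execute(UNOPTIMIZED))
titles_after = sorted(row[0] for row in conn.execute(REDUCED))
print(titles_before == titles_after)
conn.close()
```

Both queries return the same titles here because every book has exactly one matching price row. That is the situation covered by the slides' assumption that the join does not modify $t; with missing or duplicate price rows the two queries would differ.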

42 XQuery-Level Optimization
- XML algebra representation: XAT
- XAT query plan rewriting
- XAT query plan reduction

43 Performance Gain in Execution

44 Rainbow Engine Overhead
[Chart: time spent per engine component, with XAT Rewrite and XAT Cleanup highlighted]
Total: 32,522 ms
Ack.: XQuery using Kweelt Parser

45 http://davis.wpi.edu/dsrg/rainbow
https://sourceforge.net/projects/rainbow-engine/

46 Related Work
- XPERANTO [VLDBJ2000]: XQGM vs. XAT; XQuery views over RDB; extension by UDFs for XML features.
- SilkRoute [IEEE2001(24:2)]: XQuery views over RDB; generates SQL efficiently.
- AGORA [VLDB2000]: syntax-level rewriting.

47 Summary
Efficient XQuery processing:
- XML Algebra Tree (XAT)
- XAT optimization:
  - Rewriting using equivalence rules
  - Cleanup: schema cleanup and operator cutting
- Prototype system implementation

