Querying XML streams in DB2 Vanja Josifovski Marcus Fontoura Knowledge Management Dept. IBM Almaden Research Center.


1 Querying XML streams in DB2
Vanja Josifovski, Marcus Fontoura
Knowledge Management Dept., IBM Almaden Research Center

2 Agenda
- Motivation and background
  - SQL/XML, XPath, XQuery, XML streams
- TurboXPath (TXP)
  - TXP role in DB2
  - Design
  - Evaluation results
- Conclusions and future work
- Other research areas

3 Motivation
- Current trends in DBMSs:
  - A new XML data type and a set of new XML-related operators
  - XML-enabled integration systems
  - Queries over locally stored XML data and over XML data streamed from external sources
  - Web services and business-to-business applications
- Querying XML (including XML streams) is essential

4 SQL/XML
- SQL Part 14: XML-Related Specifications (SQL/XML)
  - http://www.sqlx.org
- New XML data type
- Publishing functions
  - XMLElement, XMLAttributes, XMLAgg
- Querying functions
  - XMLContains, XMLExtract, XMLTable (shred)

5 XPath
- XML query language defined by a W3C working group
  - Operates over a single document (no joins)
  - Single extraction point, returning a node set
- XPath examples:
  //customer
  //customer[birthdate='07/25/1970']/name
  //customer[address[state='CA']]
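To make the three XPath examples concrete, here is a small sketch using Python's standard xml.etree.ElementTree, whose findall() understands a limited XPath subset. The sample document is invented for illustration. ElementTree builds the whole tree in memory, which is exactly what a streaming evaluator avoids, and it has no nested predicates, so the third query applies its outer predicate with an explicit filter.

```python
import xml.etree.ElementTree as ET

# Invented sample document (not from the slides).
DOC = """<customers>
  <customer><name>Ann</name><birthdate>07/25/1970</birthdate>
    <address><state>CA</state></address></customer>
  <customer><name>Bob</name><birthdate>01/02/1980</birthdate>
    <address><state>NY</state></address></customer>
</customers>"""

root = ET.fromstring(DOC)

# //customer : every customer element in the document
customers = root.findall(".//customer")

# //customer[birthdate='07/25/1970']/name
names = [n.text for n in root.findall(".//customer[birthdate='07/25/1970']/name")]

# //customer[address[state='CA']] : ElementTree has no nested predicates,
# so the outer predicate is applied as an explicit filter.
ca_customers = [c for c in root.iter("customer")
                if c.find("address[state='CA']") is not None]

print(len(customers), names, [c.findtext("name") for c in ca_customers])
```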

6 XQuery (1/2)
- Also defined by a W3C working group
- Extends XPath for:
  - Processing several XML documents (joins)
  - Constructing XML results
  - Returning multiple node sets
- FLWR ("flower") expressions are the most common type of expression

7 XQuery (2/2)
- XQuery example:
  FOR $c IN document("doc1.xml")//customer
  FOR $p IN document("doc2.xml")//profiles[cid=$c/cid]
  LET $o := $c/order
  WHERE $o/date = '12/12/01'
  RETURN {$c/name} {$p/status} {$o/amount}
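The FLWR example above can be read as nested loops: each FOR introduces a loop, LET binds a whole sequence, WHERE filters, and RETURN builds one result per surviving binding. The Python sketch below spells that out with ElementTree over two invented documents. It assumes each customer and profile has exactly one cid, and it returns plain tuples instead of constructed XML; it illustrates the semantics, not how DB2 evaluates the query.

```python
import xml.etree.ElementTree as ET

# Invented documents standing in for doc1.xml and doc2.xml.
DOC1 = """<db>
  <customer><cid>1</cid><name>Ann</name>
    <order><date>12/12/01</date><amount>30</amount></order></customer>
  <customer><cid>2</cid><name>Bob</name>
    <order><date>01/01/02</date><amount>50</amount></order></customer>
</db>"""
DOC2 = """<db>
  <profiles><cid>1</cid><status>gold</status></profiles>
  <profiles><cid>2</cid><status>silver</status></profiles>
</db>"""

def flwr(doc1, doc2):
    d1, d2 = ET.fromstring(doc1), ET.fromstring(doc2)
    results = []
    for c in d1.iter("customer"):                          # FOR $c
        for p in d2.iter("profiles"):                      # FOR $p ...
            if p.findtext("cid") != c.findtext("cid"):     # ... [cid=$c/cid]
                continue
            o = c.findall("order")                         # LET $o := $c/order
            # WHERE $o/date = '12/12/01' is existential: one matching order suffices.
            if not any(x.findtext("date") == "12/12/01" for x in o):
                continue
            results.append((c.findtext("name"),            # RETURN
                            p.findtext("status"),
                            [x.findtext("amount") for x in o]))
    return results

print(flwr(DOC1, DOC2))
```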

8 XML Streams
[Figure: diagram with TurboXPath, XQuery, XSLT, Streamed XML, DB2, Web Services, and Applications.]
- Applications need to store XML documents in relational databases:
  - as XML
  - as relational data
- Example: Web services

9 TXP role in DB2 (1/3)
[Figure: TXP instances inside DB2. Labels: XML Storage, XPath-based Interface, XML Indexing, Textual XML, XML Streams, Web Services, XML Enabled Runtime, TXP context, XPath/XQuery, xml fragments / column values.]

10 TXP role in DB2 (2/3)
- TXP acts as a table access in traditional query evaluation pipelines
- It returns virtual tables of XML columns
- Example:
  FOR $c IN document("doc1.xml")//customer
  FOR $p IN document("doc2.xml")//profiles[cid=$c/cid]
  LET $o := $c/order
  WHERE $o/date = '12/12/01'
  RETURN {$c/name} {$p/status} {$o/amount}

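11 TXP role in DB2 (3/3)
[Figure: query plan. A TXP access over doc1//customer produces columns cid, name, order amount, date, reduced to cid, name, amount. A TXP access over doc2//profile produces cid, status. The two are joined on cid = cid, giving status, name, amount, which feed the XML generation operators.]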
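The plan on this slide can be mimicked with Python generators, as a rough sketch: two stand-in functions play the role of TXP accesses that return rows of column values, and ordinary relational operators (a filter, a hash join on cid, and an XML-generation step) consume them. The rows, the placement of the date filter, the choice of a hash join, and the <result> element name are all assumptions for illustration; in DB2 the optimizer would choose the actual operators.

```python
from typing import Dict, Iterator, List, Tuple

def scan_customers() -> Iterator[Tuple[str, str, str, str]]:
    """Stand-in for a TXP access over doc1//customer: (cid, name, amount, date)."""
    yield ("1", "Ann", "30", "12/12/01")
    yield ("2", "Bob", "50", "01/01/02")

def scan_profiles() -> Iterator[Tuple[str, str]]:
    """Stand-in for a TXP access over doc2//profile: (cid, status)."""
    yield ("1", "gold")
    yield ("2", "silver")

def plan() -> Iterator[str]:
    # Filter on date, then project customer rows to (cid, name, amount).
    customers = ((cid, name, amount)
                 for cid, name, amount, date in scan_customers()
                 if date == "12/12/01")
    # Hash join on cid = cid, building the hash table from the profile rows.
    status_by_cid: Dict[str, List[str]] = {}
    for cid, status in scan_profiles():
        status_by_cid.setdefault(cid, []).append(status)
    for cid, name, amount in customers:
        for status in status_by_cid.get(cid, []):
            # XML generation operator; the element names are invented.
            yield (f"<result><status>{status}</status><name>{name}</name>"
                   f"<amount>{amount}</amount></result>")

for row in plan():
    print(row)
```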

12 TurboXPath (TXP)
- Processes multiple XPath expressions:
  - in one pass over the XML document
  - in document order (pre-order traversal)
  - without building a DOM tree in memory
  - emitting results as they are found in the document
- Efficient over:
  - XML streams
  - pre-parsed XML documents
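A minimal sketch of the one-pass idea using Python's xml.sax: several absolute child-axis paths are matched against a stack of open element names, and text is buffered only for elements that match some path, only while they are open. It supports only simple /x/y/z paths (no //, predicates, or *), and each result is emitted when its element closes. It is an illustration of single-pass, DOM-free matching, not TXP's evaluator.

```python
import xml.sax

class MultiPathHandler(xml.sax.ContentHandler):
    """One SAX pass that evaluates several absolute child-axis paths at once."""

    def __init__(self, paths):
        super().__init__()
        self.paths = {tuple(p.strip("/").split("/")): p for p in paths}
        self.stack = []    # names of the currently open elements, root first
        self.open = []     # (depth, path, text pieces) for open matching elements
        self.results = []  # (path, string value), emitted when the element closes

    def startElement(self, name, attrs):
        self.stack.append(name)
        path = self.paths.get(tuple(self.stack))
        if path is not None:
            self.open.append((len(self.stack), path, []))

    def characters(self, content):
        # Text belongs to the string value of every open matching element.
        for _, _, pieces in self.open:
            pieces.append(content)

    def endElement(self, name):
        if self.open and self.open[-1][0] == len(self.stack):
            _, path, pieces = self.open.pop()
            self.results.append((path, "".join(pieces)))
        self.stack.pop()

def evaluate(xml_text, paths):
    handler = MultiPathHandler(paths)
    xml.sax.parseString(xml_text.encode(), handler)
    return handler.results

print(evaluate("<r><a><b>1</b><c>2</c></a><a><b>3</b></a></r>", ["/r/a/b", "/r/a/c"]))
```

Memory use here is bounded by document depth plus the text of currently open matches, which is the property that makes this style of evaluation suitable for streams.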

13 TXP Features (1/2)
- Forward axes (child '/', descendant '//')
- Backward axes (parent '..' and ancestor)
  - handled by query rewrites over streams
- Predicates (Boolean and positional)
  - /a/b[c + d > 5 or .//e]
  - //a[5] (currently being implemented)
- 'Any' node test
  - //contributors/*/name
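One common way to remove a backward step, assumed here rather than taken from the slides, is to turn the parent step into a predicate: //a/b/.. selects the same a elements as //a[b]. The sketch checks that equivalence on a small invented document with ElementTree. ElementTree can evaluate '..' only because it holds the whole tree in memory; a streaming evaluator cannot look back, which is why it needs the forward-only form.

```python
import xml.etree.ElementTree as ET

# Invented document: a1 and a3 have a b child, a2 does not.
DOC = "<r><a id='1'><b/></a><a id='2'><c/></a><x><a id='3'><b/><b/></a></x></r>"
root = ET.fromstring(DOC)

def ids(elems):
    return [e.get("id") for e in elems]

# Backward (parent) step: //a/b/..
with_parent_step = ids(root.findall(".//a/b/.."))

# Forward-only rewrite: //a[b], where the parent step becomes a predicate on a.
forward_only = ids(root.findall(".//a[b]"))

print(with_parent_step, forward_only)
```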

14 TXP Features (2/2)
- Multiple extraction points (tuples):
  - //customer[name and address and phone] returns tuples
  - a subset of FOR-LET-WHERE over a single document
  - a very common case in the XQuery Use Cases document
- Currently supports most of XPath 1.0
- Recursive XML input documents
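As a sketch of what returning tuples means for //customer[name and address and phone], the function below produces one (name, address, phone) row per combination of those children of each qualifying customer, using ElementTree and itertools.product. Producing a cross product for repeated children is an assumption modeled on FOR semantics, and the sample document is invented; unlike TXP, this version reads the whole document into memory.

```python
import itertools
import xml.etree.ElementTree as ET

def customer_tuples(xml_text):
    """Rows (name, address, phone) for //customer[name and address and phone]."""
    root = ET.fromstring(xml_text)
    rows = []
    for customer in root.iter("customer"):
        columns = [[e.text for e in customer.findall(tag)]
                   for tag in ("name", "address", "phone")]
        # If any column is empty the predicate fails, and product() yields nothing.
        rows.extend(itertools.product(*columns))
    return rows

DOC = """<db>
  <customer><name>Ann</name><address>San Jose</address>
    <phone>555-0100</phone><phone>555-0101</phone></customer>
  <customer><name>Bob</name><phone>555-0199</phone></customer>
</db>"""
print(customer_tuples(DOC))
```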

15 TXP Architecture
[Figure: components Expression parser, TXP Evaluator (SAX event handlers; tuple constructor / buffer management), and Document Walker. Inputs: path expressions, an XML stream, or pre-parsed (stored) XML read through the document walker. Output: tuples.]

16 TXP internals: evaluator
- Parse tree (static)
  - structural tree
  - predicate trees
- Work array (dynamic)
  - state of the evaluator
  - in-lined tree document
- Buffers
  - results (copy or reference)
  - predicate evaluation (copy)
  - discarded when no longer needed
[Figure: query /a/b[$c + d > 5 or .//$e]. Parse tree: r, a, b with children c, d, e, and the predicate tree (c + d > 5 or e). Work array: r T 0, a T 1, b F 2, c T 3, d T 3, e T *. Output buffers: c1, c2, c3, e1, e2. Predicate buffers: c1, c2, c3, d1, grouped by sibling group.]

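17 Execution example (1)
[Figure: query //a[c]//b. Parse tree: r, then a with children c and b and the predicate (c and b). Input XML: an a element containing c1 and b1. Each work-array entry holds a parse-tree pointer, a status flag (T/F), and a document level (* = any level). The initial work array has one entry, (r F 0); after <r> it is (r F 0, a F *). b buffers: none.]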

18 Execution example (2)
[Figure: query //a[c]//b. Events read: <r> <a>. Work array: (r F 0, a F *, c F 2, b F *). b buffers: none.]

19 Execution example (3)
[Figure: events read: <r> <a> <c>. Work array: (r F 0, a F *, c T 2, b F *). b buffers: none.]

20 Execution example (4)
[Figure: events read: <r> <a> <c> </c>. Work array unchanged: (r F 0, a F *, c T 2, b F *). b buffers: none.]

21 Execution example (4)
[Figure: events read: <r> <a> <c> </c> <b>. Work array: (r F 0, a F *, c T 2, b F *). A buffer is opened for b1. b buffers: 1.]

22 Execution example (5)
[Figure: events read: <r> <a> <c> </c> <b> </b>. Work array: (r F 0, a F *, c T 2, b T *). b buffers: 1. b1]

23 Execution example (6)
[Figure: events read: <r> <a> <c> </c> <b> </b> </a>. When a closes, the work array becomes (r T 0, a T *). b buffers: 1.]

24 Recursive execution example (1)
[Figure: query //a[c]//b, same parse tree. Input XML: nested a elements with c1, b1, and b2. Events read: <r>. Work array: (r F 0, a F *). b buffers: none.]

25 Recursive execution example (2)
[Figure: events read: <r> <a>. Work array: (r F 0, a F *, c F 2, b F *). b buffers: none.]

26 Recursive execution example (3)
[Figure: events read: <r> <a> <a>. The second a adds entries for its own children: (r F 0, a F *, c F 2, b F *, c F 3, b F *). b buffers: none.]

27 Recursive execution example (4)
[Figure: events read: <r> <a> <a> <c>. Work array: (r F 0, a F *, c F 2, b F *, c T 3, b F *). b buffers: none.]

28 Recursive execution example (5)
[Figure: events read: <r> <a> <a> <c> </c>. Work array unchanged: (r F 0, a F *, c F 2, b F *, c T 3, b F *). b buffers: none.]

29 Recursive execution example (6)
[Figure: events read: <r> <a> <a> <c> </c> <b>. The b1 buffer is opened. Work array: (r F 0, a F *, c F 2, b F *, c T 3, b F *). b buffers: 1.]

30 Recursive execution example (7)
[Figure: events read: <r> <a> <a> <c> </c> <b> </b>. Both b entries are set: (r F 0, a F *, c F 2, b T *, c T 3, b T *). b buffers: 1. b1]

31 Recursive execution example (8)
[Figure: events read: <r> <a> <a> <c> </c> <b> </b> </a>. The inner a closes and the b1 buffer is closed. Work array: (r T 0, a T *, c F 2, b T *). b buffers: 1. b1]

32 Recursive execution example (9)
[Figure: events read: ... </b> </a> <b>. The b2 buffer is opened. Work array: (r T 0, a T *, c F 2, b T *). b buffers: 1. b1, 2. (open)]

33 Recursive execution example (10)
[Figure: events read: ... </a> <b> </b>. The b2 buffer is opened and closed. b buffers: 1. b1, 2. b2]

34 Recursive execution example (11)
[Figure: events read: ... <b> </b> </a>. The outer a closes with its c entry still F: b2 is removed, and b1 is emitted and removed. Work array: (r T 0, a T *). b buffers: none.]

35 Recursive execution example (12)
[Figure: events read: ... </b> </a> <a>. A new a opens: (r T 0, a T *, c F 2, b F *). b buffers: none.]
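The walk-through above can be condensed into a small streaming evaluator for //a[c]//b using Python's xml.sax. This is a simplified sketch, not TXP's code: instead of the slides' work array it keeps a stack of open a elements, each recording whether it has seen a c child and which buffered b elements lie below it. As on the slides, a b buffer is kept until the outermost a closes and is emitted only if at least one of its a ancestors satisfied [c]; on the recursive input, b1 is emitted and b2 is discarded.

```python
import xml.sax

class AcbEvaluator(xml.sax.ContentHandler):
    """One-pass evaluation of //a[c]//b over a SAX stream (illustrative sketch)."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.open_a = []      # per open <a>: [depth, has c child?, candidate indices]
        self.open_b = []      # per open candidate <b>: (depth, candidate index)
        self.candidates = []  # per buffered <b>: [text pieces, qualified?]
        self.results = []     # string values of emitted <b> elements

    def startElement(self, name, attrs):
        self.depth += 1
        if name == "c" and self.open_a and self.open_a[-1][0] == self.depth - 1:
            self.open_a[-1][1] = True            # c is a child of the innermost open a
        if name == "b" and self.open_a:
            idx = len(self.candidates)
            self.candidates.append([[], False])  # open a buffer for this b
            self.open_b.append((self.depth, idx))
            for rec in self.open_a:              # b descends from every open a
                rec[2].append(idx)
        if name == "a":
            self.open_a.append([self.depth, False, []])

    def characters(self, content):
        for _, idx in self.open_b:
            self.candidates[idx][0].append(content)

    def endElement(self, name):
        if self.open_b and self.open_b[-1][0] == self.depth:
            self.open_b.pop()                    # close the b buffer
        if name == "a":
            _, has_c, idxs = self.open_a.pop()
            if has_c:                            # predicate [c] holds for this a
                for idx in idxs:
                    self.candidates[idx][1] = True
            if not self.open_a:                  # outermost a closed: flush buffers
                self.results += ["".join(t) for t, ok in self.candidates if ok]
                self.candidates = []
        self.depth -= 1

def evaluate(xml_text):
    handler = AcbEvaluator()
    xml.sax.parseString(xml_text.encode(), handler)
    return handler.results

print(evaluate("<r><a><a><c/><b>b1</b></a><b>b2</b></a></r>"))
```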

36 Predicate evaluation
- A separate parse tree for the predicates, attached at an anchor node in the structural tree
- Evaluated when the anchor node is closed
- Leaves of the predicate parse tree point into the structural parse tree
- The predicate tree is traversed and evaluated
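The sketch below evaluates a predicate tree once its anchor node has closed and the buffers its leaves point to are complete. The tuple-based tree encoding and the buffers dictionary are hypothetical stand-ins for TXP's structures; the conversion rules follow XPath 1.0 (arithmetic on a node-set uses its first node, comparisons are existential). The example predicate is [c + d > 5 or .//e] from the evaluator slide.

```python
import math

# Hypothetical predicate-tree encoding: ("or", l, r), ("and", l, r), (">", l, r),
# ("+", l, r), ("const", number) and ("path", name). A "path" leaf stands for a
# structural node; its value is the list of buffered string values, in document order.

def to_float(s):
    try:
        return float(s.strip())
    except ValueError:
        return math.nan

def as_number(v):
    # XPath 1.0: a node-set converts via the string value of its first node.
    if isinstance(v, list):
        return to_float(v[0]) if v else math.nan
    return float(v)

def as_boolean(v):
    if isinstance(v, list):
        return bool(v)                 # a non-empty node-set is true
    if isinstance(v, bool):
        return v
    return not math.isnan(v) and v != 0

def evaluate(pred, buffers):
    op = pred[0]
    if op == "const":
        return pred[1]
    if op == "path":
        return buffers.get(pred[1], [])
    # Both operands are evaluated eagerly here, for simplicity.
    left, right = evaluate(pred[1], buffers), evaluate(pred[2], buffers)
    if op == "or":
        return as_boolean(left) or as_boolean(right)
    if op == "and":
        return as_boolean(left) and as_boolean(right)
    if op == "+":
        return as_number(left) + as_number(right)
    if op == ">":
        # General comparison: true if some pair of values satisfies it.
        xs = [to_float(s) for s in left] if isinstance(left, list) else [float(left)]
        ys = [to_float(s) for s in right] if isinstance(right, list) else [float(right)]
        return any(x > y for x in xs for y in ys)
    raise ValueError(f"unknown operator {op!r}")

# [c + d > 5 or .//e]
PRED = ("or", (">", ("+", ("path", "c"), ("path", "d")), ("const", 5)), ("path", "e"))
print(evaluate(PRED, {"c": ["2", "9"], "d": ["4"], "e": []}))
```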

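37 Predicate Pushdown
- Single-value predicates can be evaluated before the anchor node is closed
- Example: /x[a>b and c = 5]
[Figure: parse trees for the example (structural nodes r, x, a, b, c and the predicate tree "and" over a > b and c = 5), before and after pushing c = 5 down from the anchor x.]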
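In the example /x[a>b and c = 5], the conjunct c = 5 involves one node and a constant, so it can be checked as each c element closes and reduced to a single flag, while a > b has to wait for x to close because it relates two node-sets. The function below is a hedged sketch of that split. It takes the already-parsed children of one x as (tag, text) pairs, which simplifies away the SAX plumbing; the point is that c values never occupy a buffer.

```python
import math

def to_float(s):
    try:
        return float(s.strip())
    except ValueError:
        return math.nan

def eval_x(children):
    """Evaluate [a > b and c = 5] for one <x>, given its children as (tag, text)."""
    c_is_5 = False           # pushed down: decided as each c closes, c is never stored
    a_vals, b_vals = [], []  # a > b compares two node-sets, so both are buffered
    for tag, text in children:
        if tag == "c":
            c_is_5 = c_is_5 or to_float(text) == 5
        elif tag == "a":
            a_vals.append(to_float(text))
        elif tag == "b":
            b_vals.append(to_float(text))
    # Only the remaining conjunct is evaluated when the anchor x closes.
    return c_is_5 and any(av > bv for av in a_vals for bv in b_vals)

print(eval_x([("a", "7"), ("b", "3"), ("c", "5")]))
```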

38 Tuple construction using buffer annotations
[Figure: input XML with elements r, t, g, a, b, c, and output buffers annotated with ancestor sets (ASt for t ancestors, ASa for a ancestors):
  g: fragment 2, ASt={1}; fragment 12, ASt={11}
  b/text(): fragment 4, ASt={1}, ASa={3}; fragment 8, ASt={1}, ASa={6,7}
  c/text(): fragment 5, ASt={1}, ASa={3}; fragment 9, ASt={1}, ASa={7}; fragment 9, ASt={1}, ASa={6}
The result table is not recoverable from the transcript.]
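One plausible reading of buffer annotations, assumed here rather than confirmed by the slide, is that buffered fragments from different output columns may be combined into a tuple only if, for every ancestor variable they share, their ancestor sets have a common member. The sketch applies that rule with an exhaustive itertools.product, which a real engine would avoid. Its input reuses the g, b/text(), and first two c/text() annotations listed on the slide; the printed tuples are what this rule produces, not a transcription of the slide's result.

```python
from itertools import product

def build_tuples(columns):
    """columns: one list per output column of (fragment id, {variable: ancestor ids})."""
    tuples = []
    for combo in product(*columns):
        shared = {}
        for _, ancestors in combo:
            for var, ids in ancestors.items():
                # Intersect the ancestor sets of all fragments that mention var.
                shared[var] = shared.get(var, set(ids)) & set(ids)
        if all(shared.values()):          # every shared variable has a common binding
            tuples.append(tuple(frag for frag, _ in combo))
    return tuples

COLUMNS = [
    [(2, {"t": {1}}), (12, {"t": {11}})],                             # g
    [(4, {"t": {1}, "a": {3}}), (8, {"t": {1}, "a": {6, 7}})],        # b/text()
    [(5, {"t": {1}, "a": {3}}), (9, {"t": {1}, "a": {7}})],           # c/text()
]
print(build_tuples(COLUMNS))
```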

39 Evaluation (i)
- XMLContains (Boolean query)

40 Evaluation (ii)
- XMLExtract (single-column extraction)

41 Evaluation (iii)
- XMLExtract (over large files, outside DB2)

42 Evaluation (iv)
- XMLTable (varying the number of columns)
- The optimizer should generate plans that take advantage of this

43 Conclusions and Future Work
- TXP efficiently evaluates a subset of XPath/XQuery over XML streams and pre-parsed XML
  - low memory consumption
  - fast response time compared to Xalan
- The tuple construction mechanism is useful for efficiently evaluating predicates and FLWR expressions
- Returns values (copy) or references (XIDs)
- Works over both indexed (stored) XML and streamed XML using the same control structure
- Deliverables for DB2: XMLWrapper, XML Storage, XML Loader/Shredder

44 Other research areas
- SQL/XML
- Automatic generation of taxonomies
  - Lotus Discovery Server
- Text indexing
  - Intranet search

45 Automatic Taxonomy Generation (1/2)
- Unified model for the taxonomy
  - Each node (including intermediate nodes) models the features that are common to the subtree below it
- All features (including stopwords) are modeled in the taxonomy
- Hybrid bottom-up and top-down scheme
- Algorithm:
  - Start with an initial feasible solution (a one-level taxonomy)
  - Merge nodes as needed to discover more abstract topics
  - Split nodes as needed to find more refined topics

46 Automatic Taxonomy Generation (2/2)

