
XML Data Management
Yaw-Huei Chen, Department of Computer Science and Information Engineering, National Chiayi University, 5/2/2005


2 Outline: Introduction to XML, Storage, Query Languages, Indexing, Query Processing, Conclusions

3 From Documents to Data
HTML describes presentation. Example (rendered text): References / S. Abiteboul, P. Buneman, D. Suciu, Data On The Web, 2000.

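4 From Documents to Data (cont.)
XML (eXtensible Markup Language) describes content:
<references>
  <book>
    <author>S. Abiteboul</author>
    <author>P. Buneman</author>
    <author>D. Suciu</author>
    <title>Data On The Web</title>
    <year>2000</year>
  </book>
</references>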

5 XML Syntax
Element: a piece of text bounded by matching tags, e.g. <author>D. Suciu</author>. Elements can be nested.
Attribute: a (name, value) pair, e.g. … Elements and attributes are alternative ways to represent data.
An XML document has a single root element.
Well-formed XML documents: tags must nest properly, and attribute names must be unique within an element.
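As a minimal sketch, assuming Python's standard xml.etree.ElementTree parser, well-formedness can be checked by simply attempting to parse: improperly nested tags, a repeated attribute name, or a second root element all raise ParseError. The function name is_well_formed is illustrative.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses as a well-formed XML document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False  # bad nesting, duplicate attribute, multiple roots, ...
    return True

print(is_well_formed("<book><author>D. Suciu</author></book>"))  # True
print(is_well_formed("<book><author>D. Suciu</book></author>"))  # False
```

Validity (the next step up) additionally requires a schema, which the parser alone does not check.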

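6 XML Hierarchical Data Model
XML is ordered: the order of sibling elements is significant.
[Figure: the example as a tree: references, then book, then author (S. Abiteboul), author (P. Buneman), author (D. Suciu), title (Data on the Web), year (2000); further book and author nodes elided (… …)]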

7 Specifying the Structure
DTD (Document Type Definition): a context-free grammar, e.g.
<!DOCTYPE references [ ]>

8 Specifying the Structure (cont.)
XML Schema is itself written in XML format. Element names and types are associated locally, primitive data types are included, and it is a superset of DTDs.
Valid XML documents: the document must be well-formed, and its element names must follow the structure specified in a DTD file or XML Schema file.
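A DTD element declaration is essentially a regular expression over the names of an element's children. The sketch below checks validity that way, assuming hypothetical declarations <!ELEMENT references (book*)> and <!ELEMENT book (author+, title, year)> (the slide's actual DTD body is not recoverable). It is not a full DTD validator: attributes, entities, and mixed content are ignored, and all names are illustrative.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical DTD, rewritten as regular expressions over child tag names:
#   <!ELEMENT references (book*)>
#   <!ELEMENT book (author+, title, year)>
#   <!ELEMENT author (#PCDATA)>   (likewise title and year)
CONTENT_MODELS = {
    "references": r"(book )*",
    "book": r"(author )+title year ",
}
TEXT_ONLY = {"author", "title", "year"}

def is_valid(elem):
    """Check elem and its subtree against the content models above."""
    if elem.tag in TEXT_ONLY:
        return len(elem) == 0  # #PCDATA: no child elements allowed
    model = CONTENT_MODELS.get(elem.tag)
    if model is None:
        return False  # undeclared element
    children = "".join(child.tag + " " for child in elem)
    return re.fullmatch(model, children) is not None and all(is_valid(c) for c in elem)

def validate(xml_text):
    """Valid = well-formed (it parses) and conforms to the declarations."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return root.tag == "references" and is_valid(root)
```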

9 Storing XML Documents
Three options: (1) design a specialized system for storing native XML data; (2) use a DBMS to store whole XML documents as text fields; (3) use a DBMS to store the document contents as data elements. Whichever option is chosen, it must support XML's ordered data model.

10 Using a DBMS: Relational DTD (schema-aware)
An element that can occur at most once in its parent is stored as a column of the table representing its parent; an element that can repeat, such as author, gets its own table, linked to its parent through ParentID.
The book table:
ParentID  ID  title              year
2         3   "Data on The Web"  "2000"
2         4   …                  …
The author table:
ParentID  ID  TEXT
3         5   "S. Abiteboul"
3         6   "P. Buneman"
3         7   "D. Suciu"
[Figure: the references document tree (references, book, author, title, year), as on slide 6]
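A minimal sketch of this mapping, assuming Python's built-in sqlite3 module and the book/author example: title and year become columns of book because each occurs at most once per book, while author gets its own table. Node IDs here come from a simple preorder counter, which is an assumption (the slide does not say how its IDs were assigned), so they differ from the slide's values.

```python
import itertools
import sqlite3
import xml.etree.ElementTree as ET

def shred(xml_text):
    """Shred a references document into book and author tables (in-memory SQLite)."""
    con = sqlite3.connect(":memory:")
    # title and year occur at most once per book, so they are columns of book
    con.execute("CREATE TABLE book (ParentID INTEGER, ID INTEGER, title TEXT, year TEXT)")
    # author can repeat, so it gets its own table pointing back to its parent
    con.execute('CREATE TABLE author (ParentID INTEGER, ID INTEGER, "TEXT" TEXT)')
    ids = itertools.count(1)  # assumption: IDs assigned in preorder, starting at 1

    def visit(elem, parent_id):
        elem_id = next(ids)
        if elem.tag == "book":
            con.execute("INSERT INTO book VALUES (?, ?, ?, ?)",
                        (parent_id, elem_id, elem.findtext("title"), elem.findtext("year")))
        elif elem.tag == "author":
            con.execute("INSERT INTO author VALUES (?, ?, ?)", (parent_id, elem_id, elem.text))
        for child in elem:
            visit(child, elem_id)

    visit(ET.fromstring(xml_text), None)
    con.commit()
    return con

DOC = ("<references><book><author>S. Abiteboul</author><author>P. Buneman</author>"
       "<author>D. Suciu</author><title>Data on The Web</title><year>2000</year></book></references>")
con = shred(DOC)
print(con.execute("SELECT * FROM book").fetchall())
# [(1, 2, 'Data on The Web', '2000')]
```

Reassembling a book then takes a join on author.ParentID = book.ID, which is the price of the schema-aware layout.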

11 Using a DBMS: Edge (schema-less)
A single table is used to store the entire document. Each node is assigned an ID in depth-first order.
[Figure: the references tree with node IDs: root 1, references 2, first book 3, second book 4]

12 Using a DBMS: Edge (cont.)
The edge table:
SourceID  tag         ordinal  TargetID  Data
1         references  1        2         NULL
2         book        1        3         NULL
2         book        2        4         NULL
3         author      1        0         "S. Abiteboul"
3         author      2        0         "P. Buneman"
3         author      3        0         "D. Suciu"
3         title       4        0         "Data on The Web"
3         year        5        0         "2000"
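The edge table can be generated in one depth-first pass. This sketch assumes ElementTree input and mirrors the slide's conventions: node 1 is the document root, element nodes with children get IDs in depth-first order, and leaf elements store their text in Data with TargetID 0. The second book's content is elided on the slides, so "…" is a placeholder; edge_table is an illustrative name.

```python
import xml.etree.ElementTree as ET

def edge_table(xml_text):
    """Return Edge-approach rows (SourceID, tag, ordinal, TargetID, Data)."""
    rows = []
    next_id = 2  # node 1 is the document root above the root element

    def visit(elem, source_id, ordinal):
        nonlocal next_id
        if len(elem) == 0:
            # leaf element: value stored inline, TargetID 0 as in the slide's table
            rows.append((source_id, elem.tag, ordinal, 0, elem.text))
            return
        target_id = next_id
        next_id += 1
        rows.append((source_id, elem.tag, ordinal, target_id, None))  # None = NULL
        for i, child in enumerate(elem, start=1):
            visit(child, target_id, i)

    visit(ET.fromstring(xml_text), 1, 1)
    rows.sort(key=lambda r: (r[0], r[2]))  # group by SourceID, then ordinal
    return rows

DOC = ("<references><book><author>S. Abiteboul</author><author>P. Buneman</author>"
       "<author>D. Suciu</author><title>Data on The Web</title><year>2000</year></book>"
       "<book><author>…</author></book></references>")
for row in edge_table(DOC)[:4]:
    print(row)
```

Because every element shares one table, queries need self-joins on TargetID = SourceID, one per path step.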

13 XPath
XPath is a language for addressing parts of an XML document. It uses path expressions to select nodes or node-sets that match the patterns specified in the expression. The names in an XPath expression are element or attribute names from the XML document. A single slash (/) before an element name specifies that the element must be a direct child of the previous (parent) element; a double slash (//) specifies that the element may appear as a descendant of the previous element at any depth.

14 XPath Examples
references selects all child elements named references of the context node
/references selects the root element references
//book selects all book elements, wherever they are in the document
references//book selects all book elements that are descendants of the references element
/references/* selects all child elements of the references element
//book/title | //book/author selects all title and author elements that are children of book elements
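Python's standard xml.etree.ElementTree supports only a limited subset of XPath (no union operator and no general axes), but several of the examples above map onto it directly. This is a sketch under that limitation: the sample document is reconstructed from the running example, "…" stands for the second book's elided content, and the union is emulated with a loop.

```python
import xml.etree.ElementTree as ET

DOC = """<references>
  <book><author>S. Abiteboul</author><author>P. Buneman</author>
    <author>D. Suciu</author><title>Data on The Web</title><year>2000</year></book>
  <book><author>…</author><title>…</title></book>
</references>"""

root = ET.fromstring(DOC)  # root is the references element itself

# /references/*  : element children of references
children = root.findall("*")

# //book  : every book element (".//" searches all descendants of root)
books = root.findall(".//book")

# //book/title | //book/author  : no "|" in ElementTree, so merge by hand,
# visiting books and their children in document order
titles_and_authors = [child for book in root.iter("book")
                      for child in book if child.tag in ("title", "author")]

print(len(children), len(books), [e.tag for e in titles_and_authors])
# 2 2 ['author', 'author', 'author', 'title', 'author', 'title']
```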

15 XQuery
XQuery is a language for finding and extracting elements and attributes from XML documents. It uses XPath expressions but adds further constructs. FLWR stands for its four main clauses: FOR, LET, WHERE, RETURN (the final XQuery 1.0 standard adds ORDER BY, giving FLWOR). For example:
  for $b in doc("references.xml")//book
  where count($b/author) > 0
  return { $b/title } { for $a in $b/author return $a }
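Python's standard library has no XQuery engine, so the following is only an analogy: a plain ElementTree loop whose parts line up with the for, where, and return clauses of the example. The sample document, including a hypothetical author-less second book that the where clause filters out, is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the second book has no authors, so "where" drops it.
DOC = """<references>
  <book><author>S. Abiteboul</author><author>P. Buneman</author>
    <author>D. Suciu</author><title>Data on The Web</title><year>2000</year></book>
  <book><title>A book without authors</title></book>
</references>"""

def books_with_authors(xml_text):
    """Mirror the FLWR example: one (title, [authors]) pair per qualifying book."""
    root = ET.fromstring(xml_text)
    results = []
    for b in root.iter("book"):                          # for $b in doc(...)//book
        authors = b.findall("author")
        if len(authors) > 0:                             # where count($b/author) > 0
            results.append((b.findtext("title"),         # return $b/title ...
                            [a.text for a in authors]))  # ... and each $a in $b/author
    return results

print(books_with_authors(DOC))
# [('Data on The Web', ['S. Abiteboul', 'P. Buneman', 'D. Suciu'])]
```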

16 Indexing
To find all occurrences of a query pattern, efficient mechanisms are needed for (1) determining the ancestor-descendant relationship between XML elements and (2) accessing XML values.
Two types of indexes help determine ancestor-descendant relationships:
Structural index: reduces the time needed to traverse the XML hierarchy.
Numbering scheme: encodes each element by its position within the XML hierarchy, so that the ancestor-descendant relationship between any pair of elements can be determined quickly, without traversal.

17 Structural Index
DataGuides [Goldman97]: every label path of the source graph has exactly one data path instance in its DataGuide, and every label path of the DataGuide is a label path of the source.
[Figure: a source graph with labels A, B, C, D and its DataGuide]
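For tree-shaped data (no IDREF edges), each node is reached by exactly one label path, so a DataGuide reduces to the set of distinct root-to-node label paths, each with its target set. The sketch below assumes ElementTree input and preorder node IDs (both illustrative choices); general graphs need a subset construction, which it does not attempt.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

def tree_dataguide(xml_text):
    """Map each distinct root-to-node label path to the IDs of the nodes it reaches."""
    guide = defaultdict(list)
    counter = 0

    def visit(elem, path):
        nonlocal counter
        counter += 1                 # preorder node ID
        path = path + "/" + elem.tag
        guide[path].append(counter)  # each label path appears once as a key
        for child in elem:
            visit(child, path)

    visit(ET.fromstring(xml_text), "")
    return dict(guide)

DOC = ("<references><book><author/><author/><title/></book>"
       "<book><author/><title/></book></references>")
guide = tree_dataguide(DOC)
print(guide["/references/book/author"])  # [3, 4, 7]
```

Answering a simple path such as /references/book/author is then one dictionary lookup instead of a traversal.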

18 Structural Index (cont.)
1-Index [Milo99]: groups nodes together if they have the same set of incoming label paths.
[Figure: a data graph with labels A, B, C, D, its 1-index, and its DataGuide]
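Testing whether two nodes have exactly the same set of incoming label paths is expensive on general graphs, so 1-indexes are commonly computed with backward bisimulation, a possibly finer condition that implies equal incoming path sets and coincides with it on trees. The sketch below uses naive partition refinement; the input format (a label dictionary plus parent/child pairs) and the function name are assumptions, and faster refinement algorithms (e.g. Paige and Tarjan's) exist.

```python
from collections import defaultdict

def one_index(labels, edges):
    """Group graph nodes by backward bisimulation.

    labels: dict node -> label; edges: iterable of (parent, child) pairs.
    Returns dict node -> block number; each block becomes one index node.
    """
    parents = defaultdict(set)
    for p, c in edges:
        parents[c].add(p)
    block = dict(labels)  # start from the partition by label
    while True:
        # signature: the node's current block plus the blocks of its parents
        sigs = {n: (block[n], frozenset(block[p] for p in parents[n])) for n in labels}
        numbering = {}
        new_block = {n: numbering.setdefault(s, len(numbering)) for n, s in sigs.items()}
        if len(numbering) == len(set(block.values())):
            return new_block  # no block was split, so the partition is stable
        block = new_block

labels = {0: "R", 1: "A", 2: "A", 3: "B", 4: "B", 5: "C", 6: "B"}
edges = [(0, 1), (0, 2), (1, 3), (2, 4), (0, 5), (5, 6)]
idx = one_index(labels, edges)
print(idx[3] == idx[4], idx[3] == idx[6])  # True False
```

The two B nodes under A share the incoming path R.A.B and merge; the B under C does not.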

19 Structural Index (cont.)
Covering indexes [Kaushik02]: the Forward and Backward Index (F&B-Index) adds inverse edges to the graph and then computes the 1-index (or DataGuide) for the modified graph.
The F&B-Index can be too large in practice. To reduce its size: index only useful tags; do not index all IDREF edges (XPath gives tree edges higher priority, and // matches only tree edges); exploit local similarity (short paths only); restrict tree depth.
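Computing the 1-index after adding inverse edges amounts to refining on both parents and children. The sketch below extends naive bisimulation refinement in that way; the input format and function name are assumptions, and none of the size-reduction heuristics from the slide are applied.

```python
from collections import defaultdict

def fb_index(labels, edges):
    """Group nodes that are bisimilar along both incoming and outgoing edges.

    labels: dict node -> label; edges: iterable of (parent, child) pairs.
    """
    parents, children = defaultdict(set), defaultdict(set)
    for p, c in edges:
        parents[c].add(p)
        children[p].add(c)  # the inverse edge
    block = dict(labels)    # start from the partition by label
    while True:
        sigs = {n: (block[n],
                    frozenset(block[p] for p in parents[n]),
                    frozenset(block[c] for c in children[n])) for n in labels}
        numbering = {}
        new_block = {n: numbering.setdefault(s, len(numbering)) for n, s in sigs.items()}
        if len(numbering) == len(set(block.values())):
            return new_block  # stable in both directions
        block = new_block

labels = {0: "R", 1: "A", 2: "A", 3: "B", 4: "B", 5: "D"}
edges = [(0, 1), (0, 2), (1, 3), (2, 4), (2, 5)]
fb = fb_index(labels, edges)
print(fb[1] == fb[2], fb[3] == fb[4])  # False False
```

Node 2's extra D child separates it from node 1, and that split propagates down to their B children; a backward-only 1-index would keep both pairs merged, which is why F&B-indexes grow larger.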

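20 Numbering Scheme
Dewey Decimal Coding [Tatarinov02]: each node's label is its parent's label extended with the node's position among its siblings, so x is an ancestor of y iff x's label is a proper prefix of y's label.
[Figure: the references tree with Dewey labels: references 1; books 1.1 and 1.2; the first book's children 1.1.1 to 1.1.4; the second book's children 1.2.1 and 1.2.2 (title)]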
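A sketch of Dewey labeling, assuming ElementTree input: labels are tuples built top-down, and ancestry is a proper-prefix test, so no traversal is needed at query time. The sample document only reproduces the shape of the slide's tree (tag names are placeholders), and the function names are illustrative.

```python
import xml.etree.ElementTree as ET

def dewey_labels(root):
    """Return {element: Dewey label as a tuple}, with the root labeled (1,)."""
    labels = {}

    def visit(elem, label):
        labels[elem] = label
        for position, child in enumerate(elem, start=1):
            visit(child, label + (position,))  # parent's label plus sibling position

    visit(root, (1,))
    return labels

def is_ancestor(a, d):
    """True iff label a is a proper prefix of label d."""
    return len(a) < len(d) and d[:len(a)] == a

def to_text(label):
    return ".".join(str(part) for part in label)

root = ET.fromstring("<references><book><author/><title/><year/><author/></book>"
                     "<book><author/><title/></book></references>")
labels = dewey_labels(root)
print(to_text(labels[root[1][1]]))  # 1.2.2
```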

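21 Numbering Scheme (cont.)
Inserting new elements: a new node shifts the positions of its following siblings, so those siblings and all their descendants must be relabeled.
[Figure: after a new element is inserted among the first book's children, they are labeled 1.1.1 to 1.1.5; the new element node and the nodes that require renumbering are highlighted]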
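The cost of an insertion can be made concrete. Assuming Dewey labels are kept as tuples, the hypothetical helper below inserts a new child of parent at a 1-based position and reports every label that changes: each later sibling and each descendant of those siblings. A real system would also have to update every index entry that stores the old labels.

```python
def insert_child(labels, parent, position):
    """Insert a new child of `parent` at 1-based `position` among its children.

    labels: set of Dewey tuples. Returns (new label set, {old label: new label}).
    """
    depth = len(parent)
    new_labels, changed = set(), {}
    for lab in labels:
        # later siblings and everything below them shift by one position
        if len(lab) > depth and lab[:depth] == parent and lab[depth] >= position:
            new = lab[:depth] + (lab[depth] + 1,) + lab[depth + 1:]
            changed[lab] = new
            new_labels.add(new)
        else:
            new_labels.add(lab)
    new_labels.add(parent + (position,))  # the inserted node itself
    return new_labels, changed

# The slide's tree: references 1, books 1.1 and 1.2, children 1.1.1-1.1.4 and 1.2.1-1.2.2
tree = {(1,), (1, 1), (1, 2), (1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 1, 4), (1, 2, 1), (1, 2, 2)}
_, changed = insert_child(tree, (1, 1), 2)  # new second child of the first book
print(sorted(changed.items()))
# [((1, 1, 2), (1, 1, 3)), ((1, 1, 3), (1, 1, 4)), ((1, 1, 4), (1, 1, 5))]
```

Appending as the last child changes nothing, which is why append-heavy workloads suit Dewey labels better than mid-document inserts.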

22 Numbering Scheme (cont.)
Preorder and postorder [Dietz82]: each node is labeled (preorder, postorder). x is an ancestor of y iff x occurs before y in the preorder traversal and after y in the postorder traversal.
[Figure: the references tree labeled with (preorder, postorder) pairs: references (1,10); first book (2,6) with five children (3,1), (4,2), (5,3), (6,4), (7,5); second book (8,9) with children (9,7) and title (10,8)]
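A sketch that computes (preorder, postorder) pairs in one depth-first traversal and applies the ancestor test above. It assumes ElementTree input; the sample document only reproduces the shape of the slide's tree (tag names of the first book's children are placeholders), and the printed pairs match the figure.

```python
import xml.etree.ElementTree as ET

def pre_post(root):
    """Return {element: (preorder, postorder)}, both numbered from 1."""
    numbers = {}
    pre = post = 0

    def visit(elem):
        nonlocal pre, post
        pre += 1                 # numbered on the way down
        my_pre = pre
        for child in elem:
            visit(child)
        post += 1                # numbered on the way back up
        numbers[elem] = (my_pre, post)

    visit(root)
    return numbers

def is_ancestor(x, y):
    """x is an ancestor of y iff x is earlier in preorder and later in postorder."""
    return x[0] < y[0] and x[1] > y[1]

root = ET.fromstring("<references><book><author/><title/><year/><author/><author/></book>"
                     "<book><author/><title/></book></references>")
n = pre_post(root)
print(n[root], n[root[0]], n[root[1]])  # (1, 10) (2, 6) (8, 9)
```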

23 Numbering Scheme (cont.)
Various interval schemes:
(docno, begin:end, level) [Zhang01]: the begin and end positions are generated by a depth-first traversal of the tree, assigning the next sequential number at each visit.
(preorder, size) [Li01]: size is an arbitrary integer larger than the current number of descendants, which leaves room for later insertions.
(lowest_post, postorder) [Agrawal89]: lowest_post is the lowest postorder number among the node's descendants.
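Zhang et al.'s containment test can be sketched the same way. Assuming only element nodes are numbered (their scheme also counts text positions) and a single document (so docno is dropped), one counter incremented on entering and on leaving each node yields (begin, end, level). Then a is an ancestor of d iff a.begin < d.begin and d.end < a.end, and a parent additionally sits exactly one level above.

```python
import xml.etree.ElementTree as ET

def region_encode(root):
    """Return {element: (begin, end, level)} from one depth-first traversal."""
    codes = {}
    counter = 0

    def visit(elem, level):
        nonlocal counter
        counter += 1             # number assigned on entering the node
        begin = counter
        for child in elem:
            visit(child, level + 1)
        counter += 1             # number assigned on leaving the node
        codes[elem] = (begin, counter, level)

    visit(root, 1)
    return codes

def is_ancestor(a, d):
    return a[0] < d[0] and d[1] < a[1]

def is_parent(a, d):
    return is_ancestor(a, d) and d[2] == a[2] + 1

root = ET.fromstring("<a><b><c/></b><d/></a>")
codes = region_encode(root)
print(codes[root], codes[root[0]], codes[root[0][0]], codes[root[1]])
# (1, 8, 1) (2, 5, 2) (3, 4, 3) (6, 7, 2)
```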

24 Query Processing
Goal: find all occurrences of a query pattern in XML documents.
Navigation-based approach: computes results by analyzing an input document one tag at a time. The query is represented as a non-deterministic finite automaton (NFA) [Diao03].
Index-based approach: uses precomputed indexes to answer the query.
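To make the navigation-based approach concrete, here is a minimal sketch in the spirit of an NFA-based filter, assuming a single linear path that starts with / or // and uses element names or *. The set of active NFA states is kept on a stack, so a closing tag restores the states of the enclosing element. Shared multi-query NFAs and predicate evaluation, as in [Diao03], are not modeled; function names are illustrative.

```python
import io
import xml.etree.ElementTree as ET

def parse_path(path):
    """Split "/a//b/c" into [("/", "a"), ("//", "b"), ("/", "c")]."""
    if not path.startswith("/"):
        raise ValueError("path must start with / or //")
    steps, i = [], 0
    while i < len(path):
        axis = "//" if path.startswith("//", i) else "/"
        i += len(axis)
        j = path.find("/", i)
        j = len(path) if j == -1 else j
        steps.append((axis, path[i:j]))
        i = j
    return steps

def stream_match_count(xml_text, path):
    """Count elements matched by the path, reading the document one tag at a time."""
    steps = parse_path(path)
    accept = len(steps)  # state s means "the first s steps are matched"
    stack = [{0}]        # active state set per open element (plus the start)
    matches = 0
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            active = set()
            for s in stack[-1]:
                if s == accept:
                    continue
                axis, name = steps[s]
                if axis == "//":
                    active.add(s)      # self-loop: keep waiting deeper in the tree
                if name in (elem.tag, "*"):
                    active.add(s + 1)  # this element matches step s + 1
            stack.append(active)
            if accept in active:
                matches += 1
        else:
            stack.pop()                # closing tag: back to the parent's states
    return matches

print(stream_match_count("<a><c><b/></c><b/></a>", "/a//b"))  # 2
```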

25 Holistic Twig Join [Bruno02]
Indexes: string: (doc, left, level); element: (doc, left:right, level).
Query: A//B//C. Data: elements A1, B1, A2, B2, C1 [figure].
Stack encoding: S_A holds A1, A2; S_B holds B1, B2; S_C holds C1.
Query results: (A1, B1, C1), (A1, B2, C1), (A2, B2, C1).
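For a path query such as A//B//C, the holistic approach reduces to Bruno et al.'s PathStack algorithm. The sketch below is simplified under stated assumptions: a single document (so doc is dropped), only // edges (so level is unused), and hypothetical region numbers for a data tree in which A1 contains B1, which contains A2, which contains B2, which contains C1. That chain is consistent with the three results on the slide.

```python
from typing import NamedTuple

class Elem(NamedTuple):
    name: str   # e.g. "A1"
    left: int
    right: int
    level: int  # kept for completeness; unused because the query has only // edges

def path_stack(streams):
    """Return all matches of the path query q1//q2//...//qk as tuples of names.

    streams[i] lists the elements matching query node i, sorted by left.
    """
    k = len(streams)
    pos = [0] * k
    # stacks[i] holds (element, index of the top of stacks[i - 1] when it was pushed)
    stacks = [[] for _ in range(k)]
    results = []

    def expand(i, idx):
        elem, ptr = stacks[i][idx]
        if i == 0:
            return [(elem.name,)]
        # every entry at or below ptr in the parent stack is an ancestor of elem
        return [prefix + (elem.name,) for j in range(ptr + 1) for prefix in expand(i - 1, j)]

    while any(pos[i] < len(streams[i]) for i in range(k)):
        # read the unread element with the smallest left position
        q = min((i for i in range(k) if pos[i] < len(streams[i])),
                key=lambda i: streams[i][pos[i]].left)
        nxt = streams[q][pos[q]]
        pos[q] += 1
        # pop entries that end before nxt starts: they cannot contain nxt or later elements
        for stack in stacks:
            while stack and stack[-1][0].right < nxt.left:
                stack.pop()
        if q == 0 or stacks[q - 1]:
            stacks[q].append((nxt, len(stacks[q - 1]) - 1 if q > 0 else -1))
            if q == k - 1:  # leaf of the query: emit solutions, then discard it
                results.extend(expand(q, len(stacks[q]) - 1))
                stacks[q].pop()
    return results

A1, B1, A2, B2, C1 = (Elem("A1", 1, 10, 1), Elem("B1", 2, 9, 2), Elem("A2", 3, 8, 3),
                      Elem("B2", 4, 7, 4), Elem("C1", 5, 6, 5))
print(path_stack([[A1, A2], [B1, B2], [C1]]))
# [('A1', 'B1', 'C1'), ('A1', 'B2', 'C1'), ('A2', 'B2', 'C1')]
```

The stacks encode all partial matches compactly; answers are only expanded when a leaf (C) element arrives.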

26 Count Operation [Chen04]
XPath query Count(A // B // C) over sequential data, with stacks S_A, S_B, and S_C.
Read: A-node's count = 1, B-node's count = 0, C-node's count = 0.

27 Count Operation (cont.)
XPath query Count(A // B // C) over sequential data, with stacks S_A, S_B, and S_C.
Read: A-node's count = 2, B-node's count = 0, C-node's count = 0.

28 Count Operation (cont.)
XPath query Count(A // B // C) over sequential data, with stacks S_A, S_B, and S_C. Stack entries shown: A(2), null.
Read: A-node's count = 0, B-node's count = 1, C-node's count = 0.

29 Count Operation (cont.)
XPath query Count(A // B // C) over sequential data, with stacks S_A, S_B, and S_C. Stack entries shown: A(2), null.
Read: A-node's count = 0, B-node's count = 2, C-node's count = 0.

30 Count Operation (cont.)
XPath query Count(A // B // C) over sequential data, with stacks S_A, S_B, and S_C. Stack entries shown: A(2), null, B(2).
Read: A-node's count = 0, B-node's count = 0, C-node's count = 1.
Query result is 2 * 2 = 4.

31 Count Operation (cont.)
XPath query Count(A // B // C) over sequential data, with stacks S_A, S_B, and S_C. Stack entries shown: A(2), null, B(2).
Read: A-node's count = 0, B-node's count = 0, C-node's count = 0.

32 Count Operation (cont.)
XPath query Count(A // B // C) over sequential data, with stacks S_A, S_B, and S_C. Stack entries shown: A(2), null; B(2) - 1 = 1.
Read: A-node's count = 0, B-node's count = 1, C-node's count = 0.

33 Count Operation (cont.)
XPath query Count(A // B // C) over sequential data, with stacks S_A, S_B, and S_C. Stack entries shown: A(2), null.
Read: A-node's count = 0, B-node's count = 0, C-node's count = 0.

34 Count Operation (cont.)
XPath query Count(A // B // C) over sequential data, with stacks S_A, S_B, and S_C. Stack entry shown: A(2) - 1 = 1.
Read: A-node's count = 1, B-node's count = 0, C-node's count = 0.

35 Count Operation (cont.)
XPath query Count(A // B // C) over sequential data, with stacks S_A, S_B, and S_C.
Read: A-node's count = 0, B-node's count = 0, C-node's count = 0.
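The frames above step through [Chen04]'s count operation, whose exact stack bookkeeping cannot be recovered from this transcript. The following is therefore not that algorithm but a separate, hedged sketch of the same goal: counting A//B//C matches in one streaming pass without materializing them. For each query step it keeps the number of partial matches whose last node is still open; when a tag closes, that element's contribution is subtracted. The function name and the ElementTree-based event stream are assumptions.

```python
import io
import xml.etree.ElementTree as ET

def count_path_matches(xml_text, steps):
    """Count tuples (n1, ..., nk) where n1 has tag steps[0] and each n(i+1)
    is a descendant of n(i) with tag steps[i], in a single streaming pass."""
    k = len(steps)
    partial = [0] * k  # partial[i]: matches of steps[0..i] ending at a still-open node
    open_elems = []    # per open element: list of (step index, contribution)
    total = 0
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            # compute contributions first: an element is not its own ancestor
            contrib = [(i, 1 if i == 0 else partial[i - 1])
                       for i, name in enumerate(steps) if elem.tag == name]
            for i, n in contrib:
                partial[i] += n
                if i == k - 1:
                    total += n  # complete matches ending at this element
            open_elems.append(contrib)
        else:
            for i, n in open_elems.pop():
                partial[i] -= n  # a closed element can no longer be an ancestor
    return total

# The stream from the frames: A, A, B, B, C, nested in that order
print(count_path_matches("<A><A><B><B><C/></B></B></A></A>", ["A", "B", "C"]))  # 4
```

On the frames' data this agrees with the slide's 2 * 2 = 4; on the twig-join example's chain (A1, B1, A2, B2, C1) it returns 3, matching the three results listed there, because a B only pairs with the A elements open above it.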

36 Future Work: version management, materialized views, cache management, aggregate query processing, streaming data processing

37 References
[Agrawal89] R. Agrawal et al., “Efficient management of transitive relationships in large data and knowledge bases,” SIGMOD, 1989.
[Bruno02] N. Bruno et al., “Holistic twig joins: Optimal XML pattern matching,” SIGMOD, 2002.
[Chen04] Yaw-Huei Chen and Ming-Chi Ho, “Aggregate query processing of streaming XML data,” ICS, 2004.
[Christophides03] V. Christophides et al., “On labeling schemes for the semantic web,” WWW, 2003.
[Diao03] Y. Diao et al., “Path sharing and predicate evaluation for high-performance XML filtering,” ACM TODS, 2003.
[Dietz82] P.F. Dietz, “Maintaining order in a linked list,” ACM Symposium on Theory of Computing, May 1982.
[Goldman97] R. Goldman and J. Widom, “DataGuides: Enabling query formulation and optimization in semistructured databases,” VLDB, 1997.

38 References (cont.)
[Kaushik02] R. Kaushik et al., “Covering indexes for branching path queries,” SIGMOD, 2002.
[Li01] Q. Li and B. Moon, “Indexing and querying XML data for regular path expressions,” VLDB, 2001.
[Milo99] T. Milo and D. Suciu, “Index structures for path expressions,” International Conference on Database Theory (ICDT), 1999.
[Tatarinov02] I. Tatarinov et al., “Storing and querying ordered XML using a relational database system,” SIGMOD, 2002.
[Tian02] F. Tian et al., “The design and performance evaluation of alternative XML storage strategies,” SIGMOD Record, March 2002.
[Zhang01] C. Zhang et al., “On supporting containment queries in relational database management systems,” SIGMOD, 2001.

