Storing and Querying Ordered XML Using Relational Database System Swapna Dhayagude.


1 Storing and Querying Ordered XML Using Relational Database System Swapna Dhayagude

2 Agenda Ordered XML Data Model Order Encoding Methods Shredding Ordered XML into Relations Translating XML queries to SQL Performance Evaluation

3 Ordered XML Data Model
- An XML document is a tree structure: relation as the ‘root’; nodes represent elements; leaf nodes hold data values
- Document Type Descriptor (DTD): schema information about the XML document
- Order: a salient feature of an XML document

4 Significance of order in XML
- Reconstruction: order is needed to reconstruct XML documents and to ensure a lossless mapping from XML to the RDB
- Performance: the choice of order encoding dramatically affects performance and enables efficient translation of XML queries into SQL
- Order-based functionality of XPath and XQuery: XPath is a simple ‘path based’ query language; XQuery is a more complex query language based on XPath

5 Three dimensions of XML order
- Evaluation of order-based axes: XPath expressions that require document order, e.g. (1) preceding, (2) following
- Inter-element order: the result set must preserve document order among its elements
- Intra-element order: for reconstruction, document order within each element is important

6 Agenda Ordered XML Data Model Order Encoding Methods Shredding Ordered XML into Relations Translating XML queries to SQL Performance Evaluation

7 How is order encoded ? Order is preserved using a simple numbering scheme Each node is represented using a node_id Node-id is stored as a data value within the relation Numbering schemes capture enough information to reconstruct XML documents

8 Order Based Functionality of XPath
- XPath follows a step-by-step sequential evaluation
- Each step is applied to a single node (the context node)
- The result of each step is a set of nodes {node1, node2, ..., node n}
- XPath syntax: Path ::= /Step1/Step2/…/StepN
- Each XPath step is defined as: Step ::= Axis::Node-test Predicate*
- The axis selects a direction of navigation, e.g. child::title selects all children that are ‘title’ elements
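
The step grammar above can be illustrated with a small sketch. It assumes a simplified path syntax with no predicates and no `//` abbreviation, and treats a step without an explicit axis as `child`; the function name `parse_steps` is hypothetical.

```python
def parse_steps(path):
    """Split a simplified XPath into (axis, node_test) pairs.

    Assumption: no predicates and no '//' abbreviation; a step
    without an explicit axis is treated as the child axis.
    """
    steps = []
    for raw in path.strip("/").split("/"):
        axis, sep, node_test = raw.partition("::")
        if not sep:
            # No '::' in the step, so the whole step is the node test.
            axis, node_test = "child", raw
        steps.append((axis, node_test))
    return steps


steps = parse_steps("/play/act/following-sibling::act")
```

Each pair names the direction of navigation and the node test that one evaluation step applies to its context node.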

9 Order Based Functionality of XPath
Axes specify the direction of navigation in an XML document:
- Up: parent, ancestor
- Down: child, descendant
- Left: preceding, preceding-sibling
- Right: following, following-sibling

10 Order Based Functionality of XQuery
- BEFORE operator: returns nodes from the first sequence that are before some node in the second sequence
- AFTER operator: returns nodes from the first sequence that are after some node in the second sequence
- Range predicates: allow selection of a range of elements from a sequence, e.g. /play/act[2 TO 4] returns act #2, act #3, and act #4 in document order
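
As a rough illustration of the range predicate, the sketch below treats a sequence that is already in document order as a Python list. The helper `select_range` and the sample data are hypothetical; positions are 1-based and inclusive, as in the [2 TO 4] example.

```python
def select_range(sequence, first, last):
    """Return items at 1-based positions first..last, inclusive."""
    return sequence[first - 1:last]


acts = ["act #1", "act #2", "act #3", "act #4", "act #5"]
selected = select_range(acts, 2, 4)
```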

11 Global Order Encoding Methods
Global Order Encoding
- Absolute positioning of nodes
- Best performance on queries: query evaluation requires only a simple comparison between node positions
- Worst performance on updates, especially inserts
[Figure: tree numbered in global order: play(1), title(2), text#(3), act(4), title(5), text#(6), scene(7), act(8)]
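
A minimal sketch of how global order numbers could be assigned, assuming a hypothetical (tag, children) tuple representation and a tree shape guessed from the figure labels. Nodes are numbered in a preorder walk; the sketch also records the id of each node's last descendant, which the Global Order Edge table later stores as end_desc_id.

```python
from itertools import count


def number_global(tree):
    """Number nodes in document (preorder) order.

    Returns one dict per node with id, parent_id, tag and
    end_desc_id (the id of the node's last descendant).
    """
    rows = []
    counter = count(1)

    def visit(node, parent_id):
        tag, children = node
        node_id = next(counter)
        row = {"id": node_id, "parent_id": parent_id,
               "tag": tag, "end_desc_id": node_id}
        rows.append(row)
        for child in children:
            # The last child visited determines the last descendant.
            row["end_desc_id"] = visit(child, node_id)
        return row["end_desc_id"]

    visit(tree, None)
    return rows


# Assumed tree shape; only the labels come from the slide.
play = ("play", [
    ("title", [("text#", [])]),
    ("act", [("title", [("text#", [])]), ("scene", [])]),
    ("act", []),
])
rows = number_global(play)
```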

12 Global Order Encoding (contd)
- Initially, sparse numbering is used for Global Order Encoding
- Sparse numbering leaves gaps between ids, which brings down the cost of renumbering on inserts/updates and so improves update performance
- Global order makes intra-element and inter-element ordering easy, since global document order is readily available
- Drawback: it still performs poorly on inserts (Local Order offers better performance for inserts/updates)
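
The idea of sparse numbering can be sketched as follows. The gap size of 100 and both helper names are assumptions for illustration, not values from the slides. An insert takes a free id between its neighbours; renumbering is needed only once the gap is used up.

```python
GAP = 100  # assumed spacing between consecutive ids


def sparse_ids(n, gap=GAP):
    """Initial ids for n nodes in document order, leaving gaps."""
    return [gap * (i + 1) for i in range(n)]


def id_between(left, right):
    """Return a free id strictly between left and right,
    or None when the gap is exhausted and renumbering is needed."""
    if right - left < 2:
        return None
    return (left + right) // 2


ids = sparse_ids(3)
new_id = id_between(ids[0], ids[1])
```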

13 Global Order Renumbering Scenario
- Inserting a new element into an existing document causes many nodes to be renumbered
- In the figure, the highlighted nodes need to be renumbered (the global ordering scheme requires the most renumbering)
[Figure: global-order tree with a new element inserted: play(1), title(2), text#(3), act(8), New Element, act(4), title(5), scene(7)]

14 Local Order Encoding Methods
Local Order Encoding
1. Relative positioning of nodes
2. Best performance on updates
3. Worst performance on queries
[Figure: tree numbered in local order: play(1), act(2), title(1), act(3), text(1), title(1), scene(2), text(1)]

15 Local Order Encoding (continued)
How does Local Order Encoding reconstruct the absolute position of a node?
- The relative position of a node is combined with the relative positions of its ancestors
- Combined, these yield a vector that uniquely identifies the absolute position within the document
- (relative position of node) + (relative positions of ancestors) = (absolute position of node in the document)
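
The combination step can be sketched as a walk from a node up to the root, collecting sibling positions along the way. The node names and the dictionary layout below are hypothetical; each entry maps a node to (parent, relative position among siblings).

```python
# Hypothetical local-order data: name -> (parent, position among siblings)
nodes = {
    "play": (None, 1),
    "title": ("play", 1),
    "act_a": ("play", 2),
    "act_b": ("play", 3),
    "act_a/title": ("act_a", 1),
    "act_a/scene": ("act_a", 2),
}


def position_vector(name, nodes):
    """Combine a node's relative position with those of its ancestors."""
    vector = []
    while name is not None:
        parent, s_index = nodes[name]
        vector.append(s_index)
        name = parent
    # Collected leaf-to-root, so reverse into root-to-leaf order.
    return tuple(reversed(vector))


scene = position_vector("act_a/scene", nodes)
```

Comparing two such vectors component by component gives the relative document order of the two nodes, which is the extra work Local Order has to do at query time.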

16 Local Order Renumbering Scenario
- As opposed to Global Order Encoding, Local Order requires a minimum number of nodes to be renumbered
- This is a major advantage, since it dramatically reduces the cost of inserts
[Figure: local-order tree with a new element inserted: play(1), title(1), text#(1), act(2), New Element, act(2), title(1), scene(2), scene(1)]

17 Local Order Encoding (continued)
- Incurs low overhead on updates
- Only following siblings may require renumbering
- Drawback: lack of global order information results in complex evaluation of the following and preceding axes

18 Dewey Order Encoding Methods
Dewey Order Encoding
1. Strikes a balance between Global and Local
2. Reasonable performance on updates and queries
[Figure: tree numbered in Dewey order, labels as extracted: Play 1, title(1.1), text(1.1.1), act(1.2), title(1.1.2), act(1.3), scene(1.2.2), text(1.1.2.1)]

19 Dewey Order Encoding
- Each path uniquely identifies the absolute position of a node in a document
- Query processing is similar to that of Global Order
- Only following siblings (and their descendants, whose paths contain the sibling's position) may require renumbering
- Drawback: extra space is required to store the path from the root to each node
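
A small sketch of why Dewey paths support Global-style comparisons. It assumes paths are written as dot-separated integers and parsed into tuples; the helper names are hypothetical. Document order then becomes tuple comparison, and the ancestor test becomes a prefix test.

```python
def parse_dewey(label):
    """Turn '1.2.2' into (1, 2, 2) so components compare numerically."""
    return tuple(int(part) for part in label.split("."))


def is_ancestor(a, b):
    """True if path a is a proper ancestor of path b (a is a prefix of b)."""
    return len(a) < len(b) and b[:len(a)] == a


def precedes(a, b):
    """True if a comes before b in document order."""
    return a < b


act = parse_dewey("1.2")
scene = parse_dewey("1.2.2")
```

Comparing the raw strings would give the wrong order for multi-digit components ("1.10" sorts before "1.2" as text), which is why the sketch parses the components first.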

20 Dewey Order Renumbering Scenario
- The renumbering required is more than for Local Encoding, but much less than for Global Encoding
[Figure: Dewey-order tree with a new element inserted: play, title, text#, act, New element, act, title, scene]

21 Agenda Ordered XML Data Model Order Encoding Methods Shredding Ordered XML into Relations Translating XML queries to SQL Performance Evaluation

22 Shredding XML into Relations
- Schema-less case: the schema of the input XML documents is unknown
  - Edge approach: each document is stored as a single table
- Schema-aware case: the schema of the input XML documents is available
  - Inlining: a child that occurs once is stored within the parent relation; a child that occurs multiple times is stored as a new relation

23 Inlining
- Inlining is an effective way of storing and querying XML, provided that the document schema is available
- Inlining adapts to Global, Local and Dewey Orders
- Every relation requires an additional column to encode document order
- Storing order information for ‘inlined’ elements is unnecessary (the element position is determined from the position of the parent and from the document schema)

24 Storing Order Information – Schema-less case
The Edge Approach
- Each document is stored as a single table
- Each tuple within the table represents a node
- Edge (id, parent_id, name, value)
  - id: acts as the primary key
  - parent_id: acts as a foreign key that links to the node’s parent
  - name: stores the tag name of the element
  - value: stores the text value

25 Storing Order Information – Schema-less case
The Edge approach adapts differently to Global, Local and Dewey Order:
- Global Order: Edge (id, parent_id, end_desc_id, path_id, value)
  - end_desc_id: id of the last descendant of a node
- Local Order: Edge (id, parent_id, sIndex, path_id, value)
  - sIndex: sibling index of a node
- Dewey Order: Edge (dewey, path_id, value)
  - dewey: represents both order and ancestor information
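
The three Edge variants could be declared roughly as below, here using SQLite through Python's sqlite3 module. The table names, column types and key choices are assumptions; only the column names come from the slide.

```python
import sqlite3

# Column names follow the slide; types and keys are assumed.
SCHEMAS = {
    "edge_global": """CREATE TABLE edge_global (
        id INTEGER PRIMARY KEY, parent_id INTEGER,
        end_desc_id INTEGER, path_id INTEGER, value TEXT)""",
    "edge_local": """CREATE TABLE edge_local (
        id INTEGER PRIMARY KEY, parent_id INTEGER,
        sIndex INTEGER, path_id INTEGER, value TEXT)""",
    "edge_dewey": """CREATE TABLE edge_dewey (
        dewey BLOB PRIMARY KEY, path_id INTEGER, value TEXT)""",
}

conn = sqlite3.connect(":memory:")
for ddl in SCHEMAS.values():
    conn.execute(ddl)


def columns(table):
    """List the column names of a table via PRAGMA table_info."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
```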

26 Agenda Ordered XML Data Model Order Encoding Methods Shredding Ordered XML into Relations Translating XML queries to SQL Performance Evaluation

27 Query Translation for Global Order
Edge (id, parent_id, end_desc_id, path_id, value)
- Translation of following/preceding
  - following: select nodes from the Edge table whose id > end_desc_id of the context node
  - preceding: select nodes from the Edge table whose end_desc_id < id of the context node (this excludes ancestors of the context node)
- Translation of following-sibling/preceding-sibling
  - following-sibling: select nodes with (id > id of the context node) AND (parent_id = parent_id of the context node)
  - preceding-sibling: select nodes with (id < id of the context node) AND (parent_id = parent_id of the context node)
Note: the above expressions are NOT actual SQL statements
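
One possible rendering of these translations as SQL, run against SQLite from Python. This is a sketch under assumptions: the table carries a `name` column instead of path_id and value, the sample rows are invented to match an eight-node tree, and the conditions used are following: id > end_desc_id of the context node; preceding: end_desc_id < id of the context node; siblings: same parent_id with a larger or smaller id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE edge (
    id INTEGER PRIMARY KEY, parent_id INTEGER,
    end_desc_id INTEGER, name TEXT)""")
# Assumed sample document, numbered in global order.
conn.executemany("INSERT INTO edge VALUES (?, ?, ?, ?)", [
    (1, None, 8, "play"),
    (2, 1, 3, "title"), (3, 2, 3, "text#"),
    (4, 1, 7, "act"),
    (5, 4, 6, "title"), (6, 5, 6, "text#"),
    (7, 4, 7, "scene"),
    (8, 1, 8, "act"),
])

# n ranges over candidate nodes, c is the context node.
FOLLOWING = """SELECT n.id FROM edge n, edge c
    WHERE c.id = :ctx AND n.id > c.end_desc_id ORDER BY n.id"""
PRECEDING = """SELECT n.id FROM edge n, edge c
    WHERE c.id = :ctx AND n.end_desc_id < c.id ORDER BY n.id"""
FOLLOWING_SIBLING = """SELECT n.id FROM edge n, edge c
    WHERE c.id = :ctx AND n.parent_id = c.parent_id AND n.id > c.id
    ORDER BY n.id"""
PRECEDING_SIBLING = """SELECT n.id FROM edge n, edge c
    WHERE c.id = :ctx AND n.parent_id = c.parent_id AND n.id < c.id
    ORDER BY n.id"""


def run(sql, context_id):
    """Evaluate one axis for the given context node id."""
    return [row[0] for row in conn.execute(sql, {"ctx": context_id})]
```

Every axis reduces to integer comparisons on a self-join, and ORDER BY n.id returns the result in document order, which is why Global Order is cheap to query.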

28 Query Translation for Local Order
Edge (id, parent_id, sIndex, path_id, value)
- Translation of following-sibling/preceding-sibling: similar to Global and Dewey Order
- Translation of following/preceding: a complex task
  1. Compute all ancestors of the context node: {anc}
  2. Compute the following siblings of the context node and of its ancestors: {anc_sib}
  3. Compute the descendants of {anc_sib}
- Challenges: without knowledge of the XML schema, retrieving ancestors/descendants is a complex task that involves recursion
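
The three steps can be sketched over an in-memory local-order table. The node keys and data are hypothetical, and plain Python recursion stands in for the recursive SQL that a schema-less store would need.

```python
# Hypothetical local-order rows: key -> (parent_id, sIndex).
# Keys are arbitrary identifiers and carry no order information.
EDGE = {
    "p": (None, 1),
    "t": ("p", 1),
    "a1": ("p", 2),
    "a2": ("p", 3),
    "a1t": ("a1", 1),
    "a1s": ("a1", 2),
    "a2s": ("a2", 1),
}


def ancestors_or_self(node):
    """Step 1: the context node followed by its ancestors up to the root."""
    chain = []
    while node is not None:
        chain.append(node)
        node = EDGE[node][0]
    return chain


def children(node):
    """Children of a node, sorted by sibling index."""
    kids = [n for n, (parent, _) in EDGE.items() if parent == node]
    return sorted(kids, key=lambda n: EDGE[n][1])


def following_siblings(node):
    """Step 2 helper: siblings with a larger sibling index."""
    parent, s_index = EDGE[node]
    if parent is None:
        return []
    return [n for n in children(parent) if EDGE[n][1] > s_index]


def subtree(node):
    """Step 3 helper: the node and all its descendants, in document order."""
    out = [node]
    for child in children(node):
        out.extend(subtree(child))
    return out


def following(node):
    """Nodes on the following axis of node, in document order."""
    result = []
    for anc in ancestors_or_self(node):
        for sib in following_siblings(anc):
            result.extend(subtree(sib))
    return result
```

Because only sibling positions are stored, every step has to navigate through parent links, which is the cost the slide refers to.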

29 Query Translation for Dewey Order
Edge (dewey, path_id, value)
dewey column:
- Stored as a variable-length byte string
- Replaces parent_id and end_desc_id of the Global Edge table
- Encodes parent and descendant information within the dewey path, so neither parent_id nor end_desc_id needs to be stored
Drawback: storage overhead due to the large number of bytes allocated to each component
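
A sketch of how a byte-string dewey column can answer the following axis with plain byte comparisons. For simplicity it assumes every component is packed as a fixed 2-byte big-endian integer, which is not necessarily the variable-length encoding the slides refer to; the function names and sample paths are hypothetical.

```python
import struct


def encode_dewey(path):
    """Pack each component as a 2-byte big-endian integer (assumed
    encoding), so byte-wise comparison matches document order."""
    return b"".join(struct.pack(">H", component) for component in path)


def is_descendant(candidate, context):
    """A descendant's key starts with the context key and is longer."""
    return len(candidate) > len(context) and candidate.startswith(context)


def following(context, keys):
    """Keys after the context in document order, minus its descendants."""
    return sorted(k for k in keys
                  if k > context and not is_descendant(k, context))


paths = [(1,), (1, 1), (1, 1, 1), (1, 2), (1, 2, 1), (1, 2, 2), (1, 3)]
keys = [encode_dewey(p) for p in paths]
after_act = following(encode_dewey((1, 2)), keys)
```

The fixed width per component also shows the storage drawback: every level of depth costs the same number of bytes regardless of how small the component is.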

30 Query Translation in Inlining
Essentially uses the same algorithm as the Edge approach, but with two extensions:
- XML data can be spread across several tables, so evaluating axes requires access to multiple tables rather than just one Edge table
- The translation algorithm does not use recursion, since the schema contains sufficient information about the depth and position of nodes
Drawback: data is partitioned across many tables, which makes for too many tables to handle

31 Agenda Ordered XML Data Model Order Encoding Methods Shredding Ordered XML into Relations Translating XML queries to SQL Performance Evaluation

32 Storage Requirements
Table 1: Storage requirements of the Global, Local and Dewey encoding methods

Order Scheme | Edge Table Size | Edge Index Size | Inlining Table Size | Inlining Index Size
Global       | 52.1 MB         | 57.9 MB         | 44.1 MB             | 28.9 MB
Local        | 52.1 MB         | 87.9 MB         | 47.7 MB             | 36.8 MB
Dewey        | 48.9 MB         | 38.7 MB         | 44.5 MB             | 15.8 MB

33 Performance
All experiments are based on the Shakespeare’s Plays dataset.
Table 2: Test Queries

Query | Query Definition
Q1    | /play
Q2    | /play/act//speech
Q3    | /play/act/scene/speech
Q4    | /play/act/scene/speech[2]
Q5    | /play/act/scene/*[2]
Q6    | /play/act/scene/speech[1 TO 3]
Q7    | /play/act[2]/following::speech
Q8    | /play/act/scene/speech/speaker/following-sibling::line[2]
Q9    | //act/scene/speech BEFORE /play/act[2]

34 Select and Reconstruct Modes
XPath queries run in two different modes:
- Select mode: the result set contains only the IDs of the nodes satisfying the XPath expression
- Reconstruct mode: entire XML fragments are extracted from the database in document order

35 Ordered Selection: Edge Results
[Chart: x-axis: queries; y-axis: time (seconds)]

36 Inlining Results

37 Reconstruction
- In reconstruct mode, XML documents need to be extracted from the DB in document order
- The optimizer’s inability to pick the best plan produced poor results
- Using ‘tuned’ SQL queries, on the other hand, yielded better results
Note: queries Q3, Q4, Q5 and Q9 performed very poorly (far beyond the indicated scale)

38 Performance
Results based on experiments:
- Global Order is the most efficient order encoding method
- Dewey Order follows with the second-best performance
- Local Order uses sorting very often, which degrades overall performance
- Typically, Inlining performs better than Edge
- In general, the XML document parsing overhead was greater than the XPath processing cost

39 Performance
Conclusions based on results:
- An RDBMS efficiently supports ordered XML
- Global Order is the best for query workloads
- Dewey Order is slightly less efficient than Global Order, but is the best for a mix of queries and updates
- Schema information makes Local Order a viable alternative
- Relational optimizers do not understand the hierarchical structure of XML

40 Acknowledgements… Prof. Elke Rundensteiner Thank You …

