OrientX: an Integrated, Schema-Based Native XML Database System

Meng Xiaofeng, Wang Xiaofeng, Xie Min, Zhang Xin, Zhou Junfeng
School of Information, Renmin University of China
WISA 2006

Introduction
OrientX stands for Original RUC IDKE Native XML Database.
RUC: Renmin University of China.
IDKE: Institute of Data and Knowledge Engineering.
Native XML database: a system that exposes a logical model for storing and retrieving XML documents. A non-native XML database is instead built on another model, for example on top of a relational database.

Outline
Architecture and Features
Storage and data management
Indexing
Schema
Query processing
Conclusion and Future Work

Architecture
This picture shows the architecture of our system.
Index manager module: constructs and accesses the indexes.
Execution engine: takes charge of importing and exporting XML documents and of executing XQuery and XPath.
XML Schema: describes the XML documents and also places restrictions on them.

Features
Full support for XML Schema
Support for the XQuery 1.0 and XPath 2.0 Data Model
Various native storage techniques
Path index and value index
Multiple query processing strategies based on native storage

The system can get full information from the XML Schema and use it for indexing, for validating XQuery, and so on. There are four storage strategies in our system, clustered or non-clustered. There are two different query processing strategies: one is based on navigation and the other is based on an XML algebra.

Different storage granularities
Document: do not decompose the document; build an index on it to navigate its structure. Query complexity and efficiency are limited by the power of the index. Example: storing the XML document in a relational database as a BLOB.
Sub-tree: decompose the document into sub-trees according to the storage space partition, so that the structure is preserved inside each sub-tree. This saves space. The size of a sub-tree is usually as close to the physical page size as possible.
Node: decompose the document into a sequence of nodes, each node corresponding to a type (element, attribute, …). One node is one record. This may need too many links to persist the relationships between nodes.

Storage Techniques in OrientX
                 Element-based   SubTree-based   Document-based
Depth-first      DEB             DSB             DB
Breadth-first    BEB             BSB
Clustered        CEB             CSB

Columns give the storage granularity; rows give the method of traversing the XML tree. Implemented techniques are marked in red.
DEB: one node is a record; records are produced by a preorder traversal of the tree.
DSB: like DEB, but each record is a sub-tree whose size is close to the physical page size.
CEB: one element is a record, but all nodes with the same tag name are stored clustered.
CSB: akin to DSB in that each record is a sub-tree, but all sub-trees with the same structure are stored clustered.

Example: Element-based
[Figure: the source document tree with nodes r, t1, a1, a2, l1, f1, l2, f2, shown next to its DEB and CEB layouts]
DEB: traverse the tree in preorder and store each element when it ends.
CEB: like DEB, but all elements with the same tag name are stored clustered.
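The two element-based layouts can be sketched in a few lines of Python. This is only an illustration, not the OrientX implementation: the tree shape (r with children t1, a1, a2; a1 with l1, f1; a2 with l2, f2) is an assumption read off the node names in the examples, the tag of a node is assumed to be its name without the trailing index, and the function names `deb_records` and `ceb_clusters` are hypothetical.

```python
from collections import defaultdict

# Assumed example tree: r -> (t1, a1, a2), a1 -> (l1, f1), a2 -> (l2, f2).
# Each node is (name, tag, children); the tag is the name without its index.
def node(name, *children):
    return (name, name.rstrip("0123456789"), list(children))

DOC = node("r",
           node("t1"),
           node("a1", node("l1"), node("f1")),
           node("a2", node("l2"), node("f2")))

def deb_records(tree):
    """Depth-first element-based: one record per node, written when the element ends."""
    name, _tag, children = tree
    records = []
    for child in children:
        records.extend(deb_records(child))
    records.append(name)
    return records

def ceb_clusters(tree):
    """Clustered element-based: the same records, grouped by tag name."""
    clusters = defaultdict(list)

    def visit(n):
        name, tag, children = n
        for child in children:
            visit(child)
        clusters[tag].append(name)

    visit(tree)
    return dict(clusters)

print(deb_records(DOC))
print(ceb_clusters(DOC))
```

In the clustered layout a scan over one tag name touches only that tag's cluster, while the depth-first layout keeps the nodes of one sub-tree next to each other.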

Example: Subtree-based
[Figure: the source document (DOC) and its two layouts, DSB (depth-first sub-tree based) on the left and CSB (clustered sub-tree based) on the right; both layouts contain proxy (virtual) nodes]
Left picture (DSB): traverse the tree in preorder and, whenever the accumulated sub-tree size nears the physical page size, generate a record to store that sub-tree. Suppose every physical page has a capacity of 5 nodes. In a depth-first traversal, (t1, l1, f1, a1) are the first four nodes encountered. They do not belong to the same sub-tree, so a sub-tree containing them is generated under a virtual root (grey). The remaining four nodes (l2, f2, a2, r) then form another sub-tree without a virtual root, because they all belong to the tree rooted at (r).
Right picture (CSB): the document tree is divided into sub-trees based on the schema, and sub-trees with the same structure are stored clustered. In this XML tree, nodes of type [a] can occur multiple times under node (r) and have descendants, so each sub-tree rooted at a node of type [a] is treated as a storage sub-tree, and the document root (r) is the root of another sub-tree. Notice that nodes (a1, a2) occur twice in CSB.
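The DSB walk-through with a page capacity of 5 nodes can be reproduced with a small Python sketch. It is a simplification under stated assumptions, not the OrientX algorithm: the tree shape is the one implied by the example, one slot per page is assumed to be reserved for a possible virtual root, and the function name `dsb_records` is hypothetical.

```python
# Assumed example tree as (name, parent) pairs, listed in the order in which
# elements end during a depth-first traversal:
# r -> (t1, a1, a2), a1 -> (l1, f1), a2 -> (l2, f2).
END_ORDER = [("t1", "r"), ("l1", "a1"), ("f1", "a1"), ("a1", "r"),
             ("l2", "a2"), ("f2", "a2"), ("a2", "r"), ("r", None)]

PAGE_CAPACITY = 5  # nodes per physical page, as in the example

def dsb_records(end_order, capacity):
    """Group nodes into page-sized records and flag records that need a virtual root."""
    # Assumption: one slot per page is reserved for a possible virtual root.
    chunk_size = capacity - 1
    records = []
    for start in range(0, len(end_order), chunk_size):
        chunk = end_order[start:start + chunk_size]
        names = {name for name, _parent in chunk}
        # A node whose parent is stored elsewhere (or absent) is a root of this record.
        roots = [name for name, parent in chunk if parent not in names]
        records.append({
            "nodes": [name for name, _parent in chunk],
            # Several roots cannot form one tree, so a virtual root joins them.
            "virtual_root": len(roots) > 1,
        })
    return records

for record in dsb_records(END_ORDER, PAGE_CAPACITY):
    print(record)
```

The first record holds (t1, l1, f1, a1) and needs a virtual root because t1 and a1 are both cut off from their parent r; the second holds (l2, f2, a2, r) and needs none because r is its single root.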

SUPEX: Index Architecture
Path index

Features of SUPEX
Constructed based on the DTD or Schema
Integrates the path index with value indexes
Supports twig queries efficiently
Supports label path expressions (bib//author)
Supports the evaluation of value-based condition predicates (//author[firstname = "jone"])
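To make the combination of a path index and a value index concrete, here is a toy Python sketch that answers the two example expressions. It does not reproduce SUPEX's actual structures, which the slides do not detail: the sample nodes, the dictionary-based indexes and the function names are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: (node id, parent id, label path from the root, text value).
NODES = [
    (1, None, ("bib",), None),
    (2, 1, ("bib", "book"), None),
    (3, 2, ("bib", "book", "author"), None),
    (4, 3, ("bib", "book", "author", "firstname"), "jone"),
    (5, 2, ("bib", "book", "author"), None),
    (6, 5, ("bib", "book", "author", "firstname"), "mary"),
]

path_index = defaultdict(list)   # label path -> node ids
value_index = defaultdict(list)  # (label path, value) -> node ids
parent_of = {}
for node_id, parent_id, path, value in NODES:
    path_index[path].append(node_id)
    parent_of[node_id] = parent_id
    if value is not None:
        value_index[(path, value)].append(node_id)

def descendants_by_label(first, last):
    """Answer first//last: label paths that start with `first` and end with `last`."""
    return sorted(n for path, ids in path_index.items()
                  if len(path) > 1 and path[0] == first and path[-1] == last
                  for n in ids)

def authors_with_firstname(value):
    """Answer //author[firstname = value] by probing the value index."""
    hits = []
    for (path, v), ids in value_index.items():
        if v == value and path[-2:] == ("author", "firstname"):
            hits.extend(parent_of[n] for n in ids)
    return sorted(hits)

print(descendants_by_label("bib", "author"))
print(authors_with_firstname("jone"))
```

Both lookups work on label paths rather than on the document itself, which is why an index of this kind can be constructed from the DTD or Schema before any query runs.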

Query processing
Navigation strategy
Supports XPath 2.0 and XQuery 1.0
Combines consecutive steps in one XPath into a single path
Reforms the syntax tree into a reduced execution plan
Introduces a pipeline operator into XQuery processing

There are two different implementations of the XQuery executor. One is based on navigation, in OrientX version 2.0, and the other is based on algebra.
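Combining consecutive steps into a single path can be pictured as a small plan rewrite. The Python sketch below is an assumption-laden illustration, not the OrientX optimizer: the plan is modelled as a flat list of (operator, argument) pairs, the operator names Step, Path and CondTreeNode are borrowed from the navigation operator list but their exact semantics are assumed, and `merge_steps` is a hypothetical name.

```python
# Hypothetical plan representation: a list of (operator, argument) pairs.
def merge_steps(plan):
    """Collapse each run of consecutive Step operators into one Path operator."""
    merged = []
    run = []

    def flush():
        # Emit the pending run: a single Step stays a Step, longer runs become a Path.
        if len(run) > 1:
            merged.append(("Path", "/".join(run)))
        elif run:
            merged.append(("Step", run[0]))
        run.clear()

    for op, arg in plan:
        if op == "Step":
            run.append(arg)
        else:
            flush()
            merged.append((op, arg))
    flush()
    return merged

plan = [("Step", "bib"), ("Step", "book"), ("Step", "author"),
        ("CondTreeNode", "firstname = 'jone'"), ("Step", "lastname")]
print(merge_steps(plan))
```

After the rewrite the evaluator handles three operators instead of five, and the merged path can be answered as one unit, for example by a path index lookup.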

Operators in Navigation
Currently, the navigation strategy contains 13 operators: Step, CondTreeNode, Path, ForVarBind, LetVarBind, FLWR, EleConstructor, AttrConstructor, BuiltInFun, IfThenElse, Quanlify, SetOpt, SortBy.

General Steps to process XQuery
This flowchart shows the procedure of our XQuery processing:
XQuery → Query Parser and Translator → Initial Query plan → optimizer → optimized Query plan → Evaluator Engine

The query plan
With the above XQuery example, the corresponding query plan is the tree on the right. It is a tree structure translated directly from the query.

Conclusion and Future Work
OrientX is an integrated, schema-based native XML database system. It implements the storing and querying of XML data.
Future work: XQuery optimization, XML update, and other XQuery processing engines.

Q&A Thanks Welcome to our website http://idke.ruc.edu.cn
to obtain more information about OrientX
