Topic 8: Semi-structured Data
Dr. N. Mamoulis, Advanced Database Technologies


Slide 1: Topic 8: Semi-structured Data
In various application domains the data are semi-structured: the database schema is only loosely defined. Semi-structured data need specialized management methods. The most characteristic example is XML data, where the elements (tags) define the semantics of the information.
[Example on slide: an XML fragment whose tags were lost in extraction; only the values "xxx yyy zzz 7" survive.]

Slide 2: XML and HTML
XML, like HTML, derives from SGML. In HTML the tags primarily describe how to display a data item. XML tags, on the other hand, describe the data itself. Tags are called elements in XML. This means that a program receiving an XML document can interpret it in multiple ways, filter the document on its content, restructure it for a different application, and so on.

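Slide 3: Example of HTML document
[Figure: HTML source of a course page, annotated to point out an attribute="value" pair and a piece of text. The tags were lost in extraction; the surviving text is "Course Details", "CSIS7101", "Detailed (tentative) schedule", "1 Introduction", "Introduction by the instructor", "[ slides in pdf ]*".]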

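Slide 4: Example of XML document
[Figure: XML source of a book record, annotated to point out an attribute="value" pair and a piece of text. The tags were lost in extraction; the surviving values are "The Selfish Gene", "Richard", "Dawkins", "Timbuktu".]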

Slide 5: XML and Databases
XML is becoming a standard for information exchange over the internet. Since actual data are stored in XML documents, it should be possible to query these data. This is where databases come in: how should we organize and query XML data?

Slide 6: XML and Databases (cont'd)
Solution 1: Use specialized storage methods, query languages and query evaluation techniques for semi-structured data.
Solution 2: Represent XML data in relational tables, transform queries to SQL, and use the mature relational DB technology.

Slide 7: More on XML: Document Type Descriptors
DTDs (Document Type Descriptors, also known as Document Type Definitions) define and control the schema of XML documents for a specific application. With a DTD the structure of these documents is no longer free; it must conform to the DTD. The DTD can also help define a relational schema for the class of documents that conform to it.

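Slide 8: Example of a DTD
[Figure: a DTD listing; the declarations were lost in extraction. The surviving annotations explain four constructs used in it: an element that may occur 0 or more times, an element that may occur 0 or 1 time, an attribute that is a reference to an existing ID, and an address element under which anything can be nested.]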

Slide 9: More on XML: Queries
Queries on XML data are described by the structural relationships of elements, attributes and values. Several XML query languages have been proposed: XML-QL, Lorel, UnQL, XQL, XPath, XQuery, ...

Slide 10: More on XML: Query Example
Find the last name of the author of the book "The Selfish Gene".
[XML-QL query; the element tags of the pattern were lost in extraction. Surviving skeleton: WHERE ... The Selfish Gene ... $l ... IN db.xml CONSTRUCT ... $l ...]

Slide 11: XML data represented as graphs
An XML document can be represented as a node-labeled graph. The labels of the graph are element tags, attribute names and values. Most documents can be represented by trees. The extra edges that turn a tree into a graph come from ID references.

Slide 12: Example of a tree representation
book
  booktitle: "The Selfish Gene"
  author
    id: "dawkins"
    name
      firstname: "Richard"
      lastname: "Dawkins"
    address
      city: "Timbuktu"
      zip: "99999"
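As a minimal sketch of this tree view, the snippet below parses a small document with Python's standard xml.etree.ElementTree module and lists one line per element, attribute and text value. The document text in DOC is an assumption: it is rebuilt from the labels of the tree on this slide, not copied from the original slide source, and the helper name tree_lines is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, rebuilt from the labels of the tree representation.
DOC = """<book>
  <booktitle>The Selfish Gene</booktitle>
  <author id="dawkins">
    <name><firstname>Richard</firstname><lastname>Dawkins</lastname></name>
    <address><city>Timbuktu</city><zip>99999</zip></address>
  </author>
</book>"""


def tree_lines(elem, depth=0):
    """Return one indented line per element, attribute and text value."""
    pad = "  " * depth
    lines = [pad + elem.tag]
    for attr, value in elem.attrib.items():
        lines.append(f"{pad}  @{attr} = {value}")
    text = (elem.text or "").strip()
    if text:
        lines.append(f"{pad}  '{text}'")
    for child in elem:
        lines.extend(tree_lines(child, depth + 1))
    return lines


root = ET.fromstring(DOC)
print("\n".join(tree_lines(root)))
```

Attributes and text are printed as children of their element, which mirrors how the slide treats element tags, attribute names and values as node labels.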

Slide 13: Example of a graph representation
[Figure: XML source of an article with two authors; the tags were lost in extraction. Surviving values: "The Importance of Evergreen", "John", "Smith", "Smithsville", "Jones", "Jonesville".]

Slide 14: Example of a graph representation (cont'd)
article
  title: "The Importance of Evergreen"
  author
    id: "smith"
    name
      firstname: "John"
      lastname: "Smith"
    address: "Smithsville"
  author
    id: "jones"
    name
      lastname: "Jones"
    address: "Jonesville"
  contactauthor
    authorid: an ID reference to one of the author nodes (the extra edge that turns the tree into a graph; its target is not recoverable from the transcript)

Slide 15: XML Query types
Queries with absolute path expressions. These queries retrieve paths where the first element is the ROOT of the document.
Example: find all books written by an author with lastname="Smith"
book/author/name/lastname/Smith

Slide 16: XML Query types (cont'd)
Queries with simple path expressions. These queries retrieve paths where the first element can be any element of the document.
Example: find all items written by an author with lastname="Smith"
//author/name/lastname/Smith
(the leading // means that anything can come before the author tag)

Slide 17: XML Query types (cont'd)
Queries with regular path expressions. These queries retrieve paths where not all elements on the path are specified.
Example: find all documents with an "author" element that has a descendant "Smith" in the graph
//author//Smith
(any path can lie between author and Smith)
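The three query types seen so far can be tried out with the limited XPath subset of Python's xml.etree.ElementTree. This is only a sketch under stated assumptions: the tiny document is made up for illustration, the common root alldocuments plays the role of the document root, and because ElementTree stores text as a property of an element rather than as a node, the final "Smith" step is checked by the hypothetical helper with_text.

```python
import xml.etree.ElementTree as ET

# Made-up data: three documents under a common root, each with a "Smith".
DOC = """<alldocuments>
  <book><author><name><lastname>Smith</lastname></name></author></book>
  <article><author><name><lastname>Smith</lastname></name></author></article>
  <report><author><contact><name><lastname>Smith</lastname></name></contact></author></report>
</alldocuments>"""
root = ET.fromstring(DOC)


def with_text(elements, value):
    """Keep only the elements whose text equals the given value."""
    return [e for e in elements if (e.text or "").strip() == value]


# Absolute path: starts at the root, every step is named.
absolute = with_text(root.findall("book/author/name/lastname"), "Smith")
# Simple path: may start anywhere, but the steps after author are all named.
simple = with_text(root.findall(".//author/name/lastname"), "Smith")
# Regular path: any number of elements may lie between author and lastname.
regular = with_text(root.findall(".//author//lastname"), "Smith")

print(len(absolute), len(simple), len(regular))
```

Each relaxation of the path expression matches at least as many nodes as the stricter form before it, which is why the later slides need indexes that can cope with unspecified path segments.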

Slide 18: XML Query types (cont'd)
In general, other symbols may be used to denote the distance between the path elements.
Example: find all documents with an "author" followed by exactly one element, then by one or no element, and then by "Smith".
//author/_ /?/Smith
("_" means exactly one element must be here; "?" means one or no element can be here)

Slide 19: XML Query types (cont'd)
Queries that match multiple regular path expressions. The paths are joined at a root element and the whole query is represented by a twig (small tree).
Example: find the book with title "XML" written by an author with a descendant "Smith" in the graph
book[/title/XML][//author//Smith]
[Figure: the query twig. Root book with two branches: title with child XML, and author with descendant Smith.]

Slide 20: Indexing and XML Query Processing
Several storage schemes and indexes have been proposed for the queries discussed above. Some of them index the paths or subgraphs of the XML structures. Others decompose the information and flatten it into relational DB tables.

Slide 21: Path indexes for XML data
If many documents exist, they are connected into one large graph by adding a common root. Then a structural summary of the XML graph is created. All the paths in the data graph are preserved in the summary graph. If each node of the summary graph keeps pointers to the corresponding nodes of the original graph, the summary becomes an index.

Slide 22: Path index example
[Figure A: a graph of documents. Under the common root "alldocuments" hang two book subtrees and one article subtree; the surviving node labels are title, author, name and address.]

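Slide 23: Path index example (cont'd)
[Figure B: the 1-index. Each index node lists the data nodes it stands for; node numbers that were lost in extraction are filled in from the other index slides.]
alldocuments (root): 1
  book: 2,8
    title: 3,9
    author: 4,6,10
      name: 5,7,11
      address: 12
  article: 13
    title: 14
    author: 15,18
      name: 16,19
      address: 17
The 1-index maintains information about all paths in the original graph.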

Slide 24: Path indexes for XML data
The 1-index maintains information about all paths in the original graph. This can make the index very large (with size comparable to the data size), so it is quite expensive to evaluate queries using this index. To address this problem the A(k)-index has been proposed, which is exact only for paths of length up to k.

Slide 25: Bisimilarity
Two nodes u, v are called bisimilar if:
They have the same label.
For every parent u' of u there is a parent v' of v such that u', v' are also bisimilar, and vice versa.

Slide 26: Bisimilarity Example
Nodes 4 and 10 are bisimilar.
Nodes 10 and 15 are not bisimilar.
[Figure: the graph of documents from the path index example, with numbered nodes.]

Slide 27: Bisimilarity defines the 1-index
Bisimilar nodes are stored in the same node of the summary index.
[Figure: the 1-index. alldocuments {1}; book {2,8} with title {3,9}, author {4,6,10}, name {5,7,11}, address {12}; article {13} with title {14}, author {15,18}, name {16,19}, address {17}. Node numbers lost in extraction are filled in from the other index slides.]

Slide 28: Bisimilarity and the A(k) index
In the A(k) index, the notion of k-bisimilarity is used:
Two nodes u, v are 0-bisimilar if they have the same label.
Two nodes u, v are k-bisimilar if they have the same label and, for every parent u' of u, there is a parent v' of v such that u' and v' are (k-1)-bisimilar, and vice versa.
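The definition of k-bisimilarity can be turned into a short refinement loop. The sketch below is not taken from the A(k)-index paper; it is a direct, unoptimized reading of the definition on this slide. The data graph (LABELS, PARENTS) is an assumption: it is reconstructed from the node numbers that survive on the index slides, and the function name k_bisimilar_classes is hypothetical.

```python
# Data graph reconstructed (an assumption) from the node numbers that
# survive on the index slides: node id -> label, node id -> parent ids.
LABELS = {1: "alldocuments", 2: "book", 3: "title", 4: "author", 5: "name",
          6: "author", 7: "name", 8: "book", 9: "title", 10: "author",
          11: "name", 12: "address", 13: "article", 14: "title",
          15: "author", 16: "name", 17: "address", 18: "author", 19: "name"}
PARENTS = {2: [1], 3: [2], 4: [2], 5: [4], 6: [2], 7: [6], 8: [1], 9: [8],
           10: [8], 11: [10], 12: [10], 13: [1], 14: [13], 15: [13],
           16: [15], 17: [15], 18: [13], 19: [18]}


def k_bisimilar_classes(labels, parents, k):
    """Group nodes into k-bisimilarity classes (sorted lists of node ids)."""
    # Round 0: the signature of a node is just its label (0-bisimilarity).
    sig = dict(labels)
    for _ in range(k):
        # Round i: own label plus the set of round-(i-1) parent signatures.
        sig = {v: (labels[v], frozenset(sig[p] for p in parents.get(v, ())))
               for v in labels}
    classes = {}
    for v in sorted(labels):
        classes.setdefault(sig[v], []).append(v)
    return sorted(classes.values())


for k in (0, 1, 2):
    print(k, k_bisimilar_classes(LABELS, PARENTS, k))
```

Under this reconstruction, all name nodes fall into one class for k = 1 (their parents are all labeled author), while k = 2 separates names below book authors from names below article authors.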

Slide 29: k-bisimilarity Example
Nodes 5 and 16 are 1-bisimilar.
Nodes 6 and 15 are not 1-bisimilar.
[Figure: the graph of documents from the path index example, with numbered nodes.]

Slide 30: Bisimilarity and the A(k) index (cont'd)
The A(k)-index represents exactly all paths of length up to k. Put differently, all k-bisimilar nodes in the data graph are stored in the same node of the index graph. This means that all incoming paths up to length k are encoded in the index.

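Slide 31: A(k)-index example
[Figure: the A(3)-index and the A(2)-index of the example graph, which are identical. alldocuments {1}; book {2,8} with title {3,9}, author {4,6,10}, name {5,7,11}, address {12}; article {13} with title {14}, author {15,18}, name {16,19}, address {17}. Node numbers lost in extraction are filled in from the other index slides.]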

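Slide 32: A(k)-index example (cont'd)
[Figure: the A(1)-index. alldocuments {1}; book {2,8}; title under book {3,9}; author under book {4,6,10}; article {13}; title under article {14}; author under article {15,18}; name {5,7,11,16,19}; address {12,17}. Node numbers lost in extraction are filled in from the A(0)-index.]
[Figure: the A(0)-index. alldocuments {1}; book {2,8}; article {13}; title {3,9,14}; author {4,6,10,15,18}; name {5,7,11,16,19}; address {12,17}.]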

Slide 33: Using the A(k) index to search
A Label Map is constructed together with the index, where each label points to its positions in the index.
[Figure: the A(1)-index with a label map whose entries (name, author, title, article, book) point to the index nodes carrying that label.]

Slide 34: Evaluation of path queries
Assume that a path query q of length ≤ k is applied. The A(k) index can answer the query as follows. First the last label of q is looked up in the label map to find its positions in the index. Then the index is traversed backwards from these positions to complete the answer.

Slide 35: Evaluation of path queries (example)
Query: book/title
[Figure: the A(1)-index with its label map. The query path has length 1, so the index alone answers it (marked "OK").]

Slide 36: Evaluation of path queries (cont'd)
If the path of the query is longer than k, we may need to access the actual data; the A(k)-index alone cannot answer the query in this case. The reason is that traversing the index backwards may find false positive paths, which do not actually exist in the graph. Paths that share information are grouped to decrease the potential cost.
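The backward evaluation and the false-positive problem can be reproduced with a small sketch. Everything here is illustrative rather than the published algorithm: the data graph is reconstructed from the node numbers that survive on the index slides (an assumption), path queries are interpreted as matching anywhere in the graph, and the names build_index, evaluate and evaluate_on_data are hypothetical.

```python
# Data graph reconstructed (an assumption) from the index slides.
LABELS = {1: "alldocuments", 2: "book", 3: "title", 4: "author", 5: "name",
          6: "author", 7: "name", 8: "book", 9: "title", 10: "author",
          11: "name", 12: "address", 13: "article", 14: "title",
          15: "author", 16: "name", 17: "address", 18: "author", 19: "name"}
PARENTS = {2: [1], 3: [2], 4: [2], 5: [4], 6: [2], 7: [6], 8: [1], 9: [8],
           10: [8], 11: [10], 12: [10], 13: [1], 14: [13], 15: [13],
           16: [15], 17: [15], 18: [13], 19: [18]}


def build_index(labels, parents, k):
    """Build an A(k)-style summary: extents, index edges, labels, label map."""
    sig = dict(labels)
    for _ in range(k):
        sig = {v: (labels[v], frozenset(sig[p] for p in parents.get(v, ())))
               for v in labels}
    extent = {}
    for v in labels:
        extent.setdefault(sig[v], set()).add(v)
    # Index node s has parent t if some data node in s has a parent in t.
    index_parents = {s: {sig[p] for v in ext for p in parents.get(v, ())}
                     for s, ext in extent.items()}
    node_label = {s: labels[min(ext)] for s, ext in extent.items()}
    label_map = {}
    for s, lab in node_label.items():
        label_map.setdefault(lab, []).append(s)
    return extent, index_parents, node_label, label_map


def evaluate(path, index):
    """Answer a path query on the index only, walking backwards."""
    extent, index_parents, node_label, label_map = index
    result = set()
    for target in label_map.get(path[-1], []):
        frontier = {target}
        for label in reversed(path[:-1]):
            frontier = {p for n in frontier for p in index_parents[n]
                        if node_label[p] == label}
        if frontier:
            result |= extent[target]
    return result


def evaluate_on_data(path, labels, parents):
    """Exact answer, computed on the data graph itself."""
    result = set()
    for v in labels:
        frontier = {v} if labels[v] == path[-1] else set()
        for label in reversed(path[:-1]):
            frontier = {p for n in frontier for p in parents.get(n, ())
                        if labels[p] == label}
        if frontier:
            result.add(v)
    return result


a1 = build_index(LABELS, PARENTS, 1)
print(sorted(evaluate(["book", "title"], a1)))
print(sorted(evaluate(["book", "author", "name"], a1)))
print(sorted(evaluate_on_data(["book", "author", "name"], LABELS, PARENTS)))
```

With k = 1 the query book/title (length 1) is answered exactly, while book/author/name (length 2) also returns the name nodes below article authors, because the A(1)-index merges all name nodes. Those extra nodes are the false positives that force a check against the actual data.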

Slide 37: Evaluation of path queries (2nd example)
Query: book/author/name
The path has length 2 > 1, so the A(1)-index alone may return false positives.
[Figure: the A(1)-index with its label map.]

Slide 38: Problems of path indexes
They are appropriate only for simple path queries up to a certain length. If a query has branches or regular path expressions, the index cannot provide exact answers and the actual data have to be accessed. These indexes also have high storage and update cost.

Slide 39: Storing and indexing XML data in relational databases
We can decompose the structural information into tables and use them to answer queries. This reduces the volume of data that needs to be accessed for a single query, and we can use off-the-shelf query processing and optimization techniques. On the other hand, we may need expensive joins during query processing.

Slide 40: A decomposition model for XML data
The storage model indexes the elements and text of the documents by their position in the graph. If the structures are trees, this representation helps to answer queries fast. For graphs, on the other hand, the positions of the elements often cannot support fast query evaluation, because of recursion and the other problems that graphs incur.

Slide 41: Encoding elements, attributes and values based on their positions
The position of each element/attribute occurrence is represented as a 3-tuple (Document-id, StartPos:EndPos, LevelNum).
Values (text) are encoded using (Document-id, StartPos, LevelNum).
Document-id is the id of the document that contains the element.
StartPos is the number of words from the beginning of the document until the start of the element.
EndPos is the number of words from the beginning of the document until the end of the element.
LevelNum is the nesting depth of the element.

Slide 42: Encoding example
[Figure: XML source of the book record used as the running example; the tags were lost in extraction. Surviving values: "The Selfish Gene", "Richard", "Dawkins", "Timbuktu", "99999".]

Slide 43: Encoding Example (cont'd)
book (1,1:27,1)
  booktitle (1,2:6,2)
    "The Selfish Gene" (1,3,3)
  author (1,7:26,2)
    id (1,8:9,3)
      "dawkins" (1,9,4)
    name (1,10:17,3)
      firstname (1,11:13,4)
        "Richard" (1,12,5)
      lastname (1,14:16,4)
        "Dawkins" (1,15,5)
    address (1,18:25,3)
      city (1,19:21,4)
        "Timbuktu" (1,20,5)
      zip (1,22:24,4)
        "99999" (1,23,5)
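The numbers in this example can be reproduced with a short traversal, sketched below using Python's standard xml.etree.ElementTree. The counting rule is inferred from the example and is an assumption: an opening tag, a closing tag, an attribute name, an attribute value and each word of text take one position each. The document text and the function name encode are hypothetical, and text that follows a closing tag (tail text) is ignored.

```python
import itertools
import xml.etree.ElementTree as ET

# Hypothetical document, rebuilt from the labels of the running example.
DOC = ('<book><booktitle>The Selfish Gene</booktitle><author id="dawkins">'
       '<name><firstname>Richard</firstname><lastname>Dawkins</lastname></name>'
       '<address><city>Timbuktu</city><zip>99999</zip></address>'
       '</author></book>')


def encode(root, doc_id=1):
    """Return rows (kind, label, doc_id, start, end, level) in document order.

    Values have no EndPos in the encoding; here end simply repeats start.
    """
    counter = itertools.count(1)
    rows = []

    def visit(elem, level):
        row = ["element", elem.tag, doc_id, next(counter), None, level]
        rows.append(row)
        for attr, value in elem.attrib.items():
            a_start = next(counter)   # position of the attribute name
            v_pos = next(counter)     # position of its value
            rows.append(["attribute", attr, doc_id, a_start, v_pos, level + 1])
            rows.append(["value", value, doc_id, v_pos, v_pos, level + 2])
        words = (elem.text or "").split()
        if words:
            first = next(counter)     # StartPos of the text value
            for _ in words[1:]:
                next(counter)         # remaining words still take positions
            rows.append(["value", " ".join(words), doc_id, first, first, level + 1])
        for child in elem:
            visit(child, level + 1)
        row[4] = next(counter)        # EndPos: position of the closing tag

    visit(root, 1)
    return [tuple(r) for r in rows]


for kind, label, doc, start, end, level in encode(ET.fromstring(DOC)):
    print(f"{kind:9} {label:18} ({doc},{start}:{end},{level})")
```

Storing one row per node in this form is what later allows structural relationships to be decided by comparing numbers instead of traversing the tree.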

Slide 44: Using the encoding to determine a structural relationship
We can use the encoding to quickly determine the relationship between two elements (or between an element and a value).
Element e1 is an ancestor of element e2 in the same document iff:
e1.DocumentId = e2.DocumentId
e1.StartPos < e2.StartPos && e1.EndPos > e2.EndPos (interval coverage)
If the above hold and, in addition, e1.LevelNum+1 = e2.LevelNum, then e1 is the parent of e2.
For example, book (1,1:27,1) is an ancestor of lastname (1,14:16,4) because 1 < 14 and 27 > 16.
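The containment test amounts to a few comparisons: an ancestor's interval must strictly enclose the descendant's interval (ancestor StartPos smaller, ancestor EndPos larger). A minimal sketch, with the hypothetical names Pos, is_ancestor and is_parent, and positions taken from the book encoding example:

```python
from collections import namedtuple

# (Document-id, StartPos, EndPos, LevelNum) of an element occurrence.
Pos = namedtuple("Pos", "doc start end level")


def is_ancestor(a, d):
    """True if a's interval strictly covers d's interval in the same document."""
    return a.doc == d.doc and a.start < d.start and d.end < a.end


def is_parent(a, d):
    """True if a is an ancestor of d exactly one nesting level above it."""
    return is_ancestor(a, d) and a.level + 1 == d.level


book = Pos(1, 1, 27, 1)
author = Pos(1, 7, 26, 2)
lastname = Pos(1, 14, 16, 4)

print(is_ancestor(book, lastname), is_parent(book, lastname), is_parent(book, author))
```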

Slide 45: Answering queries using the encoding
Assume that all documents have been flattened to tables and the encoding is used to index the positions of each element and value in the documents. We store all information in a table:
(ElementId, Document-id, StartPos:EndPos, LevelNum)
The table is clustered by ElementId and sorted by (Document-id, StartPos).

Slide 46: Answering queries using the encoding (cont'd)
Example:
ElementId   Document-id   StartPos   EndPos   LevelNum
book        1             1          27       1
booktitle   1             2          6        2
author      ...

Slide 47: Answering queries using the encoding (cont'd)
The query is broken into binary parent-child or ancestor-descendant relationships.
Example: book[/title/XML][//author//Smith]
Broken to:
book/title
title/XML
book//author
author//Smith
[Figure: the query twig. Root book with two branches: title with child XML, and author with descendant Smith.]

Slide 48: Answering queries using the encoding (cont'd)
Each binary query is executed as a join, and the join results are "stitched" together to form the results of the whole query.
Example: book/author/address
book/author: (2,4), (2,6), (8,10)
author/address: (10,12), (15,17)
book/author/address: (8,10,12)
[Figure: the graph of documents from the path index example, with numbered nodes.]
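The stitching step is an ordinary equi-join on the node shared by two binary results. The sketch below uses the node pairs of the example on this slide; the function name stitch is hypothetical, and a hash join is chosen only for brevity.

```python
def stitch(left, right):
    """Join partial matches whose last node equals the first node of a pair."""
    by_first = {}
    for pair in right:
        by_first.setdefault(pair[0], []).append(pair)
    return [match + pair[1:]
            for match in left
            for pair in by_first.get(match[-1], [])]


book_author = [(2, 4), (2, 6), (8, 10)]     # results of book/author
author_address = [(10, 12), (15, 17)]       # results of author/address

print(stitch(book_author, author_address))  # matches of book/author/address
```

Only author 10 appears in both binary results, so a single path survives. The pairs (2,4), (2,6) and (15,17) are intermediate results that are computed and then discarded, which is the overhead the later multi-stack algorithms try to avoid.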

Slide 49: How to process the binary joins
The "heart" of XML query processing is therefore the algorithm that joins the elements table with itself to retrieve the results for each individual query component. One method to process the binary join is to apply a merge-join algorithm, since the table is already sorted by (ElementId, Document-id, StartPos). Assume that the query is A//D, where A is the ancestor element and D is the descendant element.

Slide 50: How to process the binary joins. The tree-merge join algorithm
[Figure: example of a tree-merge join between the sorted lists AList and DList; the list contents were lost in extraction.]
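A minimal sketch of a tree-merge join for A//D follows. It is a simplified reading of the merge-join idea (output sorted by ancestor), not the published pseudocode. The Node tuple, the function name tree_merge_anc and the sample positions are assumptions; the sample tree is a1 containing a2 (with d1), a3 (with d2, d3), d4 and a4 (with d5, d6), and both lists are sorted by (doc, start).

```python
from collections import namedtuple

Node = namedtuple("Node", "name doc start end")


def tree_merge_anc(alist, dlist):
    """Join ancestors with descendants; both lists sorted by (doc, start)."""
    out = []
    begin = 0
    for a in alist:
        # Descendant candidates that start before a cannot match a or any
        # later ancestor, so they are skipped for good.
        while begin < len(dlist) and \
                (dlist[begin].doc, dlist[begin].start) < (a.doc, a.start):
            begin += 1
        # Scan forward from 'begin' while candidates start inside a.
        j = begin
        while j < len(dlist) and dlist[j].doc == a.doc and dlist[j].start < a.end:
            if dlist[j].end < a.end:
                out.append((a.name, dlist[j].name))
            j += 1
    return out


ALIST = [Node("a1", 1, 1, 22), Node("a2", 1, 2, 5),
         Node("a3", 1, 6, 11), Node("a4", 1, 14, 19)]
DLIST = [Node("d1", 1, 3, 4), Node("d2", 1, 7, 8), Node("d3", 1, 9, 10),
         Node("d4", 1, 12, 13), Node("d5", 1, 15, 16), Node("d6", 1, 17, 18)]

print(tree_merge_anc(ALIST, DLIST))
```

The inner scan restarts from begin for every ancestor, so parts of DList are read repeatedly; this is the repeated-pass behavior that the stack-based algorithm is designed to avoid.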

Slide 51: Worst case for the tree-merge join algorithm
[Figure: a worst-case input with ancestor elements a1, a2, ..., an and descendant elements d1, d2, ..., dn, dn+1, ..., d2n-1, d2n, shown both as a tree and as the sorted lists AList and DList; the exact layout was lost in extraction.]

Slide 52: How to process the binary joins. The tree-merge join algorithm (cont'd)
The tree-merge join may perform many passes over the "inner" DList table, one for each element in AList that matches elements there. To avoid this, a stack-tree join algorithm has been proposed.
OBSERVATION: We can get all the join results by a depth-first traversal of the XML tree.

Slide 53: The stack-tree join algorithm
The lists are merged together as before, but a stack is maintained to keep the nested AList elements which are on the same path as the current element from DList. When a qualifying element in DList is found, it is output paired with every AList element in the stack.
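The stack-based merge can be sketched in a few lines. This is a simplified version for the ancestor-descendant case with output sorted by descendant, not the published pseudocode. The Node tuple, the function name stack_tree_desc and the sample positions are assumptions; the sample tree is a1 containing a2 (with d1), a3 (with d2, d3), d4 and a4 (with d5, d6).

```python
from collections import namedtuple

Node = namedtuple("Node", "name doc start end")


def stack_tree_desc(alist, dlist):
    """Single-pass A//D join; both lists sorted by (doc, start)."""
    out, stack = [], []
    a = d = 0
    while d < len(dlist):
        # Take whichever element comes next in document order.
        take_anc = a < len(alist) and \
            (alist[a].doc, alist[a].start) < (dlist[d].doc, dlist[d].start)
        cur = alist[a] if take_anc else dlist[d]
        # Pop ancestors that ended before cur starts: they cannot contain
        # cur or anything after it.
        while stack and (stack[-1].doc != cur.doc or stack[-1].end < cur.start):
            stack.pop()
        if take_anc:
            stack.append(cur)   # nested inside everything left on the stack
            a += 1
        else:
            # Every ancestor still on the stack contains cur.
            out.extend((s.name, cur.name) for s in stack)
            d += 1
    return out


ALIST = [Node("a1", 1, 1, 22), Node("a2", 1, 2, 5),
         Node("a3", 1, 6, 11), Node("a4", 1, 14, 19)]
DLIST = [Node("d1", 1, 3, 4), Node("d2", 1, 7, 8), Node("d3", 1, 9, 10),
         Node("d4", 1, 12, 13), Node("d5", 1, 15, 16), Node("d6", 1, 17, 18)]

print(stack_tree_desc(ALIST, DLIST))
```

Each list is read exactly once and every element is pushed and popped at most once, so the work beyond producing the output is linear in the input size.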

Slides 54-60: Stack-tree join example
Data tree (in document order): a1 contains a2, a3, d4 and a4; a2 contains d1; a3 contains d2 and d3; a4 contains d5 and d6.
AList: a1, a2, a3, a4
DList: d1, d2, d3, d4, d5, d6

Slide 54. Stack: a1. Output: (none yet)
Slide 55. Stack: a1, a2. Output so far: (a1,d1), (a2,d1)
Slide 56. Stack: a1, a3. Output so far: (a1,d1), (a2,d1)
Slide 57. Stack: a1, a3. Output so far: (a1,d1), (a2,d1), (a1,d2), (a3,d2)
Slide 58. Stack: a1, a3. Output so far: (a1,d1), (a2,d1), (a1,d2), (a3,d2), (a1,d3), (a3,d3)
Slide 59. Stack: a1. Output so far: (a1,d1), (a2,d1), (a1,d2), (a3,d2), (a1,d3), (a3,d3), (a1,d4)
Slide 60. Stack: a1, a4. Output so far: (a1,d1), (a2,d1), (a1,d2), (a3,d2), (a1,d3), (a3,d3), (a1,d4), (a1,d5), (a4,d5), (a1,d6), (a4,d6)

Slide 61: Comments on the stack-tree join algorithm
The algorithm has better worst-case complexity than the tree-merge join algorithm. Both algorithms come in two versions: one that outputs results sorted on AList elements and one that outputs results sorted on DList elements.

Slide 62: Limitation of the binary join algorithms
They are used only for binary joins. If a query is complex and contains many binary relationships, many intermediate results have to be merged.
Example: book[/title/XML][//author//Smith]
Broken to:
book/title
title/XML
book//author
author//Smith
[Figure: the query twig. Root book with two branches: title with child XML, and author with descendant Smith.]

Slide 63: An extension of the stack-tree join algorithm
The path-join and twig-join algorithms extend the basic stack-tree join algorithm to complex queries. The idea is the same, but multiple stacks are used to avoid merging the intermediate results.
Path-join is appropriate for path queries only (e.g., book/author/name).
Twig-join is appropriate for branching (tree) expressions.
[Figure: the query twig. Root book with two branches: title with child XML, and author with descendant Smith.]

Slide 64: Example of Path-Join
Query: a//b//c, with the algorithm currently at node c3.
[Figure: a data tree with nodes a1, a2, a3, b1, b2, b3, b4 and c1, c2, c3, c4, c5; its shape was lost in extraction.]
StackA: a1, a3
StackB: b3, b4
Output: (c1,b2,a1), (c2,b2,a1), (c3,b4,a3), (c3,b4,a1), (c3,b3,a1)
(c3,b3,a3) is not an answer because b3 points to a1 in the next stack!

Slide 65: From path-join to twig-join
Path-join has optimal asymptotic cost for single-path queries, but if a query is a twig of multiple paths it may produce many partial results, which then have to be joined. The twig-join joins these results at production time.
Example: query a[//b/c][//d/e]
[Figure: a data tree with nodes labeled a, b, c, d and e. It contains two paths a/b/c and two paths a/d/e, but only one twig a[/b/c][/d/e].]

Slide 66: Twig Join
The twig-join applies path-join on multiple paths at the same time. When at some node there are potential solutions for each path of the query, the algorithm waits for these partial results to be computed. Then the results from each path are joined.
[Figure: the same data tree, marked with the partial result a/b/c, the partial result a/d/e, and the merged result a[/b/c][/d/e].]

Slide 67: Limitations of the twig-join and stack-based methods
The twig-join is useful for simple twigs only, and it is not trivial to extend it to arbitrary trees.
The encoding can be used for tree-structured XML data only. However, in many cases XML data are graphs. In this case the encoding (and also the stack-based algorithms) is not applicable.
[Figure: a query tree with root book, a title branch ending in XML, and an author branch ending in Smith and Jones.]

Slide 68: Summary
XML data are everywhere today, and efficient management and querying systems are needed. This is why XML data management is currently one of the hottest research topics in DB.
There are two streams for XML data management:
Store XML data in native systems and use special indexing and querying methods.
Transform XML data into relational tables and use or adapt relational query algorithms.

Slide 69: References
R. Kaushik et al., "Exploiting Local Similarity for Indexing Paths in Graph-Structured Data", ICDE 2002.
S. Al-Khalifa et al., "Structural Joins: A Primitive for Efficient XML Query Pattern Matching", ICDE 2002.
N. Bruno et al., "Holistic Twig Joins: Optimal XML Pattern Matching", ACM SIGMOD 2002.
J. Shanmugasundaram et al., "Relational Databases for Querying XML Documents: Limitations and Opportunities", VLDB 1999.