Incremental Validation of XML Databases
Yannis Papakonstantinou, Victor Vianu
Computer Science & Eng, UCSD


Incremental Validation of XML Databases
- XML database of n nodes, subject to updates
- Document Type Definition (DTD): O(log n)
- XML Schema / XQuery type system: O(log² n)

XML As Labeled Ordered Trees
[Figure: labeled ordered tree with root cars, children used and new, car nodes below them, and year/model leaves holding values such as 92, Civic, 96, Acura, Maxima, 03.]

Document Type Definitions (DTDs): Abstraction & Example
root: cars
cars → used new
used → car*
new → car*
car → (year | ε) model
[Figure: example cars tree satisfying the DTD; the label "dummy" appears in the figure.]
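As a rough illustration of what validating a tree against this DTD from scratch involves, the sketch below checks every node's child sequence against a regular expression. The tuple encoding of trees, the trailing-space trick, and the names `DTD`, `ROOT`, and `valid` are assumptions made for the example, not part of the slides; it also assumes that "(year|)" means the year child is optional.

```python
import re

# Hypothetical encoding of the example DTD: each label maps to a regular
# expression over child labels. Every child label is followed by one space
# so that label names cannot run into each other.
DTD = {
    "cars": r"used new ",
    "used": r"(car )*",
    "new": r"(car )*",
    "car": r"(year )?model ",   # assumes "(year|)" means an optional year
    "year": r"",
    "model": r"",
}
ROOT = "cars"


def valid(tree, dtd=DTD, root=ROOT):
    """Full (non-incremental) validation of a tree given as (label, children)."""

    def check(node):
        label, children = node
        if label not in dtd:
            return False
        # The word formed by the labels of the children, left to right.
        word = "".join(child[0] + " " for child in children)
        if re.fullmatch(dtd[label], word) is None:
            return False
        return all(check(child) for child in children)

    return tree[0] == root and check(tree)
```

This pass visits every node, which is exactly the cost that incremental validation tries to avoid after a single update.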

Tree Satisfying DTD, General Case
[Figure: a node r with children at positions 1, 2, …, i-1, i, i+1, …, k-1, k, labeled a, b, c, …; the DTD has root : … and a rule for r.]

XML Schemas/XQuery Types as Specialized DTDs
root: cars^T
cars^T → used^T new^T
used^T → car^U*
new^T → car^N*
car^U → year^T model^T
car^N → (year^T | ε) model^T
LABEL → TYPES
car → {car^U, car^N}
cars → {cars^T}
used → {used^T}
…
[Figure: the cars tree with each node annotated by its types: cars^T, used^T, new^T, car^U, car^N, year^T, model^T.]
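One way to read the specialized DTD is as a bottom-up computation of the set of types each node can take. The sketch below is a brute-force version of that idea: it tries every combination of child types, so it is exponential in the number of children and only meant for tiny examples. The names `RULES`, `LABEL_OF`, `ROOT_TYPE`, `types`, and `valid`, and the flat type names such as `carU`, are assumptions for illustration.

```python
import re
from itertools import product

# Hypothetical encoding: type -> regular expression over child types,
# each child type followed by one space.
RULES = {
    "carsT": r"usedT newT ",
    "usedT": r"(carU )*",
    "newT": r"(carN )*",
    "carU": r"yearT modelT ",
    "carN": r"(yearT )?modelT ",   # assumes "(year|)" means an optional year
    "yearT": r"",
    "modelT": r"",
}
# Each type specializes exactly one label.
LABEL_OF = {
    "carsT": "cars", "usedT": "used", "newT": "new",
    "carU": "car", "carN": "car", "yearT": "year", "modelT": "model",
}
ROOT_TYPE = "carsT"


def types(node):
    """Set of types that the subtree rooted at node = (label, children) satisfies."""
    label, children = node
    child_types = [types(child) for child in children]
    result = set()
    for t, regex in RULES.items():
        if LABEL_OF[t] != label:
            continue
        # Brute force: try every way of picking one type per child.
        for choice in product(*child_types):
            word = "".join(c + " " for c in choice)
            if re.fullmatch(regex, word):
                result.add(t)
                break
    return result


def valid(tree):
    return ROOT_TYPE in types(tree)
```

A car with both year and model gets the set {carU, carN}, while a car with only a model gets {carN}, so the latter is accepted under new but not under used.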

Tree Automata Specialized DTDs
[Figure: the cars tree with each node annotated by its possible types: cars^T at the root, used^T and new^T below it, car^U, car^N at the car nodes, and year^T, model^T at the leaves.]

Incremental Validation Problem Statement
For each valid tree T, use an auxiliary structure A(T) so that, given a series of update commands, one can
- efficiently decide if the updated tree T is valid
- efficiently update A(T) and T

Types of Updates: Node Renaming u(v, )
[Figure: the children 1, …, k of a node r before and after the node v is renamed.]

Types of Updates: Deletion d(v)
[Figure: the children 1, …, k of a node r before and after the node v at position i is deleted.]

Types of Updates: Insertion insert_after(v i-1, i )
[Figure: the children of a node r before and after a new node v is inserted after v i-1.]

Validating a Renaming u(i, ) on a Regular String of N : Take One
- Validation of one update in O(1) given precomputed Pre and Post, using Pre(i-1) and Post(i+1)
- u(i, ) requires recomputation of Pre(i), Pre(i+1), … and of Post(i), Post(i-1), …
[Figure: runs of the automaton forward from q0 over positions 1, 2, …, i-1 and backward from qF over positions n, n-1, …, i+1.]
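The Pre/Post idea can be sketched as follows, under the assumption that Pre(i) is the set of automaton states reachable from q0 after reading the first i symbols and Post(i) is the set of states from which the symbols at positions i through n lead to a final state. The dictionary representation of the NFA and the function names are hypothetical.

```python
def step(delta, states, symbol):
    """States reachable from any state in `states` by reading `symbol`."""
    return {q2 for q in states for q2 in delta.get((q, symbol), ())}


def precompute(delta, all_states, q0, finals, word):
    """Return (pre, post) with pre[i] for i in 0..n and post[i] for i in 1..n+1."""
    n = len(word)
    pre = [{q0}]
    for symbol in word:
        pre.append(step(delta, pre[-1], symbol))
    post = [None] * (n + 2)
    post[n + 1] = set(finals)
    for i in range(n, 0, -1):
        post[i] = {q for q in all_states
                   if step(delta, {q}, word[i - 1]) & post[i + 1]}
    return pre, post


def renaming_valid(delta, pre, post, i, symbol):
    """Is the word still accepted if position i (1-based) is renamed to `symbol`?

    The check does not depend on the length of the word, only on the automaton.
    """
    return bool(step(delta, pre[i - 1], symbol) & post[i + 1])


# Hypothetical NFA for the language a b* c.
DELTA = {(0, "a"): {1}, (1, "b"): {1}, (1, "c"): {2}}
PRE, POST = precompute(DELTA, {0, 1, 2}, 0, {2}, "abbc")
```

The weakness noted on the slide shows up here too: once a renaming is actually applied, every `pre` entry to its right and every `post` entry to its left may be stale, so maintaining the arrays costs time linear in the word.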

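Transition Relation Definition
T_{i,j} = { (q, q') | the automaton can go from state q to state q' by reading positions i, …, j of the string }
T_{i,j} = T_{i,m} ∘ T_{m+1,j}, for i ≤ m < j
[Figure: string positions 1, 2, …, i, …, j, …, n-1, n with a split point m between i and j.]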
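A minimal sketch of transition relations as sets of state pairs, assuming the NFA is given as a dictionary from (state, symbol) to a set of successor states. The names `leaf_relation` and `compose` are hypothetical.

```python
def leaf_relation(delta, symbol):
    """T_{i,i} for a position holding `symbol`: all pairs (q, q') with q' in delta(q, symbol)."""
    return {(q, q2)
            for (q, s), targets in delta.items() if s == symbol
            for q2 in targets}


def compose(r1, r2):
    """Relational composition: (p, r) whenever (p, q) is in r1 and (q, r) is in r2."""
    return {(p, r) for (p, q) in r1 for (q2, r) in r2 if q == q2}


# Hypothetical NFA for the language a b* c.
DELTA = {(0, "a"): {1}, (1, "b"): {1}, (1, "c"): {2}}
T_A = leaf_relation(DELTA, "a")
T_B = leaf_relation(DELTA, "b")
T_C = leaf_relation(DELTA, "c")
```

Because composition is associative, a substring can be split at any point m, which is what allows the relations to be arranged in a balanced tree.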

Transition Relation Trees
T_{1,8}
T_{1,4} T_{5,8}
T_{1,2} T_{3,4} T_{5,6} T_{7,8}
T_{1,1} T_{2,2} T_{3,3} T_{4,4} T_{5,5} T_{6,6} T_{7,7} T_{8,8}

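Maintenance of the Structure and Validation in O(log n)
- The renaming u(6, ) changes only the relations on the path from leaf 6 to the root: T_{6,6}, T_{5,6}, T_{5,8}, T_{1,8}
- If (q0, qF) ∈ T_{1,8} then valid
[Figure: the transition relation tree over positions 1 to 8 with the nodes T_{6,6}, T_{5,6}, T_{5,8}, T_{1,8} marked.]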
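The tree of transition relations can be sketched as an array-backed balanced binary tree in which a renaming recomputes only the leaf and its ancestors. This is an illustrative sketch, not the authors' implementation: the class name, the padding of the word to a power of two with identity relations, and the NFA encoding are all assumptions.

```python
def compose(r1, r2):
    """Relational composition of two sets of state pairs."""
    return {(p, r) for (p, q) in r1 for (q2, r) in r2 if q == q2}


class TransitionRelationTree:
    def __init__(self, delta, states, q0, finals, word):
        self.delta, self.q0, self.finals = delta, q0, set(finals)
        identity = {(q, q) for q in states}
        self.size = 1
        while self.size < len(word):
            self.size *= 2
        # tree[1] is the root; the leaf for position i (1-based) is tree[size + i - 1].
        # Unused leaves on the right hold the identity relation.
        self.tree = [identity] * (2 * self.size)
        for i, symbol in enumerate(word):
            self.tree[self.size + i] = self._leaf(symbol)
        for k in range(self.size - 1, 0, -1):
            self.tree[k] = compose(self.tree[2 * k], self.tree[2 * k + 1])

    def _leaf(self, symbol):
        return {(q, q2)
                for (q, s), targets in self.delta.items() if s == symbol
                for q2 in targets}

    def rename(self, i, symbol):
        """Rename position i (1-based); returns the number of tree nodes recomputed."""
        k = self.size + i - 1
        self.tree[k] = self._leaf(symbol)
        touched = 1
        k //= 2
        while k >= 1:
            self.tree[k] = compose(self.tree[2 * k], self.tree[2 * k + 1])
            touched += 1
            k //= 2
        return touched

    def valid(self):
        """The word is accepted iff the root relation links q0 to some final state."""
        return any((self.q0, f) in self.tree[1] for f in self.finals)
```

For a word of length 8, renaming position 6 recomputes four nodes, which correspond to T_{6,6}, T_{5,6}, T_{5,8}, and T_{1,8} on the slide.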

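Transition B-Trees (2-3 Trees) for O(log n) Insertions and Deletions
- Each internal node stores the composition of its children's relations, e.g. T_a = T_1 ∘ T_2
- If (q0, qF) ∈ T_a ∘ T_b ∘ T_c then valid
[Figure: 2-3 tree with leaves T_1, T_2, T_3, T_5, T_6, T_7, T_9 and internal nodes T_a, T_b, T_c.]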

[Figure sequence under the same slide title: (1) leaves T_1, T_2, T_3, T_5, T_6, T_7, T_8, T_9 under internal nodes T_a, T_b, T_c; (2) the same nodes with T_3, T_5, T_6 grouped together; (3) a leaf T_4 appears next to T_3; (4) internal nodes T_d, T_e, T_f, T_g appear alongside T_a and T_c.]

Auxiliary Structures for Incremental DTD Validation
[Figure: a node r with children 1, …, k; the renaming u(v i, ) of the child v i affects the structure kept for the child sequence of r.]

Specialized DTD Incremental Validation: Take One
[Figure: a node r with children labeled a_1, …, a_{i-1}, a_i, a_{i+1}, …, a_k, and the same node after the renaming u(v i, ) with children b_1, …, b_k; the set types(v i ) is shown before and after the update.]

Inefficient for Deep Trees: Apply Divide-And-Conquer in Vertical Direction
- Turn Specialized DTD into NFA that validates a vertical line
- Fuse vertical and horizontal directions using binary tree and split work in both

Tree Satisfying Specialized DTD transformed into Binary Tree Accepted By Tree Automaton
[Figure: a tree over the nodes a to k and its binary-tree encoding, which has # leaves.]
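A common way to turn an unranked tree into a binary tree is the first-child/next-sibling encoding, with # standing for a missing child or sibling. Whether the slide uses exactly this variant is an assumption; the sketch below only illustrates the general idea, and `encode` and `decode` are hypothetical names.

```python
def encode(forest):
    """Encode a list of sibling trees, each (label, children), as a binary tree.

    The result is "#" for an empty list, otherwise (label, left, right) where
    left encodes the children of the first tree and right encodes its
    remaining siblings.
    """
    if not forest:
        return "#"
    (label, children), rest = forest[0], forest[1:]
    return (label, encode(children), encode(rest))


def decode(binary):
    """Inverse of encode: rebuild the list of sibling trees."""
    if binary == "#":
        return []
    label, left, right = binary
    return [(label, decode(left))] + decode(right)
```

For example, `encode([("a", [("b", []), ("c", [])])])` gives `("a", ("b", "#", ("c", "#", "#")), "#")`: b is the first child of a, and c hangs off b as its next sibling.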

Designate Lines in Binary Trees
[Figure: lines are chosen by comparing subtree sizes; the conditions shown are Size( ) > 2 Size( ) and Size( ) > 4 Size( ).]

Example Line Structure
[Figure: the binary tree over the nodes a to k with # leaves, and its decomposition into lines.]

From Tree Automaton to Validating Lines with NFA
[Figure: the lines over the nodes a to k, first with plain labels and then with the annotations b, T_c; d, T_j; f, T_g.]

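Incremental Validation of the Line Structure in O(log² |T|)
- Example: insert m after k
- Number of updated lines < 1 + log |T|
- Cost of a line update: O(log |T|)
[Figure: the line structure over the nodes a to k with the new node m and the annotations b, T_c; d, T_j; f, T_g.]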

Validating Insertions and Deletions: the Non-Line-Preserving Case Insertion

Key Complexity Results
Given m updates on a tree of size n, incrementally validate:
- DTDs in O(m log n)
  - Given the alphabet Σ and the size d of the largest regular expression: O(m |Σ| d² log d log n)
  - Data structure of size O(d² n)
- Specialized DTDs in O(m log² n)
  - Given the set of types, with t the number of types: O(m t² d² (log d + log t) log² n)
  - Data structure of size O(t² d² log² n)
- Lower complexity for 1-unambiguous regular expressions

Ongoing and Future Work (with Andrey Balmin)
- Incorporate Transition Relation Trees in B-Tree Structure
- Exploit locality
- Experimental evaluation on a set of 65 DTDs: in 96% of type definitions an update may only affect transition relations of length < 4
- Common case much more efficient than worst case
- Detect the property and employ algorithms that do not build transition relation trees in such cases
- Optimization over multiple updates
- More complex updates & edit operations