Containment and Equivalence for an XPath Fragment. By Gerome Miklau and Dan Suciu. Presented by Roy Ionas.


SEMINAR OBJECTIVES
Presenting why containment and equivalence of XPath fragments are hard problems (no polynomial-time algorithm is known).
Presenting two algorithms that improve the cost of deciding XPath containment and equivalence.
Presenting tree patterns as an effective tool for proofs about XPath fragments.

SO WHAT IS XPath?
A simple language for navigating XML documents and selecting a set of nodes.
With XPath we can query XML data, describe key constraints, express transformations and reference elements in remote documents.
XPath's influence shows in other XML query languages and standards such as XQuery, XSLT, XML Schema, XLink, XPointer and more.

DEFINITIONS
Simple XPath fragment.
Containment between two XPath fragments.
Equivalence between two XPath fragments.
Computability definitions.
Tree patterns as a proving tool for XPath fragments.

Simple XPath fragment
An XPath expression.
Contains the three most important features for navigation:
– Child and descendant axes: “/” and “//”.
– Wildcards: “*”.
– Qualifiers: “[]”.
We disregard attributes, conditions and other features.
We identify and compare nodes only by their label.
We disregard order completely.
Example: a//*[b//d][c]

Simple XPath fragment
Are these all the features we have in XPath? NO!
Are these all the features we need for representing navigation in XML documents? YES! At least these are the ones needed for the proofs in this article.

Containment
Containment between two XPath fragments A and B means that, for every XML document, the result of applying A is contained in the result of applying B.
A result is a set of nodes; order is not considered.
Can we check this containment over the entire world of XML documents?
Is there another way to determine containment between two XPath fragments?

Equivalence
Equivalence between two XPath fragments A and B means that, for every XML document, the result of applying A equals the result of applying B.
The problem of equivalence can be reduced to the problem of containment:
– Equivalence = containment in both directions between the patterns.
– Conversely, containment can be reduced to equivalence in polynomial time, so the two problems are equally hard.
From now on we mention only the problem of containment; the results are valid for equivalence as well.
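The two definitions can be written compactly as follows. The notation p(t), for the set of nodes that pattern p selects in an XML tree t, is an assumption of this sketch and does not appear on the slides.

```latex
\begin{align*}
p \subseteq p' &\iff \forall t:\; p(t) \subseteq p'(t) \\
p \equiv p'    &\iff p \subseteq p' \;\wedge\; p' \subseteq p
\end{align*}
```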

Computability Definitions
NP stands for “Nondeterministic Polynomial”.
P class: the class of decision problems that can be solved in polynomial time, i.e. for which an efficient algorithm is known.
NP class: the class of decision problems whose solutions can be verified in polynomial time (equivalently, solved in polynomial time by a nondeterministic machine). For the hardest problems in NP no efficient algorithm has been found (yet), and the best known algorithms take exponential time.
coNP class: the class of decision problems whose complement is in NP.
NP-hard problem: a problem to which every NP problem can be reduced in polynomial time (at least as hard as anything in NP, possibly harder).
NP-complete problem: a problem that belongs to the NP class and is also NP-hard.

Tree Patterns
An unordered tree over the alphabet of the XPath expression.
XPath nodes become nodes of the tree pattern.
Child axes are drawn as single edges.
Descendant axes are drawn as double-line edges.
A k-tuple of nodes is designated as the result tuple.
For a tree pattern P, the arity of the result tuple is called the arity of P.
A tree pattern P is Boolean iff its arity is 0.

Tree Patterns
Tree patterns are more elegant and more general than XPath fragments.
We can translate from XPath to tree patterns and vice versa quite easily.
Now we can prove properties using graph theory.

Tree Pattern - example
For the XPath expression a//*[b//d][c] the tree pattern is:
– a (root) has a descendant edge to * (wildcard).
– * has a child edge to b and a child edge to c.
– b has a descendant edge to d.
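The translation from an expression of the simple fragment to a tree pattern can be sketched in a few lines. This is only an illustration under stated assumptions: the nested-tuple encoding `(label, [(axis, subpattern), ...])` and the names `parse` and `edges` are hypothetical, qualifiers are assumed to start with a label reached by a child edge, and malformed input is not handled.

```python
import re

# Tokens of the simple fragment: "//", "/", "[", "]", "*", and element labels.
TOKEN = re.compile(r"//|/|\[|\]|\*|[A-Za-z_][\w-]*")

def parse(expr):
    """Parse e.g. 'a//*[b//d][c]' into (label, [(axis, subpattern), ...])."""
    tokens = TOKEN.findall(expr)
    pos = 0

    def parse_path():
        nonlocal pos
        label = tokens[pos]
        pos += 1
        out_edges = []
        # Each qualifier [q] hangs below the current node via a child edge.
        while pos < len(tokens) and tokens[pos] == "[":
            pos += 1
            out_edges.append(("/", parse_path()))
            assert tokens[pos] == "]", "unbalanced qualifier"
            pos += 1
        # The rest of the path continues below the current node.
        if pos < len(tokens) and tokens[pos] in ("/", "//"):
            axis = tokens[pos]
            pos += 1
            out_edges.append((axis, parse_path()))
        return (label, out_edges)

    return parse_path()

def edges(pattern):
    """Yield (parent label, axis, child label) for every edge, in preorder."""
    for axis, sub in pattern[1]:
        yield (pattern[0], axis, sub[0])
        yield from edges(sub)

PATTERN = parse("a//*[b//d][c]")
for parent, axis, child in edges(PATTERN):
    print(parent, axis, child)
```

The printed edge list mirrors the drawing: "/" stands for a single (child) edge and "//" for a double-line (descendant) edge.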

Usage of Tree Patterns for navigating in XML trees
An embedding maps a tree pattern into an XML tree.
Imagine it as a function that must:
– Preserve the root.
– Respect node labels.
– Respect edge relationships.
After embedding, return the subtrees rooted at the nodes marked as return nodes.
For Boolean patterns, return true if such an embedding exists.
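For Boolean patterns the embedding test can be sketched recursively. The encodings are assumptions made for illustration: a pattern is `(label, [(axis, subpattern), ...])` with axis "/" or "//", and an XML tree is `(label, [subtrees])`; the function names are hypothetical.

```python
def descendants(tree):
    """Yield every proper descendant of an XML tree node (label, children)."""
    for child in tree[1]:
        yield child
        yield from descendants(child)

def embeds(pattern, tree):
    """True if the pattern root can be mapped to the tree root so that
    labels and child/descendant relationships are respected."""
    label, out_edges = pattern
    if label != "*" and label != tree[0]:
        return False  # node labels must agree unless the pattern node is a wildcard
    for axis, sub in out_edges:
        # A child edge must land on a child, a descendant edge on any proper descendant.
        candidates = tree[1] if axis == "/" else descendants(tree)
        if not any(embeds(sub, node) for node in candidates):
            return False
    return True

# Pattern for a//*[b//d][c]
PATTERN = ("a", [("//", ("*", [("/", ("b", [("//", ("d", []))])), ("/", ("c", []))]))])

# Document: a > s > t > (b > d, c); the wildcard can be mapped to t.
DOC_YES = ("a", [("s", [("t", [("b", [("d", [])]), ("c", [])])])])
# Document: a > (b > d, c); no proper descendant of a has both a b child and a c child.
DOC_NO = ("a", [("b", [("d", [])]), ("c", [])])

print(embeds(PATTERN, DOC_YES), embeds(PATTERN, DOC_NO))
```

The mapping need not be injective, which is why each outgoing edge can be checked independently of its siblings.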

Example for embedding
[Figure: a tree pattern over the nodes a, *, b, c, d embedded into an XML tree with the nodes a, s, t, b, c, d.]

PROBLEM…
Testing containment between two XPath fragments is a coNP-complete problem.
Hardness can be proven by a reduction from a coNP-complete problem on 3-CNF formulas (the complement of 3SAT) to containment.

Do we really care about it?
When do we need to test for containment or equivalence between fragments?
In almost all the applications we described so far.
Inference of keys.
Optimization of XPath queries.

Solving the problem
Finding an algorithm that is both efficient and complete for this problem is very unlikely (it would amount to proving P = NP).
Finding an algorithm which is efficient but not complete.
Finding an algorithm that is complete but not always efficient.

First solution: Pattern homomorphism

Pattern Homomorphisms - definition
A homomorphism h from a tree pattern p’ to a tree pattern p is a function h: Nodes(p’) -> Nodes(p) that satisfies the following conditions:
– Root preserving: h maps the root of p’ to the root of p.
– Label preserving: for each node x in p’, the label of x is either * or equal to the label of h(x).
– Child and descendant relations preserving: a child edge maps to a child edge, and a descendant edge maps to a path of one or more edges.
Whether a homomorphism between two patterns exists can be decided efficiently (in polynomial time).
The test is sound: whenever there exists a homomorphism from p’ to p, then p ⊆ p’.
So the existence of a homomorphism is always a sufficient condition for containment.
But is it a necessary condition?
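A homomorphism test can be sketched as a top-down recursion. This is a minimal sketch, not the algorithm from the paper: it assumes the homomorphism runs from the containing pattern p' to the contained pattern p, and it encodes patterns as `(label, [(axis, subpattern), ...])`; the function names are hypothetical and the naive recursion makes no attempt at the best running time.

```python
def below(node):
    """Yield every pattern node strictly below `node`, along any kind of edge."""
    for _axis, child in node[1]:
        yield child
        yield from below(child)

def maps_onto(x, y):
    """Can node x of p' be mapped to node y of p, together with x's subtree?"""
    if x[0] != "*" and x[0] != y[0]:
        return False  # label of x must be * or equal to the label of y
    for axis, x_child in x[1]:
        if axis == "/":
            # a child edge of p' must be mapped to a child edge of p
            targets = [c for a, c in y[1] if a == "/"]
        else:
            # a descendant edge of p' may be mapped to any path of length >= 1 in p
            targets = below(y)
        if not any(maps_onto(x_child, t) for t in targets):
            return False
    return True

def homomorphism_exists(p_prime, p):
    """Root-preserving homomorphism from p' to p; if one exists, p is contained in p'."""
    return maps_onto(p_prime, p)

A_B = ("a", [("/", ("b", []))])        # a/b
A_STAR = ("a", [("/", ("*", []))])     # a/*
P1 = ("a", [("/", ("*", [("//", ("b", []))]))])   # a/*//b
P2 = ("a", [("//", ("*", [("/", ("b", []))]))])   # a//*/b

print(homomorphism_exists(A_STAR, A_B))   # witnesses that a/b is contained in a/*
print(homomorphism_exists(P1, P2), homomorphism_exists(P2, P1))
```

The last line checks the pair a/*//b and a//*/b, which select the same nodes on every document yet admit no homomorphism in either direction.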

Example for homomorphism
[Figure: two tree patterns over the nodes a, b, c and *; the mapping pairs a with a and the node b with the wildcard *.]

Homomorphism is not a complete solution for containment
A homomorphism between two tree patterns may fail to exist even though they are equivalent.
[Figure: two equivalent tree patterns, each over the nodes a, b and *, with no homomorphism between them; a/*//b and a//*/b are such a pair.]

Cases where the homomorphism test is complete
Fragments that contain only * and [].
Fragments that contain only // and [].
Fragments that contain all three features but can be translated, without changing their semantics, to an expression that belongs to one of the classes above.

Conclusion for homomorphism
Sound.
Efficient.
Incomplete.
Next we look for an algorithm that is sound and complete, and that may be efficient in several cases.

ALGORITHM FOR CONTAINMENT

Containment between regular languages
We reduce the problem of containment between two XPath fragments to containment between two regular (tree) languages, by translating each tree pattern into a tree automaton.
The algorithm is complete: with well-defined rules we can translate faithfully from tree patterns to automata and vice versa.

Automata for XPath fragment
Defined on ranked trees.
Bottom-up structure.
Acceptance is decided only at the root.
The initial states are assigned at the leaves of the tree.
The transitions are of the form (q1, q2, …, qn; a) -> q.
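A bottom-up run with transitions of the form (q1, …, qn; a) -> q can be sketched as below. The example automaton, which evaluates Boolean formula trees, is purely illustrative and is not the automaton built for tree patterns; the dictionary encoding of transitions and all names are assumptions of this sketch.

```python
from itertools import product

def run(tree, delta):
    """Return the set of states reachable at the root of `tree`.
    tree:  (symbol, [subtrees])
    delta: {(tuple_of_child_states, symbol): set_of_states}"""
    symbol, children = tree
    child_states = [run(child, delta) for child in children]
    reached = set()
    # At a leaf, product() yields the single empty tuple, so the rule ((), symbol)
    # supplies the initial states.
    for combo in product(*child_states):
        reached |= delta.get((combo, symbol), set())
    return reached

def accepts(tree, delta, accepting):
    """The tree is accepted if some accepting state is reached at the root."""
    return bool(run(tree, delta) & accepting)

# Illustrative automaton: leaves "t"/"f", binary symbols "and"/"or", states "T"/"F".
DELTA = {((), "t"): {"T"}, ((), "f"): {"F"}}
for left, right in product("TF", repeat=2):
    DELTA[((left, right), "and")] = {"T" if left == right == "T" else "F"}
    DELTA[((left, right), "or")] = {"T" if "T" in (left, right) else "F"}

FORMULA = ("or", [("and", [("t", []), ("f", [])]), ("t", [])])
print(accepts(FORMULA, DELTA, accepting={"T"}))
```

Because `run` returns a set of states, the same function also handles nondeterministic automata, where several rules share one left-hand side.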

Definitions
FTA: finite tree automaton, an automaton that contains a set of states and transitions of the form described.
An FTA can be deterministic (DFTA).
Each FTA A with a state set Q can be translated to a DFTA B with at most 2^|Q| states.
AFTA: alternating finite tree automaton, which extends the definition of FTA by adding “AND transitions” of the form ∧(q1, q2, …, qm) -> q.
A DFTA can be built for an AFTA as well, without increasing the cost of determinizing the automaton.
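The exponential bound on determinization comes from the subset construction, sketched here for ordinary (non-alternating) bottom-up automata. The encoding of transitions as a dictionary, the `arity` table and the tiny example automaton are assumptions made for illustration; AND transitions are not covered.

```python
from itertools import product

def determinize(delta, arity):
    """Subset construction for a bottom-up tree automaton.
    delta: {(tuple_of_child_states, symbol): set_of_states}
    arity: {symbol: number_of_children}
    Returns (det_states, det_delta); every deterministic state is a frozenset
    of original states, so there are at most 2**|Q| of them."""
    det_states, det_delta = set(), {}
    changed = True
    while changed:
        changed = False
        for symbol, n in arity.items():
            for combo in product(list(det_states), repeat=n):
                if (combo, symbol) in det_delta:
                    continue
                # Union of the targets over every choice of one state per child.
                target = set()
                for choice in product(*combo):
                    target |= delta.get((choice, symbol), set())
                target = frozenset(target)
                det_delta[(combo, symbol)] = target
                det_states.add(target)
                changed = True
    return det_states, det_delta

# Illustrative nondeterministic automaton: leaf symbol "a", unary symbol "f".
DELTA = {
    ((), "a"): {"q0"},
    (("q0",), "f"): {"q0", "q1"},
    (("q1",), "f"): {"q1"},
}
ARITY = {"a": 0, "f": 1}

DET_STATES, DET_DELTA = determinize(DELTA, ARITY)
print(sorted(sorted(state) for state in DET_STATES))
```

Only the subsets that are actually reachable are generated, so in practice the result is often much smaller than the worst case.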

The entire algorithm
Construct the DFTA A accepting the “regular expressions” of P.
Construct the AFTA A’ accepting the “regular expressions” of P’.
Compute the AFTA B = A x A’.
Compute the DFTA C = Det(B).
If lang(A) ⊆ lang(C) then return true, else return false.

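[Figure: two tree patterns rooted at r, over the nodes a, b and *; the question is whether one is contained in the other.]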

Step 1: Building FTA A from tree pattern p
States(A) = Nodes(p).
For each node x with children x1, …, xk we add a transition (x1, x2, …, xk; x) -> x.
For each descendant edge e from node x to node y we add a transition (y; e) -> x.
We also add a self-loop (y; *) -> y.
The only accepting state is the root.

Example for building FTA
[Figure: a tree pattern rooted at r over the nodes a, b and *, next to the FTA built from it.]

Step 2: Building an AFTA A’ from pattern p’
States(A’) = Nodes(p’) ∪ Edges(p’).
(q; a) -> e for every symbol a that has an outgoing edge e; if e is a descendant edge, we also add a self-loop on the source node.
∧(e1, e2, e3, …) -> a for every a that has incoming edges.

Example for building AFTA for pattern p’
[Figure: the pattern p’, rooted at r over the nodes a, b and *, next to the AFTA built from it.]

Conclusion for the containment algorithm
Sound.
Complete.
Not always efficient.
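The trade-off "complete but not always efficient" can be illustrated with a brute-force check for Boolean patterns. This is not the automata construction from the slides: it swaps in a simpler idea, testing p' against "canonical models" of p, in which every wildcard becomes a fresh label and every descendant edge becomes a chain of fresh nodes. The bound on the chain length used here is deliberately generous and is an assumption of this sketch, as are the tuple encodings and all names.

```python
from itertools import product

Z = "#z"  # fresh label, assumed not to occur in either pattern

def descendants(tree):
    """Yield every proper descendant of a tree node (label, children)."""
    for child in tree[1]:
        yield child
        yield from descendants(child)

def embeds(pattern, tree):
    """Boolean embedding test: pattern root onto tree root."""
    label, out_edges = pattern
    if label not in ("*", tree[0]):
        return False
    return all(
        any(embeds(sub, node)
            for node in (tree[1] if axis == "/" else descendants(tree)))
        for axis, sub in out_edges)

def size(pattern):
    """Number of nodes in a pattern."""
    return 1 + sum(size(sub) for _, sub in pattern[1])

def descendant_edges(pattern):
    """Number of '//' edges in a pattern."""
    return sum((axis == "//") + descendant_edges(sub) for axis, sub in pattern[1])

def canonical_model(pattern, lengths):
    """Turn a pattern into a tree: '*' becomes Z, and each '//' edge becomes a
    chain of Z nodes whose length is drawn from the iterator `lengths`."""
    label, out_edges = pattern
    children = []
    for axis, sub in out_edges:
        node = canonical_model(sub, lengths)
        if axis == "//":
            for _ in range(next(lengths)):
                node = (Z, [node])
        children.append(node)
    return (Z if label == "*" else label, children)

def contained(p, p_prime):
    """Exponential check: p' must embed into every canonical model of p
    whose chains have length at most size(p') + 1 (assumed bound)."""
    bound = size(p_prime) + 1
    for lengths in product(range(bound + 1), repeat=descendant_edges(p)):
        if not embeds(p_prime, canonical_model(p, iter(lengths))):
            return False
    return True

P1 = ("a", [("/", ("*", [("//", ("b", []))]))])   # a/*//b
P2 = ("a", [("//", ("*", [("/", ("b", []))]))])   # a//*/b
print(contained(P1, P2), contained(P2, P1))
```

The number of models grows exponentially with the number of descendant edges in p, which is where the "not always efficient" shows up.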