Exchange Intensional XML Data. Tova Milo (INRIA & Tel-Aviv U.); Serge Abiteboul (INRIA); Bernd Amann (Cedric-CNAM); Omar Benjelloun.

Exchange Intensional XML Data
Tova Milo (INRIA & Tel-Aviv U.); Serge Abiteboul (INRIA); Bernd Amann (Cedric-CNAM); Omar Benjelloun (INRIA); Fred Dang Ngoc (INRIA)

Outline
- Introduction
- The Model and The Problem
- Exchanging Intensional Data
- Safe Rewriting
- Possible Rewriting
- Implementation
- Conclusion and Related Work

Introduction
- What are intensional documents? XML documents in which:
  - some parts are defined explicitly;
  - some parts are defined by programs (e.g. Web services) that generate data.
- Materialisation: the process of evaluating some of the programs included in an XML document and replacing them by their results.

Introduction (cont'd)
- The goals of the paper:
  - Study the new issues raised by the exchange of intensional XML documents between applications.
  - Decide which data should be materialised before it is sent and which should not.

Introduction (cont'd)
[Figure: data exchange scenario for intensional documents. A sender and a receiver, each with its own capabilities, ACL and cost constraints, with a data exchange schema between them.]

The Model and The Problem
- Simple intensional XML model
  - Intensional documents
  - Simple schema
  - Instances of a schema
  - About rewritings
- A richer data model
  - Function patterns
  - Restricted service invocations

The Model and The Problem: Simple intensional XML
- Intensional XML documents are modelled as labelled trees consisting of two types of nodes:
  - Data nodes: nodes with a label in L ∪ D.
  - Function nodes: nodes with a label in F, corresponding to service calls.
    - The children subtrees of a function node are the function parameters.
    - When the function is called, these subtrees are passed to it, and the return value replaces the function node in the document.
- Assume the existence of the following disjoint domains:
  - N: domain of nodes
  - L: domain of labels
  - F: domain of function names
  - D: domain of data values

The Model and The Problem: Simple intensional XML (cont'd)
- An example of an intensional XML document:
  newspaper
    title: “The Sun”
    date: “04/10/2002”
    Get_Temp (function node)
      city: “Paris”
    TimeOut (function node): “Exhibits”
  temp: “16 ºC” (the data node that replaces Get_Temp once it is invoked)
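
The tree model and the materialisation of a function node can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` class, the `materialise` helper and the `fake_get_temp` stub are hypothetical names introduced here, and the stub stands in for a real Web service call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    label: str                                   # element label, function name, or data value
    children: List["Node"] = field(default_factory=list)
    is_function: bool = False                    # True for service-call nodes


def materialise(node: Node, services: Dict[str, Callable[[List[Node]], List[Node]]]) -> List[Node]:
    """Return the nodes that replace `node` once every known service below it is invoked."""
    params: List[Node] = []
    for child in node.children:
        params.extend(materialise(child, services))
    if node.is_function and node.label in services:
        # The children subtrees are passed as parameters; the result replaces the function node.
        return services[node.label](params)
    return [Node(node.label, params, node.is_function)]


def fake_get_temp(params: List[Node]) -> List[Node]:
    # Hypothetical stub standing in for the real Get_Temp service.
    return [Node("temp", [Node("16 ºC")])]


doc = Node("newspaper", [
    Node("title", [Node("The Sun")]),
    Node("date", [Node("04/10/2002")]),
    Node("Get_Temp", [Node("city", [Node("Paris")])], is_function=True),
    Node("TimeOut", [Node("Exhibits")], is_function=True),
])

# Only Get_Temp is materialised; TimeOut stays intensional.
result = materialise(doc, {"Get_Temp": fake_get_temp})[0]
print([child.label for child in result.children])  # ['title', 'date', 'temp', 'TimeOut']
```

Leaving `TimeOut` out of the `services` mapping keeps that call in the document, which is exactly the choice the sender has to make for each function node.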

The Model and The Problem: Simple intensional XML (cont'd)
- Simple schema: a document schema s is an expression (L, F, τ) where
  - L is a finite set of labels;
  - F is a finite set of function names;
  - τ is a function that maps
    - each label l ∈ L to a regular expression over L ∪ F or to the keyword “data”;
    - each function name f ∈ F to a pair of expressions: τin(f), the input type of f, and τout(f), the output type of f.

The Model and The Problem: Simple intensional XML (cont'd)
- An example of a schema:
  Data:
  - τ(newspaper) = title.date.(Get_Temp|temp).(TimeOut|exhibit)
  - τ(title) = data
  - τ(date) = data
  - τ(temp) = data
  - τ(city) = data
  - τ(exhibit) = data
  Functions:
  - τin(Get_Temp) = city
  - τout(Get_Temp) = temp
  - τin(TimeOut) = data
  - τout(TimeOut) = (exhibit|performance)
  - τin(Get_Date) = title
  - τout(Get_Date) = date

- Instances of a schema: an intensional document t is an instance of a schema s = (L, F, τ) if
  - for each data node n ∈ t with label l ∈ L, the labels of n's children form a word in lang(τ(l));
  - the same holds for each function node with label f ∈ F, with respect to lang(τin(f)).
- Here lang(τ(l)) denotes the regular language defined by τ(l).
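
The instance check can be illustrated with Python's `re` module. The sketch below is not the authors' implementation: it assumes a document is written as nested `(label, children)` tuples, that a data value is a plain string child, and that a schema expression uses only labels, `.`, `|`, `*` and parentheses. The helper names `compile_type` and `is_instance` are hypothetical.

```python
import re

# Schema from the example: TAU for data labels, TAU_IN for the input types of functions.
TAU = {
    "newspaper": "title.date.(Get_Temp|temp).(TimeOut|exhibit)",
    "title": "data", "date": "data", "temp": "data",
    "city": "data", "exhibit": "data",
}
TAU_IN = {"Get_Temp": "city", "TimeOut": "data"}


def compile_type(expr):
    """Translate a schema expression into a Python regex over space-terminated labels."""
    body = re.sub(r"[A-Za-z_]+", lambda m: "(?:%s )" % m.group(0), expr)
    return re.compile(body.replace(".", ""))


def is_instance(node):
    """node is (label, children); a data value is represented by a plain string child."""
    label, children = node
    expected = TAU_IN.get(label, TAU.get(label))
    if expected is None:
        return False                      # label unknown to the schema
    if expected == "data":
        return len(children) == 1 and isinstance(children[0], str)
    if any(isinstance(c, str) for c in children):
        return False                      # data value where element children are required
    word = "".join(child[0] + " " for child in children)
    return bool(compile_type(expected).fullmatch(word)) and all(map(is_instance, children))


good = ("newspaper", [
    ("title", ["The Sun"]),
    ("date", ["04/10/2002"]),
    ("Get_Temp", [("city", ["Paris"])]),
    ("TimeOut", ["Exhibits"]),
])
bad = ("newspaper", [("title", ["The Sun"]), ("TimeOut", ["Exhibits"])])

print(is_instance(good), is_instance(bad))  # True False
```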

The Model and The Problem: Simple intensional XML (cont'd)
- About rewritings: let t and t' be trees.
  - If t' is obtained from t by selecting a function node v in t with some label f and replacing it by an arbitrary output instance of f,
  - then we write t →[v] t'.

- About rewritings (cont'd)
  - If t →[v1] t1 →[v2] t2 ... →[vn] tn, then we say that t rewrites into tn, written t →* tn.
  - The nodes v1, ..., vn are called a rewriting sequence.
  - The set of all trees t' such that t →* t' is denoted ext(t).

The Model and The Problem: Simple intensional XML (cont'd)
- About rewritings (cont'd): let t be a tree and s a schema.
  1. If ext(t) contains some instance of s, then t possibly rewrites into s.
  2. If either t is already an instance of s, or there exists some node v in t such that all trees t' with t →[v] t' safely rewrite into s, then t safely rewrites into s.

The Model and The Problem: Simple intensional XML (cont'd)
- Safe rewriting of a schema: let s be a schema and r a distinguished label called the root label.
  - If all the instances t of s with root label r safely rewrite into instances of s', then we say that s safely rewrites into s'.

The Model and The Problem: Simple intensional XML (cont'd)
[Figure: the data exchange scenario. A sender and a receiver, each with its own capabilities, ACL and cost constraints, with a data exchange schema between them.]

The Model and The Problem: A Richer Data Model
- Function patterns: a function belongs to a pattern if its name satisfies the pattern's Boolean predicate and its signature is the same as the required one.
- Example:
  - τname(Forecast) = UDDIF ∧ InACL
  - τin(Forecast) = city
  - τout(Forecast) = temp
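
A function pattern can be read as a name predicate plus a required signature. The sketch below makes that reading concrete; the registries `UDDI_REGISTRY` and `ACL`, the predicates `uddif` and `in_acl`, and the service name `WeatherNow` are all hypothetical stand-ins, since the slides do not say how UDDIF and InACL are evaluated.

```python
# Hypothetical registries standing in for a UDDI lookup and an access control list.
UDDI_REGISTRY = {"Get_Temp", "WeatherNow"}
ACL = {"Get_Temp"}


def uddif(name):
    """Assumed meaning of UDDIF: the service is listed in the registry."""
    return name in UDDI_REGISTRY


def in_acl(name):
    """Assumed meaning of InACL: the caller is allowed to use the service."""
    return name in ACL


# The Forecast pattern: name predicate UDDIF AND InACL, signature city -> temp.
FORECAST = {"name": lambda n: uddif(n) and in_acl(n), "in": "city", "out": "temp"}


def belongs_to(function, pattern):
    """A function matches if its name satisfies the predicate and its signature is identical."""
    return (pattern["name"](function["name"])
            and function["in"] == pattern["in"]
            and function["out"] == pattern["out"])


print(belongs_to({"name": "Get_Temp", "in": "city", "out": "temp"}, FORECAST))    # True
print(belongs_to({"name": "WeatherNow", "in": "city", "out": "temp"}, FORECAST))  # False
```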

The Model and The Problem: A Richer Data Model (cont'd)
- Restricted service invocations
  - So far we assumed that all the functions appearing in a document may be invoked in a rewriting in order to match a given schema.
  - This is not always the case, for reasons such as security, cost, access rights, etc.
  - Thus, function names/patterns in the schema can be partitioned into two disjoint groups: invocable and non-invocable ones.
  - A legal rewriting is then one that invokes only invocable functions.

Exchanging Intensional Data
- Rewriting process
  - Safe rewriting
  - Possible rewriting
  - Mixed approach
- Restriction

Exchanging Intensional Data: rewriting process
- Safe rewriting: check whether t safely rewrites to s; if so, find a rewriting sequence.
  - Rewriting sequence: a sequence of functions that need to be invoked to transform t into the required structure.
  - Preferred rewriting sequence: the shortest/cheapest one.

Exchanging Intensional Data: rewriting process (cont'd)
- Possible rewriting: if a safe rewriting does not exist, check whether t may at least rewrite to s.
  - If this is acceptable (the sender accepts that the rewriting may fail), try to find a successful rewriting sequence, if one exists.
  - Preferred rewriting sequence: the one with the least cost.

Exchanging Intensional Data: rewriting process (cont'd)
- Mixed approach: first invoke some function calls, then attempt to find safe rewritings from there.

Exchanging Intensional Data: rewriting process (cont'd)
- k-depth rewriting sequence: for a rewriting sequence t →[v1] t1 ... →[vn] tn,
  - if the node vj was returned by the invocation of the function vi (vj ∈ tj, vi ∈ tj-1), then we say that the function node vj depends on the function node vi;
  - if the dependency graph among the nodes contains no paths of length greater than k, then we say that the rewriting sequence is of depth k.

Exchanging Intensional Data: Restriction
- Restriction: consider only k-depth left-to-right rewritings.
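
To make the difference between safe and possible rewriting concrete, here is a brute-force sketch of k-depth left-to-right rewriting on words of labels. It is a simplification, not the automaton-based algorithm of the talk: it assumes each function has a finite list of possible output words (real output types are regular languages), and it reads "depth 1" as "only functions present in the original word may be invoked". The names `rewrites`, `OUTPUTS` and `in_language` are hypothetical.

```python
import re


def rewrites(word, outputs, accepts, k, safe):
    """Left-to-right k-depth rewriting of a word of labels.

    safe=True: the target must be reached whatever the services return.
    safe=False: it is enough that some outcome reaches the target.
    """
    choose = all if safe else any

    def go(done, todo):
        # done: labels already fixed; todo: (label, depth) pairs still to process
        if not todo:
            return accepts(done)
        (label, depth), rest = todo[0], todo[1:]
        if go(done + (label,), rest):              # option 1: keep the symbol as it is
            return True
        if label in outputs and depth < k:         # option 2: invoke the function
            return choose(
                go(done, tuple((x, depth + 1) for x in out) + rest)
                for out in outputs[label]
            )
        return False

    return go((), tuple((x, 0) for x in word))


# Finite output sets assumed for the two services of the newspaper example.
OUTPUTS = {"Get_Temp": [("temp",)], "TimeOut": [("exhibit",), ("performance",)]}


def in_language(pattern):
    return lambda w: re.fullmatch(pattern, " ".join(w)) is not None


w = ("title", "date", "Get_Temp", "TimeOut")
R1 = in_language(r"title date temp( TimeOut|( exhibit)*)")   # title.date.temp.(TimeOut|exhibit*)
R2 = in_language(r"title date temp( exhibit)*")              # title.date.temp.exhibit*

print(rewrites(w, OUTPUTS, R1, 1, safe=True))    # True: invoke Get_Temp, keep TimeOut
print(rewrites(w, OUTPUTS, R2, 1, safe=True))    # False: TimeOut may return performance
print(rewrites(w, OUTPUTS, R2, 1, safe=False))   # True: TimeOut may return exhibit
```

The only difference between the two modes is the quantifier over service outputs: `all` for safe rewriting, `any` for possible rewriting.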

Safe Rewriting (Dec 16, 2004)
- Algorithm for k-depth left-to-right safe rewriting
- Safe rewriting algorithm:
  - Given:
    - a word w,
    - the output types R_f1, ..., R_fn of the available functions,
    - a target regular language R.
  - Purpose of the algorithm:
    - to test whether w can be safely rewritten into a word in R;
    - if so, to find a safe rewriting sequence.

Safe Rewriting (cont'd)
- Note: for illustration purposes we use the newspaper document.
  - w = title.date.Get_Temp.TimeOut: the word formed by the labels of the children of newspaper.
  - R = title.date.temp.(TimeOut|exhibit*): the target language; we look for a safe rewriting of w into a word in R.
- The algorithm, main idea: in regular-language terms, the intersection of the language generated by the k-depth invocations with the complement of the target language R should be empty.

Safe Rewriting (cont'd)
1. Build finite state automata for the following regular languages:
   (1) An automaton A_w accepting the word w = title.date.Get_Temp.TimeOut:
       q0 -title-> q1 -date-> q2 -Get_Temp-> q3 -TimeOut-> q4
   (2) Automata A_fi, each accepting the regular language R_fi (the output types of the available functions).

Safe Rewriting (cont'd)
   (3) An automaton Ā accepting the complement of the regular language R. The automaton should be deterministic and complete.
[Figure: the complement automaton Ā for the schema τ'(newspaper) = title.date.temp.(TimeOut|exhibit*), with states p0 to p6 and transitions labelled title, date, temp, TimeOut, exhibit and *.]
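
Complementing a deterministic automaton requires it to be complete first, which is why the slide insists on both properties. The sketch below shows the usual construction (add a sink state, then swap accepting and non-accepting states). The transition table is a hand-written DFA for R = title.date.temp.(TimeOut|exhibit*); the state names follow the slide, but the exact shape of the slide's figure is not recoverable, so the table is an assumption.

```python
ALPHABET = ["title", "date", "temp", "Get_Temp", "TimeOut", "exhibit", "performance"]

# Partial DFA for R = title.date.temp.(TimeOut|exhibit*), start state p0.
R_STATES = ["p0", "p1", "p2", "p3", "p4", "p5"]
R_DELTA = {
    ("p0", "title"): "p1", ("p1", "date"): "p2", ("p2", "temp"): "p3",
    ("p3", "TimeOut"): "p4", ("p3", "exhibit"): "p5", ("p5", "exhibit"): "p5",
}
R_ACCEPTING = {"p3", "p4", "p5"}


def complement(states, alphabet, delta, accepting, sink="p6"):
    """Complete the DFA with a sink state, then swap accepting and non-accepting states."""
    all_states = list(states) + [sink]
    total = {(p, a): delta.get((p, a), sink) for p in all_states for a in alphabet}
    return total, set(all_states) - set(accepting)


def accepts(total, accepting, word, start="p0"):
    state = start
    for symbol in word:
        state = total[(state, symbol)]
    return state in accepting


total, bad = complement(R_STATES, ALPHABET, R_DELTA, R_ACCEPTING)
print(accepts(total, bad, ["title", "date", "temp", "TimeOut"]))      # False: the word is in R
print(accepts(total, bad, ["title", "date", "Get_Temp", "TimeOut"]))  # True: the word is not in R
```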

Safe Rewriting (cont'd)
2. Construct the automaton A_w^k, which represents all the words that can be generated by such a k-depth rewriting process (by iteration).
[Figure: the 1-depth automaton A_w^1 for the word w = title.date.Get_Temp.TimeOut. The path q0 -title-> q1 -date-> q2 -Get_Temp-> q3 -TimeOut-> q4 is extended with states q5, q6, q7 and ε-transitions that splice in the output types of the functions (temp for Get_Temp; exhibit and performance for TimeOut). At a fork node, one option represents the choice of invoking the function and the other the choice of not invoking it.]
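
One iteration of this construction can be sketched as follows. The code assumes automata are lists of `(source, label, target)` edges with `None` standing for ε, and it uses the output types from the schema slide (temp for Get_Temp, exhibit|performance for TimeOut). State numbers are integers rather than the q-names of the figure, and the helper names are hypothetical.

```python
def word_automaton(word):
    """Chain automaton A_w: state i reads word[i] and moves to state i + 1."""
    return [(i, a, i + 1) for i, a in enumerate(word)], len(word) + 1


def expand_once(edges, n_states, out_automata):
    """Depth-1 expansion: next to each function edge, splice a copy of its output automaton."""
    new_edges, forks = list(edges), set()
    for src, label, dst in edges:
        if label in out_automata:
            f_edges, f_start, f_finals = out_automata[label]
            offset = n_states                                  # fresh state numbers for the copy
            n_states += 1 + max(max(s, t) for s, _, t in f_edges)
            forks.add(src)                                     # src now offers two options
            new_edges.append((src, None, f_start + offset))    # option: invoke the function
            new_edges += [(s + offset, a, t + offset) for s, a, t in f_edges]
            new_edges += [(t + offset, None, dst) for t in f_finals]
    return new_edges, n_states, forks


def language(edges, start, final):
    """Enumerate accepted words; only valid for acyclic automata such as this example."""
    words, stack = set(), [(start, ())]
    while stack:
        state, word = stack.pop()
        if state == final:
            words.add(word)
        for s, a, t in edges:
            if s == state:
                stack.append((t, word if a is None else word + (a,)))
    return words


# Output automata as (edges, start state, final states), taken from tau_out in the schema.
OUT = {
    "Get_Temp": ([(0, "temp", 1)], 0, {1}),
    "TimeOut": ([(0, "exhibit", 1), (0, "performance", 1)], 0, {1}),
}

w = ("title", "date", "Get_Temp", "TimeOut")
edges, n = word_automaton(w)
edges1, n1, forks = expand_once(edges, n, OUT)
words = language(edges1, 0, len(w))
print(sorted(forks), len(words))  # [2, 3] 6
```

For depth k the same step would be repeated on the function edges added by the previous iteration.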

Safe Rewriting (cont'd)
3. Construct the cartesian product automaton A× = A_w^k × Ā.
[Figure 6: the product automaton, with states [q, p] ranging from [q0,p0] to [q7,p6] and transitions labelled title, date, Get_Temp, temp, TimeOut, exhibit, performance and ε.]

Safe Rewriting (cont'd)
4. Mark nodes in A×.
[Figure 6: the product automaton A×.]

Safe Rewriting (cont'd)
- Try to obtain a safe rewriting: “A safe rewriting exists iff the initial state is not marked.”
  - Follow a non-marked path (corresponding to w) from the initial state of A× to a state [q, p] where q is an accepting state of A_w^k.
  - Non-marked fork options on the path determine the rewriting choices (i.e. which functions to call).
  - When a function is invoked, we continue the path with the new rewritten word rather than with the word w.

Safe Rewriting (cont'd)
- To minimise the rewriting cost, choose a path with a minimal number/cost of function invocations.
- EXIT (end of the algorithm)
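
Steps 3 and 4 can be put together in a short decision procedure for the running example. The marking rule is not spelled out on the slides, so the sketch assumes the following one: accepting states of the product are marked; a fork node is marked only if both of its options lead to marked states (the sender chooses); any other node is marked if some successor is marked (the service chooses). The 1-depth automaton is hard-coded with integer states, and all names are hypothetical.

```python
ALPHABET = ["title", "date", "temp", "Get_Temp", "TimeOut", "exhibit", "performance"]

# 1-depth automaton for w = title.date.Get_Temp.TimeOut; None stands for an epsilon move.
AW1 = [(0, "title", 1), (1, "date", 2), (2, "Get_Temp", 3), (3, "TimeOut", 4),
       (2, None, 5), (5, "temp", 6), (6, None, 3),
       (3, None, 7), (7, "exhibit", 8), (7, "performance", 8), (8, None, 4)]
FORKS = {2, 3}      # states where the sender chooses between keeping and invoking
FINAL = 4


def safe_rewriting_exists(delta, accepting, states):
    """delta/accepting/states describe a partial DFA for the target R, start state 'p0'."""
    sink = "sink"
    total = {(p, a): delta.get((p, a), sink) for p in states + [sink] for a in ALPHABET}
    bad = set(states + [sink]) - set(accepting)          # accepting states of the complement

    def succ(q, p):
        return [(t, p if a is None else total[(p, a)]) for s, a, t in AW1 if s == q]

    start = (0, "p0")
    reach, todo = {start}, [start]                       # reachable part of the product
    while todo:
        for nxt in succ(*todo.pop()):
            if nxt not in reach:
                reach.add(nxt)
                todo.append(nxt)

    marked = {(q, p) for q, p in reach if q == FINAL and p in bad}
    changed = True
    while changed:
        changed = False
        for q, p in reach - marked:
            hits = [nxt in marked for nxt in succ(q, p)]
            lost = all(hits) if q in FORKS else any(hits)
            if hits and lost:
                marked.add((q, p))
                changed = True
    return start not in marked


# R = title.date.temp.(TimeOut|exhibit*)
R1 = ({("p0", "title"): "p1", ("p1", "date"): "p2", ("p2", "temp"): "p3",
       ("p3", "TimeOut"): "p4", ("p3", "exhibit"): "p5", ("p5", "exhibit"): "p5"},
      {"p3", "p4", "p5"}, ["p0", "p1", "p2", "p3", "p4", "p5"])
# R' = title.date.temp.exhibit*
R2 = ({("p0", "title"): "p1", ("p1", "date"): "p2", ("p2", "temp"): "p3",
       ("p3", "exhibit"): "p3"},
      {"p3"}, ["p0", "p1", "p2", "p3"])

print(safe_rewriting_exists(*R1), safe_rewriting_exists(*R2))  # True False
```

With the first target, invoking Get_Temp and keeping TimeOut always succeeds. With the second, TimeOut must be invoked and may return performance, so the initial state ends up marked.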

Safe Rewriting (cont'd)
[Figure 7: the complement automaton Ā for the schema τ'(newspaper) = title.date.temp.exhibit*, with transitions labelled title, date, temp, exhibit and *.]

Safe Rewriting (cont'd)
[Figure 8: the cartesian product automaton A× = A_w^1 × Ā for this schema, with transitions labelled title, date, Get_Temp, temp, TimeOut, exhibit, performance and ε.]

Possible Rewriting
- The algorithm
  1. Build finite state automata for the following languages:
     1.1. an automaton A_w^k;
     1.2. an automaton A accepting the regular language R.

Possible Rewriting (cont'd)
[Figure 10: an automaton A for the schema τ''(newspaper) = title.date.temp.exhibit*, with states p0 to p4 and transitions labelled title, date, temp and exhibit.]

Possible Rewriting (cont'd)
  2. Construct the cartesian product automaton A× = A_w^k × A.
[Figure 11: the product automaton, with states [q0,p0], [q1,p1], [q2,p2], [q3,p3], [q5,p2], [q6,p3], [q7,p3], [q4,p3], [q4,p4], [q7,p4] and transitions labelled title, date, temp, exhibit and ε.]

Possible Rewriting (cont'd)
[Figure 11: the cartesian product automaton for possible rewriting.]
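
For possible rewriting the product is taken with an automaton for R itself, and the question becomes plain reachability: is there any path to a state that is accepting on both sides? The sketch below assumes the same hard-coded 1-depth automaton as the running example (integer states, `None` for ε) and a partial DFA for R with start state `p0`; the function name is hypothetical.

```python
# 1-depth automaton for w = title.date.Get_Temp.TimeOut; None stands for an epsilon move.
AW1 = [(0, "title", 1), (1, "date", 2), (2, "Get_Temp", 3), (3, "TimeOut", 4),
       (2, None, 5), (5, "temp", 6), (6, None, 3),
       (3, None, 7), (7, "exhibit", 8), (7, "performance", 8), (8, None, 4)]
FINAL = 4


def possibly_rewrites(delta, accepting):
    """True if some word of the 1-depth automaton is accepted by the DFA (delta, accepting)."""
    start = (0, "p0")
    seen, todo = {start}, [start]
    while todo:
        q, p = todo.pop()
        if q == FINAL and p in accepting:
            return True
        for s, a, t in AW1:
            if s != q:
                continue
            if a is None:
                nxt = (t, p)                 # epsilon move: only the word automaton advances
            elif (p, a) in delta:
                nxt = (t, delta[(p, a)])
            else:
                continue                     # the target automaton has no such transition
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return False


# R'' = title.date.temp.exhibit*
DELTA = {("p0", "title"): "p1", ("p1", "date"): "p2", ("p2", "temp"): "p3",
         ("p3", "exhibit"): "p3"}
print(possibly_rewrites(DELTA, {"p3"}))  # True: Get_Temp -> temp, TimeOut -> exhibit
```

Recording the predecessor of each visited state would additionally give a candidate rewriting sequence, which may still fail at run time if a service returns a different result.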

Implementation
- In the implementation, an intensional XML document is a well-formed XML document.
- To distinguish the intensional parts from the rest of the document, a namespace is used: a namespace is defined for function (service) calls.

Implementation (cont'd)
[Figure: the newspaper document as an XML document]
  newspaper
    title: “The Sun”
    date: “04/10/2002”
    Get_Temp
      city: “Paris”
    TimeOut: “Exhibits”

Implementation (cont'd)
[Figure annotations on the XML document]
- Namespace defined for function (service) calls
- Data nodes title and date
- Three attributes of the function nodes provide the information necessary to call the SOAP service:
  1. URL of the server
  2. Method name
  3. Associated namespace

Implementation (cont'd)
- Function TimeOut:
  1. URL of the server
  2. Method name
  3. Associated namespace

Implementation (cont'd)
- Newspaper element with structure title.date.(Forecast|temp).(TimeOut|exhibit*)

Implementation (cont'd)
- The role of the Schema Enforcement Module:
  1. Verify whether the call parameters conform to the WSDL_int description of the service.
  2. If not, try to rewrite them into the required structure.
  3. If step 2 fails, report an error.
- Note: similarly, before an ActiveXML service returns its answer, the Schema Enforcement Module performs the same three steps on the returned data.
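
The three steps of the module amount to a small control flow, sketched below. The functions `conforms` and `try_rewrite` are hypothetical callables supplied by the caller (for instance an instance check and a safe-rewriting search); nothing here reflects the actual ActiveXML code.

```python
def enforce(data, conforms, try_rewrite):
    """Verify, else rewrite, else report an error (steps 1 to 3)."""
    if conforms(data):                       # step 1: the data already has the required type
        return data
    rewritten = try_rewrite(data)            # step 2: attempt a rewriting
    if rewritten is not None and conforms(rewritten):
        return rewritten
    raise ValueError("data cannot be rewritten into the required structure")  # step 3


# Toy example: the service expects a single temp parameter.
def conforms(word):
    return word == ["temp"]


def try_rewrite(word):
    # Hypothetical rewriting: materialise Get_Temp into temp, give up on anything else.
    if all(label in ("Get_Temp", "temp") for label in word):
        return ["temp" if label == "Get_Temp" else label for label in word]
    return None


print(enforce(["Get_Temp"], conforms, try_rewrite))  # ['temp']
```

The same function can be applied to the returned data before the answer is sent back, as the note on the slide describes.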

CONCLUSION and RELATED WORK
- XML documents with embedded calls to Web services are already present in several existing products (ActiveXML system).
- What's new? The proposed extension of XML Schema with function types is a first step towards a more precise description of XML documents embedding computation.
- Main problem: whether safe rewriting remains decidable when the k-depth restriction is removed.