Change-Centric Management of Versions in an XML Warehouse
Amélie Marian (Columbia University)
Serge Abiteboul, Grégory Cobéna, Laurent Mignet (INRIA-Rocquencourt)


2 Overview
VLDB, Sept 2001 (Amélie Marian)
The Xyleme Project
Change Management
Version Management
–XIDs
–XML Diff
–Deltas
–Storage of XML document versions
–Implementation and experiments

3 The Xyleme Project
A dynamic XML data warehouse with high-level services:
–User-friendly query engine
–Semantic data integration
–Version management
–Query subscription and change monitoring services
The Xyleme project is now finished.
A start-up is also called Xyleme.

4 Change Management
Version Management
Learning about Changes
Monitoring Changes: Query Subscription
Querying the Past: Temporal Queries

5 Version Management
Our requirements:
–Obtain the current version
–Get the modifications since time $t$
–Subscribe to change notifications, query changes
–Compute temporal queries
–Rebuild the version $V_i$ of a document at time $t_i$

6 Getting the Documents
XML documents are fetched from the web.
We only have snapshots of the documents.
[Figure: two snapshots of a Catalog tree; each Pr node has an N child and a P child (apparently product, name, and price).]
Version 1: Camera 300, TV 100, VCR 200
Version 2: TV 100, DVD 500, VCR 150

7 XIDs
Unique identifiers are needed to track XML nodes through time:
–Track changes on a specific node (e.g., a product in a catalog)
–Reconstruct the history of a node
But physically adding an ID attribute to each node is expensive storage-wise.
→ XIDs make it possible to attach persistent IDs to every node in a storage-efficient manner.

8 XIDs
XIDs are stored separately as a list (XID-map):
–List of the node IDs in a postorder traversal of the tree
–XIDnext: gives the next available XID
Compact representation
Document is not modified
[Figure: a tree whose nodes are labeled with their XIDs.]
XID-map: (1-3,14-15,7-13|16)
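The range notation of the XID-map can be expanded and re-compressed mechanically. The following is a minimal Python sketch of that idea, not the Xyleme implementation: the function names `parse_xid_map` and `format_xid_map` are hypothetical, and treating a lone number as a one-element range is an assumption.

```python
def parse_xid_map(text):
    """Expand '(1-3,14-15,7-13|16)' into (XIDs in postorder, next free XID)."""
    body = text.strip().strip("()")
    ranges, next_xid = body.split("|")
    xids = []
    for part in ranges.split(","):
        lo, _, hi = part.partition("-")
        hi = hi or lo  # assumption: a lone number stands for a single XID
        xids.extend(range(int(lo), int(hi) + 1))
    return xids, int(next_xid)


def format_xid_map(xids, next_xid):
    """Compress runs of consecutive XIDs back into the range notation."""
    runs = []
    for x in xids:
        if runs and runs[-1][1] + 1 == x:
            runs[-1][1] = x          # extend the current run
        else:
            runs.append([x, x])      # start a new run
    parts = [f"{a}-{b}" if a != b else str(a) for a, b in runs]
    return "(" + ",".join(parts) + "|" + str(next_xid) + ")"


xids, next_xid = parse_xid_map("(1-3,14-15,7-13|16)")
print(xids)      # [1, 2, 3, 14, 15, 7, 8, 9, 10, 11, 12, 13]
print(next_xid)  # 16
```

Because only run boundaries are stored, a document whose nodes keep consecutive XIDs costs a few numbers regardless of its size.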

9 XML Diff
We implemented an XML diff algorithm to compute changes between two versions of a document:
–Use of XML structure for matching
–Content matching
Linear in the size of the document
XML diff has two roles:
–Match nodes
–Build the delta
Ongoing work on improving the XML diff

10 Node Matching using a Diff Algorithm
[Figure: Version 1 and Version 2 of the Catalog tree, annotated with XIDs and with the Delete, Update, and Insert operations.]
Version 1: Camera 300, TV 100, VCR 200. XID-map: (1-16|17)
Version 2: TV 100, DVD 500, VCR 150. New XID-map: (6-10,17-21,11-16|22)
Diff(V1,V2):
–delete(5)
–update(13,150)
–insert(16,2,(17-21))
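The catalog example can be replayed on a toy tree to see how the new XID-map arises. This is only an illustrative Python sketch: the `Node` class and helper functions are hypothetical, the labels Product, Name, and Price are assumed expansions of Pr, N, and P, and text values are stored as the labels of leaf nodes.

```python
class Node:
    def __init__(self, label, children=(), xid=None):
        self.label, self.children, self.xid = label, list(children), xid


def postorder(node):
    for child in node.children:
        yield from postorder(child)
    yield node


def assign_xids(root, start):
    """Give every node without an XID the next free one, in postorder."""
    for node in postorder(root):
        if node.xid is None:
            node.xid = start
            start += 1
    return start  # the next free XID


def product(name, price):
    return Node("Product", [Node("Name", [Node(name)]),
                            Node("Price", [Node(price)])])


def find(root, xid):
    return next(n for n in postorder(root) if n.xid == xid)


def parent_of(root, xid):
    return next(n for n in postorder(root)
                if any(c.xid == xid for c in n.children))


# Version 1: XIDs 1..16 in postorder, so the XID-map is (1-16|17).
doc = Node("Catalog", [product("Camera", "300"),
                       product("TV", "100"),
                       product("VCR", "200")])
next_xid = assign_xids(doc, 1)

# delete(5): remove the subtree rooted at XID 5 (the Camera product).
parent = parent_of(doc, 5)
parent.children = [c for c in parent.children if c.xid != 5]

# update(13,150): XID 13 is the text node holding the VCR price.
find(doc, 13).label = "150"

# insert(16,2,(17-21)): new subtree becomes the 2nd child of node 16.
dvd = product("DVD", "500")
next_xid = assign_xids(dvd, next_xid)
find(doc, 16).children.insert(2 - 1, dvd)

xid_list = [n.xid for n in postorder(doc)]
print(xid_list, next_xid)
# [6, 7, 8, 9, 10, 17, 18, 19, 20, 21, 11, 12, 13, 14, 15, 16] 22
```

The printed list is the expanded form of (6-10,17-21,11-16|22): matched nodes keep their XIDs, deleted XIDs are never reused, and inserted nodes draw fresh ones from XIDnext.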

11 Edit-Scripts = SEQUENCE
Sequences of basic operations over XML trees:
–Delete(n)
–Update(n, v)
–Insert(m,k,T)
–Move(n,k,m)
An edit script can be applied to a document D if its operations are consistent with D.
An edit script applied to a document D will result in a unique document D'.
Several edit scripts applied to a document D can result in the same document D'.

12 Deltas ($\Delta$) = SET
We introduce an alternative way of representing changes: deltas.
–$\Delta_{i,j}$ (unit delta) contains the set of operations needed to go from $V_i$ to $V_j$, i.e. $\mathrm{Diff}(V_i, V_j)$
–A delta ($\Delta$) over a document D is the sequence of unit deltas over D: $\Delta = \{\Delta_{1,2}, \ldots, \Delta_{k-1,k}\}$
–There is an (almost) unique delta from $V_i$ to $V_j$
–We represent deltas as XML documents

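13 Shortcomings of Deltas
Storage policies:
a) $V_1, \Delta_{1,2}, \ldots, \Delta_{\mathrm{now}-1,\mathrm{now}}$
b) $\Delta_{2,1}, \ldots, \Delta_{\mathrm{now},\mathrm{now}-1}, V_{\mathrm{now}}$
c) $V_1, \Delta_{2,1}, \ldots, \Delta_{\mathrm{now},\mathrm{now}-1}$
d) $\Delta_{1,2}, \ldots, \Delta_{\mathrm{now}-1,\mathrm{now}}, V_{\mathrm{now}}$
Only a) and b) are lossless.
But we would like to have fast access to:
–$V_{\mathrm{now}}$
–$\Delta_{i,\mathrm{now}}$
Deltas are not reversible and cannot be composed (information on position is missing).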

14 Completed Deltas ($\Delta^{+}$)
Completed deltas contain more information:
–Delete(m,k,T)
–Update(n, ov, nv)
–Insert(m,k,T)
–Move(n,k,m,p,q)
Completed deltas can be reversed and composed.
Completed deltas are in the spirit of some logs in DB systems.

15 Example of XML $\Delta^{+}$
[XML listing; the markup was lost in extraction. Surviving text: … Camera 300 DVD 500]

16 Operations on Deltas
Compute with version:
–$V_i \circ \Delta^{+}_{i,j} = V_j$
–$V_i \circ \Delta_{i,j} = V_j$
Reverse: $(\Delta^{+}_{i,j})^{-1} = \Delta^{+}_{j,i}$
Compose: $\Delta^{+}_{i,j} ; \Delta^{+}_{j,k} = \Delta^{+}_{i,k}$
Simplify: $\Delta^{+}_{i,j} \rightarrow \Delta_{i,j}$
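Reversal works because each completed operation carries what its inverse needs. The Python sketch below illustrates this under stated assumptions: the class and field names are hypothetical, Insert and Delete are read as (parent, position, subtree), and the extra Move arguments p and q are assumed to be the old position and old parent, which the slides do not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    parent: int
    pos: int
    subtree: str


@dataclass(frozen=True)
class Delete:
    parent: int
    pos: int
    subtree: str


@dataclass(frozen=True)
class Update:
    node: int
    old: str
    new: str


@dataclass(frozen=True)
class Move:  # assumed reading of Move(n,k,m,p,q)
    node: int
    pos: int
    parent: int
    old_pos: int
    old_parent: int


def reverse_op(op):
    """Return the operation that undoes `op`."""
    if isinstance(op, Insert):
        return Delete(op.parent, op.pos, op.subtree)
    if isinstance(op, Delete):
        return Insert(op.parent, op.pos, op.subtree)
    if isinstance(op, Update):
        return Update(op.node, op.new, op.old)
    if isinstance(op, Move):
        return Move(op.node, op.old_pos, op.old_parent, op.pos, op.parent)
    raise TypeError(f"unknown operation: {op!r}")


def reverse_delta(delta):
    """A completed delta is a set, so its reverse is the set of reversed ops."""
    return frozenset(reverse_op(op) for op in delta)


# Completed delta for the catalog example (subtrees abbreviated as strings).
delta_1_2 = frozenset({
    Delete(16, 1, "Camera product, XIDs 1-5"),
    Update(13, "200", "150"),
    Insert(16, 2, "DVD product, XIDs 17-21"),
})
delta_2_1 = reverse_delta(delta_1_2)
```

A plain `Update(n, v)` or `Delete(n)` could not be reversed this way, since the old value and the deleted subtree with its position are not recorded.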

17 Storage of Versions
For a document D (or a query result Q), we store:
–Current version: $V_k$
–XID-map (as text) of $V_k$
–Current $\Delta^{+} = \{\Delta^{+}_{1,2}, \ldots, \Delta^{+}_{k-1,k}\}$
When a new version k+1 arrives:
–Compute the XML diff between k and k+1, compute $\Delta^{+}_{k,k+1}$
–Replace the current version: $V_{k+1}$
–Replace the XID-map
–Append $\Delta^{+}_{k,k+1}$ to $\Delta^{+}$
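The storage scheme (current version plus a growing list of completed deltas) can be sketched with a deliberately simplified model. In this Python toy, which is an assumption-laden illustration and not the Xyleme Version Manager, a document is just a dict from XID to text value, node matching is taken as already done, and a completed delta records an (old, new) pair per changed XID, with `None` meaning "absent". The class name `VersionStore` is hypothetical.

```python
class VersionStore:
    def __init__(self, first_version):
        self.current = dict(first_version)  # V_k, the only full copy kept
        self.deltas = []                    # deltas[i] leads from V_{i+1} to V_{i+2}

    def add_version(self, new_version):
        """Diff against the current version, append the delta, replace V_k."""
        keys = self.current.keys() | new_version.keys()
        delta = {k: (self.current.get(k), new_version.get(k))
                 for k in keys
                 if self.current.get(k) != new_version.get(k)}
        self.deltas.append(delta)
        self.current = dict(new_version)

    def version(self, i):
        """Rebuild V_i (1-based) by undoing deltas, newest first."""
        doc = dict(self.current)
        for delta in reversed(self.deltas[i - 1:]):
            for xid, (old, _new) in delta.items():
                if old is None:
                    doc.pop(xid, None)   # the node was inserted: remove it
                else:
                    doc[xid] = old       # updated or deleted: restore old value
        return doc


# Text nodes of the catalog example, keyed by XID.
v1 = {1: "Camera", 3: "300", 6: "TV", 8: "100", 11: "VCR", 13: "200"}
v2 = {6: "TV", 8: "100", 17: "DVD", 19: "500", 11: "VCR", 13: "150"}

store = VersionStore(v1)
store.add_version(v2)
print(store.version(1) == v1)  # True
```

Access to the current version is immediate, while an older version costs one reversed delta per step back, which is the trade-off behind keeping $V_{\mathrm{now}}$ rather than $V_1$.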

18 Levels of Versioning
Full versioning is expensive, so we support different levels of versioning:
–Full Versioning: $V_{\mathrm{now}} + \Delta^{+}$
–Partial Versioning: $V_{\mathrm{now}} + \Delta$
–Last Version Update: $V_{\mathrm{now}} + \Delta_{\mathrm{now}-1,\mathrm{now}}$
–Change Support: $V_{\mathrm{now}}$ + XML diff computed for Query Subscription
–Not Versioned: $V_{\mathrm{now}}$

19 Implementation
Version Manager and XML diff implemented in C++
A change simulator was implemented for tests
A GUI was implemented

20 GUI Interface

21 Deltas Statistics
Reasonable when there are not many modifications
Relatively expensive for small documents
Depends on the quality of the diff

22 Deltas Statistics (2)
30% of modifications on the document
From left to right:
–Snapshots
–Completed Deltas
–Deltas: composition and previous version reconstruction are not possible
–Composed Completed Deltas: advantages of Completed Deltas but coarser granularity and higher cost

23 Conclusion
Management of versions based on change representation:
–Representation in tree data (XML)
–Study of storage policies
–Implementation of running prototypes
Completed Deltas: a set of modifications
–Mathematical properties on completed deltas (algebraic group)
Current work on Query Subscription, Continuous Queries and Changes over Collections of Documents

24 References
Version Management
–Chien, Tsotras and Zaniolo. Efficient Management of Multiversion Documents by Object Referencing. VLDB 2001.
–Chawathe, Abiteboul and Widom. Managing Historical Semistructured Data. TAPOS 1999.
–Cellary and Jomier. Consistency of Versions in Object-Oriented Databases. VLDB 1990.
–Adiba and Lindsay. Database Snapshots. VLDB 1980.
Diff Algorithms
–Chawathe and Garcia-Molina. Meaningful Change Detection in Structured Data. SIGMOD 1997.
–Cobena, Abiteboul and Marian. Detecting Changes in XML Documents. Technical report, INRIA.
Xyleme
–Cluet, Veltri and Vodislav. Views in a Large Scale XML Repository. VLDB 2001.
–Nguyen, Abiteboul, Cobena and Preda. Monitoring XML data on the Web. SIGMOD 2001.

25 Example: Edit-Scripts vs. Deltas
Version 0: P with child A
Version 1: P with children C, B, A
A possible edit-script: Insert(B,1,P), Insert(C,1,P)
The delta: Insert(B,2,P), Insert(C,1,P)
Edit-scripts: relative position (at time of operation)
Deltas: absolute position (final)
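The difference between relative and absolute positions can be checked on this example by modelling the children of P as a list. This is a small Python sketch with hypothetical function names; it handles insertions only, and applying a delta in increasing order of final position is one possible strategy, assumed here for illustration.

```python
def apply_edit_script(children, script):
    """Apply insertions in sequence; each position refers to the list
    as it stands when that operation runs."""
    result = list(children)
    for label, pos in script:
        result.insert(pos - 1, label)
    return result


def apply_delta(children, delta):
    """Apply a set of insertions whose positions are those of the final
    version; processing them in increasing position makes the order in
    which the set is written irrelevant."""
    result = list(children)
    for label, pos in sorted(delta, key=lambda op: op[1]):
        result.insert(pos - 1, label)
    return result


v0 = ["A"]

# Edit-script Insert(B,1,P), Insert(C,1,P): order matters.
print(apply_edit_script(v0, [("B", 1), ("C", 1)]))  # ['C', 'B', 'A']
print(apply_edit_script(v0, [("C", 1), ("B", 1)]))  # ['B', 'C', 'A']

# Delta {Insert(B,2,P), Insert(C,1,P)}: a set, so order does not matter.
print(apply_delta(v0, {("B", 2), ("C", 1)}))        # ['C', 'B', 'A']
```

Swapping the two operations changes the result of the edit-script but not of the delta, which is why the delta can be treated as a set.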

26 Example: Missing Information for Delta Composition ($\Delta_{0,2}$)
Deltas do not give information on parents and positions of deleted elements.
→ Positions of inserted elements in the composition cannot be computed.
Version 0: P with children C, A
Version 1: P with children C, B, A
Version 2: P with children B, D, A
$\Delta_{0,1}$: Insert(B,2,P)
$\Delta_{1,2}$: Delete(C), Insert(D,2,P)
$\Delta^{+}_{1,2}$: Delete(C,1,P), Insert(D,2,P)

