University of Crete Department of Computer Science ΗΥ-561 Web Data Management XML Data Archiving Konstantinos Kouratoras.


1 University of Crete Department of Computer Science ΗΥ-561 Web Data Management XML Data Archiving Konstantinos Kouratoras

2 What is the problem?
ΗΥ-561 XML Data Archiving – Konstantinos Kouratoras
- Most research on database content
- Usually overwrite existing state
- Need for research on database history
- Lost scientific evidence
- No verification of findings basis

3 Why is this interesting?
- History of the data
- Scientific research
- SWISS-PROT (protein sequence)
- OMIM (human genes and genetic disorders)
- Great deal of manual labour
- Continuous changes
- Access to old versions

4 First Approach
- Object matching across versions
- Change descriptions
- Archive space
- Efficient queries over history

5 Proposed technique (1/2)
Based on:
- Hierarchical data
- Key structured databases
- Accretive databases

6 Proposed technique (2/2)
- Merging versions into one hierarchy
- Elements stored once
- Timestamps
- Sequence of versions
- Time intervals
- Inheritance
- Keys for element identification
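The timestamp ideas on this slide can be illustrated with a minimal sketch. It assumes that a timestamp is a set of version numbers, that it can be written compactly as inclusive intervals, and that a node without its own timestamp inherits the one of its nearest ancestor. The names `ArchiveNode`, `to_intervals` and `effective_timestamp` are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass
from typing import Optional


def to_intervals(versions):
    """Compress a set of version numbers into sorted inclusive intervals."""
    intervals = []
    for v in sorted(versions):
        if intervals and v == intervals[-1][1] + 1:
            # v directly continues the last interval, so extend it
            intervals[-1] = (intervals[-1][0], v)
        else:
            intervals.append((v, v))
    return intervals


@dataclass
class ArchiveNode:
    tag: str
    timestamp: Optional[set] = None          # None means "inherit from parent"
    parent: Optional["ArchiveNode"] = None

    def effective_timestamp(self):
        """Return the node's own timestamp, or the nearest ancestor's."""
        node = self
        while node is not None:
            if node.timestamp is not None:
                return node.timestamp
            node = node.parent
        raise ValueError("the root must carry an explicit timestamp")


# Hypothetical archive fragment: emp exists in every version, sal changed in version 3.
root = ArchiveNode("db", {1, 2, 3, 4})
emp = ArchiveNode("emp", None, root)         # inherits {1, 2, 3, 4}
sal = ArchiveNode("sal", {1, 2, 4}, emp)     # stored explicitly

print(to_intervals(emp.effective_timestamp()))   # [(1, 4)]
print(to_intervals(sal.effective_timestamp()))   # [(1, 2), (4, 4)]
```

Inheritance keeps the archive small because only nodes whose lifetime differs from their parent's need a stored timestamp.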

7 Example

8 XML Model (1/3)
- Node values
- T-node: data value
- A-node: attribute name, attribute value
- E-node (internal node): tag name
  - List of values of E and T children
  - Set of values of A children
- Node value equality
- Agree on their value
- Path expression
- Sequence of node names
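Value equality as defined on this slide can be sketched with Python's standard `xml.etree.ElementTree`. The sketch assumes that whitespace-only text is ignored, which the slides do not state. The functions `node_value` and `value_equal` are hypothetical names.

```python
import xml.etree.ElementTree as ET


def node_value(elem):
    """Value of an E-node: tag name, set of A children, list of E and T children."""
    attrs = frozenset(elem.attrib.items())       # A children: order is irrelevant
    children = []                                # E and T children: order matters
    if elem.text and elem.text.strip():
        children.append(elem.text.strip())       # T-node: its data value
    for child in elem:
        children.append(node_value(child))
        if child.tail and child.tail.strip():
            children.append(child.tail.strip())
    return (elem.tag, attrs, tuple(children))


def value_equal(a, b):
    """Two nodes are value-equal when they agree on their value."""
    return node_value(a) == node_value(b)


a = ET.fromstring('<emp id="1" dept="x"><name>John</name><sal>90K</sal></emp>')
b = ET.fromstring('<emp dept="x" id="1"><name>John</name><sal>90K</sal></emp>')
c = ET.fromstring('<emp id="1" dept="x"><sal>90K</sal><name>John</name></emp>')

print(value_equal(a, b))   # True: attribute order does not matter
print(value_equal(a, c))   # False: child order matters
```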

9 XML Model (2/3)
- Key
- Pair of path expressions (Q, {P1, …, Pk})
  - Q: target set of nodes
  - {P1, …, Pk}: key constraints on Q
- Relative key
- Description dependent on ancestor node key
- Weak entities

10 XML Model (3/3)
- Keys for previous example
- (/, (db, {}))
  - At most one db element at the root
- (/db, (address, {}))
  - At most one address under the db node
- (/db, (emp, {id}))
  - Every employee within a db element can be uniquely identified by its id subelement
- (/db/emp, (name, {})), (/db/emp, (sal, {})), (/db/emp, (tel, {}))
  - There can be at most one name, sal and tel node for each employee
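The keys listed on this slide can be checked against a document with a short sketch. It is a simplification: key paths are read with `findtext`, a missing key subelement is not reported, and the root key (/, (db, {})) is omitted because a well-formed XML document has exactly one root. The sample documents and the names `KEYS`, `context_nodes` and `violations` are hypothetical.

```python
import xml.etree.ElementTree as ET

# (context path, target tag, key paths), transcribed from the slide
KEYS = [
    ("/db", "address", ()),
    ("/db", "emp", ("id",)),
    ("/db/emp", "name", ()),
    ("/db/emp", "sal", ()),
    ("/db/emp", "tel", ()),
]


def context_nodes(root, path):
    """Return the nodes reached by an absolute path of tag names."""
    steps = [s for s in path.split("/") if s]
    if not steps or steps[0] != root.tag:
        return []
    return [root] if len(steps) == 1 else root.findall("/".join(steps[1:]))


def violations(root, keys):
    """List (context, target, key value) for every duplicated key value."""
    found = []
    for context, target, key_paths in keys:
        for ctx in context_nodes(root, context):
            seen = set()
            for node in ctx.findall(target):
                key_value = tuple(node.findtext(p) for p in key_paths)
                if key_value in seen:
                    found.append((context, target, key_value))
                seen.add(key_value)
    return found


good = ET.fromstring(
    "<db><address>x</address>"
    "<emp><id>1</id><name>John</name><sal>90K</sal></emp>"
    "<emp><id>2</id><name>Jane</name></emp></db>"
)
bad = ET.fromstring(
    "<db><emp><id>1</id><tel>1</tel><tel>2</tel></emp><emp><id>1</id></emp></db>"
)

print(violations(good, KEYS))   # []
print(violations(bad, KEYS))    # [('/db', 'emp', ('1',)), ('/db/emp', 'tel', ())]
```

An empty key path set means the target may occur at most once under its context node, which is why a second `tel` under the same `emp` is reported.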

11 Components (1/4)
- Archiver components overview
[Diagram labels: Archive, Keys, New version, Archiver (Annotate Keys, Nested Merge), Timestamps, New Archive]

12 Components (2/4)
- Annotate keys
- Annotation of elements with key values
- Uniquely identified nodes
  - Path from root to node
  - Key annotation
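Key annotation can be sketched as computing, for every element, the path from the root in which each step carries its key value. The label syntax `emp{id=1}`, the `KEY_PATHS` table and the function `annotate` are assumptions made for this illustration, not the notation used by the archiver.

```python
import xml.etree.ElementTree as ET

# Hypothetical key specification: tag -> key paths; unlisted tags have the empty key.
KEY_PATHS = {"emp": ("id",)}


def annotate(elem, prefix=""):
    """Return the key-annotated root-to-node path of elem and of all descendants."""
    key = [(p, elem.findtext(p, default="")) for p in KEY_PATHS.get(elem.tag, ())]
    label = elem.tag + ("{" + ",".join(f"{p}={v}" for p, v in key) + "}" if key else "")
    path = prefix + "/" + label
    result = [path]
    for child in elem:
        result.extend(annotate(child, path))
    return result


doc = ET.fromstring(
    "<db><emp><id>1</id><name>John</name></emp><emp><id>2</id></emp></db>"
)
for path in annotate(doc):
    print(path)
# /db
# /db/emp{id=1}
# /db/emp{id=1}/id
# /db/emp{id=1}/name
# /db/emp{id=2}
# /db/emp{id=2}/id
```

Once every node has such a path, the same element can be recognised in two versions without comparing positions or contents.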

13 Components (3/4)
- Nested merge
- Identify corresponding elements
- Merge elements
- Update sets of timestamps
- Nodes with no corresponding element
  - Simply added
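The merge steps on this slide can be sketched as a recursive function over key-annotated trees. The sketch assumes that every node already carries a label (tag plus key value), that text content is modelled as a child labelled `("#text", value)`, and that timestamps are stored explicitly on every node rather than inherited. `Node`, `tree` and `nested_merge` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: tuple                                    # (tag, key value)
    times: set = field(default_factory=set)         # versions containing this node
    children: dict = field(default_factory=dict)    # label -> Node


def tree(tag, key=(), *children):
    """Build a keyed tree for one version (timestamps left empty)."""
    node = Node((tag, key))
    for child in children:
        node.children[child.label] = child
    return node


def nested_merge(archive, version, i):
    """Merge version number i into the archive in place."""
    archive.times.add(i)                            # update the set of timestamps
    for label, v_child in version.children.items():
        a_child = archive.children.get(label)       # identify the corresponding element
        if a_child is None:
            a_child = Node(label)                   # no corresponding element: simply add
            archive.children[label] = a_child
        nested_merge(a_child, v_child, i)


v1 = tree("db", (),
          tree("emp", ("1",), tree("sal", (), tree("#text", ("90K",)))),
          tree("emp", ("2",)))
v2 = tree("db", (),
          tree("emp", ("1",), tree("sal", (), tree("#text", ("95K",)))))

archive = Node(("db", ()))
nested_merge(archive, v1, 1)
nested_merge(archive, v2, 2)

emp1 = archive.children[("emp", ("1",))]
sal = emp1.children[("sal", ())]
print(sorted(emp1.times))                                   # [1, 2]
print(sorted(archive.children[("emp", ("2",))].times))      # [1]
print(sorted(sal.children[("#text", ("90K",))].times))      # [1]
print(sorted(sal.children[("#text", ("95K",))].times))      # [2]
```

Each element is stored once, and a changed salary shows up as two text children with disjoint timestamps under the same `sal` node.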

14 Components (4/4)

15 Experimental Results (1/2)
- Competing techniques
- Incremental diff
- Cumulative diff
- Compression methods
- Gzip (text)
- XMill (XML)

16 Experimental Results (2/2)

17 Efficient Retrievals (1/2)
- Version retrieval
- Binary tree for each node x with children as leaves
- Timestamp
- Archive offset
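One way to read this slide is as a per-node index: a binary tree whose leaves are the children of x, each leaf holding a timestamp and an archive offset. The sketch below assumes that every internal node stores the union of the timestamps beneath it, so that a lookup for version i skips whole subtrees that do not contain i. The class `TNode`, the functions `build` and `offsets_in_version`, and the offsets are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TNode:
    times: frozenset                  # versions covered by this subtree
    offset: Optional[int] = None      # archive offset, set on leaves only
    left: Optional["TNode"] = None
    right: Optional["TNode"] = None


def build(leaves):
    """Build the tree from a non-empty list of (timestamp set, archive offset)."""
    level = [TNode(frozenset(times), offset) for times, offset in leaves]
    while len(level) > 1:
        nxt = []
        for k in range(0, len(level) - 1, 2):
            l, r = level[k], level[k + 1]
            nxt.append(TNode(l.times | r.times, None, l, r))
        if len(level) % 2:
            nxt.append(level[-1])     # odd node is carried up unchanged
        level = nxt
    return level[0]


def offsets_in_version(node, i):
    """Archive offsets of the children that exist in version i."""
    if i not in node.times:
        return []                     # prune: nothing below belongs to version i
    if node.left is None:
        return [node.offset]
    return offsets_in_version(node.left, i) + offsets_in_version(node.right, i)


index = build([({1, 2, 3}, 0), ({1}, 120), ({2, 3}, 250), ({3}, 400)])
print(offsets_in_version(index, 1))   # [0, 120]
print(offsets_in_version(index, 3))   # [0, 250, 400]
print(offsets_in_version(index, 4))   # []
```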

18 Efficient Retrievals (2/2)
- Temporal history retrieval
- Find keyed node x
- Set of keyed children
- Archive offset, timestamp offset
- Sort list
- Repeat for each keyed node
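The lookup described here can be sketched as a sorted list of keyed children per node, searched by binary search at each step of a key path. The sketch keeps child entries in memory, whereas the slide mentions archive and timestamp offsets into the stored archive. `Entry` and `history` are hypothetical names, and labels such as `emp[id=1]` are an assumed notation.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Entry:
    times: list                                     # versions in which the node exists
    labels: list = field(default_factory=list)      # sorted labels of keyed children
    children: list = field(default_factory=list)    # entries, same order as labels


def history(root, path):
    """Follow a key path from the root and return the node's versions, or None."""
    node = root
    for label in path:
        pos = bisect_left(node.labels, label)       # binary search in the sorted list
        if pos == len(node.labels) or node.labels[pos] != label:
            return None                             # no such keyed node in the archive
        node = node.children[pos]
    return sorted(node.times)


root = Entry(
    [1, 2, 3],
    ["address", "emp[id=1]", "emp[id=2]"],
    [
        Entry([1, 2, 3]),
        Entry([1, 2, 3], ["sal"], [Entry([1, 2, 3])]),
        Entry([2, 3]),
    ],
)

print(history(root, ["emp[id=2]"]))           # [2, 3]
print(history(root, ["emp[id=1]", "sal"]))    # [1, 2, 3]
print(history(root, ["emp[id=9]"]))           # None
```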

19 Conclusion
- Efficient archiving technique
- Meaningful change descriptions
- Space overhead comparable to diff approach
- OMIM archive for a year
  - Less than 1.12 times the space of last version
  - Less than 1.08 times the size of incremental-diff
  - 40% compression with XML compression tool
- Works well with XML compression
- Basic operations with single pass
- XML output (further use)

20 Xarch (1/2)
- Archiving tool
- Extends archiving technique
- Sort elements by key
  - External merge sort
- Query language
  - Version retrieval
  - History tracking

21 Xarch (2/2)
- Query language example

