University of Crete Department of Computer Science ΗΥ-561 Web Data Management XML Data Archiving Konstantinos Kouratoras.

Presentation transcript:

Slide 1: What is the problem?
- Most research focuses on database content
- Updates usually overwrite the existing state
- Need for research on database history
  - Lost scientific evidence
  - No way to verify the basis of findings

Slide 2: Why is this interesting?
- History of the data
- Scientific research
  - SWISS-PROT (protein sequences)
  - OMIM (human genes and genetic disorders)
- Great deal of manual labour
- Continuous changes
- Access to old versions

Slide 3: First Approach
- Object matching across versions
- Change descriptions
- Archive space
- Efficient history queries

Slide 4: Proposed technique (1/2)
Based on:
- Hierarchical data
- Key-structured databases
- Accretive databases

Slide 5: Proposed technique (2/2)
- Merging versions into one hierarchy
  - Elements stored once
- Timestamps
  - Sequence of versions
  - Time intervals
  - Inheritance
- Keys for element identification
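
The timestamp bullets above can be sketched in Python. This is a minimal illustration, assuming a timestamp is the set of version numbers in which an element exists, that it is stored compactly as inclusive intervals, and that an element with no timestamp of its own inherits its parent's. The names `to_intervals` and `effective_timestamp` are hypothetical and do not come from the slides.

```python
from typing import List, Optional, Set, Tuple

Interval = Tuple[int, int]  # inclusive [start, end] range of version numbers


def to_intervals(versions: Set[int]) -> List[Interval]:
    """Compress a set of version numbers into sorted inclusive intervals."""
    intervals: List[Interval] = []
    for v in sorted(versions):
        if intervals and v == intervals[-1][1] + 1:
            intervals[-1] = (intervals[-1][0], v)  # extend the current run
        else:
            intervals.append((v, v))               # start a new run
    return intervals


def effective_timestamp(own: Optional[Set[int]], parent: Set[int]) -> Set[int]:
    """Inheritance: a node without its own timestamp uses its parent's."""
    return parent if own is None else own
```

Under these assumptions an element present in versions 1, 2, 3, 5, 7 and 8 needs only three intervals, and an element that has existed exactly as long as its parent needs no timestamp at all.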

Slide 6: Example

Slide 7: XML Model (1/3)
- Node values
  - T-node: data value
  - A-node: attribute name, attribute value
  - E-node (internal node): tag name
    - List of values of E and T children
    - Set of values of A children
- Node value equality
  - Nodes agree on their value
- Path expression
  - Sequence of node names
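
Value equality as outlined above can be sketched as follows. The classes `TNode` and `ENode` and the functions `value` and `value_equal` are hypothetical names for this illustration; attributes (A-nodes) are modelled as a dictionary on the element, which is an assumption made for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class TNode:
    text: str  # data value


@dataclass
class ENode:
    tag: str
    attrs: Dict[str, str] = field(default_factory=dict)  # A-node children
    children: List[Union["ENode", TNode]] = field(default_factory=list)


def value(node):
    """Return a hashable value: child order matters, attribute order does not."""
    if isinstance(node, TNode):
        return ("T", node.text)
    child_values = tuple(value(c) for c in node.children)  # list of E/T values
    attr_values = frozenset(node.attrs.items())            # set of A values
    return ("E", node.tag, child_values, attr_values)


def value_equal(a, b) -> bool:
    """Two nodes are value-equal when they agree on their value."""
    return value(a) == value(b)
```

Two elements that differ only in the order of their attributes compare equal here, while swapping two child elements makes them unequal, matching the list versus set distinction on the slide.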

Slide 8: XML Model (2/3)
- Key
  - Pair (Q, {P_1, ..., P_k}) of path expressions
    - Q: target set of nodes
    - {P_1, ..., P_k}: key constraints on Q
- Relative key
  - Description dependent on ancestor node key
  - Weak entities

Slide 9: XML Model (3/3)
- Keys for previous example
  - (/,(db,{}))
    - At most one db element at the root
  - (/db,(address,{}))
    - At most one address under the db node
  - (/db,(emp,{id}))
    - Every employee within a db element can be uniquely identified by its id subelement
  - (/db/emp,(name,{})), (/db/emp,(sal,{})), (/db/emp,(tel,{}))
    - There can be at most one name, sal and tel node for each employee
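
A key of the form (context, (target, {key paths})) can be checked with a short Python sketch. It is a simplification under stated assumptions: the function name `satisfies_key` is hypothetical, key paths are compared by text content only rather than by full value equality, and the sample document values are invented for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List


def satisfies_key(context_nodes: Iterable[ET.Element],
                  target: str, key_paths: List[str]) -> bool:
    """Within each context node, no two target nodes may share key values."""
    for ctx in context_nodes:
        seen = set()
        for node in ctx.findall(target):
            # With an empty key path set every target has the key value (),
            # so the check means "at most one target per context node".
            key_value = tuple(node.findtext(p) for p in key_paths)
            if key_value in seen:
                return False
            seen.add(key_value)
    return True


# Invented sample data using the element names from the slide.
DOC = ET.fromstring(
    "<db><address>Main St</address>"
    "<emp><id>1</id><name>Ann</name></emp>"
    "<emp><id>2</id><name>Bob</name></emp></db>"
)
```

For example, `satisfies_key([DOC], "emp", ["id"])` checks the key (/db,(emp,{id})), and `satisfies_key(DOC.findall("emp"), "name", [])` checks (/db/emp,(name,{})).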

Slide 10: Components (1/4)
- Archiver components overview
[Diagram labels: Archive, Keys, New version, Archiver (Annotate Keys, Nested Merge), Timestamps, New Archive]

Slide 11: Components (2/4)
- Annotate keys
  - Annotation of elements with key values
  - Uniquely identified nodes
    - Path from root to node
    - Key annotation
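
The annotation step can be sketched as building, for every keyed element, an identifier made of the path from the root plus the key values along that path. The `KEYS` table, the label syntax `emp{id=1}`, and the function names are all assumptions for illustration. Keys are looked up by tag name only here, whereas the keys on the earlier slide are defined per path.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical key specification: tag name -> key paths (empty = at most one).
KEYS: Dict[str, List[str]] = {
    "db": [], "address": [], "emp": ["id"], "name": [], "sal": [], "tel": [],
}


def key_label(elem: ET.Element) -> str:
    """Label an element with its tag and key values, e.g. 'emp{id=1}'."""
    parts = ",".join(f"{p}={elem.findtext(p)}" for p in KEYS.get(elem.tag, []))
    return f"{elem.tag}{{{parts}}}"


def annotate(elem: ET.Element, prefix: str = "") -> Dict[str, ET.Element]:
    """Map each keyed element to a unique identifier built from root to node."""
    ident = f"{prefix}/{key_label(elem)}"
    result = {ident: elem}
    for child in elem:
        if child.tag in KEYS:  # elements without a key are not annotated
            result.update(annotate(child, ident))
    return result
```

With these identifiers, an element in a new version and an element in the archive correspond exactly when their identifiers are equal.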

Slide 12: Components (3/4)
- Nested merge
  - Identify corresponding elements
  - Merge elements
  - Update sets of timestamps
  - Nodes with no corresponding element
    - Simply added
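
The merge steps above can be sketched as a recursive function. This is an illustration under assumptions, not the authors' implementation: a version is modelled as a nested dictionary from key label to either text or nested elements, and `ArchiveNode` and `nested_merge` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Union

# A version: key label -> text value, or nested keyed elements.
Version = Dict[str, Union[str, "Version"]]


@dataclass
class ArchiveNode:
    timestamps: Set[int] = field(default_factory=set)
    children: Dict[str, "ArchiveNode"] = field(default_factory=dict)
    values: Dict[str, Set[int]] = field(default_factory=dict)  # text -> versions


def nested_merge(archive: ArchiveNode, version: Version, number: int) -> None:
    """Merge version `number` into the archive, in place."""
    archive.timestamps.add(number)
    for label, content in version.items():
        # Corresponding elements share a key label; unmatched ones are added.
        child = archive.children.setdefault(label, ArchiveNode())
        if isinstance(content, dict):
            nested_merge(child, content, number)
        else:
            child.timestamps.add(number)
            child.values.setdefault(content, set()).add(number)
```

An archived element that is absent from the new version is simply not visited, so its timestamp does not gain the new version number.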

Slide 13: Components (4/4)

Slide 14: Experimental Results (1/2)
- Competing techniques
  - Incremental diff
  - Cumulative diff
- Compression methods
  - Gzip (text)
  - XMill (XML)
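
The two diff baselines can be sketched with Python's standard library. The definitions are an assumption based on common usage, since the slide only names the techniques: incremental diff keeps the first version plus the difference between each pair of consecutive versions, and cumulative diff keeps the first version plus the difference between the first version and every later one. Function names are hypothetical, and `difflib` stands in for whatever diff tool the experiments used.

```python
import difflib
import gzip
from typing import List


def diff(a: str, b: str) -> str:
    """Line-based unified diff from text a to text b (empty if identical)."""
    return "".join(difflib.unified_diff(a.splitlines(keepends=True),
                                        b.splitlines(keepends=True)))


def incremental_diff(versions: List[str]) -> List[str]:
    """First version, then the diff between each consecutive pair."""
    return [versions[0]] + [diff(versions[i - 1], versions[i])
                            for i in range(1, len(versions))]


def cumulative_diff(versions: List[str]) -> List[str]:
    """First version, then the diff from the first version to each later one."""
    return [versions[0]] + [diff(versions[0], v) for v in versions[1:]]


def stored_size(parts: List[str], compress: bool = False) -> int:
    """Bytes needed to store the parts, optionally gzip-compressed."""
    data = "".join(parts).encode("utf-8")
    return len(gzip.compress(data)) if compress else len(data)
```

Calling `stored_size` on the output of each function gives a rough way to compare repository sizes with and without text compression.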

Slide 15: Experimental Results (2/2)

Slide 16: Efficient Retrievals (1/2)
- Version retrieval
  - Binary tree for each node x with children as leaves
    - Timestamp
    - Archive offset
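
One way to read the binary tree above is as a timestamp tree: the leaves are the children of a node x, and each internal tree node holds the union of the timestamps below it, so a search for one version can skip whole subtrees. The sketch below follows that reading, which is an assumption. `TsTree`, `build` and `offsets_for` are hypothetical names, and the offsets are made-up byte positions.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Sequence, Tuple


@dataclass
class TsTree:
    timestamps: FrozenSet[int]
    left: Optional["TsTree"] = None
    right: Optional["TsTree"] = None
    offset: Optional[int] = None  # set on leaves only: archive offset of a child


def build(children: Sequence[Tuple[int, FrozenSet[int]]]) -> TsTree:
    """Build a tree over a non-empty list of (archive offset, timestamp) pairs."""
    if len(children) == 1:
        offset, ts = children[0]
        return TsTree(ts, offset=offset)
    mid = len(children) // 2
    left, right = build(children[:mid]), build(children[mid:])
    return TsTree(left.timestamps | right.timestamps, left, right)


def offsets_for(tree: TsTree, version: int) -> List[int]:
    """Archive offsets of the children that exist in the given version."""
    if version not in tree.timestamps:
        return []  # prune: nothing below this point belongs to the version
    if tree.offset is not None:
        return [tree.offset]
    return offsets_for(tree.left, version) + offsets_for(tree.right, version)
```

Only the children returned by `offsets_for` need to be read from the archive when reconstructing a version.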

Slide 17: Efficient Retrievals (2/2)
- Temporal history retrieval
  - Find keyed node x
  - Set of keyed children
  - Archive offset, timestamp offset
  - Sort list
  - Repeat for each keyed node
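
The steps above suggest an index that stores, for each keyed node, a sorted list of its keyed children, so that a key path can be followed by binary search. The sketch below is one possible reading and rests on several assumptions: the index layout, the names `find_child` and `history`, and the choice to store the timestamp text directly instead of a timestamp offset are all made up for illustration.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple

Entry = Tuple[str, int, str]    # (child key label, archive offset, timestamp)
Index = Dict[int, List[Entry]]  # offset of a keyed node -> children sorted by label


def find_child(index: Index, parent_offset: int, label: str) -> Optional[Entry]:
    """Binary search the sorted child list of one keyed node."""
    entries = index.get(parent_offset, [])
    i = bisect_left(entries, (label,))  # (label,) sorts just before (label, ...)
    if i < len(entries) and entries[i][0] == label:
        return entries[i]
    return None


def history(index: Index, root_offset: int, path: List[str]) -> Optional[str]:
    """Follow a key path from the root and return the last node's timestamp."""
    offset, timestamp = root_offset, None
    for label in path:
        entry = find_child(index, offset, label)
        if entry is None:
            return None  # the element never existed in any version
        _, offset, timestamp = entry
    return timestamp
```

Each step of the key path costs one binary search, so the lookup does not scan the archive itself.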

Slide 18: Conclusion
- Efficient archiving technique
- Meaningful change descriptions
- Space overhead comparable to diff approach
  - OMIM archive for a year
    - Less than 1.12 times the space of last version
    - Less than 1.08 times the size of incremental diff
    - 40% compression with XML compression tool
- Works well with XML compression
- Basic operations with single pass
- XML output (further use)

Slide 19: Xarch (1/2)
- Archiving tool
- Extends archiving technique
  - Sort elements by key
    - External merge sort
  - Query language
    - Versions retrieval
    - History tracking
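
External merge sort, named above as the way to sort elements by key, can be sketched generically. This is the textbook technique and not code from Xarch: records are sorted in small runs, each run is written to a temporary file, and the runs are merged lazily. The function names and the tiny default `run_size` are assumptions, and records are assumed to be single-line strings.

```python
import heapq
import tempfile
from typing import IO, Iterable, Iterator, List


def _write_run(records: List[str]) -> IO[str]:
    """Sort one run in memory and spill it to a temporary file."""
    run = tempfile.TemporaryFile(mode="w+", encoding="utf-8")
    run.writelines(r + "\n" for r in sorted(records))
    run.seek(0)
    return run


def external_sort(records: Iterable[str], run_size: int = 2) -> Iterator[str]:
    """Yield records in sorted order while holding one run in memory at a time."""
    runs: List[IO[str]] = []
    buffer: List[str] = []
    for record in records:
        buffer.append(record)
        if len(buffer) == run_size:
            runs.append(_write_run(buffer))
            buffer = []
    if buffer:
        runs.append(_write_run(buffer))
    # Merge the sorted runs; compare lines without their trailing newline.
    for line in heapq.merge(*runs, key=lambda s: s.rstrip("\n")):
        yield line.rstrip("\n")
    for run in runs:
        run.close()
```

In practice `run_size` would be chosen to fit available memory; the value 2 only keeps the example small.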

Slide 20: Xarch (2/2)
- Query language example