1st of June, 2011. Carlos Aldeias, Gabriel David, Cristina Ribeiro.



DWXML – A Preservation Format for Data Warehouses
Carlos Aldeias, 1st of June, 2011

Slide 2/46: outline
  Introduction
  Motivation
  Data Warehouse Preservation
  DWXML Definition
  DBPreserve Suite Application
  Conclusions

Slide 3/46
Companies, institutions and governments rely increasingly on On-Line Analytical Processing (OLAP)
  Major benefits for analysis and decision support
  Selective extraction and analysis of data from different perspectives
Most systems are structured using Data Warehouses
OLAP types:
  ROLAP – Relational OLAP
  MOLAP – Multidimensional OLAP
  HOLAP – Hybrid OLAP

Slide 4/46
Data Warehouse as a digital object
  Unlike conventional digital objects, data warehouses are complex digital objects
  They are based on a dimensional model: star schema, facts, dimensions with levels and hierarchies, bridges and datamarts
  They are often implemented on relational databases (ROLAP), keeping data in tables, views and schemas
Data vs. metadata
  The primary data stored in tables must be archived together with the metadata, at both the relational and dimensional levels
Technologies are evolving continually
  Data warehouses created with today's technologies may not be accessible with upcoming versions

Slide 5/46
InterPARES


Slide 7/46
[Diagram of dimensional model concepts: Datamart, Star schema, Snowflake schema, Fact table, Fact, Measure, Bridge table, Dimension, Sub-dimension, Hierarchy, Level, Level key, Join key, Attribute]

Slide 8/46
DBPreserve: Long-term preservation of Institutional Electronic Records and Databases
  Archive databases, ensuring their long-term accessibility
Dimensional Model [Rahman, 2010]
  Definition of a data warehouse model for archives
  Migration from a relational model to a dimensional model
OAIS Model [CCSDS, 2002]
  Modeling the system according to OAIS
Long-term preservation using XML
  Independence from technology
  Portability between systems

Slide 9/46
Existing preservation approaches don't comply with data warehouse preservation requirements
For data warehouses implemented with relational database technologies, some efforts can be reused
However, they still lack an important metadata layer that describes the data warehouse structure and entities

Slide 10/46
Star schema: a fact table surrounded by dimensional tables
Bridge tables
Example from a case study, implemented using Oracle Database 11g Enterprise Edition Release bit Production

Slide 11/46
A fact table is the center of a star schema
  Consists of facts of a business process
Facts
Measures:
  Additive
  Non-additive
  Semi-additive
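
The three measure categories differ in which dimensions a value can safely be summed across. The following is a minimal Python sketch of that distinction, not part of the presentation: the rows and the column names (quantity, balance, unit price) are invented for illustration and do not come from the case study.

```python
# Hypothetical fact rows: (day, account, quantity, balance, unit_price)
ROWS = [
    ("2011-05-30", "A", 2, 100.0, 5.0),
    ("2011-05-31", "A", 3, 120.0, 7.0),
    ("2011-05-30", "B", 1, 50.0, 5.0),
    ("2011-05-31", "B", 4, 40.0, 7.0),
]


def total_quantity(rows):
    """Additive measure: may be summed across every dimension."""
    return sum(row[2] for row in rows)


def closing_balance(rows):
    """Semi-additive measure: summed across accounts but not across time.

    Along the time dimension the last known value per account is used.
    """
    last_balance = {}
    for _day, account, _qty, balance, _price in sorted(rows):
        last_balance[account] = balance  # later days overwrite earlier ones
    return sum(last_balance.values())


def average_unit_price(rows):
    """Non-additive measure: a sum is meaningless, so use another aggregate."""
    return sum(row[4] for row in rows) / len(rows)


print(total_quantity(ROWS), closing_balance(ROWS), average_unit_price(ROWS))
```

Recording the category of each measure in the preserved metadata lets a future reader of the archive know which of these aggregations is valid.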

Slide 12/46
Dimensions give context and meaning to the facts
  Represent the relevant vectors of analysis of the business process facts
  Usually represented by one or more dimensional tables
Levels
Hierarchies
Attributes

Slide 13/46
The project's case study implements the dimensional model using Oracle Database 11g

    CREATE DIMENSION class_dim
      LEVEL class  IS (IPDW_CLASS.CLASS_ID)
      LEVEL course IS (IPDW_CLASS.COURSE_ID)
      HIERARCHY class_rollup (
        class CHILD OF course)
      ATTRIBUTE class DETERMINES
        (IPDW_CLASS.CODE, IPDW_CLASS.ACRONYM,
         IPDW_CLASS.NAME, IPDW_CLASS.TYPE)
      ATTRIBUTE course DETERMINES
        (IPDW_CLASS.COUR_CODE, IPDW_CLASS.COUR_ACRONYM,
         IPDW_CLASS.COUR_NAME, IPDW_CLASS.COUR_TYPE,
         IPDW_CLASS.COURSE_PREVIOUS_COD);
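
To make the structure declared by the statement concrete, here is a small Python sketch that holds the same levels and hierarchy as plain data and walks the roll-up chain. The level names and key columns are copied from the statement; the dictionary layout and the `rollup_path` helper are assumptions made for illustration, not how Oracle or the DBPreserve Suite stores them.

```python
# Levels and their key columns, copied from the CREATE DIMENSION statement.
LEVELS = {
    "class": "IPDW_CLASS.CLASS_ID",
    "course": "IPDW_CLASS.COURSE_ID",
}

# Hierarchy class_rollup: each entry maps a child level to its parent level.
CLASS_ROLLUP = {"class": "course"}


def rollup_path(level, hierarchy):
    """Return the chain of levels from `level` up to the top of the hierarchy."""
    path = [level]
    while path[-1] in hierarchy:
        path.append(hierarchy[path[-1]])
    return path


for name in rollup_path("class", CLASS_ROLLUP):
    print(name, "is keyed by", LEVELS[name])
```

This is the kind of information (levels, level keys, parent and child relations) that a dimensional metadata layer has to capture in addition to the relational table definition.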

Slide 14/46
Bridge tables are used to resolve a many-to-many relationship between a fact and a dimension
They are also used to flatten out a hierarchy in a dimension
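
A short Python sketch of the first use, resolving a many-to-many relationship through a bridge table. The data is invented: the group and teacher identifiers and the `answers` measure are hypothetical and are not taken from the case study.

```python
from collections import defaultdict

# Hypothetical fact rows; each row points at a group rather than a single teacher.
FACTS = [
    {"fact_id": 1, "group_id": 10, "answers": 30},
    {"fact_id": 2, "group_id": 20, "answers": 12},
]

# Hypothetical bridge table: (group_id, teacher_id). Group 10 has two teachers.
BRIDGE = [(10, "T1"), (10, "T2"), (20, "T2")]


def answers_per_teacher(facts, bridge):
    """Join facts to dimension members through the bridge and sum the measure."""
    members = defaultdict(list)
    for group_id, teacher_id in bridge:
        members[group_id].append(teacher_id)

    totals = defaultdict(int)
    for fact in facts:
        for teacher_id in members[fact["group_id"]]:
            totals[teacher_id] += fact["answers"]
    return dict(totals)


print(answers_per_teacher(FACTS, BRIDGE))
```

Note that the per-teacher totals add up to more than the sum over the fact rows, because a fact shared by several members is counted once per member. An archive that does not record which table plays the bridge role makes this behaviour hard to interpret later.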

Slide 15/46
A snowflake schema is similar to a star schema, but one or more dimension tables are partially normalized
Sub-dimensions

Slide 16/46
A datamart is a subset of a data warehouse
Typically, a set of star and snowflake schemas

Slide 17/46
Analysis of relational database preservation formats
  DBML (Database Markup Language) [Ramalho, 2007]
  SIARD Format (Software Independent Archiving of Relational Databases) [SFA, 2008]
Analysis of Data Warehouse XML representations
  XCube (for multidimensional schemas) [Hummer, 2003]

Slide 18/46
Decision to extend the SIARD Format
  Separates metadata from primary data
  Segmented representation of primary data
  Ready-to-use application that produces the SIARD format from a relational database (MSAccess, MSSQL and Oracle)
Add a metadata layer for the dimensional model perspective
  Extracting data warehouse metadata from the data dictionary
  Defining an XML structure for the dimensional model
  Embedding it into the SIARD format

Slide 19/46
Header folder for metadata
Content folder for primary data
  Organized in directories
  Single XML file for each data table
SIARD Suite – set of tools for migrating, editing and reactivating databases

Slide 20/46
Add an XML file with the extra metadata layer for data warehouse characterization
Add the corresponding schema
No action on the primary data
  Data in the DW is ingested by the SIARD Suite as a relational database
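
The extension described above (one extra XML file plus its schema, primary data untouched) can be sketched in Python as follows. This is an illustration under stated assumptions: the archive is treated as a ZIP container with a header and a content folder, the entry names `header/dwxml.xml` and `header/dw.xsd` are hypothetical, and the toy archive is built in memory rather than produced by the SIARD Suite.

```python
import io
import zipfile


def embed_dwxml(archive, dwxml_bytes, schema_bytes):
    """Add the DWXML file and its schema to the header folder of an open archive.

    Entry names are assumptions for this sketch; existing entries are not modified.
    """
    archive.writestr("header/dwxml.xml", dwxml_bytes)
    archive.writestr("header/dw.xsd", schema_bytes)


# Build a toy archive in memory: one metadata file and one table file (placeholders).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("header/metadata.xml", b"<metadata/>")
    archive.writestr("content/schema0/table0/table0.xml", b"<table/>")

# Re-open in append mode and embed the extra metadata layer.
with zipfile.ZipFile(buffer, "a") as archive:
    embed_dwxml(
        archive,
        b'<dwxml version="1.0"/>',
        b'<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"/>',
    )

with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    table_bytes = archive.read("content/schema0/table0/table0.xml")

print(names)
```

Because the new files are only appended, tools that read the relational part of the archive can keep ignoring them.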


Slide 31/46
DWXML file excerpt (element tags did not survive; only the opening of the root element and the text values remain)
  <dwxml version="1.0" xsi:noNamespaceSchemaLocation="dw.xsd" xmlns:xsi="
  Text values, in document order: IPDW_ANSWERS_STAR; Star related to the answers; CALDEIAS; IPDW_ANSWERS; ANSWER; ADDITIVE; CALDEIAS; IPDW_QUESTION
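
Since the element tags of the excerpt are lost, the following Python sketch only shows what a document carrying these values could look like. Every element name (`star`, `factTable`, `measure`, `dimensionTable`, and so on) and the way the values are assigned to them are assumptions; only the root element name, its `version` attribute and the text values come from the slide. The real structure is defined by the `dw.xsd` schema, which is not reproduced here.

```python
import xml.etree.ElementTree as ET


def build_dwxml():
    """Build a hypothetical DWXML tree from the values visible on the slide."""
    root = ET.Element("dwxml", {"version": "1.0"})

    star = ET.SubElement(root, "star")  # hypothetical element name
    ET.SubElement(star, "name").text = "IPDW_ANSWERS_STAR"
    ET.SubElement(star, "description").text = "Star related to the answers"

    fact = ET.SubElement(star, "factTable")  # hypothetical element name
    ET.SubElement(fact, "schema").text = "CALDEIAS"
    ET.SubElement(fact, "name").text = "IPDW_ANSWERS"

    measure = ET.SubElement(fact, "measure")  # hypothetical element name
    ET.SubElement(measure, "name").text = "ANSWER"
    ET.SubElement(measure, "type").text = "ADDITIVE"

    dimension = ET.SubElement(star, "dimensionTable")  # hypothetical element name
    ET.SubElement(dimension, "schema").text = "CALDEIAS"
    ET.SubElement(dimension, "name").text = "IPDW_QUESTION"
    return root


document = ET.tostring(build_dwxml(), encoding="unicode")
print(document)
```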

Slide 32/46
Integrates the SiardFromDb application to build the SIARD format of the data warehouse
Extracts metadata for characterization of the dimensional model
  Schemas, dimensions, hierarchies, levels, attributes, tables, table comments, primary and foreign keys, views
Sorts the tables according to their role in the data warehouse
Proposes a DWXML description based on the extracted metadata
DWXML editing using a GUI
Graphical representation of star schemas and dimensions and their relationships
Creates, views and embeds the DWXML file into the SIARD format
Accesses and retrieves the primary data
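
The slide does not say how tables are sorted by role. One possible heuristic, shown here purely as an assumption and not as the DBPreserve Suite's actual rule, is to treat tables named in dimension definitions as dimensions and tables with foreign keys pointing at them as facts. The foreign keys and the `AUDIT_LOG` table in this Python sketch are invented.

```python
def classify_tables(tables, dimension_tables, foreign_keys):
    """Assign a role to each table using a simple, assumed heuristic.

    tables: iterable of table names
    dimension_tables: set of tables referenced by dimension definitions
    foreign_keys: dict mapping a table to the tables its foreign keys reference
    """
    roles = {}
    for table in tables:
        targets = foreign_keys.get(table, ())
        if table in dimension_tables:
            roles[table] = "dimension"
        elif any(target in dimension_tables for target in targets):
            roles[table] = "fact"
        else:
            roles[table] = "other"
    return roles


# Table names IPDW_* appear in the slides; their relationships here are hypothetical.
roles = classify_tables(
    tables=["IPDW_ANSWERS", "IPDW_QUESTION", "IPDW_CLASS", "AUDIT_LOG"],
    dimension_tables={"IPDW_QUESTION", "IPDW_CLASS"},
    foreign_keys={"IPDW_ANSWERS": ["IPDW_QUESTION", "IPDW_CLASS"]},
)
print(roles)
```

A proposal produced this way would still need the manual adjustment step that the GUI provides, for example to mark bridge tables.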

Slide 33/46
[Architecture diagram] Netbeans Platform 7 RC1 | JDK 7
  Modules: Connection Module, Metadata Module, SIARD Module, DWXML Module, Output Module
  Libraries: SIARDfromDB, JDOM, OJDBC, …


Slide 43/46
Data Warehouse
  17 tables (one with more than 2M records)
  Data size: 115 MB
SIARD Format
  17 XML files with primary data (one with 323 MB)
  SIARD metadata size: 71 KB
  DWXML metadata size: 86 KB
  Total size: 360 MB
Extraction times:
  SIARD data: 2h30m
  SIARD metadata: 4 min
  DWXML metadata: 3 sec
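
The figures above can be put in proportion with a little arithmetic. The Python sketch below uses only the numbers from the slide; the conversion of 1 MB to 1024 KB is an assumption, since the slide does not state which convention it uses.

```python
SOURCE_MB = 115          # data size in the data warehouse
ARCHIVE_MB = 360         # total size of the SIARD archive
SIARD_METADATA_KB = 71
DWXML_METADATA_KB = 86
KB_PER_MB = 1024         # assumption: binary megabytes

# How much larger the XML archive is than the source data.
expansion = ARCHIVE_MB / SOURCE_MB

# Combined metadata size and its share of the whole archive.
metadata_kb = SIARD_METADATA_KB + DWXML_METADATA_KB
metadata_share = metadata_kb / (ARCHIVE_MB * KB_PER_MB)

# Extraction times converted to seconds.
seconds = {
    "siard_data": 2 * 3600 + 30 * 60,
    "siard_metadata": 4 * 60,
    "dwxml_metadata": 3,
}
total_seconds = sum(seconds.values())
data_time_share = seconds["siard_data"] / total_seconds

print(f"archive / source size: {expansion:.2f}")
print(f"metadata: {metadata_kb} KB, {metadata_share:.4%} of the archive")
print(f"primary data extraction: {data_time_share:.1%} of total time")
```

The archive is roughly three times the size of the source data, both metadata layers together stay well under 0.1% of it, and almost all of the extraction time goes into the primary data, so the added DWXML layer is cheap in both space and time.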

Slide 44/46
Definition of DWXML, a representation of the dimensional model of a DW
Design and implementation of the DBPreserve Suite
  Extraction of the metadata that describes the dimensional model
  Manual adjustments of the dimensional model
  Generation of the XML file and embedding into the SIARD format file
  Primary data browsing
The result is compliant with the SIARD Suite tools (at the relational level only)

Slide 45/46
References
[CCSDS, 2002] Consultative Committee for Space Data Systems. Reference Model for an Open Archival Information System (OAIS) - Blue Book. Washington: National Aeronautics and Space Administration.
[Ferreira, 2006] Miguel Ferreira. Introdução à Preservação Digital - Conceitos, estratégias e actuais consensos. Escola de Engenharia da Universidade do Minho.
[Hendley, 1998] Tony Hendley. Comparison of methods & costs of digital preservation. Technical report, British Library Research and Innovation Centre.
[Hummer, 2003] Wolfgang Hummer, Andreas Bauer, and Gunnar Harde. XCube: XML for Data Warehouses. In Proceedings of the 6th ACM International Workshop on Data Warehousing and OLAP (DOLAP '03). ACM, New York, NY, USA, DOI= / .
[Planets, 2010] Pauline Sinclair. The digital divide: Assessing organizations' preparations for digital preservation. Planets White Paper, March.
[Rahman, 2010] Arif Ur Rahman, Gabriel David, Cristina Ribeiro. Model migration approach for database preservation. In The Role of Digital Libraries in a Time of Global Change, 12th International Conference on Asia-Pacific Digital Libraries, ICADL 2010, Gold Coast, Australia, pages 81–90. Springer Berlin / Heidelberg.
[Ramalho, 2007] José Carlos Ramalho, Miguel Ferreira, Luís Faria, Rui Castro. Relational Database Preservation through XML Modelling. In Extreme Markup Languages 2007.

Slide 46/46
[SFA, 2008] Swiss Federal Archives SFA, Unit Innovation and Preservation. SIARD Format Description. Technical Report, Federal Department of Home Affairs FDHA, Berne.
[Thibodeau, 2002] Kenneth Thibodeau. Overview of technological approaches to digital preservation and challenges in coming years. In The State of Digital Preservation: An International Perspective. Documentation Abstracts, Inc. - Institutes for Information Science.