1st of June, 2011. Carlos Aldeias, Gabriel David, Cristina Ribeiro.

1 1st of June, 2011. Carlos Aldeias, Gabriel David, Cristina Ribeiro

2 DWXML – A Preservation Format for Data Warehouses
Carlos Aldeias, 1st of June, 2011
Introduction
Motivation
Data Warehouse Preservation
DWXML Definition
DBPreserve Suite Application
Conclusions

3 Companies, institutions and governments rely increasingly on On-Line Analytical Processing (OLAP)
Major benefits for analysis and decision support
Selective extraction and analysis of data from different perspectives
Most systems are structured using Data Warehouses
OLAP types:
  ROLAP – Relational OLAP
  MOLAP – Multidimensional OLAP
  HOLAP – Hybrid OLAP

4 Data Warehouse as a digital object
  Different from conventional digital objects: data warehouses are complex digital objects
  They are based on a dimensional model: star schema, facts, dimensions with levels and hierarchies, bridges and datamarts
  They are often implemented on relational databases (ROLAP), keeping data in tables, views and schemas
Data vs. Metadata
  The primary data stored in tables must be archived as well as the metadata, both at the relational and dimensional levels
Technologies are evolving continually
  Data warehouses created with today's technologies may not be accessible with upcoming versions

5 InterPARES

7 [Diagram of dimensional model concepts: star schema, fact table, fact, measure, bridge table, dimension, hierarchy, join key, level, level key, attribute, datamart, snowflake schema, sub-dimension]

8 DBPreserve: Long-term preservation of Institutional Electronic Records and Databases
Archive databases ensuring their long-term accessibility
Dimensional Model
  Data warehouse for archives model definition
  Migration from a relational model to a dimensional model
OAIS Model
  Modeling system according to OAIS
Long-term preservation using XML
  Independence from technology
  Portability between systems
[Rahman, 2010] [CCSDS, 2002]

9 Existing preservation approaches don't comply with data warehouse preservation requirements
For data warehouses implemented with relational database technologies, some efforts can be reused
However, they still lack an important metadata layer that describes the data warehouse structure and entities

10 Star schema: the fact table is surrounded by dimensional tables
Bridge tables
Example from a case study, implemented using Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production

11 A fact table is the center of a star schema
Consists of facts of a business process
Facts
Measures:
  ADDITIVE
  NON ADDITIVE
  SEMI ADDITIVE
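
The three measure types differ in which dimensions a measure can safely be summed across. A minimal Python sketch follows; the rows and column meanings are invented for illustration and do not come from the case study. Deposits stand in for an additive measure, balances for a semi-additive one, and a share (ratio) for a non-additive one.

```python
# Hypothetical fact rows: (day, account, deposits, balance)
rows = [
    ("mon", "A", 100, 100),
    ("mon", "B", 50, 50),
    ("tue", "A", 20, 120),
    ("tue", "B", 30, 80),
]

# Additive: deposits can be summed over every dimension (day and account).
total_deposits = sum(r[2] for r in rows)

# Semi-additive: balances can be summed across accounts for a single day,
# but summing them across days would count the same money twice.
balance_tue = sum(r[3] for r in rows if r[0] == "tue")

# Non-additive: a ratio is recomputed from additive parts,
# never obtained by adding per-row ratios.
share_a = sum(r[2] for r in rows if r[1] == "A") / total_deposits
```

Recording the measure type in the preserved metadata tells a future reader which of these aggregations are meaningful.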

12 Dimensions give context and meaning to the facts
They represent the relevant vectors of analysis of the business process facts
Usually represented by one or more dimensional tables
Levels
Hierarchies
Attributes

13 Projects case study implements the dimensional model using Oracle Database 11g
    CREATE DIMENSION class_dim
      LEVEL class  IS (IPDW_CLASS.CLASS_ID)
      LEVEL course IS (IPDW_CLASS.COURSE_ID)
      HIERARCHY class_rollup (
        class CHILD OF course)
      ATTRIBUTE class DETERMINES (IPDW_CLASS.CODE, IPDW_CLASS.ACRONYM, IPDW_CLASS.NAME, IPDW_CLASS.TYPE)
      ATTRIBUTE course DETERMINES (IPDW_CLASS.COUR_CODE, IPDW_CLASS.COUR_ACRONYM, IPDW_CLASS.COUR_NAME, IPDW_CLASS.COUR_TYPE, IPDW_CLASS.COURSE_PREVIOUS_COD);
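
The hierarchy in the statement above says that every class rolls up to one course. A small Python sketch of what that declaration makes possible, assuming invented key values and an invented class-level measure (neither is taken from the case study):

```python
from collections import defaultdict

# Hypothetical content of IPDW_CLASS: CLASS_ID -> COURSE_ID (class CHILD OF course)
class_to_course = {1: 10, 2: 10, 3: 20}

# Hypothetical additive measure already aggregated at the class level.
answers_per_class = {1: 40, 2: 25, 3: 12}

def roll_up(measure_by_child, child_to_parent):
    """Aggregate a child-level additive measure to the parent level."""
    totals = defaultdict(int)
    for child, value in measure_by_child.items():
        totals[child_to_parent[child]] += value
    return dict(totals)

answers_per_course = roll_up(answers_per_class, class_to_course)
```

Without the level and hierarchy metadata, the archived IPDW_CLASS table is just a flat table and this rollup path has to be rediscovered by hand.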

14 Bridge tables are used to resolve a many-to-many relationship between a fact and a dimension
Also used to flatten out a hierarchy in a dimension
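
A sketch of the many-to-many case, using Python's built-in sqlite3 module rather than Oracle. All table names, column names and rows are hypothetical, and the weight column is a common allocation technique assumed here, not something the slides describe.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_teacher (teacher_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_class  (class_id INTEGER PRIMARY KEY, group_id INTEGER, hours INTEGER);
-- Bridge: one row per (group, teacher) pair, with an allocation weight.
CREATE TABLE bridge_teacher_group (group_id INTEGER, teacher_id INTEGER, weight REAL);
INSERT INTO dim_teacher VALUES (1, 'Ana'), (2, 'Rui');
INSERT INTO fact_class VALUES (100, 1, 60), (101, 2, 30);
INSERT INTO bridge_teacher_group VALUES (1, 1, 0.5), (1, 2, 0.5), (2, 1, 1.0);
""")

# Each fact reaches several teachers through the bridge; the weights
# keep the allocated hours summing to the original total.
hours_by_teacher = dict(con.execute("""
    SELECT t.name, SUM(f.hours * b.weight)
    FROM fact_class f
    JOIN bridge_teacher_group b ON b.group_id = f.group_id
    JOIN dim_teacher t ON t.teacher_id = b.teacher_id
    GROUP BY t.name
"""))
```

At the relational level the bridge looks like any other table, so its role has to be stated in the dimensional metadata.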

15 Snowflake schema is similar to a star schema, but one or more dimension tables are partially normalized
Sub-dimensions

16 Datamart: subset of a data warehouse
Typically, a set of star and snowflake schemas

17 Analysis of relational database preservation formats
  DBML (Database Markup Language) [Ramalho, 2007]
  SIARD Format (Software Independent Archiving of Relational Databases) [SFA, 2008]
Analysis of data warehouse XML representations
  XCube (for multidimensional schemas) [Hummer, 2003]

18 Decision on extending the SIARD Format
  Separates metadata from primary data
  Segmented representation of primary data
  Ready-to-use application that creates a SIARD format from a relational database (MS Access, MS SQL and Oracle)
Add a metadata layer regarding the dimensional model perspective
  Extracting data warehouse metadata from the data dictionary
  Defining an XML structure for the dimensional model
  Embedding it into the SIARD format

19 Header folder for metadata
Content folder for primary data
  Organized in directories
  Single XML file for each data table
SIARD Suite – set of tools for migrating, editing and reactivating databases

20 Add an XML file with the extra metadata layer for data warehouse characterization
Add the corresponding schema
No action on the primary data
  Data in the DW is ingested into the SIARD Suite as a relational database
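
The embedding step can be sketched in Python. This assumes the SIARD archive is handled as a ZIP container with header/ and content/ folders; the file name header/dw.xml is hypothetical, header/dw.xsd reuses the schema name referenced by the DWXML root element, and the tiny in-memory archive is only a stand-in for a real SIARD file.

```python
import io
import zipfile

def embed_dwxml(archive, dwxml_bytes, xsd_bytes):
    """Append the DWXML file and its schema to the header folder."""
    with zipfile.ZipFile(archive, "a") as zf:
        zf.writestr("header/dw.xml", dwxml_bytes)  # hypothetical file name
        zf.writestr("header/dw.xsd", xsd_bytes)    # schema name from the DWXML root
        return zf.namelist()

# Stand-in for a SIARD archive: metadata in header/, one table in content/.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("header/metadata.xml", "<siardArchive/>")
    zf.writestr("content/schema0/table0/table0.xml", "<table/>")

names = embed_dwxml(
    buf,
    b"<dwxml version='1.0'/>",
    b"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'/>",
)
```

Opening the archive in append mode leaves the existing entries untouched, which matches the "no action on the primary data" point.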

31 <dwxml version="1.0" xsi:noNamespaceSchemaLocation="dw.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
IPDW_ANSWERS_STAR
Star related to the answers
CALDEIAS
IPDW_ANSWERS
ANSWER
ADDITIVE
CALDEIAS
IPDW_QUESTION
......
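
To make the shape of such a file concrete, here is a Python sketch that builds a DWXML-like document with the standard library. Only the root element, its attributes and the text values come from the slide. Every child element name (star, name, description, factTable, schema, measure, column, type) is hypothetical, and so is the reading of CALDEIAS as a schema name; the real structure is defined by dw.xsd.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("xsi", XSI)

# Root element and attributes follow the slide; all child element
# names below are HYPOTHETICAL placeholders, not the dw.xsd vocabulary.
root = ET.Element("dwxml", {
    "version": "1.0",
    f"{{{XSI}}}noNamespaceSchemaLocation": "dw.xsd",
})
star = ET.SubElement(root, "star")
ET.SubElement(star, "name").text = "IPDW_ANSWERS_STAR"
ET.SubElement(star, "description").text = "Star related to the answers"
fact = ET.SubElement(star, "factTable")
ET.SubElement(fact, "schema").text = "CALDEIAS"   # assumed to be the owning schema
ET.SubElement(fact, "name").text = "IPDW_ANSWERS"
measure = ET.SubElement(fact, "measure")
ET.SubElement(measure, "column").text = "ANSWER"
ET.SubElement(measure, "type").text = "ADDITIVE"

xml_text = ET.tostring(root, encoding="unicode")
```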

32 Integrates the SiardFromDb application to build the SIARD format of the data warehouse
Extracts metadata for characterization of the dimensional model
  Schemas, dimensions, hierarchies, levels, attributes, tables, table comments, primary and foreign keys, views
Sorts the tables according to their role in the data warehouse
Proposes a DWXML description based on the extracted metadata
DWXML editing using a GUI
  Graphical representation of star schemas and dimensions and their relationships
Creates, views and embeds the DWXML file into the SIARD format
Accesses and retrieves the primary data
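
One plausible way to sort tables by role is to look only at the direction of foreign keys. The Python sketch below is an assumed heuristic, not the DBPreserve Suite's actual algorithm. IPDW_ANSWERS, IPDW_QUESTION and IPDW_CLASS are names from the slides; TEACHER_GROUP, TEACHER and every foreign key listed are invented.

```python
def classify_tables(foreign_keys, tables):
    """Guess each table's role from foreign-key direction alone.

    foreign_keys: iterable of (referencing_table, referenced_table) pairs.
    """
    outgoing = {t: set() for t in tables}
    incoming = {t: set() for t in tables}
    for src, dst in foreign_keys:
        outgoing[src].add(dst)
        incoming[dst].add(src)

    roles = {}
    for t in tables:
        if outgoing[t] and not incoming[t]:
            roles[t] = "fact"                  # only points at other tables
        elif outgoing[t] and incoming[t]:
            roles[t] = "bridge/sub-dimension"  # in between: needs a manual look
        elif incoming[t]:
            roles[t] = "dimension"             # only pointed at
        else:
            roles[t] = "unclassified"          # no keys at all
    return roles

tables = ["IPDW_ANSWERS", "IPDW_QUESTION", "IPDW_CLASS", "TEACHER_GROUP", "TEACHER"]
fks = [
    ("IPDW_ANSWERS", "IPDW_QUESTION"),
    ("IPDW_ANSWERS", "IPDW_CLASS"),
    ("IPDW_ANSWERS", "TEACHER_GROUP"),
    ("TEACHER_GROUP", "TEACHER"),
]
roles = classify_tables(fks, tables)
```

A heuristic like this can only propose a description, which is why the proposal is followed by manual editing in the GUI.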

33 [Architecture diagram: NetBeans Platform 7 RC1 | JDK 7; modules: Metadata Module, SIARD Module, DWXML Module, Connection Module, Output Module; libraries: SIARDfromDB, JDOM, OJDBC, …]

43 Data Warehouse
  17 tables (one with more than 2M records)
  Data size: 115 MB
SIARD Format
  17 XML files with primary data (one with 323 MB)
  SIARD metadata size: 71 KB
  DWXML metadata size: 86 KB
  Total size: 360 MB
Extraction times:
  SIARD data: 2h30m
  SIARD metadata: 4 min
  DWXML metadata: 3 sec
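
A quick check of what these figures imply, as a Python sketch. It uses only the numbers reported on the slide and assumes 1 MB = 1024 KB.

```python
dw_size_mb = 115        # source data warehouse
archive_size_mb = 360   # SIARD archive including DWXML
siard_meta_kb = 71
dwxml_meta_kb = 86

# The XML archive is roughly three times the size of the source data.
expansion = archive_size_mb / dw_size_mb

# The DWXML layer is a very small fraction of the archive,
# although slightly larger than the SIARD metadata itself.
dwxml_share = dwxml_meta_kb / (archive_size_mb * 1024)
dwxml_vs_siard_meta = dwxml_meta_kb / siard_meta_kb

# Extraction time is dominated by primary data: 2h30m vs 4 min and 3 s.
data_s, siard_meta_s, dwxml_meta_s = 150 * 60, 4 * 60, 3
dwxml_time_share = dwxml_meta_s / (data_s + siard_meta_s + dwxml_meta_s)
```

On these numbers the dimensional metadata layer adds well under 0.1% to both the archive size and the extraction time.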

44 Definition of DWXML, a representation of the dimensional model of a DW
Design and implementation of DBPreserve Suite
  Extraction of the metadata that describes the dimensional model
  Manual adjustments of the dimensional model
  Generation of the XML file and embedding into SIARD format file
  Primary data browse
The result is compliant with the SIARD Suite tools (just the relational level)

45 [CCSDS, 2002] Consultative Committee for Space Data Systems. Reference Model for an Open Archival Information System (OAIS) - Blue Book. Washington: National Aeronautics and Space Administration, 2002.
[Ferreira, 2006] Miguel Ferreira. Introdução à Preservação Digital - Conceitos, estratégias e actuais consensos. Escola de Engenharia da Universidade do Minho, 2006.
[Hendley, 1998] Tony Hendley. Comparison of methods & costs of digital preservation. Technical report, British Library Research and Innovation Centre, 1998.
[Hummer, 2003] Wolfgang Hummer, Andreas Bauer, and Gunnar Harde. XCube: XML for Data Warehouses. In Proceedings of the 6th ACM International Workshop on Data Warehousing and OLAP (DOLAP '03), pages 33-40. ACM, New York, NY, USA, 2003. DOI=10.1145/956060.956067, http://doi.acm.org/10.1145/956060.956067
[Planets, 2010] Pauline Sinclair. The digital divide: Assessing organizations' preparations for digital preservation. Planets White Paper, March 2010.
[Rahman, 2010] Arif Ur Rahman, Gabriel David, Cristina Ribeiro. Model migration approach for database preservation. In The Role of Digital Libraries in a Time of Global Change, 12th International Conference on Asia-Pacific Digital Libraries, ICADL 2010, Gold Coast, Australia, pages 81–90. Springer Berlin / Heidelberg, 2010.
[Ramalho, 2007] José Carlos Ramalho, Miguel Ferreira, Luís Faria, Rui Castro. Relational Database Preservation through XML Modelling. In Extreme Markup Languages 2007, 2007.

46 [SFA, 2008] Swiss Federal Archives SFA, Unit Innovation and Preservation. SIARD Format Description. Technical Report, Federal Department of Home Affairs FDHA, Berne, 2008.
[Thibodeau, 2002] Kenneth Thibodeau. Overview of technological approaches to digital preservation and challenges in coming years. In The State of Digital Preservation: An International Perspective. Documentation Abstracts, Inc. - Institutes for Information Science, 2002.

