Data Warehousing Seminar, Chapter 5: Data Warehouse Design Methodology
Data Warehousing Lab, HyeYoung Cho


Slide 2: Index
- The Information Utility's Infrastructure
- The Preferred Architecture: Integration Layer and High Performance Query Structures
- Alternate Warehousing Architectures
- Data Store 1 - The Source Systems
- Data Flow 1 - From the Data Sources to the Integration Layer
- Data Store 2 - The Integration Layer
- Data Flow 2 - From the Integration Layer to the High Performance Query Structures
- Data Store 3 - High Performance Query Structures (HPQS)
- Data Flow 3 - From the High Performance Query Structures to the End-User Reporting Applications
- Data Store 4 - Data in the End User's Hands

Slide 3: The Information Utility's Infrastructure
The warehouse must:
- extract data from a variety of sources
- integrate the data into a common repository
- put the data into a format that users can use
- provide users with tools to access the warehouse

Slide 4: The Preferred Architecture: Integration Layer and High Performance Query Structures
The architecture consists of 4 data stores and 3 data flows.

Slide 5: Data Store 1 - The Source Systems
Source systems provide data to the warehouse:
- enterprise resource planning (ERP) packages, e.g. SAP, PeopleSoft, Oracle Applications
- home-grown applications, e.g. the OASIS system
- outside sources, e.g. data purchased from outside vendors
(Diagram: source systems such as sales, accounting, and distribution feeding the warehouse data.)

Slide 6: Data Flow 1 - From the Data Sources to the Integration Layer
The data extraction step:
- gets data out of its sources
- is performed at the beginning of every data flow
- is a very complex step, because sources use a variety of data storage technologies (e.g. Oracle, DB2, Informix, IMS, and other formats), each requiring its own select statements and code
- raises several considerations for extraction, listed on the following slides

Slide 7: Data Flow 1 - From the Data Sources to the Integration Layer
Is this extract supporting the initial load of the warehouse or a periodic refresh load?
- Problems with complete refreshes:
  - The warehouse is a record of history, and that history is frequently lost by the source systems.
  - Warehouses tend to be very large, so reloading everything strains limited computing and telecommunications bandwidth.
- Two architectures to load the warehouse:
  - Initial load: history data from offline storage; bring it all over.
  - Periodic refresh: online data; move only changed source records, using special logic such as timestamps.
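
The difference between the two loads can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, assuming a hypothetical source table `sales` whose rows carry an `updated_at` timestamp stored as ISO 8601 text; calling the function without a timestamp performs the initial load, and passing the time of the previous extract performs a periodic refresh.

```python
import sqlite3


def extract(conn, last_extract=None):
    """Return source rows to send to the warehouse.

    last_extract is None      -> initial load: bring everything over.
    last_extract is timestamp -> periodic refresh: only rows changed since then.
    ISO 8601 text timestamps compare correctly as plain strings.
    """
    if last_extract is None:
        return conn.execute(
            "SELECT id, amount, updated_at FROM sales ORDER BY id"
        ).fetchall()
    return conn.execute(
        "SELECT id, amount, updated_at FROM sales WHERE updated_at > ? ORDER BY id",
        (last_extract,),
    ).fetchall()


# Hypothetical source table with three rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    (1, 10.0, "2024-01-01T09:00:00"),
    (2, 25.0, "2024-01-02T14:30:00"),
    (3, 40.0, "2024-01-03T08:15:00"),
])

initial = extract(conn)                                      # all 3 rows
refresh = extract(conn, last_extract="2024-01-02T00:00:00")  # rows 2 and 3
```

Because the refresh only sees rows that still exist, a timestamp filter like this one cannot notice deleted records.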

Slide 8: Data Flow 1 - From the Data Sources to the Integration Layer
How will I determine what records to extract?
- Change data capture determines:
  - which source records have changed
  - how those records are moved to the warehouse
- The delete question: a deleted source record leaves no trace; it is simply gone.
- Techniques for recognizing changes:
  - Timestamps: record when each row is inserted or updated, which narrows the search for changed records.
  - Triggers: put a trigger on each source table that writes a corresponding (insert, update, delete) message to a log file.
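
Trigger-based change capture can be sketched with SQLite through Python's `sqlite3` module. The table and trigger names are hypothetical, and the sketch writes change messages into a log table rather than a log file. The delete is captured even though the customer row itself is gone, which timestamps alone cannot provide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);

-- Change log that the extract program reads instead of scanning customer.
CREATE TABLE customer_log (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    op TEXT NOT NULL,               -- 'I' insert, 'U' update, 'D' delete
    customer_id INTEGER NOT NULL,
    logged_at TEXT DEFAULT CURRENT_TIMESTAMP
);

CREATE TRIGGER customer_ins AFTER INSERT ON customer
BEGIN
    INSERT INTO customer_log (op, customer_id) VALUES ('I', NEW.id);
END;

CREATE TRIGGER customer_upd AFTER UPDATE ON customer
BEGIN
    INSERT INTO customer_log (op, customer_id) VALUES ('U', NEW.id);
END;

-- OLD.id is still available here, so the delete leaves a trace in the log.
CREATE TRIGGER customer_del AFTER DELETE ON customer
BEGIN
    INSERT INTO customer_log (op, customer_id) VALUES ('D', OLD.id);
END;
""")

# Normal source-system activity; the triggers fire automatically.
conn.execute("INSERT INTO customer VALUES (1, 'Kim')")
conn.execute("INSERT INTO customer VALUES (2, 'Lee')")
conn.execute("UPDATE customer SET name = 'Park' WHERE id = 1")
conn.execute("DELETE FROM customer WHERE id = 2")

changes = conn.execute(
    "SELECT op, customer_id FROM customer_log ORDER BY seq"
).fetchall()
```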

Slide 9: Data Flow 1 - From the Data Sources to the Integration Layer
- Application Integration Software (AIS), e.g. MQSeries, Mercator, Tibco:
  - links applications so that when a transaction occurs in one, it is transmitted to all the others
  - captures all transactions in AIS-enabled systems
  - gives real-time access to data
- File compares:
  - compare today's file to the last loaded file
  - harder to implement and less accurate
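
A file compare boils down to a keyed diff of two full snapshots. The sketch below assumes each snapshot fits in memory as a dictionary from a primary key to the rest of the record; the function name and sample data are illustrative.

```python
def compare_snapshots(previous, current):
    """Diff two full extracts keyed by primary key.

    previous, current: dict mapping key -> record (any comparable value).
    Returns (inserts, updates, deletes) as sorted lists of keys.
    """
    prev_keys = set(previous)
    curr_keys = set(current)
    inserts = sorted(curr_keys - prev_keys)   # only in today's file
    deletes = sorted(prev_keys - curr_keys)   # only in the last loaded file
    updates = sorted(                         # in both, but the record changed
        k for k in prev_keys & curr_keys if previous[k] != current[k]
    )
    return inserts, updates, deletes


last_loaded = {101: ("Kim", "Seoul"), 102: ("Lee", "Busan"), 103: ("Park", "Daegu")}
today = {101: ("Kim", "Seoul"), 102: ("Lee", "Incheon"), 104: ("Choi", "Ulsan")}
inserts, updates, deletes = compare_snapshots(last_loaded, today)
# inserts == [104], updates == [102], deletes == [103]
```

The accuracy problem is visible in the design: a record that is inserted and then deleted between two snapshots never appears in either, and only the last of several updates in between is seen.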

Slide 10: Data Flow 1 - From the Data Sources to the Integration Layer
How will I format the extracted records?
- Store each extracted record with:
  - the source system that generated it
  - when it was obtained
  - its key
What will I do with the extracted records?
- Data loading programs read the flat files and load the data into the warehouse.
- "Loosely coupled" warehousing architectures keep extract programs and load programs separate, which makes the warehouse more flexible and maintainable.
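
The record format and the loosely coupled hand-off can be sketched with the standard `csv` and `json` modules. The column names (`source_system`, `extracted_at`, `record_key`, `payload`) and both functions are assumptions for illustration; the point is that the extract program and the load program agree only on the flat-file layout.

```python
import csv
import io
import json
from datetime import datetime, timezone

# Lineage columns stored with every extracted record (names are illustrative).
FIELDS = ["source_system", "extracted_at", "record_key", "payload"]


def write_extract(rows, source_system, key_field, out):
    """Extract side: write rows to a flat file, tagged with their lineage."""
    extracted_at = datetime.now(timezone.utc).isoformat(timespec="seconds")
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow({
            "source_system": source_system,   # which system generated it
            "extracted_at": extracted_at,     # when it was obtained
            "record_key": row[key_field],     # the record's key
            "payload": json.dumps(row, sort_keys=True),
        })


def read_extract(inp):
    """Load side: a separate program that only needs to understand the file."""
    for rec in csv.DictReader(inp):
        yield rec["source_system"], rec["record_key"], json.loads(rec["payload"])


flat_file = io.StringIO()  # stands in for a real flat file on disk
write_extract([{"order_id": 7, "amount": 99.5}], "OASIS", "order_id", flat_file)
flat_file.seek(0)
loaded = list(read_extract(flat_file))
# CSV values come back as text, so the key is the string "7".
```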

Slide 11: Data Flow 1 - From the Data Sources to the Integration Layer
A few notes about dirty data:
- Data can be dirty in several ways:
  - format violations
  - referential integrity violations
  - cross-system matching violations
  - internal consistency violations
- Dirty data makes the warehouse unreliable, so it should be corrected in the source systems before extraction, for both refresh data and history data.
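
Some of these violations can be detected mechanically during extraction. The sketch below checks one hypothetical order record for a format violation, a referential integrity violation, and an internal consistency violation; the field names and rules are assumptions, and cross-system matching is left out because it needs data from more than one source.

```python
from datetime import date


def validate_order(order, known_customers):
    """Return a list of problems found in one extracted order record."""
    problems = []
    ordered = shipped = None
    # Format violation: both dates must be valid ISO dates (YYYY-MM-DD).
    try:
        ordered = date.fromisoformat(order["order_date"])
        shipped = date.fromisoformat(order["ship_date"])
    except ValueError:
        problems.append("format: bad date")
    # Referential integrity violation: the customer must exist.
    if order["customer_id"] not in known_customers:
        problems.append("referential integrity: unknown customer")
    # Internal consistency violation: an order cannot ship before it is placed.
    if ordered is not None and shipped is not None and shipped < ordered:
        problems.append("internal consistency: shipped before ordered")
    return problems


customers = {1, 2}
good = {"customer_id": 1, "order_date": "2024-03-01", "ship_date": "2024-03-04"}
bad = {"customer_id": 9, "order_date": "2024-03-05", "ship_date": "2024-03-02"}
malformed = {"customer_id": 1, "order_date": "2024-13-01", "ship_date": "2024-03-04"}
```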

Slide 12: Data Store 2 - The Integration Layer
The integration layer is a normalized database in a single place.
- Normalization breaks a flat file into smaller files (tables) to store the data more efficiently.
Why build an integration layer?
- It avoids repeated extraction: when multiple data marts use data from the same source systems, they read from only one source of already integrated, clean data.
- It ensures a standard interpretation of enterprise data: when multiple groups interpret the same data differently, common definitions are developed and shared across the organization.
- It provides a more flexible repository than the denormalized structures in the HPQS layer: the denormalized structures built for querying are inflexible, and changing them is complex and requires reintegrating and recleansing the data.

Slide 13: Data Store 2 - The Integration Layer

Slide 14: Data Store 2 - The Integration Layer
Introduction to database normalization
- The integration layer's data model is in third normal form (3NF).
- The starting point is completely denormalized data.
(Figure: completely denormalized data, then 1NF.)

Slide 15: Data Store 2 - The Integration Layer
- First normal form (1NF): eliminate repeating groups.
(Figure: next step, 2NF.)

Slide 16: Data Store 2 - The Integration Layer
- Second normal form (2NF): all non-key attributes of a table must rely on the entire key of the table.
(Figure: next step, 3NF.)

Slide 17: Data Store 2 - The Integration Layer
- Third normal form (3NF): all non-key fields must depend solely on the table's primary key, not on other non-key fields.
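
The three steps can be traced on a small example. The sketch assumes a hypothetical flat order file in which each order carries its customer details and a repeating group of items; the comments mark which normal-form rule motivates each resulting table.

```python
# Hypothetical completely denormalized input: customer data repeated per order,
# and a repeating group of items packed into one field.
flat_orders = [
    {"order_id": 1, "order_date": "2024-05-01", "customer_id": "C1",
     "customer_name": "Kim", "customer_city": "Seoul",
     "items": [("P1", "Pen", 3), ("P2", "Ink", 1)]},
    {"order_id": 2, "order_date": "2024-05-02", "customer_id": "C1",
     "customer_name": "Kim", "customer_city": "Seoul",
     "items": [("P1", "Pen", 5)]},
]


def normalize(orders):
    """Split flat orders into customers, products, orders and order lines."""
    customers, products, headers, lines = {}, {}, {}, []
    for o in orders:
        # 3NF: name and city depend on customer_id, not on order_id.
        customers[o["customer_id"]] = (o["customer_name"], o["customer_city"])
        # 2NF: order_date depends on order_id alone, not on (order_id, product_id).
        headers[o["order_id"]] = (o["order_date"], o["customer_id"])
        # 1NF: one row per item instead of a repeating group in one field.
        for product_id, product_name, qty in o["items"]:
            # 2NF: product_name depends on product_id alone.
            products[product_id] = product_name
            lines.append((o["order_id"], product_id, qty))
    return customers, products, headers, lines


customers, products, headers, lines = normalize(flat_orders)
```

Customer "Kim" is now stored once instead of once per order, which is the storage efficiency the slides attribute to normalization.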

Slide 18: Data Store 2 - The Integration Layer
What "extra" data must the integration layer hold?
- Surrogate keys
  - sequential numbers generated by the warehouse load programs
  - have no business meaning
  - benefits:
    - a single surrogate key can identify the same entity even when different source systems give it different keys
    - easier tracking of moving information
- Dates, statuses, and other fields
  - additional information held in the warehouse, e.g. insert date, last update date, status flag
  - supports auditing and makes it easy to identify the data to send to the data marts
Another note about dirty data
- Techniques for handling bad records:
  - ignoring them
  - rejecting bad records but saving them in a separate file for manual review
  - loading the bad record and flagging its errors for later review
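
Surrogate key assignment and the audit columns can be sketched in a few lines of Python. The class, its `alias` method for mapping a second system's key onto an existing surrogate, and the column names and status value are all hypothetical; a real load program would persist the key map in the warehouse rather than keep it in memory.

```python
from datetime import datetime, timezone
from itertools import count


class SurrogateKeyMap:
    """Hypothetical in-memory map from (source system, natural key) to a warehouse key."""

    def __init__(self):
        self._next = count(1)  # sequential numbers with no business meaning
        self._keys = {}

    def key_for(self, source_system, natural_key):
        """Return the existing surrogate key, or generate the next one."""
        ident = (source_system, natural_key)
        if ident not in self._keys:
            self._keys[ident] = next(self._next)
        return self._keys[ident]

    def alias(self, source_system, natural_key, surrogate_key):
        """Record that another system's key refers to the same entity."""
        self._keys[(source_system, natural_key)] = surrogate_key


def with_audit_columns(row, status="A"):
    """Add insert date, last update date and status flag (illustrative names)."""
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return {**row, "insert_date": now, "last_update_date": now, "status_flag": status}


keys = SurrogateKeyMap()
k_sales = keys.key_for("SALES", "CUST-0042")
keys.alias("BILLING", "42", k_sales)        # same customer, different source key
k_billing = keys.key_for("BILLING", "42")   # reuses the existing surrogate
k_other = keys.key_for("SALES", "CUST-0077")
audited = with_audit_columns({"customer_key": k_sales, "name": "Kim"})
```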

Slide 19: Data Store 2 - The Integration Layer
(Figure: keys.)

Slide 20: Data Flow 2 - From the Integration Layer to the High Performance Query Structures
Data is extracted from the integration layer and inserted into the data marts.
- ETL (extract, transform, and load) is used to populate the data marts.
- Benefits of loading from the integration layer:
  - no cleansing and integration are needed
  - the records to load can be identified using timestamps
  - no new surrogate keys are created (they are only reused)
- Use of summary tables:
  - data marts differ from the data warehouse in that they hold some summaries of their atomic-level detail
  - so both the atomic-level data and the summary tables are loaded
- Oracle8i:
  - create a materialized view
  - refresh it automatically on every commit
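
A data-mart load that picks up changed rows by timestamp and maintains a summary table can be sketched with SQLite. SQLite has no materialized views, so the summary is rebuilt explicitly where Oracle could refresh a materialized view instead. Table and column names are hypothetical, and the sketch assumes integration-layer rows are append-only; updated rows would need a delete-then-insert or upsert step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Integration layer: already cleansed and already carrying surrogate keys.
CREATE TABLE il_sales (customer_key INTEGER, sale_date TEXT, amount REAL, last_update TEXT);
-- Data mart: atomic-level fact table plus a daily summary table.
CREATE TABLE mart_sales (customer_key INTEGER, sale_date TEXT, amount REAL);
CREATE TABLE mart_sales_by_day (sale_date TEXT PRIMARY KEY, total REAL);
""")


def load_mart(conn, since):
    """Copy rows changed after `since` into the mart, then refresh the summary."""
    # No cleansing and no new surrogate keys: rows are copied as they are.
    # Assumes append-only source rows; the caller must advance `since` each run.
    conn.execute(
        "INSERT INTO mart_sales (customer_key, sale_date, amount) "
        "SELECT customer_key, sale_date, amount FROM il_sales WHERE last_update > ?",
        (since,),
    )
    # Rebuild the summary from the atomic-level rows.
    conn.execute("DELETE FROM mart_sales_by_day")
    conn.execute(
        "INSERT INTO mart_sales_by_day (sale_date, total) "
        "SELECT sale_date, SUM(amount) FROM mart_sales GROUP BY sale_date"
    )
    conn.commit()


conn.executemany("INSERT INTO il_sales VALUES (?, ?, ?, ?)", [
    (1, "2024-06-01", 100.0, "2024-06-01T23:00"),
    (2, "2024-06-01", 50.0, "2024-06-01T23:10"),
    (1, "2024-06-02", 70.0, "2024-06-02T22:00"),
])
load_mart(conn, since="2024-06-01T00:00")
summary = conn.execute(
    "SELECT sale_date, total FROM mart_sales_by_day ORDER BY sale_date"
).fetchall()
```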

Slide 21: Data Store 3 - High Performance Query Structures (HPQS)
- Databases and data structures that support end-user queries.
- Managed by either relational database engines or multidimensional database engines.
- The HPQS is a logical structure, not a physical one:
  - it can share the same computer with the data warehouse
  - but its table designs are physically different, making it easier and faster for end users to access than normalized database formats.
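
One common HPQS design is a star schema, in which a fact table joins directly to denormalized dimension tables. The slide does not name a specific design, so the star schema, table names, and data below are assumptions; the sketch shows that a typical end-user question needs only one join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Denormalized dimension: city and region sit on the customer row itself.
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, city TEXT, region TEXT);
CREATE TABLE fact_sales (customer_key INTEGER REFERENCES dim_customer, sale_date TEXT, amount REAL);
INSERT INTO dim_customer VALUES
    (1, 'Kim', 'Seoul', 'Capital'),
    (2, 'Lee', 'Busan', 'Southeast'),
    (3, 'Park', 'Incheon', 'Capital');
INSERT INTO fact_sales VALUES
    (1, '2024-06-01', 100.0),
    (2, '2024-06-01', 50.0),
    (3, '2024-06-02', 70.0);
""")

# "Total sales by region": one join from the fact table to one dimension.
rows = conn.execute("""
    SELECT d.region, SUM(f.amount) AS total
    FROM fact_sales AS f
    JOIN dim_customer AS d ON d.customer_key = f.customer_key
    GROUP BY d.region
    ORDER BY d.region
""").fetchall()
```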

Slide 22: Data Flow 3 - From the High Performance Query Structures to the End-User Reporting Applications
- Query tools issue SQL calls to relational databases.
- The data is returned to the tools and formatted.

Slide 23: Data Store 4 - Data in the End User's Hands
- Reports and analyses in the end user's hands are the last data store in the warehousing architecture.
- "How can I prevent a bad employee from selling warehouse data to one of our competitors?" The only way is to deny that employee access to the data in the first place.

Slide 24: Alternate Warehousing Architectures
Alternate Architecture 1 - No Warehouse
- If there is no demand for a warehouse, don't build one, e.g. when the transaction systems are strong and end-user queries are limited.
Alternate Architecture 2 - Normalized Design
- Data is integrated in the integration layer, and users query directly out of the integration layer.
- This gives the integration benefits, but not the usability and query performance of the HPQS.
Alternate Architecture 3 - Just Data Marts
- Build one or more data marts without a normalized integration layer.
- Suitable when there is no need to integrate data from multiple systems.