Copyright © 2016 Pearson Education, Inc. Modern Database Management, 12th Edition. Jeff Hoffer, Ramesh Venkataraman, Heikki Topi. CHAPTER 9: DATA WAREHOUSING


3 DEFINITIONS
- Data Warehouse: a subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making processes
  - Subject-oriented: data organized by customers, patients, students, products
  - Integrated: consistent naming conventions and formats; drawn from multiple data sources
  - Time-variant: can study trends and changes
  - Non-updatable: read-only, periodically refreshed
- Data Mart: a data warehouse that is limited in scope

4 NEED FOR DATA WAREHOUSING
- Integrated, company-wide view of high-quality information (from disparate databases)
- Separation of operational and informational systems and data (for improved performance)
  - Operational system: real-time, transaction processing systems
  - Informational system: decision support, read-only

5 ISSUES WITH COMPANY-WIDE VIEW
- Inconsistent key structures
- Synonyms
- Free-form vs. structured fields
- Inconsistent data values
- Missing data
See Figure 9-1 for examples.
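
A minimal sketch of how these integration issues might be reconciled when two sources describe the same person. The two records, their field names, and the chosen standard formats are all hypothetical; they are not taken from Figure 9-1.

```python
# Hypothetical records for the same student held in two source systems.
registrar = {"student_no": "123-45-6789", "name": "Smith, John", "sex": "M", "phone": None}
housing = {"StudentID": 123456789, "student_name": "John Smith", "gender": "male", "phone": "555-0100"}

# Inconsistent data values: map each source's coding onto one standard code.
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}


def normalize_key(value):
    """Inconsistent key structures: reduce any key format to a digits-only string."""
    return "".join(ch for ch in str(value) if ch.isdigit())


def integrate(reg, hou):
    """Merge two source records into one record with consistent names and formats."""
    key_reg = normalize_key(reg["student_no"])
    key_hou = normalize_key(hou["StudentID"])  # synonym for student_no
    if key_reg != key_hou:
        raise ValueError("records describe different students")
    return {
        "student_id": key_reg,
        "name": hou["student_name"],  # assumed standard: "First Last"
        "gender": GENDER_CODES[reg["sex"].lower()],
        "phone": reg["phone"] or hou["phone"],  # missing data filled from the other source
    }


integrated = integrate(registrar, housing)
print(integrated)
```

In practice these rules live in the ETL process, so every warehouse row follows one naming and coding convention regardless of which source supplied it.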

6 Figure 9-1 Examples of heterogeneous data

7 Figure 9-2 Independent data mart data warehousing architecture
- Data marts: mini-warehouses, limited in scope
- Separate ETL for each independent data mart
- Data access complexity due to multiple data marts

8 Figure 9-3 Dependent data mart with operational data store: a three-level architecture
- Single ETL for the enterprise data warehouse (EDW)
- Simpler data access
- ODS provides option for obtaining current data
- Dependent data marts loaded from EDW

9 Figure 9-4 Logical data mart and real-time warehouse architecture
- Near real-time ETL for the data warehouse
- ODS and data warehouse are one and the same
- Data marts are NOT separate databases, but logical views of the data warehouse
  - Easier to create new data marts
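
The idea of a data mart as a logical view can be sketched with an SQL view over a single warehouse table. This is an illustration only: the table `warehouse_sales`, the view `east_sales_mart`, and the sample rows are assumptions, and SQLite stands in for a real warehouse platform.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE warehouse_sales (
    sale_id INTEGER PRIMARY KEY,
    region  TEXT NOT NULL,
    product TEXT NOT NULL,
    amount  REAL NOT NULL
);
INSERT INTO warehouse_sales (region, product, amount) VALUES
    ('East', 'Widget', 100.0),
    ('West', 'Widget', 250.0),
    ('East', 'Gadget', 75.0);

-- The "data mart" is only a view: no separate database, no separate load.
CREATE VIEW east_sales_mart AS
    SELECT sale_id, product, amount
    FROM warehouse_sales
    WHERE region = 'East';
""")

east_rows = conn.execute(
    "SELECT product, amount FROM east_sales_mart ORDER BY sale_id"
).fetchall()

# A row loaded into the warehouse is visible through the view immediately.
conn.execute(
    "INSERT INTO warehouse_sales (region, product, amount) VALUES ('East', 'Gizmo', 40.0)"
)
east_count = conn.execute("SELECT COUNT(*) FROM east_sales_mart").fetchone()[0]
print(east_rows, east_count)
```

Creating another mart is one more `CREATE VIEW` statement, which is why this architecture makes new data marts easier to add.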

10 DATA CHARACTERISTICS: STATUS VS. EVENT DATA
- Event = a database action (create/update/delete) that results from a transaction
Figure 9-6 Example of DBMS log entry (figure labels: Status, Event)

11 DATA CHARACTERISTICS: TRANSIENT VS. PERIODIC DATA
With transient data, changes to existing records are written over previous records, thus destroying the previous data content.
Figure 9-7 Transient operational data

12 DATA CHARACTERISTICS: TRANSIENT VS. PERIODIC DATA
Periodic data are never physically altered or deleted once they have been added to the store.
Figure 9-8 Periodic warehouse data
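
The contrast between transient and periodic data can be sketched with two in-memory stores that receive the same updates. The account key, balances, and dates are hypothetical sample values.

```python
from datetime import date

transient = {}   # operational style: one current record per key
periodic = []    # warehouse style: append-only, each row carries a date


def update_transient(key, balance):
    """Overwrite the existing record; the previous content is lost."""
    transient[key] = balance


def append_periodic(key, balance, as_of):
    """Add a new dated row; earlier rows are never altered or deleted."""
    periodic.append({"key": key, "balance": balance, "as_of": as_of})


for as_of, balance in [(date(2016, 1, 5), 500), (date(2016, 1, 9), 750)]:
    update_transient("acct-1", balance)
    append_periodic("acct-1", balance, as_of)

history = [row["balance"] for row in periodic if row["key"] == "acct-1"]
print(transient)  # only the latest balance survives
print(history)    # both balances are still available for trend analysis
```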

13 STAR SCHEMA (DIMENSIONAL MODEL)
- Fact and dimension tables
- Each dimension table has a one-to-many relationship to the central fact table
- The primary key of a dimension table is a foreign key in the fact table
- The primary key of the fact table is a composite key consisting of all foreign keys
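
A minimal sketch of these key rules, using SQLite from Python. The table and column names (`product`, `period`, `store`, `sales`) are illustrative assumptions modeled on a typical sales star schema, not the textbook's exact definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE product (product_key INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE period  (period_key  INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
CREATE TABLE store   (store_key   INTEGER PRIMARY KEY, city TEXT);

-- Fact table: one foreign key per dimension; together they form the primary key.
CREATE TABLE sales (
    product_key  INTEGER NOT NULL REFERENCES product(product_key),
    period_key   INTEGER NOT NULL REFERENCES period(period_key),
    store_key    INTEGER NOT NULL REFERENCES store(store_key),
    units_sold   INTEGER,
    dollars_sold REAL,
    PRIMARY KEY (product_key, period_key, store_key)
);
""")

conn.execute("INSERT INTO product VALUES (1, 'Widget')")
conn.execute("INSERT INTO period VALUES (1, 2016, 1)")
conn.execute("INSERT INTO store VALUES (1, 'Austin')")
conn.execute("INSERT INTO sales VALUES (1, 1, 1, 10, 99.5)")


def try_insert(row):
    """Attempt to insert a fact row and report whether the constraints allowed it."""
    try:
        conn.execute("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


fk_result = try_insert((99, 1, 1, 5, 50.0))  # product 99 is not in the dimension
dup_result = try_insert((1, 1, 1, 3, 30.0))  # same composite key as the existing fact
print(fk_result, dup_result)
```

Both inserts are rejected: the first violates the dimension-to-fact foreign key, the second the composite primary key.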

14 Figure 9-9 Components of a star schema
- Fact tables contain factual or quantitative data
- Dimension tables contain descriptions about the subjects of the business
- 1:N relationship between dimension tables and fact tables
- Excellent for ad-hoc queries, but bad for online transaction processing
- Dimension tables are denormalized to maximize performance

15 Figure 9-10 Star schema example
Fact table provides statistics for sales broken down by product, period, and store dimensions.
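
As a sketch of what "broken down by" means in practice, the query below joins a sales fact table to two of its dimensions and totals dollars by product and quarter. The schema and sample rows are assumptions made for illustration; they are not the data from the textbook figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (product_key INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE period  (period_key  INTEGER PRIMARY KEY, quarter TEXT);
CREATE TABLE store   (store_key   INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE sales (
    product_key INTEGER, period_key INTEGER, store_key INTEGER,
    dollars_sold REAL,
    PRIMARY KEY (product_key, period_key, store_key)
);
INSERT INTO product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO period  VALUES (1, '2016-Q1'), (2, '2016-Q2');
INSERT INTO store   VALUES (1, 'Austin'), (2, 'Boston');
INSERT INTO sales VALUES
    (1, 1, 1, 100.0), (1, 1, 2, 150.0), (2, 1, 1, 80.0), (1, 2, 1, 120.0);
""")

# Ad-hoc question: total dollars per product per quarter, across all stores.
query = """
SELECT p.description, t.quarter, SUM(f.dollars_sold)
FROM sales AS f
JOIN product AS p ON p.product_key = f.product_key
JOIN period  AS t ON t.period_key  = f.period_key
GROUP BY p.description, t.quarter
ORDER BY p.description, t.quarter
"""
totals = conn.execute(query).fetchall()
for row in totals:
    print(row)
```

Swapping `period` for `store` in the join and the `GROUP BY` answers a different question with no change to the schema.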

16 Figure 9-11 Star schema with sample data

17 ISSUES REGARDING STAR SCHEMA
- Dimension table keys should be surrogate (non-intelligent and non-business related), because:
  - Keys may change over time
  - Length/format consistency
- Granularity of fact table: level of detail (in time, location, product grouping, etc.) for each record in the fact table
  - Transactional grain: finest level
  - Aggregated grain: more summarized
  - Finer grain → more dimension tables, more rows in fact table
- Duration of the database: how much history should be kept?
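
A small sketch of two of these issues: assigning surrogate keys in place of business keys, and rolling a transactional grain up to a coarser, aggregated grain. The SKU codes, dates, and quantities are hypothetical.

```python
from collections import defaultdict
from itertools import count

_next_key = count(1)
surrogate_by_business_key = {}


def surrogate_key(business_key):
    """Return a meaningless integer key, creating one on first sight of a business key."""
    if business_key not in surrogate_by_business_key:
        surrogate_by_business_key[business_key] = next(_next_key)
    return surrogate_by_business_key[business_key]


# Source transactions: (business product code, sale date, quantity)
transactions = [
    ("SKU-A17", "2016-03-01", 2),
    ("SKU-A17", "2016-03-01", 1),
    ("SKU-B02", "2016-03-01", 5),
    ("SKU-A17", "2016-03-02", 4),
]

# Transactional grain: one fact row per source transaction.
transactional_grain = [(surrogate_key(sku), day, qty) for sku, day, qty in transactions]

# Aggregated grain: one fact row per product per day.
daily = defaultdict(int)
for product_key, day, qty in transactional_grain:
    daily[(product_key, day)] += qty
aggregated_grain = sorted((key, day, qty) for (key, day), qty in daily.items())

print(len(transactional_grain), "rows at transactional grain")
print(len(aggregated_grain), "rows at daily grain")
```

If a business code such as `SKU-A17` is later renamed, only the lookup from business key to surrogate key changes; the fact rows keep the same integer key.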

18 VARIATIONS OF THE STAR SCHEMA
- Multiple Fact Tables
  - Can improve performance
  - Often used to store facts for different combinations of dimensions
  - Conformed dimensions
- Factless Fact Tables
  - No nonkey data, but foreign keys for associated dimensions
  - Used for:
    - Tracking events
    - Inventory coverage

19 Figure 9-13 Conformed dimensions
- Conformed dimension: associated with multiple fact tables
- Two fact tables → two (connected) star schemas
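
A sketch of why a conformed dimension is useful: two fact tables that share one `product` dimension can be compared side by side. The `sales` and `shipments` tables and their rows are assumptions for illustration, not the tables shown in Figure 9-13.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One dimension shared (conformed) across two fact tables.
CREATE TABLE product (product_key INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE sales (
    product_key INTEGER REFERENCES product(product_key),
    units_sold  INTEGER
);
CREATE TABLE shipments (
    product_key   INTEGER REFERENCES product(product_key),
    units_shipped INTEGER
);
INSERT INTO product   VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO sales     VALUES (1, 10), (1, 5), (2, 7);
INSERT INTO shipments VALUES (1, 20), (2, 7), (2, 3);
""")

# Each fact table is summarized on its own, then lined up on the shared
# dimension key; joining the two fact tables directly would double-count rows.
comparison = conn.execute("""
SELECT p.description,
       (SELECT SUM(s.units_sold)    FROM sales     AS s WHERE s.product_key = p.product_key),
       (SELECT SUM(h.units_shipped) FROM shipments AS h WHERE h.product_key = p.product_key)
FROM product AS p
ORDER BY p.product_key
""").fetchall()
print(comparison)
```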

20 Figure 9-14a Factless fact table showing occurrence of an event
- No data in fact table, just keys associating dimension records
- Fact table forms an n-ary relationship between dimensions
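
A minimal sketch of a factless fact table that records an event, here class attendance. The dimensions (`student`, `course`, `period`) and the sample rows are hypothetical. Because the fact table holds no measures, analysis relies on counting rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_key INTEGER PRIMARY KEY, name  TEXT);
CREATE TABLE course  (course_key  INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE period  (period_key  INTEGER PRIMARY KEY, day   TEXT);

-- Factless fact table: foreign keys only, no nonkey columns.
CREATE TABLE attendance (
    student_key INTEGER NOT NULL REFERENCES student(student_key),
    course_key  INTEGER NOT NULL REFERENCES course(course_key),
    period_key  INTEGER NOT NULL REFERENCES period(period_key),
    PRIMARY KEY (student_key, course_key, period_key)
);
INSERT INTO student VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO course  VALUES (1, 'Databases');
INSERT INTO period  VALUES (1, '2016-09-01'), (2, '2016-09-08');
INSERT INTO attendance VALUES (1, 1, 1), (2, 1, 1), (1, 1, 2);
""")

# The existence of a row is the fact, so the measure is a row count.
attendance_by_day = conn.execute("""
SELECT t.day, COUNT(*)
FROM attendance AS a
JOIN period AS t ON t.period_key = a.period_key
GROUP BY t.day
ORDER BY t.day
""").fetchall()
print(attendance_by_day)
```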

21 NORMALIZATION ISSUES IN DIMENSION TABLES
- Include all information for a dimension table in a single denormalized table
- Normalize the dimension into a snowflake schema
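
The two options can be sketched side by side: a single denormalized product dimension, and the same dimension snowflaked into `product_snow` plus a separate `category` table. All table names, columns, and rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Option 1: one denormalized dimension table (category details repeat per product).
CREATE TABLE product_flat (
    product_key      INTEGER PRIMARY KEY,
    description      TEXT,
    category_name    TEXT,
    category_manager TEXT
);
INSERT INTO product_flat VALUES
    (1, 'Widget',  'Hardware', 'Lee'),
    (2, 'Bracket', 'Hardware', 'Lee'),
    (3, 'Manual',  'Books',    'Kim');

-- Option 2: snowflake; category details are stored once and referenced by key.
CREATE TABLE category (
    category_key     INTEGER PRIMARY KEY,
    category_name    TEXT UNIQUE,
    category_manager TEXT
);
CREATE TABLE product_snow (
    product_key  INTEGER PRIMARY KEY,
    description  TEXT,
    category_key INTEGER REFERENCES category(category_key)
);
INSERT INTO category (category_name, category_manager)
    SELECT DISTINCT category_name, category_manager FROM product_flat;
INSERT INTO product_snow
    SELECT f.product_key, f.description, c.category_key
    FROM product_flat AS f
    JOIN category AS c ON c.category_name = f.category_name;
""")

flat = conn.execute("""
SELECT product_key, description, category_name, category_manager
FROM product_flat ORDER BY product_key
""").fetchall()

# The snowflake version needs an extra join to return the same rows.
snow = conn.execute("""
SELECT p.product_key, p.description, c.category_name, c.category_manager
FROM product_snow AS p
JOIN category AS c ON c.category_key = p.category_key
ORDER BY p.product_key
""").fetchall()

category_count = conn.execute("SELECT COUNT(*) FROM category").fetchone()[0]
print(flat == snow, category_count)
```

The trade-off in this sketch is visible in the queries: the denormalized table answers with no join, while the snowflake removes the repeated category values at the cost of one more join.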
