Presentation is loading. Please wait.

Presentation is loading. Please wait.

Data Warehouse IMS5024 – presented by Eder Tsang.

Similar presentations


Presentation on theme: "Data Warehouse IMS5024 – presented by Eder Tsang."— Presentation transcript:

1 Data Warehouse IMS5024 – presented by Eder Tsang

2 Data Warehouse  A data warehouse is a system consisting of processes and databases used to provide the “data infrastructure” for EIS and DSS  “… a subject-oriented, integrated, timevariant, and non- volatile collection of data in support of management’s decisions” Inmon and Hackathorn (1994 )

3 Data warehouse - subject oriented  The data warehouse is organised by “data subjects” that are relevant to the organisation. – Customer, claim, shipment, product  This may be contrasted with the process orientation of many OLTP systems

4 Data warehouse - integrated  Data in the warehouse is structured based on a corporate- wide model, spanning the functional boundaries of legacy systems  This includes naming standards, units of measurement and periodicity

5 Data warehouse - time variant  Data is the data warehouse is characterised by the time- series nature of historical data  The data consists of a series of “snapshots” which are time-stamped and record values at a moment in time  This supports trend analysis of the data

6 Data warehouse - non volatile  The data warehouse is not continuously updated (inserts, eletes and changes) like data in an OLTP system  Data in a data warehouse is periodically up-loaded at a scheduled time intervals (say daily)

7 Motivations for data warehousing  Demands on OLTP data bases for query processing would be too great  Data warehousing is designed for efficient retrieval  Data in legacy systems is frequently inconsistent, of poor quality and stored in different formats  Reduce costs in providing data for decision making

8 Motivations for data warehousing  Support for focus on complete business processes (BPR)  Support for new initiatives – CRM, Balanced Scorecard  Industry sources quote ROI’s averaging 401% over 3 years  Remain competitive

9 Typical Data Warehouse Architecture

10 An Actual Data Warehouse

11 Data warehouse development  Requirements identification  Logical design, data modelling  Data extract, transform and load (ETL)  Warehouse architecture, technology and tools  Physical database design  Delivery systems  Operational policies

12 Designing a data warehouse – data design There are two main approaches to data modelling or data warehouse design – entity relationship modelling and normalisation – dimensional modelling

13 The design of databases using a traditional E-R approach  Entities and relationships  Normalisation 3NF

14 Entity relationship schema

15 Why do we normalise data?  Normalisation is a process for converting complex data structures into simple, stable data structures  Normalisation protects integrity of database by avoiding anomalies (update, delete, create) Normalised data models are: robust and stable have minimum redundancy

16 Dimensional Modeling (star schema)

17 Components of dimensional model: –Fact Tables : contain measurements of business eg. Sales, purchase order, shipment – Dimension Tables : store the descriptions of the dimensions of the business eg. Product, customer, vendor, store

18 Dimensional Modeling (star schema)  Each dimension table has a single primary key that corresponds exactly to one of the components of the multipart key in the fact table.  A fact table always expresses a many to many relationship (the key is composed of foreign keys  The most useful facts in a fact table are numeric and additive ( typically values are added up)

19 Snowflake schema  Snowflake schema –all the tables are normalised  Star schemas are preferable to snowflake – fewer joins for information retrieval

20 Dimensional Modelling vs E-R modelling  the purpose of dimensional modelling structure data for easy and efficient analysis  E-R modelling creates a single required to support organisation’s Whereas  DM creates individual models for business/decision interest eg. model for sales info model for Inventory info

21 Entity relationship schema (3NF)

22 Corresponding Star schema

23 Corresponding snowflake schema

24 Dimensional Modelling vs E-R modelling (Con’t)  OLTP and DW have different purpose: – operational vs informational  Normalisation protects integrity of database by avoiding anomalies (update, delete, create)  Data models for data warehouse do not have to be normalised – In contrast, data in DW does not change often – periodic additions of new data

25 DM vs. E-R modeling debate (Kimball’s view)  OLTP systems are volatile – high rates of update transactions  In normalised models the goal is to reduce data redundancy and prevent update anomalies  Data in a data warehouse does not need to be normalised because it is periodically refreshed not updated by user transactions


Download ppt "Data Warehouse IMS5024 – presented by Eder Tsang."

Similar presentations


Ads by Google