Data Warehousing DSCI 4103 Dr. Mennecke Introduction and Chapter 1.

1 Data Warehousing DSCI 4103 Dr. Mennecke Introduction and Chapter 1

2 Introduction
• Definitions
  – Legacy Systems
  – Dimensions
  – Data Dependencies Model
  – Dimensional Model

3 An ER Model
[Figure: entity-relationship diagram. Entities shown: Ship Type, Shipper, District Credit, Order Item, Ship To, Product, Contact Locat., Product Line, Sales Order, Cust. Locat., Product Group, Contract, Contract Type, Customer, Sales Rep, Sales District, Sales Region, Sales Division, Contact]

4 A Dimensional Model
[Figure: dimensional model with three dimensions: Product, Market, Time]

5 Why Data Warehouses?
• To meet the long-sought-after goal of providing the user with more flexible databases containing data that can be accessed “every which way.”

6 OLTP vs. OLAP
• OLTP (Online transaction processing) has been the standard reason for IS and DP for the last thirty years. Most legacy systems are quite good at capturing data but do not facilitate data access.
• OLAP (Online analytical processing) is a set of procedures for defining and using a dimension framework for decision support.

7 The Goals for and Characteristics of a DW
• Make organizational data accessible
• Facilitate consistency
• Adaptable and yet resilient to change
• Secure and reliable
• Designed with a focus on supporting decision making

8 The Goals for and Characteristics of a DW
• Generate an environment in which data can be sliced and diced in multiple ways
• It is more than data; it is a set of tools to query, analyze, and present information
• The DW is the place where operational data is published (cleaned up, assembled, etc.)

9 Basic elements of the data warehouse
[Figure: four components linked left to right by Extract, Load, and Access]
• Operational Source Systems
  (Extract)
• Data Staging Area
  – Services: Clean, combine, and standardize; conform dimensions; no user query services
  – Data Store: Flat files and relational tables
  – Processing: Sorting and sequential processing
  (Load)
• Data Presentation Area
  – Data Mart #1: Dimensional; atomic and summary data; based on a single business process
  – DW Bus: Conformed facts and dimensions
  – Data Mart #2: Similar design
  (Access)
• Data Access Tools
  – Ad hoc query tools
  – Report writers
  – Analytical applications
  – Modeling: Forecasting, scoring, data mining

10 Data Staging Area
• Extract-Transformation-Load
  – Extract: Reading the source data and copying the data to the staging area
  – Transformation: Cleaning, combining, de-duplicating, and assigning keys
  – Load: Presenting data to the bulk loading facilities of the data mart
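
The three ETL steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source rows, the field names (`cust`, `region`, `amount`), and the in-memory `data_mart` dictionary are hypothetical stand-ins for an operational system and a bulk loader, not part of the course material.

```python
# Hypothetical rows as they might be read from an operational source system.
SOURCE_ROWS = [
    {"cust": " Acme Corp ", "region": "north", "amount": "120.50"},
    {"cust": "acme corp", "region": "North", "amount": "79.50"},
    {"cust": "Bolt Ltd", "region": "SOUTH", "amount": "40.00"},
]


def extract(source):
    """Extract: read the source rows and copy them into the staging area."""
    return [dict(row) for row in source]


def transform(staged):
    """Transform: clean values, merge duplicate customers, assign surrogate keys."""
    customer_dim = {}  # cleaned customer name -> dimension row
    facts = []
    for row in staged:
        name = row["cust"].strip().title()      # cleaning
        region = row["region"].strip().title()  # standardizing
        if name not in customer_dim:            # one dimension row per customer
            customer_dim[name] = {
                "customer_key": len(customer_dim) + 1,  # surrogate key
                "customer_name": name,
                "region": region,
            }
        facts.append({
            "customer_key": customer_dim[name]["customer_key"],
            "amount": float(row["amount"]),
        })
    return list(customer_dim.values()), facts


def load(target, dimension_rows, fact_rows):
    """Load: hand the prepared rows to the data mart in bulk."""
    target["customer_dim"].extend(dimension_rows)
    target["sales_fact"].extend(fact_rows)


data_mart = {"customer_dim": [], "sales_fact": []}
dimension_rows, fact_rows = transform(extract(SOURCE_ROWS))
load(data_mart, dimension_rows, fact_rows)
```

After the run, the two spellings of the same customer share one dimension row and one surrogate key, while all three source rows survive as facts.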

11 Organization of data in the presentation area of the data warehouse
• Data in the warehouse are dimensional, not normalized relations
  – However, data that are ultimately presented in the data warehouse will often be derived directly from relational DBs
• Data should be atomic someplace in the warehouse, even if the presentation is aggregate
• Uses the bus architecture to support a decentralized set of data marts

12 Updates to a data warehouse
• For many years, the dogma stated that data warehouses are never updated.
• This is unrealistic since labels, titles, etc. change.
• Some components will, therefore, be changed, albeit via a managed load (as opposed to transactional updates).

13 Dimensional Modeling Terms and Concepts
• Fact table
• Dimension tables

14 Fact Tables
• Fact table: a table in the data warehouse that contains
  – Numerical performance measures
  – Foreign keys that tie the fact table to the dimension tables

15 Fact Tables
• Each row records a measurement describing a transaction
  – Where?
  – When?
  – Who?
  – How much?
  – How many?
• The level of detail represented by this data is referred to as the grain of the data warehouse
  – Questions can only be asked down to a level corresponding with the grain of the data warehouse
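
A small sketch may help make the grain concrete. It assumes a hypothetical fact table whose grain is one row per product per day; the rows and the `roll_up` helper are illustrative only.

```python
from collections import defaultdict

# Hypothetical fact rows at a grain of one row per product per day.
daily_sales = [
    {"date": "2024-01-05", "product": "widget", "units_sold": 3},
    {"date": "2024-01-05", "product": "gadget", "units_sold": 1},
    {"date": "2024-01-20", "product": "widget", "units_sold": 2},
    {"date": "2024-02-02", "product": "widget", "units_sold": 4},
]


def roll_up(rows, key_func):
    """Aggregate units_sold to any level at or above the stored grain."""
    totals = defaultdict(int)
    for row in rows:
        totals[key_func(row)] += row["units_sold"]
    return dict(totals)


# Rolling up from day to month, or across all days per product, works.
units_by_month = roll_up(daily_sales, lambda r: r["date"][:7])
units_by_product = roll_up(daily_sales, lambda r: r["product"])

# A question below the grain (units sold per hour, say) cannot be answered:
# no row records the hour, so that detail was never captured.
```

Coarser questions are answered by aggregation, while finer ones would require reloading the facts at a lower grain.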

16 Fact Tables
• Fact tables contain numeric data that can be one of three types
  – Additive
  – Semi-additive
  – Non-additive
• Fact tables contain foreign keys
  – A group of foreign keys will be used to create a concatenated primary key
• Fact tables generally don’t contain textual data
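
The three kinds of facts behave differently under summation. The sketch below uses the usual reading of the terms: additive facts can be summed across every dimension, semi-additive facts across some dimensions but not others (typically not time), and non-additive facts across none. The rows and column names (`units_sold`, `on_hand`, `unit_price`) are hypothetical.

```python
# Hypothetical fact rows: one row per store per day.
rows = [
    {"day": 1, "store": "A", "units_sold": 10, "on_hand": 100, "unit_price": 2.0},
    {"day": 1, "store": "B", "units_sold": 5, "on_hand": 50, "unit_price": 4.0},
    {"day": 2, "store": "A", "units_sold": 20, "on_hand": 80, "unit_price": 2.0},
    {"day": 2, "store": "B", "units_sold": 5, "on_hand": 45, "unit_price": 4.0},
]

# Additive: units_sold can be summed across stores and across days.
total_units = sum(r["units_sold"] for r in rows)

# Semi-additive: on_hand (an inventory level) can be summed across stores
# for a single day, but summing it across days would double count stock.
on_hand_day_2 = sum(r["on_hand"] for r in rows if r["day"] == 2)
days = sorted({r["day"] for r in rows})
daily_on_hand = [sum(r["on_hand"] for r in rows if r["day"] == d) for d in days]
average_on_hand = sum(daily_on_hand) / len(daily_on_hand)  # average over time

# Non-additive: unit_price cannot be summed at all. A meaningful average
# price is recomputed from additive components (revenue and units).
revenue = sum(r["unit_price"] * r["units_sold"] for r in rows)
average_price = revenue / total_units
```

Storing the additive components (here, units and the revenue derived from them) keeps non-additive ratios recomputable at any level of aggregation.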

17 Dimension tables
• Tables containing textual descriptors of the business
  – Dimension tables are usually wide (e.g., 100 columns)
  – Dimension tables are usually shallow (hundreds of thousands or a few million rows)
  – Values in the dimensions usually provide
    · Constraints on queries (e.g., view customer by region)
    · Report headings

18 Dimension tables
• The quality of the dimensions will determine the quality of the data warehouse; that is, the DW is only as good as its dimension attributes
• Dimensions are often split into hierarchical branches (i.e., snowflakes) because of the hierarchical nature of organizations
  – Product part → Product → Brand
• Dimensions are usually highly denormalized
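
To illustrate what denormalizing a hierarchy means, the sketch below flattens a hypothetical part, product, and brand hierarchy into one wide dimension row per part. The table contents and column names are invented for the example.

```python
# Hypothetical normalized (snowflaked) hierarchy: part -> product -> brand.
brands = {1: {"brand_name": "Northwind"}}
products = {
    10: {"product_name": "Trail Bike", "brand_id": 1},
    11: {"product_name": "Road Bike", "brand_id": 1},
}
parts = [
    {"part_id": 100, "part_name": "Frame", "product_id": 10},
    {"part_id": 101, "part_name": "Saddle", "product_id": 10},
    {"part_id": 102, "part_name": "Frame", "product_id": 11},
]


def flatten(part_rows, product_table, brand_table):
    """Build one denormalized dimension row per part, repeating the
    product and brand attributes on every row."""
    dimension = []
    for part in part_rows:
        product = product_table[part["product_id"]]
        brand = brand_table[product["brand_id"]]
        dimension.append({
            "part_key": part["part_id"],
            "part_name": part["part_name"],
            "product_name": product["product_name"],
            "brand_name": brand["brand_name"],
        })
    return dimension


part_dim = flatten(parts, products, brands)
```

The brand and product names are stored redundantly, which costs some space but lets a query constrain or group by any level of the hierarchy without extra joins.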

19 Dimension tables
• The dimension attributes define the constraints for the DW. Without good dimensions, it becomes difficult to narrow down on a solution when the DW is used for decision support.

20 Bringing together facts and dimensions – Building the dimensional Model
• Start with the normalized ER Model
• Group the ER diagram components into segments based on common business processes and model each as a unit
• Find M:M relationships in the model with numeric and additive non-key facts and include them in a fact table
• Denormalize the other tables as needed and designate one field as a primary key

21 A Dimensional Model
[Figure: star schema with one fact table and three dimension tables]
• Time Dimension: time_key, day_of_Week, month, quarter, year, holiday_flag
• Sales Fact: time_key, product_key, store_key, dollars_sold, units_sold, dollars_cost
• Product Dimension: product_key, description, brand, category
• Store Dimension: store_key, store_name, address, floor_plan_type
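
The schema on this slide can be tried out with Python's built-in `sqlite3` module. The column names follow the slide; the table names (`time_dim`, `product_dim`, `store_dim`, `sales_fact`), the column types, and the sample rows are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (
    time_key INTEGER PRIMARY KEY, day_of_week TEXT, month INTEGER,
    quarter INTEGER, year INTEGER, holiday_flag INTEGER);
CREATE TABLE product_dim (
    product_key INTEGER PRIMARY KEY, description TEXT, brand TEXT,
    category TEXT);
CREATE TABLE store_dim (
    store_key INTEGER PRIMARY KEY, store_name TEXT, address TEXT,
    floor_plan_type TEXT);
-- The fact table's primary key is the concatenation of its foreign keys.
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    product_key INTEGER REFERENCES product_dim(product_key),
    store_key INTEGER REFERENCES store_dim(store_key),
    dollars_sold REAL, units_sold INTEGER, dollars_cost REAL,
    PRIMARY KEY (time_key, product_key, store_key));
""")

# Hypothetical sample rows.
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "Mon", 1, 1, 2024, 0),
    (2, "Tue", 4, 2, 2024, 0),
])
conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?, ?)", [
    (1, "Cola 12oz", "FizzCo", "Beverages"),
    (2, "Chips 8oz", "Crunchy", "Snacks"),
])
conn.execute("INSERT INTO store_dim VALUES (1, 'Downtown', '1 Main St', 'compact')")
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 1, 1, 100.0, 50, 60.0),
    (1, 2, 1, 40.0, 20, 25.0),
    (2, 1, 1, 150.0, 75, 90.0),
])

# Dimension attributes supply the grouping columns (report headings);
# the fact table supplies the additive measures.
report = conn.execute("""
    SELECT t.quarter, p.brand, SUM(f.dollars_sold), SUM(f.units_sold)
    FROM sales_fact AS f
    JOIN time_dim AS t ON t.time_key = f.time_key
    JOIN product_dim AS p ON p.product_key = f.product_key
    GROUP BY t.quarter, p.brand
    ORDER BY t.quarter, p.brand
""").fetchall()
```

Swapping `p.brand` for `p.category`, or `t.quarter` for `t.month`, slices the same facts another way without changing the fact table.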

22 So, What is a DW?
• “A data warehouse is a subject-oriented, integrated, non-volatile, and time-variant collection of data in support of management’s decisions.”
  W.H. Inmon (the father of DW)

23 Subject Oriented
• Data in a data warehouse are organized around the major subjects of the organization

24 Integrated
• Data from multiple sources are standardized (scrubbed, cleansed, etc.) and brought into one environment

25 Non-Volatile
• Once added to the DW, data are not changed (barring the existence of major errors)

26 Time Variant
• The DW captures data at a specific moment; thus, it is a snapshot view of the organization at that moment in time. As these snapshots accumulate, the analyst is able to examine the organization over time (a time series!)
• The snapshot is called a production data extract
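
A minimal sketch of how accumulated snapshots become a time series follows. It assumes an append-only list standing in for the warehouse and a hypothetical `load_snapshot` helper that stamps each extract with its date; the customer and balance values are invented.

```python
warehouse = []  # append-only: earlier snapshots are never modified


def load_snapshot(snapshot_date, extract_rows):
    """Stamp each row of a production data extract with its snapshot
    date and append it to the warehouse."""
    for row in extract_rows:
        warehouse.append({"snapshot_date": snapshot_date, **row})


# Hypothetical month-end extracts from an operational system.
load_snapshot("2024-01-31", [{"customer": "Acme", "balance": 500}])
load_snapshot("2024-02-29", [{"customer": "Acme", "balance": 650}])
load_snapshot("2024-03-31", [{"customer": "Acme", "balance": 600}])

# Each snapshot describes one moment; together they form a time series.
acme_series = [
    (row["snapshot_date"], row["balance"])
    for row in warehouse
    if row["customer"] == "Acme"
]
```

Because rows are only ever appended, the January balance remains available after later extracts arrive, which is what lets the analyst compare the organization across periods.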

