Presentation is loading. Please wait.

Presentation is loading. Please wait.

Chapter 13 The Data Warehouse.

Similar presentations


Presentation on theme: "Chapter 13 The Data Warehouse."— Presentation transcript:

1 Chapter 13 The Data Warehouse

2 The Need for Data Analysis
Managers must be able to track daily transactions to evaluate how the business is performing By tapping into the operational database, management can develop strategies to meet organizational goals Data analysis can provide information about short-term tactical evaluations and strategies

3 Solving Business Problems and Adding Value with Data Warehouse-Based Solutions

4 Solving Business Problems and Adding Value with Data Warehouse-Based Solutions (continued)

5 Decision Support Systems
Methodology (or series of methodologies) designed to extract information from data and to use such information as a basis for decision making Decision support system (DSS): Arrangement of computerized tools used to assist managerial decision making within a business Usually requires extensive data “massaging” to produce information Used at all levels within an organization Often tailored to focus on specific business areas Provides ad hoc query tools to retrieve data and to display data in different formats

6 Decision Support Systems (continued)
Composed of four main components: Data store component Basically a DSS database Data extraction and filtering component Used to extract and validate data taken from operational database and external data sources End-user query tool Used to create queries that access database End-user presentation tool Used to organize and present data

7 Main Components of a Decision Support System (DSS)

8 Transforming Operational Data Into Decision Support Data

9 Contrasting Operational and DSS Data Characteristics

10 The Data Warehouse Integrated, subject-oriented, time-variant, nonvolatile database that provides support for decision making

11 A Comparison of Data Warehouse and Operational Database Characteristics

12 Creating a Data Warehouse

13 Scrub or data cleansing Transform Load and Index
The ETL Process Capture/Extract Scrub or data cleansing Transform Load and Index ETL = Extract, transform, and load

14 Figure 11-10: Steps in data reconciliation
Capture/Extract…obtaining a snapshot of a chosen subset of the source data for loading into the data warehouse Figure 11-10: Steps in data reconciliation Incremental extract = capturing changes that have occurred since the last static extract Static extract = capturing a snapshot of the source data at a point in time

15 Figure 11-10: Steps in data reconciliation
Scrub/Cleanse…uses pattern recognition and AI techniques to upgrade data quality Figure 11-10: Steps in data reconciliation (cont.) Fixing errors: misspellings, erroneous dates, incorrect field usage, mismatched addresses, missing data, duplicate data, inconsistencies Also: decoding, reformatting, time stamping, conversion, key generation, merging, error detection/logging, locating missing data

16 Cleanse: Process to identify erroneous data, not to fix them Fixes are made at the source Scrubbing: A technique using pattern recognition and other AI techniques to upgrade the quality of data

17 Figure 11-10: Steps in data reconciliation
Transform = convert data from format of operational system to format of data warehouse Figure 11-10: Steps in data reconciliation (cont.) Record-level: Selection–data partitioning Joining–data combining Aggregation–data summarization Field-level: single-field–from one field to one field multi-field–from many fields to one, or one field to many

18 Figure 11-10: Steps in data reconciliation
Load/Index= place transformed data into the warehouse and create indexes Figure 11-10: Steps in data reconciliation (cont.) Refresh mode: bulk rewriting of target data at periodic intervals Update mode: only changes in source data are written to data warehouse

19 Figure 11-11: Single-field transformation
In general–some transformation function translates data from old form to new form Algorithmic transformation uses a formula or logical expression Table lookup–another approach, uses a separate table keyed by source record code

20 Figure 11-12: Multifield transformation
M:1–from many source fields to one target field 1:M–from one source field to many target fields

21 Derived Data Objectives Characteristics
Ease of use for decision support applications Fast response to predefined user queries Customized data for particular target audiences Ad-hoc query support Data mining capabilities Characteristics Detailed (mostly periodic) data Aggregate (for summary) Distributed (to departmental servers) Most common data model = star schema (also called “dimensional model”)

22 Star Schemas Data modeling technique used to map multidimensional decision support data into a relational database Creates the near equivalent of a multidimensional database schema from the existing relational database Yield an easily implemented model for multidimensional data analysis, while still preserving the relational structures on which the operational database is built Has four components: facts, dimensions, attributes, and attribute hierarchies

23 Figure 11-13 Components of a star schema
Fact tables contain factual or quantitative data 1:N relationship between dimension tables and fact tables Dimension tables are denormalized to maximize performance Dimension tables contain descriptions about the subjects of the business Excellent for ad-hoc queries, but bad for online transaction processing

24 Figure 11-14 Star schema example
Fact table provides statistics for sales broken down by product, period and store dimensions

25 Figure 11-15 Star schema with sample data

26 Size of fact table assume: Total number of stores=1000
Total number of products=10,000 Total number of periods= 24 (two years) Assume 50% of products record sales Then total rows in fact table 1000stores* 5000 active products)*24 months =120,000,000 rows

27 Size of “fact” table Assume there are 6 fields each 4 bytes long, then total size 120,00,000* 6 fields* 4bytes/field =2,880,000,000 (2.88 gigabytes) If instead of monthly, you record daily data Multiply above by 30 (30 days/per month)

28 Online Analytical Processing
Advanced data analysis environment that supports decision making, business modeling, and operations research OLAP systems share four main characteristics: Use multidimensional data analysis techniques Provide advanced database support Provide easy-to-use end-user interfaces Support client/server architecture

29 Operational vs. Multidimensional View of Sales

30 Another example: Simple Star Schema

31 Possible Attributes for Sales Dimensions

32 Three-Dimensional View of Sales

33 Slice and Dice View of Sales

34 Location Attribute Hierarchy

35 Star Schema for Sales

36 Orders Star Schema

37 Normalized Dimension tables

38 Multiple Fact Tables

39 On-Line Analytical Processing (OLAP) Tools
The use of a set of graphical tools that provides users with multidimensional views of their data and allows them to analyze the data using simple windowing techniques Relational OLAP (ROLAP) Traditional relational representation Multidimensional OLAP (MOLAP) Cube structure OLAP Operations Cube slicing–come up with 2-D view of data Drill-down–going from summary to more detailed views

40 Figure 11-23 Slicing a data cube

41 Figure 11-24 Example of drill-down Summary report
Starting with summary data, users can obtain details for particular cells Drill-down with color added

42 Implementing a Data Warehouse
Numerous constraints: Available funding Management’s view of the role played by an IS department and of the extent and depth of the information requirements Corporate culture No single formula can describe perfect data warehouse development

43 Factors Common to Data Warehousing
Data warehouse is not a static database Dynamic framework for decision support that is always a work in progress Data warehouse data cross departmental lines and geographical boundaries Must satisfy: Data integration and loading criteria Data analysis capabilities with acceptable query performance End-user data analysis needs Apply database design procedures

44 Data Mining Tools that: Require minimal end-user intervention
analyze data uncover problems or opportunities hidden in data relationships, form computer models based on their findings, and then use the models to predict business behavior Require minimal end-user intervention

45 Extraction of Knowledge From Data

46 Data-Mining Phases

47 A Sample of Current Data Warehousing and Data-Mining Vendors

48 Data Mining and Visualization
Knowledge discovery using a blend of statistical, AI, and computer graphics techniques Goals: Explain observed events or conditions Confirm hypotheses Explore data for new or unexpected relationships Techniques Statistical regression Decision tree induction Clustering and signal processing Affinity Sequence association Case-based reasoning Rule discovery Neural nets Fractals Data visualization–representing data in graphical/multimedia formats for analysis

49 Summary Data analysis is used to derive and interpret information from data Decision support is a methodology designed to extract information from data and to use such information as a basis for decision making Decision support system is an arrangement of computerized tools used to assist managerial decision making within a business Data warehouse is an integrated, subject-oriented, time-variant, nonvolatile database that provides support for decision making

50 Summary (continued) Online analytical processing is an advanced data analysis environment that supports decision making, business modeling, and operations research Star schema is a data-modeling technique used to map multidimensional decision support data into a relational database The implementation of any company-wide information system is subject to conflicting organizational and behavioral factors

51 Summary (continued) Data mining automates analysis of operational data with the intention of finding previously unknown data characteristics, relationships, dependencies, and/or trends Data warehouse is storage location for decision support data


Download ppt "Chapter 13 The Data Warehouse."

Similar presentations


Ads by Google