Presentation is loading. Please wait.

Presentation is loading. Please wait.

McGraw-Hill/Irwin © 2008 The McGraw-Hill Companies, All Rights Reserved Chapter 8 Accessing Organizational Information – Data Warehouse.

Similar presentations


Presentation on theme: "McGraw-Hill/Irwin © 2008 The McGraw-Hill Companies, All Rights Reserved Chapter 8 Accessing Organizational Information – Data Warehouse."— Presentation transcript:

1 McGraw-Hill/Irwin © 2008 The McGraw-Hill Companies, All Rights Reserved Chapter 8 Accessing Organizational Information – Data Warehouse

2 8-2 Learning Outcomes 8.1 Describe the roles and purposes of data warehouses and data marts in an organization 8.2 Compare the multidimensional nature of data warehouses (and data marts) with the two-dimensional nature of databases

3 8-3 Learning Outcomes 8.3 Identify the importance of ensuring the cleanliness of information throughout an organization 8.4 Explain the relationship between business intelligence and a data warehouse

4 8-4 Intro to Data Warehousing This video must be watched Data warehouse demystified

5 8-5 History of Data Warehousing Data warehouses extend the transformation of data into information In the 1990’s executives became less concerned with the day-to-day business operations and more concerned with overall business functions The data warehouse provided the ability to support decision making without disrupting the day-to-day operations

6 8-6 Data Warehouse Fundamentals Data warehouse – a logical collection of information – gathered from many different operational databases – that supports business analysis activities and decision-making tasks The primary purpose of a data warehouse is to aggregate information throughout an organization into a single repository for decision-making purposes

7 8-7 Data Warehouse – Definitions A data warehouse is a type of computer based information system developed to provide an organization with business intelligence to support decision making and to monitor the operations in a company. Integrates data from many different sources and makes it available to end users in a what they can understand and use in a business context in a timely manner.

8 8-8 Primary difference between a database and data warehouse database stores information for a single application, whereas a data warehouse stores information from multiple databases, or multiple applications, and external information such as industry information

9 8-9 Data Warehouse Fundamentals Extraction, transformation, and loading (ETL) – a process that extracts information from internal and external databases, transforms the information using a common set of enterprise definitions, and loads the information into a data warehouse Data mart – contains a subset of data warehouse information. The ETL process also gathers data from the data warehouse and passes it to the data marts

10 8-10 Data Warehouse Fundamentals

11 8-11 Components of a Data Warehouse Metadata means data about data

12 8-12 Multidimensional Analysis and Data Mining Databases contain information in a series of two-dimensional tables In a data warehouse and data mart, information is multidimensional, it contains layers of columns and rows

13 8-13 Dimensions Dimensions could include such things as: –Products –Promotions –Stores –Category –Region –Stock price –Date –Time –Weather

14 8-14 slicing-and-dicing –The ability to look at information from different dimensions can add tremendous business insight –By slicing-and-dicing the information a business can uncover great unexpected insights

15 8-15 Multidimensional Analysis and Data Mining Cube – common term for the representation of multidimensional information

16 8-16 Data Warehouses Are Multidimensional A Multidimensional Data Warehouse with Information from Multiple Operational Databases

17 8-17 Data purchased from outside source for Data Warehousing

18 8-18 Data Marts – Smaller Data Warehouses Data mart - a subset of a data warehouse in which only a focused portion of the data warehouse information is kept.

19 8-19 Data Mart Examples

20 8-20 Multidimensional Analysis and Data Mining Data mining – the process of analyzing data to extract information not offered by the raw data alone To perform data mining users need data-mining tools –Data-mining tool – uses a variety of techniques to find patterns and relationships in large volumes of information and infers rules that predict future behavior and guide decision making

21 8-21 Data-Mining Systems Process data using statistical techniques like regression analysis and decision tree analysis Look for patterns and relationships to anticipate events or predict outcomes Data-mining tools include query tools, reporting tools, multidimensional analysis tools, statistical tools

22 8-22 Decision Tree Analysis Figure CE14-3

23 8-23 Example of Data Mining Example: Finding correlation of items on past orders to determine items that are frequently purchased together.

24 8-24 Example of Data Mining Several years ago, a large retailer implemented a data warehouse to analyze sales. Loaded huge volumes (Terabytes) of Point of Sale data into the warehouse Built an application, based on specialized ‘data mining’ software to perform ‘market basket analysis’ –What items are purchased with other items in the same transactions

25 8-25 “Diapers and Beer” Noticed some unusual correlations, one was many transactions where beer in same market basket as diapers Analysis identified a ‘micro-segment’ of customer base – young fathers, buying diapers, deciding to get beer at same time Based on information, retailer reorganized diaper aisle – placed beer at end on aisle Beer sales increased.

26 8-26 Information Cleansing or Scrubbing An organization must maintain high-quality data in the data warehouse Information cleansing or scrubbing – a process that weeds out and fixes or discards inconsistent, incorrect, or incomplete information

27 8-27 Problems with Operational Data for BI Raw data usually unsuitable for sophisticated reporting or data mining Non-Standard Dirty data ( gender, age, phone #, misspelling, email address) Values may be missing ( gender….) Inconsistent data ( time zone) Data can be too fine (clickstream) or too coarse (totals)

28 8-28 Information Cleansing or Scrubbing Standardizing Customer name from Operational Systems

29 8-29 Standardization of data Consider the following addresses: These should be standardized as: –123 Main, Antigonish, Nova Scotia, B2G 2A1 StreetCityPostal Code 123 MainAntigonish 123 MainB2G 2A1

30 8-30 Cleaning of raw data Cleaning is a process used to remove unwanted ‘noise’ from data and to make data more presentable Consider the following shipping address: –Address: 123 Main St. * deliver to rear entrance * –City: Antigonish –Postal Code: B2G 2A1 –Province: NS –Cleaning process would remove the *comment * in the address field

31 8-31 Information Cleansing or Scrubbing Information cleansing activities

32 8-32 Information Cleansing or Scrubbing Accurate and complete information

33 8-33 Business Intelligence (BI) Systems Provide information for improving decision making hence competitive advantage Primary systems: –Reporting systems –Data-mining systems

34 8-34 What is a report? It’s a program that retrieves data from a database, formats and presents it to a user Can allow filtering, sorting, aggregation A report can be: –displayed online –exported to excel or other software –Emailed as an attachment –printed Data Report program Lists, charts, spreadsheets

35 8-35 Reporting Systems Integrate data from multiple sources Process data by sorting, grouping, summing, averaging, and comparing Results formatted into reports Improve decision making by providing right information to right user at right time

36 8-36 Tools for looking at data Digital Dashboard Instead of paper reports, warehouses give users tools for looking at data differently. They also allow those users to manipulate their data. There are times when a color coded ‘speedometer’ speaks volumes over a simple paper report. An interactive table that allows the user to drill down into detail data with the click of a mouse can answer questions that might take months to answer in a traditional system.


Download ppt "McGraw-Hill/Irwin © 2008 The McGraw-Hill Companies, All Rights Reserved Chapter 8 Accessing Organizational Information – Data Warehouse."

Similar presentations


Ads by Google