Data Warehouse and Business Intelligence Dr. Minder Chen Fall 2008.


1 Data Warehouse and Business Intelligence Dr. Minder Chen Minder.Chen@CSUCI.EDU Fall 2008

2 Data Warehouse - 2 © Minder Chen, 2004-2008 Online Resources
Additional resources:
–Teradata Student Network.
»The Premier Learning Resource for Data Warehousing, DSS/BI, and Database. The URL is http://www.teradatastudentnetwork.com
»PSW: smartdecisions

3 BI
Business Intelligence (BI) is the process of gathering meaningful information to answer questions and identify significant trends or patterns, giving key stakeholders the ability to make better business decisions.
“The key in business is to know something that nobody else knows.” -- Aristotle Onassis
“To understand is to perceive patterns.” -- Sir Isaiah Berlin
“The manager asks how and when, the leader asks what and why.” -- Warren Bennis, “On Becoming a Leader”

4 BI Questions
What happened?
–What were our total sales this month?
What’s happening?
–Are our sales going up or down? (trend analysis)
Why?
–Why have sales gone down?
What will happen?
–Forecasting & “What If” Analysis
What do I want to happen?
–Planning & Targets
Source: Bill Baker, Microsoft

5 Business Intelligence
Figure: pyramid of layers, with increasing potential to support business decisions (MIS) toward the top. Layers, top to bottom:
–Making Decisions
–Data Presentation: Visualization Techniques
–Data Mining: Information Discovery
–Data Exploration: OLAP, MDA, Statistical Analysis, Querying and Reporting
–Data Warehouses / Data Marts
–Data Sources (Paper, Files, Information Providers, Database Systems, OLTP)
Roles shown alongside the pyramid, top to bottom: End User, Business Analyst, Data Analyst, DBA

6 Where is Business Intelligence applied?
Operational Efficiency: ERP Reporting, KPI Tracking, Product Profitability, Risk Management, Balanced Scorecard, Activity Based Costing, Global Sourcing, Logistics
Customer Interaction: Sales Analysis, Sales Forecasting, Segmentation, Cross-selling, CRM Analytics, Campaign Planning, Customer Profitability

8 Inmon's Definition of Data Warehouse – Data View
A warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process. -- Bill Inmon in 1990
Source: http://www.intranetjournal.com/features/datawarehousing.html

9 Inmon's Definition Explained
–Subject-oriented: Data warehouses are organized around major subjects such as customer, supplier, product, and sales. They focus on modeling and analysis to support planning and management decisions, rather than on operations and transaction processing.
–Integrated: Data warehouses integrate sources such as relational databases, flat files, and online transaction records. Processes such as data cleansing and data scrubbing achieve consistency in naming conventions, encoding structures, and attribute measures.
–Time-variant: Data contained in the warehouse provide information from a historical perspective.
–Nonvolatile: Data contained in the warehouse are physically separate from data present in the operational environment.

10 Kimball's Definition – Process View
A data warehouse is a system that extracts, cleans, conforms, and delivers source data into a dimensional data store and then supports and implements querying and analysis for the purpose of decision making. -- Ralph Kimball

12 The Data Warehouse Process
Figure: Source Systems -> Data Warehouse -> Data Marts and cubes -> Clients (Query Tools, Reporting, Analysis, Data Mining)
Steps shown in the figure:
1. Design the Data Warehouse
2. Populate Data Warehouse
3. Create OLAP Cubes
4. Query Data

13 Key Concepts in BI Development Lifecycle

14 Business Valuation Models for BI

15 Performance Dashboards for Information Delivery

16 Scorecards for Information Delivery

17 OLTP Normalized Design
Figure: entity-relationship diagram with entities including Ordering Process, Warehouse, POS Process, Chain Retailer, Retailer Returns, Retailer Payments, Store, Product, Brand, GL Account, Clerk, Retail Cust, Cash Register, Retail Promo

18 OLTP Versus Business Intelligence: Who asks what?
OLTP Questions
–When did that order ship?
–How many units are in inventory?
–Does this customer have unpaid bills?
–Are any of customer X’s line items on backorder?
Analysis Questions
–What factors affect order processing time?
–How did each product line (or product) contribute to profit last quarter?
–Which products have the lowest Gross Margin?
–What is the value of items on backorder, and is it trending up or down over time?

19 OLTP vs. OLAP
Source: http://www.rainmakerworks.com/pdfdocs/OLTP_vs_OLAP.pdf#search=%22OLTP%20vs.%20OLAP%22

20 Dimensional Design Process
1. Select the business process to model
2. Declare the grain of the business process/data in the fact table
3. Choose the dimensions that apply to each fact table row
4. Identify the numeric facts that will populate each fact table row
Inputs to the process: Business Requirements, Data Realities
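
The four steps above can be sketched for a hypothetical retail-sales process. This is a minimal illustration using SQLite, not a schema from the course; every table and column name is an assumption, and the grain (one row per product, per store, per day) is chosen only for the example. The composite primary key is one simple way to make the declared grain explicit.

```python
import sqlite3

# Hypothetical retail-sales star schema; all table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim    (date_key INTEGER PRIMARY KEY, full_date TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, city TEXT);
-- Step 1: business process = retail sales.
-- Step 2: grain = one row per product, per store, per day.
-- Step 3: dimensions = date, product, store.
-- Step 4: numeric facts = units sold and sales dollars.
CREATE TABLE sales_fact (
    date_key      INTEGER REFERENCES date_dim(date_key),
    product_key   INTEGER REFERENCES product_dim(product_key),
    store_key     INTEGER REFERENCES store_dim(store_key),
    units_sold    INTEGER,
    sales_dollars REAL,
    PRIMARY KEY (date_key, product_key, store_key)  -- enforces the declared grain
);
""")
conn.execute("INSERT INTO sales_fact VALUES (1, 10, 100, 3, 7.50)")

def violates_grain(row):
    """Return True if inserting `row` would duplicate an existing grain combination."""
    try:
        conn.execute("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", row)
        return False
    except sqlite3.IntegrityError:
        return True
```

A second row for the same day, product, and store is rejected, while a row for a different day is accepted.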

21 Select a business process to model
Not business departments or business functions
Cross-functional business processes
Business events
Examples:
–Raw materials purchasing
–Order fulfillment process
–Shipments
–Invoicing
–Inventory
–General ledger

22 Requirements

23 Identifying Measures and Dimensions
Measures (performance measures for KPI): the attribute varies continuously
–Balance
–Units Sold
–Cost
–Sales
Dimensions (performance drivers): the attribute is perceived as a constant or discrete value
–Description
–Location
–Color
–Size

24 A Dimensional Model for a Grocery Store Sales

25 Product Dimension
SKU: Stock Keeping Unit
Hierarchy:
–Department -> Category -> Subcategory -> Brand -> Product

26 Creating Dimensional Model
–Identify fact tables
–Translate business measures into fact tables
–Analyze source system information for additional measures
–Identify base and derived measures
–Document additivity of measures
–Identify dimension tables
–Link fact tables to the dimension tables
–Create views for users
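
Why additivity needs documenting can be shown with a balance measure, which is usually called semi-additive: it can be summed across accounts but not across time. The sketch below uses made-up month-end balances, and the function names are hypothetical.

```python
# Hypothetical month-end account balances: (month, account, balance).
balances = [
    ("2008-01", "A", 100.0), ("2008-01", "B", 50.0),
    ("2008-02", "A", 120.0), ("2008-02", "B", 30.0),
]

def total_balance_for_month(rows, month):
    """Summing across accounts within one month is meaningful."""
    return sum(bal for m, _, bal in rows if m == month)

def average_monthly_total(rows):
    """Across time, average the monthly totals instead of summing them."""
    months = sorted({m for m, _, _ in rows})
    return sum(total_balance_for_month(rows, m) for m in months) / len(months)
```

Summing all four balances would report 300.0, a figure that never existed in any month; the averaged monthly total is 150.0.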

27 Transaction Level Order Item Fact Table

28 Inside a Dimension Table
–Dimension table key: Uniquely identifies each row. Use a surrogate key (integer).
–Table is wide: A table may have many attributes (columns).
–Textual attributes: Descriptive attributes in string format. No numerical values for calculation.
–Attributes not directly related: E.g., product color and product package size. No transitive dependency.
–Not normalized (star schema).
–Drilling down and rolling up along a dimension.
–One or more hierarchies within a dimension.
–Fewer records.
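
Surrogate key assignment can be sketched as a lookup that hands out the next integer the first time a source (natural) key is seen. The class name and the SKU strings below are hypothetical; a production warehouse would typically persist this mapping in the dimension table itself.

```python
from itertools import count

class SurrogateKeyMap:
    """Assigns a stable integer surrogate key to each natural (source) key."""

    def __init__(self):
        self._next = count(1)   # surrogate keys start at 1
        self._keys = {}         # natural key -> surrogate key

    def key_for(self, natural_key):
        # Reuse the existing surrogate key, or allocate the next integer.
        if natural_key not in self._keys:
            self._keys[natural_key] = next(self._next)
        return self._keys[natural_key]

product_keys = SurrogateKeyMap()
```

Calling `product_keys.key_for("SKU-001")` twice returns the same integer, so fact rows loaded at different times reference the same dimension row.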

29 Fact Tables
Fact tables have the following characteristics:
–Contain numeric measures (metrics) of the business
–May contain summarized (aggregated) data
–May contain date-stamped data
–Are typically additive
–Have a key that is typically a concatenated key composed of the primary keys of the dimensions
–Are joined to dimension tables through foreign keys that reference primary keys in the dimension tables

30 Facts Table
Dimensions (keys): DateID, ProductID, CustomerID
Measures: Units, Dollars
The Fact Table contains keys and units of measure.
Measurements of business events.
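
The fact table layout above (DateID, ProductID, CustomerID, Units, Dollars) can be sketched with plain Python structures. The rows and the product dimension are invented for illustration; the point is that the fact row carries only keys and measures, and descriptive attributes are looked up through the key.

```python
from collections import defaultdict

# Hypothetical dimension table: ProductID -> descriptive attributes.
product_dim = {
    1: {"name": "Cola", "category": "Beverage"},
    2: {"name": "Juice", "category": "Beverage"},
    3: {"name": "Cream", "category": "Dairy"},
}

# Fact rows: (DateID, ProductID, CustomerID, Units, Dollars).
sales_fact = [
    (20080301, 1, 501, 2, 3.00),
    (20080301, 3, 502, 1, 2.50),
    (20080302, 2, 501, 4, 8.00),
]

def dollars_by_category(facts, products):
    """Join each fact row to its product and total Dollars per category."""
    totals = defaultdict(float)
    for _date_id, product_id, _customer_id, _units, dollars in facts:
        totals[products[product_id]["category"]] += dollars
    return dict(totals)
```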

31 Snowflake Schema
Figure: schema diagram with tables Sales, Customers, Dates, Products, Channels, Promotions, Brands

32 Hierarchy

33 OLAP Solutions
Data Warehouse/Data Mart
Dimensions, Measures, Cubes, Cells
Figure: a cube with axes Gadgets, Gizmos, Thingies, Widgets; Q1, Q2, Q3, Q4; US, Europe, Asia. Cell values shown:
130 135 140 142
205 390 350 475
175 230 190 250
310 340 410 450

34 Operations in Multidimensional Data Model
Aggregation (roll-up)
–dimension reduction: e.g., total sales by city
–summarization over aggregate hierarchy: e.g., total sales by city and year -> total sales by region and by year
Selection (slice) defines a subcube
–e.g., sales where city = Palo Alto and date = 1/15/96
Navigation to detailed data (drill-down)
–e.g., (sales - expense) by city, top 3% of cities by average income
Visualization Operations (e.g., Pivot)
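
Roll-up and slice can be sketched over a small list of detail cells. The cities other than Palo Alto, the region names, and the sales figures are hypothetical; drill-down is simply the move from a rolled-up result back to the finer grouping.

```python
from collections import defaultdict

# Hypothetical detail cells: (city, region, year, sales).
cells = [
    ("Palo Alto", "West", 1996, 100),
    ("San Jose",  "West", 1996, 150),
    ("Boston",    "East", 1996, 200),
    ("Palo Alto", "West", 1997, 120),
]

def roll_up(rows, key):
    """Aggregate sales over the grouping returned by key(row)."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[3]
    return dict(totals)

def slice_(rows, predicate):
    """Select the subcube of rows satisfying predicate."""
    return [row for row in rows if predicate(row)]

by_city_year = roll_up(cells, lambda r: (r[0], r[2]))    # finer (drill-down) level
by_region_year = roll_up(cells, lambda r: (r[1], r[2]))  # summarized city -> region
by_city = roll_up(cells, lambda r: r[0])                 # year dimension removed
palo_alto_1996 = slice_(cells, lambda r: r[0] == "Palo Alto" and r[2] == 1996)
```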

35 A Visual Operation: Pivot (Rotate)
Figure: a cube with Product (Juice, Cola, Milk, Cream), Region (NY, LA, SF), and Date (3/1, 3/2, 3/3, 3/4) axes, also labeled Month; cell values shown: 10, 47, 30, 12, 12
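
A pivot can be sketched as swapping which dimension labels the rows and which labels the columns. The product and region labels come from the slide, but the assignment of values to cells is made up, since the figure's layout is not recoverable from the transcript.

```python
# Hypothetical sales counts indexed as table[product][region].
by_product = {
    "Juice": {"NY": 10, "LA": 47, "SF": 30},
    "Cola":  {"NY": 12, "LA": 5,  "SF": 8},
}

def pivot(table):
    """Swap the row and column dimensions of a nested-dict table."""
    rotated = {}
    for row_label, columns in table.items():
        for col_label, value in columns.items():
            rotated.setdefault(col_label, {})[row_label] = value
    return rotated

by_region = pivot(by_product)
```

Pivoting twice returns the original view, which reflects that the operation changes only the presentation, not the data.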

36 Date Dimension of the Retail Sales Model

37 Store Dimension
It is not uncommon to represent multiple hierarchies in a dimension table. Ideally, the attribute names and values should be unique across the multiple hierarchies.

38 Multidimensional Query Techniques
Questions: What? Why?
Techniques: Slicing, Dicing, Drilling down
Dimensions: Product, Time, Geography

39 ETL
ETL = Extract, Transform, Load
–Moving data from production systems to DW
–Checking data integrity
–Assigning surrogate key values
–Collecting data from disparate systems
–Reorganizing data
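
The ETL tasks listed above can be sketched end to end with the standard library. The CSV content, column names, and the `customer_dim` table are hypothetical, and the integrity check is deliberately trivial (non-empty name, no repeated source key).

```python
import csv
import io
import sqlite3

# Extract: a hypothetical CSV export from a production system.
source = io.StringIO(
    "cust_no,name\n"
    "90238475, Digital Equipment \n"
    "90328575,DEC\n"
    "90238475, Digital Equipment \n"
)
rows = list(csv.DictReader(source))

# Transform: trim whitespace, reject rows missing a name, drop repeated source keys.
seen, clean = set(), []
for row in rows:
    name = row["name"].strip()
    if name and row["cust_no"] not in seen:
        seen.add(row["cust_no"])
        clean.append((row["cust_no"], name))

# Load: the INTEGER PRIMARY KEY acts as the surrogate key assigned by the warehouse.
dw = sqlite3.connect(":memory:")
dw.execute(
    "CREATE TABLE customer_dim (customer_key INTEGER PRIMARY KEY, cust_no TEXT, name TEXT)"
)
dw.executemany("INSERT INTO customer_dim (cust_no, name) VALUES (?, ?)", clean)
loaded = dw.execute(
    "SELECT customer_key, cust_no, name FROM customer_dim ORDER BY customer_key"
).fetchall()
```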

40 Pivot Table in Excel

41 Data Quality Issues
–No common time basis
–Different calculation algorithms
–Different levels of extraction
–Different levels of granularity
–Different data field names
–Different data field meanings
–Missing information
–No data correction rules
–No drill-down capability

42 Building The Warehouse: Transforming Data

43 The Anomalies Nightmare
CUST # | NAME | ADDRESS
90238475 | Digital Equipment | 187 N. PARK St. Salem NH 01458
90233479 | Digital | 187 N. Pk. St. Salem NH 01458
90233489 | Digital Corp | 187 N. Park St Salem NH 01458
90234889 | Digital Consulting | 187 N. Park Ave. Salem NH 01458
90345672 | Digital Info Service | 15 Main Street Andover MA 02341
90328574 | Digital Integration | PO Box 9 Boston MA 02210
90328575 | DEC | Park Blvd. Boston MA 04106
TYPE column values shown: OEM, $#%, Comp, Consult, Mail List, SYS INT
Anomalies: No Unique Key, Noise in Blank Fields, Spelling, No Standardization
How does one correctly identify and consolidate anomalies from millions of records?
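
One small part of consolidation, standardizing addresses so that spelling variants collapse to the same value, can be sketched as follows. The records pair customer numbers, names, and addresses in the order they appear on the slide, which is an assumption about the flattened table; the abbreviation table is hypothetical and far smaller than a real one.

```python
import re
from collections import defaultdict

# Records from the slide, paired in listed order (an assumption): (CUST #, NAME, ADDRESS).
records = [
    ("90238475", "Digital Equipment", "187 N. PARK St. Salem NH 01458"),
    ("90233479", "Digital", "187 N. Pk. St. Salem NH 01458"),
    ("90233489", "Digital Corp", "187 N. Park St Salem NH 01458"),
    ("90234889", "Digital Consulting", "187 N. Park Ave. Salem NH 01458"),
]

# Hypothetical abbreviation table; a real one would be far larger.
ABBREVIATIONS = {"PK": "PARK", "STREET": "ST", "AVENUE": "AVE"}

def normalize_address(address):
    """Uppercase, strip punctuation, and expand known abbreviations."""
    tokens = re.sub(r"[^\w\s]", "", address.upper()).split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)

def group_by_address(rows):
    """Map each normalized address to the customer numbers that share it."""
    groups = defaultdict(list)
    for cust_no, _name, address in rows:
        groups[normalize_address(address)].append(cust_no)
    return dict(groups)

groups = group_by_address(records)
```

The first three records collapse to one address while the "Park Ave." record stays separate, which shows both the value and the limits of rule-based standardization: deciding whether the fourth record is the same customer still needs a human or a fuzzier matching rule.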

44 OLAP and Data Mining Address Different Types of Questions
While reporting and OLAP are informative about past facts, only data mining can help you predict the future of your business.
OLAP | Data Mining
What was the response rate to our mailing? | What is the profile of people who are likely to respond to future mailings?
How many units of our new product did we sell to our existing customers? | Which existing customers are likely to buy our next new product?
Who were my 10 best customers last year? | Which 10 customers offer me the greatest profit potential?
Which customers didn't renew their policies last month? | Which customers are likely to switch to the competition in the next six months?
Which customers defaulted on their loans? | Is this customer likely to be a good credit risk?
What were sales by region last quarter? | What are expected sales by region next year?
What percentage of the parts we produced yesterday are defective? | What can I do to improve throughput and reduce scrap?
Source: http://www.dmreview.com/editorial/dmreview/print_action.cfm?articleId=2367

45 Use of Data Mining
–Customer profiling
–Market segmentation
–Buying pattern affinities
–Database marketing
–Credit scoring and risk analysis

46 Associations
Which items are purchased in a retail store at the same time?
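
A first pass at this question is to count how often each pair of items appears in the same basket. The baskets below are invented, and the measure computed, the fraction of baskets containing the pair, is the standard "support" of association analysis.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets, one set of items per store visit.
baskets = [
    {"milk", "bread", "cola"},
    {"milk", "bread"},
    {"cola", "chips"},
    {"milk", "bread", "chips"},
]

def pair_support(transactions):
    """Fraction of baskets containing each unordered item pair."""
    counts = Counter()
    for basket in transactions:
        # Sorting gives each pair one canonical (alphabetical) ordering.
        counts.update(combinations(sorted(basket), 2))
    return {pair: n / len(transactions) for pair, n in counts.items()}

support = pair_support(baskets)
```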

47 Sequential Patterns
What is the likelihood that a customer will buy a product next month, if he buys a related item today?
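
The likelihood in question can be estimated by counting: among customers who bought the first item, what fraction bought the follow-on item in the next period? The histories and item names are hypothetical, and real sequential-pattern mining handles longer sequences than this two-period sketch.

```python
# Hypothetical purchase histories: customer -> (items this month, items next month).
histories = {
    "c1": ({"printer"}, {"ink"}),
    "c2": ({"printer"}, {"paper"}),
    "c3": ({"printer"}, {"ink", "paper"}),
    "c4": ({"laptop"}, {"ink"}),
}

def follow_on_rate(data, first_item, next_item):
    """Estimate P(next_item next month | first_item this month) by counting."""
    buyers = [later for now, later in data.values() if first_item in now]
    if not buyers:
        return None  # no evidence either way
    return sum(next_item in later for later in buyers) / len(buyers)
```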

48 Classifications
Determine customers’ buying patterns and then find other customers with similar attributes that may be targeted for a marketing campaign.

49 Modeling
Use factors, such as location, number of bedrooms, and square footage, to determine the market value of a property.
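
As a deliberately simplified sketch, the model below uses only one of the factors named above, square footage, and fits a straight line by ordinary least squares. The sales data are invented, and a realistic model would include the other factors.

```python
# Hypothetical past sales: (square footage, sale price).
sales = [(1000, 200_000), (1500, 300_000), (2000, 400_000)]

def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(sales)

def estimate_value(square_feet):
    """Predict market value from square footage using the fitted line."""
    return slope * square_feet + intercept
```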

