1 Business Intelligence & Data Warehousing Tom A. Fürstenberg Business Intelligence Consultant Cap Gemini Ernst & Young.


1 1 Business Intelligence & Data Warehousing Tom A. Fürstenberg Business Intelligence Consultant Cap Gemini Ernst & Young

2 2 Lecture Learning Objectives What is BI & DWH? (conceptual and technical); Application of BI & DWH; The practice of a consultant in general, and at Cap Gemini Ernst & Young in particular

3 3 Lecture Contents Performance Management; Business Intelligence (Performance Measurement); OLAP; Extranets; Architecture; Data Warehouse; ETL; Multidimensional Modelling; CGE&Y Approach; Data Mining

4 4 Performance Management Goal-oriented measurement and steering of business objectives

5 5 In control of a company

6 6 Overview Conceptual Management Control Model; Strategy & Mission; Responsibilities & Authorities; Operational Management Control Model; Control framework; Goals; Means; Preconditions; Key Performance Indicators; Critical Success Indicators; External Indicators; Information model; Information system; Information provision; Data warehouse; data

7 7 Management Control Vision: Building Blocks for Steering Organisations Why? Stakeholders Who? Objectives & Performance Indicators What? Methods, Systems How? Values & Norms; Strategy & Mission; Organisation

8 8 Methods Balanced Scorecard; INK management model; Various financial models

9 9 Towards an Operational Management Control Model KPI: after the goals and KPIs have been established, CSI: the Critical Success Indicators are determined, OI: followed by establishing the Environmental Indicators (OI, from the Dutch "Omgevingsindicatoren")

10 10 Towards an Operational Management Control Model KPI; CSI; OI; Time; Region; Product; Department; Market

11 11 From Model to Behavioural Change Management Charter; Operational Management Control Model; Assessment and Steering; Information Provision; Planning and Commitment; Multidimensional Data Structure; Responsibilities and Authorities

12 12 Some Typical Mgt. Questions (Product, Customer, Channel, Marketing) How much have we sold? Which product gives the best profit? Which product has the largest sales volume this quarter? Which product best meets market needs? How much to produce of each product? Who is the most profitable customer? What is the satisfaction level? Which are the best segments? Which service to improve? How many customers did we lose last year? Who are our biggest accounts? Which retailer yields most by volume and which by profit? What promotions will yield most profit? What effect will discounts have on the turnover? What are the area coverage levels? How many contacted people became customers? Promotions' results? What is the competition doing?

13 13 Key Performance Indicators Top 10 Source: results of the FIND! The Best benchmark study, conducted in 1997/1998 by Ernst & Young Consulting and VU. 103 industrial companies participated in the study.

14 14 And now all that is left is to measure… Business Intelligence (performance measurement)

15 15 The Answers The information is there, but spread everywhere!

16 16 In practice...

17 17 Problems (Over)loading of the IT department (queries); Long lead time for report 'manufacturing'; High cost in man-hours; Data sources hard to integrate; Non-standardised reports; No unambiguous definitions; Error-prone; Open to manipulation; Dependence on 'links in the chain'; Discussions about differences in figures; Limited analysis options; Wrong and late interpretations, conclusions, decisions...

18 18 At the push of a button...

19 19 From chaos... to structure

20 20 Why now? Hype? Developments: Market Pull: Globalisation of markets; Individualisation of customers; Shorter life cycle of products; Information overload; Mergers. Technology Push: Faster hardware; Cheaper disk capacity; Modern OLAP tools; Any access: c/s, web, mobile

21 21 OnLine Analytical Processing Based on the syntax of management information questions: ... per ... per ... per ... KPIs, CSIs and OIs are measures. Product, Region, Customer, Time, etc. are dimensions (slice & dice). Dimensions have hierarchies (drill down).
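The "measure per dimension per dimension" idea can be sketched in a few lines of Python. This is only an illustration under assumed data: the fact rows, the `rollup` helper and its parameter names are hypothetical and do not come from the slides or from any OLAP tool.

```python
from collections import defaultdict

# Hypothetical fact rows: dimension values plus one measure ("sales").
FACTS = [
    {"year": 2001, "quarter": "Q1", "region": "North", "product": "Apples", "sales": 100},
    {"year": 2001, "quarter": "Q1", "region": "South", "product": "Apples", "sales": 80},
    {"year": 2001, "quarter": "Q2", "region": "North", "product": "Pears", "sales": 50},
    {"year": 2001, "quarter": "Q2", "region": "South", "product": "Apples", "sales": 70},
]


def rollup(facts, dims, measure="sales", **slicer):
    """Sum `measure` per combination of `dims`, keeping only rows matching `slicer`."""
    totals = defaultdict(int)
    for row in facts:
        if all(row[name] == value for name, value in slicer.items()):
            totals[tuple(row[d] for d in dims)] += row[measure]
    return dict(totals)


by_year = rollup(FACTS, ["year"])                               # sales per year
by_quarter = rollup(FACTS, ["year", "quarter"])                 # drill down: year -> quarter
apples_by_region = rollup(FACTS, ["region"], product="Apples")  # slice on one product
```

Adding a level of the time hierarchy to `dims` is the drill down; fixing one dimension value through `slicer` is the slice.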

22 22 OLAP [Figure: one cube with Product and Time axes, seen through a Product Manager's View, a Financial Manager's View, a Regional Manager's View and an Ad Hoc View]

23 23 Introduction to Cubes [Figure: a Sales cube with three dimensions: Time (Q1, Q2, Q3, Q4), Product (Grapes, Apples, Melons, Cherries, Pears) and Location (Atlanta, Denver, Detroit)]

24 24 Demo: eFashion Case (BusinessObjects)

25 25 BusinessObjects: Semantic Layer

26 26 Any Access

27 27 Information and analysis needs of third parties

28 28 e/m-Business Intelligence: Extranets [Figure: Customers, Partners and Suppliers each reach the Data Warehouse through an extranet]

29 29 Extranet demos

30 30 Business Intelligence Theory

31 31 BI Definition Business Intelligence is the process of collection, cleansing, combining, consolidation, analysis, interpretation and communication of all internal and available external data, relevant for the decision making process in the organisation

32 32 BI Concept Data Information Knowledge Action Collection Decisions Integration Analysis Feedback Business Value

33 33 BI Systems Reporting & Query DSS, MIS and EIS OLAP Data Mining

34 34 The Five Functional Levels: reporting, querying, analysis, exploring, mining [Figure: the levels plotted by number of users against complexity of the question, from static to dynamic. Annotations: standard reports; 'bunch of reports', 'cube'; unique 'report' or question; i.e. finding variables; i.e. statistical analysis, testing a hypothesis]

35 35 The Five Functional Levels: reporting, querying, analysis, exploring, mining [Figure: the same chart, with the ranges labelled static, dynamic and interactive. Annotation: 80 % of all users]

36 36 Corporate Information Factory [Figure: Any Source (Applications, External data), Any Data (Operational Data Store, Data Warehouse, Data Marts) and Any Access (LAN/WAN, WWW), connected by Load Management and Query Management]

37 37 Components of the CIF Data Warehouse Data Mart Operational Data Store ETL

38 38 Data Warehouse

39 39 Definition Bill Inmon Characteristics of a data warehouse: Subject-oriented Integrated Time-variant Non-volatile Both summary and detailed data

40 40 Data Warehouse Contains data that can be used to meet the information needs of (part of) the organisation; Contains integrated data extracted from one or more sources; Mostly contains large amounts of data; Contains data that is clean and consistent; May contain aggregated data; Optimised for its use

41 41 Data Warehouse
Database | Data Warehouse
Current | Historical
Internal | Internal and External
Isolated | Integrated
Transactions | Analysis
Normalised | Dimensional
Dirty | Clean and Consistent
Detailed | Detailed and Summary

42 42 Data Warehouse Advantages One point of contact; Time savings; No loss of historical data; OLTP systems not hampered by BI activities; Better consistency and quality of data; Improvement of Business Intelligence

43 43 Data Warehouse Disadvantages Never quite up-to-date Requires a lot of storage space Requires a lot of communication, coordination and cooperation Large impact on the organisation A data warehouse is only the beginning

44 44 Data Mart DW design does not optimise query performance Data is not stored in an optimal fashion for any given department in the DW Competition to get the resources required to get inside the DW Costs for DSS computing facilities are high because of the large volume in DW

45 45 Data Mart Characteristics: Customised for a specific department Limited amount of history Summarised Very flexible Elegant presentation Processor dedicated to the department

46 46 Data Mart Divided by: Business Geography Security Political (budget) Structure (data mining)

47 47 Data Mart Three different kinds of data marts: Subset/summary MOLAP ROLAP

48 48 Operational Data Store Characteristics: Subject-oriented Integrated Current-valued Volatile Detailed data

49 49 ETL: Extraction Source selection: The data model is the starting point: determine the data elements that are needed. For each data element, determine the available data sources. If more than one source is available, select on: – Quality, reliability and integrity – Scope of data – Location and availability of data – Location and availability of expertise

50 50 ETL: Transformation Processing: Aggregate records Encoding structures Simple reformatting Mathematical conversion Resequencing of data Default values Key conversion Cleansing
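A few of the processing steps listed above can be illustrated with a small Python sketch. The record layout, the `GENDER_CODES` mapping and the `transform` function are assumptions made for this example; real source systems will differ.

```python
from datetime import datetime

# Hypothetical source encodings, unified into one warehouse encoding.
GENDER_CODES = {"m": "M", "male": "M", "1": "M", "f": "F", "female": "F", "2": "F"}


def transform(record):
    """Return a transformed copy of one hypothetical source record."""
    return {
        # Encoding structures: map every source code onto the warehouse code.
        "gender": GENDER_CODES.get(str(record.get("gender", "")).strip().lower(), "U"),
        # Simple reformatting: DD-MM-YYYY text becomes an ISO date string.
        "sale_date": datetime.strptime(record["sale_date"], "%d-%m-%Y").date().isoformat(),
        # Mathematical conversion: an amount in cents becomes a decimal amount.
        "amount": record["amount_cents"] / 100,
        # Default values: a missing region gets a placeholder.
        "region": record.get("region") or "UNKNOWN",
    }


row = transform(
    {"gender": " Male", "sale_date": "31-12-2001", "amount_cents": 1995, "region": None}
)
```

Aggregation, resequencing, key conversion and cleansing are left out here to keep the sketch short.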

51 51 ETL: Transformation Key structure A Key structure B Key structure C Key structure A Key structure B Key structure C New key structure Key transformation
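As a rough sketch of key transformation, the function below converts three invented source key structures into one new structure. The formats for A, B and C and the target format are purely hypothetical; the slide does not specify them.

```python
def to_new_key(source, key):
    """Convert a key from hypothetical source structure A, B or C to the new structure."""
    if source == "A":        # assumed format: "NL-0042"
        country, number = key.split("-")
    elif source == "B":      # assumed format: "0042/nl"
        number, country = key.split("/")
    elif source == "C":      # assumed format: ("NL", 42)
        country, number = key
    else:
        raise ValueError(f"unknown source system: {source!r}")
    # New key structure: upper-case country code followed by a six-digit number.
    return f"{country.upper()}{int(number):06d}"


keys = [
    to_new_key("A", "NL-0042"),
    to_new_key("B", "0042/nl"),
    to_new_key("C", ("NL", 42)),
]
```

All three source keys end up as the same warehouse key, so records about the same entity can be combined after loading.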

52 52 ETL: Cleansing Data quality is critical for: Marketing communications Targeted marketing Customer matching Retail- and commercial householding Combining information Tracking retail sales

53 53 ETL: Cleansing Common excuses for not cleaning: The data in the operational systems seem to work just fine Data can be joined most of the time Cleansing will take place after population of the data warehouse Data entry will be improved The users will never agree to change their data

54 54 Multi Dimensional Data Modeling

55 55 MD Modeling: Contents E/R Modeling (Ex.) MD Modeling (Ex.) Star Schema Slowly Changing Dimensions (Ex.) Surrogate Keys Aggregation (Ex.) Measures & Dimensions reviewed Other important MDM aspects

56 56 Exercise: E/R Modeling What could the sales transaction database of the eFashion retailer look like?
Ticket: Ticket_nr, Store_nr, Card_nr, Employee_nr, Time_Stamp
Loyalty Card: Card_nr, Cust_name, Address, Zip_code, City, ...
Employee: Employee_nr, Emp_name, ...
Products Sold: Ticket_nr, Product_nr, #_products, price, discount
Products: Product_nr, Bar_Code, Prod_Desc, Actual_price, Weight, ...
Store: Store_nr, Store_name, Address, Zip_code, City, State, Manager, ...

57 57 Management Questions Give me the annual revenue of all my product lines divided over all the sales regions over the last 3 years Give me the top 10 of most profitable products this year Give me the top 10 of most sold products of last year Give me the top 10 of most profitable customers Compare the YTD revenue with the one in the same period last year and the target

58 58 Why not E/R Modeling? End users cannot understand, remember, navigate an E/R model (not even with a GUI) Software cannot usefully query an E/R model Use of E/R modeling doesn’t meet the DW purpose: intuitive and high performance querying

59 59 Exercise: Model the eFashion DM Sales Revenue; Time hierarchy (Year, Quarter, Month); Store hierarchy (Region, State, City, Store); Product hierarchy (Line, Category, SKU)

60 60 eFashion Data Mart
Facts: Month_nr, Store_nr, SKU_nr, Sales_revenue, ...
Product: SKU_nr, SKU_desc, Category, Line
Time: Month_nr, Month_desc, Quarter, Year
Geography: Store_nr, Store_name, City, State, Region
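To make the eFashion data mart concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The column names follow the slide; the `_dim` table suffixes, the sample rows and the query are assumptions added for illustration. The query answers a question of the form "revenue per product line per region per year".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim   (sku_nr INTEGER PRIMARY KEY, sku_desc TEXT, category TEXT, line TEXT);
CREATE TABLE time_dim      (month_nr INTEGER PRIMARY KEY, month_desc TEXT, quarter TEXT, year INTEGER);
CREATE TABLE geography_dim (store_nr INTEGER PRIMARY KEY, store_name TEXT, city TEXT, state TEXT, region TEXT);
CREATE TABLE facts (
    month_nr      INTEGER REFERENCES time_dim(month_nr),
    store_nr      INTEGER REFERENCES geography_dim(store_nr),
    sku_nr        INTEGER REFERENCES product_dim(sku_nr),
    sales_revenue REAL
);
""")

# Invented sample data, only to make the query runnable.
conn.executemany("INSERT INTO product_dim VALUES (?,?,?,?)", [
    (1, "Blue jeans", "Trousers", "Casual"),
    (2, "Silk tie", "Ties", "Formal"),
])
conn.executemany("INSERT INTO time_dim VALUES (?,?,?,?)", [
    (200101, "Jan 2001", "Q1", 2001),
    (200104, "Apr 2001", "Q2", 2001),
])
conn.executemany("INSERT INTO geography_dim VALUES (?,?,?,?,?)", [
    (10, "Store A", "Atlanta", "GA", "South"),
    (20, "Store B", "Denver", "CO", "West"),
])
conn.executemany("INSERT INTO facts VALUES (?,?,?,?)", [
    (200101, 10, 1, 100.0),
    (200104, 10, 1, 50.0),
    (200101, 20, 2, 30.0),
    (200104, 20, 1, 20.0),
])

# Star join: the fact table joins to each dimension on its key, then groups by attributes.
revenue = conn.execute("""
    SELECT t.year, p.line, g.region, SUM(f.sales_revenue)
    FROM facts f
    JOIN time_dim t      ON t.month_nr = f.month_nr
    JOIN product_dim p   ON p.sku_nr   = f.sku_nr
    JOIN geography_dim g ON g.store_nr = f.store_nr
    GROUP BY t.year, p.line, g.region
    ORDER BY t.year, p.line, g.region
""").fetchall()
```

Every query against the mart has this same shape, which is one reason a dimensional model is easier to query than the E/R model of the transaction system.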

61 61 DW Modeling Components [Figure: a fact table holding the measures (Units, $), surrounded by the dimension tables Geographic, Product and Time]

62 62 Using a Star Schema
Fact table: Sales_Fact (TimeKey, EmployeeKey, ProductKey, CustomerKey, ShipperKey, $, ...)
Dimension tables: Time_Dim (TimeKey, TheDate, ...); Employee_Dim (EmployeeKey, EmployeeID, ...); Product_Dim (ProductKey, ProductID, ...); Customer_Dim (CustomerKey, CustomerID, ...); Shipper_Dim (ShipperKey, ShipperID, ...)

63 63 Components of a Star Schema
Fact table: Sales_Fact, with a multipart key made up of the dimensional keys (TimeKey, EmployeeKey, ProductKey, CustomerKey, ShipperKey) and the measures ($, ...)
Dimension tables, each linked to the fact table by its dimensional key: Employee_Dim (EmployeeKey, EmployeeID, ...); Time_Dim (TimeKey, TheDate, ...); Product_Dim (ProductKey, ProductID, ...); Customer_Dim (CustomerKey, CustomerID, ...); Shipper_Dim (ShipperKey, ShipperID, ...)

64 64 Exercise: Slowly Changing Dimensions Suppose the product categories change from time to time. Model the Data Mart when the manager wants to see historical reports against: 1. The present categories 2. The categories at the time of the sale 3. Both against the present categories and the immediate previous categories 4. The categories at any specified time

65 65 SCD Exercise 1
Facts: Month_nr, Store_nr, SKU_nr, Sales_revenue, ...
Product: SKU_nr, SKU_desc, Category, Line
Time: Month_nr, Month_desc, Quarter, Year
Geography: Store_nr, Store_name, City, State, Region

66 66 SCD Exercise 2
Facts: Month_nr, Store_nr, Product_key, Sales_revenue, ...
Product: Product_key, SKU_nr, SKU_desc, Category, Line
Time: Month_nr, Month_desc, Quarter, Year
Geography: Store_nr, Store_name, City, State, Region
Most Recent Product Key Map: Product_key, SKU_nr

67 67 SCD Exercise 3
Facts: Month_nr, Store_nr, Product_key, Sales_revenue, ...
Product: Product_key, SKU_nr, SKU_desc, Category, Category_old, Line
Time: Month_nr, Month_desc, Quarter, Year
Geography: Store_nr, Store_name, City, State, Region

68 68 SCD Exercise 4
Facts: Month_nr, Store_nr, SKU_nr, Sales_revenue, ...
Product: SKU_nr, SKU_desc, Category, Line, Valid_from, Valid_until
Time: Month_nr, Month_desc, Quarter, Year
Geography: Store_nr, Store_name, City, State, Region
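The Valid_from / Valid_until solution of Exercise 4 amounts to an "as of" lookup. A minimal Python sketch, assuming invented dimension rows and a half-open interval (Valid_until is exclusive); neither assumption comes from the slide:

```python
from datetime import date

# Hypothetical product dimension rows; valid_until is treated as exclusive.
PRODUCT_DIM = [
    {"sku_nr": 1, "category": "Trousers",
     "valid_from": date(2000, 1, 1), "valid_until": date(2001, 7, 1)},
    {"sku_nr": 1, "category": "Denim",
     "valid_from": date(2001, 7, 1), "valid_until": date(9999, 12, 31)},
]


def category_at(sku_nr, when):
    """Return the category that was valid for `sku_nr` on date `when`, or None."""
    for row in PRODUCT_DIM:
        if row["sku_nr"] == sku_nr and row["valid_from"] <= when < row["valid_until"]:
            return row["category"]
    return None


before_change = category_at(1, date(2001, 3, 15))
after_change = category_at(1, date(2001, 7, 1))
```

Passing the sale date gives the category at the time of the sale; passing today's date gives the present category, so one structure covers several of the reporting wishes in the exercise.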

69 69 Slowly Changing Dimensions Type 1: Overwrite the dimension record. Type 2: Create a new dimension record. Type 3: Create an 'old' field in the dimension record. Type 4: Add a valid_from and a valid_until field to the dimension record. Note on Type 2: requires surrogate keys, but in general one should always use these for performance and flexibility. Note on Type 4: Kimball recognizes only three SCD types.

70 70 Always Use Surrogate Keys Allows the DWH to assign new key versions for SCDs (Type 2); Higher performance with numeric keys than with long, alphanumeric keys
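A Type 2 change can be sketched as follows. The `ProductDimension` class, its attribute names and the sample values are hypothetical; the `current` mapping plays the role of the "Most Recent Product Key Map" from SCD Exercise 2.

```python
from itertools import count


class ProductDimension:
    """Hypothetical Type 2 dimension: every attribute change gets a new surrogate key."""

    def __init__(self):
        self._next_key = count(1)
        self.rows = {}      # surrogate product_key -> attributes
        self.current = {}   # sku_nr -> most recent product_key

    def upsert(self, sku_nr, category):
        key = self.current.get(sku_nr)
        if key is not None and self.rows[key]["category"] == category:
            return key                    # nothing changed, reuse the current record
        key = next(self._next_key)        # Type 2: new record under a new surrogate key
        self.rows[key] = {"sku_nr": sku_nr, "category": category}
        self.current[sku_nr] = key
        return key


dim = ProductDimension()
old_key = dim.upsert(sku_nr=1, category="Trousers")
fact = {"product_key": old_key, "sales_revenue": 100.0}   # sale under the old category
new_key = dim.upsert(sku_nr=1, category="Denim")          # the category changes later
```

The earlier fact row still points at the "Trousers" record, so historical reports show the category at the time of the sale, while new facts pick up the new key.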

71 71 Exercise: Aggregation Suppose the manager queries frequently on product line level and finds the performance too low. Question: How to model the data mart when we want to add aggregated measures on product line level?

72 72 Exercise: Aggregation
Facts: Month_nr, Store_nr, Product_key, Sales_revenue, ...
Product: Product_key, SKU_nr, SKU_desc, Category, Line
Time: Month_nr, Month_desc, Quarter, Year
Geography: Store_nr, Store_name, City, State, Region
Aggregated Facts: Week_nr, Store_nr, Line_key, Sales_revenue, ...
Product_Line: Line_key, Line

73 73 Exercise: Measures Add the following measures to the eFashion Data Mart: Stock Quantity; Product Price; Promotion Costs (product-specific, store-independent)

74 74 Exercise: Measures
Facts: Month_nr, Store_nr, Product_key, Sales_revenue, Stock_qty
Product: Product_key, SKU_nr, SKU_desc, Price, Category, Line, (Valid_from, Valid_until)
Time: Month_nr, Month_desc, Quarter, Year
Geography: Store_nr, Store_name, City, State, Region
Promotion Facts: Month_nr, SKU_nr, Promotion_cost, Duration, Promotion_type, ...
Q_Stock Facts: Quarter, Store_nr, SKU_nr, Stock_qty (av, eom)
Month, Quarter, Year

75 75 Measures & Dimensions reviewed The most useful measures are: Numeric; Additive. Dimensions are: The natural entry points of the facts, i.e., used for constraints and report breaks; Independent of each other, not hierarchically related
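Building the aggregated fact table is a matter of summing the base facts at the coarser product grain. A minimal Python sketch with invented data follows; it keeps the month grain of the base facts for simplicity, and the `PRODUCT_TO_LINE` mapping and function name are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical product dimension reduced to product_key -> product line.
PRODUCT_TO_LINE = {1: "Casual", 2: "Casual", 3: "Formal"}

# Hypothetical base facts: (month_nr, store_nr, product_key, sales_revenue).
FACTS = [
    (200101, 10, 1, 100.0),
    (200101, 10, 2, 40.0),
    (200101, 10, 3, 30.0),
    (200102, 10, 1, 60.0),
]


def build_line_aggregate(facts, product_to_line):
    """Pre-compute sales revenue per (month, store, product line)."""
    aggregate = defaultdict(float)
    for month_nr, store_nr, product_key, revenue in facts:
        aggregate[(month_nr, store_nr, product_to_line[product_key])] += revenue
    return dict(aggregate)


line_facts = build_line_aggregate(FACTS, PRODUCT_TO_LINE)
```

Queries at product line level can then read the much smaller aggregate instead of scanning every SKU-level fact row.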

76 76 Other Important MDM-Aspects Cardinality Grain Referential Integrity Conformed Dimensions Drill Across Traps

77 77 How to make the CIF? [Figure: Any Source (Applications, External data), Any Data (Operational Data Store, Data Warehouse, Data Marts) and Any Access (LAN/WAN, WWW), connected by Load Management and Query Management]

78 78 CGE&Y BI-Approach Overview Strategy & objectives; DW blueprint; Source data; Metamodel; Data Warehouse Architecture I; Definition Increments; Extraction, Transformation, Load; Development; Implementation; Incremental Delivery; Evolutionary Strategy; Data Warehouse Architecture II; Awareness; Project Management; Communication

80 80 Data Mining

81 81 Data Mining Definition: The process of digging intelligently into large volumes of data to discover and analyse previously unknown relationships or to validate hypotheses.

82 82 Data Mining Versus OLAP OLAP/Query (data): Are there some customers from large accounts with a sharp decrease in international calls? Data Mining (information): Are there any common characteristics among these customers?

83 83 Applications Risk Analysis (grant credit, investment); Fraud Detection (telephone charge, bank withdrawals); Trouble Shooting and Diagnosis; Process Controls (wafer fabrication); Promotion Analysis; Bankruptcy Prediction (mortgage lending, business partners); Customer Churn (telco); CRM (next slides)

84 84 Maximizing Customer Value Getting more prospects in; Turning prospects into customers; Selling more products to existing customers; Losing fewer customers

85 Which ones in and which ones out? [Figure: customers ranked from highest to lowest yield per individual customer, comparing yield per customer with costs per customer to show customer profitability; segments labelled Keep, Growth, Migration and Out]

86 86 Example: One to One Marketing Treat different customers differently –differentiate message –differentiate product offer –differentiate channel Need for usable information => predict customer behavior from databases


93 93 Example: clickstream analysis What parts of our Web site get the most visitors? What parts of the Web site do we associate most frequently with actual sales? What parts of the Web site are superfluous or visited infrequently? Which pages on our Web site seem to be "session killers," where the remote user stops the session and leaves? What is the new-visitor click profile on our site? What is the click profile of an existing customer? A profitable customer? A complaining customer that all too frequently returns our product? What is the click profile of a customer about to cancel our service, complain, or sue us? How can we induce the customer to register with our site so we learn some useful information about that customer? How many visits do unregistered customers typically make with us before they are willing to register? Before they buy a product or service?
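Two of these questions (which pages get the most visitors, and which pages are "session killers") can be answered from a plain click log. The log format and the `page_stats` helper below are assumptions for illustration; real clickstream data needs sessionisation and cleansing first.

```python
from collections import Counter

# Hypothetical click log: (session_id, page), already in time order within each session.
CLICKS = [
    ("s1", "/home"), ("s1", "/products"), ("s1", "/checkout"),
    ("s2", "/home"), ("s2", "/products"),
    ("s3", "/products"),
]


def page_stats(clicks):
    """Return {page: (views, exit_rate)}, where exit_rate is exits divided by views."""
    views = Counter(page for _, page in clicks)
    last_page = {}
    for session_id, page in clicks:
        last_page[session_id] = page      # overwritten until the session's final click
    exits = Counter(last_page.values())
    return {page: (views[page], exits[page] / views[page]) for page in views}


stats = page_stats(CLICKS)
```

A page with many views and a high exit rate is a candidate session killer. In this toy data `/checkout` has an exit rate of 1.0, which would more likely mark a completed order than a problem, so the figures need interpreting per page.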

94 94 Tele-sales Service desk Customized Customer Service

95 95 Customer Data Tele-sales Direct Mail Sales visit Good Bad Example: Contact Strategy Channel optimisation Data mining

96 96 Organisation! The customer chooses the channel Complaint handling Leaflet receipt Status service question Status order Operational systems Integration Analysis Service question CC App. Leaflet request Contact Order Complaint

97 97 Data Sources for Data Mining DATA Collecting & Cleansing Transactions (loyalty cards) Behaviour of existing customers Logfiles & cookies Market research Data suppliers Public data

98 98 Example: Affinity Grouping Market Basket: what items are sold together? Market Basket: what categories are sold with what items? Market Basket: what is not sold with certain items? Event Correlations: what other services are bought in the first month after signing up for a satellite TV subscription?
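A very small market basket analysis can be sketched by counting item pairs. The baskets below are invented, and support and confidence are the usual association-rule measures rather than anything defined in the slides.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets, one set of items per ticket.
BASKETS = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "jam"},
]


def pair_rules(baskets):
    """Return support and confidence for every item pair that occurs together."""
    item_count = Counter(item for basket in baskets for item in basket)
    pair_count = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = {}
    for (a, b), n in pair_count.items():
        rules[(a, b)] = {
            "support": n / len(baskets),          # share of baskets containing both items
            "confidence_a_to_b": n / item_count[a],  # share of a-baskets that also hold b
            "confidence_b_to_a": n / item_count[b],  # share of b-baskets that also hold a
        }
    return rules


rules = pair_rules(BASKETS)
```

Pairs that never show up in `rules`, such as butter and jam here, answer the "what is not sold with certain items?" question.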

99 99 Data Mining Techniques Decision Trees, Classification Trees, Rule Induction Neural Nets Visualisation Fuzzy Logic; Nearest Neighbour; Memory Based Reasoning; Case Based Reasoning Proprietary Logic Classical Statistics

100 100 Data Mining Techniques Statistical analysis; Neural networks; Genetic algorithms; Decision trees, compared on: Intuition; Predictive Power; Simplicity

101 101 Critical Success Factors Data availability (large amounts of a wide variety of data) Data consistency Data quality Domain expertise Data used/needed is allowed by privacy laws

102 102 Benefits Improved customer relationships; More revenue from existing customers; Market segmentation; Differentiated products and services; Differentiated sales channels; More effective marketing programs; Improved fraud detection; Improved investments …

103 103 Decision Tree with BusinessMiner from BusinessObjects Demo

104 104 Contact information Tom A. Fürstenberg Business Intelligence Consultant Cap Gemini Ernst & Young Sector Energy, Products & Transport Tel

