Business Intelligence & Data Warehousing

Presentation on theme: "Business Intelligence & Data Warehousing"— Presentation transcript:

1 Business Intelligence & Data Warehousing
Tom A. Fürstenberg, Business Intelligence Consultant, Cap Gemini Ernst & Young

2 Lecture learning objectives
What are BI & DWH? (conceptual and technical)
Applying BI & DWH
The practice of a consultant in general, and at Cap Gemini Ernst & Young in particular

3 Lecture contents
Performance Management
Business Intelligence (Performance Measurement)
OLAP
Extranets
Architecture
Data Warehouse
ETL
Multidimensional Modelling
CGE&Y Approach
Data Mining

4 Performance Management
Purposefully measuring business objectives and adjusting course against them

5 In control of a company 5

6 Overview Data Warehouse
[Diagram labels: Conceptual control model; Strategy & mission; Responsibilities & authorities; Operational control model; Control system; Goals; Means; Preconditions; Key Performance Indicators; Critical Success Indicators; External Indicators; Information model; Information system; Information provision; Data warehouse data]

7 Control vision: building blocks for steering organisations
Why? Strategy & mission; stakeholders; values & norms
What? Objectives & performance indicators
How? Methods; systems
Who?
What are we going to talk about? (see slide transition)

8 Methods
Various financial models
Balanced Scorecard
INK management model

9 Towards an operational control model
After the goals and KPIs have been set, the Critical Success Indicators (CSI) are determined, followed by the Environmental Indicators (OI, from the Dutch "Omgevingsindicatoren").

10 Towards an operational control model
Indicators: KPI, CSI, OI
Dimensions: Time, Region, Product, Department, Market

11 From model to behavioural change
Responsibilities and authorities; multidimensional data structure; operational control model; management charter; information provision; planning and commitment; assessment and steering

12 Some Typical Management Questions
PRODUCT
How much have we sold?
Which product gives the best profit?
Which product has the largest sales volume this quarter?
Which product best meets market needs?
How much to produce of each product?
CUSTOMER
Who is the most profitable customer?
What is the satisfaction level?
Which are the best segments?
Which service to improve?
How many customers did we lose last year?
Who are our biggest accounts?
CHANNEL / MARKETING
Which retailer yields most by volume and which by profit?
What promotions will yield most profit?
What effect will discounts have on the turnover?
What are the area coverage levels?
How many contacted people became customers?
What were the results of the promotions?
What is the competition doing?

13 Key Performance Indicators Top 10
Source: Results of "FIND! The Best", a benchmark study conducted in 1997/1998 by Ernst & Young Consulting and VU. 103 industrial companies participated in the study.

14 And now we just need to measure… Business Intelligence (performance measurement)

15 The Answers
The information is there, but spread everywhere!

16 In practice...

17 Problems
(Over)loading of the IT department (queries)
Long lead times for report "fabrication"
High cost in man-hours
Data sources are hard to integrate
Non-standardised reports
No unambiguous definitions
Error-prone
Open to manipulation
Dependence on "links" in the chain
Discussions about differences in the figures
Limited analysis capabilities
Wrong and late interpretations, conclusions, decisions
...

18 At the push of a button...

19 From chaos... to structure

20 Why now? Hype?
Developments:
Market pull: globalisation of markets; individualisation of customers; shorter product life cycles; information overload; mergers
Technology push: faster hardware; cheaper disk capacity; modern OLAP tools; any access (client/server, web, mobile)

21 OnLine Analytical Processing
Based on the syntax of management-information questions: <measure> per <dim1> per <dim2> per ...
KPIs, CSIs and OIs are measures
Product, Region, Customer, Time, etc. are dimensions (slice & dice)
Dimensions have hierarchies (drill down)
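The question pattern on this slide, a measure per dimension per dimension, can be sketched as a small roll-up function. This is only an illustration under assumptions, not how an OLAP engine works internally: the fact rows, the field names and the `rollup` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: one measure (revenue) and four dimension attributes.
rows = [
    {"product": "Shirt", "region": "North", "year": 2001, "quarter": "Q1", "revenue": 100},
    {"product": "Shirt", "region": "South", "year": 2001, "quarter": "Q1", "revenue": 150},
    {"product": "Jeans", "region": "North", "year": 2001, "quarter": "Q1", "revenue": 200},
    {"product": "Jeans", "region": "North", "year": 2001, "quarter": "Q2", "revenue": 50},
]

def rollup(rows, measure, dims, where=None):
    """Sum `measure` per combination of `dims`; `where` fixes dimension values (a slice)."""
    totals = defaultdict(int)
    for row in rows:
        if where and any(row[d] != v for d, v in where.items()):
            continue
        totals[tuple(row[d] for d in dims)] += row[measure]
    return dict(totals)

# <revenue> per <year>
per_year = rollup(rows, "revenue", ["year"])
# Drill down the time hierarchy: <revenue> per <year> per <quarter>
per_quarter = rollup(rows, "revenue", ["year", "quarter"])
# Slice on one region, then <revenue> per <product>
north_by_product = rollup(rows, "revenue", ["product"], where={"region": "North"})
```

Drilling down is the same call with one more level of the hierarchy in `dims`; slicing is the same call with a fixed value in `where`.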

22 OLAP
[Diagram: one cube, with Product and Time among its dimensions, sliced four ways: Product Manager's View, Regional Manager's View, Financial Manager's View, Ad Hoc View.]
A great advantage of the multidimensional model is that it obviates the multi-table query, or join, of SQL. Analysts working with a multidimensional model, for example, don't have to find the table that contains the column Units, and then perform a self join to display the column Units Year Ago. Such joins are implicit in multidimensional data. Because the arrays of data are subject-oriented in the multidimensional model, users easily group "like items", regardless of their alignment in the cube. Users of multidimensional tools just choose which columns and rows to display in reports and graphs.
Grouping like items resembles slicing the cube. Product managers study one product across many time periods and markets. Financial managers focus on the current and previous time period for all markets and all products. Regional managers examine all time periods and all products across some markets. And strategic planners might focus on a subset of the corporation's data, such as the current and next quarters for an innovative product being sold only in the west. In contrast, the relational data model forces users to understand the flat table structure, because they must join those tables before they can display data.

23 Introduction to Cubes
[Diagram: a Sales cube with three dimensions. Product: Grapes, Apples, Melons, Cherries, Pears. Location: Atlanta, Denver, Detroit. Time: Q1, Q2, Q3, Q4.]

24 Demo eFashion Case BusinessObjects Demo 24

25 BusinessObjects: Semantic Layer
Semantic layer
The term "semantic layer" was introduced by Business Objects. It is patented in the USA and is one of the cornerstones of the Business Objects architecture. The semantic layer translates the technical structure into a logical, business-oriented data structure that end users can understand, and comprises universes and metadata. Metadata and universes are stored in the central repository. The repository is stored in the central RDBMS and is used by all modules of the product. This means there is one security system and one semantic layer for all products. There can be several universes, each focused on a subject that matters to a group of users who share the same business terminology. They are, as it were, audience-specific "views" on the data in the database.
A universe consists of:
Business terms that stand for physical database fields and calculations.
Dimensions, which can be grouped into hierarchies that steer the drill process.
Predefined conditions that express complex business criteria.
All information about the database structure, including database location/type, tables, joins and cardinalities.

26 Any Access 26

27 Info- & Analysis-need at 3rd parties
27

28 e/m-Business Intelligence: Extranets
[Diagram: the Data Warehouse is opened up through extranets to suppliers, partners and customers.]

29 Extranet demo’s 29

30 Business Intelligence Theory
30

31 BI Definition Business Intelligence is the process of collection, cleansing, combining, consolidation, analysis, interpretation and communication of all internal and available external data, relevant for the decision making process in the organisation 31

32 BI Concept
[Diagram labels, from bottom to top: Collection / Data; Integration / Information; Analysis / Knowledge; Decisions / Action; Business Value; with a Feedback loop.]

33 BI Systems Reporting & Query DSS, MIS and EIS OLAP Data Mining 33

34 The Five Functional Levels
[Diagram: number of users plotted against complexity of the question.]
Reporting (static): standard reports
Analysis (dynamic): "bunch of reports", "cube"
Querying: unique "report" or question
Exploring: e.g. finding variables
Mining: e.g. statistical analysis, testing a hypothesis

35 The Five Functional Levels
[Diagram: number of users plotted against complexity of the question. Labels: 80 % of all users; reporting; static/dynamic analysis; querying; exploring; mining; interactive.]

36 Corporate Information Factory
[Diagram labels: Any Source; Any Data; Any Access; Applications; External data; Load management; Operational Data Store; Data Warehouse; Data Marts; Query management; LAN/WAN; WWW.]
The Corporate Information Factory
The corporate information factory can be compared with an actual factory. All kinds of raw material and assembly goods enter a factory and are immediately collected by inventory management processors. Assembly lines transform the goods into a product. Some products are finished; others can be assembled further.
Normally data flows from the applications, via the data warehouse, to the data marts. However, there is also a feedback loop. Data in the corporate information factory is used to make decisions. These decisions will have an impact on the business, which will first be detected by the applications. For example, when a retailer decides to produce more of product ABC, sales are boosted and the increase in sales is measured by the applications.
One of the most common variations on the corporate information factory is the one without an ODS. Many organisations operate successfully without an ODS.
Source: Corporate Information Factory by Bill Inmon, Claudia Imhoff and Ryan Sousa - Wiley Computer Publishing (1998)

37 Components of the CIF Data Warehouse Data Mart Operational Data Store
ETL 37

38 Data Warehouse 38

39 Definition Bill Inmon
Characteristics of a data warehouse:
Subject-oriented: the data warehouse is organised along the lines of major entities of the organisation, such as product and customer.
Integrated: the data warehouse has common key structures, common definitions, common naming conventions, etc.
Time-variant: the data warehouse is a massive series of snapshot records. Therefore, an element of time is added to the key structure.
Non-volatile: there are no updates in the data warehouse. Changes are captured by adding a new snapshot.
Both summary and detailed data
Source: Corporate Information Factory by Bill Inmon, Claudia Imhoff and Ryan Sousa - Wiley Computer Publishing (1998)
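The time-variant and non-volatile characteristics can be illustrated with an append-only store whose key includes a snapshot date. This is a minimal sketch under assumptions: the `warehouse` dictionary, `load_snapshot` and `as_of` are hypothetical names, and a real warehouse would use database tables rather than a Python dictionary.

```python
from datetime import date

# Hypothetical warehouse table; the key carries an element of time:
# (customer_id, snapshot_date) -> attributes at that moment.
warehouse = {}

def load_snapshot(customer_id, snapshot_date, attributes):
    """Add a new snapshot; never update an existing one (non-volatile)."""
    key = (customer_id, snapshot_date)
    if key in warehouse:
        raise ValueError(f"snapshot {key} already exists; changes need a new snapshot")
    warehouse[key] = dict(attributes)

def as_of(customer_id, when):
    """Return the latest snapshot taken on or before `when`, or None if there is none."""
    dates = [d for (cid, d) in warehouse if cid == customer_id and d <= when]
    return warehouse[(customer_id, max(dates))] if dates else None

# A change of address is captured as a second snapshot, not as an update.
load_snapshot(42, date(2001, 1, 1), {"city": "Utrecht"})
load_snapshot(42, date(2001, 6, 1), {"city": "Amsterdam"})
```

Because nothing is overwritten, a report can be reproduced for any past date by calling `as_of` with that date.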

40 Data Warehouse
Contains data that can be used to meet the information needs of (part of) the organisation
Contains integrated data extracted from one or more sources
Mostly contains large amounts of data
Contains data that is clean and consistent
May contain aggregated data
Optimised for its use

41 Data Warehouse
Database        Data Warehouse
Current         Historical
Internal        Internal and External
Isolated        Integrated
Transactions    Analysis
Normalised      Dimensional
Dirty           Clean and Consistent
Detailed        Detailed and Summary
A database only contains current data. Historical data will be systematically removed from the operational database. A data warehouse contains data covering a period of 5 to 10 years.
A database contains data from one internal, isolated source. A data warehouse contains data extracted from one or more internal and external sources. This implies that data must be integrated.
A database is designed for transactions on small amounts of data. A data warehouse is designed for the analysis of large amounts of data. Therefore, the database has a normalised structure and the data warehouse has a dimensional structure.
A database can contain dirty data. This depends on the data-entry rules of the application and the creativity of the users. Data in the data warehouse has always been cleansed and transformed.
A database consists of detailed data, whereas the data warehouse consists of both detailed and aggregated data.
Source: Masterclass CIBIT

42 Data Warehouse Advantages
One point of contact: information always comes from the data warehouse. Users do not have to wander through the entire organisation to get bits and pieces.
Time savings: the amount of time needed for reporting and analysis will decrease drastically.
No loss of historical data: historical data continues to be accessible because of the data warehouse.
OLTPs not hampered by BI activities: as the data warehouse is a "copy" of the data sources from the OLTPs, the OLTPs will not be disturbed by huge queries.
Better consistency and quality of data: the organisation must use the same codes and definitions in order to be able to integrate the data. This improves consistency. The quality of the data will improve because cleansing is an important part of data warehousing. Besides that, dirty data becomes visible.
Improvement of Business Intelligence: with a data warehouse it is possible to get the right information, in the right format and quantity, to the right person at the right time.
Source: Masterclass CIBIT

43 Data Warehouse Disadvantages
Never quite up-to-date: as the data warehouse is loaded once per day or once per week, it never contains the most recent data. An operational data store, which will be discussed later, can address this problem.
Requires a lot of storage space: as data is stored redundantly and history is kept, the amount of data gets very large (gigabytes to terabytes).
Requires a lot of communication, coordination and cooperation: because multiple data sources are integrated, communication, cooperation and coordination are extremely important. New communication lines (for example between system administrators) must be established.
Large impact on the organisation: the organisation will be forced to speak the same language, responsibility moves to a lower level and the hierarchy will change. It is possible that the data warehouse shows that the person everybody thought sold the most is in fact a very bad salesman.
A data warehouse is only the beginning: one step at a time! As the data warehouse has a large impact on the organisation, there will be a great deal of resistance.
Source: Masterclass CIBIT

44 Data Mart
DW design does not optimise query performance
Data in the DW is not stored in an optimal fashion for any given department
Competition for the resources required to get inside the DW
Costs for DSS computing facilities are high because of the large volume in the DW
As a data warehouse grows in size and maturity, data marts become an attractive option because:
1: The design of a data warehouse evolves to efficiently integrate and manage large volumes of data.
2: As the data warehouse is a corporate facility, it is not optimised for a given department.
3: The data warehouse is used by many people.
Source: Corporate Information Factory by Bill Inmon, Claudia Imhoff and Ryan Sousa - Wiley Computer Publishing (1998)

45 Data Mart
Characteristics:
Customised for a specific department
Limited amount of history
Summarised
Very flexible
Elegant presentation
Processor dedicated to the department
A data mart is a subset of a data warehouse that has been customised to fit the particular DSS processing needs of a given department. Two types of data mart exist: 1) the dependent data mart, which loads data from the data warehouse, and 2) the independent data mart, which loads data from the ETL layer.
A data mart is very attractive because: 1) a department can completely control the data and processing that occur inside the data mart, 2) the department needs a significantly smaller machine, which decreases the cost, and 3) the data from the data warehouse is customised to suit the particular needs of the department.
Source: Corporate Information Factory by Bill Inmon, Claudia Imhoff and Ryan Sousa - Wiley Computer Publishing (1998)

46 Data Mart Divided by: Business Geography Security Political (budget)
Structure (data mining) 46

47 Data Mart
Three different kinds of data marts:
Subset/summary: a simple subset of the data warehouse.
MOLAP (Multidimensional On Line Analytical Processing): dimensions of data are created and the data is summarised along any number of dimensions. High query flexibility and good performance.
ROLAP (Relational On Line Analytical Processing): a multidimensional view of data on proven relational DBMS technology. Easy to customise dimensions.
Source: Corporate Information Factory by Bill Inmon, Claudia Imhoff and Ryan Sousa - Wiley Computer Publishing (1998)

48 Operational Data Store
Characteristics:
Subject-oriented
Integrated
Current-valued
Volatile
Detailed data
The first two characteristics are similar to those of a data warehouse.
Current-valued: an ODS typically contains daily, weekly and monthly data, but the data ages very quickly in an ODS. It therefore holds only a limited amount of historical data.
Volatile: an ODS can be updated as a normal part of processing. It has equally strong elements of operational processing and DSS processing. This dual nature makes it the most complex architectural structure in the CIF.
Detailed data: an ODS only contains detailed data, because summary data is dynamic. The sales value calculated at 11:00 AM differs from the sales value calculated at 5:00 PM.
Source: Corporate Information Factory by Bill Inmon, Claudia Imhoff and Ryan Sousa - Wiley Computer Publishing (1998)

49 ETL: Extraction
Source selection:
The data model is the starting point: determine the data elements that are needed
For each data element, determine the available data sources
If more than one source is available, select on:
Quality, reliability and integrity
Scope of data
Location and availability of data
Location and availability of expertise

50 ETL: Transformation
Processing:
Aggregate records: aggregate records are very useful for changing the granularity of data and managing the volume of data that ends up in the data warehouse.
Encoding structures: gender could be represented in different ways by the applications. One application might use 0, 1 and 2, while another uses M, V and unknown. Only one encoding structure is allowed in the data warehouse.
Simple reformatting: date fields must be reformatted to one format.
Mathematical conversion: there may be many reasons for a mathematical conversion, such as a change in accounting periods, a conversion of monetary rates, or account adjustments.
Resequencing of data: as data comes from different sources, the records must be resequenced.
Default values: default values must be specified for data elements that are not populated as they pass through the ETL layer.
Key conversion: see next slide.
Cleansing
Source: Corporate Information Factory by Bill Inmon, Claudia Imhoff and Ryan Sousa - Wiley Computer Publishing (1998)
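Three of the transformations above (encoding structures, simple reformatting and default values) can be sketched in one function. Everything specific here is an assumption for illustration: the source names `app_a` and `app_b`, the meaning given to the codes 0, 1 and 2, the date formats and the target codes M/F/U are not taken from the slides.

```python
from datetime import datetime

# Assumed mapping of each source encoding to a single warehouse encoding (M, F, U).
# What 0, 1 and 2 mean in the first application is a guess made for this sketch.
GENDER_MAP = {
    "app_a": {"0": "U", "1": "M", "2": "F"},
    "app_b": {"M": "M", "V": "F", "unknown": "U"},
}

# Assumed source date formats; the warehouse stores ISO dates (YYYY-MM-DD).
DATE_FORMATS = {"app_a": "%d-%m-%Y", "app_b": "%Y/%m/%d"}

def transform(record, source):
    """Apply one encoding structure, one date format and default values."""
    # Encoding structures: unknown or missing codes fall back to the default "U".
    gender = GENDER_MAP[source].get(str(record.get("gender")), "U")
    # Simple reformatting: parse the source format, emit one common format.
    sold_on = datetime.strptime(record["sold_on"], DATE_FORMATS[source]).date().isoformat()
    # Default values: an unpopulated amount becomes 0.0.
    amount = float(record.get("amount", 0.0))
    return {"gender": gender, "sold_on": sold_on, "amount": amount}

row_a = transform({"gender": 1, "sold_on": "31-12-2000", "amount": "19.95"}, "app_a")
row_b = transform({"gender": "V", "sold_on": "2001/01/15"}, "app_b")
```

Keeping the per-source rules in lookup tables means a new source system adds a table entry rather than a new code path.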

51 ETL: Transformation
Key transformation
[Diagram: key structures A, B and C from the source applications are converted into one new key structure.]
The strategy depends on:
What does the data model specify?
Is there a dominant application?
Is most of the data already in a standard key format?
It is very common to add an element of time to the key as data passes through the ETL layer.
Source: Corporate Information Factory by Bill Inmon, Claudia Imhoff and Ryan Sousa - Wiley Computer Publishing (1998)
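Key conversion can be sketched as a lookup from each source key to one new warehouse key, with the load date added as the element of time. This is a hypothetical illustration: `key_map`, `convert_key` and the example source keys are assumptions, and the sketch does not decide whether two source keys describe the same real-world entity (that is a cleansing question).

```python
from datetime import date
from itertools import count

_next_key = count(1)   # generator of new warehouse keys
key_map = {}           # (source system, source key) -> warehouse key

def convert_key(source, source_key, load_date):
    """Map a source key to the new key structure and attach an element of time."""
    natural = (source, source_key)
    if natural not in key_map:
        key_map[natural] = next(_next_key)
    return (key_map[natural], load_date)

k_a = convert_key("A", "CUST-001", date(2001, 1, 1))   # key structure A: text code
k_b = convert_key("B", 1001, date(2001, 1, 1))         # key structure B: number
k_a_later = convert_key("A", "CUST-001", date(2001, 2, 1))
```

The same source key always yields the same warehouse key, while the time element distinguishes the loads.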

52 ETL: Cleansing
Data quality is critical for:
Marketing communications: poor data quality leads to misspelled addresses, multiple letters to the same person or non-arrivals.
Targeted marketing: completeness and correctness of attributes are of vital importance for generating an effective mailing list.
Customer matching: a major issue in banking and healthcare. It is very important to list all accounts/visits of one individual.
Retail and commercial householding: good data quality makes it possible to identify the overall needs of a household and to suggest additional products (cross-selling).
Combining information: matching internal data with external data and merging customer/product lists after an acquisition are easier with good data.
Tracking retail sales: it is impossible to determine total sales for a specific product (group) if the product name is spelled in 20 different ways.
Source: Dealing with Dirty Data by Ralph Kimball - DBMS (September 1996).

53 ETL: Cleansing
Common excuses for not cleaning:
The data in the operational systems seem to work just fine
Data can be joined most of the time
Cleansing will take place after population of the data warehouse
Data entry will be improved
The users will never agree to change their data
1: An operational system has no problems with the fact that one customer is identified by different codes, that important attributes are hidden in free-form texts and that data values differ from their field descriptions. A large customer with 10 different codes will be invoiced correctly, but will not be recognised as a large customer in the data warehouse.
2: The 80/20 rule is applicable. 80 % of the queries will access that small area of the data warehouse with data problems.
3: Unfortunately, post-pilot updates come too late. In most cases the credibility of the system is destroyed and the financial sponsor is lost. Even if there are still funds to proceed, after-the-fact fix-ups are complex, expensive and even impossible if operational data sources were not retained.
4: Flaws in this reasoning are: 1) never underestimate the creativity of a user, 2) business processes change faster than the applications, which leads to data values that are not consistent with their metadata labels, 3) one standard is impossible if a customer introduces himself in four different ways and 4) the problem still exists for historical data.
5: With foreign keys, synonym tables and cross-record linkages, data can be cleansed without changing the original data.
Source: The Data Doctor: Five Common Excuses For Not Reengineering Legacy data by George Burch - ???? (January 1996).
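Points 1 and 5 above can be illustrated together: a synonym table lets the warehouse recognise one large customer behind several source codes without touching the original records. The customer codes, amounts and the `total_per_customer` helper are hypothetical.

```python
# Hypothetical synonym table: every known source code -> one master customer id.
synonyms = {"ACME": "C001", "ACME-NL": "C001", "Acme BV": "C001", "GLOBEX": "C002"}

# Source records as delivered by the operational system: (customer code, invoiced amount).
invoices = [("ACME", 500), ("ACME-NL", 700), ("Acme BV", 300), ("GLOBEX", 900)]

def total_per_customer(invoices, synonyms):
    """Total invoiced amount per master customer; the source records stay untouched."""
    totals = {}
    for code, amount in invoices:
        master = synonyms.get(code, code)   # unknown codes pass through unchanged
        totals[master] = totals.get(master, 0) + amount
    return totals

totals = total_per_customer(invoices, synonyms)
```

Without the synonym table the three ACME codes would appear as three mid-sized customers, each smaller than GLOBEX.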

54 Multi Dimensional Data Modeling
54

55 MD Modeling: Contents E/R Modeling (Ex.) MD Modeling (Ex.) Star Schema
Slowly Changing Dimensions (Ex.) Surrogate Keys Aggregation (Ex.) Measures & Dimensions reviewed Other important MDM aspects 55

56 Exercise: E/R Modeling
What could the sales transaction database of the eFashion retailer look like?
Ticket: Ticket_nr, Store_nr, Card-nr, Employee_nr, Time_Stamp
Loyalty Card: Card_nr, Cust_name, Address, Zip_code, City, ...
Employee: Emp_name
Products Sold: Product_nr, #_products, price, discount
Products: Bar_Code, Prod_Desc, Actual_price, Weight
Store: Store_name, State, Manager

57 Management Questions
Give me the annual revenue of all my product lines divided over all the sales regions over the last 3 years
Give me the top 10 of most profitable products this year
Give me the top 10 of most sold products of last year
Give me the top 10 of most profitable customers
Compare the YTD revenue with the one in the same period last year and the target

58 Why not E/R Modeling? End users cannot understand, remember, navigate an E/R model (not even with a GUI) Software cannot usefully query an E/R model Use of E/R modeling doesn’t meet the DW purpose: intuitive and high performance querying 58

59 Exercise: Model the eFashion DM
Sales Revenue
Time hierarchy (Year-Quarter-Month)
Store hierarchy (Region, State, City, Store)
Product hierarchy (Line, Category, SKU)

60 eFashion Data Mart
Facts: Month_nr, Store_nr, SKU_nr, Sales_revenue, ...
Time: Month_nr, Month_desc, Quarter, Year
Product: SKU_nr, SKU_desc, Category, Line
Geography: Store_nr, Store_name, City, State, Region

61 DW Modeling Components
Geographic Product Time Units $ Dimension Tables Fact Table Measures Facts Dimension 61

62 Using a Star Schema
Fact table:
Sales_Fact: TimeKey, EmployeeKey, ProductKey, CustomerKey, ShipperKey, $, ...
Dimension tables:
Time_Dim: TimeKey, TheDate, ...
Employee_Dim: EmployeeKey, EmployeeID, ...
Product_Dim: ProductKey, ProductID, ...
Customer_Dim: CustomerKey, CustomerID, ...
Shipper_Dim: ShipperKey, ShipperID, ...

63 Components of a Star Schema
Sales_Fact
Multipart key made up of the dimensional keys: TimeKey, EmployeeKey, ProductKey, CustomerKey, ShipperKey
Measures: $, ...
Dimension tables, each joined to the fact table on its key:
Time_Dim (TimeKey): TheDate, ...
Employee_Dim (EmployeeKey): EmployeeID, ...
Product_Dim (ProductKey): ProductID, ...
Customer_Dim (CustomerKey): CustomerID, ...
Shipper_Dim (ShipperKey): ShipperID, ...
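A cut-down version of this star schema can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: only two of the five dimensions are kept, the measure column is named `Amount` (the slide only shows "$"), and the sample rows are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Time_Dim    (TimeKey INTEGER PRIMARY KEY, TheDate TEXT);
CREATE TABLE Product_Dim (ProductKey INTEGER PRIMARY KEY, ProductID TEXT);
CREATE TABLE Sales_Fact  (
    TimeKey    INTEGER REFERENCES Time_Dim(TimeKey),
    ProductKey INTEGER REFERENCES Product_Dim(ProductKey),
    Amount     REAL,                       -- the measure
    PRIMARY KEY (TimeKey, ProductKey)      -- multipart key of dimensional keys
);
INSERT INTO Time_Dim    VALUES (1, '2001-01-15'), (2, '2001-02-15');
INSERT INTO Product_Dim VALUES (10, 'SHIRT'), (20, 'JEANS');
INSERT INTO Sales_Fact  VALUES (1, 10, 100.0), (2, 10, 50.0), (1, 20, 200.0);
""")

# <Amount> per <product>: one join per dimension used, then group by its attribute.
rows = con.execute("""
    SELECT p.ProductID, SUM(f.Amount)
    FROM Sales_Fact f
    JOIN Product_Dim p ON p.ProductKey = f.ProductKey
    GROUP BY p.ProductID
    ORDER BY p.ProductID
""").fetchall()
```

Every query against a star schema has this shape: the fact table in the middle, one join to each dimension that appears in the question.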

64 Exercise: Slowly Changing Dimensions
Suppose the product categories change from time to time. Model the Data Mart when the manager wants to see historical reports against:
1. The present categories
2. The categories at the time of the sale
3. Both the present categories and the immediately previous categories
4. The categories at any specified time

65 SCD Exercise 1
Facts: Month_nr, Store_nr, SKU_nr, Sales_revenue, ...
Time: Month_nr, Month_desc, Quarter, Year
Product: SKU_nr, SKU_desc, Category, Line
Geography: Store_nr, Store_name, City, State, Region

66 SCD Exercise 2
Facts: Month_nr, Store_nr, Product_key, Sales_revenue, ...
Time: Month_nr, Month_desc, Quarter, Year
Product: Product_key, SKU_nr, SKU_desc, Category, Line
Geography: Store_nr, Store_name, City, State, Region
Most Recent Product Key Map: Product_key, SKU_nr

67 SCD Exercise 3
Facts: Month_nr, Store_nr, Product_key, Sales_revenue, ...
Time: Month_nr, Month_desc, Quarter, Year
Product: Product_key, SKU_nr, SKU_desc, Category, Category_old, Line
Geography: Store_nr, Store_name, City, State, Region

68 SCD Exercise 4
Facts: Month_nr, Store_nr, SKU_nr, Sales_revenue, ...
Time: Month_nr, Month_desc, Quarter, Year
Product: SKU_nr, SKU_desc, Category, Line, Valid_from, Valid_until
Geography: Store_nr, Store_name, City, State, Region

69 Slowly Changing Dimensions
Type 1: Overwrite the dimension record
Type 2: Create a new dimension record
Type 3: Create an "old" field in the dimension record
Type 4: Add valid_from and valid_until fields to the dimension record
Ad Type 2: requires surrogate keys, but in general one should always use these because of performance and flexibility
Ad Type 4: Kimball only recognizes three types of SCD
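Type 2 is the one that needs surrogate keys, so it is worth a small sketch. The `current` flag used here is an assumption made for brevity (slide 66 uses a separate "Most Recent Product Key Map" for the same purpose), and `product_dim` and `apply_type2` are hypothetical names.

```python
from itertools import count

_surrogate = count(1)   # generator of surrogate keys
product_dim = []        # rows: product_key, sku_nr, category, current

def apply_type2(sku_nr, category):
    """Type 2: keep the old dimension row and add a new row with a new surrogate key."""
    for row in product_dim:
        if row["sku_nr"] == sku_nr and row["current"]:
            if row["category"] == category:
                return row["product_key"]   # nothing changed, reuse the key
            row["current"] = False          # retire the old version, keep its data
    new_row = {"product_key": next(_surrogate), "sku_nr": sku_nr,
               "category": category, "current": True}
    product_dim.append(new_row)
    return new_row["product_key"]

k1 = apply_type2("SKU-1", "Casual")   # first load
k2 = apply_type2("SKU-1", "Sport")    # category changed: new key
k3 = apply_type2("SKU-1", "Sport")    # unchanged: same key as before
```

Facts loaded before the change keep pointing at the old key, which is what lets reports show the category at the time of the sale.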

70 Always Use Surrogate Keys
Allows DWH to assign new key versions for SCD’s (type 2) Higher performance with numeric keys than with long, alphanumeric keys 70

71 Exercise: Aggregation
Suppose the manager queries frequently on product line level and finds the performance too low. Question: How to model the data mart when we want to add aggregated measures on product line level? 71

72 Exercise: Aggregation
Facts: Month_nr, Store_nr, Product_key, Sales_revenue, ...
Aggregated Facts: Week_nr, Store_nr, Line_key, Sales_revenue, ...
Time: Month_nr, Month_desc, Quarter, Year
Product_Line: Line
Product: Product_key, SKU_nr, SKU_desc, Category, Line
Geography: Store_nr, Store_name, City, State, Region
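The idea behind the aggregated fact table can be tried with `sqlite3`: the sums per product line are computed once at load time instead of in every query. This is a simplified sketch, not the slide's exact model: it groups on the `Line` attribute directly rather than on a `Line_key`, keeps the month grain, and uses invented sample rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Product (Product_key INTEGER PRIMARY KEY, SKU_nr TEXT, Line TEXT);
CREATE TABLE Facts   (Month_nr INTEGER, Store_nr INTEGER, Product_key INTEGER, Sales_revenue REAL);
INSERT INTO Product VALUES (1, 'SKU-1', 'Shirts'), (2, 'SKU-2', 'Shirts'), (3, 'SKU-3', 'Jeans');
INSERT INTO Facts   VALUES (1, 7, 1, 100.0), (1, 7, 2, 50.0), (1, 7, 3, 200.0), (2, 7, 1, 25.0);

-- Pre-computed aggregate at product line level, built once during the load.
CREATE TABLE Aggregated_Facts AS
SELECT f.Month_nr, f.Store_nr, p.Line, SUM(f.Sales_revenue) AS Sales_revenue
FROM Facts f
JOIN Product p ON p.Product_key = f.Product_key
GROUP BY f.Month_nr, f.Store_nr, p.Line;
""")

# Queries at line level now read the small table and need no join to Product.
agg = con.execute(
    "SELECT Month_nr, Line, Sales_revenue FROM Aggregated_Facts ORDER BY Month_nr, Line"
).fetchall()
```

The trade-off is extra storage and a longer load in exchange for faster queries at the aggregated level.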

73 Exercise: Measures Add the following measures to the eFashion Data Mart: Stock Quantity Product Price Promotion Costs (product-specific, store-independent) 73

74 Exercise: Measures
Facts: Month_nr, Store_nr, Product_key, Sales_revenue, Stock_qty
Q_Stock Facts: Quarter, Year, Store_nr, SKU_nr, Stock_qty (av, eom)
Promotion Facts: Month_nr, SKU_nr, Promotion_cost, Duration, Promotion_type, ...
Time: Month_nr, Month_desc, Quarter, Year
Product: Product_key, SKU_nr, SKU_desc, Price, Category, Line (Valid_from, Valid_until)
Geography: Store_nr, Store_name, City, State, Region
[Further diagram labels: Month, Year.]

75 Measures & Dimensions reviewed
The most useful measures are:
Numeric
Additive
Dimensions are:
The natural entry points of the facts, i.e. used for constraints and report breaks
Independent of each other, not hierarchically related

76 Other Important MDM-Aspects
Cardinality Grain Referential Integrity Conformed Dimensions Drill Across Traps 76

77 How to make the CIF?
[Diagram: the Corporate Information Factory, as on slide 36.]

78 CGE&Y BI-Approach Overview
Communication Incremental Delivery Strategy & objectives DW blueprint Source data Metamodel Extraction, Transformation Load Development Awareness Definition Increments Implementation Data Warehouse Architecture I Data Warehouse Architecture II Evolutionary Strategy Project Management 78

79 79

80 Data Mining 80

81 Data Mining Definition:
The process of digging intelligently into large volumes of data to discover and analyse previously unknown relationships or to validate hypotheses. 81

82 Data Mining Versus OLAP
OLAP/Query Are there some customers from large accounts with a high decrease in international calls? Data Mining Are there any common characteristics among these customers? Data Information 82

83 Applications Risk Analysis (grant credit, investment)
Fraud Detection (telephone charge, bank withdrawals) Trouble Shooting and Diagnosis Process Controls (wafer fabrication) Promotion Analysis Bankruptcy Prediction (mortgage lending, business partners) Customer Churn (telco) CRM (next slides) 83

84 Maximizing Customer Value
Getting more prospects in
Turning prospects into customers
Selling more products to existing customers
Losing fewer customers

85 Which ones in and which ones out?
[Diagram labels: Keep; Growth; Migration; Out; Yield per individual customer; Yield per customer; Costs per customer; Customer profitability, from highest to lowest.]
To invest in customers with the aim of turning them into loyal customers, it is important to know which customers are profitable in the long run and which are loss-making. You part with the loss-making customers. You invest in the profitable or potentially profitable customers so that they stay and eventually grow.

86 Example: One to One Marketing
Treat different customers differently differentiate message differentiate product offer differentiate channel Need for usable information => predict customer behavior out of databases 86

87 14

88 15

89 89

90 90

91 91

92 92

93 Example: clickstream analysis
What parts of our Web site get the most visitors?
What parts of the Web site do we associate most frequently with actual sales?
What parts of the Web site are superfluous or visited infrequently?
Which pages on our Web site seem to be "session killers," where the remote user stops the session and leaves?
What is the new-visitor click profile on our site?
What is the click profile of an existing customer? A profitable customer? A complaining customer that all too frequently returns our product?
What is the click profile of a customer about to cancel our service, complain, or sue us?
How can we induce the customer to register with our site so we learn some useful information about that customer?
How many visits do unregistered customers typically make with us before they are willing to register? Before they buy a product or service?
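Two of these questions, the most visited parts and the "session killers", can be answered from ordered page paths per session. This is a rough sketch with invented data: the `sessions` list and page names are hypothetical, and a real analysis would first have to reconstruct sessions from log files and cookies.

```python
from collections import Counter

# Hypothetical sessions: the ordered pages of each visit.
sessions = [
    ["/home", "/catalog", "/cart", "/checkout"],
    ["/home", "/catalog", "/shipping-costs"],
    ["/home", "/shipping-costs"],
    ["/home", "/catalog"],
]

# Most visited parts of the site: page views per page.
views = Counter(page for session in sessions for page in session)

# Session killers: how often a page is the last one viewed, relative to its views.
exits = Counter(session[-1] for session in sessions)
exit_rate = {page: exits[page] / views[page] for page in views}

most_visited = views.most_common(1)[0]
```

A high exit rate is not always bad: the final checkout page ends a session because the sale is complete, so candidate session killers still need a human reading.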

94 Customized Customer Service
Tele-sales Service desk 94 16

95 Example: Contact Strategy
Sales visit Channel optimisation Good Tele-sales Data mining Direct Mail Customer Data Bad 95 29

96 The customer chooses the channel
[Diagram labels: Organisation!; Operational systems; Integration; Analysis; CC App.; Complaint handling; Contact; Leaflet request; Leaflet receipt; Order; Status order; Service question; Status service question; Complaint.]
The customer chooses the channel independently of the message. Worse still, the customer mixes channels and messages freely. Dell has this well organised, but is an exception. In many organisations the organisational structure is based on separate business processes: every department has its own customer in its own system. Technically these are hard to tie together, and organisationally even more so!

97 Data Sources for Data Mining
Collecting & Cleansing DATA Transactions (loyalty cards) Behaviour of existing customers Logfiles & cookies Market research Data suppliers Public data 97 20

98 Example: Affinity Grouping
Market Basket: what items are sold together?
Market Basket: what categories are sold with what items?
Market Basket: what is not sold with certain items?
Event Correlations: what other services are bought in the first month after signing up for a satellite TV subscription?
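The first and third market basket questions come down to counting how often pairs of items share a basket. This is a minimal sketch with invented baskets; `support` is a hypothetical helper name, and real affinity analysis would add measures such as confidence and work on far larger data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets: the set of items on each sales ticket.
baskets = [
    {"shirt", "tie"},
    {"shirt", "tie", "belt"},
    {"shirt", "belt"},
    {"jeans", "belt"},
]

# Count every unordered pair of items that appears together in a basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

def support(item_a, item_b):
    """Fraction of all baskets that contain both items."""
    return pair_counts[tuple(sorted((item_a, item_b)))] / len(baskets)

sold_together = support("tie", "shirt")
never_together = support("jeans", "tie")
```

Pairs with zero support among otherwise popular items answer the question of what is not sold with certain items.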

99 Data Mining Techniques
Decision Trees, Classification Trees, Rule Induction Neural Nets Visualisation Fuzzy Logic; Nearest Neighbour; Memory Based Reasoning; Case Based Reasoning Proprietary Logic Classical Statistics 99

100 Data Mining Techniques
[Diagram: techniques placed on axes of predictive power against simplicity: statistical analysis, neural networks, genetic algorithms, decision trees, intuition.]

101 Critical Success Factors
Data availability (large amounts of a wide variety of data) Data consistency Data quality Domain expertise Data used/needed is allowed by privacy laws 101

102 Benefits Improved customer relationships
More revenue from existing customers Market segmentation Differentiated products and services Differentiated sales channels More effective marketing programs Improved fraud detection Improved investments 102

103 Decision Tree with BusinessMiner from BusinessObjects
Demo

104 Contact information Tom A. Fürstenberg Business Intelligence Consultant Cap Gemini Ernst & Young Sector Energy, Products & Transport Tel 104

