Demonstration 10 EDW Implementation Strategy and Process 1/10/2012 www.InstantBI.com.

1 Demonstration 10 EDW Implementation Strategy and Process 1/10/2012 www.InstantBI.com

2 Agenda
- Standard Implementation Strategy
- Standard Implementation Process
- Prototyping
- Setting up the Staging Area
Commercial in Confidence. Copyright 2012 – Instant Business Intelligence.

3 Standard Implementation Strategy

4 Prototyping
- “Plan to build it twice. You will anyway”
  - Frederick P Brooks – The Mythical Man-Month
  - The BEST IT development book ever written
- DWs have become FAR more sophisticated
  - Early 90s (90-94)
    - 10-20 dimension tables and 5-10 fact tables
    - All about ‘sales’ and campaigns
    - These were the high value applications
    - 300+ work days to build the ETL for these in COBOL
  - Today with models like BI4ALL or even custom development
    - 100+ dimension tables and 30+ fact tables (easily)
    - 200-300 work days to build the ETL
    - Much more of the ‘ETL time’ is ‘understanding data’

5 Prototyping
- Prototyping allows you to
  - Find data understanding errors before writing ETL
  - Quickly tune database model for performance
  - Start developing reports earlier
    - Report development tools have become more ‘complex’
  - Ensure data integrity earlier
    - Data integrity is the #1 killer of ETL productivity
    - We used to only find errors when we wrote the ETL
  - Find ‘assumptions’ that do not hold up for this EDW
  - More easily communicate the ‘end result’ to business users
- If you can’t build the prototype, you can’t build the real thing
- Bottom Line: Strongly recommended

6 [Architecture diagram. Labels: Source Systems (System 1 to System 10); Extracts; EDW Staging Area; Validate and Clean; Transform and Load; ODS; EDW; Data Marts (Data Mart 1 to Data Mart 6); BI Apps 1 to 3; Commercial Specific Apps; Direct Connect; Reporting Systems; App 1 to App 14]

7 Standard Implementation Strategy
- Data is ‘somehow’ extracted from source systems
  - Must be careful to detect deletes
    - Can be a very difficult problem to solve
- Data, as raw as possible, sent to Staging Area
  - Usually as files, sometimes as ODBC links
  - The OLTP system usually controls extraction schedule
  - Extract one field -> extract whole table
- Data is profiled to determine real data type
  - Staging area tables use ‘real’ data types
- Start developing understanding of raw data
  - Understanding data is usually a huge and difficult job
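
The profiling step on slide 7 can be sketched as follows. This is a minimal illustration, not the tooling used in the presentation: the function name `infer_column_type`, the four type labels and the `YYYY-MM-DD` date format are all assumptions made for the example.

```python
from datetime import datetime


def infer_column_type(values):
    """Return the narrowest type that fits every non-empty value.

    Hypothetical labels: 'INTEGER', 'DECIMAL', 'DATE' or 'VARCHAR'.
    """
    non_empty = [v.strip() for v in values if v is not None and v.strip() != ""]
    if not non_empty:
        return "VARCHAR"  # nothing to profile, so stay with text

    def all_match(parser):
        # True only if the parser accepts every sampled value.
        for v in non_empty:
            try:
                parser(v)
            except ValueError:
                return False
        return True

    if all_match(int):
        return "INTEGER"
    if all_match(float):
        return "DECIMAL"
    # Assumed date format; real extracts often need several candidate formats.
    if all_match(lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "DATE"
    return "VARCHAR"


# Example: profile each column of a small extract held as lists of strings.
sample = {
    "CUM_CUSTOMER": ["1001", "1002", " 1003 "],
    "CALL_RETAIL_PRICE": ["0.125", "3", ""],
    "CALL_CLASS": ["A1", "7"],
}
profile = {name: infer_column_type(vals) for name, vals in sample.items()}
```

A numeric-looking date such as `20120110` would profile as an integer here, which is one reason profiling output still needs a person to review it.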

8 Standard Implementation Strategy
- Data loaded into ‘staging area’
  - Today we can afford RDBMS staging areas (used to be files)
- As data goes into the staging area, fields are converted from possible native types to RDBMS types
  - Numeric/string dates, codes, flags, numerics in chars
  - Errors must be caught, so both source and target columns are kept and the source value stays visible
- Calculations within a row are performed
  - Durations/elapsed times, ages etc
- Three flags are set
  - Row deleted from source – if it was deleted
  - Row valid – valid by default – set to ‘N’ if found invalid
  - Row sent to EDW = ‘N’ because it has not yet been sent
- It is then possible to run ‘cross table validations’ on the data BEFORE sending it into the EDW
  - Always beware sending invalid data into a DW
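
The conversion, in-row calculation and flag defaults on slide 8 might look like the sketch below. The column names `START_SRC`, `END_SRC`, `START_TS`, `END_TS` and `DURATION_SECS`, and the `YYYYMMDDHHMMSS` source format, are assumptions for illustration; the three flag names are taken from the example staging table on slide 9.

```python
from datetime import datetime

SOURCE_FORMAT = "%Y%m%d%H%M%S"  # assumed layout of the string dates in the extract


def stage_row(raw):
    """Convert one raw source record (all strings) into a staging row."""
    row = {
        "START_SRC": raw["start"],    # source values kept exactly as received
        "END_SRC": raw["end"],
        "START_TS": None,             # typed target columns, filled if parsing works
        "END_TS": None,
        "DURATION_SECS": None,        # calculation within the row
        "ROW_DEL_FRM_SRC_IND": "N",
        "ROW_VALID_IND": "Y",         # valid by default
        "ROW_SENT_TO_IWS": "N",       # not yet sent onwards
    }
    try:
        row["START_TS"] = datetime.strptime(raw["start"], SOURCE_FORMAT)
        row["END_TS"] = datetime.strptime(raw["end"], SOURCE_FORMAT)
        row["DURATION_SECS"] = (row["END_TS"] - row["START_TS"]).total_seconds()
    except ValueError:
        # Conversion failed: mark the row invalid; the source text is still visible.
        row["ROW_VALID_IND"] = "N"
    return row


good = stage_row({"start": "20120110093000", "end": "20120110093130"})
bad = stage_row({"start": "20120110093000", "end": "not a date"})
```

Keeping the source column next to the typed column means a failed conversion can be diagnosed from the staging table itself, without going back to the extract file.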

9 Example Staging Table
CREATE TABLE UNBILLED_CALLS (
    CUM_START_DATE_TIME   DATE NOT NULL,
    CALL_DIALLED_DIGITS   VARCHAR2(18 BYTE),
    CALL_DURATION         NUMBER(10,2) NOT NULL,
    CALL_RETAIL_PRICE     NUMBER(14,3) NOT NULL,
    CALL_BREAKDOWN_CODE   VARCHAR2(5 BYTE) NOT NULL,
    CALL_DISCOUNT         NUMBER(4,1) NOT NULL,
    CALL_UNITS            NUMBER(4) NOT NULL,
    CALL_PP_ALLOWANCE     NUMBER(5,1) NOT NULL,
    CALL_CLASS            VARCHAR2(5 BYTE),
    CALL_CATEGORY         VARCHAR2(5 BYTE),
    CALL_ORIGINATION      VARCHAR2(2 BYTE),
    SERVICE_CODE          VARCHAR2(2 BYTE) NOT NULL,
    CUM_CUSTOMER          NUMBER(10) NOT NULL,
    CUM_SUBSCRIBER        VARCHAR2(18 BYTE) NOT NULL,
    CALL_DIRECTION        VARCHAR2(2 BYTE),
    CALL_LOCATION         VARCHAR2(13 BYTE),
    CALL_DESTINATION      VARCHAR2(5 BYTE),
    CALL_RECORD_TYPE      VARCHAR2(3 BYTE) NOT NULL,
    SERVICE_TYPE          VARCHAR2(2 BYTE) NOT NULL,
    BUCKET_TYPE           VARCHAR2(5 BYTE),
    ROW_DEL_FRM_SRC_IND   VARCHAR2(1 BYTE) DEFAULT 'N' NOT NULL,
    ROW_VALID_IND         VARCHAR2(1 BYTE) DEFAULT 'Y' NOT NULL,
    ROW_SENT_TO_IWS       VARCHAR2(1 BYTE) DEFAULT 'N' NOT NULL
)
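
One simple way to set ROW_DEL_FRM_SRC_IND is to compare the keys already staged with the keys in the latest extract. The sketch below assumes the source delivers a full extract each time and that every row has a single-column key; `CALL_ID` is a hypothetical key that does not appear in the table above. Sources that only send changes need a different technique, which is part of why the deck calls delete detection a difficult problem.

```python
def find_deleted_keys(staged_keys, latest_extract_keys):
    """Keys that are in staging but missing from the latest full extract."""
    return set(staged_keys) - set(latest_extract_keys)


def flag_deleted(staging_rows, latest_extract_keys, key="CALL_ID"):
    """Set ROW_DEL_FRM_SRC_IND to 'Y' on staged rows whose key has disappeared.

    Rows are plain dicts here; `key` names a hypothetical key column.
    Returns the set of keys that were flagged.
    """
    deleted = find_deleted_keys((r[key] for r in staging_rows), latest_extract_keys)
    for r in staging_rows:
        if r[key] in deleted:
            r["ROW_DEL_FRM_SRC_IND"] = "Y"
    return deleted


staged = [
    {"CALL_ID": 1, "ROW_DEL_FRM_SRC_IND": "N"},
    {"CALL_ID": 2, "ROW_DEL_FRM_SRC_IND": "N"},
]
# Key 1 is absent from the latest extract, so it is treated as deleted at source.
removed = flag_deleted(staged, latest_extract_keys=[2, 3])
```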

10 Getting Data into Staging Area
- Get data into a Staging Area asap
  - Helps in learning to understand the data
  - Can query/browse much more easily than in files
- We now use (free) utilities to load the staging area prototype
  - Using utilities is faster and less costly than using Informatica (Infa) for prototype development
  - Read a wide variety of files and load the data ‘as is’
  - Defaults the flags
- Get ALL data for this release into staging area before starting mapping
  - Or at least as much as possible
  - Late arriving data can only confuse the issue
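
A loader that reads a file and loads the data ‘as is’ with defaulted flags could be sketched as below. This is not the utility the presentation refers to; it is a stand-in using Python's `csv` and `sqlite3` modules, with SQLite playing the role of the staging database and `load_as_is` a hypothetical helper name.

```python
import csv
import io
import sqlite3


def load_as_is(conn, table, csv_text):
    """Create `table` with one TEXT column per CSV header plus the three
    flags, then load every record unchanged. Returns the row count."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    cols = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(
        f'CREATE TABLE "{table}" ({cols}, '
        "ROW_DEL_FRM_SRC_IND TEXT NOT NULL DEFAULT 'N', "
        "ROW_VALID_IND TEXT NOT NULL DEFAULT 'Y', "
        "ROW_SENT_TO_IWS TEXT NOT NULL DEFAULT 'N')"
    )
    names = ", ".join(f'"{name}"' for name in header)
    marks = ", ".join("?" for _ in header)
    rows = list(reader)
    # Only the source columns are supplied, so the flags take their defaults.
    conn.executemany(f'INSERT INTO "{table}" ({names}) VALUES ({marks})', rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = load_as_is(conn, "STG_CALLS", "CALL_ID,CALL_DURATION\n1,12.50\n2,abc\n")
staged = conn.execute(
    "SELECT CALL_DURATION, ROW_VALID_IND, ROW_SENT_TO_IWS "
    "FROM STG_CALLS ORDER BY CALL_ID"
).fetchall()
```

Loading everything as text means a bad value such as `abc` still lands in staging, where it can be queried and understood, instead of failing the load.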

11 Cross Table Validations
- Cross table validations might include
  - Checking customer/account exists for a sales record
  - Checking address is a valid address
  - Checking details provided by retailer match other systems such as sell through capture
  - Checking codes entered on tables exist
- The list of ‘possible’ things to check is endless
- EA must decide what validations will stop data from flowing into the EDW
  - These are more likely to be business than technology based
  - We have only built the capability to do so, not the rules themselves
- Can be implemented in Informatica (Infa) or Stored Procedures
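
The first check on the slide, that a customer exists for each sales record, can be sketched as a single SQL update. The tables `STG_SALES` and `STG_CUSTOMER` and their columns are hypothetical, and SQLite (through Python's `sqlite3`) stands in for the staging database; the same statement could equally live in a stored procedure.

```python
import sqlite3


def invalidate_orphan_sales(conn):
    """Set ROW_VALID_IND to 'N' on sales rows with no matching customer.

    Returns the number of rows marked invalid.
    """
    cur = conn.execute(
        """
        UPDATE STG_SALES
           SET ROW_VALID_IND = 'N'
         WHERE NOT EXISTS (SELECT 1
                             FROM STG_CUSTOMER c
                            WHERE c.CUSTOMER_ID = STG_SALES.CUSTOMER_ID)
        """
    )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE STG_CUSTOMER (CUSTOMER_ID INTEGER);
    CREATE TABLE STG_SALES (
        SALE_ID INTEGER,
        CUSTOMER_ID INTEGER,
        ROW_VALID_IND TEXT NOT NULL DEFAULT 'Y',
        ROW_SENT_TO_IWS TEXT NOT NULL DEFAULT 'N'
    );
    INSERT INTO STG_CUSTOMER VALUES (1);
    INSERT INTO STG_SALES (SALE_ID, CUSTOMER_ID) VALUES (10, 1);
    INSERT INTO STG_SALES (SALE_ID, CUSTOMER_ID) VALUES (11, 99);
    """
)
marked = invalidate_orphan_sales(conn)
# Only rows that are still valid and not yet sent are candidates for the EDW.
ready = conn.execute(
    "SELECT SALE_ID FROM STG_SALES "
    "WHERE ROW_VALID_IND = 'Y' AND ROW_SENT_TO_IWS = 'N'"
).fetchall()
```

Whether a failed check should actually stop a row is the business decision the slide assigns to EA; the code only provides the capability.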

12 Data Quality
- Data Quality Measures/Correction
  - Can be implemented in ETL tools if acquired or ‘home grown’
  - Can send data back to source systems if needed
  - Is a whole ‘other topic’
  - But it will be done at some point in the life of the EDW

13 The ‘Mapping Spreadsheet’
- So, the Staging Area has data in it. ‘What to do next?’
- Build the left hand side of the ‘mapping spreadsheet’
  - Source to Target Mapping
  - The right hand side starts blank!!
- Once you have developed your understanding of the data and built/loaded the staging area with all the data you want in the ‘current release’, you are ready to perform ‘data mapping’ and Data Modelling.
- To do this, you need to understand current models and have some ideas about the BI4ALL model…
- Now begins the modeling portion of the training…

14 Data Mapping
- Filling in the right hand side of the mapping spreadsheet (SS)
  - Defining ‘real keys’ from source data
  - Defining tables to be joined/split
  - Defining how to present a ‘view’ of the staging area such that it can be sent into the EDW
- Defining changes to EDW model
  - Two columns in SS - Table Exists/Column Exists
  - EDW Modeller sets them for the DBA
  - We will see the ‘in progress’ version later
- Each source field required in the EDW is mapped
  - Notes and comments are included…
  - Database level transformations are included in SS
- Key role in any EDW implementation
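
The Table Exists/Column Exists columns can be read as a work list for the DBA. The sketch below assumes each spreadsheet row has been read into a dict; the field names `target_table`, `target_column`, `table_exists` and `column_exists` and the sample table names are hypothetical, since the transcript does not show the actual spreadsheet layout.

```python
def pending_model_changes(mapping_rows):
    """Return (new_tables, new_columns) that the DBA still has to create.

    A row flagged table_exists == 'N' contributes its target table once;
    a row flagged column_exists == 'N' contributes (table, column).
    """
    new_tables = []
    new_columns = []
    for m in mapping_rows:
        table, column = m["target_table"], m["target_column"]
        if m["table_exists"] == "N" and table not in new_tables:
            new_tables.append(table)
        if m["column_exists"] == "N":
            new_columns.append((table, column))
    return new_tables, new_columns


mapping = [
    {"source_field": "CUM_CUSTOMER", "target_table": "CUSTOMER_DIM",
     "target_column": "CUSTOMER_NUM", "table_exists": "Y", "column_exists": "Y"},
    {"source_field": "BUCKET_TYPE", "target_table": "CUSTOMER_DIM",
     "target_column": "BUCKET_TYPE", "table_exists": "Y", "column_exists": "N"},
    {"source_field": "CALL_CLASS", "target_table": "CALL_CLASS_DIM",
     "target_column": "CALL_CLASS", "table_exists": "N", "column_exists": "N"},
]
tables_to_add, columns_to_add = pending_model_changes(mapping)
```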

15 Prototype Building
- DBA implements changes to ODW/EDW Physical Model
  - Tables, indexes, naming standards
  - Key role is to ‘catch mistakes’ by the modeller
  - With 000’s of fields to map, the modeller will make mistakes
- From the SS we can now generate prototype ETL
  - Tools available from IBI
  - One of which was just written (SeETL)
- Includes all elements such as
  - Generated views
  - Updating control tables
  - Running and testing on Windows 2008+
- All prior to building Informatica/DataStage ETL
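
To make ‘generated views’ concrete, the sketch below builds a `CREATE VIEW` statement from a list of source-to-target mappings. It is only an illustration of generating SQL from mapping data and does not reflect how SeETL or any IBI tool works; `generate_view`, the view and table names, and the filter on the staging flags are all assumptions.

```python
def generate_view(view_name, source_table, mappings):
    """Build a CREATE VIEW statement from (source_expression, target_column) pairs.

    The view exposes only rows that are valid and not yet sent onwards.
    Inputs are assumed to come from a trusted mapping spreadsheet, so they
    are interpolated into the SQL text without escaping.
    """
    select_list = ",\n".join(f"    {src} AS {tgt}" for src, tgt in mappings)
    return (
        f"CREATE VIEW {view_name} AS\n"
        f"SELECT\n{select_list}\n"
        f"FROM {source_table}\n"
        "WHERE ROW_VALID_IND = 'Y'\n"
        "  AND ROW_SENT_TO_IWS = 'N'"
    )


view_sql = generate_view(
    "V_UNBILLED_CALLS",
    "UNBILLED_CALLS",
    [
        ("CUM_CUSTOMER", "CUSTOMER_NUM"),
        ("UPPER(SERVICE_CODE)", "SERVICE_CD"),  # a database level transformation
    ],
)
```

Because the statement is regenerated from the mapping data, a corrected mapping only needs the generator to be rerun, which fits the deck's point that the prototype should be easy to change.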

16 Prototype Building
- Once Prototype is ‘relatively stable’
  - Sizable amounts of test data can be loaded
  - Prototype can be moved to deployment platform
  - Presentation Views can be created
  - Cognos Catalogs/Business Objects started
  - Early reports started
  - More learning about real data volumes can happen
- When errors are found (and they will be found)
  - Prototype can be changed easily
  - Generated ETL can be changed easily
- When everyone is happy with the prototype
  - EDW is ‘made real’
  - Real database, ETL, reports

17 Conclusion
- We have talked about the overall process of EDW Implementation
- So, once you have developed your understanding of the data and built/loaded the staging area with all the data you want in the ‘current release’, you are ready to perform ‘data mapping’ and BI4ALL Modelling.
- To do this, you need to understand the EDW model…
- Now begins the modeling portion of the training…

18 Thank You for Your Time

