Presentation on theme: "Holy Warehouse, Holy Grail The Quest for the Single Version of Truth August 2, 2006 Michael Covert."— Presentation transcript:
Holy Warehouse, Holy Grail The Quest for the Single Version of Truth August 2, 2006 Michael Covert
2 Agenda Warehouse scenarios – Why are they built? – What can we expect when building one? Guiding principles – Leadership and Communication – Modeling your business – Incremental Delivery and Construction – Technology Decisions Building a Data Management road map A Case Study
3 Warehouse Scenarios Why do people build data warehouses? – To allow people to focus on analysis instead of collection and assembly of data. This is in response to growing needs to understand a business holistically. Finding it – where is that data? Integrating it – merging data from multiple sources. This is often a source of error and inconsistency. Reporting on it – spreadsheets and expensive tools. Spreadsheets continue to be where most analysis occurs. – To produce a single version of truth Believing it – people and data errors have been corrected. Differing business meanings have been reconciled so that aggregations and variances make sense. – To reduce internal friction and improve decision making Reports from non-integrated systems often produce inconsistent and contradictory information. This leads to slowed decision making and in some cases, infighting.
4 Warehouse Scenarios What can we expect when building a data warehouse? – Data warehouses typically integrate data from multiple areas: Finance, HR, Sales, Inventory, etc. – This invariably leads to larger than normal decision making processes and to multi-departmental involvement Politics will occur. Big picture (enterprise) issues will arise and they will be difficult to solve. – Definitions, processes, frequency, and reliability
5 Warehouse Scenarios Business data can vary greatly in complexity and in definition – Variation in attributes associated with a common entity type leads to an explosion in number of tables Loans – auto loan, revolving credit, overdraft protection, mortgage, ad infinitum Apparel – a myriad of categories, subcategories, and lower level hierarchies of product-specific attributes – Integration and aggregation of these entities requires cross- departmental alignment of the most basic definitions Risk elements – probability of default, loss given default, collateral codes, service charge allocation Sales and marketing – promotion and ring code usage, campaign phase definition, category definition, even SKU and package encoding
6 Warehouse Scenarios The great technology debates – How will I organize my warehouse? Should I build a star schema or a 3NF database? – How many technologies will I use? The database itself – Warehouse, warehouse farms – Operational Data Store – Downstream data marts The ETL Layer A metadata repository Master data management How will I cleanse my data? Usage of data marts and down stream systems How will I implement my analytics and reporting layers?
7 Guiding Principles Leadership – The best functioning data warehouse teams are cross- departmental and collaborative. Leadership is essential. – A shared expense model is preferred, but again requires leadership and cultural adoption. Project based expenses are tempting, but can lead to departmental development of an enterprise asset. Communication – Communicate frequently. Advise on new data areas as they become available. Strive for smaller but more frequent additions. – Track adoption and use communication to increase usage. – When sufficient adoption as been achieved, if possible TURN OFF THE OLD SYSTEM(S)!!!
8 Guiding Principles Model your business and maintain these models. Use a data modeling tool to maintain your models.
9 Guiding Principles Incremental delivery – Break the warehouse into subject areas that can be developed and evolved incrementally. – Think multi-dimensionally. Devise a multi-dimensional structure for each subject area. Identify overlaps where shared dimensions exist. 3NF versus Star schemas – In most cases, 3NF versus star schema decisions should be based on: Skill sets In-place technology and processes Technology governance – Many times, 3NF schemas have views that emulate star schemas. Beware of view join overhead. * Build on successes
10 Guiding Principles Technology selection – Limit technology through IT stewardship and governance. Stay within product families where possible. Implement an ETL layer. Use it to reuse data interfaces and reduce point-to-point data complexity. Use multi-dimensional systems to offload data aggregation complexity. – Maintain conforming dimensions. Strive for enterprise reporting, but realize that it is very difficult to achieve.
12 Guiding Principles Layer your database to provide: – A data “landing zone” (also referred to as a staging area) – A data cleansing and integration layer Assign data ownership to the owning line of business. Build in “data lineage” traceability. Program defensively from the very beginning. Put cleaning rules here. Avoid intermingling them into operational code at all costs. Strive to reverse engineer them out of surviving systems. – A cleansed, integrated layer that is used to: Feed downstream systems. Provide the primary reporting interface for end user systems. Resist allowing access to any other layer, specifically the landing zone! *** – Use this layered technique to build in audit controls and “restartability”.
13 Building a Data Management Road Map A data management roadmap defines all data management processes and control objectives. – CoBiT, ITIL, et al
14 A Case Study This case study involves a large financial institution with s significant portion of their business involving collateral-secured loans. Loan reporting environment was manually intensive and heavily driven off of Microsoft Access databases (approx. 400 Access databases) End user queries were run against transactional systems. The data that was retrieved was then integrated in redundant, and often inconsistent processes Limited scalability for future growth (Microsoft Access databases) Similar queries and duplicated analyses were performed by Business Analysts Business Analysts spent approximately 80% of their time gathering, re- keying and developing reports Major inconsistencies in these reports were producing political pressures between sales and the financial analysts.
15 Case Study – Initial Environment Data Sources Origination Dealers Locations Employees Servicing Risk Securitization
16 Case Study – New Environment Landing Zone Cleanse and Integrate
17 Case Study - Results Excellent user acceptance of new system Consolidated database now reduces departmental time required to access data Simplified and pre-integrated data has reduced many inconsistencies Excellent load and response times Microsoft Reporting Services is being used to produce increasingly complex reports used by end users Microsoft Analysis Services produces cubes that are easy to access and have nearly instantaneous response time The new system architecture is much easier to change since it is simplified. New data sources have been added incrementally.
18 Conclusion Leadership, Stewardship, and Communication Business modeling and mapping Data quality and ownership Technology Governance and Control – Adoption of a data management roadmap – Data architecture – Technological risk management Incremental delivery