By Jason Perkins & William O’Shea Mission Critical BI in an EDW 2.0 world
About the Presenters

Jason B Perkins
Chief Architect on the Secondary Uses Service (SUS) Programme for BT Health. Over 10 years working on some of the world's largest and most complex business intelligence and data warehouse programmes.
Highlights from career:
- Lead BI Architect for BT Retail: ODM Consumer Reference Set (CRS), BT Mobile Data Strategy, National Name & Address Database (NAD)
- Solution Architect for the (Swift) BT Marketing Data Warehouse
Qualifications:
- TDWI Certified Business Intelligence Professional (CBIP)
- DAMA Certified Data Management Professional (CDMP)
Subject matter expertise across Health, Retail and Telecoms.

Will O'Shea
Data Warehouse Consultant at AMS Systems, currently assigned to the NHS. Over 20 years of experience in consulting, focusing on data warehousing and the Oracle RDBMS.
Highlights from career:
- Worked with Gene Amdahl
- Development Lead at Oracle
- Oracle Consultant at Blue Cross
- DWH Consultant at Pfizer
- Data Warehouse Consultant at Johnson & Johnson; awarded an Innovation Award for "Data warehouse in a box"
- Technical Architect at the NHS; awarded a Champagne Award by Atos Origin for implementing the RDM process
Education:
- MBA from the University of Manchester (MBS)
- BSc from the University of Waterloo, Canada
- Oracle Certified Professional (10g DBA)
Subject matter expertise across Financial, Healthcare & Pharmaceutical.
Agenda
- MCBI - The Business View
- Mission Critical Architecture
- Mission Critical Method
* BREAK *
- Mission Critical Principles & Operating Model
- Mission Critical Building Blocks
- Summary
Business Intelligence? "In God we trust; all others must bring data." - W. Edwards Deming
Types of BI? (TDWI "Three Threes of Performance Dashboards")
- Operational BI: optimise & track core operational processes; bottom-up; detailed; monitoring
- Tactical BI: project analysis and departmental activities; departmental; detailed / summary; analysis
- Strategic BI: strategic execution and analysis; top-down; summary; management
Mission Critical BI
Mission Critical BI – Why? Business 2.0:
- Always on
- Self service
- Joined up - a 360 view of the customer
- Available everywhere
BI/DW is no longer a back office function / system. It is the cost of entry in most industries; what you do with it remains a competitive differentiator.
- Pervasive business intelligence
- Globalisation
- Zero latency enterprise
- Operational decision support
"Enterprises compete by using up-to-date information to progressively remove delays to the management and execution of its critical business processes." - Gartner
Mission Critical BI – Real World Examples
- E-everything - 24x7 E-Government
- Health care monitoring - commissioning, payment for quality / results, referral-to-treatment times
- Telecommunications - bandwidth management / mobile coverage, order-to-fulfilment MIS
- Retail - just-in-time inventory
Mission Critical – Challenges
Mission critical BI is not new - so why is it so hard?
- The "pace of change" keeps increasing ...
- Continued pressure on IT spend - an estimated ~20-30% reduction in 2009/10
- BI/DW keeps evolving - many of the original mission statements of BI/DW remain elusive
- Increased demand for integrated information - e.g. unstructured data, social media, etc.
- Data explosion - "Data volumes will grow exponentially while CPU capacity will increase only geometrically." - Gartner
- Security of all the information is paramount
- BI/DW remains a predominantly "build" activity
Mission Critical – EDW Scale
A number of different views needs to be considered when quantifying the challenge ahead. The scale varies by industry, type of business and geography.
Mission Critical BI Architecture
EDW Architectures

Independent Data Mart
- Pros: easy to build organizationally (limits scope); easy to build technically
- Cons: business enterprise view unavailable; redundant data costs; high ETL costs; high application costs; high DBA and operational costs

Virtual Data Warehouse
- Pros: no need for ETL; no need for a separate platform
- Cons: only viable for low volume access; metadata issues; network bandwidth and join complexity issues; workload typically placed on operational systems

Hub & Spoke
- Pros: allows easier customization of user interfaces and reports; spokes can be tailored for the business
- Cons: business enterprise view challenging; redundant data costs; high DBA and operational costs; medium ETL costs; data latency

Central Data Warehouse
- Pros: single enterprise "business" view; data reusability; consistency; lowest TCO
- Cons: requires corporate leadership and vision; requires fully performant and scalable technology
Mission Critical: Maximum Availability, Flexibility, Maintenance, Security, Lifecycle, Method, Infrastructure, Adaptability, Operations, Migrations, Technology
Mission Critical DW Architecture
- Sources: OLTP & ODS systems, business applications, Excel, XML, business process feeds
- Tiers: Staging Tier, Operational Tier, Integration Tier, Performance Tier
- Consumers: BI applications
Mission Critical DW Architecture
- Sources: external business applications, unstructured data, Excel, XML, business process feeds
- Staging Tier
- Integration Tier: data quality, consolidation, auditing, customer tracking, survivorship, MDM, problem resolution, conforming
- Performance Tier: operational OLAP, sandpits, aggregates, marts
- BI applications: alerts, dashboards, ad hoc query, reporting, web services, analytics
- Shared services: security, loader services, change data capture, data extracts, community management, error management, metadata services, workflow monitor, recovery / restart, job scheduling, resource management, SCD manager, fact loader, adoption services, validation services
Serviceability Architecture
- Automation - lights out / zero touch
- Flexibility - metadata / reference data driven
- Robustness - error tracking, handling & reporting
- Operationally ready
- Maintenance - load / event tracking & reporting
- Resilience - the ability to stop individual parts of the system and restart them
Mission Critical Method
Nursery Method – Raison d'être
BI/DW requires an iterative approach; mission critical is no different. New deliveries and changes must:
- Protect core services
- Facilitate the "pace of change"
- Support re-use
- Allow experimentation
- Adapt to changing requirements
- Involve users
The "Nursery" Method was developed in response. It supports front room and back room deliveries, reduces cycle time, provides "Nurseries" (AKA sandboxes) - a user-initiated ETL process - and produces transformation and load templates.
Nursery Method – Growing a System
- Everyone, business & developers, learns from both development and use of the system
- Introduces the ability to act on what has been learned
- The system leaves the Nursery when mature, and is transplanted into production - not re-grown
Cycle: Planting the seed (initial planning), then Planning, Requirements, Analysis & Design, Implementation, Testing, Evaluation and Delivery within the Nursery, then Transplant into production.
Nursery Method – The Growing Stages
1. Initial Planning: high level overall plan (how long are iterations, what deliverables are required); high level requirements
2. Planning: integrated small teams; detailed iteration plan; higher level plan for the 2nd & 3rd iterations
3. Requirements: requirements for the iteration should fit within the iteration, or get broken into smaller bits; start with the lowest level
4. Analysis & Design: integrated small teams; design specification
5. Implementation: did I mention integrated small teams?; elaboration & implementation specification
6. Testing: by both business and developers
7. Delivery: delivery to users
8. Evaluation: user feedback; quality reports
9. Transplant: the final delivery should somewhat match the initial plan (1.1)
Nursery Method – Creating a Nurturing Environment: Building the Nursery
First steps:
1. Initial plan: overall objective? by when?
2. Define roles: assign business, user and supplier roles - and get commitment from those in the roles!
3. Define communication: meetings (frequency and types - periodic weeding / Scrum, watering sessions / stand-ups, others) and the roles involved in each; tight integration of roles (documentation from each role - small; frequency and type of documentation)
4. Define outputs from each iteration/phase: plan for the cycle (which roles are involved at which stage); requirement documentation - small
5. Initial schedule: length of iterations; potential number of iterations
Nursery Method – Creating a Nurturing Environment: Size of Plot & Growth Cycles
Next steps:
1. Define system requirements: number of data suppliers? amount of data? number of users? size of infrastructure required?
2. Define the first few iterations:
- Cycle 1: get data? load data? extract data? distribute data?
- Cycle 2: build some validation? extract validation outcomes?
- Cycle 3: build in some robustness?
Nursery Method – Principles
Focuses on:
- Users - not processes and tools
- Working systems - not exhaustive documentation
- Working together - not adhering to the contract
- Delivering what is wanted - not following a plan
- Adapting to change - not issuing change requests
Both the left side and the right side must exist, but the emphasis is on the left - not the right.
Benefits:
- Cycle time from months to weeks, even days!
- Improved quality - leverage "lessons learned" as they happen
- Reduced cost and delivery time
- Happy users!
Our real world examples: a large international pharmaceutical company (delivered in months, not years); a healthcare provider (implemented new functionality in days).
Nursery Method – Greenhouses (Sandboxes)
What constitutes a sandbox? What are its characteristics? How do sandboxes need to act & interact?
Sandboxes are users' play areas. Using the "Build Once - Use Many" principle, users can:
- Load new data sets
- Create new tables
- Create new reports
- Play with existing data
Sandboxes need workflow management - key in a mission critical system. A sandbox isolates the effects of users' play areas from production, but does not isolate the data: users can access production data, and other users can access their data. A mechanism should exist to release into production - if required. Sandboxes are not production, but rather a pathway to production; they are used as design, not as code.
Nursery Method Planning
Nursery Method – Exploitation: Managing "Live" Changes
Differentiate between types of change - one size does not fit all. The type determines how many cycles a change should stay in the Nursery.
Minor changes to reports and the semantic layer:
- Category 1 - changes to pre-canned reports / extracts that do not require changes to the semantic layer
- Category 2 - deployment to live of new reports created by information analysts
- Category 3 - simple changes to the semantic layer
New reports:
- Category 4 - creation of new reports / extracts
Changes impacting the semantic layer:
- Category 5 - other changes to the semantic layer, e.g. creation of new derived fields (not to be performed in the universe)
- Category 6 - changes to pre-canned reports / extracts that require changes to the semantic layer
- Category 7 - creation of a new semantic layer
Mission Critical Adaptability
The "pace of change" keeps increasing ... it's all about speed:
- Speed of change
- Speed of information access
"Design for change" - as opposed to "built to last". Design to Build Once - Use Many.
Enter "Business Rule Management" (BRM):
- Process - Business Process Management (BPM)
- Rules - decision logic
- Data - decision variables
Mission Critical Adaptability – Design for Change
- Process: Business Process Management for operational decision support; process flow or workflow for tactical / strategic decision support
- Rules: rules drive the process; declarative approach; business user managed; descriptive
- Data: meta / reference data enforces the rules - thus data drives the process; contextual, volatile, flexible
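The Process / Rules / Data split above can be sketched in a few lines of Python. This is a minimal illustration of "rules as data", not any specific BRM product; the rule fields, operators, thresholds and actions are invented for the example:

```python
# Minimal sketch of "rules as data": decision logic lives in a data
# structure a business user could edit, not in hard-coded branches.
# All field names, thresholds and actions here are illustrative.

def evaluate(rules, record):
    """Return the actions whose conditions hold for this record."""
    fired = []
    for rule in rules:
        actual = record.get(rule["field"])
        value = rule["value"]
        ok = {"eq": actual == value,
              "gt": actual is not None and actual > value,
              "lt": actual is not None and actual < value}[rule["op"]]
        if ok:
            fired.append(rule["action"])
    return fired

# Rules are plain data: adding one changes behaviour with no new code.
ORDER_RULES = [
    {"field": "value", "op": "gt", "value": 10000, "action": "manual_review"},
    {"field": "region", "op": "eq", "value": "EU", "action": "vat_check"},
]

print(evaluate(ORDER_RULES, {"value": 25000, "region": "EU"}))
# ['manual_review', 'vat_check']
```

Because the decision logic is declarative, changing a threshold or adding a rule is a data change the business can own, while the process (the `evaluate` loop) stays stable.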
Mission Critical Adaptability Examples of rules management …
Operational Principles – Flexibility
Users require "flexibility" without the need to re-develop. They need to be able to add and/or modify:
- Load processes
- Application processing
- Error processing
- Validations
- Recipients of load statistics (DQ, errors, etc.)
- Encryption processes
And to load and use new data (joined to existing data), as and when they want to, without new code!
Operational Principles – Maintenance
The operational team requires the ability to configure and monitor processes:
- View ETL progress in real time: loads, load steps, load statistics
- Report and track by load, business unit, time and status
- Performance and statistics reporting
- Error tracking & maintenance against loads
- Control loads if needed: start (automatically & manually); hold/pause all or part of a load; stop loads; restart from where needed
Operational Principles – Administration
The system should output meaningful & understood error messages - specific messages throughout the application, so the business knows the affected area - with visibility of operations, error maintenance, and the ability to feed back into the process.
The business requires knowledge: statistical real-time reporting & tracking of loads.
- Know what data has been loaded
- Know how much data has been loaded
- Know what stage each load is at
- Know which business units have loaded data
Operational Principles – Resilience
Business & operations require a robust & resilient system:
- Loads may be automatically restarted from where they stopped / failed (as required)
- Each load job, step and statistic has start/end times and a status
- The ETL checks the status of a job to determine if it needs to / can be run
- Fatal errors need manual intervention before they may be rerun
- Performance and statistics reporting
- Self-initiating loads
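The resilience principles above can be sketched as a restartable load runner: every step records start/end times and a status, completed steps are skipped on a rerun, and a failure halts the load until it is rerun after intervention. The in-memory "control table" and step names are illustrative, not the deck's actual implementation:

```python
# Sketch of a restartable load: each step records start/end and status,
# and a rerun skips steps already complete, resuming at the failure point.
# The in-memory "control table" stands in for a real load-control table.

import datetime

def run_load(steps, control):
    """Run steps in order, skipping completed ones; stop on first failure."""
    for name, fn in steps:
        row = control.setdefault(name, {"status": "pending"})
        if row["status"] == "complete":
            continue                      # done in a previous attempt
        row["start"] = datetime.datetime.now().isoformat()
        try:
            fn()
            row["status"] = "complete"
        except Exception as exc:
            row["status"] = "failed"      # fatal: manual intervention needed
            row["error"] = str(exc)
            return False
        finally:
            row["end"] = datetime.datetime.now().isoformat()
    return True

# Demonstration: the first run fails at "load"; after the operator fixes
# the input, the rerun resumes at "load" without repeating "extract".
calls = {"extract": 0, "load": 0}
state = {"broken": True}

def extract():
    calls["extract"] += 1

def load():
    calls["load"] += 1
    if state["broken"]:
        raise RuntimeError("invalid file format")

control = {}
first = run_load([("extract", extract), ("load", load)], control)
state["broken"] = False                  # operator fixes the input
second = run_load([("extract", extract), ("load", load)], control)
print(first, second, calls)   # False True {'extract': 1, 'load': 2}
```

The same status rows double as the performance and statistics reporting the slide calls for, since each step carries its own timings.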
Operational Principles – Summary
Data warehouses require "Metadata Driven Processing" (MDP). What can be MDP and what can't?
- Loading data - types of loads, source to target
- Load control - starting, stopping, branching, etc.
- Errors & messages - effects of & reporting on
- Validation (DQ) - how, what, when & reports
- Encryption - how, what & when
- Reference data processing
How? Where can MDP help your DWH? What metadata does MDP need? Feed MDP into the development stream. Educate developers to use it, users to request it, and the business to use it.
Metadata Driven Processing – Enterprise Warehouse Operational Components (EWOC): The Concept
Core entities: Job, Instance of Job (Load), Step, Load Statistics, Validation Rules, Validation Outcome, Business Unit, Message, Work-Flow, Severity, Project.
Actors: Data Integration & Quality Team, Application Users, Admin.
Metadata Driven Processing – The Metadata Driven ETL
- Job: a collection of steps; has a start and an end
- Job Step: get data, load staging, load atomic, human interaction, etc.
- Source: SUS, Cancer Registry, internal
- Target: internal, BO / OBI
- Type: CSV file, XML, table, report/extract
- Validation: lookups, static values, data quality, patterns, linkage, manual/ops, etc.
- Business Unit: BU job, BU job step, source schema, target schema, BU validation
- Additional: less the non-mandatory
- Infrastructure: storage allocation, CPU allocation, memory allocation, sandpit schemas
- Message: validation, load processing
- Severity: fatal, error, warning, information; cause & solution
- Project
Metadata Driven Processing – The Jobs (ETL)
Definition of jobs: loads are specific instances of a job. Build re-usable modules; metadata driven code promotes MDP and quicker time to delivery - develop and test once.
- Add/change a source or target by changing MDP data
- Add/change ETL by changing MDP data
- Pick lists defined by reference data
- Types: CSV file, XML, table, report/extract
- Validation: lookups, static values, data quality, patterns, linkage, manual/ops, etc.
Examples: date range validation; foreign key lookups; mandatory / optional; dd-mm-yyyy vs. yyyy/mm/dd; Y/N vs. 1/0.
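The "loads are instances of a job" idea above can be sketched as a generic runner dispatching on declared metadata. This is a minimal illustration under assumed names - the job names, source types and parsers are invented, not EWOC's real catalogue:

```python
# Sketch of metadata-driven ETL: a job is a row of metadata, not code.
# The generic runner dispatches on the declared source type, so adding a
# new feed means adding metadata, not writing a new loader.

import csv, io, json

PARSERS = {
    "csv":  lambda text: list(csv.DictReader(io.StringIO(text))),
    "json": lambda text: json.loads(text),
}

JOBS = {  # "MDP data": change this dict to change the ETL, no new code
    "daily_referrals": {"source_type": "csv",  "target": "stg_referrals"},
    "provider_feed":   {"source_type": "json", "target": "stg_providers"},
}

def run_job(job_name, payload, warehouse):
    meta = JOBS[job_name]                       # look up the job's metadata
    rows = PARSERS[meta["source_type"]](payload)
    warehouse.setdefault(meta["target"], []).extend(rows)
    return len(rows)                            # rows loaded this instance

warehouse = {}
n = run_job("daily_referrals", "id,status\n1,open\n2,closed\n", warehouse)
print(n, warehouse["stg_referrals"][1]["status"])   # 2 closed
```

Each call to `run_job` is one "instance of a job" (a load); re-pointing a job at a new target or format is a one-line metadata change.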
Metadata Driven Processing – The Messages (Driving Force)
Messages support MDP and feed the metadata driven ETL. They should be used throughout the ETL for failure checks/traps, exceptions and reporting (DQ & validation). Each error/trap/exception has a unique message ID with headings/titles/text and a message grouping. Severity can be changed - and changes processing when changed:
- Fatal - fails the load (e.g. invalid file format)
- Error - the load keeps going, up to a maximum number of errors (a % of the load rather than a fixed count?)
- Warning - not following rules (e.g. date format)
- Information - no effect on the load (e.g. dates out of range, visit after treatment)
Messages are updated & maintained, help with future occurrences, and are used for error reporting, textual objects, information messages, load reporting and load control.
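The severity behaviour above - fatal fails the load, errors accumulate up to a limit, warnings and information only log - can be sketched as follows. Message IDs, texts and the error limit are illustrative placeholders:

```python
# Sketch of severity-driven messages: each check raises a message ID, and
# the metadata (severity per ID, max error count) decides whether the
# load dies, keeps going, or just logs. IDs and the limit are made up.

MESSAGES = {  # severity can be re-classified here without code changes
    "M001": {"text": "Invalid file format", "severity": "fatal"},
    "M002": {"text": "FK lookup failed",    "severity": "error"},
    "M003": {"text": "Date out of range",   "severity": "information"},
}
MAX_ERRORS = 2

def process(events):
    errors = 0
    log = []
    for msg_id in events:
        sev = MESSAGES[msg_id]["severity"]
        log.append((msg_id, sev))
        if sev == "fatal":
            return "failed", log            # fatal always fails the load
        if sev == "error":
            errors += 1
            if errors > MAX_ERRORS:         # too many errors: escalate
                return "failed", log
    return "complete", log

status, log = process(["M003", "M002", "M002"])
print(status)   # complete
```

Re-classifying "M002" from `error` to `fatal` in the `MESSAGES` table changes the load's behaviour with no code change, which is exactly the "severity changes processing when changed" point.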
Metadata Driven Processing – Data Quality & Linkage
Rules based validation (data quality validation, linkage validation) supports MDP and is key in any system - but more so in a mission critical one. Use metadata to drive the process: it is important that the right people get the right data, quickly.
- New rules can be added/removed when needed, with no code required
- Business users decide to add rules, chosen from a pick list and defined using building blocks (lookups, static values, range conversions, patterns, linkage, manual/ops, etc.)
- The severity of a rule failure can be changed when needed, with no code required - business users decide severity
Validation outcome reports go to each business unit (e.g. Canadian office, Finnish office, UK office) and to the Data Integration & Quality Team.
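The pick-list idea above - rules assembled from a small set of building blocks, with a per-rule severity the business can change - might look like this in miniature. The field names, code lists, ranges and regex are invented for illustration:

```python
# Sketch of pick-list validation: rules are assembled from reusable
# building blocks (lookup, range, pattern), so adding or retiring a rule,
# or changing its severity, is a data change rather than new code.

import re

BLOCKS = {  # the reusable building blocks behind the pick list
    "lookup":  lambda v, p: v in p["values"],
    "range":   lambda v, p: v is not None and p["lo"] <= v <= p["hi"],
    "pattern": lambda v, p: v is not None and re.fullmatch(p["regex"], str(v)) is not None,
}

RULES = [  # rules-as-data: edit this list to change validation behaviour
    {"field": "gender", "block": "lookup",  "params": {"values": {"M", "F", "U"}}, "severity": "error"},
    {"field": "age",    "block": "range",   "params": {"lo": 0, "hi": 130},        "severity": "warning"},
    {"field": "nhs_no", "block": "pattern", "params": {"regex": r"\d{10}"},        "severity": "error"},
]

def validate(record):
    """Return (field, severity) for every rule the record fails."""
    failures = []
    for rule in RULES:
        value = record.get(rule["field"])
        if not BLOCKS[rule["block"]](value, rule["params"]):
            failures.append((rule["field"], rule["severity"]))
    return failures

print(validate({"gender": "X", "age": 45, "nhs_no": "1234567890"}))
# [('gender', 'error')]
```

Downgrading the `gender` rule to `warning`, or adding a new range rule, touches only the `RULES` list - the "no code required" property the slide describes.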
Metadata Driven Processing – Encryption
Encryption supports MDP: it is simply a specific instance of a job, built to perform encryption.
- Metadata: encryption type, target data, source & target definition, parameters (keys), column, type
- Encryption types: AES128, Triple DES, look-up, home-grown?
- Typical targets: name, date of birth, ID number
- New encryption types can be added, but do require code
- New columns to be encrypted can be added by simply adding metadata - no code
- Keys can be stored or added at run time
Metadata Driven Processing – Reference Data Management
Reference data management supports MDP:
- New reference data can be added without new code
- Different BUs can have different data, but through the same RDM tool
- Different import types are catered for (e.g. CSV, XML, Excel)
- Different table types are catered for (e.g. Type 1, 2 & 3, home grown, etc.)
Metadata: reference table definitions, column definitions, business unit, import types, source definitions, source attribute definitions, BU sources.
Metadata Driven Processing The Metadata Model
Metadata Driven Processing – Extensibility
Extending the mission critical data warehouse: most BI/DW requirements are not green field, so extending the existing estate is a key design objective (Build Once - Use Many).
- Adding new data sources; changing existing data sources
- Data lineage via metadata: where data has come from, where it has gone, and what has happened to it along the way
- Impact analysis
- New exploitation (analysis and reporting) of the existing DW; adding new exploitation capabilities to the DW
More building blocks
Technology Drivers
Examples of technology features supporting Mission Critical BI (from "TDWI Best Practice Report: Next Generation Data Warehouse Platforms" by Philip Russom):
- Analytics outside the data warehouse
- BI web services
- High availability data warehousing
- Real-time data warehousing
- Master Data Management (MDM)
Mission Critical Performance
Leaving the Nursery (or sandbox) means productionising the code - and performance!
Balance brute force against design:
- MPP for medium to high volumes / complexity / users; SMP for low volumes / complexity / users
- Performance layer: BI tool and RDBMS calibration
- Speed of ETL vs. need of retrieval - when to do something and when not to (the 80-20 rule)
- Selective denormalisation and selective pre-joins
- Aggregates and summaries - are they always needed? (DWA no? SMP yes?)
- OLAP
- Performance metadata: row counts, elapsed time
Mission Critical Administration
Not all BI is mission critical - phew! Prioritise resources for mission critical BI applications over back office workload: resource management.
Information Lifecycle Management
Not all information is mission critical - phew! There are many benefits to segmenting information by its usefulness to the business: performance / throughput, cost effectiveness, and prioritisation of resources.
ILM levels:
1. Separate active and non-active data
2. Compress non-volatile data
3. Make historic data read-only
4. Intelligent storage based on usage of information
Automation is a key (emerging) requirement for supporting MCBI.
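The ILM levels above amount to a classification function over data age or usage. A tiny sketch, with the 90-day and 1-year thresholds invented purely for illustration:

```python
# Sketch of the ILM levels: classify a partition by age into a storage
# treatment. Real systems would key on usage as well as age, and the
# thresholds here are illustrative, not recommendations.

def ilm_action(age_days):
    if age_days <= 90:
        return "active"       # level 1: hot, uncompressed, read-write
    if age_days <= 365:
        return "compress"     # level 2: non-volatile, so compress it
    return "read_only"        # level 3: historic, mark read-only

print([ilm_action(d) for d in (10, 200, 2000)])
# ['active', 'compress', 'read_only']
```

Level 4 (intelligent storage) is this same decision automated continuously against actual access patterns rather than a fixed age cut-off.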
Mission Critical Security
Security includes business continuity, confidentiality, information classification, non-repudiation and privacy. Apply the principle of "defence in depth", with multiple layers relating to the security of information.
Protecting customer identifying information:
- Pseudonymisation (P14n)
- Anonymisation
- Linkage across datasets and over time - but NOT customer identifying - while remaining usable
Audit services: provision of an audit trail for transactions applied to the database and for access to data in the database.
Mission Critical Security – Pseudonymisation (P14n)
- Encryption: reversible or non-reversible
- Substitution
- Surrogates
- Anonymisation
Other considerations: harvesting / sharing, usability of output, key destruction.
Mission Critical Infrastructure
Requirements: availability & resilience, capacity on demand, ease of management, linear scalability.
Data warehouse infrastructure options:
- "Roll your own" data warehouses - declining ...
- Data warehouse appliances (DWA) - the "new" kid on the block
- Cloud services - the way of the future?
Mission Critical – Maximum Availability
Data warehouses now have to meet the following with NO downtime:
- Planned outages: system changes, application changes, migrations / transitions
- Unplanned outages: infrastructure failures, data issues, human error
- Degraded service: insufficient capacity, workload management
Mission Critical – Maximum Availability Requirements
Measured in 9's: no single point of failure; tolerates many outages transparently; straightforward administration. Covers the software, operational, network and hardware layers.
Availability and resilience options: active / standby, active / passive, dual active, fallback.
Backup and recovery: automation; hot vs. cold; incremental vs. full; second site.
Mission Critical Service Availability – Data Migrations
New requirements: no downtime for on-boarding data or exploitation; no impact on data freshness; minimal impact on the existing system.
Differentiate between migrations of a new data source and migrations for existing subject areas (more common), and phase data migrations.
Emerging integration patterns: green field data migration, parallel trickle data migration, mini batch data migration.
Mission Critical Data Migrations – Green Field (Mini Batch or Trickle)
- Independent data migration of a (new) data source
- Partition the data migration in order to batch / trickle it
- Impact volumes against the pattern to understand the impact of the additional throughput
- Resource management is a key requirement to protect the existing system
- No downtime or data freshness impact on the business
Mission Critical Data Migrations – Parallel Trickle Pattern
- Concurrent maintenance of new and old structures
- Cut over to the new structures on completion of the data migration
- Impact volumes against the pattern to understand the impact of the additional throughput
- A failure against either the new or the original structures must result in a rollback of both
- No downtime or data freshness impact on the business
Mission Critical Data Migrations – Mini Batch Pattern
- ETL maintenance of a single data structure at any point in time
- Logically segment the source data into discrete partitions
- Execute mini batch migrations, focusing on each partition in turn
- Partition on volatility, with early phases based on the least volatile data
- Catch-up mini batches are required for changes made during the transition, before the final cut over
- No downtime or data freshness impact on the business
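The mini batch steps above can be sketched in miniature: copy the source one partition at a time, then apply catch-up batches for rows that changed during the transition window before cut-over. Partitioning by year and the dict-based "structures" are illustrative stand-ins for real tables:

```python
# Sketch of the mini batch pattern: migrate partition by partition
# (least volatile first), then catch up rows that drifted mid-flight.
# Dicts stand in for the original and new data structures.

def initial_migrate(partitions, source, target):
    for part in partitions:               # one mini batch per partition
        target[part] = dict(source[part])

def catch_up(changed, source, target):
    for part, key in changed:             # re-apply rows that drifted
        target[part][key] = source[part][key]

source = {"2023": {"a": 1}, "2024": {"b": 2}}
target = {}
initial_migrate(["2023", "2024"], source, target)   # least volatile first
source["2024"]["b"] = 99      # change lands during the transition window
catch_up([("2024", "b")], source, target)
print(target["2024"]["b"])    # 99
```

Once the catch-up batches leave the two structures in sync, the final cut-over can proceed with no downtime and no stale data, which is the point of the pattern.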
Mission Critical Data Migrations – Pre-requisites
- Data profiling and analysis of new / changed data in the migration
- Up-front planning for pipe cleaning and rehearsal
Practically:
- Be selective: only select the entities you know you will need in that phase - but if you're hitting an entity, consider taking it all
- Transition: failing to plan is planning to fail! Rehearsal is key
- Rolling data quality monitors
- Audit and reconciliation
Summary
Mission Critical is here ... what we need is an "intelligent data warehouse":
- Metadata driven
- Build once - use many
Why do we need it?
- Business agility through the Nursery Method - facilitates the "pace of change" of the business and protects existing mission critical BI services
- Operational patterns that empower the business and support the mission critical BI services
- Integrated - exploitation of the customer "360 view"
- Secure - ensuring the right information gets to the right person
References
- Massive But Agile: Best Practices for Scaling the Next-Generation Enterprise Data Warehouse, Forrester
- TDWI Best Practice Report: Next Generation Data Warehouse Platforms, Philip Russom
- The Data Warehouse ETL Toolkit, Ralph Kimball
- Smart (Enough) Systems, James Taylor
- Best Practices Mitigate Data Migration Risks and Challenges, Gartner
Questions? Thank you.
For further queries, contact us at: Jason.Perkins@ewoc.info Will.OShea@ewoc.info http://www.ewoc.info/