Navigator Management Partners LLC Business Analysis Professional Development Day – Sep 2014 How to understand and deliver requirements to your Business.

Presentation transcript:

Navigator Management Partners LLC
Business Analysis Professional Development Day – Sep 2014
How to understand and deliver requirements to your Business Intelligence Peers
Neelam Mohanty

TYPICAL DATA WAREHOUSING NEEDS
- Identify when changes are made to the source systems
- Uniquely identify a source record
- Be usable: allow users to drill up, drill down or drill across
- Handle overlapping data from multiple systems
- Allocate header-level facts to detail line-item data
- Handle semantic complexity
- Eliminate mismatches when integrating data from multiple sources
- Store historical and current data

REQUIREMENTS FROM SOURCE SYSTEMS
- Change Data Capture (CDC)
- Natural Key
- Data Quality
- Hierarchies
- Grain
- History
- Conflict Resolution
- Semantic Complexity

CHANGE DATA CAPTURE (CDC)
- Why the need to identify changes?
  - Data Warehouses need to identify slowly changing dimensions
  - Master Data Management (MDM) needs to identify when changes were made to master data attributes
  - Capture changes as they happen in source systems for improving Data Quality
  - Data warehouses want to transfer only the relevant changes to the source data since the last transfer (also called incremental loading). Doing a ‘full refresh’ is usually undesirable other than for smaller tables
  - Capture deletions, edits and inserts

CHANGE DATA CAPTURE (CDC)
- Why is CDC important?
  - Reduces load time, required resources and associated costs
  - Provides a solution for the continued and accelerating growth in data volumes
  - Supports lower delivery latencies for real-time data warehousing
  - Mitigates the risk of failure in long-running ETL jobs
- Four ways of implementing CDC:
  - Database Log Scraping
  - Audit Columns
  - Timed Extracts
  - Full database “diff compare”
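
Two of the four CDC approaches listed above can be sketched in a few lines of Python. This is an illustration only, not the presenter's implementation: the rows, the `updated_at` audit column, and the function names are all hypothetical.

```python
from datetime import datetime

# Hypothetical source rows; `updated_at` is the assumed audit column.
SOURCE_ROWS = [
    {"id": 1, "name": "Acme", "updated_at": datetime(2014, 9, 1, 8, 0)},
    {"id": 2, "name": "Globex", "updated_at": datetime(2014, 9, 3, 12, 30)},
    {"id": 3, "name": "Initech", "updated_at": datetime(2014, 9, 5, 9, 15)},
]


def extract_changes(rows, last_extract_time):
    """Audit-column CDC: keep rows touched after the previous extract."""
    return [row for row in rows if row["updated_at"] > last_extract_time]


def diff_compare(previous, current):
    """Full diff compare of two snapshots keyed by natural key.

    Returns the keys that were inserted, edited and deleted.
    """
    inserts = [key for key in current if key not in previous]
    edits = [key for key in current
             if key in previous and current[key] != previous[key]]
    deletes = [key for key in previous if key not in current]
    return inserts, edits, deletes


changed = extract_changes(SOURCE_ROWS, datetime(2014, 9, 2))
changed_ids = [row["id"] for row in changed]
```

An audit column only helps for rows that still exist, so a row that has been physically deleted from the source never shows up in `extract_changes`. The snapshot comparison does report deletions, at the cost of reading both snapshots in full.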

UNIQUE IDENTIFIER/NATURAL KEY
- Why is a unique identifier important?
  - Data Warehouses typically store years of history, while the source system may only store the last three months of history and reuse the unique identifier
  - Data Warehouses typically need to integrate multiple source systems
- Knowing a birthdate, gender and ZIP code is enough to identify 87 percent of the people living in the United States
- Fuzzy vs Exact matches
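
The difference between fuzzy and exact matching can be shown with Python's standard `difflib` module. This is a minimal sketch: the `normalize` rules and the 0.85 threshold are assumptions chosen for the example, not values from the presentation.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lower-case, drop periods and commas, and collapse whitespace."""
    cleaned = name.lower().replace(".", "").replace(",", "")
    return " ".join(cleaned.split())


def exact_match(a, b):
    """Exact match after normalization."""
    return normalize(a) == normalize(b)


def fuzzy_match(a, b, threshold=0.85):
    """Fuzzy match: similarity ratio at or above an assumed threshold."""
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold


same_company = exact_match("ACME Corp.", "acme corp")
likely_same_person = fuzzy_match("Jon Smith", "John Smith")
```

An exact match is cheap and never produces false positives, but it misses spelling variants. A fuzzy match catches variants such as "Jon" and "John", and the threshold decides how many false matches are tolerated.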

DATA QUALITY
- Why does it happen?
  - Bad business processes
  - Inadequate software training
  - Bad software design and implementation
- Consequences on BI
  - Bad or delayed decisions
  - Revenue Impacts
    - Money spent in standardizing or cleansing data downstream
    - Affects ability to reach customers, cross-sell and up-sell
  - Efficiency Impacts
    - Time and resources spent on fixing data quality issues

HIERARCHIES
- A hierarchy is a set of levels having many-to-one relationships between each other, and the set of levels collectively makes up a dimension or a tree.
- Why important?
  - Usability: hierarchies are the key to navigating dimensions
  - Query Performance
- Level
  - A position in a hierarchy
- Hierarchies are not always officially part of the source system
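
The many-to-one relationship between levels is what makes drilling up possible. The sketch below rolls a fact up through two levels of a made-up geography hierarchy. The level names, mappings and sales figures are all hypothetical.

```python
from collections import defaultdict

# Hypothetical many-to-one mappings: each child has exactly one parent.
CITY_TO_STATE = {"Columbus": "Ohio", "Cleveland": "Ohio", "Austin": "Texas"}
STATE_TO_COUNTRY = {"Ohio": "USA", "Texas": "USA"}

# Hypothetical facts recorded at the lowest level (city).
SALES_BY_CITY = {"Columbus": 100, "Cleveland": 50, "Austin": 70}


def roll_up(facts, child_to_parent):
    """Aggregate facts from a child level to its parent level."""
    totals = defaultdict(int)
    for child, amount in facts.items():
        totals[child_to_parent[child]] += amount
    return dict(totals)


sales_by_state = roll_up(SALES_BY_CITY, CITY_TO_STATE)
sales_by_country = roll_up(sales_by_state, STATE_TO_COUNTRY)
```

Because every city maps to exactly one state, no amount is counted twice. If the source system does not hold such a mapping, the warehouse team has to obtain it from the business.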

THE HOLY GRAIN(L)
- Most of the basic control documents that transfer money and goods (or services) from place to place take the parent-child form.
- An invoice (the parent) has many line items (the children).
- Other obvious examples besides invoices include orders, bills of lading, insurance policies, and retail sales tickets.
- Parent-child transaction databases commonly have facts of different granularity
- Data Warehouses often try to allocate header-level facts down to the child line-item detail level
- This provides the ability to slice and dice and roll up the facts from the child line-item detail
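
Allocating a header-level fact to line items can be sketched as follows. The slides do not say which allocation basis to use, so this example assumes a split proportional to each line's amount, with the rounding remainder assigned to the last line. The function name and figures are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def allocate_header_fact(header_amount, line_amounts):
    """Split a header-level amount across lines in proportion to line value.

    The last line absorbs the rounding remainder so that the allocated
    shares always add back up to the header amount.
    """
    total = sum(line_amounts)
    if total == 0:
        raise ValueError("cannot allocate proportionally to a zero total")
    allocated = []
    for amount in line_amounts[:-1]:
        share = (header_amount * amount / total).quantize(
            CENT, rounding=ROUND_HALF_UP)
        allocated.append(share)
    allocated.append(header_amount - sum(allocated))
    return allocated


# Hypothetical invoice: 10.00 of freight on the header, three equal lines.
freight = Decimal("10.00")
lines = [Decimal("30.00"), Decimal("30.00"), Decimal("30.00")]
shares = allocate_header_fact(freight, lines)
```

Once freight lives at line-item grain, it can be sliced by product or any other line-level dimension. The business has to confirm the allocation rule, since value, quantity or weight can each be a defensible basis.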

HISTORY
- Data Warehouses typically need to do a one-time history load when they go live
  - Also called initial load or seeding load
- The data in the operational systems may often be in multiple formats
- Data quality is generally believed to get poorer as the data gets older
- Missing data values often must be dealt with when loading history

SURVIVORSHIP (CONFLICT RESOLUTION)
- Overlapping data from multiple systems
- Survivorship (conflict resolution): rules to define how to assemble a single record from two or more records with overlapping attributes that may contain conflicting values
- Resolution strategies can be
  - Instance based: rely on the actual data values
    - Deciding versus Mediating strategies
  - Metadata based: rely on freshness of data or reliability of source or which data element was most recently updated
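
A metadata-based survivorship rule can be sketched as below. The assumptions are that each record carries an `updated` date and a `source` name, that the freshest non-null value wins per attribute, and that a hypothetical source ranking breaks ties. None of these names come from the presentation.

```python
from datetime import date

# Hypothetical reliability ranking of sources (higher wins ties).
SOURCE_RANK = {"CRM": 2, "Billing": 1}


def survive(records, attributes):
    """Build one golden record, choosing each attribute independently.

    For every attribute, the non-null value from the most recently
    updated record wins; source reliability breaks ties.
    """
    golden = {}
    for attr in attributes:
        candidates = [r for r in records if r.get(attr) is not None]
        if not candidates:
            golden[attr] = None
            continue
        best = max(
            candidates,
            key=lambda r: (r["updated"], SOURCE_RANK.get(r["source"], 0)),
        )
        golden[attr] = best[attr]
    return golden


records = [
    {"source": "CRM", "updated": date(2014, 6, 1),
     "email": "a@example.com", "phone": None},
    {"source": "Billing", "updated": date(2014, 8, 15),
     "email": "b@example.com", "phone": "555-0100"},
]
golden = survive(records, ["email", "phone"])
```

This picks one of the existing values, so it needs the source system to supply a trustworthy update timestamp. Which source counts as more reliable is a business decision that should be captured as a requirement.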

SEMANTIC COMPLEXITY
- Different users of a database have different conceptions of what the data represents
- Take the example of a table of mobile phone numbers: there are values of all 0’s, nulls, all 9’s and valid digits
- What does null mean?
  - The record is of someone who does not have a mobile phone
  - The record is of someone who has a mobile phone, and chose not to supply the number
  - The record is of someone who has a mobile phone, but forgot to supply the number or the number was hard to decipher and recorded as null
- What do the all 0’s and all 9’s mean?
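
Profiling the column is a practical first step before asking the business what each pattern means. The sketch below counts the value patterns from the mobile phone example. The category labels and sample values are made up for illustration, and "valid digits" only means that the value is not one of the placeholder patterns.

```python
from collections import Counter


def classify_phone(value):
    """Assign a phone value to one of the patterns in the example."""
    if value is None:
        return "null"
    digits = "".join(ch for ch in value if ch.isdigit())
    if not digits:
        return "no digits"
    if set(digits) == {"0"}:
        return "all zeros"
    if set(digits) == {"9"}:
        return "all nines"
    return "valid digits"


def profile_phones(values):
    """Count how often each pattern occurs in the column."""
    return Counter(classify_phone(value) for value in values)


# Hypothetical sample of a mobile phone column.
sample = [None, "000-000-0000", "999-999-9999", "614-555-0142", None]
profile = profile_phones(sample)
```

The counts show how widespread each placeholder is, but they cannot say what a null or an all-9's value means. That answer has to come from the people who own the source system.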

QUESTIONS?
- Contact me at
