Dimensional Modeling Dr. Jerry Rosenbaum The Rose Tree Group


Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

Agenda Dimensional Example The Bigger Picture Steps to Build a Dimensional Model

An Understanding of the Problem is key to the solution To rewire the Empire State Building you must Understand the current wiring Understand the goals for the new wiring Design the new wiring system Execute your design Monitor the results The “Fire, Ready, Aim” approach can lead to a hole in your foot

Business User Perspective on Data In the view of the business user, there are only two important things about data Accessibility Can I get the data that I need Am I allowed to get the data I need (security) Quality When I get the data, can I trust it Without quality, business intelligence devolves into business stupidity

Part I Dimensional Model Example

Up Front Points Dimensional Models are generally a star schema, but may be a snowflake Provides a slice of the total available data and is focused on the needs of a single department or user Contains a fact table and multiple dimension tables Attributes may be source data or derived data A dimensional model is one type of data mart.

Sales Analysis Star Schema From Len Silverston Universal Data Models

Dimensional Model One Fact table Customer Sales Fact is either one data item (or a group of tightly coupled data items of the same granularity) Multiple Dimension Tables (each with one or more dimensions of a similar type) Sales Rep Product Time by Day Address Customer Customer Demographics Internal Organization
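As a rough sketch of the structure described on this slide, the following uses Python's built-in sqlite3 module to create one fact table and two of the listed dimensions. The table and column names (sales_fact, product_dim, time_by_day_dim and their columns) are illustrative assumptions and are not taken from the Silverston model.

```python
import sqlite3

# Hypothetical, simplified star schema: one fact table plus two dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,
    category     TEXT NOT NULL
);
CREATE TABLE time_by_day_dim (
    day_id   INTEGER PRIMARY KEY,
    iso_date TEXT NOT NULL,
    month    INTEGER NOT NULL,
    year     INTEGER NOT NULL
);
-- The fact table holds the measures plus one foreign key per dimension.
CREATE TABLE sales_fact (
    product_id  INTEGER NOT NULL REFERENCES product_dim(product_id),
    day_id      INTEGER NOT NULL REFERENCES time_by_day_dim(day_id),
    quantity    INTEGER NOT NULL,
    gross_sales REAL NOT NULL
);
""")


def table_names(connection):
    """List the tables that now exist in the database, sorted by name."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]
```

The remaining dimensions on the slide (Sales Rep, Address, Customer, and so on) would follow the same pattern: one table per dimension and one more foreign key column in the fact table.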

Business Points Data in Fact Table, and the Dimension Tables is based on Business requirements Data extracted from other databases Internal or external data Allows business users to slice and dice the data by any combination of dimensions to produce a business report
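To make "slice and dice by any combination of dimensions" concrete, here is a minimal sketch in plain Python. The rows, the dimension names (region, product, month) and the slice_and_dice helper are all hypothetical; a real data mart would normally do this with a query or BI tool.

```python
from collections import defaultdict

# Hypothetical fact rows already joined to their dimension attributes.
FACT_ROWS = [
    {"region": "East", "product": "Widget", "month": "2024-01", "sales": 100.0},
    {"region": "East", "product": "Gadget", "month": "2024-01", "sales": 50.0},
    {"region": "West", "product": "Widget", "month": "2024-01", "sales": 70.0},
    {"region": "West", "product": "Widget", "month": "2024-02", "sales": 30.0},
]


def slice_and_dice(rows, dimensions, measure="sales"):
    """Sum the measure for every combination of the chosen dimensions."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)
```

Calling slice_and_dice(FACT_ROWS, ["region"]) groups by one dimension, while slice_and_dice(FACT_ROWS, ["product", "month"]) groups by two; the same fact rows serve both reports.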

Business Points (2) Design of the Dimensional Model is only one aspect, how about the data to be loaded Are there any data quality issues with the data that will be loaded Are there alternate sources of the data that differ from the data source we used Are we violating any security or privacy issues

Questions Where did the data come from How often do we update the data Is it an update or a complete refresh Are the rules different for internal and external data sources Why didn’t we just use the original databases Was any of the data transformed Why did we choose those 7 dimensions What is the quality of the data

Truly Answering These Questions We must look at Business drivers Where dimensional modeling fits into the bigger data picture How do we do dimensional modeling In other words, it helps if we understand the bigger picture so we can build the right solution the first time

Some Hidden Issues This dimensional model can not reasonably help us answer What was the typical total check out amount for all purchases by a customer How many items did a typical customer purchase Adding these items to every row of the fact table is not a good solution This requires a second, but related dimensional model.
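The point about needing a second, related model can be illustrated with a small sketch. It assumes the first model is stored at line-item grain and derives an order-grain fact from it; the sample rows and the use of the median as the "typical" value are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import median

# Hypothetical line-item facts: (order_id, customer_id, quantity, amount).
LINE_ITEMS = [
    (1, "c1", 2, 20.0),
    (1, "c1", 1, 5.0),
    (2, "c2", 3, 30.0),
    (3, "c1", 1, 15.0),
]


def build_order_facts(line_items):
    """Roll line items up to one row per order (a coarser grain)."""
    orders = defaultdict(lambda: {"items": 0, "total": 0.0})
    for order_id, _customer, quantity, amount in line_items:
        orders[order_id]["items"] += quantity
        orders[order_id]["total"] += amount
    return dict(orders)


order_facts = build_order_facts(LINE_ITEMS)
# "Typical" is interpreted here as the median across orders.
typical_total = median(o["total"] for o in order_facts.values())
typical_items = median(o["items"] for o in order_facts.values())
```

Storing the order total on every line-item row would repeat it once per line and inflate any sum; keeping it in its own order-grain fact table avoids that.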

Hidden Issues (2) How does one keep a set of dimensional models in synch How does one ensure that for a given “fact” and set of dimensions that the sourcing of the data is consistent across multiple models If a transformation (the T in ETL) is used, how can one ensure that everyone uses a consistent transformation What about the data quality issues – were they resolved the same way every time.

Part II The Bigger Picture

Sagely Advice “Where should I begin your majesty” “Begin at the beginning”, the king said gravely - Lewis Carroll, Alice’s Adventures in Wonderland

Business Intelligence One of the earliest applications in the history of computing was a program to generate reports on operating system performance FAST FORWARD TO TODAY Today we call this process Business Intelligence (BI) – we gather and analyze data to increase business process efficiency. Realize that even that first application, written in the early 1970s, was meant to provide intelligence to improve system performance. Fast forward to today: we are still providing reports in order to improve a result, whether it is a business decision or the performance of a system in a data center. So we must assume that the reports we provide need to lead to a positive end result, with a high degree of confidence in the accuracy of the data being manipulated. From J. O’Conner

Business Intelligence C - Level – Considerations Costly ERP and Major systems are implemented to provide information to management to make the best decisions relating to improving the “Bottom Line.” IT Customer satisfaction does not seem to meet expectations – manage expectations IT takes the blame for bad decisions due to the information available to the executives UNWARRANTED ??? – Corporations spend large amounts of money and time, and make numerous process changes, with the expectation of saving money, improving productivity, and gathering information for improved decision making, strategic direction, and tactical improvements. Information Technology has been the “whipping boy” for poor business decisions, increased cost of production, and loss of cost savings – even to being held responsible for the systems' failure to meet expectations. Many times IT does not even participate in the product selection, but is suddenly responsible for the results, not just the implementation and training. Most times IT does not have a representative at the proper executive level to defend the fact that IT cannot do any more than provide the best business intelligence possible and keep the system functioning – but we cannot control the business’s . However, the information and reporting of Business Intelligence must meet the needs and expectations of the customer. So is the blame unwarranted? This is our fault: we do not manage expectations or listen to the customer. The most common problem is that we do not train our customers about what information is available and, where it is not yet available, how we can provide a solution to meet the expectation. From J. O’Conner

Business Intelligence C - Level Executive Considerations ERP or Major System’s Value is dependent on how the information is disseminated to management (regions, functions, divisions, products, etc.) – but IT needs to understand what data is available and assure its accuracy. IT must establish test and check points to assure data accuracy using standard data integrity methods. IT must provide training and support in development of reports for accuracy – become detectives – look for anomalies, bringing them to the attention of the clients. One thing to remember about executive management: they have already “selected” the best system (in their opinion). They do not care about how query requirements work, or how joining, filtering, grouping, or special calculations must be processed. Therefore, we need to train their staff to query and use the end-user tools to produce the reports and information, but we also need checks and balances to assure the information is accurate and has been processed properly (review the report structure), and we need to test all output for accuracy with specific test points and check points. REMEMBER: IT must take a leadership role in investigating any anomalies, because the end user or client cannot be wrong; it is the system. Story: For a considerable time the businesses and Finance stated that the information coming from the SAP system was incorrect. After some basic investigation it was found that Finance had a consolidating system that took the data from all of the SAP businesses and created the month-end reports. They also made adjustments in the consolidating system that did NOT coincide with the business rules established by Finance when the system was implemented. To make it worse, they made the changes at the summary level and only in their own system, so two things were wrong: (1) the consolidating system did not “roll up” from the details, and (2) they did not make the changes in the SAP system. So over time the error compounded itself, to the point that when SOX404 was being implemented the correction effort, and the need to improve the check points and remediation, was significant. So create a cross-functional data integrity team and review the business intelligence for obvious issues as well as those that are less obvious. SOX404 will help with a Control Self Assessment that IT should adhere to in order to assure the Business Intelligence is accurate, to avoid the “finger pointing”, and to avoid being the reason for business failures. From J. O’Conner

Business Intelligence C - Level Executive Considerations Information Business Leaders need: Customer Information – trends in product selection, billing management, order tracking, fulfillment visibility, marketing planning, campaign management, telemarketing, lead generation, and custom segmentation. Faced with the business demand for sustained profitability, companies of all sizes are looking to sourcing and procurement: managing supplier spend and reducing the cost of purchased goods and services can both boost the bottom line. Supplier trends (vendor to products, timely delivery) provide the business intelligence to control the complete supply cycle, from strategy to execution, by optimizing supplier selection, compressing cycle times, and building sustainable, workable supplier relations. Customer trends cover purchasing habits, product selection, delivery method, and payment trends. From J. O’Conner

Business Intelligence C - Level Executive Considerations Information Business Leaders need: Supplier Information – cost of purchased goods and services, optimization of supplier selection, compressed cycle times, and alignment of the purchase of goods with the corporate strategy. From J. O’Conner

Business Intelligence C - Level Executive Considerations Information Business Leaders need: Product Lifecycle Management data; Supply Chain Management data; Financial data that links business controls to Finance and complies with Sarbanes-Oxley (SOX404). From J. O’Conner

Key Business Driver: The Need to Improve Business Intelligence Nine out of 10 executives from the largest U.S. companies say they need stronger business intelligence capabilities that provide better analysis of, and insight into, their operations if they are to grow successfully in an uncertain economic and political environment Accenture survey of 150 senior executives of Fortune 1000 firms. From J. O’Conner

Top Needs 91% selected stronger analytical and business intelligence 84% selected an organizational culture that better accommodates change 74% selected a more robust information technology infrastructure

Building a Business and IT Foundation for BI Organizations have never been so eager to adopt business intelligence (BI) technology. Unfortunately, lack of alignment between people, process, and technology has led to many misguided business intelligence deployments. Using a business-centric methodology and process improvement type of approach, organizations can leverage BI efficiently to enable Performance Management M.A.Smith – Data Management Review

Zachman Framework

DoDAF

Row 1 - Planner Data – What List of Subject Areas about which data is stored Often presented as a taxonomy tree or a multilevel outline Process - How Business functional areas Often presented as a set of functional decompositions Also important to know Relationship between data and process Present, and proposed data and process Transition plan / Road map

Row 2 – Business View/Owner What - Data Conceptual (or Business) Data Model. Has two components Business data objects (forms, reports, etc) and their decomposition into data components and A high level organization of the business data components and their relationships How - System Business work flow End to end (incl. people) Includes all aspects of the target system The business workflow is the structure for organizing the business steps There are rules for moving along the steps in the workflow The business steps communicate to each other via the data

Row 3 – Logical View / Designer What - Data Logical Data Model Fully normalized (through fifth normal form) Determine which services use which data elements (entity level and attribute level) How - System For each business object Develop a set of services to meet the business need Search for potential common services User interactions Metadata for both data and systems should be collected, Organized, and maintained

What is in a Logical Data Model Graphic that depicts entities, attributes, primary keys, etc (Data items + structural rules + relationships) Plus Metadata for Entities, Attributes and Relationships (CRUCIAL FOR USING DATA MARTS) Definition Data Domain values set (including possible representation) Units of Measure Cardinality (and optionality) Management of synonyms and antonyms Semantic rules Status of an entity, or attribute

Logical Data Model (2) Primary key, foreign key, uniqueness, use of nulls and default values Data integrity and business rules (Data Quality Rules) Originating Data Source System of Record/Authoritative Source Lead and steward business domain Usage of data in an information exchange Security Notes for physical data model designer Example A data modeler may use abstract design templates (generalization and specialization) and bottom up design based current systems plus new requirements to build a complete model All other models (especially the process models) are used as potential sources of data elements

Some Data Model Notes The Conceptual Data Model tends to be a very wide scope, but limited detail (sets the context for data sharing). Logical Data Model is developed in detail as needed Physical Data Model is based on all or part of the Logical Data Model AND it may have a similar or very different data structure Structure is based on planned usage

Row 4 – Physical View/Builder What - Data Physical Database Design Transaction Path Analysis Analysis of the need for indexes to improve performance Determine Physical Data Structure Determine if any services will be stored procedures How - System Transformation of the business flow to a physical flow Determine groupings of services for implementation Build physical flow based on business flow and services SOA Add physical details to the metadata and discovery services

Row 5 – Detail Representation / Subcontractor What - Data Determine the layout of the database tables across the disk farm Develop the DDL for the physical database structure Determine backup & recovery strategy Determine SAN strategy How - System Determine detail specifications for each element of the physical flow Write the programs for implementing the flow as well as each element of the flow

Row 6 – Information System (Actual physical system) What Performance monitoring and adjustments Business continuity Archiving and retrieval SPC & Audit Data Stewardship How “Help Desk” Change management (including data)

Corporate Information Factory [Diagram: Operational Systems (AR, AP, Order Entry, etcetera) feed the Data Warehouse through ETL; the Data Warehouse feeds the Reporting & BI Systems (Data Mart for Accounting, Data Mart for Sales, etcetera) through ETL; Metadata and System Measurements are also shown.] Based on work of C. Imhoff

Data Quality Points You must measure your actual data quality Quality must start in the production systems Your ETL processes along with the data warehouse and data marts are not meant to be a sewage treatment plant for bad data If the quality is not very high in the operational systems, then your quest for business intelligence devolves into producing business stupidity Solve your DQ problem in the operational systems (root cause) and then go forward. Bad data can be created faster than you can correct it.
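One way to start measuring actual data quality is to compute simple ratios per attribute. The sketch below shows two such measures, completeness and validity against a domain; the sample records, the VALID_STATES domain and both function names are hypothetical, and real data quality programs typically track many more dimensions.

```python
# Hypothetical customer records pulled from an operational system.
RECORDS = [
    {"customer_id": 1, "state": "MD", "email": "a@example.com"},
    {"customer_id": 2, "state": "", "email": "b@example.com"},
    {"customer_id": 3, "state": "ZZ", "email": None},
    {"customer_id": 4, "state": "WY", "email": "d@example.com"},
]
VALID_STATES = {"MD", "WY", "VA"}  # assumed domain for the state attribute


def completeness(records, field):
    """Fraction of records in which the field is neither None nor empty."""
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def validity(records, field, domain):
    """Fraction of records whose field value belongs to the allowed domain."""
    valid = sum(1 for r in records if r.get(field) in domain)
    return valid / len(records)
```

Running such checks against the operational source, rather than only after ETL, is what locates the root cause the slide refers to.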

System Measures for Data and Systems Frequency of use Pattern of use (monthly, weekly, daily, hourly) Resources consumed CPU, Disk, Network For Data both volume and rate of growth Performance Metrics Utilization Metrics

Metadata for Data and Systems Need Metadata about Data and Systems Written in Business Terms + Technical Terms Metadata for data includes information about Entities, Attributes, and Relationships Definition Data Domain values set (including possible representation) Units of Measure Cardinality (and optionality)

Metadata (2) Management of synonyms and antonyms Semantic rules Status of an Entity and Attribute Primary key, foreign key, uniqueness, use of nulls and default values Data integrity and business rules Originating Data Source System of Record / Authoritative Source Lead and steward business domain Usage of data in an information exchange Example AND Business Rules for Data Validity

Metadata (3) Without metadata, you may not be sure what you are looking at. For example, what does a length of 6.2 mean? (NASA's Mars Climate Orbiter was lost in 1999 because of a units-of-measure mismatch.) You may not be sure what process you should use to execute a business process. Etc.
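The "length of 6.2" question can be sketched in code: a bare number carries no unit, while a value that carries its unit of measure as metadata can be converted safely. The Length class and the small conversion table below are assumptions made for illustration.

```python
from dataclasses import dataclass

# Conversion factors to metres; the set of supported units is an assumption.
TO_METRES = {"m": 1.0, "cm": 0.01, "ft": 0.3048, "in": 0.0254}


@dataclass(frozen=True)
class Length:
    """A measurement that keeps its unit of measure alongside the value."""
    value: float
    unit: str

    def to_metres(self) -> float:
        if self.unit not in TO_METRES:
            raise ValueError(f"unknown unit of measure: {self.unit!r}")
        return self.value * TO_METRES[self.unit]
```

Length(6.2, "m") and Length(6.2, "ft") look identical as raw numbers but convert to very different values, which is exactly the ambiguity the metadata removes.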

Key Question Where do you get the data for the data marts. Inmon: from the data warehouse Kimball: directly from the operational systems The problem is that source data may be available from several sources What do you do if the sources do not have identical data For derived data, does everyone use the exact same method (including data sources) Are the semantics the same
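Before choosing a source, it can help to check whether two candidate sources actually agree. The following sketch compares two keyed extracts and reports where they differ; the data, the names warehouse and order_entry, and the compare_sources helper are hypothetical.

```python
def compare_sources(source_a, source_b):
    """Return keys missing from either source and keys whose values differ."""
    only_a = sorted(source_a.keys() - source_b.keys())
    only_b = sorted(source_b.keys() - source_a.keys())
    differing = sorted(
        k for k in source_a.keys() & source_b.keys() if source_a[k] != source_b[k]
    )
    return {"only_in_a": only_a, "only_in_b": only_b, "differing": differing}


# Hypothetical monthly revenue by customer from two candidate sources.
warehouse = {"c1": 100.0, "c2": 250.0, "c3": 75.0}
order_entry = {"c1": 100.0, "c2": 240.0, "c4": 10.0}
report = compare_sources(warehouse, order_entry)
```

A non-empty report is the "two watches" situation: someone has to decide which source is authoritative before the data mart is loaded.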

A man with one watch knows what time it is. A man with two watches is never sure. - an adage commonly known as Segal's law

Operational Systems These are the systems that run the business on a daily basis Most Customer interactions are with operational systems Operational Systems AR AP Order Entry Etcetera

Data Warehouse Serves as the single, officially accepted and approved, valid source for all data needed for business analysis / intelligence Helps ensure that all Data Marts are reading from the same book and using the same rules. Much easier change management Fully normalized RDB with summary data added [Diagram: Data Warehouse and Archive]

Data Mart Geared to meet the business users' needs Uses range from simple reports to data delivered in a manipulable form: Word, Excel, Small Data Warehouse, Star Schema Delivery media depends upon planned usage [Diagram: Reporting & BI Systems: Data Mart for Accounting, Data Mart for Sales, etcetera]

Getting the Data In Vendors promise us that the ETL process for moving data from operational systems to the Data Warehouse is “simple” But we must deal with Conflict resolution Data Quality Issues Duplicate data etc
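As a small illustration of why the load is not "simple", here is a sketch of one duplicate-handling step. It assumes a business key that must be normalized before matching and a "most recent update wins" conflict rule; both the rule and the field names are assumptions, and a real ETL process would need rules agreed with the business.

```python
def deduplicate(rows, key_fields, updated_field="updated"):
    """Keep one row per business key; the most recently updated row wins."""
    best = {}
    for row in rows:
        # Normalize the key so trivial differences do not hide duplicates.
        key = tuple(str(row[f]).strip().lower() for f in key_fields)
        current = best.get(key)
        if current is None or row[updated_field] > current[updated_field]:
            best[key] = row
    return list(best.values())


# Hypothetical extract containing the same customer twice.
extract = [
    {"email": "Ann@Example.com ", "city": "Baltimore", "updated": "2024-01-05"},
    {"email": "ann@example.com", "city": "Columbia", "updated": "2024-03-01"},
    {"email": "bob@example.com", "city": "Towson", "updated": "2024-02-10"},
]
clean = deduplicate(extract, ["email"])
```

The dates are ISO-formatted strings, so comparing them as text orders them chronologically.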

Getting the Data Out Generally use SQL queries Care must still be taken to keep dimensions consistent Marts sourced from a single Data Warehouse can be merged

Other Important Big Picture Data Issues

Net-Centricity Business Issue Business Users may be located all over the place and each may move around Home, Office, Customer site, etc There are many computers in the network and each one Can execute a specified set of processes Maintains a specified set of data bases Is not necessarily the same as any other computer. The business user must be able to access any desired data and execute any desired process by connecting into any point on the network

Key Net Centric Services Data services Process services Discovery services Relies very heavily on metadata Minimize SPOFs

Net-Centric Data Related Issues If a piece of the network (or a processor) goes down, the overall system must still be functional Data Discovery Services Stress on Data Quality Data bottlenecks must be mitigated Data replication must be planned to Guarantee latest version of data as soon as possible and reasonable Inform the user of “old” data being presented
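The requirement to inform the user of "old" data can be sketched as a freshness label attached to each replica. The one-hour threshold, the label wording and the describe_freshness function are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def describe_freshness(replicated_at, now, max_age=timedelta(hours=1)):
    """Return a label telling the user whether a replica is current or old."""
    age = now - replicated_at
    if age <= max_age:
        return "current"
    hours = int(age.total_seconds() // 3600)
    return f"stale: last replicated {hours} h ago"


# Example timestamps; a real service would read these from replication metadata.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
fresh_label = describe_freshness(now - timedelta(minutes=30), now)
stale_label = describe_freshness(now - timedelta(hours=5), now)
```

A data service could return such a label with every result set so the user can judge whether the replica is good enough for the decision at hand.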

Governance What are the key tasks Who is responsible for the tasks Management roles and responsibilities Worker bee roles and responsibilities What to do if things go wrong Some key tasks include Data Stewardship Data Quality Business Continuity Performance Monitoring and Tuning Testing

Part III Building a Dimensional Model

Why Build a Data Mart OLTP systems are designed for rapid response and high transaction rates Reporting from them is slow and degrades performance Data Warehouse is big (hard for user to find things) and not designed for slicing and dicing. It is very well suited for general reporting

Design Options for Data Marts Design chosen must consider the needs and abilities of the user Excerpt from big Data Warehouse using the same or similar design Dimensional Model (Star or Snow Flake) Spreadsheet Word Document etcetera

Basic Steps for Building A Dimensional Data Mart Determine the Business Needs Determine Sources of Data in Data Warehouse Identify the Granularity of the Data Determine the data in the Fact Table(s) Determine the Dimensions Build the Dimensional Model

Sales Data Warehouse

1. Determining Business Needs Must be done in face to face meetings Ask for samples or mock up reports Probe for possible extensions Determine which type of data mart is best suited to the set of users.

2. Determine Source of Data in DW Which table(s) in the data warehouse contain the data needed by the user for Fact Tables Dimension tables Is any of the data from external sources How do you plan to get that data How reliable is the external data Did you check it yourself Fortunately you have already solved your internal DQ problems

Dimensional Definitions A dimensional model requires Fact table Contains the measurements or metrics or facts of business processes PLUS foreign keys to the several dimension tables Dimension Table(s) Context of the measurements (or facts) is represented in dimension tables. Typically the what, where, when, who, and how of the measurement (or fact)

3. Identify Granularity Needed For the fact table What is the lowest level of granularity (detail) needed Is it atomic data or summarized data How many fact tables are needed (each will have its own star schema) If you wish to join data across star schemas, you must have consistent sourcing of data, consistent dimensions, and appropriately matching granularities.
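Matching granularities before combining two star schemas can be sketched as follows. The example assumes one fact at daily grain and another at monthly grain that share a product dimension; the data and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical facts at two different grains.
DAILY_SALES = [
    ("2024-01-03", "Widget", 40.0),
    ("2024-01-17", "Widget", 60.0),
    ("2024-02-02", "Widget", 25.0),
]
MONTHLY_TARGETS = {("2024-01", "Widget"): 90.0, ("2024-02", "Widget"): 50.0}


def roll_up_to_month(daily_rows):
    """Aggregate daily facts to the (month, product) grain."""
    monthly = defaultdict(float)
    for iso_date, product, amount in daily_rows:
        monthly[(iso_date[:7], product)] += amount
    return dict(monthly)


def compare_to_target(daily_rows, targets):
    """Return actual minus target at the shared (month, product) grain."""
    actuals = roll_up_to_month(daily_rows)
    return {key: actuals.get(key, 0.0) - target for key, target in targets.items()}
```

Joining the daily rows to the monthly targets without rolling up first would repeat each target once per day and overstate it, which is why the grains have to match.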

4. Determine Data in Fact Table For each fact table, What single fact (or tightly coupled group of facts) will be included. Determine how the fact will be extracted from the source What dimensions are available in the source data (Foreign keys) Does your source database support the desired foreign keys.

5. Determine the Dimensions Dimensions generally follow the standard who, what, where, when, why, and how Who: are there people or organizations associated with the fact What: what are the ways of categorizing the fact in the fact table Where: what are the locations associated with the fact May be physical locations or conceptual locations

5. Dimensions continued When: What are the time factors involved Day, week, month, quarter, day of week, etc Why is the fact included (not very common) How: Was any special process or method associated with the fact E.g. store sale, phone sale, internet sale
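The time factors listed for the When dimension can be generated rather than typed in. Below is a minimal sketch that builds one row per calendar day with the day of week, ISO week, month, quarter and year; the column names and the YYYYMMDD key format are assumptions.

```python
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def build_time_by_day(start, end):
    """Return one dimension row per calendar day from start to end inclusive."""
    rows = []
    current = start
    while current <= end:
        _iso_year, iso_week, _iso_weekday = current.isocalendar()
        rows.append({
            "day_key": current.year * 10000 + current.month * 100 + current.day,
            "date": current.isoformat(),
            "day_of_week": DAY_NAMES[current.weekday()],
            "iso_week": iso_week,
            "month": current.month,
            "quarter": (current.month - 1) // 3 + 1,
            "year": current.year,
        })
        current += timedelta(days=1)
    return rows
```

For example, build_time_by_day(date(2024, 1, 1), date(2024, 12, 31)) would produce a full year of rows that the fact table can reference by day_key.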

6. Build the Dimensional Model For the fact table, extract the data and all the associated dimensions Often requires data from several tables For each dimension, determine the set of possible values (domain) and populate each dimension table Note that all the needed data may not be in the source database (e.g. suppose there were no sales from Wyoming)
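The Wyoming example can be shown with a small sqlite3 sketch: if the dimension table is populated from the full domain rather than from the values that happen to appear in the facts, a report can still show the state with zero sales. Table names, column names and the sample data are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE state_dim (state_code TEXT PRIMARY KEY);
CREATE TABLE sales_fact (state_code TEXT NOT NULL, amount REAL NOT NULL);
-- The dimension is loaded from the full domain, including WY.
INSERT INTO state_dim VALUES ('MD'), ('VA'), ('WY');
-- The source data happens to contain no sales for WY.
INSERT INTO sales_fact VALUES ('MD', 100.0), ('MD', 50.0), ('VA', 20.0);
""")


def sales_by_state(connection):
    """Total sales per state, keeping states that have no fact rows."""
    return connection.execute("""
        SELECT d.state_code, COALESCE(SUM(f.amount), 0.0)
        FROM state_dim AS d
        LEFT JOIN sales_fact AS f ON f.state_code = d.state_code
        GROUP BY d.state_code
        ORDER BY d.state_code
    """).fetchall()
```

Had state_dim been built only from the distinct states in sales_fact, Wyoming would silently disappear from every report instead of appearing with a zero.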

The Finale Test the dimensional database you built Deliver it to the user and train the user Questions Jerry Rosenbaum jrosenba@ix.netcom.com 443-253-6054

References Ralph Kimball and Margy Ross, The Data Warehouse Toolkit, John Wiley & Sons. Len Silverston, The Data Model Resource Book, Vol. 1 & 2. Graeme Simsion, Data Modeling Essentials, 3rd Edition, Morgan Kaufmann.