ISQS 3358, Business Intelligence: Creating Data Marts. Zhangxi Lin, Texas Tech University



Outline
Illustrative Example: Adventure Works Cycles (AWC)
Data Warehousing with Microsoft SQL Server 2008
Exercise 2
Types of Dimension Tables
Hands-on Case: Maximum Miniature Manufacturing
Exercise 3

ILLUSTRATIVE EXAMPLE: ADVENTURE WORKS CYCLES (AWC)

Adventure Works Cycles (AWC)
A fictitious multinational manufacturer and seller of bicycles and accessories
Based in Bothell, Washington, USA, with regional sales offices in several countries
ISQS 6339, Data Mgmt & BI, Zhangxi Lin

Basic Business Information
Product Orders by Category
Product Orders by Country/Region
Product Orders by Sales Channel
Customers by Sales Channel Snapshot

Business Processes
Purchase Orders
Distribution Center Deliveries
Distribution Center Inventory
Store Deliveries
Store Inventory
Store Sales

Analytic Themes
See the Excel file \\TechShare\coba\d\isqs3358\Repository\AWC\AW_Analytic_Themes_List.xls
SQL Server 2008 R2 – Data Warehousing Scaling and Performance (41’28”)

AWC’s Bus Matrix
Dimensions (columns): Date, Product, Employee, Customer (Reseller), Customer (Internet), Sales Territory, Currency, Channel, Promotion, Call Reason, Facility
Business processes (rows): Sales Forecasting, Orders, Call Tracking, Returns
[Matrix table: the X marks showing which dimensions apply to each business process are not recoverable from the transcript.]

Prioritization Grid
[Figure: business processes placed on a grid of Feasibility (Low to High) against Business Value / Impact (Low to High). Items plotted: Orders, Forecast, Call Tracking, Exchange Rates, Returns, Manufacturing Costs, Customer Profitability, Product Profitability.]

Unified Dimensional Model (UDM)
A SQL Server 2008 technology
A UDM is a structure that sits over the top of a data mart and looks exactly like an OLAP system to the end user.
Advantages
◦ No need for a data mart
◦ Can be built over one or more OLTP systems
◦ Can mix data mart and OLTP system data
◦ Can include data from other vendors’ databases and XML-formatted data
◦ Allows OLAP cubes to be built directly on top of transactional data
◦ Low latency
◦ Ease of creation and maintenance
Features
◦ Data sources
◦ Data views
◦ Proactive caching for preprocessed aggregates

DATA WAREHOUSING WITH MICROSOFT SQL SERVER 2008

Microsoft BI Toolset
Relational engine (RDBMS)
◦ T-SQL
◦ .NET Framework Common Language Runtime (CLR)
SQL Server Integration Services (SSIS): ETL
◦ Data Transformation Pipeline (DTP)
◦ Data Transformation Runtime (DTR)
SQL Server Analysis Services (SSAS): queries, ad hoc use, OLAP, data mining
◦ Multidimensional Expressions (MDX): a scripting language for data retrieval from dimensional databases
◦ Dimension design
◦ Cube design
◦ Data mining
SQL Server Reporting Services (SSRS): ad hoc query, report building
Microsoft Visual Studio .NET is the fundamental tool for application development
Design Facts, Dimensions and Transformation/Load Processes (3’46”)

Structure and Components of Business Intelligence
[Figure: components shown are SSMS, SSIS, SSAS, SSRS, SAS EM, SAS EG, MS SQL Server 2008, and BIDS.]

Disadvantages of OLAP
Complexity to administer
Requires data mart
Latency
Read-only

Understanding the Cube Designer Tabs
Cube Structure: Use this tab to modify the architecture of a cube.
Dimension Usage: Use this tab to define the relationships between dimensions and measure groups, and the granularity of each dimension within each measure group.
Calculations: Use this tab to examine calculations that are defined for the cube, to define new calculations for the whole cube or for a subcube, to reorder existing calculations, and to debug calculations step by step by using breakpoints.
KPIs: Use this tab to create, edit, and modify the Key Performance Indicators (KPIs) in a cube.
Actions: Use this tab to create or modify drillthrough, reporting, and other actions for the selected cube.
Partitions: Use this tab to create and manage the partitions for a cube. Partitions let you store sections of a cube in different locations with different properties, such as aggregation definitions.
Perspectives: Use this tab to create and manage the perspectives in a cube. A perspective is a defined subset of a cube, and is used to reduce the perceived complexity of a cube to the business user.
Translations: Use this tab to create and manage translated names for cube objects, such as month or product names.
Browser: Use this tab to view data in the cube.

EXERCISE 2

AWC Business Requirements - Interview Summary
Interviewee: Brian Welker, VP of Sales
◦ Sales to resellers: $37 million last year
◦ 17 people report to him, including 3 regional sales managers
Previous problem: hard to get information out of the company’s system
Major analytic areas
◦ Sales planning
◦ Growth analysis
◦ Customer analysis
◦ Territory analysis
◦ Sales performance
◦ Basic sales reporting
◦ Price lists
◦ Special offers
◦ Customer satisfaction
◦ International support
Success criteria
◦ Easy data access
◦ Flexible reporting and analyzing
◦ All data in one place
What’s missing? A lot. No indication of business value.

Business Analytics Theme
Sales performance
◦ By territory
◦ By product
◦ By customer
◦ By date
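
The "sales performance by territory" theme can be pictured as a join between a fact table and one dimension, followed by an aggregation. The sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server; the table names, column names (FactSales, DimTerritory, SalesAmount) and sample rows are hypothetical and are not taken from the AWC database.

```python
import sqlite3

# In-memory database standing in for the data mart (assumption: SQLite, not SQL Server).
theme_conn = sqlite3.connect(":memory:")
theme_conn.executescript("""
CREATE TABLE DimTerritory (TerritoryKey INTEGER PRIMARY KEY, Region TEXT);
CREATE TABLE FactSales (
    TerritoryKey INTEGER REFERENCES DimTerritory(TerritoryKey),
    SalesAmount REAL
);
-- Hypothetical sample rows
INSERT INTO DimTerritory VALUES (1, 'Northwest'), (2, 'Europe');
INSERT INTO FactSales VALUES (1, 100.0), (1, 50.0), (2, 75.0);
""")

# Sales performance by territory: join the fact to the dimension, then aggregate.
sales_by_territory = theme_conn.execute("""
    SELECT t.Region, SUM(f.SalesAmount)
    FROM FactSales AS f
    JOIN DimTerritory AS t ON t.TerritoryKey = f.TerritoryKey
    GROUP BY t.Region
    ORDER BY t.Region
""").fetchall()
print(sales_by_territory)
```

Swapping DimTerritory for a product, customer, or date dimension would give the other three views of the same measure.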

AWC Sales Database

AWC Sales Data Warehouse

Tables in AWC Sales DW

Types of Dimension Table
Conformed dimensions
◦ Time, SaleTerritory
Junk dimensions
Role playing dimensions
◦ Time
Slowly changing dimensions (SCD)
Aggregate dimensions
Degenerate dimensions
Many-to-many or multivalued dimensions

Select Measure Group Tables

Why did DimEmployee disappear?

Deliverable: AWC Sales Data Warehouse Screenshot Due: Feb 15, 2016, by to

TYPES OF DIMENSION

Types of dimension
There are 7 frequently referenced types of dimension
◦ Conformed dimensions
◦ Junk dimensions
◦ Role playing dimensions
◦ Slowly changing dimensions (SCD)
◦ Aggregate dimensions
◦ Degenerate dimensions
◦ Many-to-many or multivalued dimensions
For more information about types of dimension, check The Microsoft Data Warehouse Toolkit, Joy Mundy and Warren Thornthwaite, Wiley, 2006

Conformed Dimensions
A set of data attributes that have been physically implemented in multiple database tables using the same structure, attributes, domain values, definitions and concepts in each implementation.
Dimension tables are not conformed if the attributes are labeled differently or contain different values.
Conformed dimensions come in several different flavors. At the most basic level, conformed dimensions mean exactly the same thing with every possible fact table to which they are joined.
E.g., the date dimension table connected to the sales facts and the one connected to the inventory facts.

Conformed Dimensions Dimensions are conformed when they are either exactly the same (including keys) or one is a perfect subset of the other. Most important, the row headers produced in the answer sets from two different conformed dimensions must be able to match perfectly. Conformed dimensions are either identical or strict mathematical subsets of the most granular, detailed dimension.
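
The "identical or perfect subset" rule can be checked mechanically. The sketch below is one way to do so for row-level subsets only; it assumes each dimension is held as a Python dict from key to attribute tuple, and the function name and sample date rows are hypothetical. It does not cover rolled-up dimensions that drop attributes.

```python
def is_conformed(dim_a, dim_b):
    """Return True if the dimensions are identical or one is a row-level subset
    of the other, with matching keys and matching attribute values."""
    small, large = sorted((dim_a, dim_b), key=len)
    return all(key in large and large[key] == row for key, row in small.items())

# Hypothetical date dimensions attached to different fact tables.
date_for_sales = {20160101: ("2016-01-01", "Jan"), 20160102: ("2016-01-02", "Jan")}
date_for_inventory = {20160101: ("2016-01-01", "Jan")}
date_mislabelled = {20160101: ("2016-01-01", "January")}

subset_ok = is_conformed(date_for_sales, date_for_inventory)
mismatch_ok = is_conformed(date_for_sales, date_mislabelled)
print(subset_ok, mismatch_ok)
```

The second comparison fails because the same key carries a different attribute value, which is the "contain different values" case described on the previous slide.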

Junk Dimensions
Also called miscellaneous or mystery dimensions
Miscellaneous attributes that don’t belong to any existing dimension
Typically flags or indicators that describe or categorize the transaction in some way
Contents are often important
Four alternatives for dealing with them
◦ Leave them in the fact table
◦ Create a separate dimension for each attribute
◦ Omit them
◦ Group them into a single junk dimension
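
As an illustration of the fourth alternative, a junk dimension can be built by enumerating every combination of the miscellaneous flags and giving each combination one key. This is only a sketch: the flag names and values below (PaymentType, IsGift, OrderMode) are hypothetical and do not come from the course data.

```python
from itertools import product

# Hypothetical low-cardinality flags that belong to no existing dimension.
flags = {
    "PaymentType": ["Cash", "Credit"],
    "IsGift": ["Y", "N"],
    "OrderMode": ["Web", "Phone", "Store"],
}

# One junk-dimension row per combination, each with its own key.
junk_dimension = [
    {"JunkKey": key, **dict(zip(flags, combo))}
    for key, combo in enumerate(product(*flags.values()), start=1)
]

# The ETL process would use a lookup like this to put a single JunkKey in the fact row.
junk_lookup = {
    tuple(row[name] for name in flags): row["JunkKey"] for row in junk_dimension
}
print(len(junk_dimension), junk_lookup[("Credit", "N", "Store")])
```

The fact table then carries one JunkKey column instead of three separate flag columns.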

Degenerate Dimensions
A degenerate dimension is a dimension key in the fact table that does not have its own dimension table, because all the interesting attributes have been placed in analytic dimensions.
Features
◦ No description of its own
◦ No joining to an actual dimension table
◦ No attributes
Example: transaction ID

Junk Dimension Example

Role-playing dimensions
A table with multiple valid relationships between itself and a fact table is known as a role-playing dimension.
For instance, a “Time” dimension can be used for “Order Day”, as well as “Ship Date” or “Close Day”.
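
In SQL terms, a role-playing dimension is one physical table joined to the fact table more than once under different aliases. The sketch below shows this with Python's sqlite3 module standing in for SQL Server; the table and column names (DimDate, FactOrder, OrderDateKey, ShipDateKey) and the sample order are hypothetical.

```python
import sqlite3

role_conn = sqlite3.connect(":memory:")
role_conn.executescript("""
CREATE TABLE DimDate (DateKey INTEGER PRIMARY KEY, FullDate TEXT);
CREATE TABLE FactOrder (
    OrderNumber TEXT,
    OrderDateKey INTEGER REFERENCES DimDate(DateKey),  -- role: order date
    ShipDateKey  INTEGER REFERENCES DimDate(DateKey)   -- role: ship date
);
-- Hypothetical sample rows
INSERT INTO DimDate VALUES (20160201, '2016-02-01'), (20160205, '2016-02-05');
INSERT INTO FactOrder VALUES ('SO1', 20160201, 20160205);
""")

# The same DimDate table plays two roles through two aliases.
order_and_ship_dates = role_conn.execute("""
    SELECT f.OrderNumber, od.FullDate, sd.FullDate
    FROM FactOrder AS f
    JOIN DimDate AS od ON od.DateKey = f.OrderDateKey
    JOIN DimDate AS sd ON sd.DateKey = f.ShipDateKey
""").fetchall()
print(order_and_ship_dates)
```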

Slowly Changing Dimensions
Dimensions whose attribute values can change are slowly changing dimensions (SCDs).
The attribute values of an SCD may change over time, and those changes are critical to understanding the dynamics of the business.
The ability to track the changes of facts over time is critical to a DW/BI system.
Examples
◦ Employees change departments
◦ Home moves (16.8% of Americans move per year), so zip codes may change

Three Types of SCD
Type 1 SCD: overwrites the existing attribute value with a new value. You don’t care about keeping track of historical values.
Type 2 SCD: change tracking. The ETL process creates a new row in the dimension table to capture the new values of the changed item.
Type 3 SCD: similar to Type 2, but tracks only the current state and the original state, using two additional attributes: SCD Start Date and SCD Initial Value.
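
The difference between Type 1 and Type 2 can be sketched in a few lines of Python. This is an illustration under assumptions, not how SSIS implements it: the dimension is a list of dicts, and the column names (EmployeeKey, EmployeeID, Department, StartDate, EndDate) and the use of an empty EndDate to mark the current row are hypothetical conventions.

```python
from datetime import date

def apply_type1(rows, employee_id, attribute, new_value):
    """Type 1: overwrite in place; no history is kept."""
    for row in rows:
        if row["EmployeeID"] == employee_id:
            row[attribute] = new_value

def apply_type2(rows, employee_id, attribute, new_value, change_date):
    """Type 2: close the current row and append a new row with a new surrogate key."""
    current = next(
        r for r in rows if r["EmployeeID"] == employee_id and r["EndDate"] is None
    )
    current["EndDate"] = change_date
    new_row = dict(current, **{attribute: new_value})
    new_row.update(
        EmployeeKey=max(r["EmployeeKey"] for r in rows) + 1,
        StartDate=change_date,
        EndDate=None,
    )
    rows.append(new_row)

def sample_dimension():
    return [{"EmployeeKey": 1, "EmployeeID": "E7", "Department": "Sales",
             "StartDate": date(2015, 1, 1), "EndDate": None}]

# An employee changes department (the example from the previous slide).
type1_dim = sample_dimension()
apply_type1(type1_dim, "E7", "Department", "Marketing")

type2_dim = sample_dimension()
apply_type2(type2_dim, "E7", "Department", "Marketing", date(2016, 2, 1))
print(len(type1_dim), len(type2_dim))
```

After the change, the Type 1 table still has one row and has lost the old department, while the Type 2 table has two rows, so facts loaded before the change keep pointing at the row that says Sales.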

Aggregate Dimensions
Situation: data at different levels of granularity
Two resolutions
◦ Removing a dimension
◦ Rolling up a dimension’s hierarchy and providing a new, shrunken dimension at the aggregate level
The number of possible aggregates is the number of levels in each hierarchy of each dimension multiplied together.
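
The counting rule is a plain product. As a worked sketch, the hierarchies below are borrowed from the Data Requirements slide of the hands-on case later in this deck (product, machine, and date roll-ups); treating them as the complete set of dimensions, and counting the base level as one of the levels, are assumptions made for illustration.

```python
from math import prod

# Levels in each dimension hierarchy, from finest to coarsest.
hierarchy_levels = {
    "Product": ["Product", "Product Subtype", "Product Type"],
    "Machine": ["Machine", "Machine Type", "Country"],
    "Date": ["Day", "Month", "Quarter", "Year"],
}

# Number of levels per hierarchy, multiplied together across dimensions.
possible_aggregates = prod(len(levels) for levels in hierarchy_levels.values())
print(possible_aggregates)  # 3 * 3 * 4
```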

Many-to-many or Multivalued Dimensions
The usual relationship between a dimension table and a fact table is one-to-many: one row in the dimension table may join to many rows in the fact table.
A many-to-many or multivalued dimension arises when more than one row in a dimension table joins to multiple rows in a fact table.
A bridge table supports many-to-many relationships
◦ fact-dimension
◦ dimension-dimension

Many-to-many or Multivalued Dimensions
A dimensional model for a sales fact that captures multiple sales reasons
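
A sales fact with several sales reasons per order can be modeled with a bridge table between the fact and the reason dimension. The following is a sketch using Python's sqlite3 module; the table names, column names, and rows (FactSales, DimSalesReason, BridgeSalesReason) are hypothetical and are not copied from the slide's diagram.

```python
import sqlite3

bridge_conn = sqlite3.connect(":memory:")
bridge_conn.executescript("""
CREATE TABLE DimSalesReason (SalesReasonKey INTEGER PRIMARY KEY, ReasonName TEXT);
CREATE TABLE FactSales (SalesOrderNumber TEXT PRIMARY KEY, SalesAmount REAL);
-- Bridge table: one row per (order, reason) pair.
CREATE TABLE BridgeSalesReason (
    SalesOrderNumber TEXT REFERENCES FactSales(SalesOrderNumber),
    SalesReasonKey INTEGER REFERENCES DimSalesReason(SalesReasonKey),
    PRIMARY KEY (SalesOrderNumber, SalesReasonKey)
);
-- Hypothetical sample rows: order SO1 has two reasons, SO2 has one.
INSERT INTO DimSalesReason VALUES (1, 'Price'), (2, 'Promotion');
INSERT INTO FactSales VALUES ('SO1', 100.0), ('SO2', 40.0);
INSERT INTO BridgeSalesReason VALUES ('SO1', 1), ('SO1', 2), ('SO2', 1);
""")

sales_by_reason = bridge_conn.execute("""
    SELECT r.ReasonName, SUM(f.SalesAmount)
    FROM FactSales AS f
    JOIN BridgeSalesReason AS b ON b.SalesOrderNumber = f.SalesOrderNumber
    JOIN DimSalesReason AS r ON r.SalesReasonKey = b.SalesReasonKey
    GROUP BY r.ReasonName
    ORDER BY r.ReasonName
""").fetchall()
print(sales_by_reason)
```

Note that the per-reason totals add up to more than the total sales, because an order with two reasons is counted under both. That double counting is inherent to a multivalued dimension and has to be kept in mind when reporting.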

HANDS-ON CASE: MAXIMUM MINIATURE MANUFACTURING

Maximum Miniatures Manufacturing – Designing Data Mart
General business needs
◦ To analyze the statistics available from the manufacturing automation systems. The VP would like an interactive analysis tool, rather than printed reports, for the analysis.
The manufacturing automation system controls all the machines to create figurines
◦ Filling a mold with the raw material
◦ Aiding the hardening of this material
◦ Removal from the mold when hardening is complete
◦ Computerized painting of the figurines
◦ Curing the paint if necessary

Maximum Miniatures Manufacturing – Creating Data Mart
Specific Business Needs
◦ Analyzing the following numbers: dollar value of products sold, number of products sold, sales tax charged on products sold, shipping charged on products sold
◦ These numbers should be viewable by: store, sales promotion, product, day/month/quarter/year, customer, sales person

Data Requirements
Number of accepted products by batch by product by machine by day
Number of rejected products by batch by product by machine by day
Elapsed time for molding and hardening by product by machine by day
Elapsed time for painting and curing by curing type by product by machine by day
Product rolls up into product subtype, which rolls up into product type
Machine rolls up into machine type, which rolls up into country
Day rolls up into month, which rolls up into quarter, which rolls up into year
The information should be able to be filtered by machine manufacturer and purchase date of the machine

Business Need of Sales
The VP of sales for Max Min, Inc. would like to analyze sales information. This information is collected by three OLTP systems: the Order Processing System, the Point of Sale (POS) system, and the MaxMin.com Online system.
To analyze the following numbers
◦ Dollar value of products sold
◦ Number of products sold
◦ Sales tax charged on products sold
◦ Shipping charged on products sold
These numbers should be viewable by: store, sales promotion, product, time, customer, sales person

Snowflake Schema of the Data Mart
[Figure: tables shown are Manufacturingfact, DimProduct, DimProductSubType, DimProductType, DimBatch, DimMachine, DimMachineType, DimMaterial, DimPlant, DimCountry.]
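
In a snowflake schema, rolling a measure up a hierarchy means walking the chain of normalized dimension tables. The sketch below follows the product chain named in the schema (DimProduct, DimProductSubType, DimProductType) using Python's sqlite3 module; the column names, the AcceptedProducts measure name, and all sample rows are assumptions, since the transcript does not show the table definitions.

```python
import sqlite3

snow_conn = sqlite3.connect(":memory:")
snow_conn.executescript("""
CREATE TABLE DimProductType (ProductTypeCode INTEGER PRIMARY KEY, ProductTypeName TEXT);
CREATE TABLE DimProductSubType (
    ProductSubTypeCode INTEGER PRIMARY KEY,
    ProductSubTypeName TEXT,
    ProductTypeCode INTEGER REFERENCES DimProductType(ProductTypeCode)
);
CREATE TABLE DimProduct (
    ProductCode INTEGER PRIMARY KEY,
    ProductName TEXT,
    ProductSubTypeCode INTEGER REFERENCES DimProductSubType(ProductSubTypeCode)
);
CREATE TABLE ManufacturingFact (
    ProductCode INTEGER REFERENCES DimProduct(ProductCode),
    AcceptedProducts INTEGER
);
-- Hypothetical sample rows
INSERT INTO DimProductType VALUES (1, 'Figurines');
INSERT INTO DimProductSubType VALUES (1, 'Animals', 1), (2, 'Knights', 1);
INSERT INTO DimProduct VALUES (1, 'Bear', 1), (2, 'Wolf', 1), (3, 'Archer', 2);
INSERT INTO ManufacturingFact VALUES (1, 10), (2, 20), (3, 5);
""")

# Product rolls up into product subtype: two joins from the fact table.
accepted_by_subtype = snow_conn.execute("""
    SELECT st.ProductSubTypeName, SUM(f.AcceptedProducts)
    FROM ManufacturingFact AS f
    JOIN DimProduct AS p ON p.ProductCode = f.ProductCode
    JOIN DimProductSubType AS st ON st.ProductSubTypeCode = p.ProductSubTypeCode
    GROUP BY st.ProductSubTypeName
    ORDER BY st.ProductSubTypeName
""").fetchall()
print(accepted_by_subtype)
```

Rolling up one more level to product type would add a third join to DimProductType, which is the extra join cost a snowflake schema pays for its normalized dimensions.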

EXERCISE 3

Exercise 3 – Creating a data mart with SSMS
Learning Objectives
◦ How to design a dimensional model
◦ How to create a data mart with SSMS
◦ How to create a cube for a data mart
Tasks
◦ In SSMS, manually create the fact table and the DimProduct table (see the detailed information in the file DW_MMM.PDF in the shared directory under ~\references)
◦ Import the remaining tables from lin.mmm.empty to your own database
◦ Define the primary keys of the tables and the relationships among them
◦ Create an ER diagram with SSMS
The primary key of the fact table is composed of three foreign keys plus one time dimension key: ProductCode, BatchNumber, MachineNumber, and DateOfManufacture.
Deliverable
◦ The printout of the screenshot of the ER diagram
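
The composite primary key described above can be sketched in SQL DDL. The example runs against SQLite through Python's sqlite3 module rather than SSMS, so the syntax is only an approximation of what the exercise asks for. The four key columns come from the slide; the measure columns (AcceptedProducts, RejectedProducts), the dimension table layouts, and the sample row are assumptions, since DW_MMM.PDF is not part of this transcript.

```python
import sqlite3

mart_conn = sqlite3.connect(":memory:")
mart_conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
mart_conn.executescript("""
CREATE TABLE DimProduct (ProductCode INTEGER PRIMARY KEY, ProductName TEXT);
CREATE TABLE DimBatch (BatchNumber INTEGER PRIMARY KEY);
CREATE TABLE DimMachine (MachineNumber INTEGER PRIMARY KEY);
CREATE TABLE ManufacturingFact (
    ProductCode INTEGER NOT NULL REFERENCES DimProduct(ProductCode),
    BatchNumber INTEGER NOT NULL REFERENCES DimBatch(BatchNumber),
    MachineNumber INTEGER NOT NULL REFERENCES DimMachine(MachineNumber),
    DateOfManufacture TEXT NOT NULL,
    AcceptedProducts INTEGER,  -- hypothetical measure column
    RejectedProducts INTEGER,  -- hypothetical measure column
    PRIMARY KEY (ProductCode, BatchNumber, MachineNumber, DateOfManufacture)
);
-- Hypothetical dimension rows
INSERT INTO DimProduct VALUES (1, 'Figurine');
INSERT INTO DimBatch VALUES (10);
INSERT INTO DimMachine VALUES (100);
""")

fact_row = (1, 10, 100, "2016-02-15", 95, 5)
mart_conn.execute("INSERT INTO ManufacturingFact VALUES (?, ?, ?, ?, ?, ?)", fact_row)

# A second row with the same four key values violates the composite primary key.
try:
    mart_conn.execute("INSERT INTO ManufacturingFact VALUES (?, ?, ?, ?, ?, ?)", fact_row)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```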

Hints for Deploying the OLAP Cube
Due to the security restrictions, you need to:
◦ Double-click the entry in Data Source
◦ Type in your eRaider login information in the Impersonation Information panel
◦ Change the server to OREDB

The screenshot of impersonation information

The properties of the project
After this step you can proceed to deploy the cube

Surrogate Key
A natural key is a value that has meaning to the user and ought to be unique for every row. A good example of a natural key would be a license plate number for a car.
A surrogate key is an artificial value that has no meaning to the user, but is guaranteed to be unique by the database itself.
Surrogate keys are created when doing data warehousing. They are new keys, distinct from the keys in the original database.
They are also called meaningless keys, substitute keys, non-natural keys, or artificial keys.
Specifically, surrogate keys are used in slowly changing dimension (SCD) management.
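
A surrogate key generator can be sketched as a counter plus a lookup from natural key to assigned key. In practice the database usually does this itself (for example with an identity column); the class below is a hypothetical illustration in plain Python, and the source system names and customer ID are made up.

```python
from itertools import count

class SurrogateKeyGenerator:
    """Hands out a meaningless integer key per (source system, natural key) pair."""

    def __init__(self):
        self._next_key = count(1)
        self._by_natural_key = {}

    def key_for(self, source_system, natural_key):
        lookup = (source_system, natural_key)
        if lookup not in self._by_natural_key:
            self._by_natural_key[lookup] = next(self._next_key)
        return self._by_natural_key[lookup]

customer_keys = SurrogateKeyGenerator()
pos_key = customer_keys.key_for("POS", "C-17")
online_key = customer_keys.key_for("Online", "C-17")  # same natural key, different system
pos_key_again = customer_keys.key_for("POS", "C-17")
print(pos_key, online_key, pos_key_again)
```

The same natural key arriving from two source systems gets two different surrogate keys, while a repeated lookup returns the key already assigned. This is what lets the warehouse integrate several source systems without key collisions.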

Benefits of surrogate keys
Protect the DW/BI system from changes in the source system
Allow the DW/BI system to integrate data from multiple source systems
Enable developers to add rows to dimensions that do not exist in the source system
Provide the means for tracking changes in dimensions
Are efficient in the relational database and Analysis Services

Heterogeneous Products
Several products with differentiated attributes
Problem: share one dimension or use different dimensions?
Resolutions
◦ One family-oriented dimension with core fact and product tables, plus specific information for each line of product