Presentation is loading. Please wait.

Presentation is loading. Please wait.

ISQS 3358, Business Intelligence Dimensional Modeling Zhangxi Lin Texas Tech University 1 1.

Similar presentations


Presentation on theme: "ISQS 3358, Business Intelligence Dimensional Modeling Zhangxi Lin Texas Tech University 1 1."— Presentation transcript:

1 ISQS 3358, Business Intelligence Dimensional Modeling Zhangxi Lin Texas Tech University 1 1

2 Outline Data Warehousing Approaches Dimensional Modeling Data Warehousing with Microsoft SQL Server 2005 Case: Adventure Works Cycles (AWC) : Data Warehouse Design Phases 2

3 Data Warehousing Approaches 3

4 4 Data Warehouse Development Approaches Data warehouse development approaches ◦ Inmon Model: EDW approach ◦ Kimball Model: Data mart approach Which model is better? ◦ There is no one-size-fits-all strategy to data warehousing ◦ One alternative is the hosted warehouse

5 General Data Warehouse Development Approaches “Big bang” approach Incremental approach: ◦ Top-down incremental approach ◦ Bottom-up incremental approach ISQS 6339, Data Mgmt & BI, Zhangxi Lin 5

6 “Big Bang” Approach ISQS 6339, Data Mgmt & BI, Zhangxi Lin 6 Analyze enterprise requirements Build enterprise data warehouse Report in subsets or store in data marts

7 Incremental Approach to Warehouse Development Multiple iterations Shorter implementations Validation of each phase ISQS 6339, Data Mgmt & BI, Zhangxi Lin 7 Strategy Definition Analysis Design Build Production Increment 1 Iterative

8 Top-Down Approach ISQS 6339, Data Mgmt & BI, Zhangxi Lin 8 Analyze requirements at the enterprise level Develop conceptual information model Identify and prioritize subject areas Complete a model of selected subject area Map to available data Perform a source system analysis Implement base technical architecture Establish metadata, extraction, and load processes for the initial subject area Create and populate the initial subject area data mart within the overall warehouse framework

9 Bottom-Up Approach ISQS 6339, Data Mgmt & BI, Zhangxi Lin 9 Define the scope and coverage of the data warehouse and analyze the source systems within this scope Define the initial increment based on the political pressure, assumed business benefit and data volume Implement base technical architecture and establish metadata, extraction, and load processes as required by increment Create and populate the initial subject areas within the overall warehouse framework

10 Dimensional Modeling 10

11 11 Dimensional Model Also called star schema ◦ Fact table is in the middle and dimensions serving as the points on the star. ◦ A normalized fact table plus denormalized dimension tables Facts ◦ Measurements associated with a specific business process. ◦ Most facts are additive (calculative); others are semi-additive, non-additive, or descriptive (e.g. factless fact table). ◦ Many facts can be derived from other facts. So, non-additive facts can be avoided by calculating it from additive facts. Grain ◦ The level of detail contained in the fact table ◦ The lowest level of detail is called atomic fact table 11

12 12 Dimensions The foundation of the dimensional model to describe the objects of the business The nouns of the DW/BI system ◦ Business processes (facts) are the verbs of the business Dimension tables link to all the business processes. A dimension shared across all processes is called conformed dimension The analysis involving data from more than one business process is called drill-across. 12

13 13 Data Cube Data cubes are multidimensional extensions of 2-D tables, just as in geometry a cube is a three-dimensional extension of a square. The word cube brings to mind a 3-D object, and we can think of a 3-D data cube as being a set of similarly structured 2-D tables stacked on top of one another. Data cubes aren't restricted to just three dimensions. Most OLAP systems can build data cubes with many more dimensions allows up to 64 dimensions. In practice, we often construct data cubes with many dimensions, but we tend to look at just three at a time. What makes data cubes so valuable is that we can index the cube on one or more of its dimensions. 13

14 14 Determining Granularity YEAR? QUARTER? MONTH? WEEK? DAY?

15 15 Star Schema Model 15 Product Table Product_id Product_disc,... Time Table Day_id Month_id Year_id,... Sales Fact Table Product_id Store_id Item_id Day_id Sales_amount Sales_units,... Item Table Item_id Item_desc,... Store Table Store_id District_id,... Central fact table Denormalized dimensions

16 16 Snowflake Schema Model 16 Time Table Week_id Period_id Year_id Dept Table Dept_id Dept_desc Mgr_id Mgr Table Dept_id Mgr_id Mgr_name Product Table Product_id Product_desc Item Table Item_id Item_desc Dept_id Sales Fact Table Item_id Store_id Product_id Week_id Sales_amount Sales_units Store Table Store_id Store_desc District_id District Table District_id District_desc

17 17 Snowflake Schema Model ◦ Direct use by some tools ◦ More flexible to change ◦ Provides for speedier data loading ◦ Can become large and unmanageable ◦ Degrades query performance ◦ More complex metadata 17 CountryStateCountyCity

18 Dimensional Modeling Process High level dimensional model design ◦ Choosing business model ◦ Declaring the grain ◦ Choosing dimensions ◦ Identifying the facts Detailed dimensional model development Dimensional model review and validation ◦ IS ◦ Core users ◦ Business community Final design iteration ISQS 6339, Data Mgmt & BI, Zhangxi Lin 18

19 19 Example: Commrex Real Estate Data Warehousing Analytic themes ◦ How to encourage realtors to use the online ASP services Value Chain ◦ Listors create their account ◦ Listors post their real estate properties to the web-based database services and pay listing fees ◦ Property buyers search the website-based database and buy properties from listors. This is the incentive for listors to use the ASP services Business Processes ◦ Listor sign up ◦ Listor account management ◦ Property data posting ◦ Property search ◦ Property database maintenance 19

20 ISQS 6339, Data Mgmt & BI, Zhangxi Lin 20 Property ID Listor ID Address Property Type City                     Company ID Chapter Functions Specializations                    Comp Name Address      Telephone # Listor Name Chapter Feature Property Type Subtype 1      Type Name      Subtype 2 Subtype n M:1 M:M Primary Key Secondary Key Link to a table Legends Property Listing Database Membership Database IMW’s Database ERD Model Company ID

21 ISQS 6339, Data Mgmt & BI, Zhangxi Lin 21 Property ID Listor ID Address Prop SubType City                     Company ID Chapter Functions Specializations                    Company ID Address      Telephone # Listor Name Chapter Feature Prop SubType Property Type SubType Name Primary Key Secondary Key Link to a table Legends Property Listing Fact Membership Dimension IMW’s Data Warehouse Dimensional Model Company Dimension Property SubType Dimension Comp Name Property Type Type Name Property Type Dimension

22 Data Warehousing with Microsoft SQL Server 2005 ISQS 6339, Data Mgmt & BI, Zhangxi Lin 22

23 Unified Dimensional Model (UDM) A SQL Server 2005 technology A UDM is a structure that sits over the top of a data mart and looks exactly like an OLAP system to the end user. Advantages ◦ No need for a data mart. ◦ Can be built over one or more OLTP systems. ◦ Mixed data mart and OLTP system data ◦ Can include data from database from other vendors and XML- formatted data ◦ Allows OLAP cubes to be built directly on top of transactional data ◦ Low latency ◦ Ease of creation and maintenance ISQS 6339, Data Mgmt & BI, Zhangxi Lin 23

24 Microsoft BI Toolset Relational engine (RDBMS) ◦ T-SQL ◦.NET Framework Command Language Runtime (CLR) SQL Server Integration Services (SSIS) – ETL ◦ Data Transformation Pipeline (DTP) ◦ Data Transformation Runtime (DTR) SQL Server Analysis Service (SSAS) – queries, ad hoc use, OLAP, data mining ◦ Multi-Dimensional eXpressions (MDX) – a scripting language for data retrieval from dimensional database ◦ Dimension design ◦ Cube design ◦ Data mining SQL Server Reporting Services (SSRS) – ad hoc query, report building Microsoft Visual Studio.NET is the fundamental tool for application development 24

25 Structure and Components of Business Intelligence 25 SSMS SSIS SSAS SSRS SAS EM SAS EM SAS EG SAS EG MS SQL Server 2005 BIDS

26 Understanding the Cube Designer Tabs Cube Structure: Use this tab to modify the architecture of a cube. Dimension Usage: Use this tab to define the relationships between dimensions and measure groups, and the granularity of each dimension within each measure group. Calculations: Use this tab to examine calculations that are defined for the cube, to define new calculations for the whole cube or for a subcube, to reorder existing calculations, and to debug calculations step by step by using breakpoints. KPIs: Use this tab to create, edit, and modify the Key Performance Indicators (KPIs) in a cube. Actions: Use this tab to create or modify drillthrough, reporting, and other actions for the selected cube.. Partitions: Use this tab to create and manage the partitions for a cube. Partitions let you store sections of a cube in different locations with different properties, such as aggregation definitions. Perspectives: Use this tab to create and manage the perspectives in a cube. A perspective is a defined subset of a cube, and is used to reduce the perceived complexity of a cube to the business user. Translations: Use this tab to create and manage translated names for cube objects, such as month or product names. Browser: Use this tab to view data in the cube. ISQS 6339, Data Mgmt & BI, Zhangxi Lin 26

27 Case: Adventure Works Cycles (AWC) 27

28 Case: Adventure Works Cycles (AWC) A fictitious multinational manufacturer and seller of bicycles and accessories Based on Bothell, Washington, USA and has regional sales offices in several countries http://www.msftdwtoolkit.com/ ISQS 6339, Data Mgmt & BI, Zhangxi Lin 28

29 Basic Business Information Product orders by category Product Orders by Country/Region Product Orders by Sales Channel Customers by Sales Channel Snapshot ISQS 6339, Data Mgmt & BI, Zhangxi Lin 29

30 AWC Business Requirements - Interview summary Interviewee: Brian Welker, VP of Sales Sales to resellers: $37 million last year 17 people report to him including 3 regional sales managers Previous problem: Hard to get information out of the company’s system Major analytic areas: Sales planning Growth analysis Customer analysis Territory analysis Sales performance Basic sales reporting Price lists Special offers Customer satisfaction International support Success criteria Easy data access, Flexible reporting and analyzing, All data in one place What’s missing? – A lot – No indication of business value ISQS 6339, Data Mgmt & BI, Zhangxi Lin 30

31 Business Processes Purchase Orders Distribution Center Deliveries Distribution Center Inventory Store Deliveries Store Inventory Store Sales ISQS 6339, Data Mgmt & BI, Zhangxi Lin 31

32 Analytic Themes See the Excel file AW_Analytic_Themes_List.xls AW_Analytic_Themes_List.xls ISQS 6339, Data Mgmt & BI, Zhangxi Lin 32

33 AWC’s Bus Matrix Dimensions Business Process Date Product Employee Customer (Reseller) Customer (Internet) Sales Territory Currency Channel Promotion Call Reason Facility Sales ForecastingXXXXXXX OrdersXXXXXXXXX Call TrackingXXXXXX X ReturnsXX XXXXX X ISQS 6339, Data Mgmt & BI, Zhangxi Lin 33

34 Prioritization Grid ISQS 6339, Data Mgmt & BI, Zhangxi Lin 34 Orders Forecast Call Tracking Exchange Rates Returns Manufacturing Costs Customer Profitability Product Profitability Feasibility HighLow High Low Business Value / Impact

35 Exercise 2 – A quick walk through an SSAS application Learning Objectives ◦ How to design a data source view with SSAS based on an existing data warehouse ◦ How to design and deploy a cube. Tasks ◦ Analysis Service Tutorial Lesson 1: Defining a Data Source View within an Analysis Services Project ◦ Analysis Service Tutorial Lesson 2: Defining and Deploying a Cube Deliverable: ◦ A Word file with the screenshot of the star schema emailed to zhangxi.lin@hotmail.com zhangxi.lin@hotmail.com ◦ The subject of the email is: “ISQS 3358 Exercise 2” ISQS 6339, Data Mgmt & BI, Zhangxi Lin 35

36 Supplemental Slides : Data Warehouse Design Phases 36

37 37 Data Warehouse Database Design Phases Phase 1: Defining the business model Phase 2: Defining the dimensional model Phase 3: Defining the physical model 37

38 38 Phase 1: Defining the Business Model ◦ Performing strategic analysis ◦ Creating the business model ◦ Documenting metadata 38

39 39 Performing Strategic Analysis Identify crucial business processes Understand business processes Prioritize and select the business processes to implement 39 Business Benefit LowHigh Low High Feasibility

40 40 Creating the Business Model  Defining business requirements: ◦ Identifying the business measures ◦ Identifying the dimensions ◦ Identifying the grain ◦ Identifying the business definitions and rules  Verifying data sources 40

41 41 Business Requirements Drive the Design Process ◦ Primary input ◦ Secondary input Existing MetadataProduction ERD Model Business Requirements Research 41

42 42 Identifying Measures and Dimensions The attribute varies continuously: ◦ Balance ◦ Units Sold ◦ Cost ◦ Sales The attribute is perceived as constant or discrete: ◦ Product ◦ Location ◦ Time ◦ Size 42 Measures Dimensions

43 43 Using a Business Process Matrix 43 Sample of business process matrix Business Dimension s Business Processes SalesReturns Inventor y Customer Date Product Channel Promotion

44 44 Determining Granularity 44 YEAR? QUARTER? MONTH? WEEK? DAY?

45 45 Identifying Business Rules 45 Store Store > District > Region Location Geographic proximity 0 - 1 miles 1 - 5 miles > 5 miles Product Type Monitor Status PC15 inchNew Server17 inchRebuilt 19 inchCustom None Time Month > Quarter > Year

46 46 Documenting Metadata Documenting metadata should include: ◦ Documenting the design process ◦ Documenting the development process ◦ Providing a record of changes ◦ Recording enhancements over time 46

47 47 Metadata Documentation Approaches ◦ Automated  Data modeling tools  ETL tools  End-user tools ◦ Manual 47

48 48 Phase 2: Defining the Dimensional Model ◦ Identify fact tables:  Translate business measures into fact tables  Analyze source system information for additional measures ◦ Identify dimension tables ◦ Link fact tables to the dimension tables ◦ Model the time dimension 48

49 49 Star Dimensional Modeling 49 Store Table Store_id District_id... Item Table Item_id Item_desc... Sales Fact Table Product_id Store_id Item_id Day_id Sales_amount Sales_units... Product Table Product_id Product_desc... Time Table Day_id Month_id Period_id Year_id

50 50 Fact Table Characteristics ◦ Contain numerical metrics of the business ◦ Can hold large volumes of data ◦ Can grow quickly ◦ Can contain base, derived, and summarized data ◦ Are typically additive ◦ Are joined to dimension tables through foreign keys that reference primary keys in the dimension tables 50 Sales Fact Table Product_id Store_id Item_id Day_id Sales_amount Sales_units...

51 51 Dimension Table Characteristics Dimension tables have the following characteristics: ◦ Contain textual information that represents the attributes of the business ◦ Contain relatively static data ◦ Are joined to a fact table through a foreign key reference 51

52 52 Star Dimensional Model Characteristics ◦ The model is easy for users to understand. ◦ Primary keys represent a dimension. ◦ Nonforeign key columns are values. ◦ Facts are usually highly normalized. ◦ Dimensions are completely denormalized. ◦ Fast response to queries is provided. ◦ Performance is improved by reducing table joins. ◦ End users can express complex queries. ◦ Support is provided by many front-end tools. 52

53 53 Using Time in the Data Warehouse ◦ Defining standards for time is critical. ◦ Aggregation based on time is complex. 53

54 54 The Time Dimension Time is critical to the data warehouse. A consistent representation of time is required for extensibility. 54 Where should the element of time be stored? Time dimension Sales fact

55 55 Using Data Modeling Tools ◦ Tools with a GUI enable definition, modeling, and reporting. ◦ Avoid a mix of modeling techniques caused by:  Development pressures  Developers with lack of knowledge  No strategy ◦ Determine a strategy. ◦ Write and publish formally. ◦ Make available electronically. 55

56 56 Phase 3: Defining the Physical Model Why ◦ Huge amount of data must be effectively processed and retrieved in realtime. How ◦ Translate the dimensional design to a physical model for implementation. ◦ Define storage strategy for tables and indexes. ◦ Perform database sizing. ◦ Define initial indexing strategy. ◦ Define partitioning strategy. ◦ Update metadata document with physical information. 56

57 57 Storage and Performance Considerations Database sizing Data partitioning Indexing Star query optimization 57

58 58 Database Sizing - Test Load Sampling Analyze a representative sample of the data chosen using proven statistical methods. Ensure that the sample reflects: ◦ Test loads for different periods ◦ Day-to-day operations ◦ Seasonal data and worst-case scenarios ◦ Indexes and summaries 58

59 59 Data Partitioning Breaking up of data into separate physical units that can be handled independently Types of data partitioning ◦ Horizontal partitioning. ◦ Vertical partitioning 59

60 60 Indexing Indexing is used for the following reasons: ◦ It is a huge cost saving, greatly improving performance and scalability. ◦ It can replace a full table scan by a quick read of the index followed by a read of only those disk blocks that contain the rows needed. 60

61 61 Parallelism Parallel Execution Servers Sales table Customers table P3P3 P3P3 P1P1 P1P1 P2P2 P2P2

62 62 Using Summary Data Designing summary tables offers the following benefits: ◦ Provides fast access to precomputed data ◦ Reduces use of I/O, CPU, and memory 62


Download ppt "ISQS 3358, Business Intelligence Dimensional Modeling Zhangxi Lin Texas Tech University 1 1."

Similar presentations


Ads by Google