ISQS 3358, Business Intelligence Cubism – Measures and Dimensions Zhangxi Lin Texas Tech University 1.

1 ISQS 3358, Business Intelligence Cubism – Measures and Dimensions, Zhangxi Lin, Texas Tech University

2 Outline
Where we’ve been
Populating the fact table
Creating a cube with SSAS
Measures
Types of dimensions
Cube design tabs

3 Structure and Components of Business Intelligence (figure; component labels: SSMS, SSIS, SSAS, SSRS, SAS EM, SAS EG)

4 Snowflake Schema of the Data Mart (figure; tables: ManufacturingFact, DimProduct, DimProductSubType, DimProductType, DimBatch, DimMachine, DimMachineType, DimMaterial, DimPlant, DimCountry)

5 Where we’ve been and where we are now
Exercise 1: Getting started
Exercise 2: Creating data marts
Exercise 3: Creating a cube from a data mart
Exercise 4: Populating dimensions of a data mart
Exercise 5: Exploring features of ETL data conversion tasks
Exercise 6: Loading fact tables

6 What do we need to do with the half-done data mart?
Populate the DimBatch dimension table
Populate the ManufacturingFact table
Build an OLAP cube (we already did this before)
Check measures
Check dimensions

7 LOADING FACT TABLES

8 Exercise 6: Loading Fact Tables
Project name: MMMFactLoad-lastname
Package name: FactLoad.dtsx
Tasks
◦ Create Inventory Fact table
◦ Load Dim Batch
◦ Load Manufacturing Fact
◦ Load Inventory Fact
Deliverable: email a screenshot of the “green” outcome of the ETL project to zhangxi.lin@hotmail.com

9 Inventory Fact Table
Create a table InventoryFact in database MaxMinManufacturingDM-lastname.
◦ Compound primary key: DateOfInventory, ProductCode, and Material
◦ Define two foreign keys
Column Name          Data Type      Allow Nulls
InventoryLevel       Int            No
NumberOnBackorder    Int            No
DateOfInventory      Datetime       No
ProductCode          Int            No
Material             Varchar(30)    No
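To make the compound primary key concrete, here is a minimal sketch that builds a comparable table in an in-memory SQLite database from Python. SQLite only stands in for the SQL Server data mart, the column types are approximations, the sample values are made up, and the two foreign keys are left out because the slide does not say which tables they reference.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for the SQL Server data mart.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE InventoryFact (
        InventoryLevel     INTEGER     NOT NULL,
        NumberOnBackorder  INTEGER     NOT NULL,
        DateOfInventory    TEXT        NOT NULL,  -- Datetime in SQL Server
        ProductCode        INTEGER     NOT NULL,
        Material           VARCHAR(30) NOT NULL,
        PRIMARY KEY (DateOfInventory, ProductCode, Material)
    )
    """
)

# Made-up sample row: level, backorder, date, product code, material.
row = (120, 5, "2005-01-31 00:00:00", 17, "Clay")
conn.execute("INSERT INTO InventoryFact VALUES (?, ?, ?, ?, ?)", row)

# A second row with the same date, product, and material violates the compound key.
try:
    conn.execute("INSERT INTO InventoryFact VALUES (?, ?, ?, ?, ?)", row)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM InventoryFact").fetchone()[0]
```

The key allows many rows per date, per product, or per material, but only one row for each combination of the three.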

10 Data Sources for Loading Fact
For loading the DimBatch table and the ManufacturingFact table
◦ BatchInfo.CSV
For loading the InventoryFact table
◦ OREDB.OrderProcessingSystem.Inventory

11 Control Flow for Loading Facts and the Remaining Dimension
Note: to ease debugging, you may use three packages and test them one by one instead of doing everything in one package.

12 Flat File Connection
Data types
◦ BatchNumber, MachineNumber: four-byte signed integer [DT_I4]
◦ ProductCode, NumberProduced, NumberRejected: four-byte signed integer [DT_I4]
◦ TimeStarted, TimeStopped: database timestamp [DT_DBTIMESTAMP]
Only BatchNumber needs to be checked as the input for Dim Batch
All columns are needed for the fact tables

13 Load DimBatch Data Flow

14 Load DimBatch Data Flow
Note: because of duplication in the source file, we may insert an Aggregate item after the Flat File Source item.

15 The Flat File Source

16 Sort Transformation
In the Aggregate item, define “Group by” on BatchNumber.
In the Derived Column item, define BatchName from BatchNumber.
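The effect of the Aggregate and Derived Column items on the batch rows can be sketched in plain Python. This is only an illustration of the logic, not SSIS code. The function name `load_dim_batch` is hypothetical, and it assumes that BatchName is simply the batch number rendered as text, which the slide does not spell out.

```python
def load_dim_batch(source_rows):
    """Return one DimBatch row per distinct BatchNumber, in first-seen order."""
    seen = set()
    dim_rows = []
    for row in source_rows:
        batch_number = row["BatchNumber"]
        if batch_number in seen:
            continue  # grouping by BatchNumber collapses repeated batch numbers
        seen.add(batch_number)
        # Assumption: BatchName is the batch number converted to a string.
        dim_rows.append({"BatchNumber": batch_number, "BatchName": str(batch_number)})
    return dim_rows


# Made-up sample input with one duplicated batch number.
sample = [{"BatchNumber": 1001}, {"BatchNumber": 1001}, {"BatchNumber": 1002}]
dim_batch = load_dim_batch(sample)
```

Without the grouping step, the duplicated batch number would be inserted twice and would violate the dimension's primary key.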

17 Load Fact Data Flow

18 Derived Columns for the Fact table

19 Expressions for the Derived Columns
AcceptedProducts
◦ [NumberProduced] - [NumberRejected]
ElapsedTimeForManufacture
◦ DATEDIFF("mi", [TimeStarted], [TimeStopped])
DateOfManufacture
◦ (DT_DBTIMESTAMP)SUBSTRING((DT_WSTR,25)[TimeStarted],1,10)
 This expression converts TimeStarted into a string and selects the first ten characters of that string. That string is then converted back into a date-time value with the time portion dropped (set to midnight).
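For readers who want to check these three expressions by hand, the following Python sketch mirrors them on a single row. It is an approximation rather than SSIS code: the function name `derive_fact_columns` and the sample values are invented, and the minute calculation truncates seconds first because DATEDIFF counts the minute boundaries crossed.

```python
from datetime import datetime


def derive_fact_columns(number_produced, number_rejected, time_started, time_stopped):
    """Mirror the three derived columns for one source row."""
    accepted_products = number_produced - number_rejected

    # DATEDIFF("mi", ...) counts minute boundaries crossed, so drop seconds first.
    start_minute = time_started.replace(second=0, microsecond=0)
    stop_minute = time_stopped.replace(second=0, microsecond=0)
    elapsed_minutes = int((stop_minute - start_minute).total_seconds() // 60)

    # Keep only the date part; the time portion becomes midnight.
    date_of_manufacture = datetime.combine(time_started.date(), datetime.min.time())

    return {
        "AcceptedProducts": accepted_products,
        "ElapsedTimeForManufacture": elapsed_minutes,
        "DateOfManufacture": date_of_manufacture,
    }


# Made-up sample batch: 500 produced, 12 rejected, run from 08:15:30 to 09:45:10.
derived = derive_fact_columns(
    500, 12, datetime(2005, 1, 3, 8, 15, 30), datetime(2005, 1, 3, 9, 45, 10)
)
```

Truncating DateOfManufacture to the day is what lets many batches share one date value when the time dimension is later built from the fact table.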

20 OLE DB Destination
For loading the fact table

21 Load Inventory Fact
OLE DB Source
◦ OrderProcessingSystem.InventoryFact
OLE DB Destination
◦ MaxMinManufacturingDM-lastname.InventoryFact
No transformation
There are two ways to load the table
◦ Create the table and use ETL to load it
◦ Import directly from the source into the database MaxMinManufacturingDM-lastname

22 Debugging Results (figures: Loading DimBatch, Loading ManufacturingFact)

23 BUILDING AN OLAP CUBE

24 Three Steps to Create a Cube from Data Sources
Defining the data source
Defining the data source view
◦ Add three new columns for year, quarter, and month to each of the two fact tables
Building a cube
◦ Define a new dimension Dim Time from the Manufacturing Fact table
Customize the cube:
◦ Link two fact tables in a cube
◦ Define a new primary key for Dim Time
◦ Define calculated measures
◦ Relate dimensions to measures

25 T-SQL Expressions for DS View Definition - Manufacture
YearOfManufacture
  CONVERT(char(4), YEAR(DateOfManufacture))
QuarterOfManufacture
  CONVERT(char(4), YEAR(DateOfManufacture)) +
  CASE
    WHEN MONTH(DateOfManufacture) BETWEEN 1 AND 3 THEN 'Q1'
    WHEN MONTH(DateOfManufacture) BETWEEN 4 AND 6 THEN 'Q2'
    WHEN MONTH(DateOfManufacture) BETWEEN 7 AND 9 THEN 'Q3'
    ELSE 'Q4'
  END
MonthOfManufacture
  CONVERT(char(4), YEAR(DateOfManufacture)) + RIGHT('0' + CONVERT(varchar(2), MONTH(DateOfManufacture)), 2)

26 T-SQL Expressions for DS View Definition - Inventory
YearOfInventory
  CONVERT(char(4), YEAR(DateOfInventory))
QuarterOfInventory
  CONVERT(char(4), YEAR(DateOfInventory)) +
  CASE
    WHEN MONTH(DateOfInventory) BETWEEN 1 AND 3 THEN 'Q1'
    WHEN MONTH(DateOfInventory) BETWEEN 4 AND 6 THEN 'Q2'
    WHEN MONTH(DateOfInventory) BETWEEN 7 AND 9 THEN 'Q3'
    ELSE 'Q4'
  END
MonthOfInventory
  CONVERT(char(4), YEAR(DateOfInventory)) + RIGHT('0' + CONVERT(varchar(2), MONTH(DateOfInventory)), 2)
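The year, quarter, and month expressions build string keys of the form 2005, 2005Q3, and 200508. A small Python sketch of the same string-building logic makes the expected values easy to check. The helper name `period_keys` is hypothetical and the dates are arbitrary examples.

```python
from datetime import date


def period_keys(d):
    """Return (year, quarter, month) keys shaped like the T-SQL named calculations."""
    year = f"{d.year:04d}"                        # CONVERT(char(4), YEAR(...))
    quarter = f"{year}Q{(d.month - 1) // 3 + 1}"  # year + 'Q1' .. 'Q4'
    month = f"{year}{d.month:02d}"                # year + zero-padded month
    return year, quarter, month


august_keys = period_keys(date(2005, 8, 14))
december_keys = period_keys(date(2005, 12, 1))
```

Prefixing the quarter and month with the year keeps every key unique across years, so "Q1" of one year is never merged with "Q1" of another when the columns become attributes of the time dimension.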

27 Data Source View (figure: the new columns)

28 Select Measures Page (figure): uncheck Manufacture Fact Count

29 Review New Dimensions Page (figure): rename Manufacturing Fact to Dim Time

30 The finished cube (figure): new dimension created from the fact table

31 Cube Structure

32 MEASURES

33 Facts
Measurements associated with a specific business process.
Types of measures
◦ Most facts are additive (they can be summed along every dimension); others are semi-additive (they can be summed along some dimensions but not along others), non-additive (such as max or average), or descriptive (e.g. a factless fact table).
Many facts can be derived from other facts, so a non-additive fact can often be avoided by calculating it from additive facts.
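A small numeric example shows why a ratio is better derived from additive facts than stored and aggregated directly. The batch figures are invented, and the rejection rate is assumed here to be rejected divided by produced, since the deck does not give the formula.

```python
# Two made-up batches of very different sizes.
batches = [
    {"produced": 1000, "rejected": 10},  # 1% rejected
    {"produced": 100, "rejected": 5},    # 5% rejected
]

# Additive facts can be summed safely across batches.
total_produced = sum(b["produced"] for b in batches)
total_rejected = sum(b["rejected"] for b in batches)

# Deriving the ratio from the summed additive facts gives the true overall rate.
reject_rate = total_rejected / total_produced

# Averaging the per-batch ratios ignores batch size and gives a different number.
naive_average = sum(b["rejected"] / b["produced"] for b in batches) / len(batches)
```

Here the derived rate is 15 / 1100, about 1.4%, while the average of the per-batch rates is 3%. Storing only the additive counts and computing the ratio at query time avoids this distortion at every level of aggregation.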

34 Calculated measures
The definition of a calculated measure is stored in the OLAP cube itself.
The actual values of a calculated measure are not computed, however, until a query containing that calculated measure is executed.
The results of that calculation are then cached in the cube, and the cached value is delivered to any subsequent users requesting the same calculation.
The calculation expressions are written in an MDX (Multidimensional Expressions) script.
MDX is different from T-SQL: it is a special-purpose language with features designed for the multidimensional calculations and formulas that OLAP analysis requires, which T-SQL does not provide.

35 Define the format string “#,#” for the measures AcceptedProducts and RejectedProducts

36 Defining a format string

39 Define Calculated Measures

40 DIMENSIONS

41 Managing Dimensions

42 Managing Dimensions

43 Relating Dimensions to Measure Groups

44 Completed Dimension Definitions

45 Types of Dimensions
Fact dimensions: dimensions created from attributes in a fact table
Parent-child dimensions: built on a table containing a self-referential relationship, such as a parent attribute
Role-playing dimensions: related to the same measure group multiple times, where each relationship represents a different role the dimension plays; for example, a time dimension plays three different roles: date of sale, date of shipment, and date of payment
Reference dimensions: not related directly to the measure group but to another regular dimension, which is in turn related to the measure group
Data mining dimensions: the information discovered by data mining
Many-to-many dimensions: e.g. multiple ship-to addresses
Slowly changing dimensions
◦ Type 1 SCD: no tracking of history
◦ Type 2 SCD: tracks the entire history by adding four attributes: SCD Original ID, SCD Start Date, SCD End Date, SCD Status
◦ Type 3 SCD: similar to Type 2 SCD but tracks only the current state and the original state, with two additional attributes: SCD Start Date, SCD Initial Value
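The Type 2 bookkeeping can be sketched in a few lines of Python. This is an illustration only: the function name `apply_type2_change`, the status values "Current" and "Expired", the `plant_name` attribute, and the sample dates are all assumptions. Only the four SCD attributes come from the slide.

```python
from datetime import date


def apply_type2_change(rows, original_id, new_attrs, change_date):
    """Expire the current row for original_id and append a new current row."""
    updated = []
    for row in rows:
        if row["scd_original_id"] == original_id and row["scd_status"] == "Current":
            # Close the old version instead of overwriting it (Type 1 would overwrite).
            row = {**row, "scd_end_date": change_date, "scd_status": "Expired"}
        updated.append(row)
    updated.append(
        {
            "scd_original_id": original_id,
            **new_attrs,
            "scd_start_date": change_date,
            "scd_end_date": None,
            "scd_status": "Current",
        }
    )
    return updated


# Made-up history with a single current row for business key 7.
history = [
    {
        "scd_original_id": 7,
        "plant_name": "Plant A",
        "scd_start_date": date(2004, 1, 1),
        "scd_end_date": None,
        "scd_status": "Current",
    }
]
new_history = apply_type2_change(history, 7, {"plant_name": "Plant B"}, date(2005, 6, 1))
```

Each change adds a row rather than replacing one, so facts loaded before the change date can still join to the version of the dimension member that was valid at that time.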

46 CUBE DESIGN TABS

47 Understanding the Cube Designer Tabs
Cube Structure: Use this tab to modify the architecture of a cube.
Dimension Usage: Use this tab to define the relationships between dimensions and measure groups, and the granularity of each dimension within each measure group.
Calculations: Use this tab to examine calculations that are defined for the cube, to define new calculations for the whole cube or for a subcube, to reorder existing calculations, and to debug calculations step by step by using breakpoints.
KPIs: Use this tab to create, edit, and modify the Key Performance Indicators (KPIs) in a cube.
Actions: Use this tab to create or modify drillthrough, reporting, and other actions for the selected cube.
Partitions: Use this tab to create and manage the partitions for a cube. Partitions let you store sections of a cube in different locations with different properties, such as aggregation definitions.
Perspectives: Use this tab to create and manage the perspectives in a cube. A perspective is a defined subset of a cube, and is used to reduce the perceived complexity of a cube to the business user.
Translations: Use this tab to create and manage translated names for cube objects, such as month or product names.
Browser: Use this tab to view data in the cube.
ISQS 6339, Data Mgmt & Business Intelligence

48 Key Performance Indicators (KPIs)
Digital dashboard
Creating a KPI

49 The MDX expression for KPI Status Expression (MaxMinManufacturingDM)
Case
  When ROUND([Measures].[percent Rejected],4) < 0.0103 Then 1
  When ROUND([Measures].[percent Rejected],4) >= 0.0103 AND ROUND([Measures].[percent Rejected],4) < 0.0104 Then .5
  When ROUND([Measures].[percent Rejected],4) >= 0.0104 AND ROUND([Measures].[percent Rejected],4) < 0.0105 Then 0
  When ROUND([Measures].[percent Rejected],4) >= 0.0105 AND ROUND([Measures].[percent Rejected],4) < 0.0106 Then -.5
  Else -1
End
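The status expression maps the rounded rejection percentage onto five bands from 1 (best) down to -1 (worst). The Python sketch below restates that banding, assuming each middle band has an inclusive lower bound and an exclusive upper bound. The function name `kpi_status` is hypothetical, and Python's `round` is used as a stand-in for the MDX ROUND function, which may treat exact ties differently.

```python
def kpi_status(percent_rejected):
    """Map a rejection rate to a KPI status between 1 (good) and -1 (bad)."""
    value = round(percent_rejected, 4)
    if value < 0.0103:
        return 1
    if value < 0.0104:   # 0.0103 <= value < 0.0104
        return 0.5
    if value < 0.0105:   # 0.0104 <= value < 0.0105
        return 0
    if value < 0.0106:   # 0.0105 <= value < 0.0106
        return -0.5
    return -1


# Made-up rejection rates, one per band.
statuses = [kpi_status(v) for v in (0.0100, 0.01032, 0.01042, 0.01052, 0.02)]
```

Because the checks run in order, each band's lower bound is implied by the failure of the previous check. If the upper bound of a band were written with ">=" instead of "<", the first middle band would absorb every higher value and the remaining bands would never be reached.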

50 Calculated measure

51 KPI definition and deployment

52 KPI Browser (figure: browser view)

53 Actions
Instructions stored inside the cube
Allow the OLAP cubes to “reach out and touch someone.”
Enable us to define commands, statements, and directives that are to be executed outside of the cube
Linked to certain objects in the cube and presented as a menu when a user is browsing those objects. The user can select one of these actions to accomplish a certain task.

54 Types of Actions
Action
◦ Dataset
◦ Proprietary
◦ Rowset: retrieves a rowset
◦ Statement
◦ URL
Drillthrough Action: defines a dataset to be returned as a drillthrough to a more detailed level
Report Action: launches a SQL Server 2005 Reporting Services report

55 Defining Actions

56 Perspectives

57 Translations

58 Q & A
Conceptual level
◦ What is the rationale behind the structure of “Data Source”, “Data Source View”, and “Cube”?
◦ Why is the time dimension so important in a data mart?
◦ Why are multi-level dimensions, such as Material-MachineType-Machine in MaxMinManufacturingDM, useful?
◦ Why do you need to change the primary key of DimTime after it is created from the MaxMinManufacturingFact table?
◦ Can you summarize the main differences between a regular database design and a data mart design?
Technical level
◦ After you make changes in a data source node, why do you have to check “Mapping” in the data destination node again?
◦ When a red wavy line appears under an object, such as a table, during cube design, what does it imply and how do you resolve it? Specifically, how can it be fixed when a fact table has this problem?
◦ Why do not all dimensions appear in the cube structure diagram?
◦ What is the difference between variable names written as Name and as [Name]?
◦ Do you understand the parameters configured in the data flow tasks, such as those in data sources, data destinations, the Aggregate node, the Derived Column node, etc.?
Any other questions?

59 Data Mart Application Development Debugging
Problem 0: You cannot find your database entry.
Problem 1: The source node is red after running a data flow task
◦ Causes?
Problem 2: The destination node is red after running a data flow task
◦ Causes?
Problem 3: Even though you redefined the source node, the problem remains.
Open problems
◦ What problems are frequently encountered in ETL application implementation?
◦ What problems did you encounter in building a cube?

