Presentation is loading. Please wait.

Presentation is loading. Please wait.

Zhangxi Lin Texas Tech University

Similar presentations


Presentation on theme: "Zhangxi Lin Texas Tech University"— Presentation transcript:

1 Zhangxi Lin Texas Tech University
ISQS 3358, Data Management and Business Intelligence Cubism – Measures and Dimensions Zhangxi Lin Texas Tech University

2 Outline Measures Where we’ve been Populating fact table
Types of dimensions

3 Structure and Components of Business Intelligence
SSMS SSIS SSAS SSRS SAS EG SAS EM

4 Snowflake Schema of the Data Mart
Manufacturingfact DimBatch 10 9 DimMachine 8 DimProduct 3 DimMachineType DimPlant 7 5 DimProductSubType 2 DimMaterial DimCountry 6 4 DimProductType 1

5 Where we’ve been and where we are now
Exercise 1: Getting started Exercise 2: Creating a data mart with SSMS Exercise 3: Import tables from a database Exercise 4: Populating dimensions of a data mart Exercise 5: Loading fact tables Exercise 6: Create and customize a cube

6 What we need to do with the half-done data mart?
Populate DimBatch dimenstion table Populate ManufacturingFact table Build an OLAP cube (we already did this before) Check measures Check dimensions

7 LOADING FACT TABLES

8 Facts Facts are measurements associated with a specific business process. Many facts can be derived from other facts, including additive and semiadditive facts. Non-additive facts can be avoided by calculating it from additive facts. Measures are clustered together in a group, called measure group.

9 Types of measures Three types Aggregate functions
Additive measures. Most facts are additive (calculative), such as sum Semiadditive measures. The measures that can be added along some dimensions, but not along others. For example, inventory level can be added along product dimension but not time dimension. Non-additive (such as max, average), or descriptive (e.g. factless fact table). Aggregate functions Additive: Sum Semiadditive: ByAccount, Count, FirstChild, FirstNonEmpty, LastChild, LastNonEmpty, Max, Min Nonadditive: DistinctCount, None.

10 Measures and dimensions
Dimensions are used to aggregate measures. Therefore, they must be somehow related to measures Granularity Important for the analysis There could be missing values in the fact table

11 Exercise 5: Loading a Fact Table

12 Exercise 5: Loading Fact Tables
Project name: Use the same project with a new Package Package name: FactLoad.dtsx Tasks Load Dim Batch Load Manufacturing Fact Deliverable: submit a screenshot of the “green” outcome of the ETL project via the Blackboard system. If you did not keep the “Green” face of the debugging, you can write a note telling that the data is loaded already.

13 Data Sources for Loading Fact
For loading DimBatch table and ManufacturingFact table BatchInfo.CSV in network folder ~\Downloaded data\Batchinfor CSV File\

14 Control Flow for Loading Facts and the Remaining Dimension
Note: to ease debugging, you may use separate packages and test them one by one, instead of doing everything in one package

15 Flat File Connection Data types BatchNumber, MachinNumber: four-byte signed integer [DT_I4] ProductCode, NumberProduced, NumberRejected: four-byte signed integer [DT_I4] TimeStarted, TimeStopped: database timestamp [DT_DBTimeStamp] Only check BatchNumber as the input of Dim Batch All columns are needed for fact tables

16 Some Frequently Used Nodes

17 Load DimBatch Data Flow
Note: 1. Because of duplication in the source file, we insert An Aggregate item after the Flat File Source item. 2. The above assumes that there is good match of data types Between source and destination. If not, a data conversion node Is needed.

18 Data Flow for Loading DimBatch
Derived Column can duplicate the column but can not change its data type. Use Data Conversion node to do this job. Question? Why in the previous instruction we did not use conversion? Because we let the destination table to adapt to the input data type.

19 DimBatch loading Input Output

20 The Flat File Source

21 Tasks: Need to define BatchNumber as an integer to match the destination type INT Need to create new column BatchName, which is a non-unicode string – How?

22 Sort Transformation In the Aggregate item, Define “Group-by” BatchNumber. In Derived column item, Define BatchName From BatchNumber Use the expression (DT_WSTR, 50)[BatchNumber] To change the data type Of BatchName.

23 Load Fact Data Flow

24 Derived Columns for the Fact table

25 Expressions for the Derived Columns
AcceptedProducts [NumberProduced] – [NumberRejected] ElapsedTimeForManufacture DATEDIFF(“mi”, [TimeStarted],[TimeStopped]) DateOfManufacture (DT_DBTIMESTAMP)SUBSTRING((DT_WSTR,25)[TimeS tarted],1,10) This expression converts TimeStarted into a string and selects the first ten characters of that string. This string is then converted back into a date time, without the time portion.

26 OLE DB Destination For loading the fact table

27 Debugging Results Loading DimBatch Loading ManufacturingFact
Note: There could be a data conversion node before destination node

28 Exercise 6: Building an OLAP Cube

29 Exercise 6: Building an OLAP Cube
This is a SSAS project Project name: ISQS3358-EX6-2016f-lastname Tasks Add in new date items (year, quarter, and month) to two fact tables Create time dimension using Manufacturing Fact table Define calculated measures (Total Products, Percent Rejected) Define hierarchies of attributes in dimension tables Create a cube from the MaxMinManufacturing data mart with hierarchical date dimension Deliverable: Screenshots: dimension hierarchies, dimensions, relationships of facts and dimensions, deployment result, format of measures, and browsing results. 29

30 Three Steps to Create a Cube from Data Sources
Defining data source Defining data source view Add in three new columns of year, quarter, and month for the two fact tables Building a cube. Define a new dimension Dim Time from Manufacturing Fact table Customize the cube: Link two fact tables in a cube Define new primary key for Dim Time Define calculated measures Relate dimensions to measures

31 T-SQL Expressions for DS View Definition - Manufacture
YearOfManufacture CONVERT(char(4),YEAR(DateOfManufacture)) QuarterOfManufacture CONVERT(char(4), YEAR(DateOfManufacture)) + CASE WHEN MONTH (DateOfManufacture) BETWEEN 1 AND 3 THEN 'Q1' WHEN MONTH (DateOfManufacture) BETWEEN 4 AND 6 THEN 'Q2' WHEN MONTH (DateOfManufacture) BETWEEN 7 AND 9 THEN 'Q3' ELSE 'Q4' END MonthOfManufacture CONVERT(char(4), YEAR(DateOfManufacture)) + RIGHT('0'+CONVERT(varchar(2), MONTH(DateOfManufacture)),2)

32 Data Source View We don’t do this In this semester New columns

33 Select Measures Page Uncheck Manufacture Fact Count

34 The finished cube

35 We don’t do this table Defining a format string

36 Calculated measures – made-up facts
The definition of calculated measure is stored in the OLAP cube itself. The actual values that result from a calculated measure are not calculated, however, until a query containing that calculated measure is executed. The results of that calculation are then cached in the cube. The cached value is then delivered to any subsequent users requesting the same calculation. The expressions of calculation are created using a language known as Multidimensional Expression Language (MDX) script. MDX is different from T-SQL. It is a special language with features designed to handle the advanced mathematics and formulas required by OLAP analysis. This is not found in T-SQL. 36

37

38

39 Configure DIMENSIONS in SQL Server

40 Types of Dimensions Fact dimensions: the Dimensions created from attributes in a fact table Parent-Child dimensions: Built on a table containing a self- referential relationship, such as a parent attribute. Role playing dimensions: related to the same measure group multiple times; each relationship represents a different role the dimension play; for example, time dimension plays three different roles: date of sale, data of shipment, and date of payment. To create a role playing dimension, add the dimension to the Dimension Usage tab multiple times. Then create a relationship between each instance of the dimension and the measure group. Reference dimensions: Not related directly to the measure group but to another regular dimension which in turn related to the measure group Data mining dimensions: the information discovered by data mining Many-to-many dimensions: e.g. multiple ship to addresses Slowly changing dimensions 40

41 Slowly changing dimensions
Type 1 SCD – no track Type 2 SCD – tracking the entire history, adding four attributes: SCD Original ID, SCD Start Date, SCD End Date, SCD Status Type 3 SCD – Similar to Type 2 SCD but only track current state and the original state; two additional attribute: SCD Start Date, SCD Initial Value

42 Add a time dimension (a fact dimension)

43

44 Rename time dimension

45 Date Hierarchy

46

47 Material Hierarchy & Plant Hierarchy

48 Product Hierarchy

49 Relating Dimensions in the Cube


Download ppt "Zhangxi Lin Texas Tech University"

Similar presentations


Ads by Google