Contents of this slideshow: What is a data warehouse? Multi-dimensional data modeling.


1 Contents of this slideshow: What is a data warehouse? Multi-dimensional data modeling

2 An example of a data warehouse: A star schema data warehouse has a central table (the fact table) surrounded by dimension tables, each with a one-to-many relationship towards the fact table. The fixed database structure implies that application programs (drilling functions/aggregates) can be generated automatically!
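The claim that aggregate queries can be generated automatically can be illustrated with a small sketch. The schema description `STAR`, the table names `Sales`, `Product` and `Salesman`, the `Amount` column and the function `generate_aggregate_sql` are all hypothetical; the rows are the detail rows visible on slide 15. Because every dimension joins the fact table through one key, the generator only needs to know which attributes to group by.

```python
import sqlite3

# Hypothetical star schema: every dimension joins the fact table on one key column.
STAR = {
    "fact": "Sales",
    "measure": "Amount",
    "keys": {"Product": "Product#", "Salesman": "Salesman#"},
}


def generate_aggregate_sql(star, levels):
    """Generate a GROUP BY query for a list of (dimension, attribute) levels.

    Because each dimension is joined to the fact table in the same one-to-many
    way, the SQL can be produced mechanically from the schema description."""
    fact = star["fact"]
    dims = sorted({dim for dim, _ in levels})
    joins = "".join(
        f' JOIN {d} ON {fact}."{star["keys"][d]}" = {d}."{star["keys"][d]}"'
        for d in dims
    )
    cols = ", ".join(f'{d}."{attr}"' for d, attr in levels)
    select = f"{cols}, " if cols else ""
    group_by = f" GROUP BY {cols}" if cols else ""
    return f"SELECT {select}SUM({fact}.{star['measure']}) FROM {fact}{joins}{group_by}"


# Small in-memory star schema to run the generated queries against.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Product ("Product#" INTEGER PRIMARY KEY, "Productname" TEXT);
CREATE TABLE Salesman ("Salesman#" TEXT PRIMARY KEY, "Branch-office#" TEXT);
CREATE TABLE Sales ("Product#" INTEGER, "Salesman#" TEXT, Amount INTEGER);
INSERT INTO Product VALUES (1, 'Screw'), (2, 'Bolt'), (3, 'Nut');
INSERT INTO Salesman VALUES ('Smith', 'LA'), ('Jones', 'SF');
INSERT INTO Sales VALUES (1, 'Smith', 10000), (2, 'Smith', 30000), (3, 'Smith', 60000),
                         (1, 'Jones', 20000), (3, 'Jones', 40000);
""")

for levels in ([("Product", "Productname")], [("Salesman", "Branch-office#")], []):
    print(sorted(con.execute(generate_aggregate_sql(STAR, levels)).fetchall()))
```

Passing an empty list of levels gives the top level (a query without GROUP BY); adding a level drills down.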

3 Dimension hierarchies: A dimension hierarchy is a set of tables connected by one-to-many relationships towards the fact table. In a dimension hierarchy it is possible to aggregate data from the fact table to the different levels of the hierarchy. Drill-down = “de-aggregate” = break an aggregate into its constituents. Roll-up = aggregate along one or more dimensions.

4 Two different types of drilling: Drilling in dimension hierarchies. Drilling between dimensions.

5 Which star schemas or data marts can be built by using the illustrated integrated E-commerce/ERP data model? Which star schema would you recommend implementing first?

6 Data marts = Kimball uses the word for any multidimensional database/star schema. A galaxy is a set of multidimensional databases with conformed (jointly adapted) dimensions. [Figure: The value chain] Suppose an enterprise has a data mart for Purchase and another data mart for Sale, as illustrated above. Is it possible to calculate the revenue per month for the last year by using such a galaxy?

7 Conceptual Modeling of Data Warehouses
– Star schema: A fact table in the middle connected to a set of dimension tables.
– Snowflake schema: A refinement of the star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to a snowflake.
– Galaxy schema: Multiple fact tables share dimension tables (conformed dimensions). It is viewed as a collection of stars and is therefore called a galaxy schema or fact constellation.

8 Kimball’s Data Warehouse Bus Architecture = An architecture for designing all the data marts of an enterprise by using conformed dimensions and conformed fact tables.
Data marts = Kimball uses the word for any multidimensional database.
Conformed dimensions = dimensions designed to be common to different data marts in order to make drill-across operations possible.
Conformed facts = measures with common units of measurement and granularities that make it possible to integrate measures from different fact tables.
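A drill-across query over two data marts that share a conformed dimension can be sketched as follows. The tables `Month`, `SaleFact` and `PurchaseFact` and all amounts are hypothetical, and both facts are assumed to use the same currency (conformed facts). Each fact table is aggregated separately to the conformed Month level before the results are joined; joining the fact rows directly would multiply rows, as the `NAIVE` query shows for February.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Month (MonthID INTEGER PRIMARY KEY, Month TEXT);   -- conformed dimension
CREATE TABLE SaleFact (MonthID INTEGER, Amount INTEGER);        -- Sale data mart
CREATE TABLE PurchaseFact (MonthID INTEGER, Amount INTEGER);    -- Purchase data mart
INSERT INTO Month VALUES (1, '2024-01'), (2, '2024-02');
INSERT INTO SaleFact VALUES (1, 500), (1, 300), (2, 400);
INSERT INTO PurchaseFact VALUES (1, 200), (2, 100), (2, 150);
""")

# Drill across: aggregate each fact table to the shared Month level first,
# then join the two result sets on that level.
DRILL_ACROSS = """
WITH s AS (SELECT MonthID, SUM(Amount) AS Sales FROM SaleFact GROUP BY MonthID),
     p AS (SELECT MonthID, SUM(Amount) AS Purchases FROM PurchaseFact GROUP BY MonthID)
SELECT m.Month, s.Sales, p.Purchases
FROM Month m
LEFT JOIN s ON s.MonthID = m.MonthID
LEFT JOIN p ON p.MonthID = m.MonthID
ORDER BY m.Month
"""

# Pitfall: joining the fact rows before aggregating repeats each sale row once
# per matching purchase row, so February's sales (400) are counted twice.
NAIVE = """
SELECT s.MonthID, SUM(s.Amount) FROM SaleFact s
JOIN PurchaseFact p ON p.MonthID = s.MonthID
GROUP BY s.MonthID ORDER BY s.MonthID
"""

rows = con.execute(DRILL_ACROSS).fetchall()
print(rows)                               # [('2024-01', 800, 200), ('2024-02', 400, 250)]
print(con.execute(NAIVE).fetchall())      # [(1, 800), (2, 800)]
```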

9 Conformed dimensions = dimensions designed to be common to different data marts in order to make drill-across operations possible. Conformed facts = measures with common units of measurement and granularities that make it possible to integrate measures from different fact tables. [Figure: The value chain] Is it possible to calculate the revenue per month for the last year if the data mart for Purchase and the data mart for Sale do not have conformed dimensions or facts?

10 Contents of this slideshow: What is a data warehouse? Multi-dimensional data modeling

11 Data warehouse aggregating to the product level: SELECT Product#, SUM(Qty*Price) AS omsætning FROM Orderdetails JOIN Products USING (Product#) GROUP BY Product# (omsætning is Danish for turnover)

12 Drill down to the Product per Salesman level: SELECT Product#, Salesman#, SUM(Qty*Price) AS omsætning FROM Orderdetails JOIN Products USING (Product#) JOIN Salesmen USING (Salesman#) GROUP BY Product#, Salesman#; Where should the Price be stored?
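The two queries can be tried out in SQLite. The table layouts and data are assumptions: `Qty` is placed in `Orderdetails` and `Price` in `Products`, as the queries on slides 11 and 12 imply (which leaves the slide's question about Price open), and the `#` identifiers are double-quoted because `#` is not a normal identifier character in standard SQL.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Products ("Product#" TEXT PRIMARY KEY, Productname TEXT, Price INTEGER);
CREATE TABLE Salesmen ("Salesman#" TEXT PRIMARY KEY, Name TEXT);
CREATE TABLE Orderdetails ("Order#" INTEGER, "Product#" TEXT, "Salesman#" TEXT, Qty INTEGER);
INSERT INTO Products VALUES ('P1', 'Screw', 2), ('P2', 'Bolt', 3);
INSERT INTO Salesmen VALUES ('S1', 'Smith'), ('S2', 'Jones');
INSERT INTO Orderdetails VALUES (1, 'P1', 'S1', 10), (1, 'P2', 'S1', 5), (2, 'P1', 'S2', 4);
""")

# Slide 11: aggregate to the product level.
PRODUCT_LEVEL = """
SELECT "Product#", SUM(Qty * Price) AS "omsætning"
FROM Orderdetails JOIN Products USING ("Product#")
GROUP BY "Product#" ORDER BY "Product#"
"""

# Slide 12: drill down by adding Salesman# to the GROUP BY list.
PRODUCT_SALESMAN_LEVEL = """
SELECT "Product#", "Salesman#", SUM(Qty * Price) AS "omsætning"
FROM Orderdetails JOIN Products USING ("Product#") JOIN Salesmen USING ("Salesman#")
GROUP BY "Product#", "Salesman#" ORDER BY "Product#", "Salesman#"
"""

print(con.execute(PRODUCT_LEVEL).fetchall())           # [('P1', 28), ('P2', 15)]
print(con.execute(PRODUCT_SALESMAN_LEVEL).fetchall())  # [('P1', 'S1', 20), ('P1', 'S2', 8), ('P2', 'S1', 15)]
```

The finer rows add up to the coarser ones: for P1, 20 + 8 = 28.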

13 Dimension hierarchies: A dimension hierarchy is a set of tables connected by one-to-many relationships towards the fact table. In contrast to star schemas, a snowflake schema may have dimension hierarchies. What are the advantages and disadvantages of using dimension hierarchies/snowflake schemas?
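One way to see the trade-off is to roll the same facts up to a region level in both layouts. This sketch uses the salesman figures from slide 16; the Region level, its values `South` and `North`, and the table names are hypothetical. The snowflake stores each region once per branch office but needs one extra join per hierarchy level; the star needs a single join but repeats the region for every salesman.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Snowflake: the Salesman -> Branch-office -> Region hierarchy is normalized.
CREATE TABLE BranchOffice ("Branch-office#" TEXT PRIMARY KEY, Region TEXT);
CREATE TABLE SalesmanSF ("Salesman#" TEXT PRIMARY KEY, "Branch-office#" TEXT);
-- Star: the same hierarchy flattened into one denormalized dimension table.
CREATE TABLE SalesmanStar ("Salesman#" TEXT PRIMARY KEY, "Branch-office#" TEXT, Region TEXT);
CREATE TABLE SalesFact ("Salesman#" TEXT, Turnover INTEGER);

INSERT INTO BranchOffice VALUES ('LA', 'South'), ('SF', 'North');
INSERT INTO SalesmanSF VALUES ('Smith', 'LA'), ('Jones', 'LA'), ('Adams', 'SF');
INSERT INTO SalesmanStar VALUES ('Smith', 'LA', 'South'), ('Jones', 'LA', 'South'),
                                ('Adams', 'SF', 'North');
INSERT INTO SalesFact VALUES ('Smith', 100000), ('Jones', 300000), ('Adams', 200000);
""")

# Snowflake roll-up to Region: one join per hierarchy level.
SNOWFLAKE_QUERY = """
SELECT b.Region, SUM(f.Turnover) FROM SalesFact f
JOIN SalesmanSF s ON s."Salesman#" = f."Salesman#"
JOIN BranchOffice b ON b."Branch-office#" = s."Branch-office#"
GROUP BY b.Region ORDER BY b.Region
"""

# Star roll-up to Region: a single join, but Region is stored redundantly.
STAR_QUERY = """
SELECT s.Region, SUM(f.Turnover) FROM SalesFact f
JOIN SalesmanStar s ON s."Salesman#" = f."Salesman#"
GROUP BY s.Region ORDER BY s.Region
"""

print(con.execute(SNOWFLAKE_QUERY).fetchall())  # [('North', 200000), ('South', 400000)]
print(con.execute(STAR_QUERY).fetchall())       # same result
```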

14 Snowflake schema with branches: A snowflake schema may have branches in the dimension hierarchies. Are Customers related to the regions?

15 The aggregation level is the argument to the GROUP BY statement: GROUP BY x1, x2, …, xn. [Figure: aggregated data vs. non-aggregated data]
Salesman#  Productname  Turnover  Branch-office#
Smith      Screw        10,000    LA
Smith      Bolt         30,000    LA
Smith      Nut          60,000    LA
Jones      Screw        20,000    SF
Jones      Nut          40,000    SF
...
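The idea that the aggregation level is simply the GROUP BY list can be sketched in plain Python on the detail rows visible on the slide (the table is truncated, so Jones's total here is partial). The function name `aggregate` is hypothetical. Adding an attribute to `level` drills down, removing one rolls up, and an empty `level` gives the top level, which in SQL is a query without GROUP BY.

```python
from collections import defaultdict

# The detail rows that are visible on slide 15 (the slide's table is truncated).
ROWS = [
    {"Salesman#": "Smith", "Productname": "Screw", "Turnover": 10_000, "Branch-office#": "LA"},
    {"Salesman#": "Smith", "Productname": "Bolt", "Turnover": 30_000, "Branch-office#": "LA"},
    {"Salesman#": "Smith", "Productname": "Nut", "Turnover": 60_000, "Branch-office#": "LA"},
    {"Salesman#": "Jones", "Productname": "Screw", "Turnover": 20_000, "Branch-office#": "SF"},
    {"Salesman#": "Jones", "Productname": "Nut", "Turnover": 40_000, "Branch-office#": "SF"},
]


def aggregate(rows, level):
    """Sum Turnover grouped by the attributes in `level` (the GROUP BY list x1, ..., xn)."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[attr] for attr in level)] += row["Turnover"]
    return dict(totals)


print(aggregate(ROWS, ("Salesman#", "Productname")))  # finest level: one group per row
print(aggregate(ROWS, ("Salesman#",)))                 # roll up: drop Productname
print(aggregate(ROWS, ()))                             # top level: empty GROUP BY list
```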

16 Drilling in dimension hierarchies:
Branch-office#  Turnover
LA              400,000
SF              200,000

Salesman#  Turnover  Branch-office#
Smith      100,000   LA
Jones      300,000   LA
Adams      200,000   SF
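Drilling in a hierarchy follows the one-to-many relationship between its levels. A minimal sketch using the salesman figures from this slide, with hypothetical helper names `roll_up` and `drill_down`; the constituents returned by a drill-down always add up to the aggregate they came from.

```python
# Salesman-level figures and the Salesman -> Branch-office hierarchy from slide 16.
SALESMAN_TURNOVER = {"Smith": 100_000, "Jones": 300_000, "Adams": 200_000}
BRANCH_OF = {"Smith": "LA", "Jones": "LA", "Adams": "SF"}  # many salesmen per branch office


def roll_up(turnover, parent_of):
    """Aggregate one level of the hierarchy to its parent level."""
    totals = {}
    for child, value in turnover.items():
        parent = parent_of[child]
        totals[parent] = totals.get(parent, 0) + value
    return totals


def drill_down(parent, turnover, parent_of):
    """Break one aggregate into its constituents at the level below."""
    return {child: value for child, value in turnover.items() if parent_of[child] == parent}


branches = roll_up(SALESMAN_TURNOVER, BRANCH_OF)     # {'LA': 400000, 'SF': 200000}
la = drill_down("LA", SALESMAN_TURNOVER, BRANCH_OF)  # {'Smith': 100000, 'Jones': 300000}
assert sum(la.values()) == branches["LA"]            # constituents add up to the aggregate
```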

17 Drilling between dimension hierarchies:
Salesman#  Turnover  Branch-office#
Smith      100,000   LA
Jones      300,000   LA
Adams      200,000   SF

Salesman#  Productname  Turnover  Branch-office#
Smith      Screw        10,000    LA
Smith      Bolt         30,000    LA
Smith      Nut          60,000    LA
Jones      Screw        20,000    SF
Jones      Nut          40,000    SF
...

18 Roll up to the top level: Roll-up can be executed by removing one or more arguments from the GROUP BY statement.
Salesman#  Productname  Turnover  Branch-office#
Smith      Screw        10,000    LA
Smith      Bolt         30,000    LA
Smith      Nut          60,000    LA
Jones      Screw        20,000    SF
Jones      Nut          40,000    SF
...
Roll up to the product level:
Productname  Turnover
Screw        100,000
Bolt         200,000
Nut          300,000
Roll up to the top level:
Top level turnover: 600,000

19 Examples of transaction grain fact data and snapshot grain fact data. Types of measure:
– Transaction grain
– State (snapshot) grain, collected at the end of a period of time
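The practical difference between the two grains shows up when facts are summed over time. A sketch with hypothetical stock movements for one product: transaction facts can be summed across months, while month-end states (snapshots) cannot; for those, the last value or the average over the period is used instead.

```python
# Hypothetical stock movements for one product, in time order (transaction grain).
MOVEMENTS = [("2024-01", +100), ("2024-01", -30), ("2024-02", -20), ("2024-03", +50)]


def month_end_snapshots(movements):
    """Derive state grain: the stock level at the end of each month with movements."""
    level, snapshots = 0, {}
    for month, change in movements:
        level += change
        snapshots[month] = level  # later movements in the same month overwrite
    return snapshots


snapshots = month_end_snapshots(MOVEMENTS)  # {'2024-01': 70, '2024-02': 50, '2024-03': 100}

# Transaction facts are additive over time: the quarter's net change is their sum.
net_change_q1 = sum(change for _, change in MOVEMENTS)      # 100
# State facts are not: summing the three month-end levels (220) means nothing;
# use the level at the end of the quarter or the average level instead.
level_end_q1 = snapshots["2024-03"]                         # 100
average_level_q1 = sum(snapshots.values()) / len(snapshots)  # 73.33...
```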

20 Non-linear dimensions, e.g. the Date dimension: The granularity is day, and there are many different hierarchies. Two major problems:
– Calendar weeks do not aggregate to years.
– Type of Day distinguishes between working days and holidays. However, holidays are independent of the other dimensions (e.g. Easter).
[Figure: levels of the Date dimension: Day; Day of Week; Type of Day; Calendar Week; Calendar Month, Calendar Quarter, Calendar Year; Fiscal Week, Fiscal Month, Fiscal Quarter, Fiscal Year]
What aggregation level would you use to calculate the average sale on non-holiday Mondays per month?
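Both problems can be seen in a small Date-dimension sketch. It assumes ISO 8601 week numbering for Calendar Week (Python's `date.isocalendar()`), a hypothetical holiday list containing only Easter Monday 2024 (1 April 2024), hypothetical sales figures and one sales fact per day. The helper names `date_row` and `average_by` are also hypothetical. The average on non-holiday Mondays per month is computed here as an aggregation to the level (Calendar Month, Day of Week, Type of Day).

```python
from collections import defaultdict
from datetime import date, timedelta

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
HOLIDAYS = {date(2024, 4, 1)}  # assumed holiday list: Easter Monday 2024


def date_row(d, holidays):
    """One row of a day-grain Date dimension with a few of the levels from the slide."""
    iso_year, iso_week, _ = d.isocalendar()
    if d in holidays:
        type_of_day = "Holiday"
    elif d.weekday() >= 5:
        type_of_day = "Weekend"
    else:
        type_of_day = "Working day"
    return {
        "Day": d,
        "Day of Week": WEEKDAYS[d.weekday()],
        "Type of Day": type_of_day,
        "Calendar Week": (iso_year, iso_week),  # ISO week: its year may differ from d.year
        "Calendar Month": (d.year, d.month),
        "Calendar Year": d.year,
    }


def average_by(sales, holidays, level):
    """Average daily sale per combination of the Date-dimension attributes in `level`."""
    sums, counts = defaultdict(float), defaultdict(int)
    for d, amount in sales.items():
        row = date_row(d, holidays)
        key = tuple(row[attr] for attr in level)
        sums[key] += amount
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Calendar weeks do not nest inside calendar years: 1 January 2021 is in ISO week 53 of 2020.
print(date_row(date(2021, 1, 1), HOLIDAYS)["Calendar Week"])  # (2020, 53)

# Hypothetical sales on the five Mondays of April 2024 (1 April is Easter Monday).
SALES = {date(2024, 4, 1) + timedelta(weeks=w): amount
         for w, amount in enumerate([50, 100, 120, 80, 100])}
averages = average_by(SALES, HOLIDAYS, ("Calendar Month", "Day of Week", "Type of Day"))
print(averages[((2024, 4), "Monday", "Working day")])  # 100.0
```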

21 The time dimension: The granularity is minute, and the top level is a whole day. [Figure: levels of the time dimension: Minute, Hour, Day Part, AM/PM Flag] Why do you think Kimball recommends separating the date and time dimensions?
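One size-related consideration can be checked with a little arithmetic. This sketch builds a minute-grain time-of-day dimension with hypothetical Day Part boundaries (the function names are also hypothetical): it has 24 × 60 = 1,440 rows that can be reused for every date, whereas a single combined date-and-time dimension at minute grain would need 365 × 1,440 = 525,600 rows for one non-leap year.

```python
def day_part(hour):
    """Hypothetical Day Part boundaries; a real design would use the business's own."""
    if hour < 6:
        return "Night"
    if hour < 12:
        return "Morning"
    if hour < 18:
        return "Afternoon"
    return "Evening"


def time_dimension():
    """One row per minute of the day: Minute -> Hour -> Day Part, plus an AM/PM flag."""
    rows = []
    for minute_of_day in range(24 * 60):
        hour, minute = divmod(minute_of_day, 60)
        rows.append({
            "Minute": f"{hour:02d}:{minute:02d}",
            "Hour": hour,
            "Day Part": day_part(hour),
            "AM/PM Flag": "AM" if hour < 12 else "PM",
        })
    return rows


TIME_ROWS = time_dimension()
print(len(TIME_ROWS))        # 1440 rows, reused for every day
print(365 * len(TIME_ROWS))  # 525600 rows per year if date and time were combined at minute grain
```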

22 Degenerate dimension = A dimension that is not created because nobody wants to aggregate data to the degenerate level. Example: The Order dimension should be deleted, while the Time and Customer attributes should be created as new dimensions to which it is meaningful to aggregate data.
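A minimal sketch of the resulting design, with hypothetical table names and data: Customer and Date become real dimension tables, while the order number, following Kimball's usual practice for degenerate dimensions, stays in the fact table as a plain attribute with no Order table behind it. It can still be used to group the lines of one order.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE DateDim (DateID INTEGER PRIMARY KEY, Day TEXT, Month TEXT);
CREATE TABLE OrderLineFact (
    "Order#"   INTEGER,  -- degenerate dimension: kept in the fact table, no Order table
    CustomerID INTEGER REFERENCES Customer (CustomerID),
    DateID     INTEGER REFERENCES DateDim (DateID),
    Amount     INTEGER
);
INSERT INTO Customer VALUES (1, 'Hansen'), (2, 'Jensen');
INSERT INTO DateDim VALUES (1, '2024-05-01', '2024-05'), (2, '2024-05-02', '2024-05');
INSERT INTO OrderLineFact VALUES (101, 1, 1, 40), (101, 1, 1, 60), (102, 2, 2, 25);
""")

# Meaningful aggregations go through the new Customer and Date dimensions ...
BY_CUSTOMER_MONTH = """
SELECT c.Name, d.Month, SUM(f.Amount) FROM OrderLineFact f
JOIN Customer c USING (CustomerID) JOIN DateDim d USING (DateID)
GROUP BY c.Name, d.Month ORDER BY c.Name
"""
# ... while the order number still groups the lines of one order without a join.
LINES_PER_ORDER = """
SELECT "Order#", COUNT(*) FROM OrderLineFact GROUP BY "Order#" ORDER BY "Order#"
"""

print(con.execute(BY_CUSTOMER_MONTH).fetchall())  # [('Hansen', '2024-05', 100), ('Jensen', '2024-05', 25)]
print(con.execute(LINES_PER_ORDER).fetchall())    # [(101, 2), (102, 1)]
```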

23 Exercise: The figure illustrates an ER diagram of a car rental company like Hertz or Avis. Design a snowflake schema, star schema or galaxy for the car rental company!

24 Major problems in data warehouse design: Drilling in many-to-many relationships and tree structures. Inconsistencies caused by “slowly changing dimensions”.

25 Slowly Changing Dimensions (SCD): If the attributes of a dimension are dynamic (i.e. they may be updated), we say that they are slowly changing. May the Branch-size of a Branch-office change after, e.g., a renovation? May the Branch-name of a Branch-office change?

26 Exercise in SCD: Suppose the attribute Branch-size is dynamic and aggregations are made to the levels (Branch-size, Year) or (Branch-size, Month). Does this aggregation make sense, and how would you solve possible problems?

27 Exercise: Is the region of the customer a dynamic attribute of the customer? Does it make sense to aggregate the rental revenue to the region of the customers?

28 It is possible to cheat the application generator. That is, special, very complicated data structures may function as many-to-many or network relationships when they are dealt with as one-to-many relationships. How would you recommend designing a data warehouse where it is possible to aggregate Sale to the Stock locations used for the sale?
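One possible approach, sketched here with hypothetical data and names, is to split each sale line into one fact row per stock location it was picked from, storing the quantity taken from that location. Each fact row then refers to exactly one location, so the many-to-many relationship is handled as a one-to-many relationship, and sales by location add up to the total without double counting. Attaching the whole amount of sale line 1 (30) to both of its locations would instead count it twice (70 in total).

```python
from collections import defaultdict

# Hypothetical source data: a sale line may be picked from several stock locations.
SALE_LINES = {1: {"Product": "Bolt", "Qty": 10, "Price": 3},
              2: {"Product": "Nut", "Qty": 5, "Price": 2}}
PICKS = [(1, "Location A", 6), (1, "Location B", 4), (2, "Location B", 5)]  # (line, location, qty)


def split_facts(sale_lines, picks):
    """One fact row per (sale line, stock location): each row has exactly one location."""
    return [
        {"SaleLine": line, "StockLocation": loc, "Product": sale_lines[line]["Product"],
         "Qty": qty, "Amount": qty * sale_lines[line]["Price"]}
        for line, loc, qty in picks
    ]


def amount_by_location(facts):
    """Aggregate Sale to the stock-location level."""
    totals = defaultdict(int)
    for fact in facts:
        totals[fact["StockLocation"]] += fact["Amount"]
    return dict(totals)


facts = split_facts(SALE_LINES, PICKS)
print(amount_by_location(facts))        # {'Location A': 18, 'Location B': 22}
print(sum(f["Amount"] for f in facts))  # 40, the same as the total of the sale lines
```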

29 Exercise: Design a data warehouse for a travel agency.

30 Design a data warehouse (or galaxy) for an ERP system with as many meaningful dimensions as possible:

31 End of session. Thank you!

32 Table 1: Evaluation of the responses to slowly changing dimensions. Evaluation criteria: is historical information preserved, aggregation performance, and storage consumption.
– Response 1, where dimension records are overwritten. Historical information preserved: No. Aggregation performance: in the evaluation, we define this solution to have average performance. Storage consumption: only the current dimension record version is stored; no redundant data is stored.
– Response 2, where new versions are created. Historical information preserved: Yes. Aggregation performance: version records make performance slower, in proportion to the number of changes. Storage consumption: all old versions of dimension records are stored, often with redundant attributes.
– Response 3, where only one historical version is saved. Historical information preserved: the current version and a single history-destroying version are saved. Aggregation performance: no performance degradation occurs if either the current or the historical version is used in a query. Storage consumption: normally, only a single extra attribute version is stored.
– Response 4, which uses the top of a dynamic dimension hierarchy as a new static dimension. Historical information preserved: Yes. Aggregation performance: better or worse, depending on whether both dimension tables are used in a query. Storage consumption: the relatively large fact table must have an extra foreign key attribute.
– Response 5, with dimension data as fact data. Historical information preserved: Yes. Aggregation performance: better or worse, depending on whether the new fact data are used in a query. Storage consumption: the relatively large fact table must have an extra attribute for each dynamic dimension attribute.
– Response 6, which uses fine granularity in combination with response 1 or 3. Historical information preserved: the finer the granularity, the more historical state information is preserved. Aggregation performance: the finer the granularity, the slower the performance. Storage consumption: the finer the granularity, the more storage is consumed.
– Response 7, which stores dynamic dimension data as static facts in another data mart. Historical information preserved: Yes. Aggregation performance: better or worse, depending on whether both fact tables are used in a drill-across query. Storage consumption: this is the most storage-consuming solution, as at least a new fact and a foreign key are stored in the new fact table.

33 Where do the responses to SCDs store historic information?
– Response 1 does not store historic information.
– Response 2 stores historic information in a new record version.
– Response 3 stores one historic value in a new dimension attribute.
– Response 4 stores historic information in a new dimension relationship.
– Response 5 stores historic information in a new fact attribute.
– Response 6 can sometimes diminish the aggregation error of response 1, as a finer granularity in a state fact can more accurately be related to the right dimension record.
– Response 7 stores historic information in a new fact table.
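The difference between Response 1 (overwrite) and Response 2 (new versions), often called SCD type 1 and type 2, can be sketched with the Branch-size exercise from slide 26. The renovation year, the sizes, the validity periods, the turnover figures and the helper names are hypothetical. With Response 1, turnover from before the renovation is aggregated to the new Branch-size; with Response 2, each fact row refers to the version that was valid when it was recorded.

```python
from collections import defaultdict

# Hypothetical example: the LA branch office was renovated in 2023 and grew from Small to Large.
# Response 1: a single record per branch, overwritten by the renovation.
BRANCH_R1 = {"LA": {"Branch-office#": "LA", "Branch-size": "Large"}}

# Response 2: a new record version (surrogate key) per change, with a validity period.
BRANCH_R2 = {
    1: {"Branch-office#": "LA", "Branch-size": "Small", "valid_from": 2000, "valid_to": 2022},
    2: {"Branch-office#": "LA", "Branch-size": "Large", "valid_from": 2023, "valid_to": 9999},
}

SALES = [(2022, "LA", 100), (2024, "LA", 200)]  # (year, Branch-office#, turnover)


def version_key(dim, branch, year):
    """At load time, pick the version of the branch that was valid in that year."""
    for key, rec in dim.items():
        if rec["Branch-office#"] == branch and rec["valid_from"] <= year <= rec["valid_to"]:
            return key
    raise KeyError((branch, year))


def by_size_and_year(facts, dim):
    """Aggregate turnover to the level (Branch-size, Year)."""
    totals = defaultdict(int)
    for year, key, turnover in facts:
        totals[(dim[key]["Branch-size"], year)] += turnover
    return dict(totals)


facts_r1 = SALES  # fact rows point to the (overwritten) branch record itself
facts_r2 = [(y, version_key(BRANCH_R2, b, y), t) for y, b, t in SALES]

print(by_size_and_year(facts_r1, BRANCH_R1))  # {('Large', 2022): 100, ('Large', 2024): 200}
print(by_size_and_year(facts_r2, BRANCH_R2))  # {('Small', 2022): 100, ('Large', 2024): 200}
```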

