Copyright© 2014, Sira Yongchareon Department of Computing, Faculty of Creative Industries and Business Lecturer : Dr. Sira Yongchareon ISCG 6425 Data Warehousing.


1 ISCG 6425 Data Warehousing (DW)
Week 10: Other topics in DW
Lecturer: Dr. Sira Yongchareon
Department of Computing, Faculty of Creative Industries and Business
Copyright© 2014, Sira Yongchareon

2 Outline
ISCG6425 Data Warehousing, Semester 1, 2014 (by Sira Yongchareon), Department of Computing, Faculty of Creative Industries and Business
- Advanced dimensional modeling
  - Slowly-Changing Dimensions
  - Data Hierarchy
- Physical Database Design
- OLAP Cubes and Operations

3 Slowly Changing Dimension (SCD)
A Slowly Changing Dimension (SCD) is a dimension that changes slowly over time, rather than on a regular, time-based schedule.
- Changes in dimension attributes need to be tracked in order to report historical data
- For example, how will you deal with the customer dimension data if a customer changes address from New Zealand to Australia?

Before the change:
CustomerKey  CustomerID  Name  Country
1            John1       John  New Zealand

After the change:
CustomerKey  CustomerID  Name  Country
1            John1       John  Australia

4 Slowly Changing Dimension (SCD)
SCD has 6 types
- Type 0 - The passive method
- Type 1 - Overwriting the old value
- Type 2 - Creating a new additional record
- Type 3 - Adding a new column
- Type 4 - Using a historical table
- Type 6 - Combining the approaches of types 1, 2 and 3 (1+2+3=6), which is why this list has no Type 5

5 SCD Type 0
SCD Type 0 - The passive method
- No special action is performed upon dimensional changes
- Some dimension data can remain the same as when it was first inserted; other data may be overwritten.

6 SCD Type 1
SCD Type 1 - Overwriting the old value
- NO history of dimension changes is kept in the database
- The old dimension value is simply overwritten by the new one.
- Easy to maintain, and often used for data whose changes are caused by processing corrections (e.g., a misspelling)

Before the correction:
CustomerKey  CustomerID  Name  Country
1            John1       John  New Sealand

After the correction:
CustomerKey  CustomerID  Name  Country
1            John1       John  New Zealand
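The Type 1 overwrite can be sketched in a few lines of Python. This is only an illustration: the in-memory list `customer_dim` and the helper `apply_type1` are hypothetical names, and a real warehouse would perform the same step with an UPDATE statement in its ETL process.

```python
# Hypothetical in-memory customer dimension; columns follow the slide's table.
customer_dim = [
    {"CustomerKey": 1, "CustomerID": "John1", "Name": "John", "Country": "New Sealand"},
]


def apply_type1(dim, customer_id, attribute, new_value):
    """SCD Type 1: overwrite the attribute in place; the old value is lost."""
    for row in dim:
        if row["CustomerID"] == customer_id:
            row[attribute] = new_value


# Correct the misspelled country. No new row is added and no history is kept.
apply_type1(customer_dim, "John1", "Country", "New Zealand")
```

After the call the dimension still holds a single row for John1, and nothing records that the country was ever spelled differently.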

7 SCD Type 2
SCD Type 2 - Creating a new additional record
- All history of dimension changes is kept in the database
- An attribute change is captured by adding a new row with a new surrogate key to the dimension table
- 'Effective date' and 'current indicator' columns are also used

Before the change:
CustomerKey  CustomerID  Name  Country      StartDate   EndDate     Flag
1            John1       John  New Zealand  01/01/2014  31/01/2014  Y

After the change:
CustomerKey  CustomerID  Name  Country      StartDate   EndDate     Flag
1            John1       John  New Zealand  01/01/2014  31/12/2014  N
2            John1       John  Australia    01/01/2015  31/12/2015  Y
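A minimal Python sketch of the Type 2 technique follows. The helper `apply_type2` is a hypothetical name, and the sketch assumes that the current row carries an open-ended `EndDate` of `None` (the slide instead shows a concrete end date on the current row) and that the next surrogate key is simply the largest existing key plus one.

```python
from datetime import date, timedelta

# Hypothetical in-memory dimension; EndDate=None marks the open-ended current row.
customer_dim = [
    {"CustomerKey": 1, "CustomerID": "John1", "Name": "John", "Country": "New Zealand",
     "StartDate": date(2014, 1, 1), "EndDate": None, "Flag": "Y"},
]


def apply_type2(dim, customer_id, attribute, new_value, change_date):
    """SCD Type 2: expire the current row and add a new row with a new surrogate key."""
    current = next(r for r in dim if r["CustomerID"] == customer_id and r["Flag"] == "Y")
    # Close the old version the day before the change takes effect.
    current["EndDate"] = change_date - timedelta(days=1)
    current["Flag"] = "N"
    # Copy the old row, then override the changed attribute and the tracking columns.
    new_row = dict(current)
    new_row.update({
        "CustomerKey": max(r["CustomerKey"] for r in dim) + 1,
        attribute: new_value,
        "StartDate": change_date,
        "EndDate": None,
        "Flag": "Y",
    })
    dim.append(new_row)
    return new_row


apply_type2(customer_dim, "John1", "Country", "Australia", date(2015, 1, 1))
```

The natural key `CustomerID` stays the same on both rows; only the surrogate key `CustomerKey` distinguishes the two versions, so facts loaded before 2015 keep pointing at the New Zealand row.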

8 SCD Type 3
SCD Type 3 - Adding a new column
- Only the current and previous values of the dimension are kept in the database
- The new value is loaded into the 'current' column and the old one into the 'previous' column
- History is limited to the number of columns created for storing historical data
- The least commonly used technique

Before the change:
CustomerKey  CustomerID  Name  Current Country  Previous Country
1            John1       John  New Zealand

After the change:
CustomerKey  CustomerID  Name  Current Country  Previous Country
1            John1       John  Australia        New Zealand
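The Type 3 column shift can be illustrated with a short Python sketch. The dictionary `customer`, its key names and the helper `apply_type3` are hypothetical and chosen only to mirror the slide's columns.

```python
# Hypothetical single dimension row with a 'current' and a 'previous' column.
customer = {
    "CustomerKey": 1, "CustomerID": "John1", "Name": "John",
    "CurrentCountry": "New Zealand", "PreviousCountry": None,
}


def apply_type3(row, new_country):
    """SCD Type 3: move the current value to the previous column, then overwrite."""
    row["PreviousCountry"] = row["CurrentCountry"]
    row["CurrentCountry"] = new_country


apply_type3(customer, "Australia")
```

Because there is only one previous column, a second call would push Australia into `PreviousCountry` and discard New Zealand, which is the limitation on history noted above.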

9 SCD Type 4
SCD Type 4 - Using a historical table
- A separate historical table is used, for each dimension, to track all historical changes to the dimension's attributes
- The 'main' dimension table keeps only the current data

Main table:
CustomerKey  CustomerID  Name  Country
1            John1       John  Australia

Historical table:
CustomerKey  CustomerID  Name  Country      StartDate   EndDate
1            John1       John  New Zealand  01/01/2014  31/12/2014
1            John1       John  Australia    01/01/2015  31/12/2015
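A possible Python sketch of the Type 4 split between a main table and a historical table is given below. The names `main_table`, `history_table` and `apply_type4` are hypothetical, and the sketch assumes the open history row is marked with `EndDate=None` rather than the concrete end date shown on the slide.

```python
from datetime import date, timedelta

# Hypothetical main table (current data only), keyed by the natural key.
main_table = {"John1": {"CustomerKey": 1, "Name": "John", "Country": "New Zealand"}}

# Hypothetical historical table; EndDate=None marks the version still in effect.
history_table = [
    {"CustomerKey": 1, "CustomerID": "John1", "Name": "John", "Country": "New Zealand",
     "StartDate": date(2014, 1, 1), "EndDate": None},
]


def apply_type4(main, history, customer_id, new_country, change_date):
    """SCD Type 4: overwrite the main table and append the change to the history."""
    # Close the version that was in effect until now.
    for version in history:
        if version["CustomerID"] == customer_id and version["EndDate"] is None:
            version["EndDate"] = change_date - timedelta(days=1)
    # The main table keeps only the current value.
    row = main[customer_id]
    row["Country"] = new_country
    # The historical table receives one new row per change.
    history.append({
        "CustomerKey": row["CustomerKey"], "CustomerID": customer_id,
        "Name": row["Name"], "Country": new_country,
        "StartDate": change_date, "EndDate": None,
    })


apply_type4(main_table, history_table, "John1", "Australia", date(2015, 1, 1))
```

Note that, unlike Type 2, the surrogate key does not change: both history rows carry `CustomerKey` 1, matching the slide's tables.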

10 SCD Type 6
SCD Type 6 - Combining the approaches of types 1, 2 and 3
- Type 1 = Overwrite the old value
- Type 2 = Add a new record
- Type 3 = Add a new column

CustomerKey  CustomerID  Name  Current Country  Historical Country  StartDate   EndDate     Flag
1            John1       John  New Zealand                          01/01/2014  31/12/2014  N
2            John1       John  Australia        New Zealand         01/01/2015  31/12/2015  Y

11 Data Hierarchy
- Date (Time) dimensions: Year > Quarter > Month > Week > Day
- Location dimensions: Region > Country > City
- Product dimensions: Category > Product Type > Product

12 Dimension Modeling : A Summary
Dimensional modeling
- Dimensions
  - De-normalized (Star) or Normalized (Snowflake and Fact Constellation)
  - Slowly Changing Dimension (6 types)
    - An interesting tutorial: http://www.youtube.com/watch?v=Eam2SmYgIzg
- Data Hierarchy

13 Physical Database Design
Database tables, indexes, partitions, summary tables
- Table design
  - Dimension tables and Fact tables
  - PKs, FKs, surrogate/natural keys, and constraints
- Partition design
  - Sort and group data into different partitions
  - Helps speed up queries and improve scalability!
- Index design
  - To speed up queries!!
Why do we have to care so much about query "SPEED" in DW?

14 Partition Design
Partitioning
- Split a table into several smaller tables
- Partitions can be stored in a single database or in multiple databases
- Improves scalability (when storing data) and performance (when storing and querying data)
- Think about a "Fact Table" that contains 1 billion data records!
Approaches
- Vertical partitioning
  - Each small table contains some columns of the original table
- Horizontal partitioning
  - Each small table contains some rows of the original table

15 Partition Design
Vertical partitioning
- Each small table contains some columns of the original table

16 Partition Design
Horizontal partitioning
- Each small table contains some rows of the original table
Which partitioning approach (Horizontal or Vertical) best helps a DW database?
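The two partitioning approaches can be contrasted with a small Python sketch. The fact rows, column names and the helpers `partition_horizontally` and `partition_vertically` are all hypothetical; a real DBMS would do this with its own partitioning features rather than application code.

```python
from collections import defaultdict

# Hypothetical fact rows standing in for a (much larger) fact table.
fact_orders = [
    {"OrderKey": 1, "Year": 2014, "CustomerKey": 1, "TotalPrice": 120.0},
    {"OrderKey": 2, "Year": 2014, "CustomerKey": 2, "TotalPrice": 80.0},
    {"OrderKey": 3, "Year": 2015, "CustomerKey": 1, "TotalPrice": 50.0},
]


def partition_horizontally(rows, column):
    """Each partition holds complete rows that share one value of `column`."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[column]].append(row)
    return dict(partitions)


def partition_vertically(rows, key, column_groups):
    """Each partition holds every row, but only the key plus one group of columns."""
    return [
        [{col: row[col] for col in (key, *cols)} for row in rows]
        for cols in column_groups
    ]


by_year = partition_horizontally(fact_orders, "Year")
keys_part, measures_part = partition_vertically(
    fact_orders, "OrderKey", [["Year", "CustomerKey"], ["TotalPrice"]]
)
```

A query restricted to 2015 only needs to touch `by_year[2015]`, while the vertical partitions must repeat `OrderKey` so that the original rows can be reassembled by joining on it.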

17 Index Design
An index is useful and speeds up processing
- When a column is used in "searching/matching"
- Here, Country is used for searching in the WHERE clause
- So, indexing the "Country" column will make the query run faster!

    SELECT * FROM Customers WHERE Country = 'New Zealand'
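One way to see the effect of such an index is to ask the database for its query plan. The sketch below uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for the course's DBMS; the index name `idx_customers_country` and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID TEXT, Name TEXT, Country TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [("John1", "John", "New Zealand"), ("Mary1", "Mary", "Australia")],
)

# Index the column that appears in the WHERE clause.
conn.execute("CREATE INDEX idx_customers_country ON Customers (Country)")

query = "SELECT * FROM Customers WHERE Country = 'New Zealand'"

# Each plan row ends with a text description of the chosen access path.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
rows = conn.execute(query).fetchall()

for step in plan:
    print(step[-1])
```

With the index in place the plan should describe a search using `idx_customers_country` instead of a full scan of `Customers`; the exact wording of the plan text varies between SQLite versions.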

18 Index Design
What about queries in an OLAP database?

    SELECT p.ProductName, SUM(f.TotalPrice) AS [Total Revenue]
    FROM dimProducts p, dimCustomers c, factOrders f, dimTime t
    WHERE f.ProductKey = p.ProductKey
      AND f.CustomerKey = c.CustomerKey
      AND f.OrderDateKey = t.TimeKey
      AND c.Country = 'UK'
      AND t.QuaterOfYear = 1
      AND t.Year IN (1996, 1997, 1998)
    GROUP BY p.ProductName
    ORDER BY p.ProductName

From the above query, which column should be "indexed"?

19 OLAP Cubes

20 OLAP Cubes
An OLAP cube is an array of data understood in terms of its 0 or more dimensions.
You can make an OLAP cube from any DW schema
For example,
- A star schema with 5 dimensions to a cube with 3 dimensions
*from http://visibledata.wordpress.com/data/datacloud/datacube/

21 OLAP Cubes
What does a cube represent?
- Dimensions
- Data cell = The fact that relates to all dimensions

22 Cube Operations
With a cube, you can…
- Slice
- Dice
- Drill Down
- Roll up
- Pivot

23 Cube Operations : Slice
Slice operation
- To create a rectangular subset of a cube with one fewer dimension by choosing a single value for one of its dimensions
- The number of dimensions is reduced by one
  - E.g., from 3 dimensions to 2
From http://www.tutorialspoint.com/dwh/dwh_olap.htm
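A slice can be sketched in Python by modelling the cube as a dictionary from dimension tuples to a measure. The cube contents, the (year, country, product) axis order and the helper `slice_cube` are hypothetical choices made for this illustration.

```python
# Hypothetical 3-dimensional cube: (year, country, product) -> revenue.
cube = {
    (2014, "New Zealand", "Laptop"): 100,
    (2014, "Australia", "Laptop"): 150,
    (2015, "New Zealand", "Phone"): 70,
    (2015, "Australia", "Laptop"): 90,
}


def slice_cube(cells, axis, value):
    """Fix one dimension to a single value and drop that dimension from the keys."""
    return {
        key[:axis] + key[axis + 1:]: measure
        for key, measure in cells.items()
        if key[axis] == value
    }


# Slice on year = 2014: the result has only the country and product dimensions.
sales_2014 = slice_cube(cube, 0, 2014)
```

The keys of `sales_2014` are 2-tuples, which reflects the reduction from 3 dimensions to 2.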

24 Cube Operations : Dice
Dice operation
- To produce a subcube by allowing the analyst to pick specific values of multiple dimensions
- No dimension is reduced
From http://www.tutorialspoint.com/dwh/dwh_olap.htm
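The dice operation can be sketched in the same dictionary-based style. The cube contents and the helper `dice_cube` are hypothetical; the `allowed` argument maps an axis position to the set of values the analyst picked for that dimension.

```python
# Hypothetical 3-dimensional cube: (year, country, product) -> revenue.
cube = {
    (2014, "New Zealand", "Laptop"): 100,
    (2014, "Australia", "Laptop"): 150,
    (2015, "New Zealand", "Phone"): 70,
    (2015, "Australia", "Laptop"): 90,
}


def dice_cube(cells, allowed):
    """Keep the cells whose value on every listed axis is among the picked values."""
    return {
        key: measure
        for key, measure in cells.items()
        if all(key[axis] in values for axis, values in allowed.items())
    }


# Pick two years and one country; the product dimension is left unrestricted.
subcube = dice_cube(cube, {0: {2014, 2015}, 1: {"Australia"}})
```

Unlike a slice, the keys of `subcube` keep all three positions, so no dimension is removed.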

25 Cube Operations : Drill Down
Drill down operation
- To navigate among levels of data ranging from the summarized to the more detailed
- No dimension is reduced
From http://www.tutorialspoint.com/dwh/dwh_olap.htm
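Drilling down relies on a hierarchy such as Country > City. In the rough Python sketch below, the city-level cells, the `city_to_country` mapping and the helper `drill_down` are hypothetical; the sketch assumes the detailed data is available and simply selects the finer-level cells that sit under one summarized member.

```python
# Hypothetical hierarchy level mapping: each city belongs to one country.
city_to_country = {
    "Auckland": "New Zealand",
    "Wellington": "New Zealand",
    "Sydney": "Australia",
}

# Hypothetical detailed cells: (year, city) -> revenue.
sales_by_city = {
    (2014, "Auckland"): 60,
    (2014, "Wellington"): 40,
    (2014, "Sydney"): 150,
    (2015, "Auckland"): 70,
}


def drill_down(detail_cells, axis, parent_of, parent_value):
    """Return the detailed cells that lie beneath one member of the coarser level."""
    return {
        key: measure
        for key, measure in detail_cells.items()
        if parent_of[key[axis]] == parent_value
    }


# Drill down from the 'New Zealand' total to its individual cities.
nz_detail = drill_down(sales_by_city, 1, city_to_country, "New Zealand")
```

The result still has a year and a location dimension; only the level of the location dimension has changed from country to city.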

26 Cube Operations : Roll up
Roll up operation
- To summarize the data along a dimension (by aggregation)
- Similar to GROUP BY
From http://www.tutorialspoint.com/dwh/dwh_olap.htm
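A roll up along a Country > City hierarchy can be sketched as a grouped sum in Python. The cells, the `city_to_country` mapping and the helper `roll_up` are hypothetical, and summation is assumed as the aggregation.

```python
from collections import defaultdict

# Hypothetical hierarchy level mapping: each city belongs to one country.
city_to_country = {
    "Auckland": "New Zealand",
    "Wellington": "New Zealand",
    "Sydney": "Australia",
}

# Hypothetical detailed cells: (year, city) -> revenue.
sales_by_city = {
    (2014, "Auckland"): 60,
    (2014, "Wellington"): 40,
    (2014, "Sydney"): 150,
    (2015, "Auckland"): 70,
}


def roll_up(cells, axis, parent_of):
    """Replace each member on `axis` by its parent and sum the measures per group."""
    totals = defaultdict(int)
    for key, measure in cells.items():
        rolled_key = key[:axis] + (parent_of[key[axis]],) + key[axis + 1:]
        totals[rolled_key] += measure
    return dict(totals)


sales_by_country = roll_up(sales_by_city, 1, city_to_country)
```

This corresponds to grouping by year and country and summing revenue, which is why the operation resembles GROUP BY.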

27 Cube Operations : Pivot
Pivot operation
- To rotate the cube in space to see its various faces
From http://www.tutorialspoint.com/dwh/dwh_olap.htm
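In the dictionary-based view of a cube, a pivot amounts to reordering the axes without touching the measures. The cube contents and the helper `pivot` below are hypothetical and only meant as a sketch of that idea.

```python
# Hypothetical 3-dimensional cube: (year, country, product) -> revenue.
cube = {
    (2014, "New Zealand", "Laptop"): 100,
    (2014, "Australia", "Laptop"): 150,
    (2015, "New Zealand", "Phone"): 70,
    (2015, "Australia", "Laptop"): 90,
}


def pivot(cells, order):
    """Reorder the dimensions of every key; the measures are unchanged."""
    return {tuple(key[i] for i in order): measure for key, measure in cells.items()}


# Rotate the cube so that country comes first, then year, then product.
by_country_first = pivot(cube, (1, 0, 2))
```

The same cells are present before and after; only the order in which the dimensions are presented has changed.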

28 A friendly reminder "Assignment 2"
Submission
- Both Phase-1 and Phase-2 (separate submissions)
- Due Monday 26 October at 9:30am
- Next week has a workshop
- Marking of Phase-2 does NOT rely on the DQLog produced in Phase-1
Interview sessions
- Will be conducted case by case (not all students are required)
- Maximum penalty for "cheating"

29 What's next?
Assignment 2 Q/A
Continue working on worksheets
- Last chance to work and submit…

30 Questions?

