Copyright© 2014, Sira Yongchareon Department of Computing, Faculty of Creative Industries and Business Lecturer : Dr. Sira Yongchareon ISCG 6425 Data Warehousing.


1 ISCG 6425 Data Warehousing (DW)
Week 7: Validating DW Data
Lecturer: Dr. Sira Yongchareon
Department of Computing, Faculty of Creative Industries and Business
Copyright © 2014, Sira Yongchareon

2 Outline
ISCG6425 Data Warehousing, Semester 1, 2014 (by Sira Yongchareon), Department of Computing, Faculty of Creative Industries and Business
- Issues in DW Data
- Data validation techniques
  - Validating data in a Dimension table
  - Validating data in a Fact table

3 Issues in DW Data
The DW schema differs from the OLTP schema. The DW tables, their PKs, FKs, data, and data types also differ!
[Figure: normalized OLTP schema vs. DW schema]

4 Issues in DW Data
You designed the right DW schema. Good!
You can check that your dimension and fact tables are OK by confirming that there are:
- No problems with PKs (surrogate keys with auto-increment)
- No problems with FKs (links to dimension tables, especially dimTime)
- No problems with data types
If so, that's GOOD.
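
One way to review column data types in SQL Server is to query the standard INFORMATION_SCHEMA.COLUMNS view. The following is only a sketch: dimCustomers, dimProducts and dimTime are named on the slides, while the fact table name factOrders is a hypothetical placeholder that should be replaced with the real one.

```sql
-- List the columns and data types of the DW tables for a manual review.
-- Assumption: the fact table is called factOrders (hypothetical name).
SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE, IS_NULLABLE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME IN ('dimCustomers', 'dimProducts', 'dimTime', 'factOrders')
ORDER BY TABLE_NAME, ORDINAL_POSITION;
```

The result can be compared by eye against the schema design. It shows only types and nullability, not whether the keys are populated correctly.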

5 Issues in DW Data
You know ETL (Extract-Transform-Load):
[Diagram: OLTP databases Northwind1 (for America) and Northwind2 (for Europe) -> ETL -> OLAP database (for all countries)]
What if your ETL has "bugs"?

6 Issues in DW Data
How do you know that your DW contains correct and complete data?
1. Correctness: no wrong data (miscalculated, mistyped, etc.)
2. Completeness: no missing data
Recall that we have tried to check DW data by comparing the number of rows:
- SELECT COUNT(*) ...
This is still not enough: row counts can match even when ETL bugs have loaded wrong values.
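
As a reminder of what the row-count comparison looks like, here is a minimal sketch for the customer dimension. It assumes that every customer row in northwind1 and northwind2 is loaded as exactly one row in dimCustomers; if the same customer can appear in both sources, the two numbers will legitimately differ.

```sql
-- Compare the number of source customers with the number of DW customers.
-- Assumption: each source customer becomes exactly one dimCustomers row.
SELECT
    (SELECT COUNT(*) FROM northwind1.dbo.Customers)
  + (SELECT COUNT(*) FROM northwind2.dbo.Customers) AS SourceRows,
    (SELECT COUNT(*) FROM dimCustomers)             AS DwRows;
```

Equal counts only show that nothing is obviously missing. A row with a wrong CustomerID or a wrong category still counts as one row.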

7 DW Data validation
Our one simple goal:
- Identify whether any record in the DW is WRONG
Validation techniques:
- Do random checking
  - For example, pick any order and check that it has the correct product, customer, etc.
  - A simple method, normally done manually
  - Sometimes we miss errors (because we only check a sample)
- Do complete checking
  - Write SQL to do the job; it is hard work the first time
  - However, it checks every record, so no error that the query tests for is missed
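
For random checking, a row can be picked at random with SQL instead of by hand. This sketch uses the T-SQL idiom of ordering by NEWID(); it samples dimCustomers because that table is named on the slides, and the same pattern would apply to a fact table.

```sql
-- Pick one DW customer at random for a manual spot check.
SELECT TOP (1) *
FROM dimCustomers
ORDER BY NEWID();
```

The returned row is then compared manually with the matching row in the source database.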

8 DW Data validation
We aim at NO ERRORS at all, and we must design SQL to do the job for us!
- Sometimes, or often, the SQL can be complex
- For example: how do we check that every customer in dimCustomers has a correct CustomerID?
SQL Question: Write an SQL statement to list all customers that have a wrong CustomerID.

9 DW Data validation: Dimension tables
Does each dimension table contain correct and complete data from its source table? Especially if a dimension is de-normalized!
For example, each product must correctly belong to its category!

10 DW Data validation: Dimension tables
SQL Question: Write an SQL statement to list all customers that have a wrong CustomerID.
This question requires a sub-query.
SQL Answer:
    SELECT *
    FROM dimCustomers dc
    WHERE dc.CustomerID NOT IN (
        -- CustomerIDs that exist in either source database
        SELECT c.CustomerID
        FROM northwind1.dbo.Customers c
        WHERE dc.CustomerID = c.CustomerID
        UNION
        SELECT c.CustomerID
        FROM northwind2.dbo.Customers c
        WHERE dc.CustomerID = c.CustomerID
    )
This query will return NOTHING if all customers have their correct CustomerID.

11 DW Data validation: Dimension tables
Let's try it: update any customer in the DW with a wrong CustomerID, then run the SQL again to check that it detects the error!
1. Change a CustomerID:
    SELECT * FROM dimCustomers WHERE CustomerKey = 1;  -- To see the original CustomerID
    UPDATE dimCustomers SET CustomerID = 'XYZ' WHERE CustomerKey = 1;  -- Now change it!
2. Run the validation query again:
    SELECT *
    FROM dimCustomers dc
    WHERE dc.CustomerID NOT IN (
        SELECT c.CustomerID
        FROM northwind1.dbo.Customers c
        WHERE dc.CustomerID = c.CustomerID
        UNION
        SELECT c.CustomerID
        FROM northwind2.dbo.Customers c
        WHERE dc.CustomerID = c.CustomerID
    )
This query will now return the customer (CustomerKey = 1) who has the wrong CustomerID!

12 DW Data validation: Dimension tables
A sub-query is very important for data validation!
You need to understand how to write sub-queries in a SELECT statement:
1. To check that all data in the DW are correct (correctness)
2. To check that data in the DW are NOT missing (completeness)
Some highly recommended links are on Moodle. Please follow them and do your own study!
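
The customer query on the earlier slides looks from the DW towards the sources (correctness). A completeness check runs in the opposite direction. The sketch below is one possible formulation using NOT EXISTS; it assumes that CustomerID is the business key shared by dimCustomers and both source tables.

```sql
-- Completeness: source customers that are missing from the DW.
-- Returns nothing if every source customer has a row in dimCustomers.
SELECT c.CustomerID, 'northwind1' AS SourceDb
FROM northwind1.dbo.Customers c
WHERE NOT EXISTS (
    SELECT 1 FROM dimCustomers dc WHERE dc.CustomerID = c.CustomerID
)
UNION ALL
SELECT c.CustomerID, 'northwind2' AS SourceDb
FROM northwind2.dbo.Customers c
WHERE NOT EXISTS (
    SELECT 1 FROM dimCustomers dc WHERE dc.CustomerID = c.CustomerID
);
```

The SourceDb column is added only so that each missing row shows which database it came from.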

13 DW Data validation: Dimension tables
Now, try a more difficult one:
- How do we check that every product in dimProducts has a correct category?
SQL Question: Write an SQL statement to list all products that belong to wrong categories.

14 DW Data validation: Dimension tables
SQL Question: Write an SQL statement to list all products that belong to wrong categories.
This question requires a sub-query + JOIN!
SQL Answer:
    SELECT *
    FROM dimProducts dp
    WHERE dp.CategoryName NOT IN (
        -- Category name of the same product in either source database
        SELECT c.CategoryName
        FROM northwind1.dbo.Products p
        JOIN northwind1.dbo.Categories c ON p.CategoryID = c.CategoryID
        WHERE dp.ProductID = p.ProductID
        UNION
        SELECT c.CategoryName
        FROM northwind2.dbo.Products p
        JOIN northwind2.dbo.Categories c ON p.CategoryID = c.CategoryID
        WHERE dp.ProductID = p.ProductID
    )
This query will return NOTHING if all products have their correct CategoryName.

15 DW Data validation: Dimension tables
Let's try it: update any product in the DW with a wrong CategoryName, then run this SQL again to check that it detects the error!
    SELECT *
    FROM dimProducts dp
    WHERE dp.CategoryName NOT IN (
        SELECT c.CategoryName
        FROM northwind1.dbo.Products p
        JOIN northwind1.dbo.Categories c ON p.CategoryID = c.CategoryID
        WHERE dp.ProductID = p.ProductID
        UNION
        SELECT c.CategoryName
        FROM northwind2.dbo.Products p
        JOIN northwind2.dbo.Categories c ON p.CategoryID = c.CategoryID
        WHERE dp.ProductID = p.ProductID
    )
This query will return all products that have a wrong CategoryName!

16 DW Data validation: Fact tables
A fact table is always de-normalized, so you can expect the SQL to be complicated.
- A fact table has a huge amount of data that comes from a super "BIG JOIN", loaded by using the MERGE command
Think of it as being asked to find a missing flight in the Indian Ocean!

17 DW Data validation: Fact tables
Does the fact table contain correct and complete data from its source tables?
1. Correct FKs and PK
   - Each order must have the correct Product, Customer, OrderDate and RequiredDate
2. Correct attribute data
   - Each order must have the correct OrderID, UnitPrice, Qty, Discount
   - Each order must have the correct TotalPrice

18 DW Data validation: Fact tables
1. Validating all FKs and the PK
- Validate the fact data against its original data in the OLTP databases. For example:
  - Customers: check that every order belongs to the correct customer
  - Products: check that every order contains the correct products
  - Time: check that every order has the correct OrderDate and RequiredDate
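
The customer check might be sketched as follows. The slides do not show the fact table, so its name (factOrders) and its columns OrderID and CustomerKey are hypothetical. The sketch also assumes that OrderID values do not overlap between northwind1 and northwind2.

```sql
-- Orders whose DW customer does not match the customer recorded in the source.
-- Hypothetical names: factOrders, factOrders.OrderID, factOrders.CustomerKey.
-- Assumption: OrderID values are unique across northwind1 and northwind2.
SELECT DISTINCT f.OrderID, dc.CustomerID AS DwCustomerID
FROM factOrders f
LEFT JOIN dimCustomers dc ON dc.CustomerKey = f.CustomerKey
WHERE NOT EXISTS (
    SELECT 1 FROM northwind1.dbo.Orders o
    WHERE o.OrderID = f.OrderID AND o.CustomerID = dc.CustomerID
)
AND NOT EXISTS (
    SELECT 1 FROM northwind2.dbo.Orders o
    WHERE o.OrderID = f.OrderID AND o.CustomerID = dc.CustomerID
);
```

The LEFT JOIN keeps fact rows whose CustomerKey matches no dimension row, so those are reported as well. The product and time checks would follow the same pattern with their own dimension tables.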

19 DW Data validation: Fact tables
2. Validating attribute data
- Validate each attribute (including pre-calculated attributes)
  - OrderID: from the Orders and [Order Details] tables
  - UnitPrice: from the [Order Details] table
  - Qty: from the [Order Details] table
  - Discount: from the [Order Details] table
  - TotalPrice: computed as Quantity x UnitPrice from the [Order Details] table
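
A possible shape for the attribute check is sketched below, shown for northwind1 only; the same NOT EXISTS block would be repeated for northwind2. The fact table name factOrders, its column names, and dimProducts.ProductKey are hypothetical, and the exact comparison on Discount assumes the DW column has the same data type as the source column.

```sql
-- Fact rows with no matching [Order Details] row in northwind1.
-- Hypothetical names: factOrders (OrderID, ProductKey, UnitPrice, Qty,
-- Discount, TotalPrice) and dimProducts.ProductKey.
SELECT f.OrderID, dp.ProductID, f.UnitPrice, f.Qty, f.Discount, f.TotalPrice
FROM factOrders f
JOIN dimProducts dp ON dp.ProductKey = f.ProductKey
WHERE NOT EXISTS (
    SELECT 1
    FROM northwind1.dbo.[Order Details] od
    WHERE od.OrderID   = f.OrderID
      AND od.ProductID = dp.ProductID
      AND od.UnitPrice = f.UnitPrice
      AND od.Quantity  = f.Qty
      AND od.Discount  = f.Discount
      AND od.Quantity * od.UnitPrice = f.TotalPrice  -- pre-calculated attribute
);
```

Run against a single source, this also lists every fact row that came from the other database, which is why the second NOT EXISTS block is needed in practice.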

20 DW Data validation: Summary
- Validating data in each dimension table
- Validating data in a fact table
Let's try it now. Have fun!

