McGraw-Hill/Irwin. Copyright © 2007 by The McGraw-Hill Companies, Inc. All rights reserved.
Data Warehouse: additional slides
Source: Michael V. Mannino, Database: Design, Application Development & Administration, Third Edition, McGraw-Hill, 2007
16-2 Data Comparison
16-3 Applications
16-4 Example: Star Schema
16-5 Example: Input table records
Sales Table: SalesNo, SalesDollar, TimeNo, StoreId
TimeDim Table: TimeNo, TimeMonth, TimeYear
Store Table: StoreId, StoreState, StoreNation, StoreZip
1001MNUSA MNUSA MNUSA OHUSA OHUSA OHUSA80112
16-6 CUBE Operator Example
SELECT StoreZip, TimeMonth, SUM(SalesDollar) AS SumSales
  FROM Sales, Store, TimeDim
  WHERE Sales.StoreId = Store.StoreId
    AND Sales.TimeNo = TimeDim.TimeNo
    AND (StoreNation = 'USA' OR StoreNation = 'Canada')
    AND TimeYear = 2005
  GROUP BY CUBE (StoreZip, TimeMonth);
16-7 CUBE Operator Example
Output of query with CUBE operator
StoreZip  TimeMonth  SumSales
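To make the CUBE result concrete, the following minimal Python sketch computes the same kind of subtotals by hand: one grouping per subset of the grouping columns, with None standing in for the NULL that marks a rolled-up column. The rows and ZIP codes are made-up illustration data, not the slide's records, and `cube` is a hypothetical helper name.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical, already-joined fact rows: (StoreZip, TimeMonth, SalesDollar).
ROWS = [
    ("80111", 1, 100.0),
    ("80111", 2, 150.0),
    ("80112", 1, 200.0),
    ("80112", 2, 250.0),
]

def cube(rows, n_dims):
    """Sum the measure for every subset of the first n_dims columns.

    A column that is rolled up in a given grouping is reported as None,
    mirroring the NULL placeholders in CUBE output.
    """
    totals = defaultdict(float)
    for size in range(n_dims + 1):
        for kept in combinations(range(n_dims), size):
            for row in rows:
                key = tuple(row[i] if i in kept else None for i in range(n_dims))
                totals[key] += row[n_dims]
    return dict(totals)

result = cube(ROWS, 2)
for key, total in sorted(result.items(), key=lambda kv: str(kv[0])):
    print(key, total)
```

With two grouping columns there are four groupings: (StoreZip, TimeMonth), (StoreZip), (TimeMonth), and the grand total.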
16-8 ROLLUP Operator Example
SELECT StoreZip, TimeMonth, SUM(SalesDollar) AS SumSales
  FROM Sales, Store, TimeDim
  WHERE Sales.StoreId = Store.StoreId
    AND Sales.TimeNo = TimeDim.TimeNo
    AND (StoreNation = 'USA' OR StoreNation = 'Canada')
    AND TimeYear = 2005
  GROUP BY ROLLUP (StoreZip, TimeMonth);
16-9 ROLLUP Operator Example
Output of query with ROLLUP operator
StoreZip  TimeMonth  SumSales
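ROLLUP differs from CUBE in that it only rolls up from right to left, so it produces a grouping for each prefix of the column list rather than for every subset. The sketch below illustrates this on made-up rows (not the slide's data); `rollup` is a hypothetical helper name and None stands in for NULL.

```python
from collections import defaultdict

# Hypothetical, already-joined fact rows: (StoreZip, TimeMonth, SalesDollar).
ROWS = [
    ("80111", 1, 100.0),
    ("80111", 2, 150.0),
    ("80112", 1, 200.0),
    ("80112", 2, 250.0),
]

def rollup(rows, n_dims):
    """Sum the measure for each prefix of the first n_dims columns.

    depth = n_dims gives the finest grouping, depth = 0 the grand total.
    """
    totals = defaultdict(float)
    for depth in range(n_dims, -1, -1):
        for row in rows:
            key = tuple(row[i] if i < depth else None for i in range(n_dims))
            totals[key] += row[n_dims]
    return dict(totals)

result = rollup(ROWS, 2)
for key, total in sorted(result.items(), key=lambda kv: str(kv[0])):
    print(key, total)
```

Unlike the CUBE result, there is no subtotal by TimeMonth alone, because (TimeMonth) is not a prefix of (StoreZip, TimeMonth).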
16-10 GROUPING SETS Example
SELECT StoreZip, TimeMonth, SUM(SalesDollar) AS SumSales
  FROM Sales, Store, TimeDim
  WHERE Sales.StoreId = Store.StoreId
    AND Sales.TimeNo = TimeDim.TimeNo
    AND (StoreNation = 'USA' OR StoreNation = 'Canada')
    AND TimeYear = 2005
  GROUP BY GROUPING SETS((StoreZip, TimeMonth), StoreZip, TimeMonth, ());
16-11 GROUPING SETS Example
Output of query with GROUPING SETS operator
StoreZip  TimeMonth  SumSales
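The four grouping sets listed in this example are exactly the groupings that CUBE (StoreZip, TimeMonth) generates, so the result contains the same subtotals. One way to see what GROUPING SETS means is to write it as a UNION ALL of ordinary GROUP BY queries, one per grouping set. The sketch below does this with Python's built-in sqlite3 module (SQLite does not provide GROUPING SETS). The JoinedSales table and its rows are assumptions made for illustration; they stand in for the already-joined Sales, Store, and TimeDim tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical pre-joined table standing in for Sales, Store, and TimeDim.
conn.execute(
    "CREATE TABLE JoinedSales (StoreZip TEXT, TimeMonth INTEGER, SalesDollar REAL)"
)
conn.executemany(
    "INSERT INTO JoinedSales VALUES (?, ?, ?)",
    [("80111", 1, 100.0), ("80111", 2, 150.0),
     ("80112", 1, 200.0), ("80112", 2, 250.0)],
)

# One SELECT per grouping set; NULL marks a column that is rolled up.
QUERY = """
SELECT StoreZip, TimeMonth, SUM(SalesDollar) AS SumSales
  FROM JoinedSales GROUP BY StoreZip, TimeMonth
UNION ALL
SELECT StoreZip, NULL, SUM(SalesDollar)
  FROM JoinedSales GROUP BY StoreZip
UNION ALL
SELECT NULL, TimeMonth, SUM(SalesDollar)
  FROM JoinedSales GROUP BY TimeMonth
UNION ALL
SELECT NULL, NULL, SUM(SalesDollar)
  FROM JoinedSales
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Dropping one of the four SELECT branches corresponds to removing that grouping set from the GROUPING SETS list, which is the flexibility that CUBE and ROLLUP do not offer.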
16-12 ROLAP Techniques
Bitmap join indexes
Star join optimization
Query rewriting
Summary storage advisors
Parallel query execution
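As a rough illustration of the first technique, a bitmap join index maps each value of a dimension attribute to a bitmap over the fact table rows that join to it, so a condition on dimension columns can be answered with bitwise operations instead of a join at query time. The sketch below uses Python integers as bitmaps. All rows and the helper name `build_bitmap_join_index` are assumptions for illustration; a real DBMS stores compressed bitmaps and maintains them as the tables change.

```python
from collections import defaultdict

# Hypothetical dimension lookups and fact rows.
STORE = {1: "MN", 2: "OH"}   # StoreId -> StoreState
TIME = {10: 1, 11: 2}        # TimeNo -> TimeMonth
SALES = [                    # (SalesNo, SalesDollar, TimeNo, StoreId)
    (1000, 100.0, 10, 1),
    (1001, 150.0, 11, 1),
    (1002, 200.0, 10, 2),
    (1003, 250.0, 11, 2),
]

def build_bitmap_join_index(fact_rows, fk_pos, dim_lookup):
    """Map each dimension attribute value to a bitmap of fact row positions."""
    index = defaultdict(int)
    for pos, row in enumerate(fact_rows):
        index[dim_lookup[row[fk_pos]]] |= 1 << pos  # set the bit for this row
    return dict(index)

state_idx = build_bitmap_join_index(SALES, 3, STORE)
month_idx = build_bitmap_join_index(SALES, 2, TIME)

# StoreState = 'MN' AND TimeMonth = 2, evaluated as a bitwise AND.
hits = state_idx["MN"] & month_idx[2]
matching = [row for pos, row in enumerate(SALES) if (hits >> pos) & 1]
total = sum(row[1] for row in matching)
print(matching, total)
```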
16-13 Maintenance Workflow
16-14 Data Quality Problems
Multiple identifiers
Multiple field names
Different units
Missing values
Orphaned values
Multipurpose fields
Conflicting data
Different update times
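Two of these problems, missing values and orphaned values, are easy to detect mechanically before loading. The sketch below shows one possible check on made-up rows; the function names and the sample data are assumptions, and the other problems in the list generally need source-specific rules.

```python
# Hypothetical parent keys from the Store table and incoming Sales rows.
STORE_IDS = {1, 2}
SALES = [
    {"SalesNo": 1000, "SalesDollar": 100.0, "StoreId": 1},
    {"SalesNo": 1001, "SalesDollar": None, "StoreId": 2},  # missing value
    {"SalesNo": 1002, "SalesDollar": 75.0, "StoreId": 9},  # orphaned value
]

def find_missing(rows, field):
    """Return the SalesNo of every row whose field has no value."""
    return [r["SalesNo"] for r in rows if r[field] is None]

def find_orphans(rows, fk_field, parent_keys):
    """Return the SalesNo of every row whose foreign key has no parent row."""
    return [r["SalesNo"] for r in rows if r[fk_field] not in parent_keys]

missing = find_missing(SALES, "SalesDollar")
orphans = find_orphans(SALES, "StoreId", STORE_IDS)
print("missing:", missing, "orphans:", orphans)
```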
16-15 ETL Tools
Extraction, Transformation, and Loading
Specification based
Eliminate custom coding
Third-party and DBMS-based tools
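The idea behind a specification-based tool can be sketched as follows: the mapping from source fields to warehouse columns is declared as data, and one generic routine applies it, so adding a source means writing a new specification rather than new code. The source field names, the SPEC layout, and `run_etl` are all hypothetical; commercial ETL tools use their own specification formats.

```python
# Hypothetical specification: target column -> (source field, transform).
SPEC = {
    "StoreState": ("state", str.upper),
    "StoreZip": ("zip_code", str.strip),
    "SalesDollar": ("amount", float),
}

def run_etl(source_rows, spec):
    """Extract the named source fields, transform them, and return load-ready rows."""
    return [
        {target: transform(row[source])
         for target, (source, transform) in spec.items()}
        for row in source_rows
    ]

# Made-up source record with inconsistent case, padding, and a string amount.
SOURCE = [{"state": "mn", "zip_code": " 80111 ", "amount": "100.50"}]
loaded = run_etl(SOURCE, SPEC)
print(loaded)
```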
16-16 Refresh Processing