Populating Data Warehouse Structures

2 Populating Data Warehouse Structures

3 Examining the Star Schema
[Diagram: the Sales star schema, a central fact table surrounded by dimension tables]

4 Implementing the Star Schema
1. Extract Data From Multiple Sources
2. Integrate, Transform, and Restructure Data
3. Load Data Into Dimension Tables and Fact Tables

5 The Star Schema Data Load
[Diagram: DTS moves data from heterogeneous data sources into a staging area and then into the Polaris data warehouse]
Heterogeneous Data Sources: NorthwindOLTP, External Files, Internal Files, Financial
Staging Area
Polaris Data Warehouse: InventoryStar, SalesStar
Stages: Extracting Data From Heterogeneous Sources; Transforming Data; Loading the Star Schema

6 Verifying the Dimension Source Data
Verifying Accuracy of Source Data
    Integrating data from multiple sources
    Applying business rules
    Checking structural requirements
Managing Invalid Data
    Rejecting invalid data
    Saving invalid data to a log
Correcting Invalid Data
    Transforming data
    Reassigning data values
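
The reject, log, and correct options above can be sketched roughly as follows. This is only an illustration: the field names (`buyer_name`, `reg_id`) are borrowed from the next slide, and the business rule that `reg_id` must be a positive integer is an assumption, not something the presentation states.

```python
def validate_rows(rows):
    """Split source rows into cleaned valid rows and a reject log."""
    valid, rejected = [], []
    for row in rows:
        name = (row.get("buyer_name") or "").strip()
        reg_id = str(row.get("reg_id") or "").strip()
        if not name:
            # Structural requirement (assumed): every buyer needs a name.
            rejected.append((row, "missing buyer_name"))
        elif not reg_id.isdigit() or int(reg_id) <= 0:
            # Business rule (assumed): reg_id must be a positive integer.
            rejected.append((row, "reg_id is not a positive integer"))
        else:
            valid.append({"buyer_name": name, "reg_id": int(reg_id)})
    return valid, rejected


source = [
    {"buyer_name": "Barr, Adam", "reg_id": "2"},
    {"buyer_name": "", "reg_id": "4"},
    {"buyer_name": "Chai, Sean", "reg_id": "IV"},
]
valid, rejected = validate_rows(source)
```

Rows in `rejected` keep the original values next to a reason, so they can be written to a log and corrected later instead of being silently dropped.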

7 Dimension Data Load Examples
Example 1: buyer_name is split into buyer_first and buyer_last
    Source                        After DTS
    buyer_name      reg_id        buyer_first   buyer_last   reg_id
    Barr, Adam      2             Adam          Barr         2
    Chai, Sean      4             Sean          Chai         4
    OMelia, Erin    6             Erin          OMelia       6
    ...                           ...
Example 2: a missing buyer_code is replaced with U999
    Source                                After DTS
    buyer_code   buyer_last   reg_id      buyer_code   buyer_last   reg_id
    (blank)      Barr         2           U999         Barr         2
    A123         Chai         4           A123         Chai         4
    B456         OMelia       6           B456         OMelia       6
    ...                                   ...
Example 3: two sources with different reg_id encodings are combined
    Source A                      Source B
    buyer_name      reg_id        buyer_name      reg_id
    Barr, Adam      II            Smith, Jane     2
    Chai, Sean      IV            Paper, Anne     4
    After DTS
    buyer_name      reg_id
    Barr, Adam      2
    Chai, Sean      4
    Smith, Jane     2
    Paper, Anne     4
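
The three kinds of transformation on this slide could look something like the sketch below. The function names are hypothetical, and the reading that `U999` is a default substituted for a missing `buyer_code` is an assumption drawn from the slide's sample values.

```python
# Lookup used to bring Roman-numeral region IDs in line with numeric ones.
ROMAN_TO_INT = {"I": 1, "II": 2, "III": 3, "IV": 4, "V": 5, "VI": 6}

# Assumed default code for buyers that arrive without one.
DEFAULT_BUYER_CODE = "U999"


def split_buyer_name(buyer_name):
    """Turn 'Last, First' into a (first, last) pair."""
    last, first = [part.strip() for part in buyer_name.split(",", 1)]
    return first, last


def fill_buyer_code(buyer_code):
    """Reassign an empty or missing buyer code to the default value."""
    return buyer_code if buyer_code else DEFAULT_BUYER_CODE


def normalize_reg_id(reg_id):
    """Accept either a Roman numeral or a number and return an int."""
    text = str(reg_id).strip()
    return ROMAN_TO_INT[text] if text in ROMAN_TO_INT else int(text)
```

For instance, `split_buyer_name("Barr, Adam")` gives `("Adam", "Barr")`, and `normalize_reg_id("IV")` and `normalize_reg_id(4)` both give `4`, which is what lets the two sources share one dimension table.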

8 Maintaining Integrity of the Dimension
Assigning a Surrogate Key to Each Record
    Defines the dimension's primary key
    Relates to the foreign key fields of the fact table
Loading One Record Per Application Key
    Maintains uniqueness in the dimension
    Depends on how you manage changing dimension data
    Maintains integrity of the fact table
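
A minimal sketch of surrogate key assignment, assuming one record per application key (as with a Type 1 approach). The class name and the sample application keys are illustrative only; a real load would more likely rely on an identity column in the database.

```python
import itertools


class SurrogateKeyAssigner:
    """Hand out one integer surrogate key per application key."""

    def __init__(self, start=1):
        self._next_key = itertools.count(start)
        self._by_app_key = {}

    def key_for(self, app_key):
        # A repeated application key gets the key it was given the first
        # time, which keeps the dimension at one record per application key.
        if app_key not in self._by_app_key:
            self._by_app_key[app_key] = next(self._next_key)
        return self._by_app_key[app_key]


keys = SurrogateKeyAssigner()
first = keys.key_for("ALFI")
second = keys.key_for("VINET")
repeat = keys.key_for("ALFI")
```

Under a Type 2 approach this one-to-one mapping no longer holds, because each new version of a record receives its own surrogate key.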

9 Managing Changing Dimension Data
Dimensions with Changing Column Values
    Inserts of new data
    Updates of existing data
Slowly-Changing Dimension Design Solutions
    Type 1: Overwrite the dimension record
    Type 2: Write another dimension record
    Type 3: Add attributes to the dimension record

10 Type 1: Overwriting the Dimension Record
Existing record is changed
Product Dimension
            product key   product name   product size   product package   product dept   product cat   product subcat   ...
    Before  001           Rice Puffs     10 oz.         Bag               Grocery        Dry Goods     Snacks           ...
    After   001           Rice Puffs     12 oz.         Bag               Grocery        Dry Goods     Snacks           ...
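
As a rough illustration of a Type 1 change, assuming the dimension is held as a dictionary keyed by product key (the function name and the in-memory layout are hypothetical):

```python
def apply_type1(dimension, product_key, changes):
    """Overwrite attributes in place; the previous values are lost."""
    dimension[product_key].update(changes)


product_dim = {
    "001": {"product_name": "Rice Puffs", "product_size": "10 oz."},
}
apply_type1(product_dim, "001", {"product_size": "12 oz."})
```

The dimension still holds a single record for key 001, and nothing records that the size was ever 10 oz.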

11 Type 2: Writing Another Dimension Record
Adds a new record
Product Dimension
            product key   product name   product size   product package   product dept   product cat   product subcat   effective_date   ...
    Before  001           Rice Puffs     10 oz.         Bag               Grocery        Dry Goods     Snacks           05-01-1995       ...
    After   001           Rice Puffs     10 oz.         Bag               Grocery        Dry Goods     Snacks           05-01-1995       ...
            731           Rice Puffs     12 oz.         Bag               Grocery        Dry Goods     Snacks           10-15-1998       ...
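
A Type 2 change might be sketched as below. The `app_key` column and its value are assumptions added so the two versions can be tied together (the slide shows only the surrogate product key), and the dates are read as month-day-year.

```python
from datetime import date


def apply_type2(dimension, app_key, changes, effective_date, new_key):
    """Append a new version of a record; earlier versions are left untouched."""
    versions = [row for row in dimension if row["app_key"] == app_key]
    current = max(versions, key=lambda row: row["effective_date"])
    # Copy the latest version, apply the changes, and give it a new surrogate key.
    new_row = {**current, **changes,
               "product_key": new_key, "effective_date": effective_date}
    dimension.append(new_row)
    return new_row


product_dim = [{
    "product_key": "001", "app_key": "RICE-PUFFS", "product_name": "Rice Puffs",
    "product_size": "10 oz.", "effective_date": date(1995, 5, 1),
}]
apply_type2(product_dim, "RICE-PUFFS", {"product_size": "12 oz."},
            date(1998, 10, 15), new_key="731")
```

Fact rows loaded before the change keep pointing at key 001, while later rows use key 731, so history is preserved at the cost of a larger dimension.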

12 Type 3: Adding Attributes in the Dimension Record
Additional information is stored in an existing record
Product Dimension
    Column                            Before        After
    product key                       001           001
    product name                      Rice Puffs    Rice Puffs
    product size                      10 oz.        12 oz.
    product package                   Bag           Bag
    product dept                      Grocery       Grocery
    product cat                       Dry Goods     Dry Goods
    product subcat                    Snacks        Snacks
    current product size date         05-01-1995    10-15-1998
    previous product size             11 oz.        10 oz.
    previous product size date        03-20-1994    05-01-1995
    2nd previous product size         (null)        11 oz.
    2nd previous product size date    ...           03-20-1994
    ...
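
A Type 3 change can be pictured as shifting each value one history slot back inside the same record. The sketch below uses shortened, hypothetical column names and keeps dates as the strings shown on the slide.

```python
def apply_type3(record, new_size, new_date):
    """Shift the size history back one slot, then store the new current size."""
    record["second_previous_size"] = record["previous_size"]
    record["second_previous_size_date"] = record["previous_size_date"]
    record["previous_size"] = record["product_size"]
    record["previous_size_date"] = record["current_size_date"]
    record["product_size"] = new_size
    record["current_size_date"] = new_date
    return record


rice_puffs = {
    "product_key": "001",
    "product_size": "10 oz.", "current_size_date": "05-01-1995",
    "previous_size": "11 oz.", "previous_size_date": "03-20-1994",
    "second_previous_size": None, "second_previous_size_date": None,
}
apply_type3(rice_puffs, "12 oz.", "10-15-1998")
```

Only as many past values survive as there are history columns; with two slots, a further change would push the 11 oz. size out of the record.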

13 Verifying the Fact Table Source Data
Verifying Accuracy of Source Data
    Integrating data from multiple sources
    Applying business rules
    Checking structural requirements
Managing Invalid Data
    Rejecting invalid data
    Saving invalid data to a log
Correcting Invalid Data
    Transforming data
    Reassigning data values

14 Assigning Foreign Keys
Dimension Tables
    customer_dim:  201, ALFI, Alfreds
    product_dim:   25, 123, Chai
    time_dim:      134, 1/1/2000
Source Data
    customer id   product id   order date   quantity_sales   amount_sales
    ALFI          123          1/1/2000     400              10,789
Sales Fact Data
    cust_key   prod_key   time_key   quantity_sales   amount_sales
    201        25         134        400              10,789
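
The lookup on this slide replaces each source identifier with the surrogate key of the matching dimension record. A small sketch using the slide's values, with the dimensions reduced to plain dictionaries for illustration (the function name is hypothetical):

```python
# Each lookup maps a source-system identifier to the dimension's surrogate key.
customer_dim = {"ALFI": 201}
product_dim = {123: 25}
time_dim = {"1/1/2000": 134}


def to_fact_row(src):
    """Swap source identifiers for dimension keys; measures pass through."""
    return {
        "cust_key": customer_dim[src["customer_id"]],
        "prod_key": product_dim[src["product_id"]],
        "time_key": time_dim[src["order_date"]],
        "quantity_sales": src["quantity_sales"],
        "amount_sales": src["amount_sales"],
    }


source_row = {
    "customer_id": "ALFI", "product_id": 123, "order_date": "1/1/2000",
    "quantity_sales": 400, "amount_sales": 10789,
}
fact_row = to_fact_row(source_row)
```

A source identifier with no matching dimension record raises `KeyError` here, which is one simple way to surface rows that would otherwise break referential integrity.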

15 Defining Measures
Loading Measures from the Source System
Calculating Additional Measures
Source System Data
    customer_id   product_id   price   qty
    VINET         9GZ          .55     32
    ALFI          1KJ          1.10    48
    HANAR         0ZA          .98     9
    ...
Fact Table Data
    customer_key   product_key   qty   total_sales
    100            512           32    17.60
    238            207           48    52.80
    437            338           9     8.82
    ...
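
In the sample data, `total_sales` equals `price` times `qty` for every row, so the calculated measure can be sketched as below. Using `Decimal` rather than floats is a design choice made here to keep currency values exact, not something the slide prescribes.

```python
from decimal import Decimal


def total_sales(price, qty):
    """Calculated measure: unit price times quantity, kept to two decimals."""
    return (Decimal(price) * qty).quantize(Decimal("0.01"))


source = [
    ("VINET", "9GZ", "0.55", 32),
    ("ALFI", "1KJ", "1.10", 48),
    ("HANAR", "0ZA", "0.98", 9),
]
totals = [total_sales(price, qty) for _, _, price, qty in source]
```

The results are 17.60, 52.80, and 8.82, matching the `total_sales` column in the fact table data.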

16 Maintaining Data Integrity
Adhering to the Fact Table Grain
    A fact table can have only one grain
    You must load a fact table with data at the same level of detail as defined by the grain
Enforcing Column Constraints
    NOT NULL constraints
    FOREIGN KEY constraints
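
To show what the two column constraints do during a load, here is a sketch that uses Python's built-in `sqlite3` module as a stand-in for the warehouse database. The presentation targets SQL Server, so the table definitions are simplified assumptions and the syntax may differ there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE product_dim (prod_key INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute("""
    CREATE TABLE sales_fact (
        prod_key INTEGER NOT NULL REFERENCES product_dim (prod_key),
        quantity_sales INTEGER NOT NULL
    )
""")
conn.execute("INSERT INTO product_dim VALUES (25, 'Chai')")


def try_insert(prod_key, quantity):
    """Return True if the fact row loads, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO sales_fact VALUES (?, ?)", (prod_key, quantity)
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert(25, 400),   # valid row
    try_insert(99, 10),    # no product_dim record with key 99
    try_insert(25, None),  # NULL measure violates NOT NULL
]
```

Only the first row is accepted; the other two are rejected by the FOREIGN KEY and NOT NULL constraints respectively.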

17 Implementing Staging Tables
Centralize and Integrate Source Data
Break Up Complex Data Transformations
Facilitate Error Recovery
Staging Area: sales_stage, inventory_stage, market_stage, shipments_stage

18 DTS Functionality
Accessing Heterogeneous Data Sources
Importing, Exporting, and Transforming Data
Creating Reusable Transformations and Functions
Automating Data Loads
Managing Metadata
Customizing and Extending Functionality

19 Defining DTS Packages
Identifies Data Sources and Destinations
Defines Tasks or Actions
Implements Transformation Logic
Defines Order of Operations

20 Identifying Package Components
Connections
    Access Data Sources and Destinations
Tasks
    Describe Data Transformations or Functions
Steps
    Define the Order of Task Operations or Workflow
Global Variables
    Store Data that Can Be Shared Across Tasks
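
How tasks, steps, and global variables relate can be pictured with a toy model. This is not the DTS object model: the `Package` class and the task functions below are hypothetical, and connections are left out to keep the sketch short.

```python
class Package:
    """Toy stand-in for a package: ordered steps run tasks over shared variables."""

    def __init__(self, name):
        self.name = name
        self.global_variables = {}  # data shared across tasks
        self.steps = []             # (step name, task) in execution order

    def add_step(self, step_name, task):
        self.steps.append((step_name, task))

    def execute(self):
        executed = []
        for step_name, task in self.steps:
            task(self.global_variables)
            executed.append(step_name)
        return executed


def extract(variables):
    # Task: place extracted rows where later tasks can find them.
    variables["rows"] = [("ALFI", 400), ("VINET", 32)]


def load(variables):
    # Task: consume what the extract task stored.
    variables["rows_loaded"] = len(variables["rows"])


package = Package("load_sales_stage")
package.add_step("extract", extract)
package.add_step("load", load)
order = package.execute()
```

The step list fixes the order (`extract` before `load`), while the shared dictionary plays the role of global variables that pass data from one task to the next.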

21 Creating Packages
Using the DTS Import/Export Wizard
    Perform ad-hoc table and data transfers
    Develop a prototype package
Using DTS Package Designer
    Edit packages created with the DTS Import/Export Wizard
    Create packages with a wide range of functionality
Programming DTS Applications
    Directly access the functionality of the DTS Object Model
    Requires Microsoft Visual Basic or Microsoft Visual C++

22 Using DTS to Populate the Sales Star
Populating the Sales Star Dimensions
Populating the Sales Star Fact Table

23 Populating the Sales Star Dimensions
[Diagram: DTS loads the dimension tables time_dim, customer_dim, and product_dim. Labeled sources: Product Tab Delimited Files, NorthwindOLTP, SQL Server Stored Procedure]

24 Populating the Sales Star Fact Table
[Diagram: DTS loads the Sales Data File into sales_stage, then loads sales_fact from sales_stage together with time_dim, customer_dim, and product_dim]

25 Designing Modular Packages
Creating Modular Packages
    Simplify complex workflows
    Create more readable packages
    Produce smaller packages that are easier to debug
Using Outer Packages
    Execute multiple packages within a single package
    Combine modular packages into logical workflows
    Reuse modular packages in different workflows
    Execute packages in parallel

26 Using DTS to Populate the Sales Star

