Transportation: Refreshing Warehouse Data Chapter 13.


1 Transportation: Refreshing Warehouse Data Chapter 13

2 Developing a Refresh Strategy for Capturing Changed Data
- Consider load window
- Identify data volumes
- Identify cycle
- Know the technical infrastructure
- Plan a staging area
- Determine how to detect changes
[Diagram: operational databases captured at times T1, T2, T3]

3 User Requirements and Assistance
- Users define the refresh cycle
- IT balances requirements against technical issues
- Document all tasks and processes
- Employ user skills
[Diagram: operational databases captured at times T1, T2, T3]

4 Load Window
- Time available for entire ETT process
- Plan
- Test
- Prove
- Monitor
[Diagram: 24-hour timeline (0, 3 am, 6, 9, 12 pm, 3, 6, 9, 12) showing a load window, the user access period, and a second load window]

5 Load Window
- Plan and build processes according to a strategy.
- Consider volumes of data.
- Identify technical infrastructure.
- Ensure currency of data.
- Consider user access requirements first.
- High availability requirements may mean a small load window.
[Diagram: 24-hour timeline (0, 3 am, 6, 9, 12 pm, 3, 6, 9, 12) showing the user access period]

6 Scheduling the Load Window
- Step 1: Requirements
- Step 2: Load cycle
- Step 3: Receive data (FTP: File 1, File 2)
- Step 4: Open and read files to verify and analyze (control process)
Control file contents:
- File names
- File types
- Number of files
- Number of loads
- First-time load or refresh
- Date of file
- Data range
- Records in file (counts)
- Totals (amounts)
[Timeline: 0 to 3 am]
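The verification in step 4 can be sketched as a check of each received file against its control-file entry. This is a minimal illustration only: the `ControlEntry` layout, the `verify_file` helper, and the `amount` field name are assumptions for the example, not something the slides define.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class ControlEntry:
    """One control-file line describing a data file (hypothetical layout)."""
    file_name: str
    record_count: int
    total_amount: Decimal


def verify_file(entry, rows):
    """Return a list of discrepancies; an empty list means the file checks out."""
    problems = []
    # Compare the number of records received with the count in the control file.
    if len(rows) != entry.record_count:
        problems.append(
            f"{entry.file_name}: expected {entry.record_count} records, got {len(rows)}"
        )
    # Compare the summed amounts with the control total.
    actual_total = sum((Decimal(r["amount"]) for r in rows), Decimal("0"))
    if actual_total != entry.total_amount:
        problems.append(
            f"{entry.file_name}: expected total {entry.total_amount}, got {actual_total}"
        )
    return problems


rows = [{"amount": "10.50"}, {"amount": "4.50"}]
good = verify_file(ControlEntry("file1.dat", 2, Decimal("15.00")), rows)
bad = verify_file(ControlEntry("file1.dat", 3, Decimal("15.00")), rows)
```

Running the check before the load step means a short or corrupted transfer is caught while there is still time in the load window to request the file again.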

7 Scheduling the Load Window
- Step 5: Load into warehouse (File 1 and File 2 in parallel)
- Step 6: Verify, analyze, reapply
- Step 7: Index data
- Step 8: Create summaries
- Step 9: Update metadata
[Timeline: 3 am, 6 am, 9 am]

8 Scheduling the Load Window
- Step 10: Back up warehouse
- Step 11: Create views for specialized tools
- Step 12: Users access summary data
- Step 13: Publish
[Timeline: 6 am, 9 am, then user access]

9 Capturing Changed Data for Refresh
- Capture new fact data
- Capture changed dimension data
- Determine method for capture of each
- Methods:
  - Wholesale data replacement
  - Comparison of database instances
  - Time stamping
  - Database triggers
  - Database log
  - Hybrid techniques

10 Wholesale Data Replacement
- Expensive
- Limited historical data, if any
- Data mart implementations
- Time period replacement
[Diagram: operational databases captured at times T1, T2, T3]

11 Comparison of Database Instances
- Simple to perform, but expensive in time and processing
- Delta file:
  - Changes to operational data since last refresh
  - Used by various techniques
[Diagram: yesterday's and today's operational databases feed a database comparison, which writes a delta file holding the changed data]
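A minimal sketch of the comparison, assuming each database instance has been read into a dictionary keyed by primary key. The `compare_snapshots` name and the tuple layout of the delta are illustrative choices, not part of the course material.

```python
def compare_snapshots(yesterday, today):
    """Build a delta list of (operation, key, row) from two keyed snapshots."""
    delta = []
    # Rows present today are either new or possibly changed.
    for key, row in today.items():
        if key not in yesterday:
            delta.append(("insert", key, row))
        elif yesterday[key] != row:
            delta.append(("update", key, row))
    # Rows that existed yesterday but not today were deleted.
    for key, row in yesterday.items():
        if key not in today:
            delta.append(("delete", key, row))
    return delta


yesterday = {1: ("John Doe", "Single"), 2: ("Ann Lee", "Married")}
today = {1: ("John Doe", "Married"), 3: ("Bob Ray", "Single")}
delta = compare_snapshots(yesterday, today)
```

Every row of both snapshots is visited, which is why the slide calls the method simple but expensive in time and processing.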

12 Time and Date Stamping
- Fast scanning for records changed since last extraction
- Date Updated field
- No detection of deleted data
[Diagram: operational data is scanned to produce a delta file holding the changed data]
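A short sketch of the time-stamp scan, assuming every operational row carries a `date_updated` value. The field name and the `changed_since` helper are hypothetical.

```python
from datetime import datetime


def changed_since(rows, last_extraction):
    """Return rows whose Date Updated stamp is later than the last extraction."""
    return [r for r in rows if r["date_updated"] > last_extraction]


operational = [
    {"cust_id": 1, "status": "Married", "date_updated": datetime(1996, 1, 1, 9, 0)},
    {"cust_id": 2, "status": "Single", "date_updated": datetime(1995, 12, 30, 17, 0)},
]
last_extraction = datetime(1995, 12, 31, 3, 0)
delta = changed_since(operational, last_extraction)
```

A row deleted from the operational table simply never appears in `rows`, so the scan cannot report it. That is the "no detection of deleted data" limitation noted on the slide.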

13 Database Triggers
- Changed data intercepted at the server level
- Extra I/O required
- Maintenance overhead
[Diagram: trigger on the operational server (DBMS)]
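The trigger approach can be sketched with SQLite standing in for the operational DBMS, which is an assumption made only so the example is self-contained. The table, trigger, and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, status TEXT);

-- Change table that the triggers write to; the extract process reads it later.
CREATE TABLE customer_changes (
    change_id INTEGER PRIMARY KEY AUTOINCREMENT,
    op TEXT, cust_id INTEGER, name TEXT, status TEXT
);

CREATE TRIGGER customer_ins AFTER INSERT ON customer
BEGIN
    INSERT INTO customer_changes (op, cust_id, name, status)
    VALUES ('I', NEW.cust_id, NEW.name, NEW.status);
END;

CREATE TRIGGER customer_upd AFTER UPDATE ON customer
BEGIN
    INSERT INTO customer_changes (op, cust_id, name, status)
    VALUES ('U', NEW.cust_id, NEW.name, NEW.status);
END;

CREATE TRIGGER customer_del AFTER DELETE ON customer
BEGIN
    INSERT INTO customer_changes (op, cust_id, name, status)
    VALUES ('D', OLD.cust_id, OLD.name, OLD.status);
END;
""")

# Ordinary operational activity; each statement also fires a trigger.
conn.execute("INSERT INTO customer VALUES (1, 'John Doe', 'Single')")
conn.execute("UPDATE customer SET status = 'Married' WHERE cust_id = 1")
conn.execute("DELETE FROM customer WHERE cust_id = 1")

changes = conn.execute(
    "SELECT op, cust_id, status FROM customer_changes ORDER BY change_id"
).fetchall()
```

Each operational statement now performs a second write into `customer_changes`, which is the extra I/O the slide warns about. The three triggers are also additional objects to maintain whenever the table definition changes.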

14 Using a Database Log
- Contains before and after images
- Requires system checkpoint
- Common technique
[Diagram: the operational server (DBMS) writes operational data and a log; log analysis and data extraction produce a delta file holding the changed data]
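A rough sketch of log analysis, assuming the log has already been parsed into entries with a sequence number, a key, and before and after images. Real DBMS log formats differ and are not described in the slides, so the entry layout, the `extract_delta` name, and the sample addresses are purely illustrative.

```python
def extract_delta(log, last_checkpoint):
    """Collapse log entries written after the checkpoint into one change per key."""
    delta = {}
    for entry in log:
        # Entries at or before the checkpoint were handled by an earlier refresh.
        if entry["lsn"] <= last_checkpoint:
            continue
        key = entry["key"]
        if key in delta:
            # Keep the earliest before image and the latest after image.
            delta[key]["after"] = entry["after"]
        else:
            delta[key] = {"before": entry["before"], "after": entry["after"]}
    return delta


# Hypothetical, already-parsed log entries.
log = [
    {"lsn": 1, "key": 1234, "before": None, "after": "1 Main Street"},
    {"lsn": 2, "key": 1234, "before": "1 Main Street", "after": "200 First Ave"},
    {"lsn": 3, "key": 1234, "before": "200 First Ave", "after": "9 Oak Road"},
]
delta = extract_delta(log, last_checkpoint=1)
```

The checkpoint marks how far the previous extraction read, which is one way to interpret the slide's "requires system checkpoint" point.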

15 Verdict Consider each method on merit. Consider a hybrid approach if one approach is not suitable. Consider current technical, existing operational, and current application issues.

16 Applying the Changes to Data
You have a choice of techniques:
- Overwrite a record
- Add a record
- Add a field
- Maintain history
- Add version numbers

17 Overwriting a Record
- Easy to implement
- Loses all history
- Not recommended
[Diagram: the row (Customer ID, John Doe, Single) is overwritten with (Customer ID, John Doe, Married)]

18 Adding a New Record
- History is preserved; dimensions grow.
- Time constraints are not required.
- Generalized key is created.
- Metadata tracks usage of keys.
[Diagram: row with key 1 (Customer Id, John Doe, Single) is kept and a new row with key 1A (Customer Id, John Doe, Married) is added]
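One way to picture the add-a-record technique is a dimension that never updates a row and instead appends a new version under a fresh generalized key. The `CustomerDimension` class and its integer keys are assumptions for the sketch; the slide's own example uses keys such as 1 and 1A.

```python
from itertools import count


class CustomerDimension:
    """Dimension that appends a new row for every change (illustrative only)."""

    def __init__(self):
        self._next_key = count(1)   # generator of generalized (warehouse) keys
        self.rows = []              # every version ever loaded
        self.current = {}           # customer_id -> key of the latest version

    def apply_change(self, customer_id, name, status):
        """Add a new version of the customer and return its generalized key."""
        key = next(self._next_key)
        self.rows.append(
            {"warehouse_key": key, "customer_id": customer_id,
             "name": name, "status": status}
        )
        # Stands in for the metadata that tracks which key is in use.
        self.current[customer_id] = key
        return key


dim = CustomerDimension()
first = dim.apply_change("C1", "John Doe", "Single")
second = dim.apply_change("C1", "John Doe", "Married")
```

Both versions remain queryable, so history is preserved, and the row count grows with every change, which is the dimension growth the slide mentions.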

19 Adding a Current Field
- Maintains some history
- Loses intermediate values
- Is enhanced by adding an Effective Date field
[Diagram: the row (Customer Id, John Doe, Single) becomes (Customer Id, John Doe, Single, Married, 01-JAN-96)]
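The current-field technique can be sketched as a row that holds a previous value, a current value, and an effective date. The `CustomerRow` fields and the `apply_status_change` helper are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CustomerRow:
    customer_id: str
    name: str
    current_status: str
    previous_status: Optional[str] = None
    effective_date: Optional[date] = None


def apply_status_change(row, new_status, effective):
    """Shift the current value into the previous field, then store the new one."""
    row.previous_status = row.current_status
    row.current_status = new_status
    row.effective_date = effective


row = CustomerRow("C1", "John Doe", "Single")
apply_status_change(row, "Married", date(1996, 1, 1))
after_first = (row.previous_status, row.current_status)

# A second change pushes "Single" out of the row entirely.
apply_status_change(row, "Divorced", date(1999, 6, 1))
after_second = (row.previous_status, row.current_status)
```

After the second change the original value is gone, which illustrates why the slide says this method loses intermediate values.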

20 Limitations of Methods for Applying Changes
- Complete history impossible
- Dimensions may grow large
- Maintenance overload
[Diagram: example customer rows for 1234 Comer (1 Main Street, 555-6789) and 1234-01 Comer (200 First Ave, 222-3211), shown with and without an Effective Date column (01-Apr-93, 01-Jun-97)]

21 Maintaining History
- One-to-many relationship
- Always retain current record
- Consistently able to refer to record history
[Diagram: HIST_CUST linked to CUSTOMER, with Sales, Time, and Product tables]

22 History Preserved
- History enables realistic analysis.
- History retains context of data.
- History provides for realistic historical analysis.
  - Reflect business changes
  - Maintain context between fact and dimension data
  - Retain sufficient data to relate old to new

23 Version Numbering
- Avoid double counting
- Facts hold version number
Customer names:
  Customer.CustId  Version  Name
  1234             1        Comer
  1234             2        Comer
Sales facts:
  Customer.CustId  Version  Sales
  1234             1        11,000
  1234             2        12,000
[Diagram: Customer, Sales, Time, and Product tables]
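The double-counting problem can be reproduced with the rows from the slide. The sketch below uses SQLite through Python's `sqlite3` module as a stand-in for the warehouse; the table and column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    cust_id INTEGER, version INTEGER, name TEXT,
    PRIMARY KEY (cust_id, version)
);
CREATE TABLE sales (cust_id INTEGER, version INTEGER, amount INTEGER);
INSERT INTO customer VALUES (1234, 1, 'Comer'), (1234, 2, 'Comer');
INSERT INTO sales VALUES (1234, 1, 11000), (1234, 2, 12000);
""")

# Joining on the customer ID alone matches every fact to both versions.
naive_total = conn.execute(
    "SELECT SUM(s.amount) FROM sales s "
    "JOIN customer c ON s.cust_id = c.cust_id"
).fetchone()[0]

# Including the version number matches each fact to exactly one dimension row.
versioned_total = conn.execute(
    "SELECT SUM(s.amount) FROM sales s "
    "JOIN customer c ON s.cust_id = c.cust_id AND s.version = c.version"
).fetchone()[0]
```

The join on ID alone counts every sale twice (46,000 instead of 23,000), which is why the facts carry the version number as part of the join key.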

24 Purging and Archiving Data
- As data ages, its value depreciates.
- Remove old data from the warehouse:
  - Archive for later use
  - Purge without copy
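A minimal sketch of archive-then-purge, using SQLite through Python's `sqlite3` module as a stand-in for the warehouse. The `sales` and `sales_archive` tables, the ISO date strings, and the `archive_and_purge` helper are assumptions for the example.

```python
import sqlite3


def archive_and_purge(conn, cutoff):
    """Copy rows older than cutoff to the archive table, then delete them."""
    # The connection context manager commits both statements together,
    # or rolls both back if either one fails.
    with conn:
        conn.execute(
            "INSERT INTO sales_archive SELECT * FROM sales WHERE sale_date < ?",
            (cutoff,),
        )
        cur = conn.execute("DELETE FROM sales WHERE sale_date < ?", (cutoff,))
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (cust_id INTEGER, sale_date TEXT, amount INTEGER);
CREATE TABLE sales_archive (cust_id INTEGER, sale_date TEXT, amount INTEGER);
INSERT INTO sales VALUES (1234, '1993-04-01', 11000), (1234, '1997-06-01', 12000);
""")

purged = archive_and_purge(conn, "1995-01-01")
remaining = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
archived = conn.execute("SELECT COUNT(*) FROM sales_archive").fetchone()[0]
```

Dropping the `INSERT` step turns the same function into a purge without copy, the second option on the slide.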

25 Techniques for Purging Data
- TRUNCATE: Retains no rollback
- DELETE: Retains redo and rollback
- ALTER TABLE: Removes a partition
- PL/SQL: Uses database triggers

26 Techniques for Archiving Data
- Export to dump file from tables
- Import to tables from dump file
- ALTER TABLE EXCHANGE partitions
[Diagram: database exported with EXP to a .dmp file and imported back with IMP]

27 Verdict Defined by business requirements Must be managed

28 Final Tasks
- Update metadata
  - ETT
  - User
- Publish data
  - Availability
  - Changes
  - Subject area basis
- Use database roles to prevent and allow access

29 Publishing Data
- Control access using database roles
- 24-hour operation may be requested
- Compromise between load and access
- Consider
  - Staggering updates
  - Using temporary tables
  - Using separate tables
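One possible reading of "using separate tables" is to load fresh data into a side table while users keep querying the published one, then swap the two in a short transaction. The sketch uses SQLite through Python's `sqlite3` module as a stand-in for the warehouse; the table names and the `publish` helper are assumptions.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode so the
# swap transaction below can be controlled explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE sales_summary (region TEXT, total INTEGER)")
conn.execute("INSERT INTO sales_summary VALUES ('East', 100)")


def publish(conn, fresh_rows):
    """Load fresh rows into a side table, then swap it in for the live table."""
    # The slow part runs against the side table; users still see the old data.
    conn.execute("DROP TABLE IF EXISTS sales_summary_new")
    conn.execute("CREATE TABLE sales_summary_new (region TEXT, total INTEGER)")
    conn.executemany("INSERT INTO sales_summary_new VALUES (?, ?)", fresh_rows)

    # The quick part: replace the published table in a single transaction.
    conn.execute("BEGIN")
    try:
        conn.execute("DROP TABLE sales_summary")
        conn.execute("ALTER TABLE sales_summary_new RENAME TO sales_summary")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


publish(conn, [("East", 150), ("West", 75)])
published = conn.execute(
    "SELECT region, total FROM sales_summary ORDER BY region"
).fetchall()
```

Only the swap needs exclusive access, so the load itself can overlap with user access, which is the compromise between load and access that the slide describes.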

30 ETT Tool Selection Criteria
- Overlap with existing tools
- Availability of meta model
- Supported data sources
- Ease of modification and maintenance
- Required fine tuning of code
- Ease of change control
- Power of transformation logic
- Level of modularization
- Power of error, exception, resubmission features
- Intuitive documentation
- Performance of code

31 ETT Tool Selection Criteria
- Activity scheduling and sophistication
- Metadata generation
- Learning curve
- Flexibility
- Supported operating systems
- Cost

32 Transportation Tools Information OpenBridge Oracle SQL*Loader Gateways PL/SQL Precompilers Platinum Technology InfoPump Platinum Info Transport

33 Replication Server Utilities Oracle Symmetric and Heterogeneous Replication

34 Gateways and Middleware
- Brio Technology: DataPrism
- Information Co.: OpenBridge
- Information Builders: EDA/SQL
- Oracle: Gateways
- Platinum Technology: InfoHub
- Prism: Prism Manager
- Software AG: Entire Transaction Propagator

35 Summary
This lesson discussed the following topics:
- Capturing changed data
- Applying the changes
- Purging and archiving data
- Publishing the data, controlling access, and automating processes
- Identifying tools for transporting data into the warehouse

