Presentation is loading. Please wait.

Presentation is loading. Please wait.

Transportation: Loading Warehouse Data Chapter 12.

Similar presentations


Presentation on theme: "Transportation: Loading Warehouse Data Chapter 12."— Presentation transcript:

1 Transportation: Loading Warehouse Data Chapter 12

2 Transporting data into the Warehouse Loading moves the data into the warehouse Loading can be time-consuming: - Consider the load window. - Schedule the task; automates all processes. Initial load moves large volumes Subsequent refresh moves smaller volumes Business determines the cycle

3 Extract Processing Environment After each time interval, build a new database Run queries Operational database T1T2T3

4 Warehouse Processing Environment Build a new database After each time interval, add changes to database Archive or pure oldest data Run queries Operational database T1T2T3

5 First-Time Load Single event that populates the data with historical data Involves large volume of data Employs distinct ETT tasks Involves large amounts of processing after load Operational database T1T2T3

6 Refresh Performed according to a business cycle Simple task Less data to load than first-time load Less-complex ETT Smaller amounts of postload processing Operational database T1T2T3

7 Building the Transportation Process Specification Techniques and tools File transfer methods The load window Time window for other tasks First-time and refresh cycle Connectivity bandwidth

8 Building the Transportation Process Test the proposed techniques Document proposed load Gain agreement on the process Monitor Review Revise

9 Granularity Important design and operational issue Space requirements - Storage - Backup - Recovery - Load Low-level grain - Expensive, high level of processing, more disk, detail High-level grain - Cheaper, less processing, less disk, little detail

10 Transportation Techniques Tools Utilities and 3GL Gateways Customized copy programs Replication FTP Manual

11 Transportation Technique Considerations Tools are comprehensive but costly. Data-movement utilities are fast and powerful. Gateways are not always the fastest method: - Access other databases - Supply dependent data marts - Support a distributed environment - Provide real-time access if needed

12 SQL*Loader Input files Log files Control file Bad files Discard files Using SQL*Loader to Load Data Fastest load mechanism Direct path Parallel and unrecoverable Direct-load INSERT (Oracle8) Direct-path load API (Oracle8i)

13 Direct-Path Load API in Oracle8i Allows ETT and other tools to load Oracle databases efficiently Permits load behavior to be customized Gives direct-path load performance Provided complete access to all direct- load functionality using OCI

14 More Transportation Technique Considerations Use customized programs as a last resort Replication is limited by data-transfer rates

15 Postprocessing of Loaded Data Create Indexes Generate keys SummarizeFilter Postprocessing of loaded data Extract Transform Transport

16 Indexing Data Before load: fast index reenablement During load: adds time to load window After load: adds time to load window Index Operational databases Staging files Warehouse database

17 Unique Indexes Disable constraints load Enable constraints to create index Disable constraints Load data Enable constraints Create index Catch errors Reprocess

18 Creating Artificial Keys Use generalized or derived keys Maintain the uniqueness of a row Use an administrative process to assign the key Concatenate operational key with number: - Easy to maintain - Cumbersome keys - No clean value for retrieval

19 Creating Unique Keys for Records Assign a number from a list: - No semantic meaning - Extract operations must reference table assign numbers Update metadata Verdict 1099081

20 Creating Summary Tables CTAS pCTAS Summary data Warehouse Data marts

21 Verifying Data Integrity Load data into intermediate file Compare target flash totals with totals before load File 1 File 2 Counts and amounts Flash totals File 1 Load File 2 Preserve, inspect, fix, then load Load Intermediate file Warehouse

22 Steps for Verifying Data Integrity

23 Standard Quality Assurance Checks Load status Completion of the process Completeness of the data Data reconciliation Violations Reprocessing Comparison of counts and amounts

24 Summary This lesson discussed the following topics: First-time load considerations Techniques for transporting data Tasks involved in the postload processing stage


Download ppt "Transportation: Loading Warehouse Data Chapter 12."

Similar presentations


Ads by Google