Copyright © Oracle Corporation, 1999. All rights reserved. 10 Building the Warehouse.


1 Building the Warehouse (Lesson 10)

2 10-2 Overview
Project Management (Methodology, Maintaining Metadata)
Defining DW Concepts & Terminology
Planning for a Successful Warehouse
Analyzing User Query Needs
Choosing a Computing Architecture
Modeling the Data Warehouse
Planning Warehouse Storage
ETT (Building the Warehouse)
Meeting a Business Need
Supporting End User Access
Managing the Data Warehouse

3 10-3 Objectives
After completing this lesson, you should be able to do the following:
Outline the extraction, transformation, and transportation processes for building a data warehouse
Identify extraction issues
Explain how to examine data sources
Identify extraction techniques
List tools that can be used to extract data from sources

4 10-4 Extraction/Transformation/Transportation Processes (ETT)
Extract source data
Transform/clean data
Index and summarize
Load data into WH
Detect changes
Refresh data
[Diagram: operational systems feed the warehouse through ETT, using programs, tools, and gateways]
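As a rough illustration of the extract, transform, and load steps listed on this slide, the following Python sketch chains three small functions. The record layout, the function names, and the in-memory list standing in for the warehouse are hypothetical and do not come from the course material.

```python
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, str]


def extract(source: Iterable[Row]) -> Iterator[Row]:
    """Extract: read rows from an operational source (here, any iterable)."""
    for row in source:
        yield dict(row)


def transform(rows: Iterable[Row]) -> Iterator[Row]:
    """Transform/clean: trim whitespace and drop rows without a customer name."""
    for row in rows:
        cleaned = {key: value.strip() for key, value in row.items()}
        if cleaned.get("customer"):
            yield cleaned


def load(rows: Iterable[Row], warehouse: List[Row]) -> int:
    """Load: append cleaned rows to the target and return how many were added."""
    before = len(warehouse)
    warehouse.extend(rows)
    return len(warehouse) - before


# Hypothetical operational data, for illustration only.
operational = [
    {"customer": " ABC CO ", "amount": "12345.00"},
    {"customer": "", "amount": "5678.00"},
]
warehouse: List[Row] = []
loaded = load(transform(extract(operational)), warehouse)
```

Because each stage is a generator, rows flow through one at a time. Indexing, summarizing, change detection, and refresh are left out of this sketch.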

5 10-5 ETT Processes
Must result in data that is relevant, useful, high-quality, accurate, and accessible
Require a large proportion of warehouse development time and resources
[Diagram: ETT cleans up, consolidates, and restructures data from operational systems so that warehouse data is relevant, useful, quality, accurate, and accessible]

6 10-6 Data Staging Area
The construction site for the warehouse
Required by most implementations
Composed of ODS, flat files, or relational server tables
Frequently configured as multitier staging
[Diagram: operational system, extract, data staging area (transform), transport (load), warehouse]

7 10-7 Remote Staging Model
Data staging area within the warehouse environment
Data staging area in its own environment, avoiding negative impact on the warehouse environment
[Diagrams: in both variants the operational system feeds the data staging area (extract, transform, transport), which feeds the warehouse (transform, transport/load). In the first, the staging area sits inside the warehouse environment; in the second, it sits in a separate staging environment between the operational and warehouse environments.]

8 10-8 Onsite Staging Model
Data staging area within the operational environment, possibly affecting the operational system
[Diagram: the operational system and the data staging area (extract, transform) share the operational environment; transport (load) moves data into the warehouse environment]

9 10-9 Extracting Data
Routines developed to select fields from source
Various data formats
Rules, audit trails, error correction facilities
[Diagram: operational databases, data mapping, data staging area (transform), warehouse database]
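A minimal sketch of an extraction routine of the kind described on this slide: it selects chosen fields from a source and keeps a simple audit trail of rejected rows. The CSV layout, the field names, and the rejection rule are assumptions made for illustration.

```python
import csv
import io
from typing import Dict, List, Tuple

# Hypothetical field list; a real routine would take this from the data mapping.
WANTED_FIELDS = ["cust_id", "name"]


def extract_fields(
    source_text: str, wanted: List[str]
) -> Tuple[List[Dict[str, str]], List[str]]:
    """Select the wanted fields from CSV text and record an audit trail."""
    selected: List[Dict[str, str]] = []
    audit: List[str] = []
    reader = csv.DictReader(io.StringIO(source_text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        missing = [name for name in wanted if not row.get(name)]
        if missing:
            # Assumed rule: reject rows where a wanted field is empty or absent.
            audit.append(f"line {line_no}: rejected, missing {missing}")
            continue
        selected.append({name: row[name] for name in wanted})
    audit.append(f"selected {len(selected)} row(s)")
    return selected, audit


# Hypothetical source extract with one incomplete row.
sample = "cust_id,name,phone\n1,ABC CO,555-0100\n2,,555-0101\n"
rows, audit_trail = extract_fields(sample, WANTED_FIELDS)
```

Here the second data row has no name, so it is logged in `audit_trail` instead of being passed on to staging.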

10 10-10 Source Systems
Production
Archive
Internal
External

11 10-11 Production Data
Operating system platforms
Hardware platforms
File systems
Database systems and vertical applications
Examples shown: IMS, DB2, VSAM, NonStop SQL, Oracle, Sybase, Rdb, SAP, Shared Medical Systems, Dun and Bradstreet Financials, Hogan Financials, Oracle Financials

12 10-12 Archive Data
Historical data
Useful for analysis over long periods of time
Useful for first-time load
May require unique transformations
[Diagram: operational databases, archive data, warehouse database]

13 10-13 Internal Data
Planning, sales, and marketing organization data
Maintained by:
  – Spreadsheets (structured)
  – Documents (unstructured)
Treated like any other source data
[Diagram: planning, marketing, and accounting spreadsheets feed the warehouse database]

14 10-14 External Data
Information from outside the organization
Issues of frequency, format, and predictability
Described and tracked using metadata
Sources shown: Barron's, Dun and Bradstreet, Wall Street Journal, purchased databases, economic forecasts, competitive information, warehousing databases (A.C. Nielsen, IRI, IMS, Walsh America)

15 10-15 Mapping
Defines which operational attributes to use
Defines how to transform the attributes for the warehouse
Defines where the attributes exist in the warehouse
Mapping tools are available

Example:
File A            Staging File One
F1  123           Number  USA123
F2  Bloggs        Name    Mr. Bloggs
F3  10/12/56      DOB     10-Dec-56

Metadata:
File A   Staging File One
F1       Number
F2       Name
F3       DOB
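The File A example on this slide can be sketched as mapping metadata plus one transformation per field. The field pairs (F1 to Number, F2 to Name, F3 to DOB) come from the slide. The transformation rules (prefixing "USA", prefixing "Mr.", and reading the date as day/month/year) are assumptions inferred from its single sample record.

```python
from datetime import datetime
from typing import Callable, Dict, Tuple

# Mapping metadata: source field -> (target field, transformation).
# The transformations are assumptions inferred from the slide's one example.
MAPPING: Dict[str, Tuple[str, Callable[[str], str]]] = {
    "F1": ("Number", lambda value: "USA" + value),
    "F2": ("Name", lambda value: "Mr. " + value),
    "F3": (
        "DOB",
        # Assumes the source date is DD/MM/YY.
        lambda value: datetime.strptime(value, "%d/%m/%y").strftime("%d-%b-%y"),
    ),
}


def map_record(source: Dict[str, str]) -> Dict[str, str]:
    """Apply the mapping metadata to one source record."""
    return {
        target: convert(source[field])
        for field, (target, convert) in MAPPING.items()
    }


file_a = {"F1": "123", "F2": "Bloggs", "F3": "10/12/56"}
staged = map_record(file_a)
```

Keeping the mapping in a data structure rather than in procedural code means a changed rule is an edit to the metadata, not to the routine. Note that `%b` month names depend on the locale, so the `Dec` in the output assumes Python's default locale.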

16 10-16 Extraction Techniques
Programs: C, COBOL, PL/SQL
Gateways: transparent database access
In-house development is popular
Tools
  – High initial cost
  – Ongoing automation
  – Data cleanup

17 10-17 Sources and Targets
[Diagram: stages labeled Sources, ODS, Warehouse, Access; targets labeled OLAP, Data marts, Data analysis, Data mining]

18 10-18 Designing Extraction Processes
Analysis:
  – Sources, technologies
  – Data types, quality, owners
Design options:
  – Manual, custom, gateway, third-party
  – Replication, full, or delta refresh
Design issues:
  – Batch window, volumes, data currency
  – Automation, skills needed, resources
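To make the difference between the full and delta refresh options concrete, here is a small sketch using Python's built-in sqlite3 module as a stand-in source system. The `customers` table, its `updated_at` column, and the timestamp-based change detection are assumptions; the slide does not prescribe how changes are detected.

```python
import sqlite3
from typing import List, Optional, Tuple


def extract_rows(
    conn: sqlite3.Connection, since: Optional[str] = None
) -> List[Tuple[int, str, str]]:
    """Full refresh when `since` is None; otherwise a delta refresh."""
    if since is None:
        cursor = conn.execute(
            "SELECT id, name, updated_at FROM customers ORDER BY id"
        )
    else:
        # Delta: only rows changed after the previous extraction.
        cursor = conn.execute(
            "SELECT id, name, updated_at FROM customers "
            "WHERE updated_at > ? ORDER BY id",
            (since,),
        )
    return cursor.fetchall()


# Hypothetical source table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, updated_at TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "ABC CO", "1999-01-05"), (2, "GMBH LTD", "1999-03-20")],
)
full = extract_rows(conn)
delta = extract_rows(conn, since="1999-02-01")
```

A delta refresh moves less data and so fits a tighter batch window, but it relies on the source recording when each row changed. ISO-formatted date strings are used here so that text comparison matches date order.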

19 10-19 Maintaining Extraction Metadata
Source location, type, structure
Access method
Privilege information
Temporary storage
Failure procedures
Validity checks
Handlers for missing data
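One possible way to hold the items listed on this slide is a single metadata record per source. The sketch below uses a Python dataclass. The attribute names follow the slide's bullets, while every example value is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExtractionMetadata:
    """One record of extraction metadata for a single source."""

    source_location: str
    source_type: str
    structure: List[str]          # field names, in source order
    access_method: str
    privileges: str
    temp_storage: str
    failure_procedure: str
    validity_checks: List[str] = field(default_factory=list)
    missing_data_handler: str = "reject"  # assumed default policy


# Hypothetical values, for illustration only.
meta = ExtractionMetadata(
    source_location="/data/source/customers.dat",
    source_type="flat file",
    structure=["cust_id", "name", "dob"],
    access_method="nightly file transfer",
    privileges="read-only extract account",
    temp_storage="/staging/tmp",
    failure_procedure="retry once, then alert the operator",
    validity_checks=["cust_id is numeric", "dob is a valid date"],
)
```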

20 10-20 Possible ETT Failures
A missing source file
A system failure
Inadequate metadata
Poor mapping information
Inadequate storage planning
A source structural change
No contingency plan
Inadequate data validation
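Two of the failures listed on this slide, a missing source file and a source structural change, can be caught before extraction starts. The following sketch shows one way to do that for a CSV source. The file names, the expected column list, and the message wording are assumptions.

```python
import csv
import os
import tempfile
from typing import List


def preflight(path: str, expected_columns: List[str]) -> List[str]:
    """Return a list of problems found before extraction; empty means OK."""
    if not os.path.isfile(path):
        return [f"missing source file: {path}"]
    with open(path, newline="") as handle:
        header = next(csv.reader(handle), [])
    problems: List[str] = []
    if header != expected_columns:
        problems.append(f"source structure changed: found {header}")
    return problems


# Demonstration against a temporary, hypothetical source file.
with tempfile.TemporaryDirectory() as workdir:
    present = os.path.join(workdir, "customers.csv")
    with open(present, "w", newline="") as handle:
        handle.write("cust_id,name,phone\n")
    ok = preflight(present, ["cust_id", "name", "phone"])
    changed = preflight(present, ["cust_id", "name"])
    missing = preflight(os.path.join(workdir, "absent.csv"), ["cust_id"])
```

The other failures on the slide, such as a system failure or inadequate storage planning, need operational measures rather than a check of this kind.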

21 10-21 Maintaining ETT Quality
ETT must be:
  – Tested
  – Documented
  – Monitored and reviewed
Disparate metadata must be coordinated

22 10-22 Extraction Tools
[Screenshot labels: Mapping information; Update metadata; JCL files; Map Source Data to Intermediate File Store. Sample entry: Sales and Marketing, Customer Name, Char, Varchar 20, Unique name]

23 10-23 Selection Criteria
Base functionality
Interface features
Metadata repository
Open API
Metadata access
Repository utilities
Input and output processing
Cleansing, reformatting, and auditing
References
Training requirements

24 10-24 WTI Partner ETT Tools
Carleton
Constellar
Evolutionary Technologies
Informatica
Information Builders
Oracle (EDMS, Toolkits, OADW)
Prism Solutions
Sagent
Vality Technology

25 10-25 Summary
This lesson discussed the following topics:
ETT processes are essential and consume a large proportion of warehouse resources and time
The extraction process acquires source data
You may encounter many data sources
There are many data extraction issues
ETT tools should be considered

26 10-26 Practice 10-1 Overview
This practice covers the following topics:
Answering a series of short questions
Specifying true or false to a series of statements

