Published by Gloria Lowry, modified over 9 years ago
1
Building the Warehouse
Chapter 10
2
Overview
- Defining DW Concepts & Terminology
- Planning for a Successful Warehouse
- Project Management (Methodology, Maintaining Metadata)
- Meeting a Business Need
- Choosing a Computing Architecture
- Modeling the Data Warehouse
- Analyzing User Query Needs
- Planning Warehouse Storage
- ETT (Building the Warehouse)
- Supporting End User Access
- Managing the Data Warehouse
3
Extraction/Transformation/Transportation Process (ETT)
- Extract source data
- Load data into the warehouse
- Transform/clean data
- Detect changes
- Index and summarize
- Refresh data
[Diagram: operational systems → ETT (programs, gateways, tools) → warehouse]
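The ETT tasks listed above can be sketched as a small Python pipeline. This is a minimal illustration under stated assumptions, not a reference implementation: the source rows, field names, and the functions `extract_source_data`, `transform_clean`, `detect_changes`, `load`, and `summarize` are all hypothetical, the "warehouse" is an in-memory dictionary keyed by id, and the sketch uses one possible ordering (transform/clean before load).

```python
from collections.abc import Iterable

def extract_source_data() -> list[dict]:
    # Hypothetical rows from an operational system (illustrative data only).
    return [
        {"id": "1", "name": " alice ", "amount": "10.50"},
        {"id": "2", "name": "BOB", "amount": "7"},
        {"id": "3", "name": "carol", "amount": "n/a"},  # bad value, rejected below
    ]

def transform_clean(rows: Iterable[dict]) -> list[dict]:
    """Standardize names, convert types, and drop rows that fail conversion."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # a real job would record this in an audit trail
        cleaned.append({"id": int(row["id"]),
                        "name": row["name"].strip().title(),
                        "amount": amount})
    return cleaned

def detect_changes(new_rows: list[dict], previous: dict[int, dict]) -> list[dict]:
    """Keep only rows that are new or differ from what the warehouse holds."""
    return [r for r in new_rows if previous.get(r["id"]) != r]

def load(warehouse: dict[int, dict], rows: list[dict]) -> None:
    for r in rows:
        warehouse[r["id"]] = r

def summarize(warehouse: dict[int, dict]) -> float:
    return sum(r["amount"] for r in warehouse.values())

def run_ett(warehouse: dict[int, dict]) -> list[dict]:
    """One refresh cycle: extract, transform/clean, detect changes, load."""
    rows = transform_clean(extract_source_data())
    changed = detect_changes(rows, warehouse)
    load(warehouse, changed)
    return changed
```

Calling `run_ett` twice on the same dictionary loads two cleaned rows the first time and nothing the second time, because change detection finds no differences; that is the behavior a refresh is meant to have.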
4
ETT Processes
- Must result in data that is relevant, useful, high-quality, accurate, and accessible
- Require a large proportion of warehouse development time and resources
[Diagram: operational systems → ETT (clean up, consolidate, restructure) → warehouse]
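The three ETT activities in the diagram (clean up, consolidate, restructure) can be illustrated with two hypothetical operational systems that store the same customer differently. The system names, field names (`cust_no`, `customer`, `full_name`), and rules below are assumptions chosen for the sketch, not taken from any real product.

```python
# Two hypothetical operational systems with overlapping, inconsistently
# formatted customer records (illustrative data only).
billing = [{"cust_no": "00042", "name": "SMITH, JOHN"},
           {"cust_no": "00043", "name": "DOE, JANE"}]
shipping = [{"customer": "42", "full_name": "John Smith", "city": "Leeds"}]

def clean_id(raw: str) -> int:
    """Clean up: normalize zero-padded identifiers to integers."""
    return int(raw)

def restructure_name(raw: str) -> str:
    """Restructure 'LAST, FIRST' into 'First Last'; other forms pass through."""
    if "," in raw:
        last, first = (part.strip() for part in raw.split(",", 1))
        raw = f"{first} {last}"
    return raw.title()

def consolidate(billing_rows: list[dict], shipping_rows: list[dict]) -> dict[int, dict]:
    """Consolidate: merge both sources into one record per customer key."""
    merged: dict[int, dict] = {}
    for row in billing_rows:
        key = clean_id(row["cust_no"])
        merged.setdefault(key, {"id": key})["name"] = restructure_name(row["name"])
    for row in shipping_rows:
        key = clean_id(row["customer"])
        record = merged.setdefault(key, {"id": key})
        record.setdefault("name", restructure_name(row["full_name"]))  # billing name wins
        record["city"] = row.get("city")
    return merged
```

Letting the billing system's name win when both sources supply one is an arbitrary rule for the sketch; in practice the survivorship rule is a business decision recorded in the mapping metadata.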
5
Data Staging Area
- The construction site for the warehouse
- Required by most implementations
- Composed of an operational data store (ODS), flat files, or relational server tables
- Frequently configured as multitier staging
[Diagram: operational system → extract → data staging area → transport (load) → warehouse]
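A staging area can be as simple as a set of relational tables that hold raw extracted values before they are cleaned and moved to the warehouse. The sketch below uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for a relational staging server; the table names `stg_sales` and `fact_sales` and the cleaning rules are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical staging table: raw extracted values kept as text.
    CREATE TABLE stg_sales (order_id TEXT, amount TEXT, region TEXT);
    -- Hypothetical warehouse table with typed, cleaned columns.
    CREATE TABLE fact_sales (order_id INTEGER PRIMARY KEY, amount REAL, region TEXT);
""")

# Extract: land raw rows from the operational system in the staging area unchanged.
raw_rows = [("101", "25.00", " north"), ("102", "40.5", "SOUTH"), ("103", "", "east")]
conn.executemany("INSERT INTO stg_sales VALUES (?, ?, ?)", raw_rows)

# Transform inside the staging area, then transport (load) only rows with an amount.
conn.execute("""
    INSERT INTO fact_sales (order_id, amount, region)
    SELECT CAST(order_id AS INTEGER), CAST(amount AS REAL), UPPER(TRIM(region))
    FROM stg_sales
    WHERE amount <> ''
""")
conn.execute("DELETE FROM stg_sales")  # clear staging after a successful load
conn.commit()
```

Keeping the raw text in staging means a failed transformation can be rerun without going back to the operational system, which is one reason most implementations use a staging area.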
6
Remote Staging Model
- Data staging area within the warehouse environment
- Data staging area in its own environment, avoiding negative impact on the warehouse environment
[Diagram: two layouts. (1) Operational system in the operational environment; data staging area and warehouse together in the warehouse environment. (2) Operational system, data staging area, and warehouse each in their own environment (operational, staging, warehouse). Arrow labels: extract, transform, transport; transport (local).]
7
Onsite Staging Model
- Data staging area within the operational environment, possibly affecting the operational system
[Diagram: operational system and data staging area in the operational environment (extract, transform) → warehouse in the warehouse environment]
8
Extracting Data
- Routines developed to select fields from the source
- Various data formats
- Rules, audit trails, and error correction facilities
[Diagram: operational databases → data staging area (data mapping, transform) → warehouse database]
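An extraction routine that selects fields, applies rules, and keeps an audit trail might look like the following minimal Python sketch. It assumes a CSV source for simplicity (real sources may be fixed-width files, VSAM, and so on); the field names, the `SELECTED_FIELDS` list, and both validation rules are hypothetical.

```python
import csv
import io

# Hypothetical source extract; a real routine would read from the source system.
SOURCE_CSV = """cust_id,name,dob,internal_flag
1,Bloggs,10/12/56,X
2,Smith,,Y
three,Jones,01/02/70,X
"""

SELECTED_FIELDS = ("cust_id", "name", "dob")  # only these fields go to staging

def extract(text: str) -> tuple[list[dict], list[tuple]]:
    """Select fields from the source, logging rejected or suspect rows."""
    extracted, audit_trail = [], []
    reader = csv.DictReader(io.StringIO(text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        record = {name: row[name] for name in SELECTED_FIELDS}
        # Rule: the key must be numeric; otherwise reject for error correction.
        if not record["cust_id"].isdigit():
            audit_trail.append((line_no, "non-numeric cust_id", record["cust_id"]))
            continue
        # Rule: a missing DOB is kept but flagged for follow-up.
        if not record["dob"]:
            audit_trail.append((line_no, "missing dob", record["cust_id"]))
        extracted.append(record)
    return extracted, audit_trail
```

Recording the source line number with each audit entry lets an operator find and correct the original record before the next run.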
9
Source Systems
- Production
- Archive
- Internal
- External
10
Production Data
- Operating system platforms
- Hardware platforms
- File systems
- Database systems and vertical applications
[Examples shown: IMS, DB2, VSAM, NonStop SQL, Oracle, Sybase, Rdb, SAP, Shared Medical Systems, Dun and Bradstreet Financials, Hogan Financials, Oracle Financials]
11
Archive Data
- Historical data
- Useful for analysis over long periods of time
- Useful for the first-time load
- May require unique transformations
[Diagram: operational database → warehouse database]
12
Internal Data
- Planning, sales, and marketing organization data
- Maintained in:
  - Spreadsheets (structured)
  - Documents (unstructured)
- Treated like any other source data
[Diagram: planning, marketing, and accounting data → warehouse database]
13
External Data
- Information from outside the organization
- Issues of frequency, format, and predictability
- Described and tracked using metadata
[Examples shown: A.C. Nielsen, IRI, IMS, Walsh America, competitive information, economic forecasts, The Wall Street Journal, Barron’s, Dun and Bradstreet, warehousing databases, purchased databases]
14
Mapping
- Defines which operational attributes to use
- Defines how to transform the attributes for the warehouse
- Defines where the attributes exist in the warehouse
- Mapping tools are available
[Diagram: metadata mapping from File A to Staging File One]
  File A field   Value      Staging File One field   Value
  F1             123        Number                   USA123
  F2             Bloggs     Name                     Mr. Bloggs
  F3             10/12/56   DOB                      10-Dec-56
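The File A to Staging File One mapping can be expressed as metadata that pairs each source field with a target field and a transformation. This Python sketch reproduces the example values; the "USA" prefix and the "Mr." title are hard-coded assumptions (a real mapping would derive them from other source data), F3 is assumed to be day/month/two-digit-year, and `MAPPING` and `apply_mapping` are hypothetical names.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def to_number(value: str, country_prefix: str = "USA") -> str:
    # Assumption: the warehouse key is the source number with a country prefix.
    return f"{country_prefix}{value}"

def to_name(value: str, title: str = "Mr.") -> str:
    # Assumption: the title comes from elsewhere; fixed here for illustration.
    return f"{title} {value}"

def to_dob(value: str) -> str:
    """Convert dd/mm/yy to dd-Mon-yy without guessing the century."""
    day, month, year = value.split("/")
    return f"{int(day):02d}-{MONTHS[int(month) - 1]}-{year}"

# Mapping metadata: source field -> (target field, transformation).
MAPPING = {
    "F1": ("Number", to_number),
    "F2": ("Name", to_name),
    "F3": ("DOB", to_dob),
}

def apply_mapping(source_record: dict) -> dict:
    return {target: transform(source_record[source])
            for source, (target, transform) in MAPPING.items()}
```

Formatting the month from a fixed list, rather than with `strftime`, avoids locale-dependent month names and sidesteps the question of whether "56" means 1956 or 2056.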
15
Extraction Techniques
- Programs: C, COBOL, PL/SQL
- Gateways: transparent database access
- In-house development is popular
- Tools:
  - High initial cost
  - Ongoing automation
  - Data cleanup
16
Sources and Targets
[Diagram labels: data marts, data analysis, data mining, OLAP]
17
Designing Extraction Processes
- Analysis:
  - Sources, technologies
  - Data types, quality, owners
- Design options:
  - Manual, custom, gateway, third-party
  - Replication, full, or delta refresh
- Design issues:
  - Batch window, volumes, data currency
  - Automation, skills needed, resources
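The difference between a full and a delta refresh can be shown with a last-modified timestamp, one common way to detect changed rows. This sketch assumes the source exposes an `updated_at` column (hypothetical) and that the returned watermark would be stored in the extraction metadata between runs.

```python
from datetime import datetime

# Hypothetical source table with a last-modified timestamp column.
source_rows = [
    {"id": 1, "status": "open",   "updated_at": datetime(2024, 1, 1, 9, 0)},
    {"id": 2, "status": "closed", "updated_at": datetime(2024, 1, 2, 9, 0)},
    {"id": 3, "status": "open",   "updated_at": datetime(2024, 1, 3, 9, 0)},
]

def full_refresh(rows: list[dict]) -> list[dict]:
    """Full refresh: re-extract every row, whether or not it changed."""
    return list(rows)

def delta_refresh(rows: list[dict], watermark: datetime) -> tuple[list[dict], datetime]:
    """Delta refresh: extract only rows changed after the last successful run,
    and return the new watermark to save for the next run."""
    changed = [r for r in rows if r["updated_at"] > watermark]
    new_watermark = max((r["updated_at"] for r in changed), default=watermark)
    return changed, new_watermark
```

A delta refresh shrinks volumes and helps fit the batch window, but it depends on the source recording changes reliably; deletes, for example, never show up as a newer timestamp.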
18
Maintaining Extraction Metadata
- Source location, type, and structure
- Access method
- Privilege information
- Temporary storage
- Failure procedures
- Validity checks
- Handlers for missing data
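One way to keep the extraction metadata items listed above together is a single record per source. The Python dataclass below is a hypothetical structure, not a standard schema: each field mirrors one bullet, and validity checks and missing-data handlers are stored as data so that the extraction routine can apply them.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class ExtractionMetadata:
    source_location: str          # where the source lives
    source_type: str              # e.g. "flat file", "relational"
    structure: list[str]          # expected field names
    access_method: str            # e.g. "program", "gateway", "tool"
    privileges: str               # account or role used to read the source
    temp_storage: str             # where data lands before loading
    failure_procedure: str        # what operators do if extraction fails
    validity_checks: list[Callable[[dict], bool]] = field(default_factory=list)
    missing_data_defaults: dict[str, str] = field(default_factory=dict)

    def apply(self, record: dict) -> Optional[dict]:
        """Fill missing fields from the defaults, then run every validity check.
        Returns None if any check fails."""
        filled = {name: record.get(name) or self.missing_data_defaults.get(name)
                  for name in self.structure}
        return filled if all(check(filled) for check in self.validity_checks) else None

# Hypothetical metadata for an orders extract (all values are illustrative).
orders_meta = ExtractionMetadata(
    source_location="ops-server:/exports/orders.csv",
    source_type="flat file",
    structure=["order_id", "region"],
    access_method="program",
    privileges="read-only service account",
    temp_storage="staging schema",
    failure_procedure="rerun after the source file arrives; notify the DBA",
    validity_checks=[lambda r: r["order_id"] is not None],
    missing_data_defaults={"region": "UNKNOWN"},
)
```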
19
Possible ETT Failures
- A missing source file
- A system failure
- Poor mapping information
- Inadequate storage planning
- A source structural change
- No contingency plan
- Inadequate data validation
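Two of the failures listed above, a missing source file and a source structural change, can be caught before extraction starts. The Python sketch below assumes a CSV source whose agreed header is recorded as `EXPECTED_HEADER` (a hypothetical name and layout); the `preflight` function returns a list of problems so that a contingency procedure can decide whether to proceed.

```python
import csv
import tempfile
from pathlib import Path

EXPECTED_HEADER = ["order_id", "amount", "region"]  # hypothetical agreed structure

def preflight(source: Path) -> list[str]:
    """Return a list of problems; an empty list means extraction may proceed."""
    if not source.exists():
        return [f"missing source file: {source}"]
    with source.open(newline="") as handle:
        header = next(csv.reader(handle), [])  # an empty file yields []
    if header != EXPECTED_HEADER:
        return [f"source structure changed: expected {EXPECTED_HEADER}, got {header}"]
    return []

# Usage: simulate the failure cases in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "orders.csv"
    print(preflight(src))                       # reports the missing file
    src.write_text("order_id,amt,region\n1,2.5,N\n")
    print(preflight(src))                       # reports the renamed column
```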
20
Maintaining ETT Quality
- ETT must be:
  - Tested
  - Documented
  - Monitored and reviewed
- Display metadata must be coordinated
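Monitoring ETT output often starts with reconciliation: comparing row counts and a control total between what was extracted and what was loaded. This is a minimal Python sketch; the `amount` field and the tolerance value are assumptions, and a production check would compare against totals recorded in the ETT metadata.

```python
def reconcile(source_rows: list[dict], loaded_rows: list[dict],
              amount_field: str = "amount", tolerance: float = 0.005) -> list[str]:
    """Compare row counts and a control total between source and warehouse.
    Returns a list of issues; an empty list means the load reconciles."""
    issues = []
    if len(source_rows) != len(loaded_rows):
        issues.append(f"row count mismatch: source={len(source_rows)}, "
                      f"loaded={len(loaded_rows)}")
    src_total = sum(r[amount_field] for r in source_rows)
    tgt_total = sum(r[amount_field] for r in loaded_rows)
    if abs(src_total - tgt_total) > tolerance:
        issues.append(f"control total mismatch: source={src_total}, loaded={tgt_total}")
    return issues
```

Because `reconcile` is a plain function, the same check can serve both purposes on the slide: it can run in automated tests of the ETT code and as a monitoring step after every production load.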
21
Selection Criteria
- Base functionality
- Interface features
- Metadata repository
- Open API
- Metadata access
- Repository utilities
- Input and output processing
- Cleansing, reformatting, and auditing
- Reference
- Training requirements
22
WTI Partner ETT Tools
- Carleton
- Constellar
- Evolutionary Technologies
- Informatica
- Information Builders
- Oracle EDMS, Toolkits, OADW
- Prism Solutions
- Sagent
- Vality Technology
23
Summary
This lesson discussed the following topics:
- ETT processes are essential and consume a large proportion of warehouse resources and time
- The extraction process acquires source data
- You may encounter many data sources
- There are many data extraction issues
- ETT tools should be considered