Data Integration: Combining data from different sources, providing a unified view of the data


1 Data Integration
  Combining data from different sources, providing a unified view of the data
  Data warehouse is a repository that results from some types of data integration processes

2 Techniques for Data Integration
  Consolidation (ETL)
    Extract/Transform/Load
    Consolidating all data into a centralized database (like a data warehouse)
  Data federation (EII)
    Enterprise Information Integration
    Provides a virtual view of data without actually creating one centralized database
  Data propagation (EAI)
    Enterprise Application Integration
    Duplicates data across databases, with near real-time delay

3 The ETL Process
  ETL = Extract, transform, and load
  Capture/Extract
  Scrub or data cleansing
  Transform
  Load and Index

4 Capture/Extract: obtaining a snapshot of a chosen subset of the source data for loading into the data warehouse
  Static extract = capturing a snapshot of the source data at a point in time
  Incremental extract = capturing changes that have occurred since the last static extract
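The difference between the two extract types can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory SOURCE_ROWS list and the updated_at change-tracking column are hypothetical, and real sources may expose changes through logs or triggers instead.

```python
from datetime import datetime

# Hypothetical source rows; "updated_at" is an assumed change-tracking column.
SOURCE_ROWS = [
    {"id": 1, "name": "Ann", "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "name": "Bob", "updated_at": datetime(2024, 1, 5)},
    {"id": 3, "name": "Cy", "updated_at": datetime(2024, 1, 9)},
]


def static_extract(rows):
    """Static extract: copy every row as it stands at this point in time."""
    return [dict(row) for row in rows]


def incremental_extract(rows, last_extract_time):
    """Incremental extract: copy only rows changed since the last extract."""
    return [dict(row) for row in rows if row["updated_at"] > last_extract_time]


snapshot = static_extract(SOURCE_ROWS)
changes = incremental_extract(SOURCE_ROWS, datetime(2024, 1, 4))
```

A timestamp comparison like this one would not notice rows deleted from the source, which is one reason it should be read as a sketch rather than a complete change-capture method.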

5 Scrub/Cleanse: uses pattern recognition and AI techniques to upgrade data quality
  Fixing errors: misspellings, erroneous dates, incorrect field usage, mismatched addresses, missing data, duplicate data, inconsistencies
  Also: decoding, reformatting, time stamping, conversion, key generation, merging, error detection/logging, locating missing data
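A few of the fixes listed above (reformatting, erroneous dates, missing data, duplicate data, error logging) can be illustrated with simple hand-written rules. The sketch below deliberately uses plain rules rather than the pattern recognition or AI techniques the slide mentions; the field names name and signup and the YYYY-MM-DD date format are assumptions made for the example.

```python
from datetime import datetime


def scrub(records):
    """Return (clean_records, error_log) after simple rule-based cleansing."""
    clean, errors, seen = [], [], set()
    for rec in records:
        # Reformatting: collapse stray whitespace and normalize capitalization.
        name = " ".join(rec.get("name", "").split()).title()
        if not name:
            errors.append((rec, "missing name"))
            continue
        # Erroneous dates: reject values that are not real calendar dates.
        try:
            signup = datetime.strptime(rec.get("signup", ""), "%Y-%m-%d").date()
        except ValueError:
            errors.append((rec, "erroneous date"))
            continue
        # Duplicate data: keep only the first record per (name, signup) pair.
        key = (name, signup)
        if key in seen:
            errors.append((rec, "duplicate"))
            continue
        seen.add(key)
        clean.append({"name": name, "signup": signup})
    return clean, errors


raw = [
    {"name": "  ann  lee ", "signup": "2024-01-03"},
    {"name": "Ann Lee", "signup": "2024-01-03"},
    {"name": "Bob Ray", "signup": "2024-02-30"},
    {"name": "", "signup": "2024-01-04"},
]
clean, errors = scrub(raw)
```

Rejected records are logged with a reason instead of being silently dropped, which mirrors the "error detection/logging" item on the slide.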

6 Transform = convert data from format of operational system to format of data warehouse
  Record-level:
    Selection: data partitioning
    Joining: data combining
    Aggregation: data summarization
  Field-level:
    Single-field: from one field to one field
    Multi-field: from many fields to one, or one field to many
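The three record-level operations can be shown on a small in-memory data set. The orders and customers structures, and their field names, are hypothetical and exist only for this sketch; in practice these steps would usually be expressed in SQL or an ETL tool.

```python
from collections import defaultdict

# Hypothetical operational data.
orders = [
    {"order_id": 1, "cust_id": 10, "amount": 50.0, "status": "shipped"},
    {"order_id": 2, "cust_id": 11, "amount": 20.0, "status": "cancelled"},
    {"order_id": 3, "cust_id": 10, "amount": 30.0, "status": "shipped"},
]
customers = {10: {"region": "East"}, 11: {"region": "West"}}

# Selection (data partitioning): keep only the records of interest.
shipped = [o for o in orders if o["status"] == "shipped"]

# Joining (data combining): attach customer attributes to each order.
joined = [{**o, "region": customers[o["cust_id"]]["region"]} for o in shipped]

# Aggregation (data summarization): total order amount per region.
totals = defaultdict(float)
for row in joined:
    totals[row["region"]] += row["amount"]
```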

7 Load/Index = place transformed data into the warehouse and create indexes
  Refresh mode: bulk rewriting of target data at periodic intervals
  Update mode: only changes in source data are written to data warehouse
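The two load modes differ mainly in how much of the target is rewritten. In the sketch below the warehouse is modeled as a Python dict keyed by id, which is an assumption made for illustration; the dict key loosely stands in for a primary-key index.

```python
def refresh_load(warehouse, source_rows):
    """Refresh mode: discard the target data and rewrite it in bulk."""
    warehouse.clear()
    warehouse.update({row["id"]: dict(row) for row in source_rows})


def update_load(warehouse, changed_rows):
    """Update mode: write only the changed rows (insert or overwrite by key)."""
    for row in changed_rows:
        warehouse[row["id"]] = dict(row)


warehouse = {}
# Periodic full rewrite of the target.
refresh_load(warehouse, [{"id": 1, "qty": 5}, {"id": 2, "qty": 7}])
# Later, only the rows that changed in the source are applied.
update_load(warehouse, [{"id": 2, "qty": 9}, {"id": 3, "qty": 1}])
```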

8 Data Transformation Functions
  Record-level
    Transformation that involves obtaining the set of records you want from the data source
    Selection, joining, aggregation
  Field-level
    Transformation that converts data from fields of a source record to field(s) of a target record
    Single-field vs. multi-field transformations

9 Single-field transformation
  In general: some transformation function translates data from old form to new form
  Algorithmic transformation: uses a formula or logical expression
  Table lookup: another approach, uses a separate table keyed by source record code
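Both single-field approaches map one source field to one target field; they differ in whether the mapping is computed or looked up. The temperature conversion and the state-code table below are hypothetical examples chosen for this sketch and are not taken from the slides.

```python
def fahrenheit_to_celsius(temp_f):
    """Algorithmic transformation: the new value is computed by a formula."""
    return (temp_f - 32) * 5 / 9


# Table lookup: a separate table keyed by the source code (assumed contents).
STATE_NAMES = {"NY": "New York", "CA": "California"}


def lookup_state(code):
    """Translate a source code to its target value, flagging unknown codes."""
    return STATE_NAMES.get(code, "UNKNOWN")
```

A lookup table is the natural choice when no formula relates the old form to the new one, as with codes and their descriptions.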

10 Multifield transformation
  M:1: from many source fields to one target field
  1:M: from one source field to many target fields
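A minimal sketch of the two multifield directions follows. The function names, the "Last, First" name format, and the space-separated timestamp layout are all assumptions made for illustration.

```python
def combine_name(first, last):
    """M:1: many source fields (first, last) become one target field."""
    return f"{last}, {first}"


def split_timestamp(ts):
    """1:M: one source field becomes separate date and time target fields."""
    date_part, time_part = ts.split(" ", 1)
    return {"date": date_part, "time": time_part}
```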

