Agenda 02/20/2014 Complete data warehouse design exercise Finish reconciled data warehouse, bus matrix and data mart Display each group’s work Discuss.

1 Agenda 02/20/2014
- Complete data warehouse design exercise
  - Finish reconciled data warehouse, bus matrix and data mart
  - Display each group’s work
- Discuss issues in data warehouse design
- Define ETL in more depth by the activities performed

2 Discussed in prior classes...
- Lots of data:
  - Traditional transaction processing systems
  - Non-traditional transaction processing: call center; click-stream; loyalty card; warranty cards/product registration information
  - External data from government and commercial entities
- Lots of poor-quality data, for lots of reasons that can be traced back to lots of people

3 Populating the data warehouse
- Extract
  - Take data from source systems.
  - May require middleware to gather all necessary data.
- Transformation
  - Put data into a consistent format and content.
  - Validate data: check for accuracy and consistency using predefined, agreed-upon business rules.
  - Convert data as necessary.
- Load
  - Use a batch (bulk) update operation that keeps track of what is loaded, where, when and how.
  - Keep a detailed load log to audit updates to the data warehouse.
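The three activities above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any ETL product's API: the source rows, the `fact_sales` table, the "amounts must not be negative" rule and the `LoadLogEntry` fields are all hypothetical, and a plain dict stands in for the warehouse.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical rows standing in for two source systems.
SOURCE_A = [{"cust_id": "17", "state": " ca", "amount": "120.50"}]
SOURCE_B = [{"cust_id": "18", "state": "NY", "amount": "-5"}]


@dataclass
class LoadLogEntry:
    """One audit record per batch: what was loaded, where, when, and how."""
    table: str
    rows_loaded: int
    rows_rejected: int
    loaded_at: str
    method: str = "batch append"


def extract(*sources):
    """Extract: gather rows from every source system into one list."""
    return [dict(row) for source in sources for row in source]


def transform(rows):
    """Transform: convert to consistent types/formats, then validate each row."""
    clean, rejected = [], []
    for row in rows:
        converted = {
            "cust_id": int(row["cust_id"]),
            "state": row["state"].strip().upper(),
            "amount": float(row["amount"]),
        }
        # Hypothetical business rule: sale amounts must not be negative.
        if converted["amount"] >= 0:
            clean.append(converted)
        else:
            rejected.append(row)
    return clean, rejected


def load(warehouse, table, rows, rejected, log):
    """Load: append the whole batch at once and write a load-log entry."""
    warehouse.setdefault(table, []).extend(rows)
    log.append(LoadLogEntry(table, len(rows), len(rejected),
                            datetime.now(timezone.utc).isoformat()))


warehouse, load_log = {}, []
clean, rejected = transform(extract(SOURCE_A, SOURCE_B))
load(warehouse, "fact_sales", clean, rejected, load_log)
print(warehouse["fact_sales"])  # [{'cust_id': 17, 'state': 'CA', 'amount': 120.5}]
print(load_log[0].rows_loaded, load_log[0].rows_rejected)  # 1 1
```

Rejected rows are kept rather than silently dropped, so the load log can report them and someone can trace them back to the source system.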

4 Data Cleansing
- Source systems contain “dirty data” that must be cleansed.
- ETL software offers data cleansing capabilities ranging from rudimentary to very sophisticated.
- Industry-specific data cleansing software is often used; it is important for performing name and address correction.
- Leading data cleansing vendors include general hardware/software vendors such as IBM, Oracle, SAP and Microsoft, and specialty vendors such as Informatica, Information Builders (DataMigrator), Harte-Hanks (Trillium), CloverETL, Talend, and BusinessObjects (SAP AG).

5 Steps in data cleansing
- Parsing
- Correcting
- Standardizing
- Matching
- Consolidating

6 Parsing Parsing locates and identifies individual data elements in the source files and then isolates these data elements in the target files. Examples include parsing the first, middle, and last name; street number and street name; and city and state.
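A minimal parsing sketch, assuming Western "First [Middle] Last" name order and a hypothetical single-line address layout of "number street, city, ST". Real cleansing tools use far richer grammars and reference data; the regular expression below is illustrative only.

```python
import re


def parse_name(full_name):
    """Split a 'First [Middle ...] Last' string into its elements.
    Assumes Western name order; returns None for an empty string."""
    parts = full_name.split()
    if not parts:
        return None
    middle = " ".join(parts[1:-1]) or None
    return {"first": parts[0], "middle": middle, "last": parts[-1]}


# Assumed input layout: "<number> <street name>, <city>, <2-letter state>"
ADDRESS_RE = re.compile(r"^\s*(\d+)\s+([^,]+),\s*([^,]+),\s*([A-Za-z]{2})\s*$")


def parse_address(line):
    """Isolate street number, street name, city and state into separate fields."""
    match = ADDRESS_RE.match(line)
    if match is None:
        return None  # leave unparseable lines for manual review
    number, street, city, state = match.groups()
    return {"street_number": number, "street_name": street.strip(),
            "city": city.strip(), "state": state}


print(parse_name("Karen Ann Parker-Lewis"))
# {'first': 'Karen', 'middle': 'Ann', 'last': 'Parker-Lewis'}
print(parse_address("123 Main St, Springfield, IL"))
# {'street_number': '123', 'street_name': 'Main St', 'city': 'Springfield', 'state': 'IL'}
```

The middle name "Ann" and the address are made-up sample values.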

7 Parsing

8 Correcting Correcting fixes individual parsed data components using sophisticated algorithms and secondary data sources.
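One way correcting can work, sketched under assumptions: the secondary data source here is a tiny hypothetical ZIP-code reference table, a trusted ZIP overrides city and state, and otherwise a misspelled city is fuzzy-matched with Python's standard `difflib.get_close_matches`. The 0.8 cutoff is an arbitrary choice for illustration.

```python
import difflib

# Hypothetical secondary data source: postal reference data keyed by ZIP code.
ZIP_REFERENCE = {
    "62701": {"city": "Springfield", "state": "IL"},
    "10001": {"city": "New York", "state": "NY"},
}
KNOWN_CITIES = sorted({entry["city"] for entry in ZIP_REFERENCE.values()})


def correct_address(record):
    """Return a corrected copy of a parsed address record."""
    fixed = dict(record)
    reference = ZIP_REFERENCE.get(record.get("zip"))
    if reference is not None:
        # Assumption: the ZIP code is trusted, so city/state come from the reference data.
        fixed.update(reference)
    else:
        # No usable ZIP: fuzzy-match the city against the known city names.
        candidates = difflib.get_close_matches(
            record.get("city", ""), KNOWN_CITIES, n=1, cutoff=0.8)
        if candidates:
            fixed["city"] = candidates[0]
    return fixed


print(correct_address({"city": "Sprngfield", "state": "IL", "zip": "62701"}))
# {'city': 'Springfield', 'state': 'IL', 'zip': '62701'}
print(correct_address({"city": "Sprngfield", "state": "IL"}))
# {'city': 'Springfield', 'state': 'IL'}
```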

9 Correcting

10 Standardizing Standardizing applies conversion routines to transform data into its preferred (and consistent) format using both standard and custom business rules.
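A small standardizing sketch showing one "standard" rule and one "custom" rule. Both rule sets are assumptions: the suffix table is loosely modeled on postal-style abbreviations (ST, AVE, RD) and is far from complete, and the NNN-NNN-NNNN phone format is a hypothetical organizational preference.

```python
import re

# Assumed "standard" rule table: map street-suffix variants to one abbreviation.
SUFFIX_STANDARD = {
    "STREET": "ST", "STR": "ST", "ST": "ST",
    "AVENUE": "AVE", "AV": "AVE", "AVE": "AVE",
    "ROAD": "RD", "RD": "RD",
}


def standardize_street(street):
    """Upper-case, drop periods, and replace a trailing suffix with its standard form."""
    words = street.upper().replace(".", "").split()
    if words and words[-1] in SUFFIX_STANDARD:
        words[-1] = SUFFIX_STANDARD[words[-1]]
    return " ".join(words)


def standardize_phone(phone):
    """Hypothetical custom business rule: store US phone numbers as NNN-NNN-NNNN."""
    digits = re.sub(r"\D", "", phone)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop a leading country code
    if len(digits) != 10:
        return None  # cannot be standardized; flag for review
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


print(standardize_street("123 Main Street"))   # 123 MAIN ST
print(standardize_street("42 elm avenue."))    # 42 ELM AVE
print(standardize_phone("+1 (217) 555-0100"))  # 217-555-0100
```

Keeping the rules in a data table rather than in code makes it easier for the business to review and agree on them.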

11 Standardizing

12 Matching Matching searches for and matches records within and across the parsed, corrected and standardized data, based on predefined business rules, to eliminate duplicates.
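A minimal matching sketch. The business rule is hypothetical (same phone number, or same ZIP code plus a name similarity of at least 0.85 as measured by `difflib.SequenceMatcher.ratio`), and the records are invented sample data that has already been standardized.

```python
from difflib import SequenceMatcher
from itertools import combinations


def is_match(a, b, name_threshold=0.85):
    """Hypothetical business rule: two records describe the same person if they
    share a phone number, or share a ZIP code and have very similar names."""
    if a.get("phone") and a.get("phone") == b.get("phone"):
        return True
    if a["zip"] != b["zip"]:
        return False
    return SequenceMatcher(None, a["name"], b["name"]).ratio() >= name_threshold


def find_matches(records):
    """Compare every pair, within and across sources. Fine for a sketch; large
    volumes usually need blocking (comparing only records that share a key)."""
    return [(a["id"], b["id"]) for a, b in combinations(records, 2) if is_match(a, b)]


records = [
    {"id": 1, "name": "WILLIAM LEWIS", "zip": "62701", "phone": "217-555-0100"},
    {"id": 2, "name": "WILLIAM LEWES", "zip": "62701", "phone": None},
    {"id": 3, "name": "BETH PARKER", "zip": "62701", "phone": None},
    {"id": 4, "name": "BILL LEWIS", "zip": "10001", "phone": "217-555-0100"},
]
print(find_matches(records))  # [(1, 2), (1, 4)]
```

Record 2 is caught by name similarity despite the typo, and record 4 by the shared phone despite the nickname; neither rule alone would find both.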

13 Matching

14 Consolidating Consolidating analyzes and identifies relationships between matched records and merges them into ONE representation.
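A consolidation sketch, assuming matching has already produced pairs of record IDs. Union-find is a swapped-in technique (the slides do not name one): it groups transitive matches, so if record 1 matches 2 and 2 matches 3, all three become one client. The survivorship rule (first non-empty value wins) and the record layout are hypothetical; the source keys reuse the identifiers from the source-system view in slides 16 and 17.

```python
def group_matches(ids, matched_pairs):
    """Union-find: matched pairs that share a record end up in the same group."""
    parent = {i: i for i in ids}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for a, b in matched_pairs:
        parent[find(a)] = find(b)
    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def consolidate(group_records):
    """Hypothetical survivorship rule: keep the first non-empty value per field and
    collect every source-system key so the merged record links back to its sources."""
    merged = {"source_keys": []}
    for rec in group_records:
        merged["source_keys"].append(rec["source_key"])
        for key, value in rec.items():
            if key != "source_key" and value and key not in merged:
                merged[key] = value
    return merged


records = {
    1: {"source_key": "Policy No. ME309451-2", "name": "WILLIAM LEWIS", "phone": None},
    2: {"source_key": "Account# 1238891", "name": "WILLIAM LEWIS", "phone": "217-555-0100"},
    3: {"source_key": "Transaction B498/97", "name": "W LEWIS", "phone": None},
}
groups = group_matches(list(records), [(1, 2), (2, 3)])
merged = consolidate([records[i] for i in groups[0]])
print(groups)  # [[1, 2, 3]]
print(merged)
```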

15 Consolidating

16 Source system view – 3 clients
- Policy No. ME309451-2
- Account# 1238891
- Transaction B498/97

17 The reality – ONE client
- Account# 1238891
- Policy No. ME309451-2
- Transaction B498/97

18 Consolidating whole groups
- William Lewis
- Beth Parker
- Karen Parker-Lewis
- William Parker-Lewis, Jr.
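Consolidating whole groups (often called householding) can be sketched by grouping individuals who share a standardized address, since the four names on this slide carry three different surnames and name matching alone would not link them. The grouping rule, the addresses and the extra person "Ann Smith" are hypothetical; only the four family names come from the slide.

```python
from collections import defaultdict


def build_households(people):
    """Hypothetical householding rule: everyone at the same standardized address
    (street + ZIP) belongs to one household, whatever their surnames."""
    households = defaultdict(list)
    for person in people:
        households[(person["street"], person["zip"])].append(person["name"])
    return dict(households)


people = [
    {"name": "William Lewis", "street": "12 ELM AVE", "zip": "62701"},
    {"name": "Beth Parker", "street": "12 ELM AVE", "zip": "62701"},
    {"name": "Karen Parker-Lewis", "street": "12 ELM AVE", "zip": "62701"},
    {"name": "William Parker-Lewis, Jr.", "street": "12 ELM AVE", "zip": "62701"},
    {"name": "Ann Smith", "street": "9 OAK RD", "zip": "10001"},
]
households = build_households(people)
for address, members in households.items():
    print(address, members)
```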

19 ETL Products
- SQL Server 2012 Integration Services from Microsoft
- PowerMart/PowerCenter/PowerExchange from Informatica
- Warehouse Builder from Oracle
- Teradata Warehouse Builder from Teradata
- DataMigrator from Information Builders
- SAS System from SAS Institute
- Connectivity Solutions from OpenText
- Ab Initio

20 ETL
Goal: Data is complete, accurate, consistent, and in conformance with the business rules of the organization.
Questions:
- Is ETL really necessary?
- Has the advent of big data changed our need for ETL?
- ETL vs. ELT
- Does the use of Hadoop eliminate the need for ETL software?
- Does it matter if the data is stored in the “cloud”?