

Agenda 03/27/2014
- Review first test.
- Discuss internal data project.
- Review characteristics of data quality.
  - Types of data.
  - Data quality.
  - Data governance.
- Define ETL activities.
- Discuss database analyst/programmer responsibilities for data evaluation.

Answers to Multiple Choice Questions

Question  Answer    Question  Answer    Question  Answer    Question  Answer
1.        B         8.        D         15.       A         22.       B
2.        A         9.        C         16.       D         23.       D
3.        C         10.       D         17.       D         24.       D
4.        B         11.       B         18.       A         25.       A
5.        A         12.       C         19.       C
6.        C         13.       B         20.       A
7.        E         14.       A         21.       C

Discussed in prior classes...
- Lots of data.
  - Traditional transaction processing systems
  - Non-traditional data: call center; click-stream; loyalty card; warranty cards/product registration information; Twitter; Facebook
  - External data from government and commercial entities
- General classification of data
  - Transaction data
  - Referential data/master data
  - Metadata

Data quality
What is good quality data?
- Correct
- Accurate
- Consistent
- Complete
- Available
- Accessible
- Timely
- Relevant

How does data “go bad”? Does all “bad” data have to be fixed?

Data governance
- Policies, processes and procedures aimed at managing the data in an organization.
- Usually high-level cross-department committees that oversee data management across the organization.
  - Responsible for defining what data is necessary to gather.
  - Responsible for defining the source and store of data.
  - Responsible for security policies, processes, procedures.
  - Responsible for creating the policies, processes and procedures.
  - Responsible for assigning blame.
  - Responsible for enforcing policies.

Data quality in data warehouses
- Is it more important than data quality in source transaction and reference data?
- How is better quality data achieved?
  - Automated ETL processes to populate the data warehouse
  - Spot checking programmatically
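As an illustration of spot checking programmatically, the sketch below compares row counts and a random sample of rows between a source table and the warehouse. It is a minimal example, not a prescribed method: the tables are plain lists of dictionaries, and the function name `spot_check` and the `id` key column are assumptions made for this example.

```python
import random

def spot_check(source_rows, warehouse_rows, key="id", sample_size=3, seed=0):
    """Compare row counts and a random sample of rows between two tables.

    Returns a list of problems found. An empty list means the sample looked
    fine; it does not prove that the whole load is correct.
    """
    problems = []
    if len(source_rows) != len(warehouse_rows):
        problems.append(
            f"row count mismatch: source={len(source_rows)} "
            f"warehouse={len(warehouse_rows)}"
        )
    warehouse_by_key = {row[key]: row for row in warehouse_rows}
    rng = random.Random(seed)  # fixed seed keeps the check repeatable
    sample = rng.sample(source_rows, min(sample_size, len(source_rows)))
    for row in sample:
        loaded = warehouse_by_key.get(row[key])
        if loaded is None:
            problems.append(f"missing in warehouse: {key}={row[key]}")
        elif loaded != row:
            problems.append(f"values differ for {key}={row[key]}")
    return problems
```

A sample only catches problems in the rows it happens to draw, so a check like this complements the automated ETL validation rather than replacing it.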

Populating the data warehouse
- Extract
  - Take data from source systems.
  - May require middleware to gather all necessary data.
- Transform
  - Put data into consistent format and content.
  - Validate data: check for accuracy and consistency using pre-defined and agreed-upon business rules.
  - Convert data as necessary.
- Load
  - Use a batch (bulk) update operation that keeps track of what is loaded, where, when and how.
  - Keep a detailed load log to audit updates to the data warehouse.
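The three ETL activities can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any ETL product works: the source is a small CSV string, the warehouse is an in-memory list, and the column names, the `sales_fact` target name, and the "amount must not be negative" rule are all invented for the example.

```python
import csv
import io
from datetime import datetime, timezone

def extract(csv_text):
    """Extract: read rows from a source system export (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def transform(rows):
    """Transform: put data into a consistent format, convert, and validate."""
    clean, rejected = [], []
    for row in rows:
        try:
            record = {
                "customer": row["customer"].strip().title(),
                "amount": float(row["amount"]),
            }
        except (KeyError, ValueError, TypeError, AttributeError):
            rejected.append(row)  # could not be converted
            continue
        # Hypothetical business rule: amounts must not be negative.
        if record["amount"] < 0:
            rejected.append(row)
        else:
            clean.append(record)
    return clean, rejected

def load(records, warehouse, load_log, source_name):
    """Load: append one batch and log what was loaded, where, and when."""
    warehouse.extend(records)
    load_log.append({
        "source": source_name,
        "target": "sales_fact",  # assumed target table name
        "rows_loaded": len(records),
        "loaded_at": datetime.now(timezone.utc).isoformat(),
    })

source = "customer,amount\n alice smith ,10.50\nbob jones,-3\ncarol lee,abc\n"
warehouse, load_log = [], []
clean, rejected = transform(extract(source))
load(clean, warehouse, load_log, source_name="orders.csv")
```

Keeping the rejected rows instead of silently dropping them leaves something to review when deciding whether the "bad" data has to be fixed at the source.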

Data Cleansing
- Source systems contain "dirty data" that must be cleansed.
- ETL software contains data cleansing capabilities that range from rudimentary to very sophisticated.
- Industry-specific data cleansing software is often used. It is important for performing name and address correction.
- Leading data cleansing vendors include general hardware/software vendors such as IBM, Oracle, SAP and Microsoft, and specialty vendors such as Informatica, Information Builders (DataMigrator), Harte-Hanks (Trillium), CloverETL, Talend, and BusinessObjects (SAP-AG).

Steps in data cleansing
- Parsing
- Correcting
- Standardizing
- Matching
- Consolidating

Parsing
Parsing locates and identifies individual data elements in the source files and then isolates these data elements in the target files. Examples include parsing the first, middle, and last name; street number and street name; and city and state.
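A minimal sketch of parsing, assuming names arrive as "First Middle Last" and addresses as "number street, city, ST". Real cleansing tools handle far more variation; the function names and the address pattern here are assumptions made for illustration.

```python
import re

# Assumed layout: "<number> <street name>, <city>, <two-letter state>"
ADDRESS_PATTERN = re.compile(
    r"^\s*(?P<street_number>\d+)\s+(?P<street_name>[^,]+?)\s*,"
    r"\s*(?P<city>[^,]+?)\s*,\s*(?P<state>[A-Za-z]{2})\s*$"
)

def parse_name(full_name):
    """Split a full name into first, middle, and last elements."""
    parts = full_name.split()
    if len(parts) < 2:
        return None  # not enough elements to isolate first and last name
    return {
        "first": parts[0],
        "middle": " ".join(parts[1:-1]),  # empty string when there is none
        "last": parts[-1],
    }

def parse_address(line):
    """Isolate street number, street name, city, and state, or return None."""
    match = ADDRESS_PATTERN.match(line)
    return match.groupdict() if match else None
```

For example, `parse_address("123 Main St, Springfield, IL")` isolates four separate elements, while a line that does not fit the assumed layout returns `None` so it can be routed to manual review.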

Correcting
Correcting fixes the parsed individual data components using sophisticated data algorithms and secondary data sources.
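A minimal sketch of correcting against a secondary data source. The two-entry `POSTAL_REFERENCE` table stands in for a real postal reference file, and the rule "trust the ZIP code over the typed city and state" is an assumption chosen for the example.

```python
# Tiny stand-in for a secondary data source such as a postal reference file.
POSTAL_REFERENCE = {
    "60601": {"city": "Chicago", "state": "IL"},
    "10001": {"city": "New York", "state": "NY"},
}

def correct_address(record, reference=POSTAL_REFERENCE):
    """Return a corrected copy of the record and the list of fields changed."""
    corrected = dict(record)
    changed = []
    expected = reference.get(record.get("zip"))
    if expected is None:
        return corrected, changed  # nothing to check against; leave as is
    for field, value in expected.items():
        if corrected.get(field) != value:
            corrected[field] = value
            changed.append(field)
    return corrected, changed
```

Returning the list of changed fields makes each correction auditable, which fits the load-log idea from the ETL steps.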

Standardizing
Standardizing applies conversion routines to transform data into its preferred (and consistent) format using both standard and custom business rules.
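A minimal sketch of two conversion routines. The preferred formats (abbreviated street suffixes, phone numbers written as `NNN-NNN-NNNN`) are hypothetical business rules picked for the example, not a standard.

```python
import re

# Hypothetical custom business rule: preferred street suffix abbreviations.
STREET_SUFFIXES = {
    "street": "St", "st": "St",
    "avenue": "Ave", "ave": "Ave",
    "road": "Rd", "rd": "Rd",
}

def standardize_street(street):
    """Capitalize each word and abbreviate a trailing street suffix."""
    words = street.replace(".", "").split()
    result = []
    for index, word in enumerate(words):
        is_last = index == len(words) - 1
        suffix = STREET_SUFFIXES.get(word.lower())
        result.append(suffix if (is_last and suffix) else word.capitalize())
    return " ".join(result)

def standardize_phone(phone):
    """Convert a 10-digit phone number to NNN-NNN-NNNN, or return None."""
    digits = re.sub(r"\D", "", phone)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop a leading country code of 1
    if len(digits) != 10:
        return None
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"
```

Once every source writes the same value the same way, the later matching step can compare fields directly.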

Matching
Matching searches for and matches records within and across the parsed, corrected and standardized data, based on predefined business rules, to eliminate duplicates.
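A minimal sketch of rule-based matching. The match rule used here (same last name, same first initial, same ZIP code) is a hypothetical business rule; commercial tools typically use fuzzier comparisons.

```python
from collections import defaultdict

def match_key(record):
    """Hypothetical business rule: same last name, first initial, and ZIP."""
    return (
        record["last"].strip().lower(),
        record["first"].strip().lower()[:1],
        record["zip"].strip(),
    )

def find_matches(records):
    """Group records that share a match key; return groups of two or more."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    return [group for group in groups.values() if len(group) > 1]
```

Each returned group is a set of candidate duplicates that can be handed to the consolidation step.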

Consolidating
Consolidating analyzes and identifies relationships between matched records and merges them into ONE representation.
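A minimal sketch of consolidating one group of matched records. The survivorship rule (the first non-empty value wins) and the `source_id` field name are assumptions for the example; an organization would define its own rules.

```python
def consolidate(matched_records, id_field="source_id"):
    """Merge matched records into one; keep every source id for traceability."""
    merged = {}
    source_ids = []
    for record in matched_records:
        for field, value in record.items():
            if field == id_field:
                source_ids.append(value)
            elif value not in (None, "") and field not in merged:
                merged[field] = value  # survivorship rule: first non-empty wins
    merged["source_ids"] = source_ids
    return merged
```

Keeping the original source ids on the merged record preserves the link back to each source system's view of the client.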

Source system view – 3 clients Policy No. ME Account# Transaction B498/97

The reality – ONE client Account# Policy No. ME Transaction B498/97

Consolidating whole groups
- William Lewis
- Beth Parker
- Karen Parker-Lewis
- William Parker-Lewis, Jr.

ETL Products
- SQL Server 2012 Integration Services from Microsoft
- PowerMart/PowerCenter/PowerExchange from Informatica
- Warehouse Builder from Oracle
- Teradata Warehouse Builder from Teradata
- DataMigrator from Information Builders
- SAS System from SAS Institute
- Connectivity Solutions from OpenText
- Ab Initio

ETL
Goal: Data is complete, accurate, consistent, and in conformance with the business rules of the organization.
Questions:
- Is ETL really necessary?
- Has the advent of big data changed our need for ETL?
  - ETL vs. ELT
  - Does the use of Hadoop eliminate the need for ETL software?
  - Does it matter if the data is stored in the "cloud"?