1 A Comparative Study between ETL and E-LT approaches for loading data into a Data Warehouse Vikas Ranjan CSCI 693.



2 Agenda
• Introduction
• Extract, Transform and Load (ETL) Approach
• Strengths/Weaknesses of ETL
• Extract, Load and Transform (E-LT) Approach
• Strengths/Weaknesses of E-LT
• Experiments
• Results
• Conclusion/Future Work
• References
• Q & A

3 Introduction
• All Business Intelligence applications are data-centric.
• Large volumes of data are stored and processed, along with their history, in the data warehouse.
• The data is extracted from various heterogeneous source systems and transformed according to the business requirements.
• The data is used for analytical purposes (forecasting, profitability analysis, trend analysis, etc.) to drive the business.

4 Extract, Transform and Load (ETL)
• The traditional approach to loading data into data warehouses.
• Data is first pulled or pushed from various heterogeneous sources such as ERP, CRM, RDBMS and flat files.
• Business rules are applied to the data in the staging area.
• The transformed data is loaded into the target database.
• Often designed backward (from the required output), so only the relevant data is fetched.
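The ETL flow described on this slide can be sketched roughly as follows. This is a minimal illustration, not the tooling used in the study: it assumes an in-memory SQLite database as the target, and the rows in SOURCE_ROWS, the table name dw_accounts and the business rule (drop negative amounts, normalize names) are all hypothetical.

```python
import sqlite3

# Hypothetical source rows standing in for an ERP/CRM/flat-file extract.
SOURCE_ROWS = [
    ("A-1", "  alice ", 120.0),
    ("A-2", "BOB", -5.0),  # negative amount: rejected by the business rule below
    ("A-3", "carol", 80.5),
]


def extract():
    """Pull raw rows from the (simulated) source system."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Apply business rules outside the database, as an ETL engine would."""
    cleaned = []
    for account_id, name, amount in rows:
        if amount < 0:
            continue  # assumed rule: drop rows with negative amounts
        cleaned.append((account_id, name.strip().title(), round(amount, 2)))
    return cleaned


def load(conn, rows):
    """Write the already-transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dw_accounts "
        "(account_id TEXT PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO dw_accounts VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Extract, transform in the application layer, then load."""
    load(conn, transform(extract()))
    return conn.execute(
        "SELECT account_id, name, amount FROM dw_accounts ORDER BY account_id"
    ).fetchall()


if __name__ == "__main__":
    print(run_etl(sqlite3.connect(":memory:")))
```

In this sketch the target database only ever sees clean rows; all rule processing happens in the Python process, which plays the role of the ETL engine.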

5 Strengths/Weaknesses of ETL
Strengths
• Can perform complex operations in a single data flow diagram.
• Mostly designed backward, so only relevant data is loaded.
• Used for building real-time data warehouses.
• Robust tools are available, such as Informatica, Ab Initio and DataStage.
Weaknesses
• The data transformation step of ETL is performed by the ETL engine, which increases processing time.
• Data is moved over the network twice.
• Because it is designed backward, future redesign takes more effort.

6 Extract, Load and Transform (E-LT)
• A newer approach to loading data into the target data warehouse.
• Data is extracted from the various sources in the same way as in ETL.
• The extracted data is loaded directly into the target data warehouse.
• The transformations and complex business rules are applied by native SQL drivers.
• The processing is done by the database engine rather than the ETL engine.
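For contrast with ETL, the E-LT steps on this slide might look like the sketch below. It is illustrative only and assumes an in-memory SQLite database as the target; the table names stg_accounts and dw_accounts, the sample rows and the rule in the WHERE clause are hypothetical. The point is that the raw rows are loaded untouched and the transformation is a single set-based SQL statement executed by the database engine.

```python
import sqlite3

# Hypothetical raw extract; it is loaded as-is, with no cleaning beforehand.
RAW_ROWS = [
    ("A-1", "  alice ", 120.0),
    ("A-2", "BOB", -5.0),
    ("A-3", "carol", 80.5),
]


def run_elt(conn):
    """Load raw data into the target database, then transform it with SQL."""
    conn.execute(
        "CREATE TABLE stg_accounts (account_id TEXT, name TEXT, amount REAL)"
    )
    conn.execute(
        "CREATE TABLE dw_accounts "
        "(account_id TEXT PRIMARY KEY, name TEXT, amount REAL)"
    )

    # Extract + Load: every raw row lands in a staging table on the target.
    conn.executemany("INSERT INTO stg_accounts VALUES (?, ?, ?)", RAW_ROWS)

    # Transform: the business rules run inside the database engine.
    conn.execute(
        """
        INSERT INTO dw_accounts (account_id, name, amount)
        SELECT account_id, UPPER(TRIM(name)), ROUND(amount, 2)
        FROM stg_accounts
        WHERE amount >= 0
        """
    )
    conn.commit()
    return conn.execute(
        "SELECT account_id, name, amount FROM dw_accounts ORDER BY account_id"
    ).fetchall()


if __name__ == "__main__":
    print(run_elt(sqlite3.connect(":memory:")))
```

Because the staging table keeps every raw row, a later rule change only requires rerunning the SQL step, which matches the flexibility argument made for E-LT.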

7 Strengths/Weaknesses of E-LT
Strengths
• Since all the data is available in the target, future changes can be incorporated easily.
• Once the data is loaded on the target platform, all transformations are executed by the RDBMS engine, which reduces network congestion.
• Provides optimal performance, as no extra hardware is needed.
Weaknesses
• E-LT suffers from limited tool availability (e.g., Informatica Pushdown, Data Integrator).
• Does not work well for complex business cases.
• E-LT cannot be used to design a near real-time enterprise data warehouse.

8 Tools and Data Set
Software / Hardware
• OS Platform: Sun M9000 Server (
• Relational Database Management System: Teradata V2R6.2.1 (
• ETL/E-LT Tool: Informatica PC Hot Fix 9 Advanced Edition (
Data Set
• All the data used in the experiments was obtained from the test database of a telecom company.
• All the experiments were conducted using Informatica as both the ETL and the E-LT tool.

9 Experiment 1
ETL vs. E-LT (Full Pushdown): Informatica has recently introduced both ETL and E-LT capabilities in its PowerCenter tool.
Job Name: Tax_Write_Off Job
• The same job was developed using the ETL approach as well as the E-LT approach.
• Informatica pushed down all the code processing to the RDBMS engine.

10 Experiment 1 contd. ETL Job: Informatica Server handled all the code processing and generated its own internal SQL

11 Experiment 1 contd. E-LT Job (Full Pushdown): In the full pushdown, Informatica Server pushed all the code to the RDBMS engine and worked purely as an E-LT tool

12 Experiment 1 Results
Approximately five times more performance gain using E-LT Full Pushdown.
Table columns: Approach | Data Read (rows) | Data Load (rows) | Runtime (seconds) | Throughput (rows/sec) | Memory | CPU
Table rows: ETL (DB/ETL Server); E-LT Full (DB Memory, DB CPU). The measured values are not present in this transcript.
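The Throughput (rows/sec) column in the results tables is the number of rows loaded divided by the runtime in seconds, and a claim such as "five times" corresponds to the ratio of the two runtimes for the same data volume. A small sketch of that arithmetic follows; the numbers in the example are placeholders chosen for easy division, not the measurements from this study, and the function names are hypothetical.

```python
def throughput(rows_loaded, runtime_seconds):
    """Rows per second for one job run."""
    if runtime_seconds <= 0:
        raise ValueError("runtime_seconds must be positive")
    return rows_loaded / runtime_seconds


def speedup(baseline_runtime_seconds, candidate_runtime_seconds):
    """How many times faster the candidate run is than the baseline run."""
    if baseline_runtime_seconds <= 0 or candidate_runtime_seconds <= 0:
        raise ValueError("runtimes must be positive")
    return baseline_runtime_seconds / candidate_runtime_seconds


if __name__ == "__main__":
    # Placeholder inputs only; they are NOT the values measured in the study.
    rows = 1_000_000
    etl_runtime, elt_runtime = 500, 100
    print(throughput(rows, etl_runtime))      # rows/sec for the ETL run
    print(throughput(rows, elt_runtime))      # rows/sec for the E-LT run
    print(speedup(etl_runtime, elt_runtime))  # runtime ratio
```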

13 Experiment 2
ETL vs. E-LT (Target Pushdown):
• Job Name: Tax_Extract_ETL_tst Job
• The same job was developed using the ETL approach as well as the E-LT approach with Target Pushdown only.
• The pushdown to the database engine happened at the target database only.

14 Experiment 2 contd. ETL Job: Informatica Server handled all the code processing and generated its own internal SQL

15 Experiment 2 contd. E-LT Job (Target Pushdown): Informatica Server pushed code processing on target database to RDBMS engine

16 Experiment 2 Results
No performance differences, since the pushdown was on the target database only.
Table columns: Approach | Data Read (rows) | Data Load (rows) | Runtime (seconds) | Throughput (rows/sec) | Memory | CPU
Table rows: ETL (ETL/Partial DB); E-LT Target (DB/E-LT Server). The measured values are not present in this transcript.

17 Experiment 3
ETL vs. E-LT (Source Pushdown):
• Job Name: Tax_Extract_ETL Job
• The same job was developed using the ETL approach as well as the E-LT approach with Source Pushdown only.
• The pushdown to the database engine happened at the source database only.

18 Experiment 3 contd. ETL Job: Informatica Server handled all the code processing and generated its own internal SQL

19 Experiment 3 contd. E-LT Job (Source Pushdown): Informatica Server pushed down code processing on source side to RDBMS engine
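The general idea of source-side pushdown can be illustrated as follows. This is a generic sketch, not Informatica's generated SQL: it assumes an in-memory SQLite database as the source, and the table tax_records and its rows are hypothetical. Without pushdown the engine reads every source row and aggregates in its own process; with source pushdown the aggregation becomes part of the SELECT issued to the source database, so fewer rows leave the source.

```python
import sqlite3


def make_source():
    """Build a small, hypothetical source table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tax_records (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO tax_records VALUES (?, ?)",
        [("east", 10.0), ("east", 15.0), ("west", 7.5), ("west", 2.5), ("north", 4.0)],
    )
    return conn


def totals_without_pushdown(src):
    """Read all rows, then aggregate in the integration engine (here: Python)."""
    rows = src.execute("SELECT region, amount FROM tax_records").fetchall()
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0.0) + amount
    return totals, len(rows)  # len(rows) = rows moved out of the source


def totals_with_source_pushdown(src):
    """Let the source database aggregate; only the result rows are moved."""
    rows = src.execute(
        "SELECT region, SUM(amount) FROM tax_records GROUP BY region"
    ).fetchall()
    return dict(rows), len(rows)


if __name__ == "__main__":
    source = make_source()
    print(totals_without_pushdown(source))
    print(totals_with_source_pushdown(source))
```

Both functions return the same totals; what differs is where the work runs and how many rows cross the connection.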

20 Experiment 3 Results
• No performance differences, as the pushdown was on the source database only.
• Source Pushdown E-LT does not work with the Teradata Sequence Generator.
Table columns: Approach | Data Read (rows) | Data Load (rows) | Runtime (seconds) | Throughput (rows/sec) | Memory | CPU
Table rows: ETL (ETL/Partial DB); E-LT Source (DB/E-LT Server). The measured values are not present in this transcript.

21 Results
• Significant performance gains were obtained using full pushdown E-LT over the ETL approach.
• Neither source pushdown nor target pushdown E-LT showed a performance gain over the ETL approach, as in both cases ETL server and database engine resources (memory and CPU) were used.
• Code changes were needed to redesign existing ETL jobs to use the E-LT power of the database engine.
• E-LT does not work for building real-time data warehouses.

22 Conclusion and Future Work
• ETL works well for very complex transformations and active data warehouses.
• E-LT works well for small and medium-sized data marts, and only when the source and target are on the same database platform.
• Future: building data warehouse solutions using a hybrid approach (a combination of ETL and E-LT processes).


24 Questions ???