Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 A Comparative Study between ETL and E-LT approaches for loading data into a Data Warehouse Vikas Ranjan CSCI 693.

Similar presentations


Presentation on theme: "1 A Comparative Study between ETL and E-LT approaches for loading data into a Data Warehouse Vikas Ranjan CSCI 693."— Presentation transcript:

1 1 A Comparative Study between ETL and E-LT approaches for loading data into a Data Warehouse Vikas Ranjan CSCI 693

2 2 Agenda  Introduction  Extract, Transform and Load (ETL) Approach  Strengths/ Weaknesses of ETL  Extract, Load and Transform (E-LT) Approach  Strengths/ Weaknesses of E-LT  Experiments  Results  Conclusion/ Future Work  References  Q & A

3 3 Introduction  All Business Intelligence applications are data-centric.  Large volumes of data is stored and processed along with its history in the Data Warehouse.  The Data is extracted from various heterogeneous source systems and transformed as per the business requirements.  Data is used for analytical purposes- future forecasting, profitability analysis, trend analysis etc. to drive the business.

4 4 Extract, Transform and Load (ETL)  Traditional approach of loading data into data warehouses.  Data is first pulled/ pushed from various heterogeneous sources like ERP, CRM, RDBMS, Flat Files.  Business rules are applied on the data in the Staging Area.  Transformed data is loaded into target database.  Often designed backward, thus only the relevant data is fetched.

5 5 Strengths/ Weaknesses of ETL Strengths  Can perform complex operations in single data flow diagrams.  Mostly designed backward, thus only relevant data is loaded.  Used for building real-time data warehouses.  Robust tools are available like Informatica, Ab Initio, Data Stage. Weaknesses  Data transformation step of ELT is performed by ETL engine, therefore increasing processing time.  Data is moved over the network twice.  Since it is developed backward, more effort for future redesign.

6 6 Extract, Load and Transform (E-LT)  Newer Approach of loading into target data warehouse.  The Data is extracted from various sources same way it is done via ETL.  This extracted data is loaded directly into target data warehouse.  The transformations and complex business rules are applied by native SQL drivers.  The processing is done by database engine rather than ETL engine.

7 7 Strengths/ Weaknesses of E-LT Strengths  Since all the data is available, the future changes can be easily incorporated.  Once the data is loaded on the target platform, all transformations are placed on the RDBMS engine. This reduces network congestion.  Provides optimal performance as no extra hardware needed. Weaknesses  E-LT suffers from a limited availability of tools like Informatica Pushdown, Data Integrator etc.  Does not work well for complex business cases.  E-LT cannot be used to design near real-time enterprise data warehouse.

8 8 Tools and Data Set Software / Hardware  OS Platform: Sun M9000 Server (www.sun.com).www.sun.com  Relational Database Management System: Teradata V2R6.2.1 (www.teradata.com).www.teradata.com  ETL/E-LT Tool: Informatica PC 8.6.1 Hot Fix 9 Advanced Edition (www.informatica.com).www.informatica.com Data Set  All the data used is experiments is obtained from the test database a telecom company.  All the experiments were conducted using Informatica as ETL and E-LT tool.

9 9 Experiment 1 ETL vs. E-LT (Full Pushdown): Informatica has recently introduced both ETL and E-LT capabilities in its Power Center tool Job Name: Tax _Write_Off Job  Same Job was developed using ETL approach as well as E-LT approach.  Informatica pushed down all the code processing to the RDBMS engine.

10 10 Experiment1 contd. ETL Job: Informatica Server handled all the code processing and generated its own internal SQL

11 11 Experiment 1 contd. E-LT Job ( Full Pushdown): In the full pushdown, Informatica Server pushed all the code to RDBMS engine and Informatica Server worked purely as an E-LT tool

12 12 Experiment 1 Results Approximately Five Times more performance gain using E-LT Full Pushdown ApproachData Read (rows) Data Load (rows) Runtime (second) Throughput (rows/sec) MemoryCPU ETL32044 110291DB/ETL Server E-LT Full32044 65340DB Memory DB CPU

13 13 Experiment 2 ETL vs. E-LT (Target Pushdown):  Job Name: Tax_Extract_ETL_tst Job  Same job was developed using ETL approach as well as E-LT approach with Target Pushdown only  The Pushdown to database engine happened at Target database only

14 14 Experiment 2 contd. ETL Job: Informatica Server handled all the code processing and generated its own internal SQL

15 15 Experiment 2 contd. E-LT Job (Target Pushdown): Informatica Server pushed code processing on target database to RDBMS engine

16 16 Experiment 2 Results No Performance Differences since the Pushdown was on Target Database only ApproachData Read (rows) Data Load (rows) Runtime (second) Throughput (rows/sec) MemoryCPU ETL1542714628131879246ETL/Partial DB E-LT Target1542714628131887245.26DB/E-LT Server

17 17 Experiment 3 ETL vs. E-LT (Source Pushdown):  Job Name: Tax_Extract_ETL Job  Same job was developed using ETL approach as well as E-LT approach with Source Pushdown only  The Pushdown to database engine happened at Source database only

18 18 Experiment 3 contd. ETL Job: Informatica Server handled all the code processing and generated its own internal SQL

19 19 Experiment 3 contd. E-LT Job (Source Pushdown): Informatica Server pushed down code processing on source side to RDBMS engine

20 20 Experiments 3 Results  No Performance Differences as Pushdown was on Source Database only  Source Pushdown E-LT does not work with Teradata Sequence Generator ApproachData Read (rows) Data Load (rows) Runtime (second) Throughput (rows/sec) MemoryCPU ETL154271 948162ETL/Partial DB E-LT Source 154271 947162DB/E-LT Server

21 21 Results  Significant performance gains were obtained using full pushdown E- LT over ETL approach.  No performance gains using both source and target pushdown of E- LT approach and ETL approach as both used ETL Server and Database Engine resources( memory and CPU).  Code changes were needed to redesign existing ETL jobs to use E-LT power of database engine.  E-LT does not work in building Real Time Data warehouses

22 22 Conclusion and Future Work  ETL works well for very complex transformations and active data warehouses  E-LT works well for small and medium-sized data marts and when the source and target are on the same database platform only  Future: Building data warehouse solutions using hybrid approach (combination of ETL and E-LT processes).

23 23 References. [1] A. Simitsis, P. Vassiliadis, T. Sellis “Optimizing ETL Processes in Data Warehouses,” in Proc. 21st International Conference on Data Engineering, 2005, (ICDE 2005), pp. 564-575. [2] G.X. Zhou, Q.S. Xie, Y. Hu, “E-LT Integration to Heterogeneous Data Information for SMEs Networking based on E-HUB,” in Proc. Fourth International Conference on Natural Computation, 2008, IEEE, pp. 212-216. [3] I. William, S. Derek, and N. Genia, DW 2.0: The Architecture for the Next Generation of Data Warehousing. Burlington, MA: Morgan Kaufman, 2008, pp. 215- 229. [4] R. J. Davenport, September 2007. [Online] ETL vs. ELT: A Subjective View. InSource IT Consulting Ltd., U.K. Available at: http://www.insource.co.uk/pdf/ETL_ELT.pdf http://www.insource.co.uk/pdf/ETL_ELT.pdf [5] L. Troy, C. Pydimukkala, How to Use PowerCenter with Teradata to Load and Unload Data, Informatica Corporation [Online], Available at: www.myinformatica.com www.myinformatica.com

24 24 Questions ???


Download ppt "1 A Comparative Study between ETL and E-LT approaches for loading data into a Data Warehouse Vikas Ranjan CSCI 693."

Similar presentations


Ads by Google