Do You Need an ETL Tool? Ben Bor, NZ Ministry of Health

Presentation transcript:

1 Do You Need an ETL Tool?
Ben Bor, NZ Ministry of Health

2 Ben Bor
• Over 20 years in IT, most of it in Information Management
• Oracle specialist since version 5
• Involved in Business Intelligence for over 10 years
• Consulted for the world’s largest corporations
• Presents regularly on Information Management
• Was annual Guest Lecturer at Sussex University

3 Contents
• What is ETL
• ETL tools vs. ‘handcraft’ code
• PL/SQL techniques

4 What is ETL
ETL = Extract, Transform and Load:
• Any source, any target
• Built-in complex transformations
• Point-to-point vs. hub-and-spoke

5 Traditional ETL

6 Our Own ETL Requirements
[Diagram labels: Flat Files, SQL Loader, PL/SQL, Data Quality]

7 Travel Company Example

8 Tools or Handcraft?
ETL Advantages:
• Graphical User Interface
• Automatic documentation
• Off-the-shelf set of ready-to-use transformations
• Built-in scheduler
• Database agnostic
Handcrafting Advantages:
• No limitations
• Reuse existing code & non-ETL
• No specific methodology
• No license cost
• No impact on infrastructure
• Transportable
• Release & Code-Management by script

9 Oracle ETL Facilities
• External Tables
• Merge
• SQL Loader
• PL/SQL
• Database links
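Two of the facilities listed above, external tables and MERGE, are commonly used together: the flat file is exposed as a read-only table, then upserted into the target in a single statement. The following is a minimal sketch only. The directory object etl_dir, the file bookings.csv and the tables bookings_ext and bookings are hypothetical names chosen for illustration; none of them come from the slides.

```sql
-- Hypothetical external table over a comma-separated flat file.
-- Assumes a directory object ETL_DIR already exists and is readable.
CREATE TABLE bookings_ext (
  booking_id  NUMBER,
  customer    VARCHAR2(50),
  amount      NUMBER
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY etl_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    MISSING FIELD VALUES ARE NULL
  )
  LOCATION ('bookings.csv')
)
REJECT LIMIT UNLIMITED;

-- Upsert the file contents into a (hypothetical) target table
-- that has the same three columns.
MERGE INTO bookings tgt
USING bookings_ext src
ON (tgt.booking_id = src.booking_id)
WHEN MATCHED THEN
  UPDATE SET tgt.customer = src.customer,
             tgt.amount   = src.amount
WHEN NOT MATCHED THEN
  INSERT (booking_id, customer, amount)
  VALUES (src.booking_id, src.customer, src.amount);
```

With this arrangement no separate load step is needed: every query against bookings_ext re-reads the file.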

10 Why Use PL/SQL
• Integrated environment (no installation required)
• Available resources
• Reuse code ‘snippets’
• Good performance
• Integration with and control of the database

11 PL/SQL Tips and Techniques
1. Quality
2. Techniques
3. Tricks

12 Quality

13 What is Quality? [1]
“Totality of characteristics of an entity that bear on its ability to satisfy stated and implied needs.”
[The ISO 8402 definition of quality]

14 Quality 2 [2]
Quality is a collection of “ilities”:
• Reliability - operate error free
• Modifiability - have enhancement changes made easily
• Understandability - understand the software readily
• Efficiency - the speed of the software
• Usability - use the software easily
• Testability - construct and execute test cases easily
• Portability - transport the software easily

15 Quality 3 [3] “All the things you do today in your software development, in order to bear fruit in the future.”

16 Standards & Conventions
Use meaningful names:
  V_Number_Of_Items_In_Array vs. i or no_itms
Distinguish between types:
  V_Variable
  a_Parameter
  C_Constant
  G_Global constant

17 Using Packages
• Central package with utilities and all output
• All error messages and numbers
• All common constants (date format etc.)
• Global variables
• Statistics data
• Other packages encapsulate related logic
• Within a package, procedures & functions have:
  • A meaningful name
  • An A99_ prefix: A is the level (A highest), 99 is a unique ID
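As a rough illustration of the central-package idea, the specification below gathers constants, error numbers and global variables in one place and applies the prefix conventions from the previous slide. It is a sketch under assumptions: the package name XXX_Utils, the constant names and the parameter list of A08_Lookup_Type are invented here; only G_Place_In_Code, G_Place_In_UTILS_Code, the E00_Write_Error_Log parameters and error number 1001 appear elsewhere in the slides.

```sql
CREATE OR REPLACE PACKAGE XXX_Utils AS  -- hypothetical package name

  -- Common constants, defined once for every package to share
  C_Date_Format   CONSTANT VARCHAR2(30) := 'DD-MON-YYYY HH24:MI:SS';

  -- Error numbers kept in one place (name is illustrative)
  C_Err_No_Match  CONSTANT INTEGER := 1001;

  -- Global variables read by the error-logging routine
  G_Place_In_Code        VARCHAR2(64);
  G_Place_In_UTILS_Code  VARCHAR2(8);

  -- "A" is the level, "08" the unique ID; the signature is an assumption
  FUNCTION A08_Lookup_Type (a_Type_Code IN VARCHAR2) RETURN VARCHAR2;

  PROCEDURE E00_Write_Error_Log (
    a_Err_Num          IN INTEGER,
    a_Severity         IN INTEGER,
    a_Err_Location     IN VARCHAR2,
    a_Err_Description  IN VARCHAR2
  );

END XXX_Utils;
/
```

Because the unique ID is part of each name, a short code such as 'A08' is enough to identify a routine in the error log.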

18 Example: procedure and variable naming
XXX_Write_Flat_File.U03_Write_Record_To_CSV(
  a_File_Handle,
  C_Field_Delim,
  C_Field_Separ,
  C_Record_Separ,
  RM_REFERENCE_rec.REFTYPE,
  RM_REFERENCE_rec.CODE,
  RM_REFERENCE_rec.DESCRIPTION,
  To_Char(RM_REFERENCE_rec.ISDEFAULT, '9')
) ;

19 Techniques
• Error logging
• Autonomous Transaction
• Run statistics
• Release mechanism
• Overloading

20 Error Logging Technique
Global variables keep key information:
• Record ID
• Run ID
• Location in code
Local error trapping decides severity and error code.
All trapped errors are passed up.
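The pattern on this slide can be sketched as a self-contained anonymous block: globals carry the context, the local handler picks the error number and severity, and RAISE passes the error up. Everything named here is illustrative. Log_Error is a stand-in that prints through DBMS_OUTPUT rather than writing to a table, and B06_Convert_Amount, the error number 1002 and the severity 10 are assumptions, not taken from the author's code.

```sql
DECLARE
  -- Globals holding the context of the record being processed
  G_Record_Id      VARCHAR2(20);
  G_Run_Id         INTEGER := 42;
  G_Place_In_Code  VARCHAR2(64);

  -- Stand-in for the real logging procedure
  PROCEDURE Log_Error (
    a_Err_Num          IN INTEGER,
    a_Severity         IN INTEGER,
    a_Err_Description  IN VARCHAR2
  ) IS
  BEGIN
    DBMS_OUTPUT.PUT_LINE(
      'run=' || G_Run_Id || ' rec=' || G_Record_Id ||
      ' at=' || G_Place_In_Code || ' err=' || a_Err_Num ||
      ' sev=' || a_Severity || ' ' || a_Err_Description);
  END Log_Error;

  PROCEDURE B06_Convert_Amount (a_Amount_Text IN VARCHAR2) IS
    V_Amount  NUMBER;
  BEGIN
    G_Place_In_Code := 'B06';           -- record the location in code
    V_Amount := TO_NUMBER(a_Amount_Text);
  EXCEPTION
    WHEN VALUE_ERROR OR INVALID_NUMBER THEN
      -- The local handler decides the error code and severity...
      Log_Error(1002, 10, 'Not a number: [' || a_Amount_Text || ']');
      RAISE;                            -- ...then passes the error up
  END B06_Convert_Amount;

BEGIN
  G_Record_Id := 'R-0001';
  B06_Convert_Amount('12x');
EXCEPTION
  WHEN OTHERS THEN
    -- The caller decides what to do with the rejected record
    DBMS_OUTPUT.PUT_LINE('Record ' || G_Record_Id || ' rejected: ' || SQLERRM);
END;
/
```

Keeping the context in globals means the logging call needs only the details that the local handler alone knows.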

21 Error Logging Structure
TABLE ERROR_LOG (
  ERR_TIME          DATE,
  ERR_NUM           INTEGER,
  SOURCE_URN        VARCHAR2(20),
  SOURCE_SYSTEM_ID  VARCHAR2(5),
  PLACE_IN_CODE     VARCHAR2(64),
  ERR_LOCATION      VARCHAR2(255),
  ERR_DESCRIPTION   VARCHAR2(512),
  SEVERITY          NUMBER(6)
)

Example row:
  ERR_TIME         18-OCT-02 10:04:52
  ERR_NUM          1001
  SOURCE_URN
  SOURCE_SYSTEM    CRS
  PLACE_IN_CODE    In FLIP_PKG B06 ; 6(utils A08)
  ERR_LOCATION     A08_Lookup_Type
  ERR_DESCRIPTION  No match found for [Plan_Code] value [C3]
  SEVERITY         10

22 Autonomous Transaction
--===================
PROCEDURE E00_write_error_log(
--===================
  a_err_num          IN integer,
  a_Severity         IN Integer,
  a_err_location     IN VarChar,
  a_err_description  IN VarChar
) IS
  PRAGMA AUTONOMOUS_TRANSACTION;
  V_Place_In_Code  DW_Process.Error_Log.Place_In_Code%Type;
BEGIN
  V_Place_In_Code := G_Place_In_Code || '(utils ' || G_Place_In_UTILS_Code || ')' ;
  INSERT INTO DW_Process.Error_Log (
    err_time, err_num, Severity,
    BOROUGH_ID, SOURCE_URN, SOURCE_SYSTEM_ID,
    Place_In_Code, err_location, err_description
  ) VALUES (
    sysdate, a_err_num, a_Severity,
    G_BOROUGH_ID, G_SOURCE_URN, G_SOURCE_SYSTEM_ID,
    V_Place_In_Code, a_err_location, a_err_description
  ) ;
  COMMIT ; -- commit the autonomous transaction; the outer transaction is unaffected.
  G_Stats_Rec.TOTAL_NO_OF_ERRORS := G_Stats_Rec.TOTAL_NO_OF_ERRORS + 1 ;
--===================
END E00_Write_Error_Log ;
--===================
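The point of PRAGMA AUTONOMOUS_TRANSACTION is that the log entry survives even when the calling transaction is rolled back. A small demonstration, assuming two freshly created scratch tables (demo_log and demo_data) and a procedure Demo_Write_Log, all of which are hypothetical and not part of the author's schema:

```sql
CREATE TABLE demo_log  (msg VARCHAR2(100));
CREATE TABLE demo_data (id  NUMBER);

CREATE OR REPLACE PROCEDURE Demo_Write_Log (a_Msg IN VARCHAR2) IS
  PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
  INSERT INTO demo_log (msg) VALUES (a_Msg);
  COMMIT;   -- ends only the autonomous transaction
END Demo_Write_Log;
/

BEGIN
  INSERT INTO demo_data (id) VALUES (1);       -- part of the main transaction
  Demo_Write_Log('row 1 failed validation');   -- committed independently
  ROLLBACK;                                    -- undoes the demo_data insert only
END;
/

SELECT COUNT(*) FROM demo_data;  -- 0 on fresh tables: the insert was rolled back
SELECT COUNT(*) FROM demo_log;   -- 1 on fresh tables: the log row was kept
```

Without the pragma, the COMMIT inside the logging procedure would also commit the caller's pending work, which is rarely what an ETL run wants when a record fails.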

23 Run Statistics
• G_Stats_Rec is a record with all the statistics fields
• Defined in the central package (therefore resident in memory)
• It is updated by the writing procedures (all central)
• It is written out at the end of the run
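A statistics record of this kind might look like the sketch below. Only the names G_Stats_Rec and TOTAL_NO_OF_ERRORS come from the slides; the record type name and the other two fields are assumptions. The record is declared in an anonymous block here so that the example runs on its own, whereas in the author's design it lives in the central package.

```sql
DECLARE
  -- Only TOTAL_NO_OF_ERRORS appears in the slides; the other fields are illustrative
  TYPE Stats_Rec_Type IS RECORD (
    TOTAL_NO_OF_RECORDS_READ  INTEGER := 0,
    TOTAL_NO_OF_RECORDS_OUT   INTEGER := 0,
    TOTAL_NO_OF_ERRORS        INTEGER := 0
  );
  G_Stats_Rec  Stats_Rec_Type;
BEGIN
  -- Simulate a run of five records in which the third one fails
  FOR V_Record_No IN 1 .. 5 LOOP
    G_Stats_Rec.TOTAL_NO_OF_RECORDS_READ := G_Stats_Rec.TOTAL_NO_OF_RECORDS_READ + 1;
    IF V_Record_No = 3 THEN
      G_Stats_Rec.TOTAL_NO_OF_ERRORS := G_Stats_Rec.TOTAL_NO_OF_ERRORS + 1;
    ELSE
      G_Stats_Rec.TOTAL_NO_OF_RECORDS_OUT := G_Stats_Rec.TOTAL_NO_OF_RECORDS_OUT + 1;
    END IF;
  END LOOP;

  -- Written out once, at the end of the run
  DBMS_OUTPUT.PUT_LINE(
    'read='     || G_Stats_Rec.TOTAL_NO_OF_RECORDS_READ ||
    ' written=' || G_Stats_Rec.TOTAL_NO_OF_RECORDS_OUT  ||
    ' errors='  || G_Stats_Rec.TOTAL_NO_OF_ERRORS);
  -- prints: read=5 written=4 errors=1
END;
/
```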

24 Release Mechanism
• Table of ‘release notes’
• Each package has a C_Version constant, updated with each release
• ‘Show_Version’ scripts display versions and notes
• Results shipped with each release
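One possible shape for this mechanism is sketched below. The table layout, the package Demo_Pkg, the getter function Get_Version and the query are all assumptions made for illustration; the slides name only the C_Version constant and the ‘Show_Version’ scripts. The getter is there because a package constant cannot be referenced directly from a SQL statement, while a package function can.

```sql
-- Hypothetical release-notes table
CREATE TABLE release_notes (
  package_name  VARCHAR2(30),
  version       VARCHAR2(10),
  released_on   DATE,
  note          VARCHAR2(512)
);

CREATE OR REPLACE PACKAGE Demo_Pkg AS
  C_Version  CONSTANT VARCHAR2(10) := '1.3';   -- bumped on every release
  FUNCTION Get_Version RETURN VARCHAR2;
END Demo_Pkg;
/

CREATE OR REPLACE PACKAGE BODY Demo_Pkg AS
  FUNCTION Get_Version RETURN VARCHAR2 IS
  BEGIN
    RETURN C_Version;
  END Get_Version;
END Demo_Pkg;
/

-- A possible 'Show_Version' query: the notes for the installed version
SELECT n.package_name, n.version, n.released_on, n.note
FROM   release_notes n
WHERE  n.package_name = 'DEMO_PKG'
AND    n.version      = Demo_Pkg.Get_Version()
ORDER  BY n.released_on;
```

Spooling the output of such a script on the target system gives a record of exactly what was installed.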

25 Remove Spaces
--===================
FUNCTION A04_Remove_Spaces(
--===================
  a_Instring  IN Varchar
) Return Varchar IS
/*
** Removes all the spaces from a string, leaving the rest of the printable characters
*/
BEGIN
  G_place_in_UTILS_code := 'A04' ; -- For use by the error trapping routine
  -- The space is the last character of the first list and has no counterpart
  -- in the second list, so TRANSLATE drops it; every other character is kept.
  RETURN TRANSLATE( a_Instring,
    'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ' || '\|, ',
    'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ' || '\|,' ) ;
--===================
END A04_Remove_Spaces ;
--===================

26 Strip Leading non-numerics
--============================
FUNCTION F09_Strip_Leading_non_digits(
--============================
  a_String  IN VARCHAR2
) RETURN VARCHAR2 IS
/*
** Remove leading non-digits from the input.
** Example: Input string:  'abcde12345edcba'
**          Output string: '12345edcba'
*/
  v_string           Varchar2(4000) ;
  v_first_digit_pos  Integer ;
BEGIN
  -- Replace all digits by 0
  v_string := Translate(a_String, '0123456789', '0000000000') ;
  v_first_digit_pos := instr(v_string, '0') ;
  RETURN F01_Right(a_String, v_first_digit_pos ) ;
--============================
END F09_Strip_Leading_non_digits;
--============================
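The trick on this slide works in three steps: map every digit to '0', find the first '0', and return the original string from that position onward. The block below is a self-contained sketch of those steps. It assumes the TRANSLATE call maps the ten digits to '0', as the slide's comment describes, and it uses the built-in SUBSTR in place of the author's F01_Right helper, whose definition is not shown. Returning NULL when the input has no digits is also an assumption.

```sql
DECLARE
  FUNCTION Strip_Leading_Non_Digits (a_String IN VARCHAR2) RETURN VARCHAR2 IS
    V_String           VARCHAR2(4000);
    V_First_Digit_Pos  INTEGER;
  BEGIN
    -- Step 1: turn every digit into '0' so that one INSTR call finds any digit
    V_String := TRANSLATE(a_String, '0123456789', '0000000000');
    -- Step 2: position of the first digit (0 if the string has none)
    V_First_Digit_Pos := INSTR(V_String, '0');
    IF V_First_Digit_Pos = 0 THEN
      RETURN NULL;   -- assumption: nothing is left when there are no digits
    END IF;
    -- Step 3: cut the original string from that position onward
    RETURN SUBSTR(a_String, V_First_Digit_Pos);
  END Strip_Leading_Non_Digits;
BEGIN
  DBMS_OUTPUT.PUT_LINE(Strip_Leading_Non_Digits('abcde12345edcba'));
  -- prints: 12345edcba
END;
/
```

The same TRANSLATE-then-INSTR idea works for any character class that can be listed explicitly.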

27 Overloading
--=======================
PROCEDURE U03_Write_Record_To_CSV(
--=======================
  a_File_Handle   IN utl_file.file_type,
  a_Field_Delim   IN VarChar, -- the quotes, for CSV
  a_Field_Separ   IN VarChar, -- the comma, for CSV
  a_Record_Separ  IN VarChar, -- the Carriage Return + Line feed, for CSV
  a_String1       IN VarChar := G_default_Value,
  a_String2       IN VarChar := G_default_Value,
  a_String3       IN VarChar := G_default_Value,
  ...
) IS
BEGIN
  IF a_String1 = G_default_Value THEN GOTO End_Of_Record ; END IF ;
  U02_Write(a_File_Handle, a_Field_Delim || a_String1 || a_Field_Delim) ;
  IF a_String2 = G_default_Value THEN GOTO End_Of_Record ; END IF ;
  U02_Write(a_File_Handle, a_Field_Separ || a_Field_Delim || a_String2 || a_Field_Delim ) ;
  IF a_String3 = G_default_Value THEN GOTO End_Of_Record ; END IF ;
  U02_Write(a_File_Handle, a_Field_Separ || a_Field_Delim || a_String3 || a_Field_Delim ) ;
  ...
  <<End_Of_Record>>
  U01_Write_Line(a_File_Handle, a_Record_Separ) ;
--=======================
END U03_Write_Record_To_CSV ;
--=======================
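The procedure on this slide accepts a variable number of fields by giving each trailing parameter a sentinel default. For comparison, overloading in the strict PL/SQL sense means declaring several subprograms with the same name and different parameter types, and letting the compiler choose among them. A minimal sketch, in which Write_Field and its three variants are hypothetical and print through DBMS_OUTPUT instead of writing to a file:

```sql
DECLARE
  -- Three subprograms share one name; the argument type selects the variant
  PROCEDURE Write_Field (a_Value IN VARCHAR2) IS
  BEGIN
    DBMS_OUTPUT.PUT_LINE('"' || a_Value || '"');
  END Write_Field;

  PROCEDURE Write_Field (a_Value IN NUMBER) IS
  BEGIN
    DBMS_OUTPUT.PUT_LINE(TO_CHAR(a_Value));
  END Write_Field;

  PROCEDURE Write_Field (a_Value IN DATE) IS
  BEGIN
    DBMS_OUTPUT.PUT_LINE(TO_CHAR(a_Value, 'YYYY-MM-DD'));
  END Write_Field;
BEGIN
  Write_Field('Auckland');   -- resolves to the VARCHAR2 variant
  Write_Field(42);           -- resolves to the NUMBER variant
  Write_Field(SYSDATE);      -- resolves to the DATE variant
END;
/
```

The two approaches combine well: defaults handle a varying number of arguments, overloads handle varying types.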

28 Summary
ETL or PL/SQL? Your choice.
Consider:
• Overall cost
• ‘Politics’
• Convenience
• Portability
• Speed of development
• Reusability
If PL/SQL: ensure Quality

29 Thank you !

30

31 Thank you ! I can be contacted at