Do You Need an ETL Tool? Ben Bor, NZ Ministry of Health


1 Do You Need an ETL Tool? Ben Bor, NZ Ministry of Health

2 Ben Bor
- Over 20 years in IT, most of it in Information Management
- Oracle specialist since version 5
- Involved in Business Intelligence for over 10 years
- Consulted for the world’s largest corporations
- Presents regularly on Information Management
- Was annual Guest Lecturer at Sussex University

3 Contents
- What is ETL
- ETL tools vs. ‘handcraft’ code
- PL/SQL techniques

4 What is ETL
ETL = Extract, Transform and Load:
- Any source, any target
- Built-in complex transformations
- Point-to-point vs. hub-and-spoke

5 Traditional ETL

6 Our Own ETL Requirements
Flat Files, SQL*Loader, PL/SQL, Data Quality

7 Travel Company Example

8 Tools or Handcraft?
ETL Advantages:
- Graphical User Interface
- Automatic documentation
- Off-the-shelf set of ready-to-use transformations
- Built-in scheduler
- Database agnostic
Handcrafting Advantages:
- No limitations
- Reuse existing code & non-ETL
- No specific methodology
- No license cost
- No impact on infrastructure
- Transportable
- Release & code management by script

9 Oracle ETL Facilities
- External Tables
- Merge
- SQL*Loader
- PL/SQL
- Database links
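As a rough illustration of how two of these facilities can work together, the sketch below exposes a flat file as an external table and then merges it into a target table. Every object name here (etl_dir, customers_ext, customers, customers.csv and the columns) is hypothetical and not taken from the presentation; the target table customers is assumed to exist already.

```sql
-- Hypothetical names throughout: etl_dir, customers_ext, customers, customers.csv.
-- Assumes a directory object already exists, for example:
--   CREATE DIRECTORY etl_dir AS '/data/incoming';

-- External table: makes the flat file queryable as a read-only table.
CREATE TABLE customers_ext (
  customer_id    NUMBER,
  customer_name  VARCHAR2(100)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY etl_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
  )
  LOCATION ('customers.csv')
)
REJECT LIMIT UNLIMITED;

-- MERGE: update rows that already exist and insert the rest, in one statement.
MERGE INTO customers tgt
USING customers_ext src
   ON (tgt.customer_id = src.customer_id)
 WHEN MATCHED THEN
   UPDATE SET tgt.customer_name = src.customer_name
 WHEN NOT MATCHED THEN
   INSERT (customer_id, customer_name)
   VALUES (src.customer_id, src.customer_name);
```

With this arrangement the file never needs a separate load step: the extract is the external table and the load is the MERGE.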

10 Why Use PL/SQL
- Integrated environment (no installation required)
- Available resources
- Reuse code ‘snippets’
- Good performance
- Integration with and control of the database

11 PL/SQL Tips and Techniques
1. Quality
2. Techniques
3. Tricks

12 Quality

13 What is Quality? [1]
“Totality of characteristics of an entity that bear on its ability to satisfy stated and implied needs.” [The ISO 8402 definition of quality]

14 Quality 2 [2]
Quality is a collection of “ilities”:
- Reliability - operate error free
- Modifiability - have enhancement changes made easily
- Understandability - understand the software readily
- Efficiency - the speed of the software
- Usability - use the software easily
- Testability - construct and execute test cases easily
- Portability - transport the software easily

15 Quality 3 [3]
“All the things you do today in your software development, in order to bear fruit in the future.”

16 Standards & Conventions
Use meaningful names:
- V_Number_Of_Items_In_Array vs. i or no_itms
Distinguish between types:
- V_ Variable
- a_ Parameter
- C_ Constant
- G_ Global constant
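A minimal sketch of these prefixes in use follows. The names C_Max_Items, Add_Items and a_Items_To_Add are invented for illustration. G_ names are not shown because they would normally live in a package specification rather than an anonymous block.

```sql
DECLARE
  C_Max_Items        CONSTANT PLS_INTEGER := 100;  -- C_ : constant
  V_Number_Of_Items  PLS_INTEGER := 0;             -- V_ : variable

  -- a_ : parameter, so it cannot be confused with a variable or a column name
  PROCEDURE Add_Items(a_Items_To_Add IN PLS_INTEGER) IS
  BEGIN
    V_Number_Of_Items := LEAST(V_Number_Of_Items + a_Items_To_Add, C_Max_Items);
  END Add_Items;
BEGIN
  Add_Items(a_Items_To_Add => 3);
  DBMS_OUTPUT.PUT_LINE('Items: ' || V_Number_Of_Items);
END;
/
```

One practical benefit of the prefixes is that a parameter or variable can never silently collide with a table column of the same name inside an embedded SQL statement.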

17 Using Packages
- Central package with utilities and all output:
  - All error messages and numbers
  - All common constants (date format etc.)
  - Global variables
  - Statistics data
- Other packages encapsulate related logic
- Within a package, procedures & functions have:
  - A meaningful name
  - An A99_ prefix: A is the level (A highest), 99 is a unique ID
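The layout above might translate into a central package specification along these lines. This is only a sketch: the package name XXX_Utils, the constants and the function signature are assumptions, although G_Place_In_Code and the name A08_Lookup_Type do appear elsewhere in the slides.

```sql
-- Hypothetical central utilities package (specification only).
CREATE OR REPLACE PACKAGE XXX_Utils IS

  -- Common constants kept in one place
  C_Date_Format   CONSTANT VARCHAR2(30) := 'DD-MON-YYYY HH24:MI:SS';

  -- Error numbers kept in one place (the value is illustrative)
  C_Err_No_Match  CONSTANT INTEGER := 1001;

  -- Global variables, visible to every package for the life of the session
  G_Place_In_Code        VARCHAR2(64);
  G_Place_In_UTILS_Code  VARCHAR2(10);

  -- A08_ prefix: level A, unique ID 08 (signature assumed)
  FUNCTION A08_Lookup_Type(a_Code IN VARCHAR2) RETURN VARCHAR2;

END XXX_Utils;
/
```

Other packages would then reference these as XXX_Utils.C_Date_Format, XXX_Utils.G_Place_In_Code and so on.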

18 Example: procedure and variable naming
XXX_Write_Flat_File.U03_Write_Record_To_CSV(
    a_File_Handle,
    C_Field_Delim,
    C_Field_Separ,
    C_Record_Separ,
    RM_REFERENCE_rec.REFTYPE,
    RM_REFERENCE_rec.CODE,
    RM_REFERENCE_rec.DESCRIPTION,
    To_Char(RM_REFERENCE_rec.ISDEFAULT, '9')
) ;

19 Techniques
- Error logging
- Autonomous Transaction
- Run statistics
- Release mechanism
- Overloading

20 Error Logging Technique
Global variables keep key information:
- Record ID
- Run ID
- Location in code
Local error trapping decides severity and error code.
All trapped errors are passed up.

21 Error Logging Structure
TABLE ERROR_LOG(
    ERR_TIME          DATE,
    ERR_NUM           INTEGER,
    SOURCE_URN        VARCHAR2(20),
    SOURCE_SYSTEM_ID  VARCHAR2(5),
    PLACE_IN_CODE     VARCHAR2(64),
    ERR_LOCATION      VARCHAR2(255),
    ERR_DESCRIPTION   VARCHAR2(512),
    SEVERITY          NUMBER(6)
)
Example row:
    ERR_TIME          18-OCT-02 10:04:52
    ERR_NUM           1001
    SOURCE_URN        223010913
    SOURCE_SYSTEM     CRS
    PLACE_IN_CODE     In FLIP_PKG B06 ; 6(utils A08)
    ERR_LOCATION      A08_Lookup_Type
    ERR_DESCRIPTION   No match found for [Plan_Code] value [C3]
    SEVERITY          10

22 Autonomous Transaction
--===================
PROCEDURE E00_write_error_log(
--===================
    a_err_num          IN integer,
    a_Severity         IN Integer,
    a_err_location     IN VarChar,
    a_err_description  IN VarChar
) IS
    PRAGMA AUTONOMOUS_TRANSACTION;
    V_Place_In_Code  DW_Process.Error_Log.Place_In_Code%Type;
BEGIN
    V_Place_In_Code := G_Place_In_Code || '(utils ' || G_Place_In_UTILS_Code || ')' ;
    INSERT INTO DW_Process.Error_Log (
        err_time, err_num, Severity,
        BOROUGH_ID, SOURCE_URN, SOURCE_SYSTEM_ID,
        Place_In_Code, err_location, err_description
    ) VALUES (
        sysdate, a_err_num, a_Severity,
        G_BOROUGH_ID, G_SOURCE_URN, G_SOURCE_SYSTEM_ID,
        V_Place_In_Code, a_err_location, a_err_description
    ) ;
    COMMIT ; -- commit the autonomous transaction, outside transaction is unaffected.
    G_Stats_Rec.TOTAL_NO_OF_ERRORS := G_Stats_Rec.TOTAL_NO_OF_ERRORS + 1 ;
--===================
END E00_Write_Error_Log ;
--===================
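The effect of the autonomous transaction can be seen in a small stand-alone demonstration. The sketch below uses cut-down, hypothetical objects (demo_error_log, demo_target, demo_write_error_log) rather than the presentation's schema, and assumes both tables are freshly created.

```sql
-- Hypothetical demo objects; not the presentation's schema.
CREATE TABLE demo_error_log (err_time DATE, err_description VARCHAR2(512));
CREATE TABLE demo_target    (id NUMBER);

CREATE OR REPLACE PROCEDURE demo_write_error_log(a_err_description IN VARCHAR2) IS
  PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
  INSERT INTO demo_error_log (err_time, err_description)
  VALUES (SYSDATE, a_err_description);
  COMMIT;  -- commits only the log row, not the caller's work
END demo_write_error_log;
/

BEGIN
  INSERT INTO demo_target (id) VALUES (1);
  demo_write_error_log('No match found for [Plan_Code] value [C3]');
  ROLLBACK;  -- undoes the demo_target insert; the log row is already committed
END;
/

-- After the block runs, demo_target is empty and demo_error_log holds one row.
SELECT COUNT(*) AS target_rows FROM demo_target;
SELECT COUNT(*) AS log_rows    FROM demo_error_log;
```

This is why the error log survives even when the load that raised the error is rolled back.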

23 Run Statistics
- G_Stats_Rec is a record with all the statistics fields
- Defined in the central package (therefore resident in memory)
- It is updated by the writing procedures (all central)
- It is written out at the end of the run
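A statistics record of this kind could be declared roughly as follows. Only G_Stats_Rec and TOTAL_NO_OF_ERRORS come from the slides; the package name, the other fields and the Z01_Write_Stats procedure are assumptions, and the sketch writes to DBMS_OUTPUT rather than to a table.

```sql
-- Hypothetical package holding run statistics in session memory.
CREATE OR REPLACE PACKAGE XXX_Stats_Demo IS
  TYPE T_Stats_Rec IS RECORD (
    START_TIME          DATE,
    TOTAL_RECORDS_READ  INTEGER := 0,
    TOTAL_NO_OF_ERRORS  INTEGER := 0
  );
  G_Stats_Rec  T_Stats_Rec;   -- package state: persists for the whole session

  PROCEDURE Z01_Write_Stats;  -- called once, at the end of the run
END XXX_Stats_Demo;
/

CREATE OR REPLACE PACKAGE BODY XXX_Stats_Demo IS
  PROCEDURE Z01_Write_Stats IS
  BEGIN
    DBMS_OUTPUT.PUT_LINE('Records read: ' || G_Stats_Rec.TOTAL_RECORDS_READ);
    DBMS_OUTPUT.PUT_LINE('Errors:       ' || G_Stats_Rec.TOTAL_NO_OF_ERRORS);
  END Z01_Write_Stats;
END XXX_Stats_Demo;
/

-- Usage: writing procedures bump the counters, the driver reports at the end.
BEGIN
  XXX_Stats_Demo.G_Stats_Rec.START_TIME := SYSDATE;
  XXX_Stats_Demo.G_Stats_Rec.TOTAL_RECORDS_READ :=
      XXX_Stats_Demo.G_Stats_Rec.TOTAL_RECORDS_READ + 1;
  XXX_Stats_Demo.Z01_Write_Stats;
END;
/
```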

24 Release Mechanism
- Table of ‘release notes’
- Each package has C_Version constant updated each release
- ‘Show_Version’ scripts display versions and notes
- Results shipped with each release
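One possible shape for this mechanism is sketched below. The table layout, the package XXX_Demo_Pkg and the version string are invented for illustration; only the C_Version constant and the idea of a Show_Version script come from the slide.

```sql
-- Hypothetical release-notes table.
CREATE TABLE release_notes (
  package_name  VARCHAR2(30),
  version       VARCHAR2(10),
  release_date  DATE,
  note          VARCHAR2(512)
);

-- Each package carries its own version constant.
CREATE OR REPLACE PACKAGE XXX_Demo_Pkg IS
  C_Version  CONSTANT VARCHAR2(10) := '1.4';
END XXX_Demo_Pkg;
/

-- A possible 'Show_Version' script, run from SQL*Plus:
SET SERVEROUTPUT ON
BEGIN
  DBMS_OUTPUT.PUT_LINE('XXX_Demo_Pkg version ' || XXX_Demo_Pkg.C_Version);
END;
/

SELECT version, release_date, note
  FROM release_notes
 WHERE package_name = 'XXX_DEMO_PKG'
 ORDER BY release_date;
```

Spooling the output of such a script gives a record of exactly what was installed, which can be shipped with the release.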

25 Remove Spaces
--===================
FUNCTION A04_Remove_Spaces(
--===================
    a_Instring IN Varchar
) Return Varchar IS
/*
** Removes all the spaces from a string, leaving the rest of the printable characters
*/
BEGIN
    G_place_in_UTILS_code := 'A04' ; -- For use by the error trapping routine
    -- The only space in the 'from' list is the trailing one; it has no
    -- counterpart in the 'to' list, so TRANSLATE drops it.
    RETURN TRANSLATE( a_Instring,
        'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890' ||
        '\|,/?#~@;:[{]}=+-_`¬!"£$%^&*() ',
        'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890' ||
        '\|,/?#~@;:[{]}=+-_`¬!"£$%^&*()' ) ;
--===================
END A04_Remove_Spaces ;
--===================
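For the specific job of removing spaces, two shorter forms are worth knowing. Neither is the slide's own code: the first uses REPLACE with no replacement string, and the second is the compact TRANSLATE idiom, where a single placeholder character ('x' here, chosen arbitrarily) keeps the 'to' list non-empty.

```sql
-- Both expressions return 'abc'.
SELECT REPLACE('a b  c', ' ')          AS via_replace,
       TRANSLATE('a b  c', 'x ', 'x')  AS via_translate
  FROM dual;
```

The TRANSLATE form generalises more easily: adding further characters after the placeholder in the 'from' list removes them as well in a single pass.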

26 Strip Leading non-numerics
--============================
FUNCTION F09_Strip_Leading_non_digits(
--============================
    a_String IN VARCHAR2
) RETURN VARCHAR2 IS
/*
** Remove leading non-digits from the input.
** Example: Input string:  'abcde12345edcba'
**          Output string: '12345edcba'
*/
    v_string           Varchar2(4000) ;
    v_first_digit_pos  Integer ;
BEGIN
    -- Replace all digits by 0
    v_string := Translate(a_String, '1234567890', '0000000000') ;
    v_first_digit_pos := instr(v_string,'0') ;
    RETURN F01_Right(a_String, v_first_digit_pos ) ;
--============================
END F09_Strip_Leading_non_digits;
--============================
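The helper F01_Right is not shown in the slides, so the following self-contained sketch replaces it with the built-in SUBSTR, on the assumption that F01_Right returns the string from the given position onwards. The function name and the decision to return NULL when the input has no digits are also assumptions.

```sql
CREATE OR REPLACE FUNCTION Strip_Leading_Non_Digits_Demo(
    a_String IN VARCHAR2
) RETURN VARCHAR2 IS
    v_first_digit_pos  PLS_INTEGER ;
BEGIN
    -- Map every digit to '0', then find the first '0': that is the first digit.
    v_first_digit_pos :=
        INSTR(TRANSLATE(a_String, '1234567890', '0000000000'), '0') ;

    IF v_first_digit_pos = 0 THEN
        RETURN NULL ;  -- assumption: no digits at all means nothing is kept
    END IF ;

    RETURN SUBSTR(a_String, v_first_digit_pos) ;
END Strip_Leading_Non_Digits_Demo ;
/

-- Returns '12345edcba'
SELECT Strip_Leading_Non_Digits_Demo('abcde12345edcba') AS stripped FROM dual ;
```

Collapsing all ten digits to one character means a single INSTR call finds the first digit, instead of ten separate searches.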

27 Overloading
--=======================
PROCEDURE U03_Write_Record_To_CSV(
--=======================
    a_File_Handle   IN utl_file.file_type,
    a_Field_Delim   IN VarChar, -- the quotes, for CSV
    a_Field_Separ   IN VarChar, -- the comma, for CSV
    a_Record_Separ  IN VarChar, -- the Carriage Return + Line feed, for CSV
    a_String1       IN VarChar := G_default_Value,
    a_String2       IN VarChar := G_default_Value,
    a_String3       IN VarChar := G_default_Value,
    ...
) IS
BEGIN
    IF a_String1 = G_default_Value THEN GOTO End_Of_Record ; END IF ;
    U02_Write(a_File_Handle, a_Field_Delim || a_String1 || a_Field_Delim) ;
    IF a_String2 = G_default_Value THEN GOTO End_Of_Record ; END IF ;
    U02_Write(a_File_Handle, a_Field_Separ || a_Field_Delim || a_String2 || a_Field_Delim ) ;
    IF a_String3 = G_default_Value THEN GOTO End_Of_Record ; END IF ;
    U02_Write(a_File_Handle, a_Field_Separ || a_Field_Delim || a_String3 || a_Field_Delim ) ;
    ...
<<End_Of_Record>>
    U01_Write_Line(a_File_Handle, a_Record_Separ) ;
--=======================
END U03_Write_Record_To_CSV ;
--=======================
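The procedure above accepts a variable number of fields by giving every string parameter the same sentinel default and stopping at the first parameter that still holds it. The sketch below shows the same idea in a form that runs without UTL_FILE. The sentinel value, the function name Build_CSV_Line and the use of early RETURN instead of GOTO are choices made for this illustration, not the author's code.

```sql
DECLARE
  -- Hypothetical sentinel: a value no real field is expected to contain.
  C_Not_Supplied  CONSTANT VARCHAR2(20) := '~~not supplied~~';

  FUNCTION Build_CSV_Line(
      a_String1 IN VARCHAR2 := C_Not_Supplied,
      a_String2 IN VARCHAR2 := C_Not_Supplied,
      a_String3 IN VARCHAR2 := C_Not_Supplied
  ) RETURN VARCHAR2 IS
      V_Line  VARCHAR2(4000);
  BEGIN
      -- Stop at the first parameter the caller did not supply.
      IF a_String1 = C_Not_Supplied THEN RETURN V_Line; END IF;
      V_Line := '"' || a_String1 || '"';
      IF a_String2 = C_Not_Supplied THEN RETURN V_Line; END IF;
      V_Line := V_Line || ',"' || a_String2 || '"';
      IF a_String3 = C_Not_Supplied THEN RETURN V_Line; END IF;
      V_Line := V_Line || ',"' || a_String3 || '"';
      RETURN V_Line;
  END Build_CSV_Line;
BEGIN
  DBMS_OUTPUT.PUT_LINE(Build_CSV_Line('A', 'B'));       -- "A","B"
  DBMS_OUTPUT.PUT_LINE(Build_CSV_Line('A', 'B', 'C'));  -- "A","B","C"
END;
/
```

A NULL argument is not equal to the sentinel, so it is written as an empty quoted field rather than ending the record, which matches the behaviour of the comparison used on the slide.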

28 Summary
ETL or PL/SQL? Your choice.
Consider:
- Overall cost
- ‘Politics’
- Convenience
- Portability
- Speed of development
- Reusability
If PL/SQL: ensure Quality


31 Thank you! I can be contacted at ben_bor@moh.govt.nz

