
1 ETL: Extract, Transform, Load

2 Introduction to ETL
ETL is used to migrate data from one database to another, to build data marts and data warehouses, and to convert a database from one format or type to another.

3 Process of ETL
– Extract: the process of reading data from a database
– Transform: the process of converting the extracted data from its previous form into the form it needs to be in, by using rules or lookup tables or by combining the data with other data
– Load: the process of writing the data into the target database
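The three phases can be illustrated with a minimal Python sketch that uses two in-memory SQLite databases as the source and the target. The table names (`orders`, `orders_clean`), the columns and the `COUNTRY_LOOKUP` table are hypothetical. PDI builds the same kind of flow graphically rather than in code.

```python
import sqlite3

# Hypothetical lookup table used by the Transform phase: country code -> name.
COUNTRY_LOOKUP = {"DE": "Germany", "FR": "France", "US": "United States"}


def extract(source: sqlite3.Connection) -> list[tuple]:
    """Extract: read the raw rows from the source database."""
    return source.execute(
        "SELECT id, name, country_code, amount_cents FROM orders ORDER BY id"
    ).fetchall()


def transform(rows: list[tuple]) -> list[tuple]:
    """Transform: apply rules and a lookup table to reshape each row."""
    result = []
    for order_id, name, code, cents in rows:
        country = COUNTRY_LOOKUP.get(code, "Unknown")  # lookup-table rule
        amount = cents / 100                           # unit-conversion rule
        result.append((order_id, name.strip().title(), country, amount))
    return result


def load(target: sqlite3.Connection, rows: list[tuple]) -> None:
    """Load: write the transformed rows into the target database."""
    target.execute(
        "CREATE TABLE IF NOT EXISTS orders_clean "
        "(id INTEGER PRIMARY KEY, customer TEXT, country TEXT, amount REAL)"
    )
    target.executemany("INSERT INTO orders_clean VALUES (?, ?, ?, ?)", rows)
    target.commit()


if __name__ == "__main__":
    source = sqlite3.connect(":memory:")
    source.execute(
        "CREATE TABLE orders (id INTEGER, name TEXT, country_code TEXT, amount_cents INTEGER)"
    )
    source.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [(1, "  alice smith ", "DE", 1999), (2, "bob JONES", "XX", 500)],
    )
    target = sqlite3.connect(":memory:")
    load(target, transform(extract(source)))
    print(target.execute("SELECT * FROM orders_clean ORDER BY id").fetchall())
    # [(1, 'Alice Smith', 'Germany', 19.99), (2, 'Bob Jones', 'Unknown', 5.0)]
```

Keeping the phases as separate functions mirrors the slide: each phase can be tested or replaced on its own.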

4 Operations of Transform
– Selecting only certain columns to load
– Translating coded values
– Encoding free-form values
– Sorting
– Joining data from multiple sources
– Aggregation
– Splitting a column into multiple columns
– Deriving new calculated values
– …
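Most of these operations can be sketched in plain Python on a few hypothetical order and customer records. The field names, the `STATUS_CODES` mapping and the set of accepted "yes" spellings are illustrative assumptions; in Spoon each of these operations would usually be its own step rather than one function.

```python
from collections import defaultdict

# Hypothetical records from two sources.
orders = [
    {"order_id": 2, "cust_id": 20, "status": "S", "qty": 1, "price": 9.5, "paid": "Yes", "internal_note": "rush"},
    {"order_id": 1, "cust_id": 10, "status": "P", "qty": 3, "price": 2.0, "paid": "n", "internal_note": ""},
    {"order_id": 3, "cust_id": 10, "status": "S", "qty": 2, "price": 4.0, "paid": "TRUE", "internal_note": ""},
]
customers = {
    10: {"full_name": "Ada Lovelace", "region": "EU"},
    20: {"full_name": "Alan Turing", "region": "UK"},
}

STATUS_CODES = {"P": "Pending", "S": "Shipped"}  # coded value -> label
TRUTHY = {"yes", "y", "true", "1"}               # accepted free-form "yes" spellings


def transform(orders, customers):
    out = []
    for o in orders:
        cust = customers[o["cust_id"]]                     # join data from a second source
        first, _, last = cust["full_name"].partition(" ")  # split one column into two
        out.append({
            "order_id": o["order_id"],                     # select only certain columns
            "status": STATUS_CODES[o["status"]],           # translate coded values
            "paid": o["paid"].strip().lower() in TRUTHY,   # encode free-form values
            "first_name": first,
            "last_name": last,
            "region": cust["region"],
            "total": o["qty"] * o["price"],                # derive a calculated value
        })
    out.sort(key=lambda r: r["order_id"])                  # sort
    return out


def revenue_by_region(rows):
    totals = defaultdict(float)                            # aggregate per region
    for r in rows:
        totals[r["region"]] += r["total"]
    return dict(totals)


rows = transform(orders, customers)
print(revenue_by_region(rows))  # {'EU': 14.0, 'UK': 9.5}
```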

5 Pentaho Data Integration
Pentaho Data Integration (PDI, also called Kettle) is a tool for building ETL processes.
Download: http://sourceforge.net/projects/pentaho/files/Data Integration/
Two parts of PDI
– Transformation: a transformation carries out the ETL process itself
– Job: a job is used to run transformations

6 Transformation Definitions
– Value: values are part of a row and can contain any type of data: strings, floating-point numbers, integers, dates or Boolean values
– Row: a row consists of zero or more values that are processed together as a single entry
– Input stream: a stack of rows that enters a step
– Hop: a graphical representation of one or more data streams between two steps; a hop always represents the output stream for one step and the input stream for another
– Note: descriptive text that can be added to a transformation
[Figure: a step, a hop and a note in a transformation]
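The row, stream and hop concepts can be modelled with Python generators: a row is a tuple of values, each step consumes an input stream of rows and yields an output stream, and connecting two steps plays the role of a hop. This is a simplified, sequential sketch with hypothetical step names; PDI itself runs the steps of a transformation concurrently, with rows buffered between them.

```python
from typing import Iterable, Iterator

Row = tuple  # a row is an ordered group of values of any type


def generate_rows() -> Iterator[Row]:
    """Input step: emits rows into its output stream."""
    yield ("widget", 3, 2.5)
    yield ("gadget", 0, 10.0)
    yield ("gizmo", 4, 1.25)


def filter_rows(stream: Iterable[Row]) -> Iterator[Row]:
    """Transformation step: drops rows whose quantity is zero."""
    for row in stream:
        if row[1] > 0:
            yield row


def add_total(stream: Iterable[Row]) -> Iterator[Row]:
    """Transformation step: appends a calculated value to each row."""
    for name, qty, price in stream:
        yield (name, qty, price, qty * price)


# Each nested call acts like a hop: the output stream of one step
# becomes the input stream of the next.
result = list(add_total(filter_rows(generate_rows())))
print(result)  # [('widget', 3, 2.5, 7.5), ('gizmo', 4, 1.25, 5.0)]
```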

7 Main Components
[Figure: component palette with the categories All components, Input, Output and Transformation]

8 Job Definitions
– Job entry: a part of a job that performs a specific task
– Hop: a graphical representation of the link between two job entries; in a job, hops determine the order in which the entries run, optionally depending on whether the previous entry succeeded or failed
– Note: descriptive text that can be added to a job
[Figure: a job entry, a hop and a note in a job]
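The way hops order job entries can be sketched as a small state machine in which each entry reports success or failure and the matching hop picks the next entry. The `JobEntry` class, `run_job` and the entry names are hypothetical illustrations of the idea, not PDI's API; the sketch also has no protection against hops that form a cycle.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class JobEntry:
    """A job entry performs one task and reports success (True) or failure (False)."""
    name: str
    task: Callable[[], bool]
    on_success: Optional["JobEntry"] = None  # hop followed when the task succeeds
    on_failure: Optional["JobEntry"] = None  # hop followed when the task fails


def run_job(start: JobEntry) -> List[str]:
    """Follow hops from the start entry; return a log of the entries that ran."""
    log = []
    entry: Optional[JobEntry] = start
    while entry is not None:
        ok = entry.task()
        log.append(f"{entry.name}: {'success' if ok else 'failure'}")
        entry = entry.on_success if ok else entry.on_failure
    return log


# Hypothetical job: run a transformation, then a load; send an alert on failure.
send_alert = JobEntry("send_alert", lambda: True)
load_warehouse = JobEntry("load_warehouse", lambda: False, on_failure=send_alert)
run_transformation = JobEntry(
    "run_transformation", lambda: True,
    on_success=load_warehouse, on_failure=send_alert,
)

print(run_job(run_transformation))
# ['run_transformation: success', 'load_warehouse: failure', 'send_alert: success']
```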

9 Components of PDI
Spoon
– GUI tool for designing ETL transformations
– Creating jobs that automate the database update process
– Performing the typical data flow functions, including reading, validating, refining, transforming and writing data
Pan
– Application to run data transformations designed in Spoon
Kitchen
– Application that executes jobs in batch mode, usually on a schedule
Carte
– A web server that allows remote monitoring of running PDI ETL processes through a web browser
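Because Pan and Kitchen are command-line programs, a scheduler or script can call them directly. The sketch below only builds the command lines. It assumes PDI is unpacked under `/opt/pentaho/data-integration` (a hypothetical path) and uses the `-file`, `-level` and `-param:NAME=VALUE` options; the `.ktr`/`.kjb` paths and the `RUN_DATE` parameter are made up. On Windows the launchers are `Pan.bat` and `Kitchen.bat`. Check the options against the documentation for your PDI version.

```python
import shlex
import subprocess
from pathlib import Path
from typing import Dict, List, Optional

# Assumption: PDI is unpacked here; adjust to your installation.
PDI_HOME = Path("/opt/pentaho/data-integration")


def pan_command(ktr_file: str, level: str = "Basic",
                params: Optional[Dict[str, str]] = None) -> List[str]:
    """Build a Pan command line that runs one transformation (.ktr file)."""
    cmd = [str(PDI_HOME / "pan.sh"), f"-file={ktr_file}", f"-level={level}"]
    for name, value in (params or {}).items():
        cmd.append(f"-param:{name}={value}")  # named parameter for the transformation
    return cmd


def kitchen_command(kjb_file: str, level: str = "Basic") -> List[str]:
    """Build a Kitchen command line that runs one job (.kjb file), e.g. from cron."""
    return [str(PDI_HOME / "kitchen.sh"), f"-file={kjb_file}", f"-level={level}"]


def run(cmd: List[str]) -> bool:
    """Execute the command; Pan and Kitchen exit with code 0 on success."""
    return subprocess.run(cmd).returncode == 0


if __name__ == "__main__":
    # Only print the commands; calling run() requires a real PDI installation.
    print(shlex.join(pan_command("/etl/load_customers.ktr",
                                 params={"RUN_DATE": "2024-01-31"})))
    print(shlex.join(kitchen_command("/etl/nightly.kjb")))
```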

10 Features of PDI
– Simple visual designer
– Graphical ETL tool
– Dynamic transformations
– Integrated debugger for testing and tuning job execution

11 Features of PDI
Drag-and-drop integration
– Rich library of pre-built components to access
– Integration with zero coding required
Powerful administration and management
Data profiling and data quality
– Identify data that fails to comply with business rules and standards
– Manage data quality with partners such as Human Inference
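Identifying data that fails business rules amounts to running each row through a set of named checks and recording which rows fail which check. The following plain-Python sketch illustrates the idea; it is not PDI's data-quality feature, and the rule names, fields and sample rows are hypothetical.

```python
import re
from typing import Callable, Dict, List

# Hypothetical business rules: rule name -> predicate that a valid row satisfies.
RULES: Dict[str, Callable[[dict], bool]] = {
    "email_format": lambda r: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", r.get("email") or "") is not None,
    "age_in_range": lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 130,
    "country_iso2": lambda r: re.fullmatch(r"[A-Z]{2}", r.get("country") or "") is not None,
}


def profile(rows: List[dict]) -> Dict[str, List[int]]:
    """Return, for each rule, the indices of the rows that fail it."""
    failures: Dict[str, List[int]] = {name: [] for name in RULES}
    for i, row in enumerate(rows):
        for name, check in RULES.items():
            if not check(row):
                failures[name].append(i)
    return failures


rows = [
    {"email": "ada@example.com", "age": 36, "country": "GB"},
    {"email": "not-an-email", "age": 36, "country": "gb"},
    {"email": "alan@example.org", "age": 200, "country": "GB"},
]
print(profile(rows))  # {'email_format': [1], 'age_in_range': [2], 'country_iso2': [1]}
```

Keeping the rules in a table rather than in the loop makes it easy to add or change a business rule without touching the profiling code.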

12 Features of PDI
– Support for any big data source
