Aggregator Stage : Definition : The Aggregator stage classifies data rows from a single input link into groups and calculates totals or other aggregate functions for each group. The summed totals for each group are output from the stage through the output link. A group is a set of records with the same value for one or more columns. Example : Transaction records might be grouped by both day of the week and by month; these groupings might show that the busiest day of the week varies by season.
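The grouping-and-totaling behavior can be sketched in plain Python (an illustration of the semantics, not DataStage code; the column names "Gender" and "Income" are assumed example names):

```python
# Illustrative sketch of the Aggregator stage's grouping behavior:
# rows are classified by a key column and a total is computed per group.
from collections import defaultdict

def aggregate_sum(rows, group_col, sum_col):
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_col]] += row[sum_col]
    # One output row per group, as emitted on the stage's output link.
    return [{group_col: g, "SumOf" + sum_col: t} for g, t in totals.items()]

rows = [
    {"Gender": "M", "Income": 30000},
    {"Gender": "F", "Income": 25000},
    {"Gender": "M", "Income": 20000},
]
print(aggregate_sum(rows, "Gender", "Income"))
# [{'Gender': 'M', 'SumOfIncome': 50000}, {'Gender': 'F', 'SumOfIncome': 25000}]
```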

Input & View data : The INPUT page shows you the metadata of the incoming data. The input data looks like this…

Properties : Here, we group by "Gender", specify the column to aggregate, and name a user-defined column to collect the aggregated values. This is the setup when "Aggregation Type = Calculation" …

Output & View data : The OUTPUT page shows only those columns used to group and aggregate. As we have grouped by "Gender", the incomes of Males and Females are summed up and shown here.

Execution Mode : Note : The Aggregator stage can have only one output link.

Properties : Here, we group by "Gender" and specify the column to be counted. This is the setup when "Aggregation Type = Count Rows" …

Output & View data : The OUTPUT page shows only the grouping column and the column holding the count. As we have grouped by "Gender", the numbers of Male and Female records are totaled and shown here.
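A sketch of "Aggregation Type = Count Rows" (again plain Python with illustrative column names): one output row per group value, carrying the number of input rows in that group.

```python
# Sketch of the Count Rows aggregation: count input rows per group value.
from collections import Counter

def count_rows(rows, group_col, count_col="Count"):
    counts = Counter(row[group_col] for row in rows)
    return [{group_col: g, count_col: n} for g, n in counts.items()]

rows = [{"Gender": "M"}, {"Gender": "F"}, {"Gender": "M"}]
print(count_rows(rows, "Gender"))
# [{'Gender': 'M', 'Count': 2}, {'Gender': 'F', 'Count': 1}]
```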


Properties : Here, we group by "Gender" and specify the column to preserve the summary of the recalculation. This is the setup when "Aggregation Type = Re-calculation" … Note : When "Aggregation Type = Re-calculation", first place an extra Aggregator stage to aggregate a column; the second Aggregator then re-calculates the previously calculated column.

Output & View data : The OUTPUT page shows only the grouping column and the column for recalculation. The column "MaxVal" is recalculated as "SumOfMaxVal".


Change Apply Stage : Definition : Takes the change data set, which contains the changes in the before and after data sets, from the Change Capture stage and applies the encoded change operations to a before data set to compute an after data set. The Change Apply stage reads a record from the change data set and one from the before data set, compares their key column values, and acts accordingly.
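The apply step can be sketched as follows. The change records carry a change_code column; the numeric codes used here (0 = copy, 1 = insert, 2 = delete, 3 = edit) are the DataStage defaults, and the field names are illustrative.

```python
# Sketch of Change Apply semantics on a single key column.
def change_apply(before, changes, key):
    after = {row[key]: dict(row) for row in before}
    for ch in changes:
        code = ch["change_code"]
        rec = {k: v for k, v in ch.items() if k != "change_code"}
        if code == 1:                # insert: key appears in after set
            after[rec[key]] = rec
        elif code == 2:              # delete: key removed from after set
            after.pop(rec[key], None)
        elif code == 3:              # edit: replace the before record
            after[rec[key]] = rec
        # code 0 (copy): the before record passes through unchanged
    return list(after.values())

before = [{"id": 1, "val": "a"}, {"id": 2, "val": "b"}]
changes = [{"id": 2, "val": "B", "change_code": 3},
           {"id": 3, "val": "c", "change_code": 1}]
print(change_apply(before, changes, "id"))
# [{'id': 1, 'val': 'a'}, {'id': 2, 'val': 'B'}, {'id': 3, 'val': 'c'}]
```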

Change Capture Property : Change Apply Property :

Input (Before Changes) : Input (After Changes) : Output :

Job :

Compress Stage : The Compress stage uses the UNIX compress or GZIP utility to compress a data set. It converts a data set from a sequence of records into a stream of raw binary data.

Steps : Set the Stage Properties : "Command = Compress". Load the Metadata in the Output tab…

Example : Limitations : A compressed data set cannot be processed by many stages until it is expanded, i.e., until its rows are returned to their normal format. Stages that do not perform column-based processing or reorder the rows can operate on compressed data sets; for example, you can use the Copy stage to create a copy of the compressed data set.

Expand Stage : The Expand stage uses the UNIX compress or GZIP utility to expand a previously compressed data set. It converts the data set from a stream of raw binary data back into a sequence of records.

Steps : Set the Stage Properties : "Command = Uncompress" Load the Metadata in the Output tab…

Example Job :

Filter Stage : Definition : The Filter stage transfers, unmodified, the records of the input data set that satisfy the specified requirements and filters out all other records. The Filter stage can have a single input link, any number of output links and, optionally, a single reject link. You can specify different requirements to route rows down different output links. The filtered-out records can be routed to a reject link, if required.
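The basic routing can be sketched as below, using the Salary > 30000 criterion from the example job (column names illustrative; not DataStage code):

```python
# Sketch of Filter routing: rows satisfying the criterion go to the
# output link, the rest to the reject link.
def filter_rows(rows, predicate):
    output, reject = [], []
    for row in rows:
        (output if predicate(row) else reject).append(row)
    return output, reject

rows = [{"Name": "A", "Salary": 45000}, {"Name": "B", "Salary": 20000}]
out, rej = filter_rows(rows, lambda r: r["Salary"] > 30000)
print(out)  # [{'Name': 'A', 'Salary': 45000}]
print(rej)  # [{'Name': 'B', 'Salary': 20000}]
```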

Simple Job :

Criteria to Filter : Note : Only if "Output Rejects = True" are the rejected rows collected separately; otherwise, the rows that fail to satisfy the criteria are ignored.

Input : Output :Reject Rows : Criteria : Salary>30000

Complex Job :

Criteria to Filter : Note : "Output Row Only Once = False" means every input row is tested against each and every criterion. In other words, a row that satisfies one criterion is still tested against the other criteria, so a row can be output more than once.

Input Data Criteria 1 : Salary>30000 Criteria 2 : Number>1 Criteria 3 : Number=3 Reject Rows :

Criteria to Filter : Note : "Output Row Only Once = True" means an input row is not tested against every criterion. In other words, a row that satisfies one criterion is not tested against the remaining criteria, so every row is output at most once.

Input Data Criteria 1 : Salary>30000 Criteria 2 : Number>1 Criteria 3 : Number=3 Reject Rows : Note : Though there is a row that satisfies this criterion, it is not output again because it has already been output for satisfying Criterion 1.
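The two modes can be sketched together, using the three criteria from the example (Salary > 30000, Number > 1, Number = 3; column names illustrative):

```python
# Sketch of "Output Row Only Once": with once=True a row is routed only
# to the first criterion it satisfies; with once=False it is routed to
# every criterion it satisfies. Rows matching nothing go to reject.
def route(rows, criteria, once):
    outputs = [[] for _ in criteria]
    reject = []
    for row in rows:
        matched = False
        for i, crit in enumerate(criteria):
            if crit(row):
                outputs[i].append(row)
                matched = True
                if once:
                    break            # single chance to output
        if not matched:
            reject.append(row)       # satisfied no criterion
    return outputs, reject

criteria = [lambda r: r["Salary"] > 30000,
            lambda r: r["Number"] > 1,
            lambda r: r["Number"] == 3]
rows = [{"Salary": 40000, "Number": 3}, {"Salary": 10000, "Number": 1}]
outs, rej = route(rows, criteria, once=True)
print([len(o) for o in outs], len(rej))  # [1, 0, 0] 1
```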

Funnel Stage : Definition : Funnel Stage copies multiple input data sets to a single output data set. This operation is useful for combining separate data sets into a single large data set. The stage can have any number of input links and a single output link.

Type – 1 : Continuous Funnel Note : This type of Funnel combines the records of the input data in no guaranteed order. It takes one record from each input link in turn. If data is not available on an input link, the stage skips to the next link rather than waiting.

Output view data : Input - 1 view data : Input - 2 view data : Continuous Funnel type …

Type – 2 : Sequence Note : This type of Funnel copies all records from the first input data set to the output data set, then all the records from the second input data set, and so on.

Output view data : Input - 1 view data : Input - 2 view data : Sequence type …

Type – 3 : Sort Funnel Note : This type of Funnel combines the input records in the order defined by the value(s) of one or more key columns and the order of the output records is determined by these sorting keys.

Output view data : Input - 1 view data : Input - 2 view data : Sort Funnel type …
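The three Funnel types can be sketched as follows. Sequence copies input 1 then input 2; Sort Funnel merges inputs already sorted on the key; a true Continuous Funnel takes whichever link has data available, which strict round-robin interleaving only approximates.

```python
# Sketches of the Funnel types (illustrative, not DataStage code).
import heapq
from itertools import chain, zip_longest

def sequence_funnel(*inputs):
    return list(chain(*inputs))      # all of input 1, then input 2, ...

def continuous_funnel(*inputs):
    # take one record from each input link in turn, skipping exhausted links
    _missing = object()
    return [r for group in zip_longest(*inputs, fillvalue=_missing)
            for r in group if r is not _missing]

def sort_funnel(key, *inputs):
    # inputs must already be sorted on the key column(s)
    return list(heapq.merge(*inputs, key=key))

a, b = [1, 3, 5], [2, 4]
print(sequence_funnel(a, b))    # [1, 3, 5, 2, 4]
print(continuous_funnel(a, b))  # [1, 2, 3, 4, 5]
print(sort_funnel(None, a, b))  # [1, 2, 3, 4, 5]
```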

Job :

Join Stage : Definition : The Join stage performs join operations on two or more data sets input to the stage and then outputs the resulting data set. The input data sets are notionally identified as the "right" set, the "left" set, and "intermediate" sets. It can have any number of input links and a single output link.

Join Type = Full Outer Job :

Left Input : Right Input : Output :

Join Type = Inner Job :

Left Input : Right Input : Output :

Join Type = Left Outer Job :

Left Input : Right Input : Output :

Join Type = Right Outer Job :

Left Input : Right Input : Output :
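The four Join Types above can be sketched on a single key column. In this simplification, unmatched rows pass through without the other side's columns; a real Join stage would emit NULLs for the missing columns.

```python
# Sketch of Inner / Left Outer / Right Outer / Full Outer joins.
def join(left, right, key, how="inner"):
    out = [{**l, **r} for l in left for r in right if l[key] == r[key]]
    if how in ("left", "full"):     # Left Outer / Full Outer
        rkeys = {r[key] for r in right}
        out += [dict(l) for l in left if l[key] not in rkeys]
    if how in ("right", "full"):    # Right Outer / Full Outer
        lkeys = {l[key] for l in left}
        out += [dict(r) for r in right if r[key] not in lkeys]
    return out

left = [{"id": 1, "L": "x"}, {"id": 2, "L": "y"}]
right = [{"id": 2, "R": "p"}, {"id": 3, "R": "q"}]
print(join(left, right, "id", "inner"))
# [{'id': 2, 'L': 'y', 'R': 'p'}]
print(len(join(left, right, "id", "full")))  # 3
```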

Lookup Stage : Definition : The Lookup stage is used to perform lookup operations on a data set read into memory from any other parallel-job stage that can output data. It can also perform lookups directly in a DB2 or Oracle database or in a lookup table contained in a Lookup File Set stage.
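The in-memory lookup can be sketched as below (field names illustrative): the reference data set becomes a hash on the key, matched stream rows are enriched, and unmatched rows go to the reject link.

```python
# Sketch of a Lookup with a reject link.
def lookup(stream, reference, key):
    ref = {r[key]: r for r in reference}     # in-memory lookup table
    output, reject = [], []
    for row in stream:
        match = ref.get(row[key])
        if match is None:
            reject.append(row)
        else:
            output.append({**row, **match})
    return output, reject

stream = [{"id": 1, "amt": 10}, {"id": 9, "amt": 5}]
reference = [{"id": 1, "name": "Ann"}]
out, rej = lookup(stream, reference, "id")
print(out)  # [{'id': 1, 'amt': 10, 'name': 'Ann'}]
print(rej)  # [{'id': 9, 'amt': 5}]
```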

Mappings :

Input : References : Output : Reject :

Job :

Merge Stage : Definition : The Merge stage combines a sorted master data set with one or more update data sets. The columns from the records in the master and update data sets are merged so that the output record contains all the columns from the master record plus any additional columns from each update record. A master record and an update record are merged only if both of them have the same values for the merge key column(s) that you specify. Merge key columns are one or more columns that exist in both the master and update records. The data sets input to the Merge stage must be key partitioned and sorted. This ensures that rows with the same key column values are located in the same partition and will be processed by the same node.
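The merge on one key column can be sketched as below. Master values win on overlapping columns (the output keeps all master columns plus additional update columns), keep_unmatched models "Unmatched Masters Mode" (Keep vs Drop), and updates matching no master go to the reject link. Field names are illustrative.

```python
# Sketch of Merge stage semantics with one update data set.
def merge(master, updates, key, keep_unmatched=True):
    upd = {u[key]: u for u in updates}
    output, matched = [], set()
    for m in master:
        if m[key] in upd:
            output.append({**upd[m[key]], **m})   # master columns override
            matched.add(m[key])
        elif keep_unmatched:                      # Unmatched Masters Mode = Keep
            output.append(dict(m))
    reject = [u for k, u in upd.items() if k not in matched]
    return output, reject

master = [{"id": 1, "m": "a"}, {"id": 2, "m": "b"}]
updates = [{"id": 2, "u": "B"}, {"id": 3, "u": "C"}]
out, rej = merge(master, updates, "id", keep_unmatched=False)
print(out)  # [{'id': 2, 'u': 'B', 'm': 'b'}]
print(rej)  # [{'id': 3, 'u': 'C'}]
```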

Unmatched Masters Mode = Drop Job :

Master : Updates : Output : Reject :

Job : Unmatched Masters Mode = Keep

Master : Updates : Output : Reject :

Note : If the options "Warn on Reject Updates = True" and "Warn on Unmatched Masters = True" are set, the log shows warnings for rejected updates and unmatched master records.

Note : If the options "Warn on Reject Updates = False" and "Warn on Unmatched Masters = False" are set, the log does not show warnings for rejected updates and unmatched master records.

Modify Stage : Definition : The Modify stage alters the record schema of its input data set. The modified data set is then output. It is a processing stage and can have a single input link and a single output link.

Job (before handling): Null Handling:

Null Handling… Step – 1: "NULL" value has to be handled…

Step – 2: Null Handling Syntax : Column_Name=Handle_Null('Column_Name',Value) Input Link Columns Output Link Columns Null Handling…

Step – 3: "NULL" value has been handled… Null Handling…

Job (after execution): Null Handling…
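The null-handling specification above is equivalent to the following sketch, with NULL modeled as Python's None (column names illustrative):

```python
# Sketch of Column_Name = Handle_Null('Column_Name', Value): replace
# NULL (None) values in one column with a substitute value.
def handle_null(rows, column, value):
    return [{**row, column: value if row[column] is None else row[column]}
            for row in rows]

rows = [{"EmpNo": 1, "MGR": None}, {"EmpNo": 2, "MGR": 7}]
print(handle_null(rows, "MGR", -1))
# [{'EmpNo': 1, 'MGR': -1}, {'EmpNo': 2, 'MGR': 7}]
```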

Job (before execution): Drop Column(s):

Step – 1: The column "MGR" has to be dropped… Drop Columns …

Step – 2: Dropping Column Syntax : DROP Column_Name Input Link Columns Output Link Columns Drop Columns …

Step – 3: The column "MGR" has been dropped… Drop Columns …

Job (after execution): Drop Columns …

Job (before execution): Keep Column(s):

Step – 1: The column "EmpNo" has to be kept… Keep Columns …

Step – 2: Keeping Column Syntax : KEEP Column_Name Input Link Columns Output Link Columns Keep Columns …

Step – 3: The column "EmpNo" is kept… Keep Columns …

Job (after execution): Keep Columns …
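The DROP and KEEP specifications can be sketched together: DROP removes the named column(s) from the output schema, KEEP passes only the named column(s). Column names are illustrative.

```python
# Sketch of the Modify stage's DROP and KEEP column specifications.
def drop_columns(rows, *cols):
    return [{k: v for k, v in row.items() if k not in cols} for row in rows]

def keep_columns(rows, *cols):
    return [{k: v for k, v in row.items() if k in cols} for row in rows]

rows = [{"EmpNo": 1, "MGR": 7, "Name": "Ann"}]
print(drop_columns(rows, "MGR"))    # [{'EmpNo': 1, 'Name': 'Ann'}]
print(keep_columns(rows, "EmpNo"))  # [{'EmpNo': 1}]
```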

Job (before execution): Type Conversion:

Step – 1: The column "HireDate" has to be converted to Date… Type Conversion …

Step – 2: Type Conversion Syntax : Column_Name=type_conversion('Column_Name') Input Link Columns Output Link Columns Type Conversion …

Step – 3: The column "HireDate" has been converted to Date… Type Conversion …

Job (after execution): Type Conversion …

Job (before execution): Multiple Specifications:

Step – 1: The column "HireDate" has to be converted to Date… The column "MGR" has to be Null handled… Multiple Specification …

Step – 2: Null Handling Type Conversion Input Link Columns Output Link Columns Multiple Specification …

Step – 3: The column "HireDate" has been converted to Date… The column "MGR" has been Null handled… Multiple Specification …

Job (after execution): Multiple Specification …

Pivot Stage : The Pivot stage converts columns into rows. E.g., Mark-1 and Mark-2 are two columns. Task : Convert all the columns into one column. Implication : Can be used to convert SCD Type-3 to Type-2. Methodology : In the Derivation field of the output column, map the input columns into one column. E.g., Column Name – "Marks", Derivation : Mark-1, Mark-2. Note : Column "Marks" is derived from the input columns Mark-1 and Mark-2.
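The horizontal pivot described above can be sketched as below: the values of Mark-1 and Mark-2 become multiple rows under a single "Marks" column, one output row per pivoted input column (the "Student" column is an assumed pass-through column):

```python
# Sketch of a horizontal pivot: several columns collapse into one
# target column, multiplying the rows.
def pivot(rows, source_cols, target_col):
    out = []
    for row in rows:
        rest = {k: v for k, v in row.items() if k not in source_cols}
        for col in source_cols:
            out.append({**rest, target_col: row[col]})
    return out

rows = [{"Student": "S1", "Mark-1": 80, "Mark-2": 90}]
print(pivot(rows, ["Mark-1", "Mark-2"], "Marks"))
# [{'Student': 'S1', 'Marks': 80}, {'Student': 'S1', 'Marks': 90}]
```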

Job (before execution):

Source File Input metadata Output Metadata: Note the change in the derivation …

Job (after execution) : Output file:

Remove Duplicates Stage: Definition : The Remove Duplicates stage takes a single sorted data set as input, removes all duplicate records, and writes the results to an output data set. Removing duplicate records is a common way of cleansing a data set before you perform further processing. Two records are considered duplicates if they are adjacent in the input data set and have identical values for the key column(s).
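Because the input is sorted, duplicates are adjacent and a single pass suffices. The sketch below models the option of retaining the first or last duplicate (field names illustrative):

```python
# Sketch of Remove Duplicates on a sorted input; "retain" models the
# Duplicate To Retain option (first or last).
def remove_duplicates(rows, key, retain="first"):
    out = []
    for row in rows:
        if out and out[-1][key] == row[key]:
            if retain == "last":
                out[-1] = row       # keep the last duplicate instead
        else:
            out.append(row)
    return out

rows = [{"id": 1, "v": "a"}, {"id": 1, "v": "b"}, {"id": 2, "v": "c"}]
print(remove_duplicates(rows, "id", "first"))
# [{'id': 1, 'v': 'a'}, {'id': 2, 'v': 'c'}]
print(remove_duplicates(rows, "id", "last"))
# [{'id': 1, 'v': 'b'}, {'id': 2, 'v': 'c'}]
```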

Last duplicate row dropped… Output view data : Input view data : Selecting Key & Retain Row : Sorting Column :

First duplicate row dropped… Output view data : Selecting Key & Retain Row : Input view data :

Job (after execution) :

Surrogate Key Generator Stage : Definition : The Surrogate Key stage generates key columns for an existing data set. The user can specify certain characteristics of the key sequence. The stage generates sequentially incrementing unique integers from a given starting point. The existing columns of the data set are passed straight through the stage. If the stage is operating in parallel, each node will increment the key by the number of partitions being written to.
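The key sequence can be sketched as below. With one partition the keys are start, start+1, …; in parallel, node p of n emits start+p, start+p+n, …, which the partition/num_partitions parameters (assumed names) model:

```python
# Sketch of surrogate key generation, including the parallel stepping
# behavior where each node increments by the number of partitions.
def add_surrogate_key(rows, start=1, col="SKey", partition=0, num_partitions=1):
    return [{**row, col: start + partition + i * num_partitions}
            for i, row in enumerate(rows)]

rows = [{"v": "a"}, {"v": "b"}, {"v": "c"}]
print([r["SKey"] for r in add_surrogate_key(rows)])  # [1, 2, 3]
print([r["SKey"] for r in add_surrogate_key(rows, num_partitions=2, partition=1)])
# [2, 4, 6]
```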

Job (before execution):

Input : Output : Property :

Job (after execution):

Switch Stage : Definition : The Switch stage takes a single data set as input and assigns each input row to an output data set based on the value of a selector field. It can have a single input link, up to 128 output links and a single reject link. This stage performs an operation similar to a C switch statement. Rows that satisfy none of the cases are output on the reject link.
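The case-based routing can be sketched as below, with the selector column and case values as illustrative assumptions:

```python
# Sketch of the Switch stage: the selector column's value picks the
# output link, like case labels in a C switch; rows matching no case
# go to the reject link.
def switch(rows, selector, cases, num_links):
    outputs = [[] for _ in range(num_links)]
    reject = []
    for row in rows:
        link = cases.get(row[selector])
        if link is None:
            reject.append(row)          # satisfied none of the cases
        else:
            outputs[link].append(row)
    return outputs, reject

rows = [{"Grade": "A"}, {"Grade": "B"}, {"Grade": "Z"}]
outs, rej = switch(rows, "Grade", {"A": 0, "B": 1, "C": 2}, num_links=3)
print([len(o) for o in outs], len(rej))  # [1, 1, 0] 1
```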

Property :

Input : Output - 1 : Reject : Output - 2 : Output - 3 :

Job (after execution):

Property :

Input : Output - 1 : Output - 2 : Output - 3 :

Job (after execution):

Property :

Input : Output - 1 : Reject : Output - 2 : Output - 3 :

Job (after execution):