Types of Joins Farrokh Alemi, Ph.D.

1 Types of Joins Farrokh Alemi, Ph.D.
In this section we discuss how different types of joins work in SQL. This brief presentation was organized by Dr. Alemi.

2 Link Data across 2 or More Tables
The purpose of join commands is to link data across two or more tables. If the data are in more than one table then the tables must be joined before the data are available to the analyst.

3 Cross Full Left/Right Inner
There are four different ways that two tables can be joined: inner, left or right, full, and cross. The inner join usually produces the smallest result. A left or right join can produce a larger result, a full join a larger one still, and a cross join produces the largest. Joins have a large impact on which records are included in the final table. In effect, every join acts as a filter, much like a WHERE clause, that determines which combinations of records are kept.

4 SELECT column_name(s) FROM table1 INNER JOIN table2 ON table1.column_name = table2.column_name;
This slide provides the syntax for the inner join. It is the most common join in SQL code.

5 Unique or Addressed
SELECT column_name(s) FROM table1 INNER JOIN table2 ON table1.column_name = table2.column_name;
The SELECT portion of the code specifies the column names to retrieve from the two tables. Column names should be unique across the two tables or must be prefixed with the table name.

6 SELECT column_name(s) FROM table1 INNER JOIN table2 ON table1.column_name = table2.column_name;
The FROM portion of the code specifies the two tables that should be joined.

7 SELECT column_name(s) FROM table1 INNER JOIN table2 ON table1.column_name = table2.column_name;
The reserved words "INNER JOIN" should appear between the two table names.

8 SELECT column_name(s) FROM table1 INNER JOIN table2 ON table1.column_name = table2.column_name;
This is followed by the ON clause, which specifies one field from each table. The two fields must be equal for rows from the two tables to be joined together.

9 Inner Join
Inner join requires that the join fields in the two tables have exactly the same values. This means that inner join selects the intersection of the two tables. When each value of the join field appears only once in one of the tables, as in a table of code descriptions, the result has no more rows than the other table, so the inner join does not increase the table size.

10 Inner Join
Dx Codes Table
Code ID  Code    Description
1        410.05  Acute myocardial infarction of anterolateral wall
2        250.00  Diabetes mellitus without mention of complication
3        250.01
4                Acute MI of anterolateral wall
5                Diabetes mellitus w/out mention of complication
7        410.09  Acute myocardial infarction of unspecified source
Encounters Table
Patient ID  Provider ID  Diagnosis ID  Date
1001        12           1             1/12/2020
123         240          5             8/13/2012
150         2555         6             9/12/2021
For example, consider the two tables in this slide, one containing descriptions of diagnosis codes and another containing encounters that refer to diagnoses. The description table includes text describing the nature of the diagnosis. The encounter table includes no text, just IDs and codes that can be used to connect to the description table. A join can select the text from the "Dx Codes" table and combine it with the data in the encounter table. An inner join lists all encounters whose diagnosis ID has corresponding text in the Dx Codes table.

11 Inner Join
SELECT d.*, e.* FROM [Dx Codes] d inner join [Encounter] e ON d.[Code ID] = e.[Diagnosis ID]
This is an example of the code that can join the Dx Codes and Encounters tables.

12 Alias
SELECT d.*, e.* FROM [Dx Codes] d inner join [Encounter] e ON d.[Code ID] = e.[Diagnosis ID]
Since table names are often long, one can introduce aliases in join statements to avoid repeating the full table name as a prefix for each field. In this statement, the letters d and e are aliases for the Dx Codes and Encounters tables.

13 Inner Join
SELECT d.*, e.* FROM [Dx Codes] d inner join [Encounter] e ON d.[Code ID] = e.[Diagnosis ID]
Joining the [Dx Codes] and [Encounters] tables allows us to see a description for each diagnosis. For example, for patient 1001, we read from the Encounters table that the diagnosis ID is 1. Then from the Dx Codes table we read that the corresponding description is acute myocardial infarction of anterolateral wall. Diagnosis ID 1 appears in both tables.

14 Inner Join
SELECT d.*, e.* FROM [Dx Codes] d inner join [Encounter] e ON d.[Code ID] = e.[Diagnosis ID]
The situation is not the same for diagnosis ID 6. There is no "Code ID" 6 in the [Dx Codes] table, so the encounter row for patient 150 has no match and will not be included in the combined table.
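The behavior of the inner join on these two tables can be checked with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in for the database used in the slides (SQLite accepts the same bracketed names), keeps only the ID, description, and encounter columns, and stores the description for Code ID 3 as NULL because it is not legible in the transcript.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the slides' database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE [Dx Codes] ([Code ID] INTEGER, [Description] TEXT);
CREATE TABLE [Encounter] ([Patient ID] INTEGER, [Provider ID] INTEGER,
                          [Diagnosis ID] INTEGER, [Date] TEXT);
INSERT INTO [Dx Codes] VALUES
    (1, 'Acute myocardial infarction of anterolateral wall'),
    (2, 'Diabetes mellitus without mention of complication'),
    (3, NULL),
    (4, 'Acute MI of anterolateral wall'),
    (5, 'Diabetes mellitus w/out mention of complication'),
    (7, 'Acute myocardial infarction of unspecified source');
INSERT INTO [Encounter] VALUES
    (1001, 12, 1, '1/12/2020'),
    (123, 240, 5, '8/13/2012'),
    (150, 2555, 6, '9/12/2021');
""")

# Inner join: only encounters whose Diagnosis ID exists in [Dx Codes] survive.
inner_rows = conn.execute("""
    SELECT e.[Patient ID], e.[Diagnosis ID], d.[Description]
    FROM [Dx Codes] d INNER JOIN [Encounter] e
      ON d.[Code ID] = e.[Diagnosis ID]
    ORDER BY e.[Diagnosis ID]
""").fetchall()

for row in inner_rows:
    print(row)
```

Only the encounters for patients 1001 and 123 are returned. The encounter with diagnosis ID 6 is absent because no row in [Dx Codes] has that ID.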

15 Check Total Rows in Combined & Component Tables
SELECT d.*, e.* FROM [Dx Codes] d inner join [Encounter] e ON d.[Code ID] = e.[Diagnosis ID]
No Match, Entire Record Gone
Since the description of the diagnosis code is missing, all corresponding encounters are dropped from the combined table. Of course this does not make sense. A whole lot of data can be lost because a diagnosis has no description. Imagine what will happen if we are trying to send a bill for the encounter. To generate the bill we need the description of the diagnosis. We will not have the description of the diagnosis in the combined table. Even worse, the entire record of the visit is gone. We won't even know that the patient has had a visit. Poof, no description, no data, no bill. Whenever inner joins are used, the analyst must be careful not to inadvertently drop data. Always check the total number of records in the combined table against the number of records in the component tables.
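One way to carry out the row-count check is sketched below. It assumes Python's built-in sqlite3 module as a stand-in for the slides' database and keeps only the ID columns. The helper name count and the anti-join query that lists the dropped encounters are illustrative additions, not part of the original slides.

```python
import sqlite3

# Assumption: in-memory SQLite database holding only the ID columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE [Dx Codes] ([Code ID] INTEGER);
CREATE TABLE [Encounter] ([Patient ID] INTEGER, [Diagnosis ID] INTEGER);
INSERT INTO [Dx Codes] VALUES (1), (2), (3), (4), (5), (7);
INSERT INTO [Encounter] VALUES (1001, 1), (123, 5), (150, 6);
""")

def count(sql):
    """Hypothetical helper: run a COUNT(*) query and return the number."""
    return conn.execute(sql).fetchone()[0]

n_encounters = count("SELECT COUNT(*) FROM [Encounter]")
n_joined = count("""
    SELECT COUNT(*)
    FROM [Dx Codes] d INNER JOIN [Encounter] e
      ON d.[Code ID] = e.[Diagnosis ID]
""")

# Encounters with no matching code: these are the rows an inner join drops.
dropped = conn.execute("""
    SELECT e.[Patient ID], e.[Diagnosis ID]
    FROM [Encounter] e LEFT JOIN [Dx Codes] d
      ON d.[Code ID] = e.[Diagnosis ID]
    WHERE d.[Code ID] IS NULL
""").fetchall()

print(n_encounters, n_joined, dropped)
```

A joined count smaller than the encounter count signals that records were lost, and the last query shows which ones.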

16 All of Left Table Records Listed
The left and right joins always include the records of one table and include the fields from the other table only when the IDs match. In a left join, all of the records in the left table are listed. When the two IDs do not match, the record is still kept, but null values appear in place of the missing fields.

17 All of Right Table Records Listed
If the right join is used, then all of the records in the right table are included. Where a record has a match in the left table, that content is included. Where it has no match, null values are included instead.

18 Left/Right Join
SELECT d.*, e.* FROM [Dx Codes] d right join [Encounter] e ON d.[Code ID] = e.[Diagnosis ID]
Continuing the previous example, with a right join we can display all encounters from the [Encounters] table together with their corresponding text from the [Dx Codes] table.

19 Left/Right Join
SELECT d.*, e.* FROM [Dx Codes] d right join [Encounter] e ON d.[Code ID] = e.[Diagnosis ID]
All of the records in the Encounters table are included. For diagnoses 1 and 5, the description is included from the [Dx Codes] table. For diagnosis 6, a null value is included for the description and for the code. All encounter data are still there, but the description of the diagnosis is null when the description is not available.

20 Left/Right Join
SELECT d.*, e.* FROM [Dx Codes] d right join [Encounter] e ON d.[Code ID] = e.[Diagnosis ID]
Combined Table
From Encounters Table                             From Dx Codes Table
Patient ID  Provider ID  Diagnosis ID  Date       Code    Description
1001        12           1             1/12/2020  410.05  Acute myocardial infarction of anterolateral wall
123         240          5             8/13/2012  250     Diabetes mellitus w/out mention of complication
150         2555         6             9/12/2021  Null    Null
The right join lists all the records in the encounter table. Note that diagnosis ID 6 is listed even though its code and description are left null.
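The right join result can be reproduced with the sketch below. It assumes Python's built-in sqlite3 module as a stand-in for the slides' database. Because older SQLite builds do not accept RIGHT JOIN, the query is written as the equivalent LEFT JOIN with the two tables swapped. This is a substitution made for portability, not the statement shown on the slide. The description for Code ID 3 is stored as NULL because it is not legible in the transcript.

```python
import sqlite3

# Assumption: in-memory SQLite database with the slide's IDs and descriptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE [Dx Codes] ([Code ID] INTEGER, [Description] TEXT);
CREATE TABLE [Encounter] ([Patient ID] INTEGER, [Diagnosis ID] INTEGER);
INSERT INTO [Dx Codes] VALUES
    (1, 'Acute myocardial infarction of anterolateral wall'),
    (2, 'Diabetes mellitus without mention of complication'),
    (3, NULL),
    (4, 'Acute MI of anterolateral wall'),
    (5, 'Diabetes mellitus w/out mention of complication'),
    (7, 'Acute myocardial infarction of unspecified source');
INSERT INTO [Encounter] VALUES (1001, 1), (123, 5), (150, 6);
""")

# "[Dx Codes] d RIGHT JOIN [Encounter] e" keeps every encounter; the same
# rows come from "[Encounter] e LEFT JOIN [Dx Codes] d".
right_rows = conn.execute("""
    SELECT e.[Patient ID], e.[Diagnosis ID], d.[Description]
    FROM [Encounter] e LEFT JOIN [Dx Codes] d
      ON d.[Code ID] = e.[Diagnosis ID]
    ORDER BY e.[Diagnosis ID]
""").fetchall()

for row in right_rows:
    print(row)
```

All three encounters appear, and the description for diagnosis ID 6 comes back as None, Python's rendering of a SQL null.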

21 Left/Right Join
SELECT d.*, e.* FROM [Dx Codes] d left join [Encounter] e ON d.[Code ID] = e.[Diagnosis ID]
In the left join, all records from the [Dx Codes] table are included. Diagnoses that do not have an encounter are also included, with the missing encounter fields having null values.

22 Left/Right Join
SELECT d.*, e.* FROM [Dx Codes] d left join [Encounter] e ON d.[Code ID] = e.[Diagnosis ID]
The combined table will list all 6 diagnoses. For diagnoses that have encounters, the encounters are listed, and for diagnoses that do not have encounters, null values are listed.

23 Left/Right Join
SELECT d.*, e.* FROM [Dx Codes] d left join [Encounter] e ON d.[Code ID] = e.[Diagnosis ID]
Combined Table
From Encounters Table                             From Dx Codes Table
Patient ID  Provider ID  Diagnosis ID  Date       Code    Description
1001        12           1             1/12/2020  410.05  Acute myocardial infarction of anterolateral wall
Null        Null         Null          Null       250     Diabetes mellitus without mention of complication
Null        Null         Null          Null       250.01
Null        Null         Null          Null               Acute MI of anterolateral wall
123         240          5             8/13/2012          Diabetes mellitus w/out mention of complication
Null        Null         Null          Null       410.09  Acute myocardial infarction of unspecified source
The combined table now has 6 rows, one for each diagnosis code. In 4 of these rows the encounter fields are left as null.
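The left join can be checked in the same way. The sketch assumes Python's built-in sqlite3 module as a stand-in for the slides' database and keeps only the ID columns, which are enough to count the rows and the null encounters.

```python
import sqlite3

# Assumption: in-memory SQLite database holding only the ID columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE [Dx Codes] ([Code ID] INTEGER);
CREATE TABLE [Encounter] ([Patient ID] INTEGER, [Diagnosis ID] INTEGER);
INSERT INTO [Dx Codes] VALUES (1), (2), (3), (4), (5), (7);
INSERT INTO [Encounter] VALUES (1001, 1), (123, 5), (150, 6);
""")

# Left join: every diagnosis code is kept, matched or not.
left_rows = conn.execute("""
    SELECT d.[Code ID], e.[Patient ID]
    FROM [Dx Codes] d LEFT JOIN [Encounter] e
      ON d.[Code ID] = e.[Diagnosis ID]
    ORDER BY d.[Code ID]
""").fetchall()

# Codes with no encounter carry None (SQL null) in the encounter column.
codes_without_encounter = [code for code, patient in left_rows if patient is None]

print(left_rows)
print(codes_without_encounter)
```

The query returns one row per diagnosis code, and the codes without an encounter are 2, 3, 4, and 7. The encounter with diagnosis ID 6 does not appear because the left table has no such code.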

24 All Records Listed
If the full join is used, then all of the records in both tables are included. Where both tables match, the information is listed, and when one table is missing a match, null values are inserted. Full join includes all the records in both left and right joins.

25 Full Join
SELECT d.*, e.* FROM [Dx Codes] d full join [Encounter] e ON d.[Code ID] = e.[Diagnosis ID]
For code IDs 1 and 5, the encounters of patient 1001 and patient 123 are listed.

26 Full Join
SELECT d.*, e.* FROM [Dx Codes] d full join [Encounter] e ON d.[Code ID] = e.[Diagnosis ID]
Code IDs 2, 3, 4, and 7 are included, but no encounter information is listed for these codes. Null values are provided instead.

27 Full Join
SELECT d.*, e.* FROM [Dx Codes] d full join [Encounter] e ON d.[Code ID] = e.[Diagnosis ID]
For diagnosis ID 6, the encounter information is listed but the description is left null.

28 Full Join
SELECT d.*, e.* FROM [Dx Codes] d full join [Encounter] e ON d.[Code ID] = e.[Diagnosis ID]
Combined Table
From Encounters Table                             From Dx Codes Table
Patient ID  Provider ID  Diagnosis ID  Date       Code    Description
1001        12           1             1/12/2020  410.05  Acute myocardial infarction of anterolateral wall
Null        Null         Null          Null       250     Diabetes mellitus without mention of complication
Null        Null         Null          Null       250.01
Null        Null         Null          Null               Acute MI of anterolateral wall
123         240          5             8/13/2012          Diabetes mellitus w/out mention of complication
150         2555         6             9/12/2021  Null    Null
Null        Null         Null          Null       410.09  Acute myocardial infarction of unspecified source
Now the combined table includes null values in both descriptions and encounters.
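The full join result can be sketched as follows, assuming Python's built-in sqlite3 module as a stand-in for the slides' database and keeping only the ID columns. Older SQLite builds do not accept FULL JOIN, so the sketch emulates it: a left join from [Dx Codes], followed by the encounters that have no matching code. This emulation is a swapped-in technique, not the statement shown on the slide.

```python
import sqlite3

# Assumption: in-memory SQLite database holding only the ID columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE [Dx Codes] ([Code ID] INTEGER);
CREATE TABLE [Encounter] ([Patient ID] INTEGER, [Diagnosis ID] INTEGER);
INSERT INTO [Dx Codes] VALUES (1), (2), (3), (4), (5), (7);
INSERT INTO [Encounter] VALUES (1001, 1), (123, 5), (150, 6);
""")

# Emulated full join: all codes (left join) plus unmatched encounters.
full_rows = conn.execute("""
    SELECT d.[Code ID], e.[Patient ID], e.[Diagnosis ID]
    FROM [Dx Codes] d LEFT JOIN [Encounter] e
      ON d.[Code ID] = e.[Diagnosis ID]
    UNION ALL
    SELECT NULL, e.[Patient ID], e.[Diagnosis ID]
    FROM [Encounter] e
    WHERE NOT EXISTS (
        SELECT 1 FROM [Dx Codes] d WHERE d.[Code ID] = e.[Diagnosis ID]
    )
""").fetchall()

null_encounters = [r for r in full_rows if r[1] is None]  # codes with no visit
null_codes = [r for r in full_rows if r[0] is None]       # visits with no code

print(len(full_rows), len(null_encounters), null_codes)
```

The result has 7 rows: 2 matched rows, 4 codes with null encounter fields, and 1 encounter (diagnosis ID 6) with a null code.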

29 All Possible Combinations
In a cross join, all records of one table are repeated for each record of the other table, giving all possible combinations without any restrictions.

30 Cross Join
SELECT d.*, e.* FROM [Dx Codes] d cross join [Encounter] e
Note that a cross join has no ON clause: it does not specify that any fields should match across the two tables.

31 Cross Join
SELECT d.*, e.* FROM [Dx Codes] d cross join [Encounter] e
Combined Table for 1st Record of Encounter Table
From Encounters Table                             From Dx Codes Table
Patient ID  Provider ID  Diagnosis ID  Date       Code    Description
1001        12           1             1/12/2020  410.05  Acute myocardial infarction of anterolateral wall
1001        12           1             1/12/2020  250     Diabetes mellitus without mention of complication
1001        12           1             1/12/2020  250.01
1001        12           1             1/12/2020          Acute MI of anterolateral wall
1001        12           1             1/12/2020          Diabetes mellitus w/out mention of complication
1001        12           1             1/12/2020  410.09  Acute myocardial infarction of unspecified source
The combined table for just the first record of the encounter table will include all 6 descriptions. The encounter fields are repeated unchanged on every row.

32 Cross Join
SELECT d.*, e.* FROM [Dx Codes] d cross join [Encounter] e
Combined Table for 2nd Encounter
The combined table for the second record of the encounter table will also include all 6 descriptions. The encounter fields (patient 123, provider 240, diagnosis ID 5, date 8/13/2012) are repeated unchanged on each of the 6 rows, one row per record of the Dx Codes table.

33 Cross Join
SELECT d.*, e.* FROM [Dx Codes] d cross join [Encounter] e
Combined Table for 3rd Encounter
The combined table for the 3rd encounter will also include 6 records, each having a different description. The encounter fields (patient 150, provider 2555, diagnosis ID 6, date 9/12/2021) are repeated unchanged on each row.

34 All Possible Combinations
Lots of Data
Cross join increases the data size considerably. In our example of 3 encounters and 6 descriptions, cross join created a combined table of 3 times 6 or 18 records. In massive data, you will rarely see unrestricted cross joins, because the number of rows grows with the product of the table sizes and quickly becomes computationally impractical. In smaller data, one might do a cross join but aggressively remove unwanted combinations using a WHERE clause.
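The row arithmetic and the effect of a WHERE clause can be illustrated with a short sketch. It assumes Python's built-in sqlite3 module as a stand-in for the slides' database and keeps only the ID columns. The particular WHERE condition is one example of a restriction, chosen because it reduces the cross join to the same rows as the inner join.

```python
import sqlite3

# Assumption: in-memory SQLite database holding only the ID columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE [Dx Codes] ([Code ID] INTEGER);
CREATE TABLE [Encounter] ([Patient ID] INTEGER, [Diagnosis ID] INTEGER);
INSERT INTO [Dx Codes] VALUES (1), (2), (3), (4), (5), (7);
INSERT INTO [Encounter] VALUES (1001, 1), (123, 5), (150, 6);
""")

# Unrestricted cross join: every code paired with every encounter.
cross_count = conn.execute(
    "SELECT COUNT(*) FROM [Dx Codes] d CROSS JOIN [Encounter] e"
).fetchone()[0]

# Cross join restricted by WHERE: only pairs whose IDs agree are kept.
filtered = conn.execute("""
    SELECT d.[Code ID], e.[Patient ID]
    FROM [Dx Codes] d CROSS JOIN [Encounter] e
    WHERE d.[Code ID] = e.[Diagnosis ID]
    ORDER BY d.[Code ID]
""").fetchall()

# The inner join from the earlier slides, for comparison.
inner = conn.execute("""
    SELECT d.[Code ID], e.[Patient ID]
    FROM [Dx Codes] d INNER JOIN [Encounter] e
      ON d.[Code ID] = e.[Diagnosis ID]
    ORDER BY d.[Code ID]
""").fetchall()

print(cross_count, filtered, filtered == inner)
```

The unrestricted cross join yields 6 times 3 or 18 rows, while the restricted version keeps only the 2 rows that the inner join returns.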

35 Join command Connects two or More tables
This presentation was about how the join command connects multiple tables.

