SQL Unit 3 Joins Kirk Scott 1. 3.1 Qualified Field Names and Table Aliases 3.2 Joining Two Tables 3.3 Three-Way Joins and Joining a Table with Itself.

1 SQL Unit 3 Joins Kirk Scott 1

2 3.1 Qualified Field Names and Table Aliases
3.2 Joining Two Tables
3.3 Three-Way Joins and Joining a Table with Itself
3.4 Outer Joins

3 3.1 Qualified Field Names and Table Aliases Join queries are queries that contain more than one table. They are the main topic of this unit and they will be illustrated shortly. 3

4 Qualified Field Names Before that, it’s necessary to explain the use of qualified field names, because qualified field names are used in join queries. In a join query, two tables can contain fields with the same name. Qualified field names are used to tell the two fields apart. 4

5 The Customer and Salesperson tables both contain a name field. Without worrying about the overall structure of such a query yet, imagine you have something along these lines:
    SELECT name
    FROM Customer, Salesperson

6 This is ambiguous. It isn't clear whether the SELECT refers to the name field in Customer or the name field in Salesperson. The solution is to qualify the field name with the table name using dot notation. 6

7 The following query clearly specifies that the name field of interest comes from the Customer table:
    SELECT Customer.name
    FROM Customer, Salesperson
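
The ambiguity and its fix can be tried out with a minimal sketch. It assumes SQLite, driven through Python's standard sqlite3 module, as a stand-in for Access; the two tiny tables and their rows are hypothetical sample data, not the course database.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for Access here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (custno INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Salesperson (spno INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO Customer VALUES (1, 'Ann');
    INSERT INTO Salesperson VALUES (10, 'Bob');
""")

# The unqualified field name is rejected: both tables have a name field.
try:
    conn.execute("SELECT name FROM Customer, Salesperson")
    ambiguous = False
except sqlite3.OperationalError as err:
    ambiguous = True
    print("rejected:", err)

# Qualifying the field with the table name resolves the ambiguity.
rows = conn.execute(
    "SELECT Customer.name FROM Customer, Salesperson"
).fetchall()
print(rows)
```

The exact wording of the error message differs from system to system; the point is only that the unqualified form is refused and the qualified form runs.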

8 Table Aliases At the end of the last chapter the keyword AS was introduced for column aliases. Column aliases simply determine how output looks. The keyword AS can also be used to provide an alternative name for a table in a query.

9 Table aliases are useful in join queries. In a long query with long table names, you can use table aliases to reduce the amount of typing. More importantly, a table can be joined with itself. In that case, it has to be given two aliases, so that you can refer to the table in two different ways. 9

10 The following example illustrates giving the Customer table the alias C in the FROM clause. Anywhere else in the query where you might use the table's name (as a qualifier, for example) you can use C instead of the full name, Customer. For example:
    SELECT C.name
    FROM Customer AS C

11 Notice that you can use the alias in the SELECT clause, even though it comes before the FROM clause where the alias is defined. It is also possible to qualify field names in other places in a query. For example:
    SELECT C.name
    FROM Customer AS C
    WHERE C.state = 'AK'

12 Neither the alias nor the qualified field names are needed in order for this query to work. The example merely illustrates what is possible. These techniques will be put to practical use in the examples of join queries that will be covered next. 12

13 3.2 Joining Two Tables 13

14 3.2.1 What is a Join? The process of designing a database requires that different kinds of entities be stored in different tables. The relationship between entities is captured by a primary to foreign key relationship. Join queries are queries on more than one table, where the data values for related records can be retrieved and shown in a single row in the results. 14

15 Think back to the discussion of incorrect table designs. One of the incorrect designs looked like this: motherA………child1… motherA………child2… motherB………child3… 15

16 Although this is not a correct design for a table, it is quite possible that this is the kind of information that someone would like to retrieve from a database, and the way in which they would like to see it. A join query makes it possible to generate exactly these kinds of results. 16

17 3.2.2 Plain Join Syntax Here is an example of a simple join query. It is shown with line numbers so that the lines can be referred to in the explanation that follows:
    (1) SELECT *
    (2) FROM Salesperson, Carsale
    (3) WHERE Salesperson.spno = Carsale.spno
The characteristics of the query can be explained most easily in the order (2), (1), (3):

18 (2) The query is based on two tables. The names of the tables are listed in the FROM clause, separated by a comma. In order to write a join query that makes sense, the two tables have to have a field in common. Typically the tables are in a primary key to foreign key relationship. 18

19 In this example there is a 1-m relationship between Salesperson and Carsale. The field they have in common is spno. It is the primary key of the Salesperson table and it's embedded as a foreign key in the Carsale table. 19

20 (1) SELECT *, as usual, means to select all of the fields. In this case there are two tables in the query, so this will select all of the fields from both tables. By default, the fields of the Salesperson table will appear first, in the order in which they're defined in the table, followed by the fields of the Carsale table, in the order in which they're defined in that table. Because * is used, the spno field will appear twice in the results of the query, once in the list of fields from each table. 20

21 (3) The most critical part of the query is the joining condition. It is similar to the kind of condition found in a WHERE clause, but in this case two field names appear in the condition, not a field name and a value. Because the names of the corresponding fields in the two tables are the same, the field names have to be qualified with the table names. 21

22 In this example, a test of equality is used, and this kind of query can be referred to as an equijoin query. The joining condition forms a match-up between the related fields of the two tables being joined. The condition specifies that there should be a row in the results for each pair of records in the two tables where there is a match in the values of the joining fields. 22
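
The equijoin described above can be run as a small sketch. It assumes SQLite through Python's sqlite3 module rather than Access, and the rows are hypothetical sample data with only a few of the fields of the real tables.

```python
import sqlite3

# Hypothetical sample data; only a few fields of each table are modeled.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Salesperson (spno INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Carsale (vin TEXT PRIMARY KEY, spno INTEGER, salesprice INTEGER);
    INSERT INTO Salesperson VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Carsale VALUES
        ('11111', 1, 4500), ('22222', 1, 18000), ('33333', 2, 7500);
""")

# One result row per pair of records whose spno values match.
# The ORDER BY is added only to make the output order predictable.
rows = conn.execute("""
    SELECT *
    FROM Salesperson, Carsale
    WHERE Salesperson.spno = Carsale.spno
    ORDER BY Carsale.vin
""").fetchall()

for row in rows:
    print(row)  # Salesperson fields first, then Carsale fields; spno appears twice
```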

23 3.2.3 Unmatched Fields in Joins Referential integrity states that no foreign key value should exist without a corresponding primary key value. However, a primary key value can exist without a corresponding foreign key value. Also, a foreign key field may contain nulls. 23

24 A join query will find matches, but any records that don't have matches will not appear in the results. A primary key record without a matching foreign key record will not appear. A foreign key record with null in the foreign key field will not appear. 24
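
A short sketch of the two unmatched cases, again assuming SQLite via Python's sqlite3 module and hypothetical rows: salesperson Cy has no sales, and one sale has null in its spno field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Salesperson (spno INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Carsale (vin TEXT PRIMARY KEY, spno INTEGER);
    -- Cy (spno 3) has no matching Carsale record.
    INSERT INTO Salesperson VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    -- Sale 33333 has null in the foreign key field.
    INSERT INTO Carsale VALUES ('11111', 1), ('22222', 2), ('33333', NULL);
""")

rows = conn.execute("""
    SELECT name, vin
    FROM Salesperson, Carsale
    WHERE Salesperson.spno = Carsale.spno
    ORDER BY vin
""").fetchall()

print(rows)  # neither Cy nor sale 33333 appears
```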

25 3.2.4 The Cartesian Product It's worth mentioning the Cartesian product as an example of a mistake that's easy to make when writing a join query. Taking a look at a Cartesian product also helps emphasize what a join query is. Consider this example:
    SELECT *
    FROM Salesperson, Carsale

26 Syntactically, the example doesn't contain a mistake. It will run. It will find the Cartesian product, or cross product, of the two tables, Salesperson and Carsale. 26

27 This query doesn't contain a joining condition. The query will produce as results every row in the one table matched with every row in the other. It doesn't depend on the matching of values of corresponding fields. 27

28 If the Salesperson table had 10 records and the Carsale table had 100 records, this query would produce 10x100, or 1000 records in the results. It is very unlikely that the results of such a query would have any value. On the other hand, when you get 1000 records in the results, it's not hard to figure out what happened: You forgot the joining condition in the query you were trying to write. 28
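
The row-count arithmetic can be checked on a smaller scale. This sketch assumes SQLite via Python's sqlite3 module and hypothetical data: 2 salespeople and 3 sales, so the Cartesian product has 2 x 3 = 6 rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Salesperson (spno INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Carsale (vin TEXT PRIMARY KEY, spno INTEGER);
    INSERT INTO Salesperson VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Carsale VALUES ('11111', 1), ('22222', 1), ('33333', 2);
""")

# No joining condition: every Salesperson row is paired with every Carsale row.
product_count = conn.execute(
    "SELECT COUNT(*) FROM Salesperson, Carsale"
).fetchone()[0]

# With the joining condition: only the matching pairs remain.
join_count = conn.execute("""
    SELECT COUNT(*)
    FROM Salesperson, Carsale
    WHERE Salesperson.spno = Carsale.spno
""").fetchone()[0]

print(product_count, join_count)
```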

29 3.2.5 The Inner Join The kind of join under discussion here can be called an inner join. This is the technical term for a query with a joining condition where records without matches are not included in the results. It turns out that it's possible to write a query where some records without matches are included in the results. 29

30 Such a query is known as an outer join. It will be explained shortly. There is a special syntax which can be used in order to form inner joins. It will be covered next. 30

31 The original example query above could be written in this way:
    SELECT *
    FROM Salesperson INNER JOIN Carsale
    ON Salesperson.spno = Carsale.spno

32 The following query produces the same rows, although with SELECT * the Carsale fields now come first:
    SELECT *
    FROM Carsale INNER JOIN Salesperson
    ON Carsale.spno = Salesperson.spno

33 The results of either of these queries will be the same as the results of the example using the plain syntax. However, there are two things to notice about the inner join syntax:
A. In Access, it is impossible to mistakenly form a Cartesian product using this syntax. If you forget the ON clause, you will get an error message. (Some other systems accept INNER JOIN without ON, so this safeguard depends on the system.)
B. This syntax parallels the outer join syntax which will be presented later.
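
As a rough check of the equivalence, the following sketch runs the plain syntax and the INNER JOIN syntax on the same hypothetical data and compares the rows. It assumes SQLite via Python's sqlite3 module, not Access.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Salesperson (spno INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Carsale (vin TEXT PRIMARY KEY, spno INTEGER);
    INSERT INTO Salesperson VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO Carsale VALUES ('11111', 1), ('22222', 1), ('33333', 2);
""")

plain = conn.execute("""
    SELECT *
    FROM Salesperson, Carsale
    WHERE Salesperson.spno = Carsale.spno
""").fetchall()

inner = conn.execute("""
    SELECT *
    FROM Salesperson INNER JOIN Carsale
    ON Salesperson.spno = Carsale.spno
""").fetchall()

# Row order is not guaranteed without ORDER BY, so compare sorted lists.
same = sorted(plain) == sorted(inner)
print(same, len(inner))
```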

34 3.2.6 Select, Project, Join (SPJ) Queries In general, anything that you can do in a simple query you can also do in a join query. The fact that a full-scale query may include selection, projection, and joining is captured in the expression "SPJ query". All of the join query examples up to this point have done SELECT *. 34

35 There is also a term, natural join, which signifies selecting all of the fields of the respective tables, but without duplicating the field they have in common. This is an example of a natural join: 35

36
    SELECT Salesperson.spno, name, addr, city, state, phone, bossno, commrate,
           vin, custno, date, salesprice
    FROM Salesperson, Carsale
    WHERE Salesperson.spno = Carsale.spno

37 The spno field has to be qualified in the SELECT, just as it is in the WHERE clause. It doesn't matter whether you use Salesperson.spno or Carsale.spno in the SELECT, because the values will be the same in matching records. If there is no duplication among the names of the other fields of the two tables, it is not necessary to qualify them. They can simply be listed in the SELECT statement in the order desired, separated by commas. 37

38 Here is a simple example of a join query that includes both selection and projection:
    SELECT Salesperson.name, vin, salesprice
    FROM Salesperson, Carsale
    WHERE Salesperson.spno = Carsale.spno
    AND date > #5/5/2005#

39 The most common mistake to make when doing this, again, would be to forget the joining condition. Notice that the joining condition and the additional condition are connected with an AND. Also notice that, as usual, the field(s) involved in the additional WHERE clause conditions do not have to be selected for appearance in the results themselves. 39

40 3.2.7 Join Queries with Other Features Join queries are not limited to selection and projection. Essentially, anything that can be included in a simple query can be included in a join query. For example, you may simply want to count the number of rows that a join query would produce:
    SELECT COUNT(*)
    FROM Salesperson, Carsale
    WHERE Salesperson.spno = Carsale.spno

41 Recall that you can also count on a single field, and the count doesn't include rows where the field is null. The following query would count the results of the join but only for those salespeople who had a boss:
    SELECT COUNT(bossno)
    FROM Salesperson, Carsale
    WHERE Salesperson.spno = Carsale.spno

42 Incidentally, you could count the number of rows in the result of the Cartesian product:
    SELECT COUNT(*)
    FROM Salesperson, Carsale

43 You could also order the results of a join query:
    SELECT name, city, state, vin, salesprice
    FROM Salesperson, Carsale
    WHERE Salesperson.spno = Carsale.spno
    ORDER BY state, city
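
The difference between COUNT(*) and COUNT(bossno) in a join can be seen in a small sketch. It assumes SQLite via Python's sqlite3 module and hypothetical rows in which Ann has no boss (null bossno) and Bob reports to Ann.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Salesperson (spno INTEGER PRIMARY KEY, name TEXT, bossno INTEGER);
    CREATE TABLE Carsale (vin TEXT PRIMARY KEY, spno INTEGER);
    INSERT INTO Salesperson VALUES (1, 'Ann', NULL), (2, 'Bob', 1);
    INSERT INTO Carsale VALUES ('11111', 1), ('22222', 1), ('33333', 2);
""")

# Counts every row of the join result.
all_rows = conn.execute("""
    SELECT COUNT(*)
    FROM Salesperson, Carsale
    WHERE Salesperson.spno = Carsale.spno
""").fetchone()[0]

# Counts only the join rows where bossno is not null (Bob's single sale).
with_boss = conn.execute("""
    SELECT COUNT(bossno)
    FROM Salesperson, Carsale
    WHERE Salesperson.spno = Carsale.spno
""").fetchone()[0]

print(all_rows, with_boss)
```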

44 3.2.8 Inner Join Syntax with Other Features There are no restrictions on the inner join syntax if you choose to use it. You can do SPJ queries by picking a subset of fields in the SELECT and including a WHERE clause. You can also count or order the results. 44

45 3.2.9 Theta Joins The phrase "theta join" refers to the fact that it is syntactically possible to write join queries with any of the inequality operators as well as the equality operator. The term "theta" is used to refer to whichever operator is chosen. 45

46 It is difficult to come up with an example where this is useful, but it's not hard to illustrate the syntax:
    SELECT *
    FROM Salesperson, Carsale
    WHERE Salesperson.spno > Carsale.spno

47 The results of this query would match each record from the Salesperson table with all of the Carsale records whose salesperson number was smaller than that salesperson's number. In general, you would expect the number of records in the results of such a query to be somewhere between the number of records in an equijoin and the number of records in the Cartesian product.
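
To make the theta join concrete, here is a sketch that counts the rows of the equijoin, the theta join with >, and the Cartesian product on the same hypothetical data. It assumes SQLite via Python's sqlite3 module; with other data the three counts would relate differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Salesperson (spno INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Carsale (vin TEXT PRIMARY KEY, spno INTEGER);
    INSERT INTO Salesperson VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO Carsale VALUES ('11111', 1), ('22222', 1), ('33333', 2);
""")

def count(where_clause):
    """Count the rows of Salesperson x Carsale under an optional WHERE clause."""
    sql = "SELECT COUNT(*) FROM Salesperson, Carsale " + where_clause
    return conn.execute(sql).fetchone()[0]

equi = count("WHERE Salesperson.spno = Carsale.spno")
theta = count("WHERE Salesperson.spno > Carsale.spno")
product = count("")

print(equi, theta, product)
```

Here Ann (spno 1) matches no sales, Bob (spno 2) matches the two sales with spno 1, and Cy (spno 3) matches all three sales.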

48 3.3 Three-Way Joins and Joining a Table with Itself 48

49 3.3.1 Three-Way Joins Although in any given system there would be some absurdly large upper limit, in practice there is no limit on the number of tables that can be included in a join query, assuming that all of the tables exist in the database and are in some way related to each other. The largest examples given here will involve three tables, and if you can manage a three-way join, you understand everything there is to understand about a join containing more tables. 49

50 In the example database there is a 1-m relationship between the Salesperson and Carsale tables. There is also a 1-m relationship between the Customer and Carsale tables. When taking those three tables together, there is an m-n relationship between salespeople and customers, and Carsale is the table in the middle. 50

51 Here is an example of a join query involving all three tables:
    SELECT Salesperson.name, vin, Customer.name
    FROM Salesperson, Carsale, Customer
    WHERE Salesperson.spno = Carsale.spno
    AND Carsale.custno = Customer.custno

52 The Salesperson and Customer tables have some field names in common. Notice in the SELECT that it's necessary to use qualified names to distinguish them. The tables in the query are listed after FROM, separated by commas. 52

53 Because there are three tables, there are two joining conditions, one for each pair of related tables (Salesperson with Carsale, and Carsale with Customer). The two joining conditions are connected with AND, and because the respective primary key and foreign key fields in the corresponding tables have the same names, the field names have to be qualified. It is possible to include additional conditions, count, order by, etc. in a multi-way join query.
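
The three-way join can be sketched as follows, assuming SQLite via Python's sqlite3 module and hypothetical rows. Carsale sits in the middle and carries both foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Salesperson (spno INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Customer (custno INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Carsale (vin TEXT PRIMARY KEY, spno INTEGER, custno INTEGER);
    INSERT INTO Salesperson VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Customer VALUES (100, 'Cy'), (200, 'Di');
    INSERT INTO Carsale VALUES
        ('11111', 1, 100), ('22222', 1, 200), ('33333', 2, 100);
""")

# Two joining conditions, one per related pair of tables, connected with AND.
# The ORDER BY is added only to make the output order predictable.
rows = conn.execute("""
    SELECT Salesperson.name, vin, Customer.name
    FROM Salesperson, Carsale, Customer
    WHERE Salesperson.spno = Carsale.spno
    AND Carsale.custno = Customer.custno
    ORDER BY vin
""").fetchall()

for row in rows:
    print(row)  # (salesperson name, vin, customer name)
```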

54 3.3.2 Three-Way Inner Joins Whatever you can do with plain join syntax, you can also do with inner join syntax. However, the inner join syntax includes some details which may merit some explanation. 54

55 Here is one example of an inner join query which is equivalent to the previous example:
    SELECT Salesperson.name, vin, Customer.name
    FROM Salesperson INNER JOIN
        (Carsale INNER JOIN Customer
         ON Carsale.custno = Customer.custno)
    ON Salesperson.spno = Carsale.spno

56 As you can see, there are effectively two inner join queries, and they are nested using parentheses. Inside the parentheses the Carsale and Customer tables are joined on custno, and outside of the parentheses, the Salesperson table is joined with the result of the query inside on spno. 56

57 The same results can be achieved by nesting in the other order:
    SELECT Salesperson.name, vin, Customer.name
    FROM Customer INNER JOIN
        (Carsale INNER JOIN Salesperson
         ON Carsale.spno = Salesperson.spno)
    ON Customer.custno = Carsale.custno

58 This is the first occurrence of nesting encountered so far. It will be a recurring theme as more complicated queries are developed. In general, when things are nested, whatever is inside will be executed before whatever is outside, and this holds true for inner joins. The two queries produce the same results, but the processing occurs in a different order for each. 58

59 3.3.3 Inner Join Query Nesting and Processing Performance Because the processing of the two inner join queries above is done in a different order, in a system like Access it is possible that the efficiency or speed of the queries will differ. If you were using Access in production, this might be important to you. You could test which alternative seemed faster and use that one. 59

60 You might also research the question further so that when writing queries you could predict in advance which alternative was preferable. On the other hand, with small tables, any difference is likely to be imperceptible. Under simple conditions the average user doesn’t care what the processing order of a query is. 60

61 In a database management system more advanced than Access, there will be a query optimizer. This internal code will determine the best order in which to execute the elements of a query, and the order of the nesting in the query becomes a moot point. 61

62 There is a certain irony in this. Access is a comparatively unsophisticated system. You might assume that the typical Access user is comparatively unsophisticated. However, it is under this scenario that the user has to have special knowledge in order to get good system performance. 62

63 In a more advanced system, the system takes care of optimization anyway. Either way, the plain join syntax may be preferable to inner join syntax. For unsophisticated users, it's simpler. On sophisticated systems, its performance will be just as good anyway. 63

64 3.3.4 Joining a Table with Itself The Salesperson table is an example of a table that is in a 1-m relationship with itself. The primary key field, spno, is embedded as a foreign key field, bossno, in the same table. The meaning of the bossno field is this: Given a salesperson, look up the bossno in that person's record. This bossno value is the spno of their boss. 64

65 Then it would be possible to look up the boss's record using that value. Since the table is in a relationship with itself, that means that it can be joined with itself. In other words, it would be possible to write a query where a record in the results consisted of boss information followed by information about a salesperson who has that boss. 65

66 Joining a table with itself only requires one bit of syntax that hasn't been present in the previous examples. The same table will appear twice in the same query, so it becomes necessary to have aliases for those occurrences of the table. 66

67 Here is an example query:
    SELECT Boss.spno, Boss.name, Employee.spno, Employee.name
    FROM Salesperson AS Boss, Salesperson AS Employee
    WHERE Boss.spno = Employee.bossno

68 The thing to keep straight is which opening of the table you get the spno and bossno fields from when forming the joining condition. The aliases chosen in this example are intended to be descriptive of the roles the openings of the table play in the query. Quite often, simpler aliases will be chosen. If that is the case, then it is the use of the aliases in the joining condition, namely which field is taken from which alias, that determines the role each opening of the table plays in the query.

69 The following example is logically analogous to the previous one:
    SELECT A.spno, A.name, B.spno, B.name
    FROM Salesperson AS A, Salesperson AS B
    WHERE A.bossno = B.spno

70 In the joining condition of this query, bossno is pulled from alias A. This tells you that A is playing the role of the employee. spno is pulled from alias B and this tells you that B is playing the role of boss. In the select, you see that the employee information is shown first, followed by the boss information. 70

71 When you write a query like this one, the column headings in the results will be the qualified names given in the SELECT statement. In the example where the aliases were Boss and Employee, this is reasonably descriptive. In this most recent example, the column headings wouldn't be descriptive. 71

72 Therefore, the query could be made more user friendly by also including column aliases. For example:
    SELECT A.spno AS [Employee spno], A.name AS [Employee name],
           B.spno AS [Boss spno], B.name AS [Boss name]
    FROM Salesperson AS A, Salesperson AS B
    WHERE A.bossno = B.spno
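
A runnable sketch of the self-join with column aliases, assuming SQLite via Python's sqlite3 module (which also accepts the square-bracket alias syntax) and hypothetical rows in which Ann is the boss of Bob and Cy. The last column takes the boss's name from B.name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Salesperson (spno INTEGER PRIMARY KEY, name TEXT, bossno INTEGER);
    INSERT INTO Salesperson VALUES (1, 'Ann', NULL), (2, 'Bob', 1), (3, 'Cy', 1);
""")

# A plays the employee role (bossno comes from A); B plays the boss role.
cur = conn.execute("""
    SELECT A.spno AS [Employee spno], A.name AS [Employee name],
           B.spno AS [Boss spno], B.name AS [Boss name]
    FROM Salesperson AS A, Salesperson AS B
    WHERE A.bossno = B.spno
    ORDER BY A.spno
""")
headings = [col[0] for col in cur.description]
rows = cur.fetchall()

print(headings)
for row in rows:
    print(row)
```

Ann does not appear in the employee role because her bossno is null, so she has no match in the join.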

73 3.4 Outer Joins 73

74 3.4.1 What is an Outer Join? An outer join is a special kind of join where records from one of the tables being joined will be included in the results even if they don't have matches. This may sound somewhat obscure, but the explanation for it is simple enough. Take the Car and Carsale tables for example. 74

75 The Car table contains all of the basic information about cars. The Carsale table contains the information that relates specifically to those cars that did sell. The relationship is captured by a primary key to foreign key pair, the vin field in the Car and Carsale tables, respectively. The additional assumptions about the design state that this is in fact a 1-1 relationship.

76 The logic of the inner join, as applied to cars and carsales, would go like this. Correct database design separated cars and their sales into different tables. People would like to see results where a single row contains data from a car along with data from the sale of that car, if it did sell. 76

77 Now consider this scenario: People would like to see a list of all cars—and for those cars that sold, they would also like to see the carsale data —but if a car didn't sell, they would still like to see the car in the list. In a sense, you could think of the desired results as a "super" car table. The catch is that you can't do this with an inner join. 77

78 By definition, cars that didn't sell will not have matches in the Carsale table, so they will not be selected by an inner join query. An outer join allows you to generate results like this. As with all queries, the results are purely tabular, so they aren't necessarily pretty. 78

79 Systems like Access have additional features that make it possible to generate forms and reports. These features are based on query results, and if the report was supposed to include records from one table which didn’t have matches in the other, the report would be based on an outer join query. 79

80 3.4.2 The Keywords LEFT and RIGHT in Outer Joins Because an outer join affects one table differently than the other, the keywords LEFT and RIGHT appear in the syntax. These keywords correspond to which table appears first in the query, reading from left to right, and which appears second. 80

81 If the query is written using the keyword LEFT, the table that appears first in the query, that is, the one that appears on the left in the query, is the one where rows will be retrieved without matches. If the query is written using the keyword RIGHT, the table that appears second in the query, that is, the one that appears on the right in the query, is the one where rows will be retrieved without matches. 81

82 The syntax for the outer join is similar to the syntax for the inner join. In place of INNER JOIN it is possible to write the phrases LEFT OUTER JOIN or RIGHT OUTER JOIN, but it is simpler to write LEFT JOIN or RIGHT JOIN, and this abbreviated syntax will be used in these notes. 82

83 Here is an example:
    SELECT *
    FROM Car LEFT JOIN Carsale
    ON Car.vin = Carsale.vin

84 In this example query, all car records will be retrieved. For those with matches, the records in the results will look just like the records in the results of an inner join. For those records in the Car table that do not have matches in the Carsale table, the records in the results will consist of all of the car data from the Car table followed by nulls for all of the fields from the Carsale table.
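
The left join behavior can be observed in a small sketch, assuming SQLite via Python's sqlite3 module and hypothetical rows. The make field and its values are made up for illustration; car 33333 has not been sold.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Car (vin TEXT PRIMARY KEY, make TEXT);
    CREATE TABLE Carsale (vin TEXT PRIMARY KEY, salesprice INTEGER);
    INSERT INTO Car VALUES ('11111', 'Ford'), ('33333', 'Kia');
    INSERT INTO Carsale VALUES ('11111', 4500);
""")

# Every Car row is kept; unmatched Carsale fields come back as null (None).
rows = conn.execute("""
    SELECT *
    FROM Car LEFT JOIN Carsale
    ON Car.vin = Carsale.vin
    ORDER BY Car.vin
""").fetchall()

for row in rows:
    print(row)
```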

85 The above query could be rewritten as shown below. The table on the right side in the FROM clause, again Car, is the one where records will be retrieved without matches. The only difference is that in the results of the query the carsale fields will come first and the car fields will come second:
    SELECT *
    FROM Carsale RIGHT JOIN Car
    ON Carsale.vin = Car.vin

86 If you are selecting specific fields anyway, you can put them in the SELECT in whatever order you want, and then you can write the rest of the query as a left or right join, whichever you prefer. Here is another example. It finds the names of all salespeople whether they sold a car or not. The outer join is on the primary key table in the 1-m relationship. 86

87 You are finding two things: primary key records that do have a match in the foreign key table, plus primary key records that don't have a match in the foreign key table:
    SELECT name, salesprice
    FROM Salesperson LEFT JOIN Carsale
    ON Salesperson.spno = Carsale.spno

88 If you wrote the query as shown below, it would find the salesprices of all carsales, including carsales which did not have a salesperson recorded for them, assuming that there were any such carsales. The table in the outer join where records are included whether they have a match or not is the foreign key table in the 1-m relationship. 88

89 You are finding two things: foreign key records that do have a match in the primary key table, plus foreign key records that don't have a match in the primary key table and therefore, by referential integrity, have null in the foreign key field:
    SELECT name, salesprice
    FROM Salesperson RIGHT JOIN Carsale
    ON Salesperson.spno = Carsale.spno

90 3.4.3 The Full Outer Join Some implementations of SQL include syntax for a full outer join. This would be a join that retrieved all of the matches between two tables and would also retrieve all of the records from both of the tables which didn't have matches. Access doesn't support this syntax. 90

91 Access does support a UNION keyword, which can be used to accomplish this. UNION will be covered in detail in a future unit, but it is introduced here for two reasons: It makes it possible to do the full outer join, and it helps clarify what a join is.

92 A join can be thought of as a horizontal combination of two tables. A new result table is formed where each row contains information from one row of table A and one row of table B. The joining condition tells how the rows from tables A and B are matched for inclusion in a result row: 92

93 [Figure: JOIN, a horizontal combination of tables A and B]

94 A union can be thought of as a vertical combination of two tables. These two tables have to have the same set of fields. The result of the union is a new table which contains all of the rows from table A and all of the rows from table B. If tables A and B happen to contain rows which are the same, such a row will only appear once in the results: 94

95 [Figure: UNION, a vertical combination of tables A and B]

96 The following example shows a full outer join on Salesperson and Carsale which would include all fields of both tables. If it did projection and only picked a subset of the fields, it would be critical that the same subset was picked in the same order in both the left and the right outer joins: 96

97
    SELECT *
    FROM Salesperson LEFT JOIN Carsale
    ON Salesperson.spno = Carsale.spno
    UNION
    SELECT *
    FROM Salesperson RIGHT JOIN Carsale
    ON Salesperson.spno = Carsale.spno
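
The full outer join by UNION can be sketched as below, assuming SQLite via Python's sqlite3 module and hypothetical rows. Two choices here are assumptions of the sketch, not part of the notes: the queries project the same three fields in the same order instead of using SELECT *, and because older SQLite versions lack RIGHT JOIN, the right-join half is written as a left join with the tables swapped, which retrieves the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Salesperson (spno INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Carsale (vin TEXT PRIMARY KEY, spno INTEGER, salesprice INTEGER);
    -- Bob has no sales; sale 22222 has no salesperson recorded.
    INSERT INTO Salesperson VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Carsale VALUES ('11111', 1, 4500), ('22222', NULL, 7500);
""")

rows = conn.execute("""
    SELECT name, vin, salesprice
    FROM Salesperson LEFT JOIN Carsale
    ON Salesperson.spno = Carsale.spno
    UNION
    SELECT name, vin, salesprice
    FROM Carsale LEFT JOIN Salesperson
    ON Carsale.spno = Salesperson.spno
""").fetchall()

# UNION removes the duplicate matched row; compare as a set since order is unspecified.
result = set(rows)
print(result)
```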

98 3.4.4 Multi-Way Outer Joins Just like it's possible to use nested inner join queries to do multi-way joins, it is possible to have multi-way queries involving outer joins. Depending on the system you try to do this on, there may be restrictions on whether an outer join can be nested inside an inner join, or vice-versa. 98

99 Such syntactical details are unpleasant. Since it's not frequently necessary to do nested outer joins, from a practical point of view they are also relatively unimportant, so this topic will not be pursued. 99

100 3.4.5 One Last Example This last example results from a question a student asked once. It boiled down to, "Would the system really do this?" The query itself is of no practical value, but it's useful because it illustrates again exactly how an outer join works. 100

101 Here is the example query:
    SELECT Carsale.vin, salesprice
    FROM Car LEFT JOIN Carsale
    ON Car.vin = Carsale.vin
    ORDER BY Car.vin

102 The critical and impractical aspect of the query is that it only selects fields from the table on the right side of the join. However, it is a left join query, which means that it will include records from the left table that have no matches in the right table. 102

103 The rule is that for such records, all fields from the right table will be null. In order to help clarify what the results would be, the query also orders by the vin in the Car table, the left table in the join. The results are shown below. 103

104 Query1
    vin      salesprice
    11111    $4,500.00
    12345    $7,500.00
    22222    $18,000.00
    44444    $9,250.00
    55555    $18,000.00
    88888    $15,000.00
    99999    $16,500.00
    ggggg    $18,000.00

105 For those cars that sold, the relationship with the Carsale table is 1-1. A car can only be sold once. However, not all cars have sold, so not all records in the Car table have matches in the Carsale table. 105

106 This query produces results with as many rows as the Car table has records. For those cars that sold, the vin and the salesprice are shown. For those cars that didn't sell, both fields in the results are blank—in other words, there is a completely blank row in the result. 106

107 This is because both of the selected fields come from the Carsale table. If you changed the query so that it selected Car.vin rather than Carsale.vin, the results would be more realistic. Each row in the results would have a vin of a car. 107

108 For those cars that sold, the salesprice would be shown. For those cars that hadn't been sold, the salesprice field would be blank. 108
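
Both versions of this last query can be compared in a sketch, assuming SQLite via Python's sqlite3 module and three hypothetical cars, of which 33333 has not sold.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Car (vin TEXT PRIMARY KEY);
    CREATE TABLE Carsale (vin TEXT PRIMARY KEY, salesprice INTEGER);
    INSERT INTO Car VALUES ('11111'), ('33333'), ('55555');
    INSERT INTO Carsale VALUES ('11111', 4500), ('55555', 18000);
""")

# Both selected fields come from the right table, so the unsold car
# contributes a row in which every field is null.
right_only = conn.execute("""
    SELECT Carsale.vin, salesprice
    FROM Car LEFT JOIN Carsale
    ON Car.vin = Carsale.vin
    ORDER BY Car.vin
""").fetchall()

# Selecting Car.vin instead shows the vin of every car, sold or not.
with_car_vin = conn.execute("""
    SELECT Car.vin, salesprice
    FROM Car LEFT JOIN Carsale
    ON Car.vin = Carsale.vin
    ORDER BY Car.vin
""").fetchall()

print(right_only)
print(with_car_vin)
```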

109 The End 109

