Join Implementation How is it done? Copyright © 2003-2019 Curt Hill.

Introduction
We should have seen the join from relational algebra.
We now consider how the join works when using either a BTree or a hash index.

Ways to Think of a Join
We have usually considered the join as a sequence of algebra operations:
  Cartesian product
  Selection
  Optional project
This is not always the best way, especially from an implementation perspective.
The best alternative is the zipper view.
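For contrast with the zipper view, the algebra view can be sketched in a few lines of Python. This is only an illustration: the sample rows, the `name` and `course` columns, and the `algebra_join` helper are assumptions, and only the `naid` join field comes from the slides.

```python
from itertools import product

# Hypothetical sample rows; only the naid join field comes from the slides.
faculty = [{"naid": 1024, "name": "a"}, {"naid": 1092, "name": "b"}]
schedule = [
    {"naid": 1024, "course": "r"},
    {"naid": 1092, "course": "v"},
    {"naid": 1024, "course": "s"},
]

def algebra_join(left, right, field):
    """Join as Cartesian product, then selection, then a simple projection."""
    pairs = product(left, right)                                   # Cartesian product
    matched = ((l, r) for l, r in pairs if l[field] == r[field])   # selection
    # Merging the two dicts keeps one copy of the join column (the optional project).
    return [{**l, **r} for l, r in matched]

result = algebra_join(faculty, schedule, "naid")
```

Every left row is compared with every right row, so the work grows with the product of the two sizes. That cost is what the zipper view tries to avoid.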

The Zipper Approach
Consider two files:
  Faculty
    The key is naid
  Schedule
    The key is dept, number, section
    Another candidate key is naid, time
The join field is naid.
If both files are sorted on the join field, the join resembles a match merge.

Sorted
The default index for a table is usually a BTree.
With a BTree the leaves are in the primary key's sorted order.
If the join field is not the primary key, then either the table may be sorted or a secondary key may be used.
Now consider the match merge.

Match Merge
The match merge is the means of updating a sorted master file with sorted transactions.
  Of course, both must be sorted on the same kind of key.
This has been well understood since the 1950s or before.
The action to perform is based on the relationship of the master key to the transaction key.

Actions
Read in one item from both, then do the following until done:
Transaction = Master
  Update the master
  Get new transaction
Master < Transaction
  Write old master
  Read a new master
Transaction < Master
  Declare an error
  Read new transaction
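The three cases above can be sketched in Python as follows. This is a minimal in-memory sketch, not a file-based implementation: the `match_merge` name, the `(key, value)` record layout, the `update` callback, and the sample data are all assumptions made for illustration. Leftover masters are written out and leftover transactions are reported as errors once one input runs dry.

```python
def match_merge(masters, transactions, update):
    """One pass over two lists of (key, value) records, both sorted on key."""
    new_master, errors = [], []
    m_iter, t_iter = iter(masters), iter(transactions)
    master = next(m_iter, None)   # read one item from each input
    trans = next(t_iter, None)
    while master is not None and trans is not None:
        if trans[0] == master[0]:
            # Transaction = Master: update the master, get a new transaction
            master = (master[0], update(master[1], trans[1]))
            trans = next(t_iter, None)
        elif master[0] < trans[0]:
            # Master < Transaction: write the old master, read a new master
            new_master.append(master)
            master = next(m_iter, None)
        else:
            # Transaction < Master: declare an error, read a new transaction
            errors.append(trans)
            trans = next(t_iter, None)
    while master is not None:     # remaining masters have no more transactions
        new_master.append(master)
        master = next(m_iter, None)
    while trans is not None:      # remaining transactions match no master
        errors.append(trans)
        trans = next(t_iter, None)
    return new_master, errors

# Hypothetical data: balances keyed by account number, transactions are deltas.
masters = [(1, 100), (2, 50), (4, 10)]
transactions = [(1, 5), (1, -20), (3, 7), (4, 1)]
updated, rejected = match_merge(masters, transactions, lambda old, delta: old + delta)
```

Each record of either input is read exactly once, which is the property the join borrows.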

Revisited
The idea of the match merge is to make one pass through a master file and a transaction file to do an update.
  Contrast this with the Cartesian product.
This only works if both are sorted, and by the same key.
The same thing will work in a database if both tables have an index on the joined field.
  We will consider the SQL for creating indices later.

Zipper Join Picture
Faculty     Schedule
1024 a      1024 r
            1024 s
            1024 t
1092 b      1092 v
            1092 w
1233 c
1279 d      1279 x
            1279 y
            1279 z
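A zipper (merge) join over this data might look like the sketch below. The assignment of the letters a to d to Faculty and r to z to Schedule is a reading of the picture, and the `zipper_join` name and `(naid, value)` tuples are assumptions. The sketch also assumes naid is unique in Faculty, as the slides state, so only the Schedule side can repeat a key.

```python
def zipper_join(faculty, schedule):
    """Inner join of two lists of (naid, value), both sorted on naid.

    Assumes naid is unique in faculty; schedule may repeat it.
    """
    result = []
    f, s = 0, 0
    while f < len(faculty) and s < len(schedule):
        fkey, skey = faculty[f][0], schedule[s][0]
        if fkey == skey:
            result.append((fkey, faculty[f][1], schedule[s][1]))
            s += 1          # stay on this faculty row; more schedule rows may match
        elif fkey < skey:
            f += 1          # faculty row has no (more) matches, advance it
        else:
            s += 1          # schedule row matches no faculty row, skip it
    return result

faculty = [(1024, "a"), (1092, "b"), (1233, "c"), (1279, "d")]
schedule = [
    (1024, "r"), (1024, "s"), (1024, "t"),
    (1092, "v"), (1092, "w"),
    (1279, "x"), (1279, "y"), (1279, "z"),
]
joined = zipper_join(faculty, schedule)
```

Both lists are walked once from top to bottom, like the two sides of a zipper. Faculty row 1233 c never appears in the result because nothing in Schedule matches it.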

Inners and Outers
The last picture suggests two types of joins: inner and outer.
What we have considered so far is the inner join.
  Only things that match on key are worth considering.
However, those things in either relation that match nothing in the other are also interesting.
  These are the outer joins.
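One way to extend the zipper idea to a full outer join is sketched below. The function name, the use of `None` as padding for the missing side, and the sample rows (including keys 1100 and 1300) are hypothetical. As before, the sketch assumes the left keys are unique and both inputs are sorted on the key.

```python
def zipper_full_outer_join(left, right):
    """Full outer join of two lists of (key, value), both sorted on key.

    Assumes keys are unique in left. Unmatched rows are padded with None.
    """
    result = []
    i, j = 0, 0
    matched = False   # has the current left row matched any right row yet?
    while i < len(left) and j < len(right):
        lkey, rkey = left[i][0], right[j][0]
        if lkey == rkey:
            result.append((lkey, left[i][1], right[j][1]))
            matched = True
            j += 1
        elif lkey < rkey:
            if not matched:
                result.append((lkey, left[i][1], None))   # left row with no partner
            i += 1
            matched = False
        else:
            result.append((rkey, None, right[j][1]))      # right row with no partner
            j += 1
    while i < len(left):                                  # right side ran out first
        if not matched:
            result.append((left[i][0], left[i][1], None))
        i += 1
        matched = False
    while j < len(right):                                 # left side ran out first
        result.append((right[j][0], None, right[j][1]))
        j += 1
    return result

# Hypothetical rows chosen so that each side has something unmatched.
left = [(1024, "a"), (1233, "c")]
right = [(1024, "r"), (1100, "q"), (1300, "p")]
outer = zipper_full_outer_join(left, right)
```

Dropping the rows that contain `None` gives back the inner join, so the two kinds of join differ only in what is done with the non-matching rows.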

Continuing
If both files are sorted on the join field, the previously mentioned zipper join is the best one to use.
However, if the join field is not the primary key, sorting the relation on this field may be expensive.
Especially so if:
  The outer join is larger than an inner join
  The number of joined records is small compared to either relation size

Hash Join
Recall that a Cartesian product makes all possible combinations of records from two relations.
  This could mean reading all of the blocks multiple times.
  That is exactly what we want to avoid.
A hash join partitions the two relations into pieces based on a hash function.
  It then joins only the partitions that hash to the same value.
Of course, this only works on equi-joins.

Process
Hash the smaller of the two files on the join field.
Read in the other file.
  Hash each key into a bucket.
  The only candidates for equality are in that bucket.
Produce the output.
  Smaller but still substantial
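The process can be sketched as an in-memory build-and-probe hash join. This is a simplification under stated assumptions: everything fits in memory, the `hash_join` name, the `n_buckets` parameter, and the `(key, value)` tuples are made up for the example, and Python's built-in `hash` stands in for whatever hash function a real system would use.

```python
def hash_join(small, large, n_buckets=8):
    """Equi-join of two lists of (key, value); neither needs to be sorted."""
    # Build phase: hash the smaller input on the join field.
    buckets = [[] for _ in range(n_buckets)]
    for key, value in small:
        buckets[hash(key) % n_buckets].append((key, value))
    # Probe phase: hash each key of the other input into a bucket.
    result = []
    for key, value in large:
        for skey, svalue in buckets[hash(key) % n_buckets]:
            # Only rows in this bucket are candidates; equality must still be checked
            # because different keys can land in the same bucket.
            if skey == key:
                result.append((key, svalue, value))
    return result

# Deliberately unsorted inputs, reusing the naid values from the earlier picture.
faculty = [(1279, "d"), (1024, "a"), (1233, "c"), (1092, "b")]
schedule = [(1092, "v"), (1024, "r"), (1279, "x"), (1024, "s")]
joined = hash_join(faculty, schedule)
```

No sort is required, which is why this approach is attractive when the join field is not the primary key. The equality test inside the bucket is also why the technique is limited to equi-joins.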
