Database Systems Ch Michael Symonds


1 Nested-Loop Joins
Database Systems, Ch. 15.3
Michael Symonds

2 What is a Nested-Loop Join?
A nested-loop join belongs to a family of join algorithms considered “one-and-a-half-pass” algorithms: in each variation, one of the two arguments has its tuples read only once, while the tuples of the other argument are read repeatedly. Nested-loop joins can be used for relations of any size; it is not necessary that even one relation fit entirely in main memory.

3 But first, a quick review…
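What is a join? Start with the Cartesian product R × S.
Let R = {1, 2, 3} and S = {A, B, C}.
SELECT * FROM R, S
R × S pairs every tuple of R with every tuple of S: (1, A), (1, B), (1, C), (2, A), (2, B), (2, C), (3, A), (3, B), (3, C).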
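A minimal Python sketch of the Cartesian product above, using the slide's R = {1, 2, 3} and S = {A, B, C}. The standard-library itertools.product pairs each element of the first list with each element of the second.

```python
from itertools import product

R = [1, 2, 3]
S = ["A", "B", "C"]

# R x S: every tuple of R paired with every tuple of S, 3 * 3 = 9 pairs.
cartesian = list(product(R, S))
for pair in cartesian:
    print(pair)
```

The pairs come out with R varying slowest, in the same order as the list above.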

4 But first, a quick review…
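What is a natural join? R ⋈ S
SELECT * FROM R, S WHERE R.ID = S.ID
Let R(ID, Col 1) = {(2, A2), (5, A5), (3, A3), (1, -), (4, A4)}
Let S(ID, Col 2) = {(5, B5), (1, B1), (3, -), (6, B6), (2, B2), (5, C5)}
Natural join R ⋈ S (ID, Col 1, Col 2) = {(2, A2, B2), (5, A5, B5), (5, A5, C5), (3, A3, -), (1, -, B1)}
The natural join keeps a single ID column, and the R tuple with ID 4 has no partner in S, so it does not appear.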
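A small Python sketch of the natural join above. Two assumptions: the slide's S is read as holding two tuples with ID 5 (B5 and C5), which is what the slide's result implies, and None stands in for the “-” values.

```python
# The slide's relations; None stands in for the "-" values.
R = [(2, "A2"), (5, "A5"), (3, "A3"), (1, None), (4, "A4")]             # (ID, Col 1)
S = [(5, "B5"), (1, "B1"), (3, None), (6, "B6"), (2, "B2"), (5, "C5")]  # (ID, Col 2)

# Natural join on the shared attribute ID: keep pairs whose IDs are equal
# and emit the ID column only once.
natural = [
    (r_id, col1, col2)
    for (r_id, col1) in R
    for (s_id, col2) in S
    if r_id == s_id
]
for row in natural:
    print(row)
```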

5 But first, a quick review…
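What is a theta join? R ⋈_C S = σ_C(R × S)
SELECT * FROM R, S WHERE R.ID ≤ S.ID (here the condition C is R.ID ≤ S.ID)
Let R(ID, Col 1) = {(2, A2), (5, A5), (3, A3), (1, -), (4, A4)}
Let S(ID, Col 2) = {(5, B5), (1, B1), (3, -), (6, B6), (2, B2), (5, C5)}
The result σ_C(R × S) keeps both ID columns (ID, Col 1, ID, Col 2) and contains every pair of tuples with R.ID ≤ S.ID. For example, (5, A5) from R joins with (5, B5), (6, B6), and (5, C5) from S.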
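The definition R ⋈_C S = σ_C(R × S) translates almost literally into Python: build the Cartesian product and keep only the pairs that satisfy C. This is a sketch using the slide's data; theta_join is a hypothetical helper name, and None again stands in for “-”.

```python
from itertools import product

R = [(2, "A2"), (5, "A5"), (3, "A3"), (1, None), (4, "A4")]             # (ID, Col 1)
S = [(5, "B5"), (1, "B1"), (3, None), (6, "B6"), (2, "B2"), (5, "C5")]  # (ID, Col 2)

def theta_join(r_rel, s_rel, condition):
    """sigma_C(R x S): keep the pairs of the Cartesian product that satisfy C.
    Both ID columns are kept, so each output tuple is (ID, Col 1, ID, Col 2)."""
    return [r + s for r, s in product(r_rel, s_rel) if condition(r, s)]

# C is R.ID <= S.ID
result = theta_join(R, S, lambda r, s: r[0] <= s[0])
print(len(result))
for row in result:
    print(row)
```

Swapping the lambda for r[0] == s[0] gives the equijoin from the previous slide (with both ID columns kept).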

6 Tuple-Based Nested-Loop Join: a basic algorithm for the join R(X,Y) ⋈ S(Y,Z)
FOR each tuple s in S DO
    FOR each tuple r in R DO
        IF r and s join to make a tuple t THEN
            Output t;
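A direct Python translation of the pseudocode, as a sketch. It assumes the relations are in-memory lists of tuples, whereas the slide's version reads them from disk. The function name and the pairs_examined counter are our own additions. The counter shows that the inner test runs T(R)·T(S) times, once for every tuple of R against every tuple of S.

```python
def tuple_nested_loop_join(R, S):
    """Tuple-based nested-loop join of R(X, Y) with S(Y, Z).
    R holds (x, y) tuples and S holds (y, z) tuples; the common attribute is Y.
    Returns the joined (x, y, z) tuples and the number of (r, s) pairs examined."""
    output = []
    pairs_examined = 0
    for s in S:               # outer loop: each tuple of S once
        for r in R:           # inner loop: all of R again for every s
            pairs_examined += 1
            if r[1] == s[0]:  # r and s agree on Y, so they join into t
                output.append((r[0], r[1], s[1]))
    return output, pairs_examined


R = [(1, "a"), (2, "b"), (3, "a")]
S = [("a", 10), ("c", 20)]
joined, examined = tuple_nested_loop_join(R, S)
print(joined)
print(examined)
```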


11 Many ways we can improve upon this:
With a naive implementation (i.e., reading one tuple at a time from disk for each relation), this algorithm could require as many as T(R)T(S) disk I/O’s. There are many ways to improve on this:
- Use an index on the join attribute(s) of R to find the tuples of R that match a given tuple of S (discussed later in 15.6).
- Look much more carefully at how the tuples of R and S are divided among blocks, and use as much of the available memory as possible to reduce the number of disk I/O’s in the inner loop (block-based nested-loop join, covered later here).

12 An Iterator for Tuple-Based Nested-Loop Join
Nested-loop joins fit well into an iterator framework. In some situations this lets us avoid storing intermediate relations on disk: query operations are interleaved by feeding the tuples produced by one operation directly to the next operation that uses them (see Pipelining: ch for more info).
The three iterator methods for the nested-loop join assume that neither relation R nor S is empty.

13 The three iterator methods for nested-loop joins
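The three methods themselves did not survive in this transcript. As a hedged sketch, assuming the usual Open/GetNext/Close iterator interface, here is a Python version of a tuple-based nested-loop join iterator. The class names TableScan and NestedLoopJoinIterator, the method names open/get_next/close, and the use of None in place of a NotFound signal are all our own illustrative choices, not the slide's code. S is the outer relation and R is reopened for each tuple of S, matching the basic algorithm.

```python
from typing import Callable, List, Optional, Tuple


class TableScan:
    """Hypothetical scan iterator over an in-memory list of tuples.
    get_next() returns None when exhausted (playing the role of NotFound)."""

    def __init__(self, rows: List[Tuple]):
        self.rows = rows
        self.pos = 0

    def open(self) -> None:
        self.pos = 0

    def get_next(self) -> Optional[Tuple]:
        if self.pos >= len(self.rows):
            return None
        row = self.rows[self.pos]
        self.pos += 1
        return row

    def close(self) -> None:
        pass


class NestedLoopJoinIterator:
    """Tuple-based nested-loop join as an Open/GetNext/Close iterator.
    S is the outer relation; R is the inner one and is reopened for every s."""

    def __init__(self, r_scan: TableScan, s_scan: TableScan,
                 joins: Callable[[Tuple, Tuple], bool],
                 combine: Callable[[Tuple, Tuple], Tuple]):
        self.r_scan, self.s_scan = r_scan, s_scan
        self.joins, self.combine = joins, combine
        self.s: Optional[Tuple] = None

    def open(self) -> None:
        self.r_scan.open()
        self.s_scan.open()
        self.s = self.s_scan.get_next()  # first tuple of the outer relation

    def get_next(self) -> Optional[Tuple]:
        while self.s is not None:
            r = self.r_scan.get_next()
            if r is None:
                # R is exhausted for this s: advance S and restart R.
                self.r_scan.close()
                self.s = self.s_scan.get_next()
                if self.s is None:
                    return None          # S is exhausted too: no more tuples
                self.r_scan.open()
                continue
            if self.joins(r, self.s):
                return self.combine(r, self.s)
        return None

    def close(self) -> None:
        self.r_scan.close()
        self.s_scan.close()


# Usage: R(X, Y) joined with S(Y, Z) on Y, pulling one tuple at a time.
it = NestedLoopJoinIterator(
    TableScan([(1, "a"), (2, "b")]),
    TableScan([("b", 10), ("a", 20)]),
    joins=lambda r, s: r[1] == s[0],
    combine=lambda r, s: (r[0], r[1], s[1]),
)
it.open()
results = []
while (t := it.get_next()) is not None:
    results.append(t)
it.close()
print(results)
```

Because get_next hands back one joined tuple per call, a consumer such as a projection or selection can process each tuple immediately instead of waiting for the whole join to be materialized.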

14 Block-Based Nested-Loop Join Algorithm:
- Organizing access to both argument relations by blocks: this ensures that when we run through the tuples of R in the inner loop, we use as few disk I/O’s as possible to read R.
- Using as much main memory as we can to store tuples of S, the relation of the outer loop: this lets us join each tuple of R that we read with not just one tuple of S, but with as many tuples of S as fit in memory.

15 How does it work? Assume:
M is the number of main-memory buffers (blocks) available, and B(S) and B(R) are the numbers of blocks holding the tuples of S and R, respectively.
B(S) ≤ B(R) and B(S) > M, i.e., B(R) ≥ B(S) > M, so neither relation fits in main memory.
Then:
- Repeatedly read M − 1 blocks of S into main-memory buffers.
- Create a search structure, with search key equal to the common attributes of R and S, for the tuples of S currently in main memory.
- Go through all blocks of R, reading each one in turn into the last block of memory.
- Once a block of R is there, compare all its tuples with all the tuples in all the blocks of S currently in main memory.
- For those that join, output the joined tuple.

16 Nested-Loop Join (or Nested-Block Join)
FOR each chunk of M-1 blocks of S DO BEGIN
    Read these blocks into main-memory buffers;
    Organize their tuples into a search structure whose search key is the common attributes of R and S;
    FOR each block b of R DO BEGIN
        Read b into main memory;
        FOR each tuple t of b DO BEGIN
            Find the tuples of S in main memory that join with t;
            Output the join of t with each of these tuples;
        END;
    END;
END;
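A Python sketch of the block-based algorithm above, under stated assumptions: each relation is modeled as a list of blocks, each block a list of tuples, and instead of doing real disk I/O the function counts one simulated read per block loaded. A dict keyed on the join attribute Y plays the role of the in-memory search structure. The function name and this block model are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Block = List[Tuple]  # a block is modeled as a list of tuples


def block_nested_loop_join(R_blocks: List[Block], S_blocks: List[Block], M: int):
    """Block-based nested-loop join of R(X, Y) with S(Y, Z) on Y.
    S is the outer relation, read in chunks of M - 1 blocks; the last buffer
    holds one block of R at a time. Returns (joined tuples, simulated block reads)."""
    chunk_size = M - 1
    output: List[Tuple] = []
    reads = 0
    for start in range(0, len(S_blocks), chunk_size):
        chunk = S_blocks[start:start + chunk_size]
        reads += len(chunk)                # read this chunk of S
        # Search structure on the common attribute Y for the S tuples in memory.
        by_y: Dict[object, List[object]] = defaultdict(list)
        for block in chunk:
            for (y, z) in block:
                by_y[y].append(z)
        for r_block in R_blocks:           # scan all of R for every chunk
            reads += 1                     # read one block of R
            for (x, y) in r_block:
                for z in by_y.get(y, []):  # S tuples in memory that join with t
                    output.append((x, y, z))
    return output, reads


# The slides' example: B(R) = 1000, B(S) = 500, M = 101 (one tuple per block here).
R_blocks = [[(i, i)] for i in range(1000)]
S_blocks = [[(j, -j)] for j in range(500)]
joined, reads = block_nested_loop_join(R_blocks, S_blocks, M=101)
print(len(joined), reads)

# Reversing the roles (the larger relation in the outer loop) costs more reads.
_, reads_reversed = block_nested_loop_join(S_blocks, R_blocks, M=101)
print(reads_reversed)
```

The simulated counts reproduce the example that follows: five chunks of S at 100 + 1,000 reads each, versus ten chunks at 100 + 500 reads each when the roles are swapped.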

17 As an example… Let B(R) = 1000, B(S) = 500, and M = 101.
[Figure: R (1000 blocks) and S (500 blocks) in secondary storage; M = 101 blocks of main-memory buffers.]

18 As an example… Use M − 1 = 100 blocks of memory to buffer S in 100-block chunks, so the outer loop over S iterates 500 / 100 = 5 times.
[Figure: S split into five 100-block chunks in secondary storage; main memory holds 100 blocks of S plus 1 block of R.]

19 Corresponding Algorithm section:
As an example… At each iteration we do 100 disk I/O’s to read the current chunk of S into main memory.
FOR each chunk of M-1 blocks of S DO BEGIN
    Read these blocks into main-memory buffers;

20 Corresponding Algorithm section:
As an example… We then read all of R inside the second loop, one block at a time, using 1000 disk I/O’s for each chunk of S.
    Organize their tuples into a search structure whose search key is the common attributes of R and S;
    FOR each block b of R DO BEGIN
        Read b into main memory;
        FOR each tuple t of b DO BEGIN

21 Corresponding Algorithm section:
As an example… We then read in R entirely inside the second loop, using 1000 disk I/O’s; for each tuple t of the block of R in memory, we find the tuples of the current chunk of S that join with t.
            Find the tuples of S in main memory that join with t;
            Output the join of t with each of these tuples;

22 The total number of disk I/O’s for the operation is:
(100 + 1,000) × 5 = 5,500
Note: If we reverse the roles of R and S, the algorithm takes slightly more disk I/O’s. We would iterate 10 times in the outer loop, each time reading 100 blocks of R and then all 500 blocks of S, making the total (100 + 500) × 10 = 6,000.
In general, there is a slight advantage to using the smaller relation in the outer loop.

23 Analysis of Nested-Loop Join
Assuming S is the smaller relation, the number of chunks (iterations of the outer loop) is B(S)/(M − 1).
At each iteration, we read M − 1 blocks of S and B(R) blocks of R. The number of disk I/O’s is thus
B(S)(M − 1 + B(R))/(M − 1), or equivalently B(S) + B(S)B(R)/(M − 1).
Assuming M, B(S), and B(R) are all large, with M the smallest of the three, this is approximately B(S)B(R)/M.
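The cost formula can be checked against the worked example with a few lines of Python. This is a sketch: the function names are hypothetical, and rounding the number of chunks up with math.ceil is our assumption for the case where B(S) is not a multiple of M − 1 (the slide's formula assumes it divides evenly).

```python
import math


def nested_loop_join_cost(b_outer: int, b_inner: int, M: int) -> int:
    """Disk I/O's for the block-based nested-loop join: the outer relation is
    read once, in ceil(B(outer) / (M - 1)) chunks, and the inner relation is
    read in full once per chunk."""
    chunks = math.ceil(b_outer / (M - 1))
    return b_outer + chunks * b_inner


def approximate_cost(b_s: int, b_r: int, M: int) -> float:
    """B(S)B(R)/M, for M, B(S), and B(R) all large with M the smallest."""
    return b_s * b_r / M


print(nested_loop_join_cost(500, 1000, 101))    # S in the outer loop
print(nested_loop_join_cost(1000, 500, 101))    # R in the outer loop
print(approximate_cost(500, 1000, 101))
print(nested_loop_join_cost(100, 1000, 101))    # B(S) <= M - 1: a single chunk
```

With B(S) ≤ M − 1 there is only one chunk, so the cost collapses to B(S) + B(R), the cost of a one-pass join.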

24 The cost is proportional to the product of the sizes of the two relations, divided by the amount of available main memory.
Note: we can do much better than a nested-loop join when both relations are large. But for reasonably small examples like the one above, the cost of the nested-loop join is not much greater than the cost of a one-pass join, which would have been B(R) + B(S) = 1,500 disk I/O’s in that example. In fact, if B(S) ≤ M − 1, the nested-loop join becomes identical to the one-pass algorithm.
Although the nested-loop join is not the most efficient join algorithm possible, in some early relational DBMSs it was the only method available. Even today, it is needed as a subroutine in more efficient join algorithms in certain situations, such as when large numbers of tuples share a common value for their join attribute(s).

25 Main Memory and Disk I/O Requirements
Note: The memory requirements for the γ (grouping) and δ (duplicate-elimination) operators are actually more complex than shown; M = B is only a loose approximation. For γ, M depends on the number of groups; for δ, M depends on the number of distinct tuples.

