Data Warehousing 1 Lecture-28 Need for Speed: Join Techniques Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics.


1 Data Warehousing 1 Lecture-28 Need for Speed: Join Techniques Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics Research www.nu.edu.pk/cairindex.asp National University of Computers & Emerging Sciences, Islamabad Email: ahsan101@yahoo.com

2 Need for Speed: Join Techniques

3 Background

4 About Nested-Loop Join

5 Nested-Loop Join: Code

FOR i = 1 TO N DO BEGIN   /* N rows in T1 */
  IF the i-th row of T1 qualifies THEN BEGIN
    FOR j = 1 TO M DO BEGIN   /* M rows in T2 */
      IF the i-th row of T1 matches the j-th row of T2 on the join key THEN BEGIN
        IF the j-th row of T2 qualifies THEN BEGIN
          produce output row
        END
      END
    END
  END
END
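The loop on this slide can be sketched in Python as follows. This is only an illustration, not the lecture's own code: the function name `nested_loop_join`, the predicate parameters, and the two tiny sample tables (column names and values alike) are all assumptions made up for the example.

```python
def nested_loop_join(t1, t2, key, qualifies1=lambda r: True, qualifies2=lambda r: True):
    """Naive nested-loop join of two lists of dict rows on `key`."""
    output = []
    for row1 in t1:                      # outer loop: N rows in T1
        if not qualifies1(row1):         # skip outer rows that fail the filter
            continue
        for row2 in t2:                  # inner loop: M rows in T2
            if row1[key] == row2[key] and qualifies2(row2):
                output.append({**row1, **row2})   # produce output row
    return output


# Hypothetical sample data, loosely modelled on the student example.
personal = [
    {"student_id": 1, "gender": "M"},
    {"student_id": 2, "gender": "F"},
    {"student_id": 3, "gender": "M"},
]
academic = [
    {"student_id": 1, "level": "UG", "gpa": 3.0},
    {"student_id": 2, "level": "UG", "gpa": 3.5},
    {"student_id": 3, "level": "PG", "gpa": 3.8},
]

rows = nested_loop_join(
    personal, academic, "student_id",
    qualifies1=lambda r: r["gender"] == "M",
    qualifies2=lambda r: r["level"] == "UG",
)
```

With this made-up data only student 1 is both male and undergraduate, so `rows` holds a single joined row. Note that the inner table is scanned once per qualifying outer row, which is where the cost of the technique comes from.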

6 Nested-Loop Join: Working Example
Query: “What is the average GPA of undergraduate male students?”
For each qualifying row of the Student Personal table, the Student Academic table is examined for matching rows.
[Figure: Student Personal Table and Student Academic Table; the entries 298, 62 and 440 are each followed by a search that returns results.]

7 Nested-Loop Join: Order of Tables

8 Nested-Loop Join: Cost Formula
Join cost = Cost of accessing Table_A + (# of qualifying rows in Table_A × Blocks of Table_B to be scanned for each qualifying row)
OR
Join cost = Blocks accessed for Table_A + (Blocks accessed for Table_A × Blocks accessed for Table_B)

9 Nested-Loop Join: Cost of reorder
Table_A = 500 blocks and Table_B = 700 blocks.
Qualifying blocks for Table_A: QB(A) = 50
Qualifying blocks for Table_B: QB(B) = 100
Join cost A&B = 500 + 50 × 700 = 35,500 I/Os
Join cost B&A = 700 + 100 × 500 = 50,700 I/Os
i.e. an increase in I/O of about 43%.
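The arithmetic on this slide can be checked with a few lines of Python. The helper name `nested_loop_cost` and its parameter names are assumptions introduced for this sketch; the block counts are the ones given on the slide.

```python
def nested_loop_cost(outer_blocks, outer_qualifying_blocks, inner_blocks):
    """I/O cost: read the outer table once, then scan the inner table
    once for every qualifying block of the outer table."""
    return outer_blocks + outer_qualifying_blocks * inner_blocks


# Table_A = 500 blocks with QB(A) = 50; Table_B = 700 blocks with QB(B) = 100.
cost_ab = nested_loop_cost(500, 50, 700)    # Table_A as the outer table
cost_ba = nested_loop_cost(700, 100, 500)   # Table_B as the outer table

# Relative increase in I/O when the less selective table is put on the outside.
increase_pct = 100 * (cost_ba - cost_ab) / cost_ab

print(cost_ab, cost_ba, round(increase_pct))
```

The two costs come out as 35,500 and 50,700 I/Os, an increase of about 43%, matching the figures on the slide.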

10 Nested-Loop Join: Variants

11 Sort-Merge Join

12 Sort-Merge Join: Process

13 Sort-Merge Join Example
Table_A (sorted join keys): 1 1 2 2 2 4 5 5 5 6 6 6 6 6 7 8
Table_B (sorted join keys): 1 3 3 4 4 4 5 5 6 6 6 6 7 7 7 7
[Figure: the same two sorted key lists repeated across the slide's panels.]
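A minimal Python sketch of a sort-merge join over the two key lists from this slide follows. It assumes the keys on the slide are single digits, and the function name `sort_merge_join` is made up for the example; it illustrates the general technique rather than the exact procedure described in the lecture.

```python
def sort_merge_join(a_keys, b_keys):
    """Sort both inputs, then merge them with two cursors.
    Returns one (a_key, b_key) pair per matching combination."""
    a, b = sorted(a_keys), sorted(b_keys)
    i = j = 0
    matches = []
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1                       # key only in A so far: advance A
        elif a[i] > b[j]:
            j += 1                       # key only in B so far: advance B
        else:
            key = a[i]
            # Find the run of equal keys on each side.
            i_end = i
            while i_end < len(a) and a[i_end] == key:
                i_end += 1
            j_end = j
            while j_end < len(b) and b[j_end] == key:
                j_end += 1
            # Every row in A's run joins with every row in B's run.
            for _ in range((i_end - i) * (j_end - j)):
                matches.append((key, key))
            i, j = i_end, j_end
    return matches


table_a = [1, 1, 2, 2, 2, 4, 5, 5, 5, 6, 6, 6, 6, 6, 7, 8]
table_b = [1, 3, 3, 4, 4, 4, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7]
matches = sort_merge_join(table_a, table_b)
```

Keys 2, 3 and 8 appear in only one list and produce nothing; keys 1, 4, 5, 6 and 7 produce 2, 3, 6, 20 and 4 pairs respectively. Duplicate keys are why the merge step has to handle whole runs rather than single rows.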

14 Sort-Merge Join: Note

15 Hash-Based Join

16 Hash-Based Join: Working

17 Hash-Based Join: Example
[Figure: Table_B on disk; original relation Table_A passed through hash function h; Table_A held in main memory; join result.]
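The arrangement in this figure, with one table hashed into main memory and the other streamed past it, can be sketched in Python as below. The function name `hash_join`, the column names, and the sample rows are all hypothetical, and Python's built-in dictionary stands in for the hash function h.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, key):
    """In-memory hash join: build a hash table on the smaller input,
    then probe it once with each row of the larger input."""
    buckets = defaultdict(list)
    for row in build_rows:               # build phase (Table_A in main memory)
        buckets[row[key]].append(row)

    output = []
    for row in probe_rows:               # probe phase (Table_B read from disk)
        for match in buckets.get(row[key], []):
            output.append({**match, **row})
    return output


# Hypothetical sample data.
table_a = [{"k": 1, "a": "x"}, {"k": 2, "a": "y"}]
table_b = [{"k": 2, "b": "p"}, {"k": 2, "b": "q"}, {"k": 3, "b": "r"}]

joined = hash_join(table_a, table_b, "k")
```

Here only key 2 occurs in both tables, so `joined` contains two rows. Each table is read once, which is the attraction of the technique when the build table fits in memory.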

18 Hash-Based Join: Large “small” Table

19 Hash-Based Join: Partition Skew

20 Hash-Based Join: Intrinsic Skew

