Evaluating Window Joins over Unbounded Streams Author: Jaewoo Kang, Jeffrey F. Naughton, Stratis D. Viglas University of Wisconsin-Madison CS Dept. Presenter:

Presentation transcript:

Evaluating Window Joins over Unbounded Streams Author: Jaewoo Kang, Jeffrey F. Naughton, Stratis D. Viglas University of Wisconsin-Madison CS Dept. Presenter: Yang Ying-Chia (R ) CSIE, National Taiwan University

Outline
- Abstract
- Background
- Introduction
- Related Work
- Estimating the Cost of Moving Window Joins
- On Maximizing the Efficiency of Processing Joins
- Conclusion

Abstract – Problem and Solution
Problem: Process joins over unbounded streams.
Solution: Moving window join. Queries have window predicates.

Abstract – Central Point of the Thesis
The paper proposes a unit-time-basis cost model for evaluating moving window joins. Using this cost model, it proposes strategies for maximizing the efficiency of processing joins in different scenarios.


Background
- Join
- Nested Loops Join (NLJ)
- Hash Join (HJ)
- Moving Window Join

Background – Join

Background – Nested Loops Join (NLJ)

Background – Hash Join (HJ)

Background – Moving Window Join

Background – Moving Window Join
Instead of joining all tuples of A and B, we join all tuples that have arrived on A in the last t1 seconds with all tuples that have arrived on B in the last t2 seconds.
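The window semantics above can be sketched in Python. This is a minimal illustration, not the paper's implementation: the class name `WindowJoin`, the `on_arrival` method, and the rule that a tuple expires once it is at least t seconds old are all assumptions made for the example. Each arriving tuple probes the opposite window with a nested-loops scan and is then stored in its own window.

```python
from collections import deque


class WindowJoin:
    """Illustrative symmetric nested-loops join over two time-based windows."""

    def __init__(self, t1, t2, predicate):
        self.span = {"A": t1, "B": t2}              # window lengths in seconds
        self.window = {"A": deque(), "B": deque()}  # (timestamp, tuple), oldest first
        self.predicate = predicate                  # predicate(a_tuple, b_tuple) -> bool

    def _expire(self, stream, now):
        # Drop tuples that arrived span[stream] seconds ago or earlier.
        win = self.window[stream]
        while win and win[0][0] <= now - self.span[stream]:
            win.popleft()

    def on_arrival(self, stream, now, tup):
        """Join the new tuple against the other window, then store it."""
        other = "B" if stream == "A" else "A"
        self._expire(other, now)
        if stream == "A":
            results = [(tup, o) for _, o in self.window[other]
                       if self.predicate(tup, o)]
        else:
            results = [(o, tup) for _, o in self.window[other]
                       if self.predicate(o, tup)]
        self._expire(stream, now)
        self.window[stream].append((now, tup))
        return results


# Usage: equi-join on the first field, with t1 = 10 s for A and t2 = 5 s for B.
join = WindowJoin(10, 5, lambda a, b: a[0] == b[0])
join.on_arrival("A", 0, (1, "a1"))
matches = join.on_arrival("B", 4, (1, "b1"))  # a1 is still inside A's window
```

Timestamps are passed in explicitly so the sketch stays deterministic; a real operator would read them from the stream.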


Introduction – Questions
1. How can we measure the efficiency of a moving window join evaluation strategy, since the traditional metric of execution time to completion does not apply?
2. Can an algorithm for a moving window join take advantage of asymmetries in the rates of the input streams?
3. How can we deal with cases in which an input stream is so fast that the system cannot keep up?
4. If memory is the bottleneck, how should we allocate memory between the two windows for the two inputs?

Introduction – The Three Scenarios
- One stream is much faster than the other.
- System resources are insufficient to keep up with the input streams.
- Memory is limited.


Related Work
- Predicate grouping and group optimization techniques
- Adaptive query processing and query scrambling
- Symmetric hash join and symmetric nested loops join
- Diag-Join for data warehouse environments
- Rate-based streaming query optimization framework


Estimating the Cost of Moving Window Joins
- Cost model
- Cost of a single join operation

Cost of Nested Loops Join A to B
Terms of the cost formula:
- Number of tuples accessed in a time unit
- Cost of accessing a single tuple
- Number of tuples accessed to search for matches in window B
- Number of tuple insertions and invalidations
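One way to read these terms is as a per-unit-time count of tuple accesses multiplied by a per-access cost. The sketch below is an assumption-laden illustration, not the paper's exact formula. It assumes a steady state in which window B holds `rate_b * window_b_secs` tuples, every arriving A tuple scans all of window B, and every B tuple is inserted once and invalidated once. The function and parameter names are hypothetical.

```python
def nlj_cost_a_to_b(rate_a, rate_b, window_b_secs, access_cost=1.0):
    """Estimated cost per unit time of the one-way nested-loops join A to B.

    Assumptions (not taken from the source): steady-state window size,
    full scan of window B per arriving A tuple, and one insertion plus one
    invalidation per B tuple.
    """
    window_b_size = rate_b * window_b_secs   # tuples held in window B
    probe_accesses = rate_a * window_b_size  # scans triggered by A arrivals
    maintenance_accesses = 2 * rate_b        # insertions + invalidations on B
    return (probe_accesses + maintenance_accesses) * access_cost


# Usage: 10 tuples/sec on A, 5 tuples/sec on B, a 4-second window on B.
cost = nlj_cost_a_to_b(rate_a=10, rate_b=5, window_b_secs=4)
```

Under these assumptions the probe term grows with the product of the two rates, which is why a fast probing stream makes nested loops expensive.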

Cost of Hash Join A to B
- Cost of probe(b) and invalidate(b) is a function of the hash bucket size in window B
- Cost of accessing a single tuple in a specific hash table implementation
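To make the dependence on bucket size concrete, here is a small sketch of a hashed window that counts tuple accesses. The class `HashWindow`, its bucket layout, and the access-counting rule are assumptions chosen for illustration; the source does not specify a hash table implementation.

```python
from collections import defaultdict, deque


class HashWindow:
    """Illustrative time-based window stored in hash buckets."""

    def __init__(self, span_secs, num_buckets=8):
        self.span = span_secs
        self.num_buckets = num_buckets
        self.buckets = defaultdict(deque)  # bucket index -> (ts, key, payload)
        self.accesses = 0                  # running count of tuple accesses

    def _bucket(self, key):
        return self.buckets[hash(key) % self.num_buckets]

    def insert(self, now, key, payload):
        self._bucket(key).append((now, key, payload))
        self.accesses += 1

    def invalidate(self, now):
        # Expired tuples sit at the head of each bucket (arrival order).
        for bucket in self.buckets.values():
            while bucket and bucket[0][0] <= now - self.span:
                bucket.popleft()
                self.accesses += 1

    def probe(self, key):
        # Only one bucket is scanned, so the cost tracks the bucket size.
        bucket = self._bucket(key)
        self.accesses += len(bucket)
        return [payload for _, k, payload in bucket if k == key]


# Usage: keys 1 and 9 share a bucket when there are 8 buckets.
win = HashWindow(span_secs=5, num_buckets=8)
win.insert(0, 1, "x")
win.insert(1, 9, "y")
hits = win.probe(1)  # scans both tuples in the shared bucket
```

A probe touches only the tuples in one bucket rather than the whole window, which is the source of hash join's advantage when buckets are short.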

Cost of Full Join – Symmetric Join: HHJ, NNJ

Cost of Full Join – Asymmetric Join: HNJ

Cost Curves for Full Joins
σa = 1/|A| = 1/Nkey(A)
σb = 1/|B| = 1/Nkey(B)

Observation from the Previous Graphs
When the speed difference between the input streams is minimal, HHJ outperforms every other join combination. As the speed gap increases, the cost of HHJ increases considerably and exceeds that of HNJ at around 70 tuples/sec and 140 tuples/sec. Here we have a performance crossover point.

Estimating the Weight Factors
The crossover points can be calculated by equating the two cost formulas. For two given streams, we can determine when NLJ will outperform HJ, depending on the ratio of the arrival rates of the input streams. …
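Equating two cost formulas amounts to finding the root of their difference. When the formulas are awkward to solve by hand, a numerical search works. The sketch below uses bisection; `find_crossover` is a hypothetical helper, and the two linear cost functions in the usage example are made-up placeholders, not the paper's formulas.

```python
def find_crossover(cost_x, cost_y, lo, hi, tol=1e-9):
    """Find r in [lo, hi] where cost_x(r) == cost_y(r) by bisection.

    Requires the cost difference to change sign over the interval.
    """
    def diff(r):
        return cost_x(r) - cost_y(r)

    d_lo, d_hi = diff(lo), diff(hi)
    if d_lo == 0:
        return lo
    if d_hi == 0:
        return hi
    if d_lo * d_hi > 0:
        raise ValueError("cost curves do not cross in the given interval")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        d_mid = diff(mid)
        if d_mid == 0:
            return mid
        if d_lo * d_mid < 0:
            hi = mid                 # crossover lies in the lower half
        else:
            lo, d_lo = mid, d_mid    # crossover lies in the upper half
    return (lo + hi) / 2


# Usage with placeholder cost curves (illustrative numbers only).
crossover = find_crossover(lambda r: 100 + 2 * r, lambda r: 20 + 4 * r, 0, 100)
```

With these placeholder curves the two costs are equal at r = 40; below that rate the second curve is cheaper, above it the first.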


Recall the three scenarios
- One stream is much faster than the other.
- System resources are insufficient to keep up with the input streams.
- Memory is limited.

Exploiting Asymmetry in Input Stream Speeds
Assumptions:
- The two time windows are fixed.
- The aggregate speed of the two streams is less than the system's service rate μ (i.e., λa + λb < μ).
An inequality determines the likely winner between NLJ and HJ: if the inequality holds, NLJ will outperform HJ; otherwise, HJ outperforms NLJ.

Graphs to Prove the Previous Hypothesis

Observation from the Previous Graphs
HHJ costs the least until the input rate reaches about 70 tuples/sec; then HNJ takes over. Hence, either HHJ or HNJ is the winner. Both hash join output rates decrease drastically after passing their thrashing point.

Maximizing the Number of Result Tuples with Limited Computing Resources
This scenario arises under the following conditions:
- The system evaluates very expensive predicates.
- The aggregate speed of the input streams exceeds the join operator's service rate, i.e., λa + λb > μ.
Hence, not all answer tuples can be generated, and the input streams need to be regulated. But under what policy?

Performance Comparison between Policies
The winner is the equal distribution strategy, regardless of time window sizes and window selectivity factors.
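A simplified model suggests why an equal split could win independently of window sizes and selectivity. Assume, purely for illustration, that the operator admits rates ra and rb with ra + rb = μ, that each admitted A tuple matches σ·rb·Tb tuples in window B, and that each admitted B tuple matches σ·ra·Ta tuples in window A. The output rate is then σ·ra·rb·(Ta + Tb), which peaks at ra = rb = μ/2 whatever Ta, Tb, and σ are. The functions below are hypothetical and search the split numerically; they are not the paper's experiment.

```python
def shed_output_rate(ra, rb, ta, tb, sel):
    """Result tuples per unit time under the simplified model (an assumption)."""
    return sel * ra * rb * (ta + tb)


def best_split(mu, ta, tb, sel, steps=100):
    """Fraction of the service rate mu given to stream A that maximizes output.

    Assumes both streams arrive fast enough to fill any share of mu.
    """
    best_i = max(
        range(steps + 1),
        key=lambda i: shed_output_rate(
            mu * i / steps, mu * (steps - i) / steps, ta, tb, sel
        ),
    )
    return best_i / steps


# Usage: very different windows and selectivities yield the same split.
split_1 = best_split(mu=100, ta=10, tb=2, sel=0.01)
split_2 = best_split(mu=100, ta=1, tb=50, sel=0.3)
```

Both calls return an even split because the window sizes and selectivity only scale the product ra·rb and never shift where it peaks.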

Maximizing the Number of Result Tuples with Limited Memory
Assumptions:
- The two time window sizes can be adjusted to fully utilize available memory.
- The two arrival rates are constant.
Hence, memory allocation strategies are necessary. But which policy? Will equal distribution win again?

Performance Comparison between Policies
The winner is the Max A strategy, which allocates all memory to the slower stream: keep the slower stream in memory and let the faster one probe against it and pass by.
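The same style of back-of-the-envelope model illustrates the memory result. Assume, as a simplification not taken from the paper, that memory holds a fixed number of tuples split between the two windows, and that the output rate is σ·(λa·Mb + λb·Ma), where Ma and Mb are the tuples held for A and B. Because this is linear in the split, the optimum is at an extreme: all memory goes to the window that the faster stream probes, that is, the slower stream's window. The function names are hypothetical.

```python
def memory_output_rate(mem_a, mem_b, rate_a, rate_b, sel):
    """Result tuples per unit time under the simplified model (an assumption)."""
    return sel * (rate_a * mem_b + rate_b * mem_a)


def best_allocation(total_mem, rate_a, rate_b, sel):
    """Return (mem_a, mem_b), in tuples, that maximizes the modeled output."""
    candidates = [(m, total_mem - m) for m in range(total_mem + 1)]
    return max(
        candidates,
        key=lambda c: memory_output_rate(c[0], c[1], rate_a, rate_b, sel),
    )


# Usage: A is the slower stream, so all memory goes to window A.
allocation = best_allocation(total_mem=100, rate_a=10, rate_b=50, sel=0.01)
```

Swapping the two rates flips the allocation, so under this model "Max A" is the winner exactly when A is the slower stream.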

Maximizing the Number of Result Tuples with Limited Memory
Another assumption:
- Variable time windows
- Variable arrival rates

Performance Comparison between Policies
The best policy is either to maximize stream A's time window in conjunction with B's arrival rate or, alternatively, to maximize B's time window in conjunction with A's arrival rate.


Conclusion
- A unit-time-basis model for analyzing the expected performance of moving window joins is introduced.
- The proposed cost model divides the join cost into two independent terms, each corresponding to one of the two join directions.
- This work can be extended to a cost model that goes beyond single joins and covers full query plans.
- Algorithms other than NLJ and HJ can be modeled and evaluated.

The End Thanks for your attention