Adaptive Ordering of Pipelined Stream Filters Babu, Motwani, Munagala, Nishizawa, and Widom SIGMOD 2004 Jun 13-18, 2004 presented by Joshua Lee Mingzhu.

Adaptive Ordering of Pipelined Stream Filters Babu, Motwani, Munagala, Nishizawa, and Widom SIGMOD 2004 Jun 13-18, 2004 presented by Joshua Lee Mingzhu Wei Some slides are adapted from the presentation slides in sigmod2004

Presentation Overview Motivation A-GREEDY Algorithm INDEPENDENT Algorithm SWEEP Algorithm LOCALSWAPS Algorithm Adaptive Ordering of Multiway Joins Over Streams Conclusion

Motivation Many queries on stream data involve processing commutative sets of filters Each filter is a stateless predicate that indicates whether a tuple should be further processed A tuple is output only if it satisfies all filters Applying filters in different orders can affect the processing time of the query The changing nature of stream data requires adaptive algorithms for filter ordering

Motivation: Example Consider the following example usage of a stream processing system: –A network traffic monitoring application built on a stream engine is used to monitor the amount of common traffic flowing through four routers, R 1, R 2, R 3, and R 4 over the last ten minutes.

Motivation: Filter Ordering If the filters are dependent, then the order of probing affects processing time For instance, it could be known that most traffic through R 1 comes from R 3 or R 4, but rarely from R 2 For incoming data on R 1, probing W 2 first would drop a tuple quickest, thus saving the time of probing W 3 and W 4

Motivation: Filter Ordering Challenges Filter selectivities may be correlated The candidate filter orderings to consider for N filters is N!, which increases quickly Attributes of stream data and arrival characteristics can change over time

Motivation: Adaptive Filter Ordering Challenges Run-time overhead incurred for continuous monitoring of statistics Convergence properties required to prove an adaptive algorithm generates an approximation to correct answer Speed of adaptivity in the face of changing data characteristics There is a three-way tradeoff: Algorithms that adapt quickly and have good convergence properties incur a lot of run-time overhead

A-GREEDY Algorithm The cost of a query plan ordering is t i is the cost of processing one tuple in filter i d(j|j-1) is the conditional probability that filter j will drop a tuple, given that no filter from 1 to j-1 dropped it

A-GREEDY Algorithm: Static Version and Greedy Invariant Static greedy algorithm 1.Choose the filter with highest unconditional drop probability d(i|0) as the first filter 2.Choose the filter with the highest conditional drop probability d(j|1) as the next filter 3.Choose the filter with highest conditional drop probability d(k|2) as the next filter 4.And so on Greedy Invariant occurs when a filter ordering satisfies the following

A-GREEDY Algorithm: Profiler Two logical components of A-Greedy: –profiler: continuously collects and maintains statistics about filter selectivities and processing costs –Reoptimizer: detects and corrects violations of the GI in the current filter ordering Challenge faced by the profiler: n2 n-1 conditional selectivities for n filters Solution: profile of recently dropped tuples

A-GREEDY Algorithm: Profiler Profile: a sliding window of profile tuples A profile tuple contains n boolean attributes b 1,…,bn corresponding to n filters Dropped tuples are sampled with some probability p If a tuple e is chosen for profiling, it will be tested by all remaining filters A new profile tuple e: bi = 1 if Fi drop e, otherwise bi=0

A-GREEDY Algorithm: Reoptimizer The reoptimizer maintains an ordering O such that O satisfies the GI How does the reoptimizer use profile to derive estimates of conditional selectivities? –It incrementally maintains a view over the profile window

A-GREEDY Algorithm: Reoptimizer Matrix: View over the profile window: n×n V[i, j]: number of tuples in the profile window that were dropped by F f(j) but not dropped by F f (1) F f (2),…, F f (i-1) V[i, j] is proportional to d (j|i-1)

A-GREEDY Algorithm: Run-time Overhead and Adaptivity Profile-tuple creation: needs additional n-I evaluations for a tuple dropped by F f(i). Profile-window maintenance: insertion and deletion of profile tuples, averages of filter processing times Matrix-view update: every update would cause access to up to n 2 /4 entries Detection and correction of GI violations Good convergence properties Fast adaptivity

Can we sacrifice some of A-Greedy’s convergence properties or adaptivity speed to reduce its run-time overhead? Three variants of A greedy are proposed Variants of A Greedy

Proceeds in stages During one stage, only checks for GI violations involving the filter at one specific position j Do not need to maintain entire matrix view: b f (1),…,b f (j) For each profiled tuple, needs to additionally evaluate F f (j) only By rotating j over 2,…,n, eventually detects and corrects all GI violations Same convergence properties as A-Greedy Reduced view and need for additional evaluations less overhead slower adaptivity: only 1 filter is profiled in each stage SWEEP Algorithm

INDEPENDENT Algorithm Assumes filters are independent we only need to maintain estimates of unconditional selectivities Lower view maintenance overhead Fast adaptivity Convergence: if assumption holds, optimal. Otherwise, can be O(n) times worse than GI orderings

LOCALSWAP Algorithm Monitors “local” violations only, i.e., violations involving adjacent filters LocalSwaps detects situations where a swap between adjacent filters Only need to maintain two diagonals of the view For each profiled tuple dropped by F f(i), only needs to additionally evaluate F f(i+1) xx xx xx x

Adaptive Ordering of Multiway Joins Over Streams MJoins maintain an ordering of {S0, S1,…, Sn-1}-{Si} for each stream Si New tuples arriving from Si is joined with other stream windows in that order Two-phase join algorithm –Drop-probing phase: the new tuple is used to probe all other windows in the specified order. If any window drops it, no further processing will be needed for it –output-generation phase: if no window drops the tuple, proceeds as conventional MJoins Drop probing resembles pipelined filters A-Greedy and its variants can be used to determine the orderings

Conclusion A-Greedy has good convergence properties and fast adaptivity, but incurs significant run-time overhead Three variants of A-Greedy are proposed, each lying at a different points along the tradeoff spectrum among convergence, runtime overhead, and speed of adaptivity

Adaptive Ordering of Pipelined Stream Filters Babu, Motwani, Munagala, Nishizawa, and Widom SIGMOD 2004 Jun 13-18, 2004 presented by Joshua Lee Mingzhu.

Similar presentations

Presentation on theme: "Adaptive Ordering of Pipelined Stream Filters Babu, Motwani, Munagala, Nishizawa, and Widom SIGMOD 2004 Jun 13-18, 2004 presented by Joshua Lee Mingzhu."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Adaptive Ordering of Pipelined Stream Filters Babu, Motwani, Munagala, Nishizawa, and Widom SIGMOD 2004 Jun 13-18, 2004 presented by Joshua Lee Mingzhu.

Similar presentations

Presentation on theme: "Adaptive Ordering of Pipelined Stream Filters Babu, Motwani, Munagala, Nishizawa, and Widom SIGMOD 2004 Jun 13-18, 2004 presented by Joshua Lee Mingzhu."— Presentation transcript:

Similar presentations

About project

Feedback