On Random Sampling over Joins, by Surajit Chaudhuri, Rajeev Motwani, and Vivek Narasayya. Presented by: Srikantha Nema
Outline: Semantics of Sample; Difficulty of join sampling; Algorithms for sampling; Sampling strategies; New strategies for join sampling; Experimental evaluation; Conclusions
Terminology: SAMPLE(R, f) is an SQL operation; R is the relation obtained by evaluating a query Q; f is the fraction of R to be sampled
Semantics of Sample: Sampling with Replacement (WR); Sampling without Replacement (WoR); Independent Coin Flips (CF)
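The three semantics can be contrasted with a minimal sketch. It assumes the relation fits in memory as a Python list and that f is a fraction between 0 and 1; the function names `sample_wr`, `sample_wor`, and `sample_cf` are hypothetical and chosen only for this illustration.

```python
import random

def sample_wr(rows, f, rng):
    """WR: draw round(f * n) tuples independently; duplicates are possible."""
    r = round(f * len(rows))
    return [rng.choice(rows) for _ in range(r)]

def sample_wor(rows, f, rng):
    """WoR: draw round(f * n) distinct positions; no tuple is picked twice."""
    r = round(f * len(rows))
    return rng.sample(rows, r)

def sample_cf(rows, f, rng):
    """CF: keep each tuple independently with probability f.
    The sample size is random, with expected value f * n."""
    return [t for t in rows if rng.random() < f]

if __name__ == "__main__":
    rng = random.Random(0)
    rows = list(range(100))
    print(len(sample_wr(rows, 0.1, rng)))   # always 10
    print(len(sample_wor(rows, 0.1, rng)))  # always 10
    print(len(sample_cf(rows, 0.1, rng)))   # varies around 10
```

WR and WoR fix the sample size up front, while CF fixes only the per-tuple inclusion probability.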
Difficulty of Join Sampling
Classification of the join sampling problem: Case A: no information is available for either input relation; Case B: no information is available for one relation, but indexes and/or statistics are available for the other; Case C: indexes/statistics are available for both relations
Algorithms for Sampling: Unweighted Sequential WR Sampling (Black-Box U1, Black-Box U2); Weighted Sequential WR Sampling (Black-Box WR1, Black-Box WR2)
Unweighted Sequential WR Sampling: Black-Box U1; Black-Box U2
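A minimal sketch of two one-pass unweighted WR samplers. It assumes that U1 is the variant that knows the relation size n in advance and that U2 is the reservoir-style variant that does not; that reading, and the function names `black_box_u1` and `black_box_u2`, are assumptions made for this illustration rather than details stated on the slide.

```python
import random

def black_box_u1(stream, n, r, rng):
    """One-pass WR sample of size r when the stream length n is known.
    Tuple i (0-based) is emitted Binomial(x, 1 / (n - i)) times, where x is
    the number of sample slots still unfilled."""
    out = []
    x = r
    for i, t in enumerate(stream):
        if x == 0:
            break
        p = 1.0 / (n - i)
        # Binomial(x, p) drawn as a sum of x Bernoulli(p) trials.
        copies = sum(rng.random() < p for _ in range(x))
        out.extend([t] * copies)
        x -= copies
    return out

def black_box_u2(stream, r, rng):
    """One-pass WR sample of size r when the stream length is unknown.
    Each of the r slots is overwritten by the count-th tuple with
    probability 1 / count, independently of the other slots."""
    reservoir = [None] * r  # stays None only if the stream is empty
    for count, t in enumerate(stream, start=1):
        for j in range(r):
            if rng.random() < 1.0 / count:
                reservoir[j] = t
    return reservoir

if __name__ == "__main__":
    rng = random.Random(1)
    print(black_box_u1(iter(range(50)), 50, 8, rng))
    print(black_box_u2(iter(range(50)), 8, rng))
```

The first sketch emits its output in stream order and needs no buffer beyond the output itself; the second must hold r slots until the stream ends.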
Weighted Sequential WR Sampling: Black-Box WR1; Black-Box WR2
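The weighted case can be sketched the same way. The sketch assumes that WR1 is the variant that knows the total weight W in advance and that WR2 is the reservoir-style variant that does not; the function names, the `weight` callback, and the use of integer weights are assumptions made for this illustration.

```python
import random

def black_box_wr1(stream, weight, total_weight, r, rng):
    """One-pass weighted WR sample of size r when the total weight is known.
    Each tuple is emitted Binomial(x, w / (W - D)) times, where x is the number
    of unfilled slots and D is the weight already seen."""
    out = []
    x = r
    seen = 0
    for t in stream:
        if x == 0:
            break
        w = weight(t)
        p = min(1.0, w / (total_weight - seen))
        copies = sum(rng.random() < p for _ in range(x))  # Binomial(x, p)
        out.extend([t] * copies)
        x -= copies
        seen += w
    return out

def black_box_wr2(stream, weight, r, rng):
    """One-pass weighted WR sample of size r when the total weight is unknown.
    Each slot is overwritten by tuple t with probability w(t) / (weight so far)."""
    reservoir = [None] * r  # stays None if no tuple has positive weight
    seen = 0
    for t in stream:
        w = weight(t)
        if w <= 0:
            continue  # zero-weight tuples can never be sampled
        seen += w
        p = w / seen
        for j in range(r):
            if rng.random() < p:
                reservoir[j] = t
    return reservoir

if __name__ == "__main__":
    rng = random.Random(2)
    rows = [("a", 3), ("b", 0), ("c", 1)]
    w = lambda t: t[1]
    print(black_box_wr1(iter(rows), w, 4, 10, rng))
    print(black_box_wr2(iter(rows), w, 10, rng))
```

In both sketches a tuple ends up in any given slot with probability proportional to its weight, so "a" should appear about three times as often as "c" and "b" never.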
Sampling Strategies (old): Strategy Naïve-Sample; Strategy Olken-Sample
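A minimal sketch of the two older strategies, under stated assumptions: Naïve-Sample is taken to mean "compute the full join, then sample it", and Olken-Sample is taken to mean rejection sampling that uses an index on the second relation and an upper bound M on the frequency of any join value there. Tuples are modeled as Python tuples whose first field is the join attribute; that layout and all function names are hypothetical.

```python
import random
from collections import defaultdict

def build_index(r2):
    """Hash index on the join attribute (field 0) of r2."""
    index = defaultdict(list)
    for t2 in r2:
        index[t2[0]].append(t2)
    return index

def naive_sample(r1, r2, r, rng):
    """Materialize the whole join, then draw r tuples with replacement."""
    index = build_index(r2)
    joined = [(t1, t2) for t1 in r1 for t2 in index.get(t1[0], [])]
    return [rng.choice(joined) for _ in range(r)] if joined else []

def olken_sample(r1, r2, r, rng):
    """Rejection sampling: pick t1 uniformly, accept it with probability
    m2(t1) / M, then pick a uniformly random matching t2."""
    index = build_index(r2)
    if not any(t1[0] in index for t1 in r1):
        return []  # empty join; the loop below would never terminate
    m_bound = max(len(v) for v in index.values())  # M: max frequency in r2
    out = []
    while len(out) < r:
        t1 = rng.choice(r1)
        matches = index.get(t1[0], [])
        if matches and rng.random() < len(matches) / m_bound:
            out.append((t1, rng.choice(matches)))
    return out

if __name__ == "__main__":
    rng = random.Random(3)
    r1 = [(1, "x"), (2, "y"), (3, "z")]
    r2 = [(1, "a"), (1, "b"), (2, "c")]
    print(naive_sample(r1, r2, 4, rng))
    print(olken_sample(r1, r2, 4, rng))
```

In the rejection sketch each join tuple is produced with probability 1 / (|R1| * M) per iteration, which is why the accepted tuples are uniform over the join; the cost is the number of rejected iterations when M is much larger than the typical frequency.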
New strategies for join sampling: Strategy Stream-Sample; Strategy Group-Sample; Strategy Frequency-Partition-Sample
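A minimal sketch of the first of these strategies, assuming Stream-Sample means: draw a weighted WR sample of the first relation, weighting each tuple by the number of matching tuples in the second relation, then pair each sampled tuple with one uniformly random match. Python's `random.choices` stands in here for a sequential weighted sampler; the tuple layout (join attribute in field 0) and the function name `stream_sample` are hypothetical.

```python
import random
from collections import defaultdict

def stream_sample(r1, r2, r, rng):
    """Weighted sample of r1 followed by one random match per sampled tuple."""
    # Index on r2's join attribute; also yields the frequency m2(v) of each value.
    index = defaultdict(list)
    for t2 in r2:
        index[t2[0]].append(t2)
    # Weight of t1 = number of r2 tuples it joins with.
    weights = [len(index.get(t1[0], [])) for t1 in r1]
    if sum(weights) == 0:
        return []  # empty join
    # Stand-in for a sequential weighted WR sampler over r1.
    s1 = rng.choices(r1, weights=weights, k=r)
    # Each sampled t1 has at least one match because its weight is positive.
    return [(t1, rng.choice(index[t1[0]])) for t1 in s1]

if __name__ == "__main__":
    rng = random.Random(4)
    r1 = [(1, "x"), (2, "y"), (3, "z")]
    r2 = [(1, "a"), (1, "b"), (2, "c")]
    print(stream_sample(r1, r2, 6, rng))
```

A join tuple (t1, t2) is chosen with probability m2(t1) / |J| for the first step times 1 / m2(t1) for the second, which is 1 / |J|, so no sampled tuple is rejected.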
Experimental Evaluation 1
Experimental Evaluation 2
Experimental Evaluation 3
Conclusions: Difficulty of join sampling; classification of the problem into 3 cases; strategies for join sampling; new schemes for sequential random sampling, both uniform and weighted; more efficient strategies can be developed for the case of a single join; more work is needed to understand the problem of sampling the result of join trees
Thank You