6.830 Lecture 11 Query Optimization & Automatic Database Design 10/8/2014.

Slides:



Advertisements
Similar presentations
Recap: Mining association rules from large datasets
Advertisements

Salvatore Ruggieri SIGKDD2010 Frequent Regular Itemset Mining 2010/9/2 1.
Huffman Codes and Asssociation Rules (II) Prof. Sin-Min Lee Department of Computer Science.
Data Mining Techniques Association Rule
Selinger Optimizer Lecture 10 October 15, 2009 Sam Madden.
Review for Final Exam Lecture Week 14. Problems on Functional Dependencies and Normal Forms.
Association Rule Mining. 2 The Task Two ways of defining the task General –Input: A collection of instances –Output: rules to predict the values of any.
6.830 Lecture 10 Query Optimization 10/6/2014. Selinger Optimizer Algorithm algorithm: compute optimal way to generate every sub-join: size 1, size 2,...
Lecture 10 Query Optimization II Automatic Database Design.
Properties of Armstrong’s Axioms Soundness All dependencies generated by the Axioms are correct Completeness Repeatedly applying these rules can generate.
FPtree/FPGrowth (Complete Example). First scan – determine frequent 1- itemsets, then build header B8 A7 C7 D5 E3.
Quit Permutations Combinations Pascal’s triangle Binomial Theorem.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Classroom Exercise: Normalization
Mining Time-Series Databases Mohamed G. Elfeky. Introduction A Time-Series Database is a database that contains data for each point in time. Examples:
CPSC-608 Database Systems Fall 2010 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #9.
Association Rule Mining - MaxMiner. Mining Association Rules in Large Databases  Association rule mining  Algorithms Apriori and FP-Growth  Max and.
CONGRUENT AND SIMILAR FIGURES
CPSC-608 Database Systems Fall 2011 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #8.
Lecture 07 Prof. Dr. M. Junaid Mughal
Lecture 11 Main Memory Databases Midterm Review. Time breakdown for Shore DBMS Source: “OLTP Under the Looking Glass”, SIGMOD 2008 Systematically removed.
P ERMUTATIONS AND C OMBINATIONS Homework: Permutation and Combinations WS.
Jessie Zhao Course page: 1.
Lesson 1.9 Probability Objective: Solve probability problems.
Introduction to Geometry
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining By Tan, Steinbach, Kumar Lecture.
Modul 7: Association Analysis. 2 Association Rule Mining  Given a set of transactions, find rules that will predict the occurrence of an item based on.
Aim: Properties of Square & Rhombus Course: Applied Geo. Do Now: Aim: What are the properties of a rhombus and a square? Find the length of AD in rectangle.
A BC D Let ABCD be a quadrilateral. Join AC. Clearly, ∠ 1 + ∠ 2 = ∠ A (i) And, ∠ 3 + ∠ 4 = ∠ C (ii) We know that the sum of the angles.
Lecture 9 Query Optimization.
0 0 Correction: Problem 18 Alice and Bob have 2n+1 FAIR coins, each with probability of a head equal to 1/2.
The Binomial Theorem Lecture 29 Section 6.7 Mon, Apr 3, 2006.
Membership problem CYK Algorithm Project presentation CS 5800 Spring 2013 Professor : Dr. Elise de Doncker Presented by : Savitha parur venkitachalam.
Examples. Examples (1/11)  Example #1: f(A,B,C,D) =  m(2,3,4,5,7,8,10,13,15) Fill in the 1’s. 1 1 C A B CD AB D 1 1.
Counting CSC-2259 Discrete Structures Konstantin Busch - LSU1.
CSE4334/5334 DATA MINING CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Association Analysis This lecture node is modified based on Lecture Notes for.
Multiplying Binomials
Mining Frequent Patterns, Associations, and Correlations Compiled By: Umair Yaqub Lecturer Govt. Murray College Sialkot.
(CSC 102) Lecture 13 Discrete Structures. Previous Lectures Summary  Direct Proof  Indirect Proof  Proof by Contradiction  Proof by Contra positive.
Counting Techniques Tree Diagram Multiplication Rule Permutations Combinations.
March 11, 2005 Recursion (Implementation) Satish Dethe
Data Mining  Association Rule  Classification  Clustering.
8.1 Ratio and Proportion Geometry--Honors Mrs. Blanco.
Data Mining Association Rules Mining Frequent Itemset Mining Support and Confidence Apriori Approach.
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Association Rule Mining COMP Seminar BCB 713 Module Spring 2011.
1 Data Mining Lecture 6: Association Analysis. 2 Association Rule Mining l Given a set of transactions, find rules that will predict the occurrence of.
1 Mining the Smallest Association Rule Set for Predictions Jiuyong Li, Hong Shen, and Rodney Topor Proceedings of the 2001 IEEE International Conference.
Lecture 34 Section 6.7 Wed, Mar 28, 2007
Permutations and Combinations
Reducing Number of Candidates
Additional Example 3 Additional Example 4
Data Mining and Its Applications to Image Processing
Frequent Pattern Mining
Data Mining Association Analysis: Basic Concepts and Algorithms
Association Rule Mining
Data Mining Association Analysis: Basic Concepts and Algorithms
تصنيف التفاعلات الكيميائية
Data Mining Association Analysis: Basic Concepts and Algorithms
Pascal's Triangle This is the start of Pascal's triangle. It follows a pattern, do you know.
External Joins Query Optimization 10/4/2017
Binomial Expansion 2 L.O. All pupils can follow where the binomial expansion comes from All pupils understand binomial notation All pupils can use the.
Design and Analysis of Multi-Factored Experiments
Permutations and Combinations
Fractional Factorial Design
9.2 Proving Quadrilaterals are Parallelograms
Lecture 29 Section 6.7 Thu, Mar 3, 2005
Design matrix Run A B C D E
Association Analysis: Basic Concepts
Presentation transcript:

6.830 Lecture 11 Query Optimization & Automatic Database Design 10/8/2014

Selinger Optimizer Algorithm algorithm: compute optimal way to generate every sub-join: size 1, size 2,... n (in that order) e.g. {A}, {B}, {C}, {AB}, {AC}, {BC}, {ABC} R  set of relations to join For i in {1...|R|}: for S in {all length i subsets of R}: optjoin(S) = a join (S-a), where a is the relation that minimizes: cost(optjoin(S-a)) + min. cost to join optjoin(S-a) to a + min. access cost for a Precomputed in previous iteration!

Selinger, as code R  set of relations to join For i in {1...|R|}: for S in {all length i subsets of R}: optcost s = ∞ optjoin S = ø for a in S: //a is a relation c = optcost s-a + min. cost to join optjoin s-a to a + min. access cost for a if c < optcost s optcost s = c optjoin s = optjoin s-a joined optimally w/ a This is the same algorithm as on the previous slide, written differently Pre-computed in previous iteration!

Example 4 Relations: ABCD (only consider NL join) Optjoin: A = best way to access A (e.g., sequential scan, or predicate pushdown into index...) B = " " " " B C = " " " " C D = " " " " D {A,B} = AB or BA {A,C} = AC or CA {B,C} = BC or CB {A,D} {B,D} {C,D} R  set of relations to join For i in {1...|R|}: for S in {all length i subsets of R}: optjoin(S) = a join (S-a), where a is the relation that minimizes: cost(optjoin(S-a)) + min. cost to join (S-a) to a + min. access cost for a Optjoin

Example (con’t) Optjoin {A,B,C} = remove A: compare A({B,C}) to ({B,C})A remove B: compare ({A,C})B to B({A,C}) remove C: compare C({A,B}) to ({A,B})C {A,C,D} = … {A,B,D} = … {B,C,D} = … … {A,B,C,D} = remove A: compare A({B,C,D}) to ({B,C,D})A remove B: compare B({A,C,D}) to ({A,C,D})B remove C: compare C({A,B,D}) to ({A,B,D})C remove D: compare D({A,C,C}) to ({A,B,C})D R  set of relations to join For i in {1...|R|}: for S in {all length i subsets of R}: optjoin(S) = a join (S-a), where a is the relation that minimizes: cost(optjoin(S-a)) + min. cost to join (S-a) to a + min. access cost for a Optjoin

Complexity Number of subsets of set of size n = |power set of n| = 2 n (here, n is number of relations) How much work per subset? Have to iterate through each element of each subset, so this at most n n2 n complexity (vs n!) n=12  49K vs 479M R  set of relations to join For i in {1...|R|}: for S in {all length i subsets of R}: optjoin(S) = a join (S-a), where a is the relation that minimizes: cost(optjoin(S-a)) + min. cost to join (S-a) to a + min. access cost for a Optjoin

Materialized Views sales : (saleid, date, time, register, product, price,...) CREATE MATERIALIZED VIEW sales_by_date AS SELECT date, product, sum(price), count(*) AS quantity FROM sales GROUP BY date, product Key properties: Kept up to date as data is added Selected for use automatically by optimizer when appropriate