Full Disjunctions: Polynomial-Delay Iterators in Action Sara Cohen Technion Israel Yaron Kanza University of Toronto Canada Benny Kimelfeld Hebrew University.

Presentation transcript:

Full Disjunctions: Polynomial-Delay Iterators in Action Sara Cohen Technion Israel Yaron Kanza University of Toronto Canada Benny Kimelfeld Hebrew University Israel Yehoshua Sagiv Hebrew University Israel Itzhak Fadida Technion Israel VLDB 2006 Seoul, Korea

Computing Full Disjunctions
The full disjunction is a relational operator that maximally combines data from several relations.
- It extends the natural join by allowing incompleteness.
- It extends the binary outerjoin to many relations.
This paper presents algorithms and optimizations for computing full disjunctions.
- Theoretically, full disjunctions are more tractable than previously known.
- Practically, a significant improvement over the state of the art: an iterator-like evaluation.

Contents
- Full Disjunctions
  - Complexity
- Contributions
- Algorithms
  - Algorithm NLOJ for Tree-Structured Schemes
  - Algorithm PDelayFD for General Schemes
  - Algorithm BiComNLOJ (Main Algorithm)
- Experimental Results
- Conclusion

The Natural Join Operator
An empty cell denotes a null value.

Climates
Country   Climate
Canada    diverse
Bahamas   tropical
UK        temperate

Accommodations
Country   City      Hotel    Stars
Canada    Toronto   Plaza    4
Canada    London    Ramada   3
Bahamas   Nassau    Hilton

Sites
Country   City      Site
Canada    London    Air Show
Canada              Mount Logan
UK        London    Buckingham
UK        London    Hyde Park

Natural join of Climates, Accommodations and Sites
Country   Climate   City     Hotel    Stars   Site
Canada    diverse   London   Ramada   3       Air Show

The Natural Join Misses Information
The natural join of Climates, Accommodations and Sites contains a single tuple: (Canada, diverse, London, Ramada, 3, Air Show). An empty cell in the tables denotes a null value.
- Bahamas is not in Sites, so the natural join misses it.
- Mount Logan is not in a city, hence it is missed.
- A looser notion of join is needed: one that enables joining tuples from only some of the tables.

The Natural Join Operator
Example: the join tuple (Canada, diverse, London, Ramada, 3, Air Show) combines one tuple from each of Climates, Accommodations and Sites.
A tuple of the join corresponds to a set of tuples from the source relations that is:
- Join consistent
- Connected (no Cartesian product)
- Complete (one tuple from each relation)

Join-Consistent Sets of Tuples
- Two tuples t1 and t2 are join-consistent if, for every common attribute A, t1[A] and t2[A] are non-null and t1[A] = t2[A].
- A set T of tuples is join-consistent if every two tuples of T are join-consistent.
Example: the Accommodations tuple (Country = Canada, City = London, Hotel = Ramada) and the Sites tuple (Country = Canada, City = London, Site = Air Show).
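The definition above can be checked mechanically. The following is a minimal sketch, assuming tuples are represented as Python dicts from attribute name to value, with None standing for null; the representation and the function names are illustrative and not taken from the paper.

```python
# Tuples are modeled as dicts from attribute name to value; None stands for null.
# This representation is an assumption made for illustration.

def join_consistent(t1, t2):
    """True if t1 and t2 are non-null and equal on every common attribute."""
    common = t1.keys() & t2.keys()
    return all(t1[a] is not None and t1[a] == t2[a] for a in common)


def set_join_consistent(tuples):
    """True if every two tuples of the list are join-consistent."""
    return all(
        join_consistent(a, b)
        for i, a in enumerate(tuples)
        for b in tuples[i + 1:]
    )


ramada = {"Country": "Canada", "City": "London", "Hotel": "Ramada", "Stars": "3"}
air_show = {"Country": "Canada", "City": "London", "Site": "Air Show"}
mount_logan = {"Country": "Canada", "City": None, "Site": "Mount Logan"}

print(join_consistent(ramada, air_show))     # the tuples agree on Country and City
print(join_consistent(ramada, mount_logan))  # City is null in one of the tuples
```

A null in a common attribute makes two tuples inconsistent, which is why the Mount Logan tuple cannot be combined with any Accommodations tuple.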

Connected Sets of Tuples
The join graph of a set T of tuples:
- The nodes are the tuples of T.
- There is an edge between every two tuples with a common attribute.
A set of tuples is connected if its join graph is connected.
Example tuples on the slide: (Country = Canada, Climate = diverse), (Country = UK, City = London, Site = Buckingham), (City = Toronto, Hotel = Plaza, Stars = 4).
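Connectivity of the join graph can be tested with a simple graph traversal. The sketch below assumes the same dict representation of tuples (an illustrative choice) and uses the three example tuples listed on this slide; note that connectivity depends only on shared attribute names, not on the values.

```python
from collections import deque


def share_attribute(t1, t2):
    """An edge of the join graph: the two tuples have a common attribute."""
    return bool(t1.keys() & t2.keys())


def is_connected(tuples):
    """True if the join graph of the given list of tuples is connected."""
    if not tuples:
        return True
    seen = {0}
    queue = deque([0])
    while queue:
        i = queue.popleft()
        for j in range(len(tuples)):
            if j not in seen and share_attribute(tuples[i], tuples[j]):
                seen.add(j)
                queue.append(j)
    return len(seen) == len(tuples)


climate = {"Country": "Canada", "Climate": "diverse"}
site = {"Country": "UK", "City": "London", "Site": "Buckingham"}
hotel = {"City": "Toronto", "Hotel": "Plaza", "Stars": "4"}

print(is_connected([climate, site, hotel]))  # site links the other two tuples
print(is_connected([climate, hotel]))        # no common attribute
```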

Natural Join (w/o Cartesian Product)
Each tuple of the result corresponds to a set T of tuples from the source relations such that:
1. T is join consistent.
2. T is connected (no Cartesian product).
3. T is complete (one tuple from each relation).
Conditions 1 and 2 together are abbreviated JCC (join consistent and connected).

Full Disjunction (Galindo-Legaria 1994)
Each tuple of the result corresponds to a set T of tuples from the source relations such that:
1. T is join consistent.
2. T is connected (no Cartesian product).
3. T is maximal (not properly contained in any JCC set). This replaces the natural join's requirement that T be complete (one tuple from each relation).
Conditions 1 and 2 together are abbreviated JCC (join consistent and connected).

An Example of a Full Disjunction
An empty cell denotes a null value.

R, the input relations:

Climates
Country   Climate
Canada    diverse
UK        temperate

Accommodations
Country   City      Hotel    Stars
Canada    Toronto   Plaza    4
Canada    London    Ramada   3

Sites
Country   City      Site
Canada    London    Air Show
Canada              Mount Logan
UK        London    Buckingham

FD(R), built up one tuple at a time on the slides:
Country   Climate     City      Hotel    Stars   Site
Canada    diverse     Toronto   Plaza    4
Canada    diverse     London    Ramada   3       Air Show
Canada    diverse                                Mount Logan
UK        temperate   London                     Buckingham

Padding Joined Tuple Sets with Nulls
The tuple set consisting of the Sites tuple (Country = Canada, City = null, Site = Mount Logan) and the Climates tuple (Country = Canada, Climate = diverse) is joined and padded with nulls over all attributes:
Country   Climate   City   Hotel   Stars   Site
Canada    diverse                          Mount Logan
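To make the definition concrete, the full disjunction of the small example can be computed by brute force: enumerate every subset of tuples, keep the JCC ones, keep the maximal ones among those, and pad each with nulls. This is only a sketch for checking the definition on tiny inputs (it is exponential and is not one of the algorithms of the talk); the dict representation with None for null is an assumption.

```python
from itertools import combinations

ATTRS = ["Country", "Climate", "City", "Hotel", "Stars", "Site"]

# The seven tuples of the full-disjunction example; None stands for null.
DB = [
    {"Country": "Canada", "Climate": "diverse"},
    {"Country": "UK", "Climate": "temperate"},
    {"Country": "Canada", "City": "Toronto", "Hotel": "Plaza", "Stars": "4"},
    {"Country": "Canada", "City": "London", "Hotel": "Ramada", "Stars": "3"},
    {"Country": "Canada", "City": "London", "Site": "Air Show"},
    {"Country": "Canada", "City": None, "Site": "Mount Logan"},
    {"Country": "UK", "City": "London", "Site": "Buckingham"},
]


def join_consistent(t1, t2):
    return all(t1[a] is not None and t1[a] == t2[a] for a in t1.keys() & t2.keys())


def connected(ts):
    """Join-graph connectivity of a non-empty list of tuples."""
    reached, frontier = {0}, [0]
    while frontier:
        i = frontier.pop()
        for j in range(len(ts)):
            if j not in reached and ts[i].keys() & ts[j].keys():
                reached.add(j)
                frontier.append(j)
    return len(reached) == len(ts)


def is_jcc(ts):
    return all(join_consistent(a, b) for a, b in combinations(ts, 2)) and connected(ts)


def pad(ts):
    """Join the tuples of a JCC set and pad missing attributes with None."""
    row = dict.fromkeys(ATTRS)
    for t in ts:
        row.update({a: v for a, v in t.items() if v is not None})
    return tuple(row[a] for a in ATTRS)


def brute_force_fd(db):
    jcc = [frozenset(s)
           for k in range(1, len(db) + 1)
           for s in combinations(range(len(db)), k)
           if is_jcc([db[i] for i in s])]
    maximal = [s for s in jcc if not any(s < other for other in jcc)]
    return {pad([db[i] for i in s]) for s in maximal}


for row in sorted(brute_force_fd(DB), key=str):
    print(row)
```

On this input the four maximal JCC sets correspond to the four rows of FD(R) in the example.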

The Outerjoin Operator
The outerjoin R1 ⟗ R2 of two relations R1 and R2 consists of the natural join R1 ⋈ R2 and, in addition, all dangling tuples padded with nulls.

Example of an Outerjoin
An empty cell denotes a null value.

Climates
Country   Climate
Canada    diverse
Bahamas   tropical
UK        temperate

Accommodations
Country   City      Hotel    Stars
Canada    Toronto   Plaza    4
France    Paris     Atala    4
Bahamas   Nassau    Hilton

Outerjoin of Climates and Accommodations
Country   Climate     City      Hotel    Stars
Canada    diverse     Toronto   Plaza    4
Bahamas   tropical    Nassau    Hilton
UK        temperate
France                Paris     Atala    4
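The example can be reproduced with a small nested-loop outerjoin. This is a minimal sketch, assuming relations are lists of dicts with None for null and that the output schema is passed explicitly; the function names are illustrative.

```python
def join_consistent(t1, t2):
    return all(t1[a] is not None and t1[a] == t2[a] for a in t1.keys() & t2.keys())


def outerjoin(r1, r2, schema):
    """Full outerjoin of two relations: joined pairs plus dangling tuples, null-padded."""
    out = []
    matched2 = set()                     # indices of r2 tuples that found a partner
    for t1 in r1:
        partners = [j for j, t2 in enumerate(r2) if join_consistent(t1, t2)]
        matched2.update(partners)
        if partners:
            out.extend({**t1, **r2[j]} for j in partners)
        else:
            out.append(dict(t1))         # dangling tuple of r1
    out.extend(dict(t2) for j, t2 in enumerate(r2) if j not in matched2)
    # dict.get returns None for missing attributes, which pads with nulls.
    return [tuple(t.get(a) for a in schema) for t in out]


climates = [
    {"Country": "Canada", "Climate": "diverse"},
    {"Country": "Bahamas", "Climate": "tropical"},
    {"Country": "UK", "Climate": "temperate"},
]
accommodations = [
    {"Country": "Canada", "City": "Toronto", "Hotel": "Plaza", "Stars": "4"},
    {"Country": "France", "City": "Paris", "Hotel": "Atala", "Stars": "4"},
    {"Country": "Bahamas", "City": "Nassau", "Hotel": "Hilton", "Stars": None},
]
SCHEMA = ["Country", "Climate", "City", "Hotel", "Stars"]

for row in outerjoin(climates, accommodations, SCHEMA):
    print(row)
```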

Combining Relations using Outerjoins
- The outerjoin operator is not associative.
- For more than two relations, the result depends on the order in which the outerjoin is applied.
- In general, outerjoins cannot maximally combine relations (no matter what order is used).
Outerjoin is not suitable for combining more than two relations!


Efficiency of Evaluation
- The full-disjunction operator (like other operators, such as the Cartesian product or the natural join) can generate a number of tuples that is exponential in the input size.
- Hence, running time that is polynomial in the input size alone is not a suitable yardstick.
- The usual notion: polynomial time in the combined size of the input and the output.

History of Algorithms for Full Disjunctions
Source   Time               Databases
RU96     O(n + F^2)         γ-acyclic
KS03     O(n^5 N^2 F^2)     general
CS05     O(n^3 N F^2)       general ("incremental polynomial")
n: number of relations; N: number of tuples in the database; F: number of tuples in the full disjunction.
- F is typically very large; it can be exponential in the size of the database.
- This paper: linear dependence on F.

Polynomial Delay
- One way to obtain an evaluation with a running time linear in the output is to devise an algorithm that acts as an iterator with an efficient next() operator, that is, an enumeration algorithm that runs with polynomial delay.
- An enumeration algorithm runs with polynomial delay if the time between every two successive answers is polynomial in the size of the input.

Other Benefits of Polynomial Delay
- Incremental evaluation: the first tuples are generated quickly.
  - Full disjunctions are large, yet the user need not wait for the whole result to be generated.
  - Suitable for Web applications, where users expect to get the first few pages quickly.
  - In addition, the user can decide at any time that enough information has been shown.
- Enables parallel query processing: while one processor generates the FD tuples, other processors apply further processing.


Main Contributions
1. The first algorithm for computing full disjunctions with polynomial delay.
2. The first algorithm for computing full disjunctions in time linear in the output.
3. A general optimization technique for computing full disjunctions: division into biconnected components.
A substantial improvement over the state of the art is proved theoretically and experimentally.


Our Algorithms
- Algorithm NLOJ: tree schemes.
- Algorithm PDelayFD: general schemes.
- Optimization: division into biconnected components.
- Combined: Algorithm BiComNLOJ, the main algorithm, for general schemes.


Tree Schemes
- Tree schemes: scheme graphs without cycles.
- In the scheme graph, the relation schemes are the nodes, and there is an edge between every two schemes with one or more common attributes.
[Figure: a tree scheme graph over the relations R1, ..., R7]

Left-Deep Sequence of Outerjoins
R: a set of relations with a tree scheme
R1, ..., Rn: a connected-prefix order of R
Proposition: FD(R) = ( ... ((R1 ⟗ R2) ⟗ R3) ... ) ⟗ Rn
Algorithm NLOJ (Nested Loop OuterJoin):
1. Compute a connected-prefix order of R.
2. Apply outerjoins in a left-deep order.

Connected-Prefix Order of Relations
A connected-prefix order of relations: each prefix forms a (connected) subtree.
[Figure: a tree scheme over R1, ..., R7 and a connected-prefix order of its relations]
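One simple way to obtain such an order is a graph traversal of the scheme graph: every relation is appended only after one of its neighbours, so each prefix stays connected. The sketch below is an illustration under assumptions, not the paper's procedure; the four relation schemes are hypothetical and chosen so that the scheme graph is a tree.

```python
def scheme_graph(schemes):
    """Adjacency lists: an edge joins two schemes with a common attribute."""
    names = list(schemes)
    return {r: [s for s in names if s != r and schemes[r] & schemes[s]] for r in names}


def connected_prefix_order(graph, start):
    """Depth-first order; each relation is listed after one of its neighbours."""
    order, seen, stack = [], {start}, [start]
    while stack:
        r = stack.pop()
        order.append(r)
        for s in graph[r]:
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return order


# Hypothetical schemes whose scheme graph is the tree R4 - R1 - R2 - R3.
schemes = {
    "R1": {"A", "B"},
    "R2": {"B", "C"},
    "R3": {"C", "D"},
    "R4": {"A", "E"},
}
graph = scheme_graph(schemes)
order = connected_prefix_order(graph, "R1")
print(order)
```

Any traversal that only visits neighbours of already visited relations (breadth-first, for instance) would serve equally well here.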

Achieving Polynomial Delay
Algorithm NLOJ (Nested Loop OuterJoin):
1. Compute a connected-prefix order of R.
2. Apply outerjoins in a left-deep order.
Problem: the intermediate result over R1, ..., Rn-1 can already have exponential size, which leads to exponential delay.
Solution: use iterators.

Iterators
- Operate on top of an enumeration algorithm.
- Implement next() by controlling the execution.
- To obtain polynomial delay, we use iterators.

Using Iterators for Outerjoins
[Figure: a left-deep chain of iterators (Iterator 1, Iterator 2, ..., Iterator n-1, Iterator n) over the relations R1, R2, R3, ..., Rn-1, Rn]
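The chaining idea can be sketched with Python generators: each outerjoin pulls tuples from the iterator below it on demand, so the first result is produced without materializing any intermediate relation. This is a minimal illustration under assumptions, not the paper's implementation: relations are (schema, rows) pairs given in connected-prefix order, None stands for null, and the Reviews relation is hypothetical data added only to obtain a chain of two outerjoins.

```python
def join_consistent(t1, t2):
    return all(t1[a] is not None and t1[a] == t2[a] for a in t1.keys() & t2.keys())


def outerjoin_iter(left, left_schema, right, right_schema):
    """Lazily yield the outerjoin of an iterator `left` with a stored relation `right`."""
    null_left = dict.fromkeys(left_schema)    # padding for dangling right tuples
    null_right = dict.fromkeys(right_schema)  # padding for dangling left tuples
    matched = set()
    for t in left:
        found = False
        for j, u in enumerate(right):
            if join_consistent(t, u):
                matched.add(j)
                found = True
                yield {**t, **u}
        if not found:
            yield {**null_right, **t}
    for j, u in enumerate(right):
        if j not in matched:
            yield {**null_left, **u}


def nloj_iter(relations):
    """Left-deep chain of lazy outerjoins over (schema, rows) pairs."""
    schema, rows = relations[0]
    schema, it = list(schema), iter(rows)
    for r_schema, r_rows in relations[1:]:
        it = outerjoin_iter(it, schema, r_rows, r_schema)
        schema = schema + [a for a in r_schema if a not in schema]
    return it


climates = (["Country", "Climate"], [
    {"Country": "Canada", "Climate": "diverse"},
    {"Country": "Bahamas", "Climate": "tropical"},
    {"Country": "UK", "Climate": "temperate"},
])
accommodations = (["Country", "City", "Hotel", "Stars"], [
    {"Country": "Canada", "City": "Toronto", "Hotel": "Plaza", "Stars": "4"},
    {"Country": "France", "City": "Paris", "Hotel": "Atala", "Stars": "4"},
    {"Country": "Bahamas", "City": "Nassau", "Hotel": "Hilton", "Stars": None},
])
reviews = (["Hotel", "Review"], [{"Hotel": "Plaza", "Review": "quiet"}])  # hypothetical

results = nloj_iter([climates, accommodations, reviews])
print(next(results))  # the first tuple is available before the rest is computed
```

Padding dangling tuples with explicit nulls matters in a chain: a later relation must not join a tuple on an attribute that the tuple only carries as padding.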

Outerjoins Are Not Always Applicable
- It is not always possible to formulate a full disjunction as a left-deep sequence of outerjoins.
- Rajaraman and Ullman [PODS 96]: some full disjunctions cannot be formulated as expressions of outerjoins at all (i.e., even with arbitrary placement of parentheses).


About the Algorithm
- Unlike NLOJ, the next algorithm, PDelayFD, is applicable to all schemes (and not just trees).
- Algorithm PDelayFD has polynomial delay, but the delay is larger than that of NLOJ.
- Nevertheless, PDelayFD by itself is a significant improvement over the state of the art.

Shifting a Maximal JCC Tuple Set
The t-shift of a maximal JCC tuple set T, for a tuple t:
1. Add t to T.
2. Extract the maximal JCC subset containing t.
3. Extend it to a maximal JCC set.

Algorithm PDelayFD
1. Generate a maximal JCC set T0.
2. Insert T0 into Q.
3. Repeat until Q is empty:
   - Move some T from Q to C.
   - Print the join of T, padded with nulls.
   - For all tuples t in the database, insert into Q a t-shift of T, after validating that the t-shift is not already in Q or C.
Theorem: PDelayFD(R) computes FD(R) with polynomial delay.
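The steps of the algorithm and of the t-shift can be sketched in Python as follows. This is a simplified illustration under assumptions: tuple sets are frozensets of indices into a list of dicts (None for null), the extension step is a plain greedy scan, and a single `seen` set stands for the union of Q and C. It follows the outline on the slides but does not reproduce the data structures or the delay analysis of the paper.

```python
from collections import deque

# The seven tuples of the full-disjunction example; None stands for null.
DB = [
    {"Country": "Canada", "Climate": "diverse"},
    {"Country": "UK", "Climate": "temperate"},
    {"Country": "Canada", "City": "Toronto", "Hotel": "Plaza", "Stars": "4"},
    {"Country": "Canada", "City": "London", "Hotel": "Ramada", "Stars": "3"},
    {"Country": "Canada", "City": "London", "Site": "Air Show"},
    {"Country": "Canada", "City": None, "Site": "Mount Logan"},
    {"Country": "UK", "City": "London", "Site": "Buckingham"},
]


def join_consistent(t1, t2):
    return all(t1[a] is not None and t1[a] == t2[a] for a in t1.keys() & t2.keys())


def component_of(db, members, t):
    """Indices in `members` that are connected to t in the join graph."""
    comp, stack = {t}, [t]
    while stack:
        i = stack.pop()
        for j in members:
            if j not in comp and db[i].keys() & db[j].keys():
                comp.add(j)
                stack.append(j)
    return comp


def extend(db, s):
    """Greedily grow the JCC index set s until no further tuple can be added."""
    s = set(s)
    changed = True
    while changed:
        changed = False
        for j in range(len(db)):
            if (j not in s
                    and all(join_consistent(db[j], db[i]) for i in s)
                    and any(db[j].keys() & db[i].keys() for i in s)):
                s.add(j)
                changed = True
    return frozenset(s)


def t_shift(db, T, t):
    """Add t, keep the maximal JCC subset containing t, extend to a maximal JCC set."""
    kept = {i for i in T if join_consistent(db[i], db[t])} | {t}
    return extend(db, component_of(db, kept, t))


def pdelay_fd(db):
    """Yield the maximal JCC sets of db one at a time."""
    if not db:
        return
    first = extend(db, {0})
    queue, seen = deque([first]), {first}  # seen plays the role of Q together with C
    while queue:
        T = queue.popleft()
        yield T
        for t in range(len(db)):
            if t not in T:
                shifted = t_shift(db, T, t)
                if shifted not in seen:
                    seen.add(shifted)
                    queue.append(shifted)


for T in pdelay_fd(DB):
    print(sorted(T))
```

Each yielded index set would then be joined and padded with nulls to give one tuple of the full disjunction.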


NLOJ vs. PDelayFD
[Figure: example scheme graphs over relations R1, ..., R10, labelled NLOJ, PDelayFD and "?"]
Our approach: divide and conquer
- Shorter delays
- Less space
- Simpler to implement

Biconnected Components
Biconnected component: a maximal subset B of relations such that the scheme graph has two (or more) disjoint paths between every two relations of B.
[Figure: a scheme graph over R1, ..., R9 and its division into biconnected components]
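Biconnected components of a scheme graph can be found with the standard depth-first-search technique that keeps a stack of edges and tracks low-points. The sketch below is a textbook-style illustration, not code from the paper; the six-relation graph is hypothetical. Note that, by the usual convention of this algorithm, a bridge (a single edge not on any cycle) is reported as a two-node component.

```python
def biconnected_components(graph):
    """graph: dict mapping each node to a list of neighbours (undirected, simple).

    Returns a list of node sets, one per biconnected component.
    """
    depth, low, stack, comps = {}, {}, [], []

    def dfs(u, parent):
        depth[u] = low[u] = len(depth)        # discovery index of u
        for v in graph[u]:
            if v == parent:
                continue
            if v not in depth:                # tree edge
                stack.append((u, v))
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= depth[u]:        # u separates the subtree of v
                    comp = set()
                    while True:
                        a, b = stack.pop()
                        comp |= {a, b}
                        if (a, b) == (u, v):
                            break
                    comps.append(comp)
            elif depth[v] < depth[u]:         # back edge to an ancestor
                stack.append((u, v))
                low[u] = min(low[u], depth[v])

    for root in graph:
        if root not in depth:
            dfs(root, None)
    return comps


# Hypothetical scheme graph: two triangles joined by the bridge R3 - R4.
graph = {
    "R1": ["R2", "R3"],
    "R2": ["R1", "R3"],
    "R3": ["R1", "R2", "R4"],
    "R4": ["R3", "R5", "R6"],
    "R5": ["R4", "R6"],
    "R6": ["R4", "R5"],
}
for comp in biconnected_components(graph):
    print(sorted(comp))
```

The recursive traversal is adequate for scheme graphs, which have one node per relation; a very deep graph would call for an iterative version.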

Left-Deep Sequence of Outerjoins
R: a set of relations
Theorem: There exists an (efficiently computable) order B1, ..., Bk of the biconnected components of R such that FD(R) = ( ... (FD(B1) ⟗ FD(B2)) ⟗ ... ) ⟗ FD(Bk).
Optimized algorithm:
1. Compute the biconnected components of R.
2. Compute the full disjunction of each component.
3. Apply outerjoins in a suitable order.

BiComNLOJ: a Naïve Attempt
1. Divide R into biconnected components B1, ..., Bk, in a suitable order.
2. Compute FD(B1), ..., FD(Bk) using PDelayFD.
3. Using NLOJ, compute ( ... (FD(B1) ⟗ FD(B2)) ⟗ ... ) ⟗ FD(Bk).
Problem: each FD(Bi) can be exponential in the input, so the delay is non-polynomial.
Solution: use iterators.

Retaining Polynomial Delay: 1st Problem
For simplification, assume only two components, B1 and B2.
- After generating a tuple t of FD(B1), we need to generate all tuples of FD(B2) that can join t.
- The delay is non-polynomial if all of FD(B2) is computed for finding these tuples.
- Solution: PDelayFD can be modified so that it generates only those tuples of FD(B2) that can join t.
Details are in the proceedings.
[Figure: a scheme graph over R1, ..., R8 divided into the components B1 and B2]

Retaining Polynomial Delay: 2nd Problem
For simplification, assume only two components, B1 and B2.
- The last step is to generate all tuples of FD(B2) that cannot be joined with tuples of FD(B1).
- However, this task is by itself NP-hard.
- Solution: when generating all tuples of FD(B2) that can be joined with some tuple of FD(B1), we collect enough information for generating the remaining tuples of FD(B2).
Details are in the proceedings.
[Figure: a scheme graph over R1, ..., R8 divided into the components B1 and B2]


Experimental Setting
- Algorithms: PDelayFD, BiComNLOJ (main), IncrementalFD (CS05, state of the art).
- Implementation: PostgreSQL (open source).
- Hardware: Pentium 4, 1.6 GHz, 512 MB RAM.
- Synthetic data (randomly generated).
- Fixed schemes: S1, S2 and S3, each over the relations R1, ..., R10.
[Figure: the scheme graphs of S1, S2 and S3]

State-of-Art vs. Main Algorithm
[Figure: three plots (Scheme 1, Scheme 2, Scheme 3) of average delay (msec) against the number of tuples in each relation, comparing IncrementalFD (state of the art, CS05) with BiComNLOJ (our main algorithm)]
BiComNLOJ is a substantial improvement over the state of the art.

Division into Biconnected Components
[Figure: three plots (Scheme 1, Scheme 2, Scheme 3) of average delay (msec) against the number of tuples in each relation, comparing PDelayFD (no division into biconnected components) with BiComNLOJ (our main algorithm)]
Division reduces delays (the amount depends on the scheme).

Behavior of Delay
[Figure: delay (msec) against tuple number, comparing IncrementalFD (state of the art, CS05) with BiComNLOJ (our main algorithm)]
- The delay is measured before each generated tuple.
- While IncrementalFD slows down, the delay of BiComNLOJ remains almost constant.


Summary
Full disjunction: an associative extension of the outerjoin operator to an arbitrary number of relations.
Three algorithms for computing FD:
- NLOJ (Nested-Loop Outerjoin): tree-structured schemes.
- PDelayFD (Polynomial-Delay Full Disjunction): general schemes.
- BiComNLOJ: combines the first two and deploys division into biconnected components; general schemes.

Contributions
- Substantial improvement of evaluation time over the state of the art, proved theoretically and experimentally.
- Full disjunctions can be computed with polynomial delay and in time linear in the output size.
- Optimization techniques for computing FDs.
- Implementation within PostgreSQL (ongoing).
- Incorporating our algorithms into an SQL optimizer: e.g., some operators can be pushed through the FD (not discussed here; appears in the proceedings).

Thank you. Questions?