Enumerating Large Query Results Benny Kimelfeld IBM Almaden Research Center Sara Cohen The Hebrew University of Jerusalem Yehoshua Sagiv The Hebrew University.

Presentation transcript:

Enumerating Large Query Results
Benny Kimelfeld (IBM Almaden Research Center), Sara Cohen (The Hebrew University of Jerusalem), Yehoshua Sagiv (The Hebrew University of Jerusalem)
25th International Conference on Data Engineering (ICDE 2009), Shanghai, 2009

Large Query Results
The RESULT is huge (many answers!) and takes time to produce. Typical user reactions: “Can’t you be faster?” and “Bad answers? Maybe a new query?”

Tutorial Goal
In today’s world, users are not willing to wait for answers
–Online querying: provide some (“top-k”) results, then use paging for the remaining results
Previous work on returning top-k answers (heuristics) often does not guarantee:
–Fast runtime
–Best results
In this talk:
–The goal is not to present solutions to specific problems
–The goal is to present general techniques for efficient (ranked) enumeration with guarantees

Overview
Introduction; Lawler-Murty’s Ranked Enumeration; Maximal Answers under Hereditary Properties; Additional Techniques; Summary & Concluding Remarks

Tractability of Enumeration
–Decision algorithm: input x → Yes | No (1 bit of output). Efficient: polynomial time, linear time, log-space, …
–Optimization algorithm: input x → y = opt{ z | property_x(z) } (usually O(|x|) bits of output). Efficient: polynomial time, linear time, log-space, …
–Enumeration algorithm: input x → a_1, a_2, a_3, …, a_{2^|x|}, …, a_m. What is “efficient”?

Standard Notions of Tractability
Combined complexity: input = data + query
→ Often implies that any algorithm must be exponential in the worst case
–That doesn’t help! What meaning can be given to the notion of efficient? What about special cases where the output happens to be small?
Data complexity: fix the query; input = data
→ Often implies a poly. bound on the size of the output
–But then the core problem is missed: the output is no longer “huge”
What else?

Tractability of Enumeration
–Polynomial total time: the running time is polynomial in input + output
–Incremental polynomial time: the delay before answer i is polynomial in input + i
–Polynomial delay: the delay between successive answers is poly(input)
Each notion implies the one listed before it (polynomial delay ⇒ incremental polynomial time ⇒ polynomial total time)
If answers are ranked, we prefer enumeration in ranked order

Examples of Complexity Results (with respect to polynomial total time, incremental polynomial time, and polynomial delay)
–Acyclic CQs [Yannakakis81]
–Acyclic CQs w/ monotonic ORDER BY [KS06]
–Not general CQs! [ChandraMerlin77]
–Full Disjunctions [KanzaS03], [CS05], [C. et al. 06]
–Max Cliques [Johnson et al. 88]
–Loopless paths by inc. weight [Yen71]
–Horn-clause solutions [CreignouHebrard97]

Intuition: What’s the Problem?
We need to create all the answers without needless repetition
→ That is, when we print an answer to the output, we need to validate that it hasn’t been previously printed
Recursive algorithm executions can take exponential time (many sub-answers may lead to empty results; we can’t wait that long…)
When enumerating in ranked order, we cannot generate all answers and then sort
→ Otherwise, we get neither polynomial delay nor incremental polynomial time

Overview
Introduction; Lawler-Murty’s Ranked Enumeration; Maximal Answers under Hereditary Properties; Additional Techniques; Summary & Concluding Remarks

Bottom Line
Lawler-Murty gives a general reduction: Enumeration in ranked order → Optimization under constraints (find the top answer under inclusion & exclusion constraints)
If the optimization takes poly. time, then the enumeration has poly. delay
The optimization is often quite simple (not always!)

Problem Formulation
Input: O = a collection of objects; A = the answers a ⊆ O (huge, implicitly described by a constraint over O’s subsets); score(), where score(a) is high when a is a high-quality answer
Goal: Enumerate all the answers in ranked order (that is, by decreasing score)
Required complexity: polynomial delay
Special case: top-k answers

Example 1: Graph Search
Input: data graph G, set of keywords K
O = the nodes of the graph G
A = node sets a of size |K| that contain all the keywords of K
score(a) = 1 / (min size of a subtree containing a)

Example 2: k-Best Perfect Matchings
Input: weighted, bipartite graph G
O = the edges of the graph G
A = matchings: sets of edges that are pairwise disjoint and cover all nodes
score(a) = ∑_{e ∈ a} weight(e)

Example 3: Ranked Queries
Input: database D, query Q
O = mappings (query symbol → DB item)
A = matches a of the query in the database
score(a): IR / ORDER BY / …

What’s the Problem?
1st (top) answer: an optimization problem. Assumption: efficiently solvable
2nd answer: ?
k-th answer: ≠ the previous (k−1) answers, and best among the remaining answers. How to handle this constraint? Conceivably, this is much more complicated than finding the 1st answer
Moreover, k may be very large!

Lawler-Murty’s Method [K. G. Murty, 1968] [E. L. Lawler, 1972]

1. Find & Print the Top Answer
In principle, at this point we should find the second-best answer. But instead…

2. Partition the Remaining Answers
Each partition is defined by a distinct set of simple constraints

3. Find the Top of each Set

4. Find & Print the Second Answer
Next answer: best among all the top answers in the partitions

5. Further Divide the Chosen Partition
… and so on …

A Partition is Defined by Constraints
Two types of constraints:
–Inclusion constraint: “Must contain [object]”
–Exclusion constraint: “Must not contain [object]”
A partition is defined by a set I ∪ E of inclusion and exclusion constraints
Recall: a is the top answer of the partition I ∪ E. How to further partition after removing a?

Partitioning a Partition
a = top(partition). After a is removed, the remainder of the partition I ∪ E is split into sub-partitions P1 = (I1, E1), P2 = (I2, E2), …, P5 = (I5, E5); each sub-partition extends I with further inclusion constraints (✓) and E with further exclusion constraints (✗)

Complementary Details
A partition is represented as a triple (I, E, a)
–I and E are sets of inclusion and exclusion constraints, resp. (lists of objects); a is the top answer of the partition
Current triples (I, E, a) are stored in a priority queue Q, prioritized by score(a)
–Initially, Q = { (∅, ∅, a_opt) }
In each iteration:
–The top triple (I_t, E_t, a_t) is extracted from Q
–The answer a_t is printed
–The new nonempty sub-partitions (I_i, E_i, a_i) are inserted into Q
… until Q is empty
–Top-k: until k answers have been printed
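The loop described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: `top_answer(include, exclude)` is a hypothetical problem-specific optimizer, the brute-force `brute_force_top` stands in for it on a toy instance, and the sub-partitions are built with one common scheme (agree with the printed answer on a growing prefix of the unconstrained objects, then disagree on the next one).

```python
import heapq
from itertools import combinations, count


def lawler_murty(objects, top_answer, score):
    """Yield answers (frozensets of objects) by decreasing score.

    top_answer(include, exclude) is an assumed optimizer: it returns the best
    answer a with include <= a and a & exclude == {}, or None if none exists.
    """
    objects = list(objects)
    tie = count()   # tie-breaker so the heap never compares frozensets
    queue = []      # entries: (-score, tie, I, E, top answer of the partition)

    def push(inc, exc):
        a = top_answer(inc, exc)
        if a is not None:   # only nonempty sub-partitions are stored
            heapq.heappush(queue, (-score(a), next(tie), inc, exc, a))

    push(frozenset(), frozenset())
    while queue:
        _, _, inc, exc, a = heapq.heappop(queue)
        yield a
        # Split (partition minus {a}): each sub-partition agrees with a on the
        # objects fixed so far and disagrees with a on the next free object.
        for o in objects:
            if o in inc or o in exc:
                continue
            if o in a:
                push(inc, exc | {o})
                inc = inc | {o}
            else:
                push(inc | {o}, exc)
                exc = exc | {o}


def brute_force_top(answers, score):
    """Toy stand-in for the optimizer: scan an explicit list of answers."""
    def top_answer(inc, exc):
        ok = [a for a in answers if inc <= a and not (a & exc)]
        return max(ok, key=score) if ok else None
    return top_answer


# Toy instance (made up for illustration): answers are all 2-element subsets.
WEIGHTS = {"a": 5, "b": 3, "c": 1}
ANSWERS = [frozenset(p) for p in combinations(WEIGHTS, 2)]


def total_weight(answer):
    return sum(WEIGHTS[o] for o in answer)


RANKED = list(lawler_murty(WEIGHTS, brute_force_top(ANSWERS, total_weight), total_weight))
```

For top-k, the caller can simply stop consuming the generator after k answers. In a real use, `brute_force_top` would be replaced by a polynomial-time constrained optimizer, which is what gives polynomial delay.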

Enumeration → Optimization
In the bottom line, Lawler-Murty gives a reduction: Enumeration in ranked order → Optimization under constraints (find the top answer under inclusion & exclusion constraints)
If the optimization takes poly. time, then the enumeration has poly. delay
The optimization is often quite simple (not always!)
Example: Perfect matchings by decreasing weight

Perfect Matchings by Dec. Weight
(In the figure, edges have weights, which are not specified.)
Top matching: a perfect matching such that the total sum of edge weights is maximal
Efficiently solvable: Hungarian Algorithm (1955), Blossom Algorithm (1965), …

Max. Perfect Matching w/ Constraints
Some edges carry exclusion (✗) constraints and some carry inclusion (✓) constraints

Handling Exclusion Constraints
Excluded edges are simply removed!

Handling Inclusion Constraints
Non-inclusion edges incident to nodes of inclusion edges are removed

That’s All!
It is now the original problem (w/o constraints)! So, we can use the Hungarian Algorithm, the Blossom Algorithm, …
→ Perfect matchings by decreasing weight
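A minimal sketch of the two edge-removal rules, assuming a bipartite graph given as a dictionary from `(left, right)` pairs to weights. The function names and the toy weights are hypothetical, and the brute-force `max_perfect_matching` is only a stand-in for the Hungarian or Blossom algorithm so that the example stays self-contained.

```python
from itertools import permutations


def apply_constraints(weights, include, exclude):
    """Return the edge weights left after applying the constraints.

    weights: dict {(left, right): weight}; include/exclude: sets of edges.
    """
    pinned_left = {u for (u, _) in include}
    pinned_right = {v for (_, v) in include}
    reduced = {}
    for (u, v), w in weights.items():
        if (u, v) in exclude:
            continue  # excluded edges are simply removed
        if (u, v) not in include and (u in pinned_left or v in pinned_right):
            continue  # non-inclusion edge incident to a node of an inclusion edge
        reduced[(u, v)] = w
    return reduced


def max_perfect_matching(weights, left, right):
    """Brute-force stand-in for an unconstrained max-weight perfect matching.

    Assumes len(left) == len(right); returns a frozenset of edges or None.
    """
    best, best_weight = None, None
    for perm in permutations(right):
        edges = list(zip(left, perm))
        if all(e in weights for e in edges):
            total = sum(weights[e] for e in edges)
            if best_weight is None or total > best_weight:
                best, best_weight = frozenset(edges), total
    return best


# Toy instance (weights made up for illustration).
LEFT, RIGHT = [1, 2, 3], ["x", "y", "z"]
W = {(1, "x"): 9, (1, "y"): 2, (1, "z"): 1,
     (2, "x"): 4, (2, "y"): 8, (2, "z"): 1,
     (3, "x"): 1, (3, "y"): 3, (3, "z"): 7}

UNCONSTRAINED = max_perfect_matching(W, LEFT, RIGHT)
CONSTRAINED = max_perfect_matching(
    apply_constraints(W, include={(2, "y")}, exclude={(1, "x")}), LEFT, RIGHT)
```

After the reduction, every node of an inclusion edge has that edge as its only remaining edge, so any perfect matching of the reduced graph is forced to use it.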

More: Keyword Proximity Search
Lawler-Murty’s method was used in [KS06] for solving the problem of keyword proximity search
–Input: data graph G, set of keywords K
–Answers: non-redundant subtrees of G that contain K
–Score: 1/(total weight)
In other words, “top-k Steiner trees”
Two problems:
–Optimization w/ constraints is NP-hard, even for 2 keywords. Solution: constraints are carefully constructed so that only tractable constraints are generated
–No bound on the number of keywords → NP-hard even w/o constraints. A bound on the size of K is often reasonable (data complexity); otherwise, approximations can be used

Ranked vs. Approximate Order
Ranked order: if answer a is printed before answer b, then score(a) ≥ score(b)
C-approximate order: if answer a is printed before answer b, then score(b) ≤ C · score(a)

Generalized Lawler-Murty
Lawler-Murty’s reduction can be generalized: Enumeration in a C-approximate ranked order → Approximate optimization (find a C-approximation of the top answer under inclusion & exclusion constraints)
If the approximation takes poly. time, then the enumeration has poly. delay

Overview
Introduction; Lawler-Murty’s Ranked Enumeration; Maximal Answers under Hereditary Properties; Additional Techniques; Summary & Concluding Remarks

Bottom Line
In the bottom line, the Hereditary Properties algorithm gives a reduction: Enumeration → Input-restricted enumeration
–If poly. time then poly. delay
–If inc. poly. time then inc. poly. time
–If poly. total time then poly. total time
The restricted problem is often quite simple (not always!)

Problem Formulation
Input: O = a collection of objects; P = a property that is (1) polynomially verifiable and (2) hereditary or connected-hereditary
A = the answers a ⊆ O: the maximal subsets of O that satisfy the property P
Goal: Enumerate all the maximal answers efficiently

Maximal Answers: Details
Given P and O, a subset a of O is a maximal answer if:
1. a satisfies P, and
2. there is no additional object o that can be added to a while preserving P

Maximal Answers: Full Disjunctions
Full disjunction = generalization of the outer-join operator
P = “join consistent and connected”
Climates:
  Country   Climate
  Canada    diverse
  Bahamas   tropical
  UK        temperate
Accommodations:
  Country   City      Hotel    Stars
  Canada    Toronto   Plaza    4
  Canada    London    Ramada   3
  Bahamas   Nassau    Hilton   (null)
Sites:
  Country   City      Site
  Canada    London    Air Show
  Canada    (null)    Mount Logan
  UK        London    Buckingham
  UK        London    Hyde Park

Types of Properties
P is polynomially verifiable if we can check in polynomial time whether a set a satisfies P
P is hereditary if: a satisfies P ⇒ a' satisfies P, ∀ a' ⊆ a
Suppose there is a binary relationship defined over the objects (i.e., they are graph nodes). P is connected hereditary if: a satisfies P ⇒ a is connected, and a' satisfies P ∀ connected a' ⊆ a
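To make the definitions concrete, here is a small sketch for one hereditary property, “is a clique”, together with a greedy extension of a set to a maximal answer. The names `is_clique` and `extend_to_maximal` and the toy graph are illustrative assumptions. A single pass suffices only because the property is hereditary: an object rejected against a smaller set would also violate P against any superset. For connected-hereditary properties, repeated passes would be needed.

```python
from itertools import combinations


def is_clique(nodes, edges):
    """Polynomially verifiable check: every pair of nodes is adjacent."""
    return all(frozenset(pair) in edges for pair in combinations(nodes, 2))


def extend_to_maximal(start, objects, satisfies):
    """Greedily extend `start` (assumed to satisfy P) to a maximal answer.

    Assumes P is hereditary, so one pass over the objects is enough.
    """
    answer = set(start)
    for o in objects:
        if o not in answer and satisfies(answer | {o}):
            answer.add(o)
    return frozenset(answer)


# Toy graph (made up): a triangle 1-2-3 plus a pendant edge 3-4.
EDGES = {frozenset(e) for e in [(1, 2), (2, 3), (1, 3), (3, 4)]}
NODES = [1, 2, 3, 4]


def clique_property(nodes):
    return is_clique(nodes, EDGES)


FROM_EMPTY = extend_to_maximal(set(), NODES, clique_property)
FROM_FOUR = extend_to_maximal({4}, NODES, clique_property)
```

Here the empty set grows into the triangle, while starting from node 4 yields the maximal clique formed by the pendant edge.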

Examples of Properties
Hereditary properties:
–Is a Clique
–Is a Bipartite Matching
–Is a Forest
–Is 3-colorable (not polynomially verifiable)
Connected-hereditary properties:
–Is a Tree
–Is Join Consistent and Connected
–Is Homomorphic to a Subtree of a given Labeled Tree

Example 1: Full Disjunctions
Input: FD query Q, database D
O = the tuples of D from the relations in Q
A = maximal sets a of join consistent and connected tuples
P = “join consistent and connected”

Example 2: Maximal Tree Answers
Input: tree query Q, XML doc X
O = the nodes of X
A = maximal sets a of nodes, such that a induces a subtree homomorphic to a subtree of Q
P = “homomorphic to a subtree of Q”

Example 3: Maximal Bipartite Matchings
Input: bipartite graph G
O = the edges of G
A = maximal matchings
P = “is a matching”

Example 4: Maximal Cliques
Input: graph G
O = the nodes of G
A = maximal cliques
P = “is a clique”

Strategy
Recall:
–Lawler-Murty’s method reduces the enumeration problem to an optimization problem
–Runtime: polynomial delay if the optimization is in polynomial time
For this problem:
–Reduce the enumeration problem to a restricted version
–Runtime depends on that of the restricted enumeration problem

The Restricted Version
Input: O = a collection of objects that almost satisfies P; P = a property that is (1) polynomially verifiable and (2) hereditary or connected-hereditary
A = the answers a ⊆ O: the maximal subsets of O that satisfy the property P
Goal: Enumerate all the maximal answers efficiently

Almost Satisfies
O almost satisfies P if there is an object o ∈ O such that O − { o } satisfies P

Example: Maximal Bipartite Matchings
The set of edges in the figure almost satisfies “is a bipartite matching”. Enumeration problem: find all maximal bipartite matchings from this set of edges
The figure then highlights one maximal bipartite matching, and then another maximal bipartite matching
For the restricted enumeration problem there are always at most 2 answers, and they can be found in polynomial time
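The “at most 2 answers” claim can be checked with a short sketch. It assumes edges are `(left, right)` pairs, that `matching` is already a matching, and that `new_edge` is the single extra edge not yet in it. The function name is hypothetical.

```python
def restricted_maximal_matchings(matching, new_edge):
    """Maximal matchings inside matching ∪ {new_edge}.

    `matching` is assumed to be a valid bipartite matching of (left, right)
    pairs, and `new_edge` is assumed not to belong to it.
    """
    matching = frozenset(matching)
    u, v = new_edge
    # Edges of the matching that share an endpoint with the new edge
    # (at most two: one through u and one through v).
    conflicts = {(a, b) for (a, b) in matching if a == u or b == v}
    if not conflicts:
        return [matching | {new_edge}]      # the whole set is a matching
    return [matching,                        # answer 1: drop the new edge
            (matching - conflicts) | {new_edge}]  # answer 2: keep it


M = {(1, "x"), (2, "y"), (3, "z")}
TWO_ANSWERS = restricted_maximal_matchings(M, (1, "y"))
ONE_ANSWER = restricted_maximal_matchings({(1, "x")}, (2, "y"))
```

Both answers are maximal within the restricted input: the first cannot take the new edge, and the second cannot take back any edge that conflicts with it.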

Reduction Complexity Results
Given an algorithm that solves the restricted version, we will show an algorithm that solves the general (unrestricted) version
Complexity of Restricted Version → Complexity of Unrestricted Version:
–Poly. Total Time → Poly. Total Time
–Inc. Poly. Time → Inc. Poly. Time
–Polynomial → Poly. Delay

Our Method [CKS, to appear]

1. Find & Print & Store a Maximal Answer
One answer can always be found in polynomial time. Now we should look for another answer. But instead…

2. For each remaining object (not in the set)
Add the object to the set of items found, to create a restricted version U of the problem

3. Enumerate Solutions to the Restricted Problem
Enumerate the solutions for U (this is the reduction)

4. For each Solution to the Restricted Problem
Maximally extend it to get an answer to the original problem (extending is always polynomial)
… and so on …
Continue to enumerate answers to U. Continue to add other nodes from O to form new sets for U

Some More Details
Each answer generated must be stored, so that we do not repeat answers
–Use an index structure
Printing actually happens at different points, depending on the parity of the level of recursion
–This allows for polynomial delay
Memory-efficient versions
–For hereditary properties we have a memory-efficient version

DB Problems for which this is Useful
In the context of incomplete information
–Look for maximal answers, not complete answers
Full disjunctions in poly. delay
–A generalization of the outer-join to any number of relations
Maximal matches to tree queries in poly. delay

Ranked Order
Question: Can we return answers efficiently in ranked order? Answer: In general, no
The algorithm returns answers in arbitrary order
Cannot return answers with more objects first
–Ranking function is the number of objects in the set
–Famous result on NP-hardness of node-deletion for hereditary and connected-hereditary properties [Lewis, Yannakakis, STOC 78]
Can return in ranked order for monotonically c-determined ranking functions (details omitted)

Enumeration → Restricted Enumeration
In the bottom line, the Hereditary Properties algorithm gives a reduction: Enumeration → Input-restricted enumeration
–If poly. time then poly. delay
–If inc. poly. time then inc. poly. time
–If poly. total time then poly. total time
The restricted problem is often quite simple (not always!)

Overview
Introduction; Lawler-Murty’s Ranked Enumeration; Maximal Answers under Hereditary Properties; Additional Techniques; Summary & Concluding Remarks

Bottom Line
In the bottom line, the technique shown next gives a reduction: Enumeration → Decision of non-emptiness under constraints
If the decision takes poly. time, then the enumeration has poly. delay
The decision problem is often quite simple (not always!)

Recursive Partition of the Output
Input: O = a collection of objects; A = the answers a ⊆ O
Enumeration algorithm:
–Choose an object o ∈ O
–Enumerate all the answers that contain o (✓)
–Enumerate all the answers that do not contain o (✗)
We need an algorithm for a generalized problem

Generalized Enumeration
Goal: Enumerate all the answers a, s.t. I ⊆ a and a ∩ E = ∅
Enumerate(I, E):
  If I ∪ E = O, then print(I) and return; otherwise:
  Choose an object o ∈ O – (I ∪ E)
  If ≥1 answers satisfy I ∪ { o }, E → Enumerate(I ∪ { o }, E)
  If ≥1 answers satisfy I, E ∪ { o } → Enumerate(I, E ∪ { o })
Without the two tests, the set of answers of a recursive call can be empty: exponential delay! With the tests, I & E are always satisfiable!

Reduction: Enumeration → Non-emptiness
Enumerate(I, E):
  If I ∪ E = O, then print(I) and return; otherwise:
  Choose an object o ∈ O – (I ∪ E)
  If ≥1 answers satisfy I ∪ { o }, E → Enumerate(I ∪ { o }, E)
  If ≥1 answers satisfy I, E ∪ { o } → Enumerate(I, E ∪ { o })
Are the two tests in poly. time? Then we get polynomial delay!
In the bottom line, we get a reduction: Enumerate A with polynomial delay → Decide if an answer that satisfies I and E exists
Often, o should be carefully chosen…
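The Enumerate(I, E) procedure maps naturally onto a Python generator. In this sketch, `nonempty(include, exclude)` is an assumed decision oracle, and the brute-force `make_oracle` over an explicit answer list is only a stand-in for a real polynomial-time test. The object `o` is chosen naively as the first free object, whereas the slide notes that the choice often matters.

```python
def enumerate_answers(objects, nonempty, inc=frozenset(), exc=frozenset()):
    """Yield every answer a with inc <= a and a & exc == {}.

    Precondition: at least one such answer exists (I & E are satisfiable).
    nonempty(inc, exc) is an assumed oracle deciding that precondition.
    """
    free = [o for o in objects if o not in inc and o not in exc]
    if not free:
        yield inc  # I ∪ E = O, so I itself is the single remaining answer
        return
    o = free[0]    # naive choice of the next object to branch on
    if nonempty(inc | {o}, exc):
        yield from enumerate_answers(objects, nonempty, inc | {o}, exc)
    if nonempty(inc, exc | {o}):
        yield from enumerate_answers(objects, nonempty, inc, exc | {o})


def enumerate_all(objects, nonempty):
    """Top-level call: check satisfiability once, then recurse."""
    if nonempty(frozenset(), frozenset()):
        yield from enumerate_answers(list(objects), nonempty)


def make_oracle(answers):
    """Brute-force stand-in for the non-emptiness decision procedure."""
    def nonempty(inc, exc):
        return any(inc <= a and not (a & exc) for a in answers)
    return nonempty


# Toy answer set (made up for illustration).
TOY_ANSWERS = [frozenset({1, 2}), frozenset({2, 3}), frozenset({3})]
FOUND = list(enumerate_all([1, 2, 3], make_oracle(TOY_ANSWERS)))
```

Because every recursive call is guarded by the oracle, each branch that is entered leads to at least one printed answer, so the work between two consecutive answers is bounded by a polynomial number of oracle calls.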

Enumeration → Decision
In the bottom line, the technique gives a reduction: Enumeration → Decision of non-emptiness under constraints
If the decision takes poly. time, then the enumeration has poly. delay
The decision problem is often quite simple (not always!)

Comparison with Lawler-Murty
Lawler-Murty vs. Recursive Partition:
–Order: enumeration in ranked order vs. no order (except for very specific cases)
–Delay: polynomial delay for both (usually shorter delay for recursive partition!!)
–Reduces to: optimization under constraints vs. nonemptiness under constraints
–Space: cost can be linear in the output, possibly exp(input), vs. PSPACE

Recursion into Sub-Problems
Input: O = a collection of objects
Problem! A recursive call on a sub-problem can involve exp(input) work

Iterators over Poly-Delay Algorithms
Problem: Recursive calls enumerate many sub-answers (which are not final answers)
–We cannot let the recursive method call terminate!
Idea: Enumeration algorithm as an iterator (also called a co-routine)
–iterator.first(): Start the execution until the first answer is generated, and yield
–iterator.next(): Resume execution from the last output, until the next output (or termination); then yield
Now, instead of recursive method calls, use recursive iterators …
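Python generators behave like the iterators described above: the body runs until the next `yield` and is resumed on demand. The sketch below is a made-up illustration, not the construction used in the cited work. `sub_answers` is a hypothetical sub-problem with exponentially many results, and `final_answers` turns each sub-answer into a final answer as soon as it arrives instead of waiting for the recursive call to finish.

```python
from itertools import islice


def sub_answers(n):
    """Hypothetical sub-problem: lazily yield all 2**n subsets of range(n)."""
    def rec(i, chosen):
        if i == n:
            yield frozenset(chosen)
            return
        yield from rec(i + 1, chosen + [i])  # sub-answers that contain i
        yield from rec(i + 1, chosen)        # sub-answers that do not contain i
    return rec(0, [])


def final_answers(n):
    """Consume the sub-iterator one answer at a time (first()/next())."""
    for sub in sub_answers(n):
        # Each sub-answer is completed into a final answer immediately.
        yield sub | {"root"}


# Only three steps of the 2**60-answer sub-enumeration are ever executed.
FIRST_THREE = list(islice(final_answers(60), 3))
```

A plain recursive call that first collected all sub-answers in a list would never return here, while the iterator version produces its first answers right away.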

Sub-Problems + Iterators
Input: O = a collection of objects
Each sub-problem is enumerated by its own iterator (Iterator 1, Iterator 2), driven through first() and next()

Past Uses of Techniques
Keyword Proximity Search: unranked enumeration [KS05]
–Used both techniques discussed
–Can be combined with heuristics to get efficient heuristically ranked enumeration
Full disjunctions [C. et al. 06]
–Used iterators
Maximally joining probabilistic relations [KS07]
–Recursive partition (in a non-trivial fashion)

Overview
Introduction; Lawler-Murty’s Ranked Enumeration; Maximal Answers under Hereditary Properties; Additional Techniques; Summary & Concluding Remarks

Summary
Complexity classes
–Polynomial total time
–Incremental polynomial time
–Polynomial delay

Summary
General frameworks for solving enumeration problems
–Lawler-Murty: Reduction to optimization
–Hereditary properties: Reduction to a special case of enumeration
–Recursive partition: Reduction to a decision problem
–Iterators

In theory, theory and practice are the same. In practice, they are not.

These have been implemented and work!
Examples: Full disjunctions, Keyword proximity search (approximate, ranked)

Conclusion: Take Home Message 1
Analyze enumeration problems using complexity classes appropriate for enumeration

Conclusion: Take Home Message 2
Frameworks presented may be usable for your problem: plug and play
–Allows you to focus on solving “standard” types of problems

Thank you! Questions?