ICDT 2005 An Abstract Framework for Generating Maximal Answers to Queries Sara Cohen, Yehoshua Sagiv.



Outline
- Motivation
- Queries and Databases
- Answers and Semantics
- Graph Properties

The Problem
- In many different domains, we are given the option to query some source of information
- Usually, the user only gets results if the query can be completely answered (satisfied)
- In many domains, this is not appropriate, e.g.,
  - The user is not familiar with the database
  - The database does not contain complete information
  - There is a mismatch between the ontology of the user and that of the database
  - The query is a search that is not expected to be correct

Example: Search for papers by Smith that appeared in ICDT 2004
Response: "Sorry, no matching record found"

Example: Search for buses from Haifa-Technion to Ben Gurion Airport
Response: "There is no direct bus line between the required destinations"

Example: Search for buses to Ben Gurion Airport
Response: "Must choose From and To"

What Do Users Need?
- Users need a way to get interesting partial answers to their queries, especially if a complete answer does not exist
- These partial answers should contain maximal information
- Main problems:
  - What should be the semantics of partial answers?
  - How can all partial answers be efficiently computed?

Previous Work
- Many solutions have been given for the main problems; the solutions differ according to the problem domain
- Examples:
  - Full disjunctions: Galindo-Legaria (94), Rajaraman, Ullman (96), Kanza, Sagiv (03)
  - Queries with incomplete answers over semistructured data: Kanza, Nutt, Sagiv (99)
  - FleXPath: Amer-Yahia, Lakshmanan, Pandit (04)
  - Interconnections: Cohen, Kanza, Sagiv (03)

Our Contribution
- In the past, for each semantics considered, the query evaluation problem had to be studied anew
- In this paper, we:
  - Present a general framework for defining semantics for partial answers
    - The framework is general enough to cover most previously studied semantics
    - The query evaluation problem can be solved once within this framework and reused for new semantics
    - The results improve upon previous evaluation algorithms
  - Present the relationship between this problem and the maximal P-subgraph problem


Databases
- Databases are modeled as data graphs: (V, E, r, l_V, l_E)
  - r: Can have a designated root
  - l_V: Labels on the vertices
  - l_E: Labels on the edges
- Note:
  - Nodes correspond to data items
  - Even databases that do not have an inherent graph structure can be modeled as graphs, e.g., relational databases

XML as a Data Graph
[Figure: an XML document drawn as a data graph. Element labels: University, Name, Dept, Faculty, Professor, Lecturer, Teaches. Values: Technion, Computer Science, Chana Israeli, Databases, Bioinformatics, Avi Levy, Biology, Molecular Biology.]

Relational Database as a Data Graph

Climates (C)
  Country   Climate
  Canada    diverse
  UK        temperate
  USA       temperate

Accommodations (A)
  Country   City       Hotel
  UK        London     Plaza
  Canada    Montreal   Hilton
  Canada    Toronto    Ramada

Sites (S)
  Country   City     Site
  UK        London   Buckingham
  USA       NY       Metropolitan

Each tuple becomes a node of the data graph, labeled with its relation and its values:
  (C, (Canada, diverse))
  (C, (UK, temperate))
  (C, (USA, temperate))
  (A, (UK, London, Plaza))
  (A, (Canada, Montreal, Hilton))
  (A, (Canada, Toronto, Ramada))
  (S, (UK, London, Buckingham))
  (S, (USA, NY, Metropolitan))
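
The relational example can be sketched in a few lines of Python. This is only an illustration under assumptions: the names `RELATIONS` and `data_graph_nodes` are hypothetical, and the sketch builds nodes only, since the slides list the nodes of this data graph but do not show its edges.

```python
# Hypothetical encoding of the three relations from the slides,
# keyed by the relation letter used in the node labels.
RELATIONS = {
    "C": [("Canada", "diverse"), ("UK", "temperate"), ("USA", "temperate")],
    "A": [("UK", "London", "Plaza"),
          ("Canada", "Montreal", "Hilton"),
          ("Canada", "Toronto", "Ramada")],
    "S": [("UK", "London", "Buckingham"), ("USA", "NY", "Metropolitan")],
}


def data_graph_nodes(relations):
    """Return one data-graph node per tuple, labeled (relation letter, tuple)."""
    return [(name, row) for name, rows in relations.items() for row in rows]


NODES = data_graph_nodes(RELATIONS)
for node in NODES:
    print(node)
```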

Queries
- Queries are modeled as query graphs: (V, E, r, C_V, C_E, s)
  - r: Can have a designated root
  - C_V: Vertex constraints on the vertices (basically, a boolean function on vertices)
  - C_E: Edge constraints on the edges (basically, a boolean function on pairs of vertices)
  - s: A structural constraint, one of the letters C, R, N (defines the required structure of answers, i.e., connected, rooted or none)
- Note: Nodes correspond to query variables

XML Query as a Graph
Returns faculty members from the Biology Department
[Figure: query graph with vertex constraints "= University", "= Dept and ContainsText(Biology)", "= Faculty", "= Name" and edge constraints "Is Descendant", "Is GrandChild", "Is Child".]
Structural constraint: Rooted

Join Query as a Graph
- Query nodes:
  - q1, Belongs to: C
  - q2, Belongs to: A
  - q3, Belongs to: S
- Edge constraints:
  - (q1, q2): C.Country = A.Country
  - (q1, q3): C.Country = S.Country
  - (q2, q3): A.Country = S.Country and A.City = S.City
- Structural constraint: Connected


Assignment Graphs
- Assignment graphs are used to compactly represent assignments of query nodes to database nodes
- Basically, the assignment graph for Q and D, written Q D, has:
  - A node (q, d) for each pair q ∈ Q and d ∈ D such that d satisfies the constraint on q
  - An edge ((q, d), (q', d')) if there is an edge (q, q') in Q and (d, d') satisfies the constraint on (q, q')
  - May also have a root (details omitted)

[Figure, built up over several slides: the join query graph (q1 belongs to C, q2 belongs to A, q3 belongs to S, with the edge constraints of the join query) next to the eight database tuples, labeled c1, c2, c3, a1, a2, a3, s1, s2. The assignment graph is added step by step, ending with the nodes (q1, c1), (q1, c2), (q1, c3), (q2, a1), (q2, a2), (q2, a3), (q3, s1), (q3, s2).]
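
The assignment-graph construction for the join example can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that the labels c1..s2 follow the order in which the tuples are listed, and that the third edge constraint compares A.Country with S.Country. All identifiers (`DATABASE`, `QUERY_NODES`, `QUERY_EDGES`, `build_assignment_graph`) are hypothetical.

```python
# Database nodes: label -> (relation letter, tuple as a dict).
DATABASE = {
    "c1": ("C", {"Country": "Canada", "Climate": "diverse"}),
    "c2": ("C", {"Country": "UK", "Climate": "temperate"}),
    "c3": ("C", {"Country": "USA", "Climate": "temperate"}),
    "a1": ("A", {"Country": "UK", "City": "London", "Hotel": "Plaza"}),
    "a2": ("A", {"Country": "Canada", "City": "Montreal", "Hotel": "Hilton"}),
    "a3": ("A", {"Country": "Canada", "City": "Toronto", "Hotel": "Ramada"}),
    "s1": ("S", {"Country": "UK", "City": "London", "Site": "Buckingham"}),
    "s2": ("S", {"Country": "USA", "City": "NY", "Site": "Metropolitan"}),
}

# Vertex constraints: each query node must be mapped to a tuple of this relation.
QUERY_NODES = {"q1": "C", "q2": "A", "q3": "S"}

# Edge constraints: boolean functions on pairs of database tuples.
QUERY_EDGES = {
    ("q1", "q2"): lambda c, a: c["Country"] == a["Country"],
    ("q1", "q3"): lambda c, s: c["Country"] == s["Country"],
    ("q2", "q3"): lambda a, s: a["Country"] == s["Country"] and a["City"] == s["City"],
}


def build_assignment_graph(query_nodes, query_edges, database):
    """Return (nodes, edges) of the assignment graph of a query and a database."""
    # One node (q, d) for every database node d that satisfies the constraint on q.
    nodes = [(q, d) for q, relation in query_nodes.items()
             for d, (d_relation, _) in database.items() if d_relation == relation]
    edges = set()
    for (qa, qb), holds in query_edges.items():
        for q1, d1 in nodes:
            for q2, d2 in nodes:
                # The constraint is evaluated only for pairs that match the query edge.
                if (q1, q2) == (qa, qb) and holds(database[d1][1], database[d2][1]):
                    edges.add(((q1, d1), (q2, d2)))
    return nodes, edges


NODES, EDGES = build_assignment_graph(QUERY_NODES, QUERY_EDGES, DATABASE)
print(len(NODES), "nodes,", len(EDGES), "edges")
```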

Partial Assignment
- A partial assignment is any subgraph of Q D that does not contain two different nodes (q, d) and (q, d')
  - Otherwise, it would map the node q to two different database nodes
- Can distinguish special types of partial assignments:
  - Vertex complete: every query node must appear in the partial assignment
  - Edge complete: every edge constraint between query variables in the partial assignment holds
  - Structurally consistent: the partial assignment satisfies the query's structural constraint
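
The three properties can be written as small predicates. The sketch below is illustrative only. It hard-codes the assignment-graph edges of the join example (derived under the assumption that tuple labels follow listing order), treats a partial assignment as the subgraph induced by a set of nodes, and handles only the "connected" structural constraint. All names are hypothetical.

```python
QUERY_NODES = {"q1", "q2", "q3"}
QUERY_EDGES = {("q1", "q2"), ("q1", "q3"), ("q2", "q3")}

# Undirected edges of the assignment graph for the join example.
ASSIGNMENT_EDGES = {frozenset(e) for e in [
    (("q1", "c1"), ("q2", "a2")), (("q1", "c1"), ("q2", "a3")),
    (("q1", "c2"), ("q2", "a1")), (("q1", "c2"), ("q3", "s1")),
    (("q1", "c3"), ("q3", "s2")), (("q2", "a1"), ("q3", "s1")),
]}


def is_partial_assignment(nodes):
    """No query node may be mapped to two different database nodes."""
    query_nodes = [q for q, _ in nodes]
    return len(query_nodes) == len(set(query_nodes))


def vertex_complete(nodes):
    """Every query node appears in the partial assignment."""
    return {q for q, _ in nodes} == QUERY_NODES


def edge_complete(nodes):
    """Every query edge between two assigned query nodes is satisfied."""
    chosen = dict(nodes)  # query node -> database node
    return all(
        frozenset({(qa, chosen[qa]), (qb, chosen[qb])}) in ASSIGNMENT_EDGES
        for qa, qb in QUERY_EDGES if qa in chosen and qb in chosen
    )


def structurally_consistent(nodes):
    """Connected constraint: the induced subgraph must be connected."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in nodes - seen:
            if frozenset({u, v}) in ASSIGNMENT_EDGES:
                seen.add(v)
                stack.append(v)
    return seen == nodes


full = {("q1", "c2"), ("q2", "a1"), ("q3", "s1")}
pair = {("q1", "c1"), ("q2", "a2")}
mixed = {("q1", "c3"), ("q2", "a1"), ("q3", "s2")}
for name, pa in [("full", full), ("pair", pair), ("mixed", mixed)]:
    print(name, vertex_complete(pa), edge_complete(pa), structurally_consistent(pa))
```

Here `full` has all three properties, `pair` is edge complete and connected but not vertex complete, and `mixed` is vertex complete only.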

Example
[Figure: three partial assignments drawn over the assignment graph nodes (q1, c1), (q1, c2), (q1, c3), (q2, a1), (q2, a2), (q2, a3), (q3, s1), (q3, s2). Each is annotated with the properties Vertex Complete, Edge Complete, and Structurally Consistent.]

Semantics
- All partial assignments for Q over D that satisfy the vertex and edge constraints are encoded in Q D
- A semantics defines which subgraphs of the assignment graph (i.e., which partial assignments) are in fact answers, e.g.,
  - S_ves allows all partial assignments that are vertex complete, edge complete and structurally consistent
  - S_es allows all partial assignments that are edge complete and structurally consistent
  - S_s allows all partial assignments that are structurally consistent
- Usually, we are only interested in maximal partial assignments

Example: Join
[Figure: the assignment graph with nodes (q1, c1), (q1, c2), (q1, c3), (q2, a1), (q2, a2), (q2, a3), (q3, s1), (q3, s2).]
Using semantics S_ves we get the natural join

Example: Join becomes a Full Disjunction
[Figure: the assignment graph with nodes (q1, c1), (q1, c2), (q1, c3), (q2, a1), (q2, a2), (q2, a3), (q3, s1), (q3, s2).]
Using semantics S_es we get the full disjunction
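
The difference between the two semantics can be checked by brute force on the join example. The sketch below enumerates every subset of assignment-graph nodes, so it is exponential and is not the evaluation algorithm of the paper. The edge list is an assumption derived from the tables (with tuple labels in listing order), and the function names are hypothetical. Because the query graph is a triangle, edge completeness reduces to pairwise adjacency, which also implies connectivity.

```python
from itertools import combinations

QUERY_NODES = {"q1", "q2", "q3"}
NODES = [("q1", "c1"), ("q1", "c2"), ("q1", "c3"),
         ("q2", "a1"), ("q2", "a2"), ("q2", "a3"),
         ("q3", "s1"), ("q3", "s2")]
EDGES = {frozenset(e) for e in [
    (("q1", "c1"), ("q2", "a2")), (("q1", "c1"), ("q2", "a3")),
    (("q1", "c2"), ("q2", "a1")), (("q1", "c2"), ("q3", "s1")),
    (("q1", "c3"), ("q3", "s2")), (("q2", "a1"), ("q3", "s1")),
]}


def is_answer(subset, require_vertex_complete):
    """Edge complete and connected; optionally also vertex complete."""
    query_nodes = [q for q, _ in subset]
    if len(query_nodes) != len(set(query_nodes)):
        return False  # some query node is mapped twice
    # Triangle query: every pair of chosen nodes must be adjacent.
    if any(frozenset(pair) not in EDGES for pair in combinations(subset, 2)):
        return False
    return not require_vertex_complete or set(query_nodes) == QUERY_NODES


def maximal_answers(require_vertex_complete):
    """All answers that are not strictly contained in another answer."""
    answers = [frozenset(s)
               for size in range(1, len(NODES) + 1)
               for s in combinations(NODES, size)
               if is_answer(s, require_vertex_complete)]
    return [a for a in answers if not any(a < b for b in answers)]


NATURAL_JOIN = maximal_answers(require_vertex_complete=True)       # S_ves
FULL_DISJUNCTION = maximal_answers(require_vertex_complete=False)  # S_es

for answer in FULL_DISJUNCTION:
    print(sorted(answer))
```

Under these assumptions the natural join contains only the UK combination, while the full disjunction also keeps the two Canada pairs and the USA pair that have no matching third tuple.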

Other Examples
- Queries with incomplete answers over semistructured data: Kanza, Nutt, Sagiv (PODS 99)
  - Weak semantics modeled by S_es; or-semantics modeled by S_s
- FleXPath: Amer-Yahia, Lakshmanan, Pandit (SIGMOD 04)
  - Modeled by S_es
- Interconnections: Cohen, Kanza, Sagiv (03)
  - Complete interconnection can be modeled by S_es; reachable interconnection can be modeled by S_s


Semantics are a Type of Graph Property
- A graph property P is a set of graphs, e.g.,
  - is a clique
  - is a bipartite graph
- A semantics defines a set of graphs, for every Q, D (these graphs are subgraphs of Q D)
- Therefore, semantics are a type of graph property

Hereditary Graph Properties and their Variants
- There are several interesting types of graph properties that have been studied in graph theory
- A graph property P is hereditary if every induced subgraph of a graph in P is also in P (e.g., is a clique, is a forest)
- A graph property P is connected-hereditary if every connected induced subgraph of a graph in P is also in P (e.g., is a tree)
- Rooted-hereditary can be defined similarly
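
The two definitions can be tested exhaustively on a small graph. The sketch below is only a sanity check on one hypothetical graph (a triangle a, b, c with a pendant vertex d), not a proof about the properties in general, and all names in it are illustrative. It shows that "is a tree" fails the hereditary condition but passes the connected-hereditary one.

```python
from itertools import combinations

VERTICES = frozenset("abcd")
EDGES = {frozenset(e) for e in ["ab", "bc", "ca", "cd"]}


def induced_edges(vertices, edges):
    """Edges of the subgraph induced by the given vertex set."""
    return {e for e in edges if e <= vertices}


def is_connected(vertices, edges):
    """Connectivity of the induced subgraph (vertex set must be non-empty)."""
    vertices = set(vertices)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in vertices - seen:
            if frozenset((u, v)) in edges:
                seen.add(v)
                stack.append(v)
    return seen == vertices


def is_clique(vertices, edges):
    return all(frozenset(p) in edges for p in combinations(vertices, 2))


def is_tree(vertices, edges):
    return (is_connected(vertices, edges)
            and len(induced_edges(vertices, edges)) == len(vertices) - 1)


def nonempty_subsets(vertices):
    ordered = sorted(vertices)
    return [frozenset(s) for size in range(1, len(ordered) + 1)
            for s in combinations(ordered, size)]


def is_hereditary_on(prop, vertices, edges, connected_only=False):
    """Check the (connected-)hereditary condition inside one graph."""
    for big in nonempty_subsets(vertices):
        if not prop(big, edges):
            continue
        for small in nonempty_subsets(big):
            if connected_only and not is_connected(small, edges):
                continue
            if not prop(small, edges):
                return False
    return True


print(is_hereditary_on(is_clique, VERTICES, EDGES))
print(is_hereditary_on(is_tree, VERTICES, EDGES))
print(is_hereditary_on(is_tree, VERTICES, EDGES, connected_only=True))
```

The tree check fails in the plain hereditary sense because, for instance, the induced tree on {a, c, d} contains the disconnected induced subgraph on {a, d}.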

Semantics are Usually Hereditary
- Most semantics for partial answers considered in the past are hereditary (in some sense), i.e., subgraphs of a partial answer are also partial answers
- Many semantics require connectivity of results (e.g., full disjunctions)
- Some require answers to be rooted (e.g., FleXPath)

Maximal P-Subgraph Problem
- Given a graph property P and a graph G, the maximal P-subgraph problem is: find all maximal induced subgraphs of G that have property P
- Therefore, the problem of finding all maximal answers for a query over a database, under a given semantics, is a special case of the maximal P-subgraph problem

Efficient Query Evaluation
- There are efficient algorithms that find all maximal P-subgraphs for hereditary, connected-hereditary and rooted-hereditary properties
  - Efficient in terms of the input and the output (i.e., incremental polynomial time)
- Use these algorithms to find maximal query answers, e.g., to find full disjunctions, weak answers, or-answers, etc.
  - Improves upon previous results
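
The slides do not show the enumeration algorithms themselves. As a hedged illustration of why the hereditary condition helps, the sketch below shows one standard building block: a single greedy pass extends any vertex set that has a hereditary property P to a maximal one, using one test of P per vertex. This is not presented as the paper's algorithm. The graph, the property `is_clique`, and the function `extend_to_maximal` are hypothetical.

```python
from itertools import combinations

VERTICES = {"a", "b", "c", "d"}
EDGES = {frozenset(e) for e in ["ab", "ac", "bc", "cd"]}


def is_clique(vertices):
    """Hereditary example property: every two vertices are adjacent."""
    return all(frozenset(p) in EDGES for p in combinations(vertices, 2))


def extend_to_maximal(seed, vertices, has_property):
    """Greedily grow `seed` (which must satisfy P) into a maximal P-set.

    For a hereditary P the result is maximal: a vertex that was rejected
    at some step could not belong to any larger P-set containing the
    result, because the rejected subset would then also have to satisfy P.
    """
    current = set(seed)
    for v in sorted(vertices):
        if v not in current and has_property(current | {v}):
            current.add(v)
    return current


print(sorted(extend_to_maximal({"a"}, VERTICES, is_clique)))
print(sorted(extend_to_maximal({"d"}, VERTICES, is_clique)))
```

Enumerating all maximal P-subgraphs in incremental polynomial time needs more than this single pass; the sketch only covers the extension step.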

Conclusion
- Presented an abstract framework
- Can model many different types of queries, databases and semantics in the framework
- Semantics in the framework are graph properties
- Solve the maximal P-subgraph problem once and reuse it to find maximal query answers

Future Work
- It is convenient to define ranking functions and return answers in ranking order
- How/when can this be done in our framework?
- Note: From the modeling it is immediately apparent that ranking cannot always be performed efficiently
  - The problem of finding a maximal P-subgraph of size k is NP-complete for hereditary and connected-hereditary graph properties (Yannakakis, STOC 78)

Thank you! Questions?