Computing Full Disjunctions

Presentation transcript:

Computing Full Disjunctions
Yaron Kanza, Yehoshua Sagiv
The Selim and Rachel Benin School of Engineering and Computer Science, The Hebrew University of Jerusalem

Overview of the Talk
OR-semantics and weak semantics for querying incomplete data.
Complexity of query evaluation.
Full disjunctions as a special case of weak semantics.
Generalizing full disjunctions: the join constraints are not restricted to be equality constraints.
Lower bounds for some related problems.

Querying Incomplete Data Requires a Special Semantics
Usually, answers to a query are complete assignments of database objects (or values) to the query variables.
Consequently, partial information is lost. For example, dangling tuples are lost when joining several relations.
The purpose of outerjoins and full disjunctions is to solve this problem: answers may be partial assignments, that is, assignments to only some of the variables.

Querying Incomplete Semistructured Data
In semistructured data, incompleteness of data is prevalent.
OR-semantics and weak semantics were introduced so that queries over semistructured data would return maximal answers rather than complete answers [Kanza, Nutt & Sagiv 1999].

In the Semistructured Data Model
Both data and queries are labeled rooted directed graphs.
Query nodes are variables.
Database nodes are objects.
Matchings are assignments of database objects to query variables, such that the database root is assigned to the query root and labels are preserved.
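As a rough illustration of the definition of a matching, the following Python sketch checks whether an assignment satisfies the root constraint and preserves labels when every query variable must be assigned. The encoding of graphs as sets of (source, label, target) triples, the function name is_complete_matching, and the toy object ids are assumptions made for this example; they are not taken from the talk.

```python
from typing import Dict, Hashable, Set, Tuple

Edge = Tuple[Hashable, str, Hashable]  # (source, label, target)


def is_complete_matching(
    query_edges: Set[Edge],
    query_root: Hashable,
    db_edges: Set[Edge],
    db_root: Hashable,
    assignment: Dict[Hashable, Hashable],
) -> bool:
    """Return True if all query variables are assigned and all constraints hold."""
    # Root constraint: the query root must be mapped to the database root.
    if assignment.get(query_root) != db_root:
        return False
    for source, label, target in query_edges:
        # Both endpoints of every query edge must be assigned ...
        if source not in assignment or target not in assignment:
            return False
        # ... and their images must be joined by a database edge with the same label.
        if (assignment[source], label, assignment[target]) not in db_edges:
            return False
    return True


# Hypothetical toy fragment of a movie database (object ids are made up).
db_edges = {(1, "movie", 2), (1, "actor", 3), (3, "acted in", 2), (2, "director", 3)}
query_edges = {("v1", "movie", "v2"), ("v1", "actor", "v3"),
               ("v3", "acted in", "v2"), ("v2", "director", "v3")}

good = {"v1": 1, "v2": 2, "v3": 3}
bad = {"v1": 1, "v2": 3, "v3": 2}  # the movie and actor labels are not preserved
```

Calling is_complete_matching(query_edges, "v1", db_edges, 1, good) accepts the first assignment, while the second one is rejected because object 3 is not the target of a movie edge.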

A Semistructured Database About Movies
[Figure: a rooted, labeled directed graph with objects 1 to 11. Edge labels: movie, actor, title, year, language, name, date of birth, director, acted in. Atomic values: Zelig, Antz, Woody Allen, 1983, 1998, English, 1/12/1935.]

A Query
[Figure: a rooted query graph over the variables v1, v2, v3, w1, w2, w3 and w4, with edges labeled movie, actor, title, director, name, language, date of birth and acted in.]
Under complete semantics, the query returns actor-movie pairs such that the actor played in the movie and was also the director of the movie.

A complete matching of the query variables to database objects
[Figure: the query graph shown next to the movie database, with every query variable mapped to a database object.]

Constraints on Complete Matchings
The root constraint is satisfied if the query root is mapped to the database root.
A query edge is an edge constraint: a query edge with a label l is satisfied if it is mapped to a database edge with the same label l.
[Figure: the query root r mapped to the database root 1, and a query edge between x and y with label l mapped to a database edge between objects 9 and 11 with label l.]

Suppose that Node 6 is missing
[Figure: the movie database without node 6 (English) and without the language edge that led to it.]

An incomplete matching
[Figure: the query graph matched against the movie database without node 6; the variable at the end of the language edge is mapped to null.]
This matching is maximal.

The Reachability Constraint on Partial Matchings
A query node v that is mapped to a database object o satisfies the reachability constraint if there is a path from the query root to v such that all edge constraints along this path are satisfied.
[Figure: a query with root r, nodes v, w, x, y and z, and edge labels l1 to l6, shown next to a database graph.]

Weak Satisfaction of Edge Constraints
An edge constraint is weakly satisfied if it is either satisfied (as defined earlier), or one (or more) of its nodes is mapped to a null value.
[Figure: two examples of a query edge between x and y with label l, involving the database objects 9 and 11 and a null value.]

Weak Matchings
A partial matching is a weak matching if:
The root constraint is satisfied.
The reachability constraint is satisfied by every query node that is mapped to a database node.
Every edge constraint is weakly satisfied.

A weak matching
[Figure: the query graph matched against the movie database without node 6, with one variable mapped to null.]

A Movie Database
Consider the case where the director edge is missing.
[Figure: the movie database with the director edge removed.]

An incomplete matching that is not a weak matching
[Figure: the query graph matched against the movie database without the director edge, with one variable mapped to null.]
There is an edge that is not weakly satisfied.

OR Matchings
A partial matching is an OR matching if:
The root constraint is satisfied.
The reachability constraint is satisfied by every query node that is mapped to a database node.
Unlike in a weak matching, an edge constraint in an OR matching does not have to be weakly satisfied.
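The difference between weak matchings and OR matchings can be sketched in Python as follows. This is only an illustration under assumptions: graphs are encoded as sets of (source, label, target) triples, None stands for null, and the function names and the toy database (a movie database without a director edge, with made-up object ids) are not taken from the talk.

```python
def edge_satisfied(edge, db_edges, assignment):
    """A query edge is satisfied if both ends are non-null and map to a database edge."""
    source, label, target = edge
    s, t = assignment.get(source), assignment.get(target)
    return s is not None and t is not None and (s, label, t) in db_edges


def reachable(query_edges, query_root, db_edges, assignment):
    """Query nodes reachable from the root along satisfied edges only."""
    if assignment.get(query_root) is None:
        return set()
    seen, stack = {query_root}, [query_root]
    while stack:
        node = stack.pop()
        for edge in query_edges:
            if (edge[0] == node and edge[2] not in seen
                    and edge_satisfied(edge, db_edges, assignment)):
                seen.add(edge[2])
                stack.append(edge[2])
    return seen


def is_or_matching(query_edges, query_root, db_edges, db_root, assignment):
    """Root constraint plus reachability of every non-null query node."""
    if assignment.get(query_root) != db_root:
        return False
    ok = reachable(query_edges, query_root, db_edges, assignment)
    return all(node in ok for node, obj in assignment.items() if obj is not None)


def is_weak_matching(query_edges, query_root, db_edges, db_root, assignment):
    """An OR matching in which every edge is satisfied or touches a null."""
    if not is_or_matching(query_edges, query_root, db_edges, db_root, assignment):
        return False
    for source, label, target in query_edges:
        s, t = assignment.get(source), assignment.get(target)
        if s is not None and t is not None and (s, label, t) not in db_edges:
            return False
    return True


# Hypothetical toy database in which the director edge is missing.
db_edges = {(1, "movie", 2), (1, "actor", 3), (3, "acted in", 2)}
query_edges = {("v1", "movie", "v2"), ("v1", "actor", "v3"),
               ("v3", "acted in", "v2"), ("v2", "director", "v3")}

both = {"v1": 1, "v2": 2, "v3": 3}            # director edge is violated
movie_only = {"v1": 1, "v2": 2, "v3": None}   # actor variable is null
```

In this sketch, both is an OR matching but not a weak matching, because the director edge has two non-null ends and is not satisfied; movie_only is a weak matching, because every edge that touches the null variable is weakly satisfied.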

Maximal Matchings
Matchings can be represented as tuples (where numbers are object ids).
A matching t1 subsumes a matching t2 if t1 can be obtained from t2 by replacing some nulls in t2 with non-null values.
A matching is maximal if no other matching subsumes it.
A query result consists only of maximal matchings.
Example: t1 = (1, 5, 2, null) subsumes t2 = (1, null, 2, null).
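Subsumption and maximality translate directly into code. The sketch below assumes that matchings are Python tuples of equal length and that None stands for null; the function names are chosen for this example only.

```python
def subsumes(t1, t2):
    """True if t1 is obtained from t2 by replacing at least one null with a value."""
    return (
        len(t1) == len(t2)
        and t1 != t2
        and all(b is None or a == b for a, b in zip(t1, t2))
    )


def maximal(matchings):
    """Keep only the matchings that no other matching subsumes."""
    unique = list(dict.fromkeys(matchings))  # drop duplicates, keep order
    return [t for t in unique if not any(subsumes(o, t) for o in unique)]


t1 = (1, 5, 2, None)
t2 = (1, None, 2, None)
result = maximal([t1, t2])
```

Here result contains only t1, since t2 is subsumed by it.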

More Examples

The Movie Database Before the Removals
[Figure: the full movie database, with objects 1 to 11, including node 6 (English) and the director edge.]

[Figure: the query graph matched against the full movie database, with every variable mapped to a database object.]
In the result, the actor must be both an actor in the movie and the director of the movie.
A complete matching. It is also a maximal OR-matching and a maximal weak matching.

[Figure: the query graph matched against the full movie database, with several variables mapped to null.]
In the result, if the actor and the movie are assigned non-null values, then the actor must be both an actor in the movie and the director of the movie.
A second maximal weak matching.

[Figure: the query graph matched against the full movie database, with one variable mapped to null.]
In the result, the actor either played in the movie, directed the movie, or is not related at all to the movie.
A maximal OR-matching. It is not a weak matching.

Complexity of Evaluating Maximal Weak Matchings and Maximal OR Matchings

Data Complexity
Under data complexity, the time complexity is a function of the size of the database only; the query is treated as fixed.

Two Alternatives for Query Evaluation
A naïve algorithm computes all matchings and then removes subsumed matchings.
A better algorithm avoids computing all matchings; ideally, it computes only maximal matchings.
Under data complexity, both algorithms run in polynomial time.

Input-Output Complexity
Under input-output complexity, the time complexity is a function of the size of the query, the size of the database, and the size of the result.

A Naïve Algorithm vs. a Better Algorithm
Under input-output (I-O) complexity, a naïve algorithm is exponential.
Is there a better algorithm with polynomial-time I-O complexity?
The answer is positive for DAG queries [Kanza, Nutt & Sagiv 1999].

Cyclic Queries
Theorem: For a query Q and a database D, the set of all maximal weak matchings can be computed in $O(q^3 d m^2)$ time, where q is the size of the query, d is the size of the database, and m is the size of the result. Computing all maximal OR matchings has the same complexity.

Full Disjunctions
What is the full disjunction of a set of relations?
How are full disjunctions related to queries with incomplete answers?

The Full Disjunction of the Given Relations
Movies (m-id, title, year, language): (1, Zelig, 1983, English), (2, Antz, 1998, English), (3, Armageddon, 1998, English), (4, Fantasia, 1940, English)
Actors (a-id, name, date-of-birth): (1, Woody Allen, 1/12/1935), (2, Bruce Willis, 19/3/1955), (3, Julia Roberts, 28/10/1967)
Acted-in (a-id, m-id, role): (1, 1, Zelig), (1, 2, Z), (2, 3, Harry)
Actors-that-Directed (a-id, m-id): (1, 1)
The full disjunction, over (role, date-of-birth, name, a-id, language, year, title, m-id):
(Zelig, 1/12/1935, Woody Allen, 1, English, 1983, Zelig, 1)
(Z, 1/12/1935, Woody Allen, 1, English, 1998, Antz, 2)
(Harry, 19/3/1955, Bruce Willis, 2, English, 1998, Armageddon, 3)
(null, null, null, null, English, 1940, Fantasia, 4)
(null, 28/10/1967, Julia Roberts, 3, null, null, null, null)

The Full Disjunction of the Given Relations
[Tables: the Movies relation and the full disjunction of the four relations of the example.]
The full disjunction does not include subsumed tuples.
This tuple, over (role, date-of-birth, name, a-id, language, year, title, m-id), will not be in the full disjunction: (null, null, null, null, English, 1983, Zelig, 1)

The Full Disjunction of the Given Relations
[Tables: the relations Movies, Actors, Acted-in and Actors-that-Directed, and their full disjunction.]
The full disjunction does not include tuples that are based on a Cartesian product rather than a join.
For example, this tuple, over (role, date-of-birth, name, a-id, language, year, title, m-id), is not in the full disjunction: (null, 28/10/1967, Julia Roberts, 3, English, 1940, Fantasia, 4)

In the Full Disjunction of a Given Set of Relations:
Every tuple of the input is a part of at least one tuple of the output.
Tuples are joined as in a natural join, padded with null values.
The result includes only “maximal connected portions”.
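The three properties above can be turned into a brute-force Python sketch. It enumerates every set of tuples (at most one per relation) that is connected and pairwise join consistent, pads the merged tuple with None, and drops subsumed tuples. This is a deliberately naïve, exponential illustration and not the algorithm of the talk; the function names and the encoding of relations as lists of dictionaries are assumptions, and the sample data is a reading of the simplified Actors, Acted-in and Movies example.

```python
from itertools import combinations, product


def join_consistent(t1, t2):
    """Tuples agree on every attribute they share."""
    return all(t1[a] == t2[a] for a in t1.keys() & t2.keys())


def connected(tuples):
    """The tuples form a connected graph, linking tuples that share an attribute."""
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j in range(len(tuples)):
            if j not in seen and tuples[i].keys() & tuples[j].keys():
                seen.add(j)
                stack.append(j)
    return len(seen) == len(tuples)


def subsumes(t1, t2):
    """t1 is obtained from t2 by replacing at least one null (None) with a value."""
    return t1 != t2 and all(b is None or a == b for a, b in zip(t1, t2))


def full_disjunction(relations, attrs):
    """Naive full disjunction of a dict {relation name: list of dict tuples}."""
    candidates = set()
    names = list(relations)
    for k in range(1, len(names) + 1):
        for chosen in combinations(names, k):
            for combo in product(*(relations[n] for n in chosen)):
                pairwise_ok = all(join_consistent(x, y)
                                  for x, y in combinations(combo, 2))
                if connected(combo) and pairwise_ok:
                    merged = {}
                    for t in combo:
                        merged.update(t)
                    # Pad attributes that no chosen tuple provides with None.
                    candidates.add(tuple(merged.get(a) for a in attrs))
    return {t for t in candidates if not any(subsumes(o, t) for o in candidates)}


relations = {
    "Actors": [{"a-id": 1, "name": "Woody Allen"},
               {"a-id": 2, "name": "Bruce Willis"},
               {"a-id": 3, "name": "Julia Roberts"}],
    "Acted-in": [{"a-id": 1, "m-id": 1, "role": "Zelig"},
                 {"a-id": 1, "m-id": 2, "role": "Z"},
                 {"a-id": 2, "m-id": 3, "role": "Harry"}],
    "Movies": [{"m-id": 1, "title": "Zelig"}, {"m-id": 2, "title": "Antz"},
               {"m-id": 3, "title": "Armageddon"}, {"m-id": 4, "title": "Fantasia"}],
}
fd = full_disjunction(relations, ["role", "name", "a-id", "title", "m-id"])
```

On this data the sketch yields five tuples: the three actor-role-movie combinations, Julia Roberts padded with nulls, and Fantasia padded with nulls. Julia Roberts and Fantasia are never combined, because their tuples share no attribute and so do not form a connected set.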

Motivation for Full Disjunctions
Full disjunctions were proposed by Galindo-Legaria as an alternative to outerjoins [SIGMOD’94].
Rajaraman and Ullman suggested using full disjunctions for information integration [PODS’96].

Computing Full Disjunctions for γ-acyclic Relation Schemas
Rajaraman and Ullman have shown how to evaluate the full disjunction by a sequence of natural outerjoins when the relation schemas are γ-acyclic.
Hence, the full disjunction can be computed in polynomial time, under input-output complexity, when the relation schemas are γ-acyclic.
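To make the idea of a sequence of natural outerjoins concrete, here is a small Python sketch of a natural full outerjoin applied twice to three relations that form a simple chain. It does not reproduce Rajaraman and Ullman's procedure for choosing the order of the outerjoins; the order used here, the function name, and the encoding of relations as lists of dictionaries (with None for null) are assumptions made for illustration.

```python
def natural_full_outerjoin(left, right):
    """Natural full outerjoin of two lists of dicts; None plays the role of null.

    Assumes all rows of one input have the same keys. If the inputs share
    no attribute, this sketch degenerates to a Cartesian product.
    """
    left_attrs = list(left[0]) if left else []
    right_attrs = list(right[0]) if right else []
    shared = [a for a in left_attrs if a in right_attrs]
    all_attrs = left_attrs + [a for a in right_attrs if a not in left_attrs]
    result, matched_right = [], set()
    for l in left:
        found = False
        for j, r in enumerate(right):
            # Nulls never match; rows join when they agree on all shared attributes.
            if all(l[a] is not None and l[a] == r[a] for a in shared):
                result.append({a: (l[a] if a in l else r[a]) for a in all_attrs})
                matched_right.add(j)
                found = True
        if not found:
            result.append({a: l.get(a) for a in all_attrs})  # pad the left row
    for j, r in enumerate(right):
        if j not in matched_right:
            result.append({a: r.get(a) for a in all_attrs})  # pad the right row
    return result


actors = [{"a-id": 1, "name": "Woody Allen"},
          {"a-id": 2, "name": "Bruce Willis"},
          {"a-id": 3, "name": "Julia Roberts"}]
acted_in = [{"a-id": 1, "m-id": 1, "role": "Zelig"},
            {"a-id": 1, "m-id": 2, "role": "Z"},
            {"a-id": 2, "m-id": 3, "role": "Harry"}]
movies = [{"m-id": 1, "title": "Zelig"}, {"m-id": 2, "title": "Antz"},
          {"m-id": 3, "title": "Armageddon"}, {"m-id": 4, "title": "Fantasia"}]

step = natural_full_outerjoin(actors, acted_in)
fd = natural_full_outerjoin(step, movies)
```

For this chain of relations, the two outerjoins produce the three joined tuples plus one padded tuple each for Julia Roberts and for Fantasia.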

Weak Semantics Generalizes Full Disjunctions
Relations can be converted into a semistructured database.
The full disjunction can be expressed as the union of several queries that are evaluated under weak semantics.

Example: Creating the Database
Actors (a-id, name): (1, Woody Allen), (2, Bruce Willis), (3, Julia Roberts)
Acted-in (a-id, m-id, role): (1, 1, Zelig), (1, 2, Z), (2, 3, Harry)
Movies (m-id, title): (1, Zelig), (2, Antz), (3, Armageddon), (4, Fantasia)
A node is created for each tuple.
Edges are added between connected tuples, in both directions.
A root r is added, and edges are added from the root to every node.
We use colors instead of labels.
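The construction of the database graph can be sketched in Python as follows. The sketch assumes that two tuples are "connected" when they come from different relations, share at least one attribute, and agree on all shared attributes, and it labels every edge with the relation name of its target node as a stand-in for the colors used on the slide. The function name and the data encoding are also assumptions.

```python
def build_database_graph(relations):
    """Build (nodes, edges) from a dict {relation name: list of dict tuples}.

    Nodes are (relation name, row index) pairs; edges are
    (source, label, target) triples, with "r" as the root.
    """
    nodes = [(name, i) for name, rows in relations.items() for i in range(len(rows))]
    # The root has an edge to every tuple node.
    edges = {("r", name, (name, i)) for name, i in nodes}
    for n1, i in nodes:
        for n2, j in nodes:
            if n1 == n2:
                continue
            t1, t2 = relations[n1][i], relations[n2][j]
            shared = t1.keys() & t2.keys()
            # Looping over ordered pairs adds each connection in both directions.
            if shared and all(t1[a] == t2[a] for a in shared):
                edges.add(((n1, i), n2, (n2, j)))
    return nodes, edges


relations = {
    "Actors": [{"a-id": 1, "name": "Woody Allen"},
               {"a-id": 2, "name": "Bruce Willis"},
               {"a-id": 3, "name": "Julia Roberts"}],
    "Acted-in": [{"a-id": 1, "m-id": 1, "role": "Zelig"},
                 {"a-id": 1, "m-id": 2, "role": "Z"},
                 {"a-id": 2, "m-id": 3, "role": "Harry"}],
    "Movies": [{"m-id": 1, "title": "Zelig"}, {"m-id": 2, "title": "Antz"},
               {"m-id": 3, "title": "Armageddon"}, {"m-id": 4, "title": "Fantasia"}],
}
nodes, edges = build_database_graph(relations)
```

With this data the graph has ten tuple nodes. The nodes for Julia Roberts and Fantasia are reachable only through their root edges, which is why they later appear in the result padded with nulls.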

Example: Creating the Queries
[Tables: the Actors, Acted-in and Movies relations of the running example.]
A node is created for each relation schema.
Edges are added between connected schemas, in both directions.
The number of queries is equal to the number of schemas.
In each query, the root is connected to a different schema.
[Figure: a query with root r over the schema nodes Movies, Actors and Acted-in.]
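A possible encoding of the query construction in Python is shown below. It assumes that two schemas are "connected" when they share an attribute, represents each query as a set of (source, label, target) edges with "r" as the root, and labels each edge with the name of its target schema; all of these choices, and the function name, are assumptions for illustration.

```python
def build_queries(schemas):
    """Return one query per schema from a dict {relation name: set of attributes}."""
    names = list(schemas)
    # Edges between connected schemas (sharing an attribute), in both directions.
    schema_edges = {(a, b, b) for a in names for b in names
                    if a != b and schemas[a] & schemas[b]}
    # In each query, the root "r" has a single edge into a different schema.
    return {start: {("r", start, start)} | schema_edges for start in names}


schemas = {
    "Actors": {"a-id", "name"},
    "Acted-in": {"a-id", "m-id", "role"},
    "Movies": {"m-id", "title"},
}
queries = build_queries(schemas)
```

For the three schemas of the example, this gives three queries that differ only in the schema to which the root is attached.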

Example: Queries are Evaluated under Weak Semantics
[Animation over several slides: the queries built from the schemas Acted-in, Actors and Movies are evaluated, one matching at a time, over the database built from the relations Actors, Acted-in and Movies. The result tables have the attributes (role, name, a-id, title, m-id).]
The matchings that assign tuples to all three query nodes yield (Zelig, Woody Allen, 1, Zelig, 1), (Z, Woody Allen, 1, Antz, 2) and (Harry, Bruce Willis, 2, Armageddon, 3).
Julia Roberts appears in a matching in which the Acted-in and Movies nodes are mapped to null: (null, Julia Roberts, 3, null, null).
Fantasia appears in a matching in which the Acted-in and Actors nodes are mapped to null: (null, null, null, Fantasia, 4).

The Algorithm Computes Full Disjunctions in Polynomial Time Under Input-Output Complexity
Theorem: The full disjunction of relations r1, …, rn can be computed in $O(n^5 s^2 f^2)$ time, where n is the number of relations, s is the total size of all the relations, and f is the size of the result.

Generalizing Full Disjunctions
In a full disjunction, tuples are joined according to equality constraints, as in a natural join (or equi-join).
We can generalize full disjunctions to support constraints that are not merely equality among attributes.

Example
Movies (m-id, title, year, language, location)
Actors (a-id, name, date-of-birth)
Acted-in (a-id, m-id, role)
Actors-that-Directed (a-id, m-id)
Historical-Events (name, date, description)
Historical-Sites (Country, State, City, Site)
Constraints:
The date of the historical event is a date in the year when the movie was released.
The filming location is near the historical site.

The General Idea
A set of constraints specifies how tuples should be joined.
The queries and the database are constructed according to the given constraints.
A pair of nodes is connected by an edge when it satisfies the corresponding constraint.
Queries are evaluated w.r.t. the database under weak semantics.
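The step of connecting nodes by an arbitrary constraint, rather than by equality of shared attributes, might look as follows in Python. The helper name connect, the predicate released_in_event_year, and the two events are hypothetical and invented only to exercise the constraint "the date of the historical event is a date in the year when the movie was released".

```python
from datetime import date


def connect(left, right, constraint):
    """Return the index pairs (i, j) for which constraint(left[i], right[j]) holds."""
    return [(i, j)
            for i, l in enumerate(left)
            for j, r in enumerate(right)
            if constraint(l, r)]


def released_in_event_year(movie, event):
    """General (non-equality) join constraint between a movie and an event."""
    return movie["year"] == event["date"].year


movies = [{"m-id": 1, "title": "Zelig", "year": 1983},
          {"m-id": 2, "title": "Antz", "year": 1998}]
# Hypothetical events, made up for this example.
events = [{"name": "event A", "date": date(1983, 7, 1)},
          {"name": "event B", "date": date(1990, 1, 1)}]

pairs = connect(movies, events, released_in_event_year)
```

Each returned pair would become an edge (in both directions) between the corresponding tuple nodes of the database graph; here only the first movie and the first event are connected.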

Another Way of Generalizing Full Disjunctions: Use OR-Semantics
Generate the queries and the database as before, but evaluate the queries under OR-semantics (rather than weak semantics).
This relaxes the requirement that every pair of tuples should be join consistent.
Instead, a tuple of the full disjunction is only required to be generated by database tuples that form a connected subgraph; these tuples need not be pairwise join consistent.

Example
Employees (e-id, ename, city, dept-no)
Departments (dept-no, dname, building)
Located-in (building, city, street)
Employee: (007, James Bond, London, 6)
Department: (6, MI-6, 10)
Located-in: (10, Liverpool, King)
The Full Disjunction, over (street, city, building, dname, dept-no, city, ename, e-id), where the first city column comes from Located-in and the second from Employees:
(null, null, 10, MI-6, 6, London, James Bond, 007)
(King, Liverpool, 10, MI-6, 6, null, null, null)

Example
Employees (e-id, ename, city, dept-no)
Departments (dept-no, dname, building)
Located-in (building, city, street)
Employee: (007, James Bond, London, 6)
Department: (6, MI-6, 10)
Located-in: (10, Liverpool, King)
The Full Disjunction under OR-Semantics, over (street, city, building, dname, dept-no, city, ename, e-id), where the first city column comes from Located-in and the second from Employees:
(King, Liverpool, 10, MI-6, 6, London, James Bond, 007)
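The difference between the two requirements can be checked on the James Bond tuples with a short Python sketch. The function names are chosen for this example, and the encoding of tuples as dictionaries is an assumption; the data values are those of the example.

```python
from itertools import combinations


def agree(t1, t2):
    """True if the tuples share an attribute and agree on all shared attributes."""
    shared = t1.keys() & t2.keys()
    return bool(shared) and all(t1[a] == t2[a] for a in shared)


def pairwise_join_consistent(tuples):
    """Every pair of tuples agrees on the attributes it shares."""
    return all(t1[a] == t2[a]
               for t1, t2 in combinations(tuples, 2)
               for a in t1.keys() & t2.keys())


def connected_by_consistent_pairs(tuples):
    """The tuples form a connected graph whose edges are the agreeing pairs."""
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j in range(len(tuples)):
            if j not in seen and agree(tuples[i], tuples[j]):
                seen.add(j)
                stack.append(j)
    return len(seen) == len(tuples)


employee = {"e-id": "007", "ename": "James Bond", "city": "London", "dept-no": 6}
department = {"dept-no": 6, "dname": "MI-6", "building": 10}
located_in = {"building": 10, "city": "Liverpool", "street": "King"}
group = [employee, department, located_in]
```

The three tuples are not pairwise join consistent, because the employee and the location disagree on city, so they cannot form a single tuple of the ordinary full disjunction. They are nevertheless connected through the department tuple, which is what OR-semantics requires.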

Two Related Problems
The Projection Problem: computing the projection of the full disjunction on a given set of attributes.
The Restriction Problem: computing only those tuples of the full disjunction that are non-null on a given set of attributes.
Neither the projection problem nor the restriction problem can be solved in polynomial time (under input-output complexity) unless P=NP.

Conclusion
Cyclic queries can be evaluated in polynomial time (in the size of the query, the database and the result) under either OR-semantics or weak semantics.
A reduction of full-disjunction evaluation to query evaluation under weak semantics was described.
Using the reduction, full disjunctions can be computed in polynomial time (in the size of the relation schemas, the relations and the result).

Conclusion (continued)
Full disjunctions can be generalized in two ways: by using OR-semantics instead of weak semantics, and by joining tuples according to general constraints.
Generalized full disjunctions can be useful in the context of data integration from heterogeneous sources.
The projection problem and the restriction problem have polynomial-time algorithms (under input-output complexity) when the relations have γ-acyclic schemas, but not in the general case, unless P=NP.

Thank You
Questions?