Computing Full Disjunctions

Presentation on theme: "Computing Full Disjunctions"— Presentation transcript:

1 Computing Full Disjunctions
Yaron Kanza and Yehoshua Sagiv
The Selim and Rachel Benin School of Engineering and Computer Science, The Hebrew University of Jerusalem

2 Overview of the Talk
- OR-semantics and weak semantics for querying incomplete data
- Complexity of query evaluation
- Full disjunctions as a special case of weak semantics
- Generalizing full disjunctions: the join constraints are not restricted to be equality constraints
- Lower bounds for some related problems

3 Querying Incomplete Data Requires a Special Semantics
- Usually, answers to a query are complete assignments of database objects (or values) to the query variables.
- Consequently, partial information is lost. For example, dangling tuples are lost when joining several relations.
- The purpose of outerjoins and full disjunctions is to solve this problem: answers may be partial assignments (to some of the variables).

4 Querying Incomplete Semistructured Data
- In semistructured data, incompleteness of data is prevalent.
- OR-semantics and weak semantics were introduced so that queries over semistructured data would return maximal answers rather than complete answers [Kanza, Nutt & Sagiv 1999].

5 In the Semistructured Data Model
- Both data and queries are labeled, rooted, directed graphs.
- Query nodes are variables.
- Database nodes are objects.
- Matchings are assignments of database objects to query variables such that the database root is assigned to the query root and labels are preserved.

6 A Semistructured Database About Movies
[Figure: A Semistructured Database About Movies. A rooted, edge-labeled graph over object ids 1-11 with edge labels movie, actor, title, year, language, name, date of birth, director, and acted in; atomic values include Zelig, Antz, Woody Allen, 1/12/1935, 1983, 1998, and English.]

7 A Query
[Figure: the query graph, with variables v1, v2, v3, w1, w2, w3, w4 and edge labels movie, actor, title, language, name, date of birth, director, and acted in.]
Under complete semantics, the query returns actor-movie pairs such that the actor played in the movie and was also the director of the movie.

8 A complete matching of the query variables to database objects
[Figure: the movie database and the query side by side, with each query variable mapped to a database object.]

9 Constraints on Complete Matchings
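- The root constraint is satisfied if the query root is mapped to the database root.
- A query edge is an edge constraint: a query edge with label l is satisfied if it is mapped to a database edge with the same label l.
[Figure: the query root r is mapped to the database root 1; a query edge between x and y with label l is mapped to a database edge between objects 9 and 11 with label l.]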
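The two constraints above can be sketched in Python. This is only an illustration under assumptions that are not in the slides: edges are stored as (source, label, target) triples, a matching is a dict from query variables to object ids, and the function names and the extra database edge labeled "a" are hypothetical.

```python
from typing import Dict, Hashable, Optional, Set, Tuple

Edge = Tuple[Hashable, str, Hashable]      # (source, label, target)
Matching = Dict[str, Optional[Hashable]]   # query variable -> object id (None = null)


def root_satisfied(m: Matching, query_root: str, db_root: Hashable) -> bool:
    """Root constraint: the query root is mapped to the database root."""
    return m.get(query_root) == db_root


def edge_satisfied(m: Matching, query_edge: Edge, db_edges: Set[Edge]) -> bool:
    """Edge constraint: the images of both endpoints are joined by a
    database edge that carries the same label as the query edge."""
    x, label, y = query_edge
    ox, oy = m.get(x), m.get(y)
    return ox is not None and oy is not None and (ox, label, oy) in db_edges


# Toy data modeled on the figure: r -> 1, x -> 9, y -> 11, edge label l.
# The edge (1, "a", 9) is made up so that the toy database is rooted.
db_edges = {(1, "a", 9), (9, "l", 11)}
m = {"r": 1, "x": 9, "y": 11}

print(root_satisfied(m, "r", 1))                      # True
print(edge_satisfied(m, ("x", "l", "y"), db_edges))   # True
print(edge_satisfied(m, ("x", "m", "y"), db_edges))   # False: label differs
```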

10 Suppose that Node 6 is missing
[Figure: the movie database without node 6, the language value English.]

11 An incomplete matching
[Figure: the movie database without node 6 and the query; the query variable w2 is mapped to null.]
This matching is maximal.

12 The Reachability Constraint on Partial Matchings
A query node v that is mapped to a database object o satisfies the reachability constraint if there is a path from the query root to v such that all edge constraints along this path are satisfied.
[Figure: a query with nodes r, v, w, x, y, z and edge labels l1 to l6, shown next to a database with objects 1, 5, 7, 8, 9, and 55.]

13 Weak Satisfaction of Edge Constraints
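An edge constraint is weakly satisfied if either
- it is satisfied (as defined earlier), or
- one (or both) of its nodes is mapped to a null value.
[Figure: examples over the query edge between x and y with label l, whose nodes are mapped to objects 9 and 11 or to null.]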

14 Weak Matchings
A partial matching is a weak matching if
- the root constraint is satisfied,
- the reachability constraint is satisfied by every query node that is mapped to a database node, and
- every edge constraint is weakly satisfied.

15 A weak matching
[Figure: the movie database without node 6 and the query; w2 is mapped to null, and the matching is a weak matching.]

16 A Movie Database
Consider the case where the director edge is missing.
[Figure: the movie database with the director edge removed.]

17 An incomplete matching that is not a weak matching
[Figure: the movie database without the director edge and the query; w2 is mapped to null.]
There is an edge that is not weakly satisfied (the director edge).

18 OR Matchings
A partial matching is an OR matching if
- the root constraint is satisfied, and
- the reachability constraint is satisfied by every query node that is mapped to a database node.
Unlike in a weak matching, an edge constraint in an OR matching does not have to be weakly satisfied.
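The difference between the two definitions can be illustrated with a small Python sketch. It assumes a hypothetical encoding (edges as (source, label, target) triples, matchings as dicts with None for null) and a toy query and database loosely modeled on the missing-director example; none of the names or edge directions come from the slides.

```python
from collections import deque


def satisfied(m, edge, db_edges):
    """Edge constraint: both endpoints are non-null and their images are
    joined by a database edge with the same label."""
    x, label, y = edge
    return m.get(x) is not None and m.get(y) is not None and (m[x], label, m[y]) in db_edges


def is_or_matching(m, q_edges, q_root, db_edges, db_root):
    """Root constraint plus reachability of every non-null query node."""
    if m.get(q_root) != db_root:
        return False
    # Search from the query root along satisfied query edges only.
    reached, frontier = {q_root}, deque([q_root])
    while frontier:
        u = frontier.popleft()
        for edge in q_edges:
            if edge[0] == u and edge[2] not in reached and satisfied(m, edge, db_edges):
                reached.add(edge[2])
                frontier.append(edge[2])
    return all(v in reached for v, o in m.items() if o is not None)


def is_weak_matching(m, q_edges, q_root, db_edges, db_root):
    """An OR matching in which every edge is satisfied or touches a null."""
    def weakly_satisfied(edge):
        return m.get(edge[0]) is None or m.get(edge[2]) is None or satisfied(m, edge, db_edges)
    return is_or_matching(m, q_edges, q_root, db_edges, db_root) and all(map(weakly_satisfied, q_edges))


# Hypothetical toy query: the actor v3 acted in and directed the movie v2.
q_edges = [("v1", "movie", "v2"), ("v1", "actor", "v3"),
           ("v3", "acted in", "v2"), ("v3", "director", "v2")]
# Toy database with no director edge.
db_edges = {(1, "movie", 2), (1, "actor", 4), (4, "acted in", 2)}

full = {"v1": 1, "v2": 2, "v3": 4}
partial = {"v1": 1, "v2": 2, "v3": None}

print(is_or_matching(full, q_edges, "v1", db_edges, 1))       # True
print(is_weak_matching(full, q_edges, "v1", db_edges, 1))     # False: director edge fails
print(is_weak_matching(partial, q_edges, "v1", db_edges, 1))  # True
```

In this sketch every weak matching is also an OR matching, which mirrors the slide's remark that OR matchings simply drop the weak-satisfaction requirement.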

19 Maximal Matchings
- Matchings can be represented as tuples (where numbers are object ids).
- A matching t1 subsumes a matching t2 if t1 can be obtained from t2 by replacing some nulls in t2 with non-null values.
- A matching is maximal if no other matching subsumes it.
- A query result consists only of maximal matchings.
Example: t1 = (1, 5, 2, null) subsumes t2 = (1, null, 2, null).
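The subsumption test and the removal of subsumed matchings can be sketched as follows. The tuple encoding with None for null and the function names are assumptions made for illustration; note that `maximal` only judges maximality relative to the list it is given, not relative to all matchings of a query.

```python
from typing import List, Optional, Tuple

Matching = Tuple[Optional[int], ...]   # None plays the role of null


def subsumes(t1: Matching, t2: Matching) -> bool:
    """True if t1 is obtained from t2 by replacing one or more nulls
    with non-null values (so t1 and t2 must differ)."""
    if len(t1) != len(t2) or t1 == t2:
        return False
    return all(b is None or a == b for a, b in zip(t1, t2))


def maximal(matchings: List[Matching]) -> List[Matching]:
    """Keep only the matchings that no other matching in the list subsumes."""
    unique = list(dict.fromkeys(matchings))   # drop duplicates, keep order
    return [t for t in unique if not any(subsumes(u, t) for u in unique)]


t1 = (1, 5, 2, None)
t2 = (1, None, 2, None)

print(subsumes(t1, t2))    # True
print(subsumes(t2, t1))    # False
print(maximal([t1, t2]))   # [(1, 5, 2, None)]
```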

20 More Examples

21 The Movie Database Before the Removals
[Figure: The Movie Database Before the Removals. The original movie database, including node 6 and the director edge.]

22 A complete matching
In the result, the actor must be both an actor in the movie and the director of the movie.
[Figure: the movie database and the query, with every query variable mapped to a database object.]
This complete matching is also a maximal OR-matching and a maximal weak matching.

23 A second maximal weak matching
In the result, if the actor and the movie are assigned non-null values, then the actor must be both an actor in the movie and the director of the movie.
[Figure: the movie database and the query; several query variables are mapped to null.]

24 A maximal OR-matching
In the result, the actor either played in the movie, directed the movie, or is not related at all to the movie.
[Figure: the movie database and the query; one query variable is mapped to null.]
It is not a weak matching.

25 Complexity of Evaluating Maximal Weak Matchings and Maximal OR Matchings

26 Data Complexity
Under data complexity, the time complexity is a function of the size of the database (the query is considered fixed).

27 Two Alternatives for Query Evaluation
- A naïve algorithm computes all matchings and then removes subsumed matchings.
- A better algorithm avoids computing all matchings; ideally, it computes only maximal matchings.
- Under data complexity, both algorithms run in polynomial time.

28 Input-Output Complexity
Under input-output complexity, the time complexity is a function of the size of the query, the size of the database, and the size of the result

29 A Naïve Algorithm vs. A Better Algorithm
- Under input-output complexity, a naïve algorithm is exponential.
- Is there a better algorithm with polynomial-time input-output complexity?
- The answer is positive for DAG queries [Kanza, Nutt & Sagiv 1999].

30 Cyclic Queries
Theorem: For a query Q and a database D, the set of all maximal weak matchings can be computed in O(q^3 d m^2) time, where q is the size of the query, d is the size of the database, and m is the size of the result. Computing all maximal OR matchings has the same complexity.

31 Full Disjunctions
- What is the full disjunction of a set of relations?
- How are full disjunctions related to queries with incomplete answers?

32 The Full Disjunction of the Given Relations

Movies
m-id | title | year | language
1 | Zelig | 1983 | English
2 | Antz | 1998 | English
3 | Armageddon | 1998 | English
4 | Fantasia | 1940 | English

Actors
a-id | name | date-of-birth
1 | Woody Allen | 1/12/1935
2 | Bruce Willis | 19/3/1955
3 | Julia Roberts | 28/10/1967

Acted-in
a-id | m-id | role
1 | 1 | Zelig
1 | 2 | Z
2 | 3 | Harry

Actors-that-Directed
a-id | m-id
1 | 1

The Full Disjunction of the Given Relations
m-id | title | year | language | a-id | name | date-of-birth | role
1 | Zelig | 1983 | English | 1 | Woody Allen | 1/12/1935 | Zelig
2 | Antz | 1998 | English | 1 | Woody Allen | 1/12/1935 | Z
3 | Armageddon | 1998 | English | 2 | Bruce Willis | 19/3/1955 | Harry
4 | Fantasia | 1940 | English | null | null | null | null
null | null | null | null | 3 | Julia Roberts | 28/10/1967 | null

33 The Full Disjunction of the Given Relations
[Tables: the Movies relation and the full disjunction, repeated from slide 32.]
This tuple will not be in the full disjunction: (m-id 1, title Zelig, year 1983, language English, with null for a-id, name, date-of-birth, and role).
The full disjunction does not include subsumed tuples.

34 The Full Disjunction of the Given Relations
[Tables: the four relations and the full disjunction, repeated from slide 32.]
This tuple will not be in the full disjunction: (m-id 4, title Fantasia, year 1940, language English, a-id 3, name Julia Roberts, date-of-birth 28/10/1967, role null).
The full disjunction does not include tuples that are based on Cartesian product rather than join.

35 In the Full Disjunction of a Given Set of Relations:
- Every tuple of the input is a part of at least one tuple of the output.
- Tuples are joined as in a natural join and padded with null values.
- The result includes only “maximal connected portions”.
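The three properties above can be made concrete with a deliberately naive Python sketch. It is not the algorithm of the talk: it enumerates every subset of relations and every combination of tuples, so it is exponential, and it assumes that the input contains no nulls, that rows are dicts from attribute names to values, and that two tuples are linked when their schemas share an attribute. All function names are hypothetical; the sample data is the simplified Actors, Acted-in, and Movies example used later in the talk.

```python
from itertools import combinations, product


def connected(rows):
    """Rows are linked when their schemas share an attribute;
    check that the resulting link graph is connected."""
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j in range(len(rows)):
            if j not in seen and rows[i].keys() & rows[j].keys():
                seen.add(j)
                stack.append(j)
    return len(seen) == len(rows)


def join_consistent(rows):
    """Every pair of rows agrees on all the attributes it shares."""
    return all(a[k] == b[k] for a, b in combinations(rows, 2) for k in a.keys() & b.keys())


def subsumes(t, u):
    """t is u with one or more nulls replaced by values."""
    return t != u and all(y is None or x == y for x, y in zip(t, u))


def full_disjunction(relations):
    """Naive full disjunction; returns (attribute order, sorted tuples)."""
    attrs = sorted({k for rows in relations.values() for row in rows for k in row})
    candidates = set()
    for size in range(1, len(relations) + 1):
        for names in combinations(relations, size):
            for rows in product(*(relations[n] for n in names)):
                if connected(rows) and join_consistent(rows):
                    merged = {k: v for row in rows for k, v in row.items()}
                    # Pad the attributes that no chosen row supplies with None.
                    candidates.add(tuple(merged.get(a) for a in attrs))
    maximal = [t for t in candidates if not any(subsumes(u, t) for u in candidates)]
    return attrs, sorted(maximal, key=repr)


relations = {
    "Actors": [{"a-id": 1, "name": "Woody Allen"}, {"a-id": 2, "name": "Bruce Willis"},
               {"a-id": 3, "name": "Julia Roberts"}],
    "Acted-in": [{"a-id": 1, "m-id": 1, "role": "Zelig"}, {"a-id": 1, "m-id": 2, "role": "Z"},
                 {"a-id": 2, "m-id": 3, "role": "Harry"}],
    "Movies": [{"m-id": 1, "title": "Zelig"}, {"m-id": 2, "title": "Antz"},
               {"m-id": 3, "title": "Armageddon"}, {"m-id": 4, "title": "Fantasia"}],
}

attrs, fd = full_disjunction(relations)
print(attrs)
for t in fd:
    print(t)
```

Julia Roberts and Fantasia each end up in a tuple of their own: Actors and Movies share no attribute, so combining them would be a Cartesian product rather than a join.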

36 Motivation for Full Disjunctions
- Full disjunctions were proposed by Galindo-Legaria as an alternative to outerjoins [SIGMOD’94].
- Rajaraman and Ullman suggested using full disjunctions for information integration [PODS’96].

37 Computing Full Disjunctions for γ-acyclic Relation Schemas
- Rajaraman and Ullman have shown how to evaluate the full disjunction by a sequence of natural outerjoins when the relation schemas are γ-acyclic.
- Hence, when the relation schemas are γ-acyclic, the full disjunction can be computed in polynomial time under input-output complexity.

38 Weak Semantics Generalizes Full Disjunctions
- Relations can be converted into a semistructured database.
- The full disjunction can be expressed as the union of several queries that are evaluated under weak semantics.

39 Example: Creating The Database

Actors
a-id | name
1 | Woody Allen
2 | Bruce Willis
3 | Julia Roberts

Acted-in
a-id | m-id | role
1 | 1 | Zelig
1 | 2 | Z
2 | 3 | Harry

Movies
m-id | title
1 | Zelig
2 | Antz
3 | Armageddon
4 | Fantasia

- A node is created for each tuple.
- Edges are added between connected tuples, in both directions.
- A root r is added, and edges are added from the root to every node.
- We use colors instead of labels.

40 Example: Creating The Queries
[Tables: the Actors, Acted-in, and Movies relations, repeated from slide 39.]
- A node is created for each relation schema.
- Edges are added between connected schemas, in both directions.
- The number of queries is equal to the number of schemas.
- In each query, the root is connected to a different schema.
[Figure: a query with root r and nodes Movies, Actors, and Acted-in.]
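The construction of the database graph and of the queries can be sketched in Python. Several details here are assumptions rather than statements from the slides: two tuples are taken to be connected when they come from different relations, share at least one attribute, and agree on all shared attributes; each edge is labeled with the name of the relation of its target node (standing in for the colors); and the names `build_database`, `build_queries`, and `ROOT` are hypothetical.

```python
from itertools import permutations

ROOT = "r"


def build_database(relations):
    """One node per tuple, an edge in each direction between connected
    tuples, and an edge from the root to every node."""
    nodes = [((name, i), row) for name, rows in relations.items() for i, row in enumerate(rows)]
    edges = {(ROOT, node[0], node) for node, _ in nodes}
    for (n1, r1), (n2, r2) in permutations(nodes, 2):
        shared = r1.keys() & r2.keys()
        if n1[0] != n2[0] and shared and all(r1[k] == r2[k] for k in shared):
            edges.add((n1, n2[0], n2))   # label = relation of the target
    return edges


def build_queries(schemas):
    """One query per schema: the same schema graph each time, with the
    root connected to a different schema node."""
    links = {(a, b, b) for a, b in permutations(schemas, 2) if schemas[a] & schemas[b]}
    return [links | {(ROOT, name, name)} for name in schemas]


relations = {
    "Actors": [{"a-id": 1, "name": "Woody Allen"}, {"a-id": 2, "name": "Bruce Willis"},
               {"a-id": 3, "name": "Julia Roberts"}],
    "Acted-in": [{"a-id": 1, "m-id": 1, "role": "Zelig"}, {"a-id": 1, "m-id": 2, "role": "Z"},
                 {"a-id": 2, "m-id": 3, "role": "Harry"}],
    "Movies": [{"m-id": 1, "title": "Zelig"}, {"m-id": 2, "title": "Antz"},
               {"m-id": 3, "title": "Armageddon"}, {"m-id": 4, "title": "Fantasia"}],
}
schemas = {name: set(rows[0]) for name, rows in relations.items()}

db_edges = build_database(relations)
queries = build_queries(schemas)
print(len(db_edges))                  # 10 root edges + 12 tuple-to-tuple edges
print(len(queries))                   # one query per schema
print(sorted(queries[0], key=repr))   # the query rooted at Actors
```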

41 Example: Queries are Evaluated under Weak Semantics
[Animation frame: the relations of slide 39 and a query over r, Acted-in, Actors, and Movies. The result table (role, name, a-id, title, m-id) contains its first row: (Zelig, Woody Allen, 1, Zelig, 1).]

42 Example: Queries are Evaluated under Weak Semantics
[Animation frame: the row (Z, Woody Allen, 1, Antz, 2) is added.]

43 Example: Queries are Evaluated under Weak Semantics
[Animation frame: the row (Harry, Bruce Willis, 2, Armageddon, 3) is added.]

44 Example: Queries are Evaluated under Weak Semantics
[Animation frame: a matching that assigns Julia Roberts to Actors and null to Acted-in and Movies yields the row (null, Julia Roberts, 3, null, null).]

45 Example: Queries are Evaluated under Weak Semantics
[Animation frame: the result tables computed so far: one with the three joined rows plus the Julia Roberts row, and one with the three joined rows only.]

46 Example
[Animation frame: the relations of slide 39, the queries, and their result tables. The largest result table contains the following five rows.]
role | name | a-id | title | m-id
Zelig | Woody Allen | 1 | Zelig | 1
Z | Woody Allen | 1 | Antz | 2
Harry | Bruce Willis | 2 | Armageddon | 3
null | Julia Roberts | 3 | null | null
null | null | null | Fantasia | 4

47 The Algorithm Computes Full Disjunctions in Polynomial Time Under Input-Output Complexity
Theorem: The full disjunction of relations r1, …, rn can be computed in O(n^5 s^2 f^2) time, where n is the number of relations, s is the total size of all the relations, and f is the size of the result.

48 Generalizing Full Disjunctions
- In a full disjunction, tuples are joined according to equality constraints, as in a natural join (or equi-join).
- We can generalize full disjunctions to support constraints that are not merely equality among attributes.

49 Example
Relation schemas:
- Movies (m-id, title, year, language, location)
- Actors (a-id, name, date-of-birth)
- Acted-in (a-id, m-id, role)
- Actors-that-Directed (a-id, m-id)
- Historical-Events (name, date, description)
- Historical-Sites (Country, State, City, Site)
Join constraints that are not equalities:
- The date of the historical event is a date in the year when the movie was released.
- The filming location is near the historical site.

50 The General Idea
- A set of constraints specifies how tuples should be joined.
- The queries and the database are constructed according to the given constraints.
- A pair of nodes is connected by an edge when it satisfies the corresponding constraint.
- Queries are evaluated w.r.t. the database under weak semantics.

51 Another Way of Generalizing Full Disjunctions: Use OR-Semantics
- Generate the queries and the database as before, but evaluate the queries under OR-semantics (rather than weak semantics).
- This relaxes the requirement that every pair of tuples should be join consistent.
- Instead, a tuple of the full disjunction is only required to be generated by database tuples that form a connected subgraph; these tuples need not be pairwise join consistent.

52 Example: The Full Disjunction
Relation schemas:
- Employees (e-id, ename, city, dept-no)
- Departments (dept-no, dname, building)
- Located-in (building, city, street)
Tuples:
- Employee: (007, James Bond, London, 6)
- Department: (6, MI-6, 10)
- Located-in: (10, Liverpool, King)
The Full Disjunction:
e-id | ename | dept-no | dname | building | city | street
007 | James Bond | 6 | MI-6 | 10 | London | null
null | null | 6 | MI-6 | 10 | Liverpool | King

53 The Full Disjunction under OR-Semantics
[Schemas and tuples repeated from slide 52.]
The Full Disjunction under OR-Semantics contains a single tuple that is generated by all three database tuples: e-id 007, ename James Bond, dept-no 6, dname MI-6, building 10, and street King, with both city values (London from Employees and Liverpool from Located-in).
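The contrast between the two semantics on this example can be checked with a short Python sketch. It assumes, for illustration only, that two tuples are linked by an edge when they share an attribute and agree on every shared attribute; the helper names are hypothetical.

```python
from itertools import combinations

rows = {
    "Employees": {"e-id": "007", "ename": "James Bond", "city": "London", "dept-no": 6},
    "Departments": {"dept-no": 6, "dname": "MI-6", "building": 10},
    "Located-in": {"building": 10, "city": "Liverpool", "street": "King"},
}


def agree(a, b):
    """Join consistency of a pair: equal values on all shared attributes."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())


def joinable(a, b):
    """An edge exists when the tuples share an attribute and agree on it."""
    return bool(a.keys() & b.keys()) and agree(a, b)


# Weak semantics (ordinary full disjunction): all pairs must be consistent.
pairwise_consistent = all(agree(a, b) for a, b in combinations(rows.values(), 2))

# OR-semantics: the tuples only need to form a connected subgraph.
names = list(rows)
seen, stack = {names[0]}, [names[0]]
while stack:
    u = stack.pop()
    for v in names:
        if v not in seen and joinable(rows[u], rows[v]):
            seen.add(v)
            stack.append(v)
connected = len(seen) == len(names)

print(pairwise_consistent)   # False: London and Liverpool disagree on city
print(connected)             # True: Departments links the other two tuples
```

The Employees and Located-in tuples disagree on city, so the three tuples cannot form one tuple of the ordinary full disjunction, but they are connected through the Departments tuple, which is all that OR-semantics requires.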

54 Two Related Problems
- The Projection Problem: computing the projection of the full disjunction on a given set of attributes.
- The Restriction Problem: computing only those tuples of the full disjunction that are non-null on a given set of attributes.
- Neither the projection problem nor the restriction problem can be solved in polynomial time (under input-output complexity) unless P=NP.

55 Conclusion
- Cyclic queries can be evaluated in polynomial time (in the size of the query, the database, and the result) under either OR-semantics or weak semantics.
- A reduction of full-disjunction evaluation to query evaluation under weak semantics is described.
- Using the reduction, full disjunctions can be computed in polynomial time (in the size of the relation schemas, the relations, and the result).

56 Conclusion (continued)
- Full disjunctions can be generalized in two ways: by using OR-semantics instead of weak semantics, and by joining tuples according to general constraints.
- Generalized full disjunctions can be useful in the context of data integration from heterogeneous sources.
- The projection problem and the restriction problem have polynomial-time algorithms (under input-output complexity) when the relations have γ-acyclic schemas, but not in the general case (unless P=NP).

57 Thank You Questions?

