
1 CS848: Topics in Databases: Information Integration
Topics covered
• Databases
• QL
• Query containment
• An evaluation of QL

2 A simple case of information integration
[Figure labels: Subsystem1, Subsystem2, Subsystem3; SQL; Global Schema: a single table T; Open, Scan; SQL Server]
• A “local as view” (LAV) integration schema: T ≡ Q₂.
• User submits Q₁.
• Query optimizer must determine if a scan of T suffices.
• True iff Q₁ is equivalent to Q₂.

3 In the beginning …
Countably infinite sets of each of the following kinds of symbols:
C = {C₁, C₂, …} (primitive concepts)
A = {A₁, A₂, …} ∪ {B₁, B₂, …} (attributes)
R = {R₁, R₂, …} (roles)
Conventions: attributes correspond to words in lower case or to positive integers; primitive concepts correspond to words in upper case; roles correspond to words in mixed case.

4 For a particular database I = ⟨Δ, (·)^I⟩, where Δ is a countable, possibly infinite domain, and where for each symbol:
(C)^I ⊆ Δ
(A)^I : Δ → Δ
(R)^I ⊆ (Δ × Δ)

5 Partial databases
[Figure: graph notation for partial databases. A node e : {C₁, …, Cₙ} denotes e ∈ (Cᵢ)^I; an arc labelled A from e₁ to e₂ denotes (A)^I(e₁) = e₂; an arc labelled R from e₁ to e₂ denotes (e₁, e₂) ∈ (R)^I; two further arc styles between e₁ and e₂ are also shown.]

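6 Relational databases
[Figure: a relational table EMP with columns name and age, holding “John” and “Mary” with age 33, shown next to the corresponding partial database: nodes e₁ : {EMP} and e₂ : {EMP}, with name arcs to “John” and “Mary” and age arcs to 33.]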

7 Relational databases (cont’d)
{e₁, e₂} ⊆ (EMP)^I
(name)^I(e₁) = “John”
(age)^I(e₁) = (age)^I(e₂) = 33
{e₁, e₂, “John”, 33, “Mary”} ⊆ Δ
[Figure: the partial database from the previous slide (e₁ : {EMP}, e₂ : {EMP}, name and age arcs), with additional arcs relating e₁ to e₂ and “John” to 33.]
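The EMP example can be written down as a small data structure. The following is a minimal Python sketch, assuming a hypothetical `Database` class whose field names are chosen here for illustration only; attribute meanings are stored as dictionaries that may leave some elements unmapped, which is what makes the database partial.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    """Hypothetical container for an interpretation: a domain plus symbol meanings."""
    domain: set
    concepts: dict = field(default_factory=dict)    # concept name -> set of elements
    attributes: dict = field(default_factory=dict)  # attribute name -> {element: element}
    roles: dict = field(default_factory=dict)       # role name -> set of (element, element)


# The EMP example: two employees that share the age 33.
emp_db = Database(
    domain={"e1", "e2", "John", "Mary", 33},
    concepts={"EMP": {"e1", "e2"}},
    attributes={
        "name": {"e1": "John", "e2": "Mary"},
        "age": {"e1": 33, "e2": 33},
    },
)
```

Values such as “John” and 33 are ordinary domain elements here, just like e1 and e2.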

8 Dialects of QL
[Figure: the dialects arranged along two axes, expressiveness and semantics.]
• Conjunctive QL; Conjunctive QL with bag semantics †
• Positive QL; Positive QL with bag semantics ‡
• First order QL; First order QL with bag semantics
† [Khizder et al., 1999], ‡ [Lui et al., 2002]

9 Conjunctive QL
Q ::= D as A                  (quantification)
    | A₁ = A₂.R               (unnest)
    | A₁.Pf₁ = A₂.Pf₂         (selection)
    | elim A₁, …, Aₙ Q        (projection)
    | true                    (null tuple)
    | from Q₁, Q₂             (natural join)
    | ( Q )
D ::= THING | C               (basic description)
Pf ::= id | A.Pf              (path function)

10 Well formed queries: α(Q)
α(D as A) ≡ {A}
α(A₁ = A₂.R) ≡ {A₁, A₂}
α(A₁.Pf₁ = A₂.Pf₂) ≡ {A₁, A₂}
α(elim A₁, …, Aₙ Q) ≡ {A₁, …, Aₙ}
α(true) ≡ ∅
α(from Q₁, Q₂) ≡ α(Q₁) ∪ α(Q₂)
Require {A₁, …, Aₙ} ⊆ α(Q) for projection operators.

11 Tuples and bags
A (duplicate) tuple t with attribute bindings for attributes {A₁, …, Aₙ} over a database I = ⟨Δ, (·)^I⟩ has the general form ⟨A₁ : e₁, …, Aₙ : eₙ, cnt : i⟩, where {e₁, …, eₙ} ⊆ Δ, “cnt” is a distinct symbol not used as an attribute, and i is a positive integer.
A set of duplicate tuples that contain the same attribute bindings is called a bag.

12 Operations on tuples
α(t) ≡ set of attributes occurring in t
t@cnt ≡ integer i such that “cnt : i” occurs in t
t@A ≡ element e ∈ Δ such that “A : e” occurs in t; defined only when A ∈ α(t)
t[{A₁, …, Aₙ}] ≡ {A₁ : t@A₁, …, Aₙ : t@Aₙ}; defined only when {A₁, …, Aₙ} ⊆ α(t)
[t] ≡ t[α(t)]
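The tuple operations above translate almost directly into code. The following Python sketch assumes a tuple is represented as a dictionary with a reserved key "cnt"; the function names (`attrs`, `count`, `value`, `restrict`, `bindings`) are illustrative choices, not notation from the slides.

```python
CNT = "cnt"  # reserved key; assumed never to be used as an attribute name


def attrs(t):
    """Set of attributes occurring in t."""
    return frozenset(k for k in t if k != CNT)


def count(t):
    """t@cnt: the duplicate count of t."""
    return t[CNT]


def value(t, a):
    """t@A: defined only when A is an attribute of t."""
    if a == CNT or a not in t:
        raise KeyError(a)
    return t[a]


def restrict(t, names):
    """t[{A1, ..., An}]: the bindings of t for the named attributes."""
    return {a: value(t, a) for a in names}


def bindings(t):
    """[t]: all attribute bindings of t, without the count."""
    return restrict(t, attrs(t))
```

For example, `bindings({"name": "John", "age": 33, "cnt": 2})` drops the count and keeps the two attribute bindings.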

13 Semantics
The meaning of a query Q, denoted ⟦Q⟧, is a function that maps databases to bags. The behavior of this function on a particular database I = ⟨Δ, (·)^I⟩ is defined as follows.
⟦THING as A⟧(I) ≡ {⟨A : e, cnt : 1⟩ : e ∈ Δ}
⟦C as A⟧(I) ≡ {⟨A : e, cnt : 1⟩ : e ∈ (C)^I}
⟦A₁ = A₂.R⟧(I) ≡ {⟨A₁ : e₁, A₂ : e₂, cnt : 1⟩ : (e₂, e₁) ∈ R^I}
⟦A₁.Pf₁ = A₂.Pf₂⟧(I) ≡ {⟨A₁ : e₁, A₂ : e₂, cnt : 1⟩ : (Pf₁)^I(e₁) = (Pf₂)^I(e₂)}
where
(id)^I ≡ {(e, e) : e ∈ Δ}
(A.Pf)^I ≡ {(e₁, e₂) : (Pf)^I((A)^I(e₁)) = e₂}
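The definition of (A.Pf)^I says that a path function applies its attributes from left to right, ending with id. A minimal Python sketch of that evaluation follows; the function name `eval_path`, the list representation of a path, and the sample data are assumptions made for illustration.

```python
def eval_path(attributes, path, e):
    """Apply the attributes in `path` left to right, starting from element e.

    Returns None when some attribute is undefined along the way,
    which can happen only in a partial database.
    """
    for a in path:
        mapping = attributes[a]
        if e not in mapping:
            return None
        e = mapping[e]
    return e


# Hypothetical data: e1 works in department d1, whose name is "Sales".
attributes = {"dept": {"e1": "d1"}, "name": {"d1": "Sales", "e1": "John"}}
```

The empty path plays the role of id, so `eval_path(attributes, [], "e1")` returns the element itself.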

14 Semantics (cont’d)
⟦elim A₁, …, Aₙ Q⟧(I) ≡ ∅, if not well formed; otherwise {⟨A₁ : t@A₁, …, Aₙ : t@Aₙ, cnt : 1⟩ : t ∈ ⟦Q⟧(I)}
⟦true⟧(I) ≡ {⟨cnt : 1⟩}
⟦from Q₁, Q₂⟧(I) ≡ {t : α(t) = α(Q₁) ∪ α(Q₂) ∧ ∃ t₁ ∈ ⟦Q₁⟧(I), t₂ ∈ ⟦Q₂⟧(I) : t@cnt = t₁@cnt × t₂@cnt ∧ t[α(t₁)] = [t₁] ∧ t[α(t₂)] = [t₂]}
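The join and projection rules can be tried out on small bags. The sketch below is one possible Python rendering, assuming a bag is a `collections.Counter` that maps a frozenset of (attribute, element) pairs to its count; the function names are illustrative and the well-formedness checks are omitted.

```python
from collections import Counter


def concept_as(members, attr):
    """C as A: one tuple per member of the concept, each with count 1."""
    return Counter({frozenset({(attr, e)}): 1 for e in members})


def from_join(bag1, bag2):
    """from Q1, Q2: natural join; counts of matching tuples multiply."""
    out = Counter()
    for b1, c1 in bag1.items():
        left = dict(b1)
        for b2, c2 in bag2.items():
            # Shared attributes must agree for the two tuples to join.
            if all(left.get(a, e) == e for a, e in b2):
                out[b1 | b2] += c1 * c2
    return out


def elim(names, bag):
    """elim A1, ..., An Q: project and reset every count to 1."""
    return Counter({frozenset((a, e) for a, e in b if a in names): 1 for b in bag})


emp_e1 = concept_as({"john", "mary"}, "e1")
emp_e2 = concept_as({"john", "mary"}, "e2")
pairs = from_join(emp_e1, emp_e2)   # four tuples, each with count 1
firsts = elim({"e1"}, pairs)        # two tuples, each with count 1
```

Because a bag holds at most one tuple per set of bindings, each joined tuple arises from exactly one pair of inputs, so its count is the product of the two input counts.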

15 Syntactic sugar
A₁. ⋯ .Aₙ.id ≡ A₁. ⋯ .Aₙ
select distinct A₁, …, Aₙ Q ≡ elim A₁, …, Aₙ Q
select * Q ≡ Q
Q₁ where Q₂ ≡ from Q₁, Q₂
Q₁ and Q₂ ≡ from Q₁, Q₂
from ≡ true
from Q₁, Q₂, …, Qₙ ≡ from (from Q₁, Q₂, …) Qₙ

16 Examples
The names of employees who have the same age as another employee with a given name.
select distinct :p, name
from EMP as e,
     (select distinct :p, e1
      from EMP as e1, EMP as e2
      where e1.age = e2.age and e2.name = :p)
where e.name = name and e.id = e1.id
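To see what the query above returns, it can help to compute the same answer directly. The following Python sketch is only an illustration over hypothetical employee records (the names, ages, and the function `same_age_names` are assumptions); note that, as written, the query does not require e1 and e2 to be different employees, so the employee named :p is included in the answer.

```python
def same_age_names(emps, p):
    """Pairs (p, name) for employees whose age matches some employee named p."""
    ages = {e["age"] for e in emps if e["name"] == p}
    return {(p, e["name"]) for e in emps if e["age"] in ages}


# Hypothetical data for illustration only.
emps = [
    {"name": "John", "age": 33},
    {"name": "Mary", "age": 33},
    {"name": "Ann", "age": 40},
]
```

The result is a Python set, which mirrors the effect of select distinct.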

17 Method calls (more syntactic sugar)
A₁.Pf₁.C(A₂.Pf₂, …, Aₙ₋₁.Pfₙ₋₁) = Aₙ.Pfₙ
    ≡ select distinct A₁, …, Aₙ from C as A where A.1 = A₁.Pf₁ and … and A.n = Aₙ.Pfₙ
A₁.Pf₁.C(A₂.Pf₂, …, Aₙ₋₁.Pfₙ₋₁) as Aₙ
    ≡ A₁.Pf₁.C(A₂.Pf₂, …, Aₙ₋₁.Pfₙ₋₁) = Aₙ.id

18 Examples (cont’d)
The names of employees who have an age double that of another employee.
select distinct name
from EMP as e,
     (select distinct e1
      from EMP as e1, EMP as e2
      where e2.age.+(e2.age) = e1.age)
where e.name = name and e.id = e1.id

19 Conjunctive datalog (more syntactic sugar)
C(A₁, …, Aₙ)
    ≡ select distinct A₁, …, Aₙ from C as A where A.1 = A₁.id and … and A.n = Aₙ.id
(A₁, …, Aₘ) :- Q₁, …, Qₙ.
    ≡ select distinct A₁, …, Aₘ from Q₁, …, Qₙ

20 Positive QL
Q ::= empty A₁, …, Aₙ         (empty set)
    | Q₁ union Q₂             (union)
α(empty A₁, …, Aₙ) ≡ {A₁, …, Aₙ}
α(Q₁ union Q₂) ≡ α(Q₁)
Require α(Q₁) = α(Q₂) in union operations.

21 Semantics
⟦empty A₁, …, Aₙ⟧(I) ≡ ∅
⟦Q₁ union Q₂⟧(I) ≡ {t : t@cnt = 1 ∧ α(t) = α(Q₁) ∧ α(t) = α(Q₂) ∧
    ( (∃ t₁ ∈ ⟦Q₁⟧(I) : [t] = [t₁] ∧ ¬∃ t₂ ∈ ⟦Q₂⟧(I) : [t] = [t₂])
    ∨ (∃ t₂ ∈ ⟦Q₂⟧(I) : [t] = [t₂] ∧ ¬∃ t₁ ∈ ⟦Q₁⟧(I) : [t] = [t₁])
    ∨ (∃ t₁ ∈ ⟦Q₁⟧(I), t₂ ∈ ⟦Q₂⟧(I) : [t] = [t₁] ∧ [t] = [t₂]) )}

22 First order QL
Q ::= Q₁ minus Q₂             (difference)
α(Q₁ minus Q₂) ≡ α(Q₁)
Require α(Q₁) = α(Q₂) in difference operations.

23 Semantics
⟦Q₁ minus Q₂⟧(I) ≡ {t : t@cnt = 1 ∧ α(t) = α(Q₁) ∧ α(t) = α(Q₂) ∧ (∃ t₁ ∈ ⟦Q₁⟧(I) : [t] = [t₁]) ∧ (¬∃ t₂ ∈ ⟦Q₂⟧(I) : [t] = [t₂])}
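Both union and minus ignore the counts of their inputs and always produce count 1. A small Python sketch of that behavior follows, assuming bags are `collections.Counter` objects keyed by some hashable stand-in for a tuple's bindings (plain strings here) and that both inputs have the same attributes; the function names are illustrative.

```python
from collections import Counter


def union(bag1, bag2):
    """Q1 union Q2: every binding found in either input, with count 1."""
    return Counter({b: 1 for b in set(bag1) | set(bag2)})


def minus(bag1, bag2):
    """Q1 minus Q2: bindings found in Q1 but not in Q2, with count 1."""
    return Counter({b: 1 for b in set(bag1) - set(bag2)})


q1 = Counter({"t": 3, "u": 1})
q2 = Counter({"t": 1, "v": 2})
```

With these inputs, `union(q1, q2)` holds t, u and v once each, and `minus(q1, q2)` holds only u.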

24 QL with duplicates
Q ::= select A₁, …, Aₙ Q      (duplicate preserving projection)
    | Q₁ union all Q₂         (bag union)
    | Q₁ minus all Q₂         (bag difference)

25 Well formed queries (cont’d)
α(select A₁, …, Aₙ Q) ≡ {A₁, …, Aₙ}
α(Q₁ union all Q₂) ≡ α(Q₁)
α(Q₁ minus all Q₂) ≡ α(Q₁)
Require α(Q₁) = α(Q₂) in bag union and bag difference operations, and that {A₁, …, Aₙ} ⊆ α(Q) in (duplicate preserving) projection operations.

26 Semantics
⟦select A₁, …, Aₙ Q⟧(I) ≡ ∅, if not well formed and representable †; otherwise
    {⟨A₁ : t₁@A₁, …, Aₙ : t₁@Aₙ, cnt : n⟩ : t₁ ∈ ⟦Q⟧(I) ∧ n = Σ (t₂@cnt)},
    where the sum ranges over all t₂ ∈ ⟦Q⟧(I) such that t₂[{A₁, …, Aₙ}] = t₁[{A₁, …, Aₙ}]
† The selection operation is representable on database I iff, for every t₁ ∈ ⟦Q⟧(I), |{t₂ ∈ ⟦Q⟧(I) : t₂[{A₁, …, Aₙ}] = t₁[{A₁, …, Aₙ}]}| is finite.

27 Example
A duplicate preserving projection operation that is not representable in any database with an infinite domain.
select e1 from THING as e1, THING as e2
Observation: All well-formed duplicate preserving projection operations on databases with finite domains are representable.
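On a finite domain the example query is representable, and its counts are easy to predict: each value of e1 is paired with every element of the domain. The Python sketch below illustrates this, assuming bags are `collections.Counter` objects keyed by frozensets of (attribute, element) pairs; the function name `select` and the three-element domain are illustrative choices.

```python
from collections import Counter
from itertools import product


def select(names, bag):
    """select A1, ..., An Q: project and add up the counts of merged tuples."""
    out = Counter()
    for b, c in bag.items():
        out[frozenset((a, e) for a, e in b if a in names)] += c
    return out


domain = {"a", "b", "c"}
# from THING as e1, THING as e2: every pair of domain elements, count 1 each.
pairs = Counter({frozenset({("e1", x), ("e2", y)}): 1 for x, y in product(domain, domain)})
projected = select({"e1"}, pairs)
```

Each of the three resulting tuples has a count equal to the size of the domain, which is why the count has no finite value once the domain is infinite.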

28 Semantics (cont’d)
⟦Q₁ union all Q₂⟧(I) ≡ ∅, if not well formed; otherwise
    {t ∈ ⟦Q₁⟧(I) : ¬∃ t₂ ∈ ⟦Q₂⟧(I) : [t] = [t₂]}
    ∪ {t ∈ ⟦Q₂⟧(I) : ¬∃ t₁ ∈ ⟦Q₁⟧(I) : [t] = [t₁]}
    ∪ {t : ∃ t₁ ∈ ⟦Q₁⟧(I), t₂ ∈ ⟦Q₂⟧(I) : [t] = [t₁] ∧ [t] = [t₂] ∧ t@cnt = t₁@cnt + t₂@cnt}
⟦Q₁ minus all Q₂⟧(I) ≡ ∅, if not well formed; otherwise
    {t ∈ ⟦Q₁⟧(I) : ¬∃ t₂ ∈ ⟦Q₂⟧(I) : [t] = [t₂]}
    ∪ {t : ∃ t₁ ∈ ⟦Q₁⟧(I), t₂ ∈ ⟦Q₂⟧(I) : [t] = [t₁] ∧ [t] = [t₂] ∧ t@cnt = t₁@cnt − t₂@cnt ∧ t₁@cnt > t₂@cnt}
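Python's `collections.Counter` happens to implement arithmetic close to these two operators: `+` adds counts, and `-` subtracts counts and keeps only strictly positive results. The sketch below relies on that, assuming bags are Counters keyed by a hashable stand-in for a tuple's bindings and assuming that a tuple survives bag difference only when its count stays positive; the function names are illustrative.

```python
from collections import Counter


def union_all(bag1, bag2):
    """Q1 union all Q2: counts of tuples with equal bindings are added."""
    return bag1 + bag2


def minus_all(bag1, bag2):
    """Q1 minus all Q2: counts are subtracted; non-positive results are dropped."""
    return bag1 - bag2


q1 = Counter({"t": 3, "u": 1})
q2 = Counter({"t": 1, "u": 2, "v": 1})
```

Here `union_all(q1, q2)` gives t four times, u three times and v once, while `minus_all(q1, q2)` keeps only t, twice.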

29 Summary
(bag semantics)
    (conjunctive)  as, =, elim, true, from, select
    (positive)     as, =, elim, true, from, select, empty, union all
    (first order)  as, =, elim, true, from, select, empty, union all, minus all
(set semantics)
    (conjunctive)  as, =, elim, true, from
    (positive)     as, =, elim, true, from, empty, union
    (first order)  as, =, elim, true, from, empty, union, minus

30 Query contexts
An expression Q[] in the language QL enriched by an additional terminal symbol [] is called a query context.
For a query context Q₁[] and a query Q₂ ∈ QL, the expression Q₁[Q₂] denotes the syntactical substitution of Q₂ for [].
Q₂ is compatible with Q₁ if Q₁[Q₂] ∈ QL. For example, Q₂ is compatible with Q₁ in the following.
Q₂: EMP as e where e.name = :p
Q₁: select distinct :p, d from DEPT as d, [] where d = e.dept

31 The query equivalence problem
Q₁ is equivalent to Q₂ for database I, written I ⊨ (Q₁ ≡ Q₂), if ⟦Q₁⟧(I) = ⟦Q₂⟧(I).
A query equivalence dependency E has the form (Q₁ ≡ Q₂).
E = (Q₁ ≡ Q₂) is an axiom if, for any database I, I ⊨ (Q₁ ≡ Q₂).
A query equivalence problem for a given set of query equivalence dependencies is to determine if a given member of the set is an axiom.

32 Some axioms
Question: Is it true that any E with the following form is an axiom?
(elim A₁, …, Aₘ Q₁)[elim B₁, …, Bₙ Q₂] ≡ elim A₁, …, Aₘ Q₁[Q₂]
Answer: No. However, any such E is an axiom if no attribute in (α(Q₂) – {B₁, …, Bₙ}) occurs in the query context (elim A₁, …, Aₘ Q₁[]).

33 Excluding variable reuse in QL
Q has an occurrence of variable reuse if there is a query context Q₁[] and a query of the form elim A₁, …, Aₙ Q₂ or of the form select A₁, …, Aₙ Q₂ such that Q = Q₁[Q₂] and there exists A in (α(Q₂) – {A₁, …, Aₙ}) that also occurs in Q₁[].
Observation: For any Q₁, there exists an equivalent query Q₂ that has no occurrence of variable reuse.

34 The query containment problem
Q₁ is contained in Q₂ for database I, written I ⊨ (Q₁ ⊑ Q₂), if, for any tuple t₁ in ⟦Q₁⟧(I), there exists t₂ in ⟦Q₂⟧(I) such that [t₁] = [t₂] and t₁@cnt ≤ t₂@cnt.
A query containment dependency C has the form (Q₁ ⊑ Q₂).
C = (Q₁ ⊑ Q₂) is an axiom if, for any database I, I ⊨ (Q₁ ⊑ Q₂).
A query containment problem for a given set of query containment dependencies is to determine if a given member of the set is an axiom.

35 Equivalence and containment
Observation: Equivalence reduces to containment.
Q₁ ≡ Q₂ iff Q₁ ⊑ Q₂ and Q₂ ⊑ Q₁
Observation: Containment reduces to equivalence in first order QL.
Q₁ ⊑ Q₂ iff (Q₁ minus all Q₂) ≡ empty α(Q₁)
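Both observations can be checked on the results of two queries over one fixed database. The Python sketch below does this for already-evaluated bags, assuming bags are `collections.Counter` objects keyed by a hashable stand-in for a tuple's bindings; it tests containment on a single database only, whereas an axiom must hold for every database. The function names are illustrative.

```python
from collections import Counter


def contained(bag1, bag2):
    """Every tuple of bag1 occurs in bag2 with at least the same count."""
    return all(bag2[b] >= c for b, c in bag1.items())


def equivalent(bag1, bag2):
    """Equivalence as containment in both directions."""
    return contained(bag1, bag2) and contained(bag2, bag1)


q1 = Counter({"t": 1})
q2 = Counter({"t": 2, "u": 1})
```

With Counter subtraction standing in for minus all, `q1 - q2` is empty exactly when `contained(q1, q2)` holds, which mirrors the second observation.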

36 Some complexity results
Theorem: The query equivalence and containment problems for conjunctive QL are NP-complete. †
† Chandra, A. K. and P. M. Merlin. Optimal implementation of conjunctive queries in relational databases. Proc. Ninth Annual ACM Symposium on the Theory of Computing, pp. 77–90, 1977.

37 A decision procedure
Theorem: The following procedure decides if C = (Q₁ ⊑ Q₂) is an axiom for conjunctive QL. †
1. Freeze the body of Q₁ by creating a partial database I consisting of individuals that include its variables.
2. If the tuple ⟨A₁ : A₁, …, Aₙ : Aₙ, cnt : 1⟩ occurs in ⟦Q₂⟧(I), where α(Q₁) = {A₁, …, Aₙ}, then return true; otherwise return false. ‡
† Derived from [Ullman, 1999]. ‡ Use forced semantics for selection operations.
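The two steps above follow the classical frozen-database test for conjunctive queries. The Python sketch below shows that classical test on queries written in conjunctive datalog form, under set semantics; it is not the QL procedure itself, since it ignores path functions and forced semantics. The representation (a query as a pair of head variables and body atoms, with only variables as arguments) and the function name are assumptions.

```python
def contained_in(q1, q2):
    """Decide q1 ⊑ q2 for conjunctive queries given as (head_vars, body_atoms).

    Each body atom is (predicate, tuple_of_variables). Every head variable
    of q2 is assumed to occur in the body of q2.
    """
    head1, body1 = q1
    head2, body2 = q2
    if len(head1) != len(head2):
        return False
    facts = set(body1)  # step 1: freeze q1, treating its variables as constants

    def extend(i, sub):
        # step 2: search for a match of q2's body that yields q1's frozen head
        if i == len(body2):
            return tuple(sub[v] for v in head2) == tuple(head1)
        pred, args = body2[i]
        for fpred, fargs in facts:
            if fpred != pred or len(fargs) != len(args):
                continue
            new = dict(sub)
            if all(new.setdefault(a, f) == f for a, f in zip(args, fargs)):
                if extend(i + 1, new):
                    return True
        return False

    return extend(0, {})


# (x) :- E(x, y), E(y, z).   versus   (x) :- E(x, y).
two_steps = (("x",), [("E", ("x", "y")), ("E", ("y", "z"))])
one_step = (("x",), [("E", ("x", "y"))])
```

The backtracking search tries every way of matching the atoms of the second query against the frozen facts, which is consistent with the NP-completeness stated earlier.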

38 Obtaining a partial database from Q
[Figure: graph fragments contributed by each query form. A₁.A₂. ⋯ .Aₘ = B₁.B₂. ⋯ .Bₙ yields two chains over A₁, A₂, …, Aₘ and B₁, B₂, …, Bₙ; THING as A yields a node A; C as A yields a node A : {C}; A₁ = A₂.R yields an arc labelled R between nodes A₂ and A₁.]

39 Derived partial databases (cont’d)
[Figure: rewrite rules on partial databases, drawn over nodes u, v, w and x with labels L, L₁, L₂, L₃, L₄ and arcs labelled A.]

40 Deriving partial databases (cont’d)
[Figure: rewrite rules on partial databases, drawn over nodes n₁, n₂ and n₃ with labels L₁, L₂ and L₃, including a step in which n₁ : L₁ and n₂ : L₂ both become labelled L₁ ∪ L₂.]

41 Evaluating selections on partial databases
Note that selection conditions can navigate missing attribute values. In such cases, assume a forced semantics. In particular, two nodes n₁ and n₂ satisfy a selection condition iff the condition has the form
n₁.Pf₁.Pf = n₂.Pf₂.Pf
where (Pf₁)^I(n₁) and (Pf₂)^I(n₂) are defined and lead to nodes connected by an equality arc.

42 Some complexity results (cont’d)
Theorem: The query equivalence problem for conjunctive QL with bag semantics is NP-complete.
Observation: The complexity of the query containment problem for conjunctive QL with bag semantics remains open at this time.
Example: † In conjunctive QL with bag semantics, the query containment dependency Q₁ ⊑ Q₂ is an axiom, where Q₁ and Q₂ have the respective definitions
Q₁: select x, z
    from P as x, R as z
    where x = u.Q and z = v.Q
Q₂: select x, z
    from P as x, R as z
    where y = u.Q and y = v.Q
† [Chaudhuri and Vardi, 1993]

43 The query membership problem
A database schema, denoted T, consists of a finite set {C₁, …, Cₙ} of query containment dependencies.
C is an axiom relative to database schema T = {C₁, …, Cₙ}, written T ⊨ C, if, for any database I, I ⊨ C if I ⊨ Cᵢ for each i.
A query membership problem for a given set of query containment dependencies is to determine if a given member of the set is an axiom relative to a given database schema also consisting of members of the set.

44 More complexity results
Theorem: The query membership problem for conjunctive QL is undecidable.
Theorem: The query membership problem for first order QL is equivalent to the query containment problem for first order QL.
Proof: Assignment.

45 Evaluating QL
• Defining database schema
• Expressing access plans
• Fine grained APIs: record addresses
• Protocols
• Safety
• Binding patterns
• Adequacy: SQL, OQL, XQuery

46 Modeling generalization taxonomies
Consider a simple object-oriented schema language consisting of sentences of the following form. †
class C {A₁ : ref C₁, …, Aₘ : ref Cₘ} [isa C₁, …, Cₙ];
Assignment: Encode a fixed collection of such sentences as a database schema in conjunctive QL. Your encoding should be as compact as possible and should enable the following questions to be expressed as query containment dependencies over your schema.
1. Is C a defined class?
2. Is attribute A defined on class C?
3. Can an object reside in both class C₁ and class C₂?
† Assume that any object in a database was created with respect to a single class.

47 Modeling pipelined query access plans
                        (syntax)                                      (defn of  (·))
(parameter)             Q ::= (PARAM as A)                            {A}
(index scan)              | (from C as A, A.1 = B₁, …, A.n = Bₙ)      {A}
(nested loops)            | (from Q₁, Q₂)                              (Q₁) ∪  (Q₂)
(noop)                    | (select A₁, …, Aₙ Q)                       (Q) ∩ {A₁, …, Aₙ}
(record field access)     | (A₁ = A₂.B)                               {A₁}
(comparison)              | (A₁ = A₂)                                 ∅
(catenation)              | (Q₁ union all Q₂)                          (Q₁) ∩  (Q₂)
(cut)                     | (elim A₁, …, Aₙ Q)                        ∅
                          | …
Require:
1. ( (Q₂) –  (Q₂)) ⊆  (Q₁) for nested loops, and
2.  (Q) =  (Q) for top-level queries.

48 Alternative semantics
Requires richer model theories for
1. sort operations, and
2. named cuts.

