# CS848: Topics in Databases: Foundations of Query Optimization Topics covered  Introduction to description logic: Single column QL  The ALC family of.

## Presentation transcript

CS848: Topics in Databases: Foundations of Query Optimization

### Topics covered

- Introduction to description logic: Single column QL
- The ALC family of dialects
- Terminologies
- Language extensions

### Single column QL

D ::= THING | C

Q ::= one of the following forms:

- D as x
- (empty x)
- (THING as x minus C as x)
- (from $Q_1$, $Q_2$)
- (elim x from x.A = y, elim y from y = x, Q)
- (x.$\mathit{Pf}_1$ = x.$\mathit{Pf}_2$)
- (THING as x minus x.$\mathit{Pf}_1$ = x.$\mathit{Pf}_2$)
- (elim x x.R = y)
- (THING as x minus elim x from x.R = y, elim y from y = x, THING as x minus Q)
- …

### Initial analysis

The language $L^2$ consists of all formulae of FOPC with equality and constant functions that use at most two distinct variables.

Theorem: The satisfiability problem for $L^2$ is NEXPTIME-complete.

Corollary: The query containment problem for single column QL is decidable for queries that are attribute free.

### New syntax (cont’d)

With D ::= THING | C unchanged, each remaining alternative of Q is replaced, one per slide, by a description logic constructor:

| Single column QL form | New syntax |
|---|---|
| (empty x) | $\bot$ |
| (THING as x minus C as x) | $\neg C$ |
| (from $Q_1$, $Q_2$) | $C_1 \sqcap C_2$ |
| (elim x from x.A = y, elim y from y = x, Q) | $\forall A.D$ |
| (x.$\mathit{Pf}_1$ = x.$\mathit{Pf}_2$) | $\mathit{Pf}_1 = \mathit{Pf}_2$ |
| (THING as x minus x.$\mathit{Pf}_1$ = x.$\mathit{Pf}_2$) | $\mathit{Pf}_1 \neq \mathit{Pf}_2$ |
| (elim x x.R = y) | $\exists R.\mathrm{THING}$ |
| (THING as x minus elim x from x.R = y, elim y from y = x, THING as x minus Q) | $\forall R.D$ |

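### New syntax (cont’d)

Moving the new constructors from Q to D gives:

$$
\begin{aligned}
Q &::= D \text{ as } x \mid \cdots \\
D &::= \mathrm{THING} \mid C \mid \bot \mid \neg C \mid C_1 \sqcap C_2 \mid \forall A.D \mid \mathit{Pf}_1 = \mathit{Pf}_2 \mid \mathit{Pf}_1 \neq \mathit{Pf}_2 \mid \exists R.\mathrm{THING} \mid \forall R.D \mid (D)
\end{aligned}
$$

Writing $\top$ for THING:

$$
\begin{aligned}
Q &::= D \text{ as } x \mid \cdots \\
D &::= \top \mid C \mid \bot \mid \neg C \mid C_1 \sqcap C_2 \mid \forall A.D \mid \mathit{Pf}_1 = \mathit{Pf}_2 \mid \mathit{Pf}_1 \neq \mathit{Pf}_2 \mid \exists R.\top \mid \forall R.D \mid (D)
\end{aligned}
$$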

### Concept dependencies

On terminology and notation:

- We call an instance of the language generated by $D$ for a given DL a concept.
- A concept inclusion dependency $\mathcal{C}$ for a given DL is written $D_1 \sqsubseteq D_2$ and corresponds to the query containment dependency $(D_1 \text{ as } x) \sqsubseteq (D_2 \text{ as } x)$.
- A concept definition $\mathcal{C}$ for a given DL is written $C \equiv D$ and corresponds to the query equivalence dependency $(C \text{ as } x) \equiv (D \text{ as } x)$.

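### CLASSIC† (our first DL)

| Constructor | Syntax ($D ::=$) | Semantics |
|---|---|---|
| universal concept | $\top$ | $\Delta$ |
| primitive concept | $C$ | $(C)^{\mathcal{I}}$ |
| bottom concept | $\bot$ | $\emptyset$ |
| atomic negation | $\neg C$ | $\Delta - (C)^{\mathcal{I}}$ |
| intersection | $D_1 \sqcap D_2$ | $(D_1)^{\mathcal{I}} \cap (D_2)^{\mathcal{I}}$ |
| attribute value restriction | $\forall A.D$ | $\{e : (A)^{\mathcal{I}}(e) \in (D)^{\mathcal{I}}\}$ |
| path agreement | $\mathit{Pf}_1 = \mathit{Pf}_2$ | $\{e : (\mathit{Pf}_1)^{\mathcal{I}}(e) = (\mathit{Pf}_2)^{\mathcal{I}}(e)\}$ |
| path disagreement | $\mathit{Pf}_1 \neq \mathit{Pf}_2$ | $\{e : (\mathit{Pf}_1)^{\mathcal{I}}(e) \neq (\mathit{Pf}_2)^{\mathcal{I}}(e)\}$ |
| existential quantification | $\exists R.D$ | $\{e_1 : \exists e_2 : (e_1, e_2) \in (R)^{\mathcal{I}} \wedge e_2 \in (D)^{\mathcal{I}}\}$ |
| role value restriction | $\forall R.D$ | $\{e_1 : \forall (e_1, e_2) \in (R)^{\mathcal{I}} : e_2 \in (D)^{\mathcal{I}}\}$ |
| | $(D)$ | |

† [Borgida and Patel-Schneider, 1994]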
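The set-based semantics above can be evaluated directly over a small finite database. The following is a minimal sketch, assuming a hypothetical tuple encoding of concepts (`("and", D1, D2)`, `("all_attr", A, D)`, `("eq", Pf1, Pf2)` and so on) and a dictionary layout for the interpretation; neither comes from the slides. It also assumes that an agreement or disagreement over a missing attribute value is simply not satisfied, and it implements negation as a general complement, which covers the atomic case.

```python
def follow(path, attrs, e):
    """Apply a path function (a tuple of attribute names) to individual e.

    The empty path is the identity; None is returned when a value is missing.
    """
    for a in path:
        e = attrs.get(a, {}).get(e)
        if e is None:
            return None
    return e


def ext(d, interp):
    """Return the extension (D)^I of concept d as a set of individuals."""
    dom = interp["domain"]
    prim = interp["concepts"]   # primitive concept name -> set of individuals
    attrs = interp["attrs"]     # attribute name -> {individual: individual}
    roles = interp["roles"]     # role name -> set of (e1, e2) pairs
    op = d[0]
    if op == "top":
        return set(dom)
    if op == "bot":
        return set()
    if op == "prim":
        return set(prim.get(d[1], ()))
    if op == "not":                       # complement; atomic when d[1] is primitive
        return set(dom) - ext(d[1], interp)
    if op == "and":
        return ext(d[1], interp) & ext(d[2], interp)
    if op == "all_attr":                  # forall A.D
        target = ext(d[2], interp)
        return {e for e in dom if follow((d[1],), attrs, e) in target}
    if op in ("eq", "neq"):               # path agreement / disagreement
        result = set()
        for e in dom:
            left, right = follow(d[1], attrs, e), follow(d[2], attrs, e)
            if left is None or right is None:
                continue                  # assumption: undefined paths never satisfy
            if (left == right) == (op == "eq"):
                result.add(e)
        return result
    if op == "some":                      # exists R.D
        target = ext(d[2], interp)
        return {e1 for (e1, e2) in roles.get(d[1], ()) if e2 in target}
    if op == "all":                       # forall R.D
        target = ext(d[2], interp)
        bad = {e1 for (e1, e2) in roles.get(d[1], ()) if e2 not in target}
        return set(dom) - bad
    raise ValueError(f"unknown constructor: {op!r}")


# Hypothetical three-element database used only for illustration.
EXAMPLE = {
    "domain": {1, 2, 3},
    "concepts": {"EMP": {1, 2}},
    "attrs": {"b": {1: 1, 2: 3, 3: 2}},
    "roles": {"r": {(1, 2), (2, 3)}},
}
```

For instance, `ext(("eq", (), ("b",)), EXAMPLE)` computes the individuals that agree with their own `b` value, with the empty path playing the role of `id`.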

### Concept dependencies (cont’d)

The concept inclusion problem for a given DL is to determine if a concept inclusion dependency in the DL, $D_1 \sqsubseteq D_2$, is an axiom; that is, to determine if $(D_1)^{\mathcal{I}} \subseteq (D_2)^{\mathcal{I}}$ for every database $\mathcal{I}$.

Theorem: The concept inclusion problem for CLASSIC is solvable in low order polynomial time.

### An efficient decision procedure

Theorem: The following procedure decides if $\mathcal{C} = (D_1 \sqsubseteq D_2)$ is an axiom for CLASSIC, and can be implemented in low order polynomial time.

1. Create a partial database $\mathcal{I}_1$ consisting of a single individual $e$ in concept $D_1$. Perform a simple chase of $\mathcal{I}_1$ to obtain a partial database $\mathcal{I}_2$.
2. Return true if the domain of $\mathcal{I}_2$ is empty, or if the tuple $\langle x : e,\ \mathit{cnt} : 1 \rangle$ occurs in $[\![ D_2 \text{ as } x ]\!](\mathcal{I}_2)$†; otherwise return false.

† Use forced semantics for agreements and disagreements.

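### The simple chase

Each rule rewrites the labeled nodes on its left into the configuration on its right.

- $n : \{D_1 \sqcap D_2\} \cup L \;\Rightarrow\; n : \{D_1, D_2\} \cup L$
- $n_1 : \{\forall A.D\} \cup L \;\Rightarrow\; n_1 : L$ with an arc labeled $A$ to a node $n_2 : \{D\}$
- $n_1 : \{\exists R.D\} \cup L \;\Rightarrow\; n_1 : L$ with an arc labeled $R$ to a node $n_2 : \{D\}$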

### The simple chase (cont’d)

[Figures: further graph rewrite rules of the simple chase. Most arcs between nodes did not survive in this transcript; the surviving node labels are listed per rule.]

- $n_1 : \{\forall R.D\} \cup L_1$ with an arc labeled $R$ to $n_2 : L_2 \;\Rightarrow\; n_1 : L_1$ with an arc labeled $R$ to $n_2 : \{D\} \cup L_2$
- $n : \{A_1.A_2.\cdots.A_r = B_1.B_2.\cdots.B_s\} \cup L \;\Rightarrow\; n : L$, with nodes $u_1 : \emptyset, \ldots, u_r : \emptyset$ along arcs $A_1, A_2, \ldots, A_r$ and nodes $v_1 : \emptyset, \ldots, v_s : \emptyset$ along arcs $B_1, B_2, \ldots, B_s$
- $n : \{A_1.A_2.\cdots.A_r \neq B_1.B_2.\cdots.B_s\} \cup L \;\Rightarrow\; n : L$, with nodes $u_1 : \emptyset, \ldots, u_r : \emptyset$ along arcs $A_1, A_2, \ldots, A_r$ and nodes $v_1 : \emptyset, \ldots, v_s : \emptyset$ along arcs $B_1, B_2, \ldots, B_s$
- $w : L$, $u : L_1$, $v : L_2$ with two arcs labeled $A$ (same node labels on both sides of the rule)
- $n_1 : L_1$, $n_2 : L_2 \;\Rightarrow\; n_1 : L_1 \cup L_2$, $n_2 : L_1 \cup L_2$
- $n_1 : L_1$, $n_2 : L_2$, $n_3 : L_3$ (same node labels on both sides of the rule)
- $u : L_1$, $v : L_3$, $x : L_4$, $w : L_2$ with two arcs labeled $A$ (same node labels on both sides of the rule; shown on two slides)
- $w : L$, $u : L_1$, $v : L_2$ with two arcs labeled $A$ $\;\Rightarrow\;$ $w : \{\bot\}$, $u : L_1$, $v : L_2$
- Remove all nodes and incident arcs when the graph contains $n : \{\bot\} \cup L$, or a pair of nodes $m : L_1$, $n : L_2$ (connecting arcs not recoverable), or $n : \{C, \neg C\} \cup L$.

### Evaluating agreements and disagreements

Note that agreements and disagreements can navigate missing attribute values. In such cases, assume a forced semantics. In particular:

- a node $n$ satisfies an agreement iff the agreement has the form $\mathit{Pf}_1.\mathit{Pf} = \mathit{Pf}_2.\mathit{Pf}$, where $(\mathit{Pf}_1)^{\mathcal{I}}(n)$ and $(\mathit{Pf}_2)^{\mathcal{I}}(n)$ are defined and lead to nodes connected by an equality arc;
- a node $n$ satisfies a disagreement iff the disagreement has the form $\mathit{Pf}_1 \neq \mathit{Pf}_2$, where $(\mathit{Pf}_1)^{\mathcal{I}}(n)$ and $(\mathit{Pf}_2)^{\mathcal{I}}(n)$ are defined and lead to nodes connected by an inequality arc.

### Example

Observation: The chase decision procedure for CLASSIC can be implemented in $O(n \log n)$ time, where $n$ is the length of the component descriptions.

1. select e from EMP as e where e = e.b.b.b and e = e.b.b.b.b.b
2. (from (EMP as x), (from (x = x.b.b.b), (x = x.b.b.b.b.b))) $\equiv$ EMP $\sqcap$ (id = b.b.b) $\sqcap$ (id = b.b.b.b.b) as x
3. EMP $\sqcap$ (id = b.b.b) $\sqcap$ (id = b.b.b.b.b) $\equiv$ EMP $\sqcap$ (id = id.b)
4. EMP $\sqcap$ (id = b) as x
5. select e from EMP as e where e = e.b
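The simplification in this example, from agreements on b.b.b and b.b.b.b.b to a single agreement on b, can be sanity-checked by brute force. The sketch below assumes that `b` is a total function on a small finite domain (the slides treat attributes as functions, but totality is an assumption here) and enumerates every such function. The helper names `iterate` and `forces_fixpoint` are hypothetical.

```python
from itertools import product


def iterate(f, e, k):
    """Apply the attribute function f (a tuple: f[e] is the value of e.b) k times."""
    for _ in range(k):
        e = f[e]
    return e


def forces_fixpoint(lengths, size):
    """Check whether the agreements e = e.b^k, for every k in lengths,
    force e = e.b under every total function b on range(size)."""
    dom = range(size)
    for f in product(dom, repeat=size):      # all size**size functions
        for e in dom:
            if all(iterate(f, e, k) == e for k in lengths) and f[e] != e:
                return False                 # counterexample found
    return True


if __name__ == "__main__":
    print(forces_fixpoint((3, 5), 4))        # the path lengths used in the example
    print(forces_fixpoint((2, 4), 3))        # a contrasting pair of path lengths
```

The check is exhaustive only for the chosen domain size, so it illustrates the rewrite rather than proving it. The contrasting call shows that not every pair of agreements collapses this way: a function that swaps two individuals satisfies e = e.b.b and e = e.b.b.b.b without satisfying e = e.b.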

### The ALC family of DLs

| Constructor | Syntax ($D ::=$) | Semantics |
|---|---|---|
| primitive concept | $C$ | $(C)^{\mathcal{I}}$ |
| universal concept | $\top$ | $\Delta$ |
| bottom concept | $\bot$ | $\emptyset$ |
| atomic negation | $\neg C$ | $\Delta - (C)^{\mathcal{I}}$ |
| intersection | $D_1 \sqcap D_2$ | $(D_1)^{\mathcal{I}} \cap (D_2)^{\mathcal{I}}$ |
| role value restriction | $\forall R.D$ | $\{e_1 : \forall (e_1, e_2) \in (R)^{\mathcal{I}} : e_2 \in (D)^{\mathcal{I}}\}$ |
| limited existential quantification | $\exists R.\top$ | $\{e_1 : \exists e_2 : (e_1, e_2) \in (R)^{\mathcal{I}}\}$ |
| union | $D_1 \sqcup D_2$ | $(D_1)^{\mathcal{I}} \cup (D_2)^{\mathcal{I}}$ |
| full existential quantification | $\exists R.D$ | $\{e_1 : \exists e_2 : (e_1, e_2) \in (R)^{\mathcal{I}} \wedge e_2 \in (D)^{\mathcal{I}}\}$ |
| quantified number restriction | $(\geq n\, R)$ | $\{e_1 : \lvert\{e_2 : (e_1, e_2) \in (R)^{\mathcal{I}}\}\rvert \geq n\}$ |
| quantified number restriction | $(\leq n\, R)$ | $\{e_1 : n \geq \lvert\{e_2 : (e_1, e_2) \in (R)^{\mathcal{I}}\}\rvert\}$ |
| full negation | $\neg D$ | $\Delta - (D)^{\mathcal{I}}$ |

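### The ALC family of DLs (cont’d)

| $D ::=$ | $\mathcal{FL}_0$ | $\mathcal{FL}^-$ | $\mathcal{AL}$ | $\mathcal{ALN}$ |
|---|---|---|---|---|
| $C$ | ✓ | ✓ | ✓ | ✓ |
| $\top$ | | ✓ | ✓ | ✓ |
| $\bot$ | | ✓ | ✓ | ✓ |
| $\neg C$ | | | ✓ | ✓ |
| $D_1 \sqcap D_2$ | ✓ | ✓ | ✓ | ✓ |
| $\forall R.D$ | ✓ | ✓ | ✓ | ✓ |
| $\exists R.\top$ | | ✓ | ✓ | ✓ |
| $D_1 \sqcup D_2$ | | | | |
| $\exists R.D$ | | | | |
| $(\geq n\, R)$ | | | | ✓ |
| $(\leq n\, R)$ | | | | ✓ |
| $\neg D$ | | | | |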

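### The ALC family of DLs (cont’d)

| $D ::=$ | $\mathcal{ALU}$ | $\mathcal{ALE}$ | $\mathcal{ALUE}$ | $\mathcal{ALC}$ | $\mathcal{ALCN}$ |
|---|---|---|---|---|---|
| $C$ | ✓ | ✓ | ✓ | ✓ | ✓ |
| $\top$ | ✓ | ✓ | ✓ | ✓ | ✓ |
| $\bot$ | ✓ | ✓ | ✓ | ✓ | ✓ |
| $\neg C$ | ✓ | ✓ | ✓ | ✓ | ✓ |
| $D_1 \sqcap D_2$ | ✓ | ✓ | ✓ | ✓ | ✓ |
| $\forall R.D$ | ✓ | ✓ | ✓ | ✓ | ✓ |
| $\exists R.\top$ | ✓ | ✓ | ✓ | ✓ | ✓ |
| $D_1 \sqcup D_2$ | ✓ | | ✓ | ∘ | ✓ |
| $\exists R.D$ | | ✓ | ✓ | ∘ | ✓ |
| $(\geq n\, R)$ | | | | | ✓ |
| $(\leq n\, R)$ | | | | | ✓ |
| $\neg D$ | | | ∘ | ✓ | ✓ |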

### Some complexity results

Theorem: The concept inclusion problems for $\mathcal{ALC}$ and $\mathcal{ALCN}$ are PSPACE-complete.

A consistency problem for a given set of concepts is to determine if there exists a database that interprets a given member of the set as nonempty.

Observation: The consistency problem for $\mathcal{ALC}$ (resp. $\mathcal{ALCN}$) coincides with the concept inclusion problem for $\mathcal{ALC}$ (resp. $\mathcal{ALCN}$). In particular, $D_1 \sqsubseteq D_2$ is an axiom iff the concept $(D_1 \sqcap \neg D_2)$ is not consistent.
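The observation follows from the semantics of intersection and full negation. As a short worked step, writing $\Delta$ for the domain of a database $\mathcal{I}$ (the symbol is an assumption of this note), the following holds for every $\mathcal{I}$:

$$
(D_1)^{\mathcal{I}} \subseteq (D_2)^{\mathcal{I}}
\iff (D_1)^{\mathcal{I}} \cap \bigl(\Delta - (D_2)^{\mathcal{I}}\bigr) = \emptyset
\iff (D_1 \sqcap \neg D_2)^{\mathcal{I}} = \emptyset
$$

So the inclusion holds in every database exactly when no database interprets $D_1 \sqcap \neg D_2$ as nonempty, which is the statement that the concept is not consistent.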

### Testing consistency in ALC

Theorem: The following procedure decides if a given concept $D$ in $\mathcal{ALC}$ is consistent.

1. Create a singleton set $S_1 = \{\mathcal{I}\}$ of partial databases in which $\mathcal{I}$ consists of a single individual $e$ in concept $D$. Perform a union generalized chase of $S_1$ to obtain a set of partial databases $S_2 = \{\mathcal{I}_1, \ldots, \mathcal{I}_n\}$.
2. Return true if the domain of any database in $S_2$ is nonempty; otherwise return false.

### Union generalized chase

Repeatedly do the following to a given set of partial databases $S$ until no changes occur.

1. Apply the simple chase augmented with the negation rule to a member of $S$.
2. If $S$ contains a partial database $\mathcal{I}$ that in turn contains a node $n$ labeled $e : \{D_1 \sqcup D_2\} \cup L$, then replace $\mathcal{I}$ in $S$ with two partial databases $\mathcal{I}_1$ and $\mathcal{I}_2$ in which the labeling of node $n$ is revised as follows:
   - new node $n$ in $\mathcal{I}_1$: $e : \{D_1\} \cup L$
   - new node $n$ in $\mathcal{I}_2$: $e : \{D_2\} \cup L$
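The branching step can be illustrated with a compact consistency check. This is a minimal sketch in the style of a standard tableau for ALC, not the slides' chase over explicit sets of partial databases: it works on one node label at a time, treats each branch of a union as a separate recursive call (playing the role of the two partial databases), and reports a clash where the chase would remove all nodes. The tuple encoding is hypothetical, input concepts are assumed to be in negation normal form already, and attributes, agreements and number restrictions are left out.

```python
def consistent(label):
    """Decide whether a set of NNF concepts can hold together at one individual.

    Concepts are tuples: ("top",), ("bot",), ("prim", name),
    ("not", ("prim", name)), ("and", D1, D2), ("or", D1, D2),
    ("some", R, D), ("all", R, D).
    """
    label = set(label)

    # Intersection rule: add both conjuncts until nothing changes.
    changed = True
    while changed:
        changed = False
        for d in list(label):
            if d[0] == "and":
                for part in d[1:]:
                    if part not in label:
                        label.add(part)
                        changed = True

    # Clash: bottom, or a primitive concept together with its negation.
    if ("bot",) in label:
        return False
    if any(d[0] == "not" and d[1] in label for d in label):
        return False

    # Union rule: branch on the first union not yet satisfied.
    for d in label:
        if d[0] == "or" and d[1] not in label and d[2] not in label:
            return any(consistent(label | {alt}) for alt in d[1:])

    # Existential rule: each "some R.D" needs an R-successor that satisfies D
    # together with every E such that "all R.E" is in this label.
    for d in label:
        if d[0] == "some":
            successor = {d[2]} | {a[2] for a in label
                                  if a[0] == "all" and a[1] == d[1]}
            if not consistent(successor):
                return False
    return True
```

Termination relies on the absence of a terminology: every successor label is built from strictly smaller subconcepts, so the recursion depth is bounded by the nesting depth of the input.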

### The negation rule

Exhaustively apply the following rewrites to the concept labeling for any given node:†

- $\neg\top \Rightarrow \bot$
- $\neg\bot \Rightarrow \top$
- $\neg\neg D \Rightarrow D$
- $\neg(D_1 \sqcap D_2) \Rightarrow (\neg D_1) \sqcup (\neg D_2)$
- $\neg\forall A.D \Rightarrow \forall A.\neg D$
- $\neg\forall R.D \Rightarrow \exists R.\neg D$
- $\neg\exists R.D \Rightarrow \forall R.\neg D$
- $\neg(D_1 \sqcup D_2) \Rightarrow (\neg D_1) \sqcap (\neg D_2)$

† Obtains negation normal form for concept descriptions.
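The rewrites above translate almost line for line into a recursive function. The following is a minimal sketch over a hypothetical tuple encoding of concepts (`("not", D)`, `("and", D1, D2)`, `("all_attr", A, D)` for an attribute value restriction, and so on); the encoding and the function name `nnf` are assumptions made for illustration.

```python
def nnf(d):
    """Push negations inward until they apply only to primitive concepts."""
    op = d[0]
    if op in ("top", "bot", "prim"):
        return d
    if op in ("and", "or"):
        return (op, nnf(d[1]), nnf(d[2]))
    if op in ("all", "some", "all_attr"):
        return (op, d[1], nnf(d[2]))
    if op != "not":
        raise ValueError(f"unknown constructor: {op!r}")

    inner = d[1]
    iop = inner[0]
    if iop == "top":                       # not top  =>  bot
        return ("bot",)
    if iop == "bot":                       # not bot  =>  top
        return ("top",)
    if iop == "prim":                      # atomic negation is already normal
        return d
    if iop == "not":                       # not not D  =>  D
        return nnf(inner[1])
    if iop == "and":                       # De Morgan
        return ("or", nnf(("not", inner[1])), nnf(("not", inner[2])))
    if iop == "or":                        # De Morgan
        return ("and", nnf(("not", inner[1])), nnf(("not", inner[2])))
    if iop == "all_attr":                  # not forall A.D  =>  forall A.not D
        return ("all_attr", inner[1], nnf(("not", inner[2])))
    if iop == "all":                       # not forall R.D  =>  exists R.not D
        return ("some", inner[1], nnf(("not", inner[2])))
    if iop == "some":                      # not exists R.D  =>  forall R.not D
        return ("all", inner[1], nnf(("not", inner[2])))
    raise ValueError(f"unknown constructor: {iop!r}")
```

Note that the attribute case keeps the universal quantifier, as in the slide's rule for $\neg\forall A.D$, while the role case swaps it for an existential.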

### A general membership problem

A database schema $T$ that consists of concept dependencies in which no primitive concept occurs more than once on the left-hand side of a concept definition is called a terminology.

The membership problem for a DL dialect is to determine, given a set $\{\mathcal{C}_1, \ldots, \mathcal{C}_n, \mathcal{C}\}$ of concept dependencies in the DL, if $\{\mathcal{C}_1, \ldots, \mathcal{C}_n\} \models \mathcal{C}$; that is, if every database $\mathcal{I}$ that models each $\mathcal{C}_i$ also models $\mathcal{C}$.

Theorem: The membership problem for CLASSIC is undecidable.

Theorem: The membership problem for $\mathcal{ALCN}$ is DEXPTIME-complete.

### Varieties of terminologies

- A terminology $T$ with only concept definitions is definitional.
- For each $C_1 \equiv D$ occurring in a terminology $T$ and each primitive concept $C_2$ occurring in $D$, $C_1$ has a direct use of $C_2$. The use relation is the transitive closure of direct use.
- $T$ is cyclic iff there exists an atomic concept in $T$ that has a use of itself.
- $T$ is acyclic iff it is definitional and is not cyclic.

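### An acyclic terminology in ALC

- $\mathrm{WOMAN} \equiv \mathrm{PERSON} \sqcap \mathrm{FEMALE}$
- $\mathrm{MAN} \equiv \mathrm{PERSON} \sqcap \neg\mathrm{WOMAN}$
- $\mathrm{MOTHER} \equiv \mathrm{WOMAN} \sqcap \exists \mathrm{hasChild}.\mathrm{PERSON}$
- $\mathrm{FATHER} \equiv \mathrm{MAN} \sqcap \exists \mathrm{hasChild}.\mathrm{PERSON}$
- $\mathrm{PARENT} \equiv \mathrm{FATHER} \sqcup \mathrm{MOTHER}$
- $\mathrm{GRANDMOTHER} \equiv \mathrm{MOTHER} \sqcap \exists \mathrm{hasChild}.\mathrm{PARENT}$
- $\mathrm{MOTHERWITHMANYCHILDREN} \equiv \mathrm{MOTHER} \sqcap {\geq}\, 3\ \mathrm{hasChild}$
- $\mathrm{MOTHERWITHOUTDAUGHTER} \equiv \mathrm{MOTHER} \sqcap \forall \mathrm{hasChild}.\neg\mathrm{WOMAN}$
- $\mathrm{WIFE} \equiv \mathrm{WOMAN} \sqcap \exists \mathrm{hasHusband}.\mathrm{MAN}$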
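Whether a definitional terminology is cyclic can be checked by computing the use relation as the transitive closure of direct use. The sketch below assumes a hypothetical representation in which each defined concept maps to the set of primitive concept names occurring on the right-hand side of its definition; the family terminology is transcribed into that form by hand.

```python
def use_relation(definitions):
    """Map each defined concept to every concept it uses, directly or indirectly."""
    closure = {}
    for concept, direct in definitions.items():
        seen, stack = set(), list(direct)
        while stack:
            used = stack.pop()
            if used not in seen:
                seen.add(used)
                # Follow the definition of the used concept, if it has one.
                stack.extend(definitions.get(used, ()))
        closure[concept] = seen
    return closure


def is_cyclic(definitions):
    """A terminology is cyclic iff some concept has a use of itself."""
    return any(c in used for c, used in use_relation(definitions).items())


# Direct uses read off the family terminology (right-hand-side concept names).
FAMILY = {
    "WOMAN": {"PERSON", "FEMALE"},
    "MAN": {"PERSON", "WOMAN"},
    "MOTHER": {"WOMAN", "PERSON"},
    "FATHER": {"MAN", "PERSON"},
    "PARENT": {"FATHER", "MOTHER"},
    "GRANDMOTHER": {"MOTHER", "PARENT"},
    "MOTHERWITHMANYCHILDREN": {"MOTHER"},
    "MOTHERWITHOUTDAUGHTER": {"MOTHER", "WOMAN"},
    "WIFE": {"WOMAN", "MAN"},
}

if __name__ == "__main__":
    print(is_cyclic(FAMILY))
    print(sorted(use_relation(FAMILY)["GRANDMOTHER"]))
```

The check covers only the "not cyclic" half of the definition of acyclic; that the input contains concept definitions only is taken for granted by the dictionary representation.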

### More complexity results

Theorem: The membership problem for $\mathcal{FL}_0$ with acyclic terminologies is CoNP-complete.

Theorem: The membership problem for $\mathcal{ALC}$ with acyclic terminologies is PSPACE-complete.

The DL $\mathcal{ALCF}$ extends $\mathcal{ALC}$ with agreements and disagreements of path functions.

Theorem: The concept inclusion problem for $\mathcal{ALCF}$ is PSPACE-complete.

Theorem: The membership problem for $\mathcal{ALCF}$ with acyclic terminologies is NEXPTIME-complete.

### Blocking

Theorem: The membership problem for $\mathcal{ALCN}$ is DEXPTIME-complete.

The membership problem for $\mathcal{ALCN}$ can be solved by a refinement of the consistency checking algorithm for concepts in $\mathcal{ALC}$. There are two important tricks to note.

1. Each concept dependency occurring in the terminology, e.g. $D_1 \sqsubseteq D_2$, is internalized to each new node by adding a corresponding concept, e.g. $(\neg D_1 \sqcup D_2)$, to the node’s label.
2. To ensure termination, no chasing is performed on blocked nodes. A node is blocked if its concepts are included in an older node.
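The two tricks can be sketched as small helpers. This is only an illustration under assumptions: concepts use a hypothetical tuple encoding, node labels are sets of concepts, "included in an older node" is read as plain set inclusion of labels, and the surrounding chase loop (including number restrictions) is not shown.

```python
def internalize(inclusions):
    """Turn each inclusion D1 [= D2, given as a pair (D1, D2),
    into the concept (not D1) or D2."""
    return {("or", ("not", d1), d2) for d1, d2 in inclusions}


def new_node_label(concepts, inclusions):
    """Label for a freshly created node: its own concepts plus every
    internalized dependency of the terminology."""
    return frozenset(concepts) | internalize(inclusions)


def is_blocked(label, older_labels):
    """A node is blocked if its label is included in the label of an older node."""
    return any(set(label) <= set(older) for older in older_labels)
```

In a full procedure the chase would skip any node for which `is_blocked` holds, which is what bounds the number of nodes that internalized dependencies can keep generating.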

### Language extensions

- Role constructors
- Role value maps
- Uniqueness constraints
