Download presentation

Presentation is loading. Please wait.

Published byDeasia Tipton Modified about 1 year ago

1
1 A Brief History of DB Theory Functional dependencies --- Normalization More dependencies: multivalued, general Universal relations Acyclic hypergraphs Logical query languages: Datalog

2
2 Functional Dependencies Setting: A relation = table with column headers (schema ) and a finite number of rows (tuples ). ABC

3
3 FD X ->Y A statement about which instances (finite sets of tuples) are “legal” for a given relation. If two tuples agree in all the attributes of set X, then then must also agree in all attributes of set Y. –Common case: X is the key of the relation, Y is the other attributes.

4
4 Issue #1: Inference Armstrong’s Axioms –X ->Y if Y is a subset of X –X ->Y implies XZ ->YZ (XZ, etc. = “union”) –X ->Y and YZ ->W implies XZ ->W Closure test (Bernstein): given FD’s and a possible consequent X ->Y, use the FD’s to see what X determines, and then see if Y is contained therein.

5
5 Issue #2: Redundancy Certain combinations of FD’s lead to redundancy and other “anomalies.” Example: B ->C is the only FD: ABC ? The FD lets us predict the value of ?.

6
6 Issue #3: Normalization Eliminate redundancy by splitting schemas. A FD X ->Y allows us to split schema XYZ into XY and XZ. Without the FD, the decomposition would lack a lossless join : the ability to reconstruct the original XYZ relation from the decomposed relations.

7
7 How I Met Moshe & Learned Database Theory Tsichritzis BernsteinBeeri UllmanVardi

8
8 Why I Like Working With Vardi Taste, selection of issues. Best inventor of constructions. Name comes after mine in alphabet.

9
9 Lossless Joins When relations are decomposed, do the pieces allow reconstruction of the original? –Only way: join the projected relations. –You always get back what you started with. –Bad case is when you get more.

10
10 Example ABC012314ABABCBC ABC012314ABABCBC

11
11 More Kinds of Dependencies In essence, a dependency is any predicate that tells whether a given set of tuples is OK for a given schema. The language chosen determines what constraints can be expressed and what we can decide about relations. –FD’s are just one, simple example

12
12 Multivalued Dependencies These occur when a relation tries to connect one class of objects to independent sets from two other classes.. Notation: X ->->Y means: –If two tuples agree on X, then we may swap the Y components and get two tuples that are also in the relation.

13
13 Example EmpID ->-> Phone EmpID Addr Phone Project If:123 a1 p1j1 123 a2 p2j2 Then:123 a1 p2j1 123 a2 p1j2

14
14 MVD Origins Independent work of Delobel, Fagin, Zaniolo Inference of MVD’s + FD’s: Beeri, Fagin, and Howard.

15
15 Generalized Dependencies Equality-generating dependencies: –If tuples with this pattern of equal symbols appear in a relation instance, then certain symbols must also be equal. –Generalizes FD’s. Tuple-generating dependencies: –If tuples with this pattern of equal symbols appear in a relation instance, then a certain tuple must also appear. –Generalizes MVD’s.

16
16 Example: EGD FD A ->B in schema ABC : ABC ab1c1(Hypotheses) ab2c2 b1 = b2(Conclusion)

17
17 Example: TGD MVD A ->->B in schema ABC : ABC ab1c1(Hypotheses) ab2c2 ab1c2(Conclusion)

18
18 Full Versus Embedded GD’s Full GD’s have a conclusion with only symbols that appear in the hypotheses. Embedded GD’s may have new symbols in the conclusion. –Existentially quantified. Embedded EGD makes no sense, but embedded TGD’s are quite interesting.

19
19 The Chase for Inferring GD’s Test whether a GD G follows from given GD’s. Start with R = the hypotheses of G. Apply GD H by mapping the hypotheses of H to some rows of R. –If so, infer the (mapped) conclusion of H --- equate two symbols of R or add a tuple to R. If you eventually infer the conclusion of G, then G follows, else it does not.

20
20 Example: If A ->B Then A ->->B R has tuples (a,b1,c1) and (a,b2,c2). Does it have (a,b1,c2)? Apply A ->B. –It’s hypotheses are the same as the two tuples of R, so surely they map. –Lets us conclude b1=b2. –Thus, since R has (a,b2,c2), It surely has (a,b1,c2).

21
21 Decision Properties of GD’s If all dependencies are full, no new symbols ever appear, so the chase must terminate. –Either we prove the desired conclusion, or R becomes a relation that provides a counterexample --- it satisfies all the given GD’s, but not the one we were trying to infer.

22
22 Undecidability of Embedded GD’s First undecidability results involved the inference of untyped GD’s (symbols may appear in several columns). –Beeri and Vardi –Chandra, Lewis, and Makoswky Tough result is undecidability even when GD’s are typed (one column per symbol). –Vardi –Gurevich and Lewis

23
23 History of GD’s Similar ideas were developed independently by several people: –Yannakakis and Papadimitriou –Beeri and Vardi –Fagin –Sadri and Ullman

24
24 The Universal Relation Idea: attributes carry information regardless of how they are placed in relation schemas. One of the cool things about Moshe Vardi is the collection he keeps of re-inventions of this universal relation concept. –Typically touted by some HCI type as a “wonderful new interface to databases.”

25
25 UR Query Systems Query UR Stored Relations

26
26 Example Stored relations: ES(Emp,Sal), ED(Emp,Dept), DM(Dept,Mgr) Universal relation: U(Emp,Sal, Dept,Mgr) UR query: SELECT Mgr WHERE Emp=a Translated to: SELECT Mgr FROM ED, DM WHERE Emp=a AND ED.Dept=DM.Dept U != join(ES, ED, DM); empty ES proves it.

27
27 Theory of Universal Relations Some were quite upset by the UR idea. –Objection: the informal or ad-hoc way queries were translated from UR to stored relations. Codd: “Neither the collection of all base relations nor the collection of all views should be cast by the DBMS in the form of a `universal relation’ (in the Stanford University sense) [Vardi, 1988].”

28
28 Some Early Approaches to the UR Selected joins of stored relations. Representative instance = pad stored relations with nulls in missing attributes; then chase with given dependencies. –Contributions by Vassiliou, Honeyman, Mendelzon, Sagiv, Yannakakis. Maximal objects = union of joins within “acyclic hypergraphs.” –Maier, Ullman

29
29 Window Functions Two stage process: 1. If query involves set of attributes X, compute the window function [X ] = some relation with schema X derived from the UR. 2. Apply the query to [X ]. Different UR definitions give different window functions, e.g., join-based, rep.- instance, maximal-object. From Maier, Ullman, Vardi [1984].

30
30 Acyclic Hypergraphs Several different applications led to the identification of a class of database schemas (collection of relation schemas) with useful properties. –Sensible connections among stored relations in maximal-object theory of UR. –Joins of many relations in time polynomial in the size of the input and output (Yannakakis). –Etc., etc.

31
31 Hypergraphs Nodes (typically attributes) (Hyper)edges = sets of any number of nodes. ABCD E F

32
32 GYO Reduction Graham, Yu-Ozsuyoglu, independently defined acyclic hypergraphs thusly: Two transformations: 1. Delete a node that is in only one hyperedge. 2. Delete a hyperedge contained in another. Hypergraph is acyclic iff the result is a single, empty edge. –Limit is unique for any hypergraph.

33
33 Example GYO steps: delete C; delete D; delete {B}; delete F; delete {B,E}; delete A; delete B; delete E; ABCD E F

34
34 History Acyclic hypergraph idea from Bernstein and Goodman. –But the definition is rather different. Hypergraph version from Fagin, Mendelzon, Beeri, Maier, others. –And don’t forget the GYO folks. Fagin defined several related-but-distinct concepts of “acyclic.” –This one is “alpha-acyclic.”

35
35 Logical Query Languages Collections of Horn-clause rules without function symbols behave almost like SQL, but can do recursion. Conjunctive queries = single Horn-clause rules w/o function symbols have decidable containment (Chandra and Merlin).

36
36 Conjunctive Queries Example: path(X,Y) :- edge(X,Z) & edge(Z,Y) Atoms in the body (hypothesis) refer to stored relations (EDB or “Extensional Database”). Atom in head (conclusion) is the result. Q1 is contained in Q2 if for every database D, Q1(D ) is a subset of Q2(D ).

37
37 Containment Mappings Mapping from the variables of Q2 to the variables of Q1 that: 1. Turns the head of Q2 into the head of Q1. 2. Turns every atom in the body of Q2 into some atom in the body of Q1. Containment mapping exists iff Q1 is contained in Q2.

38
38 Example Q1: answer(X,Y) :- e(X,Y) & e(Y,X) Q2: answer(X,Y) :- e(X,W) & e(W,Z) & e(Z,Y)

39
39 Datalog Programs Collection of conjunctive queries. Some predicates are EDB (stored). Other predicates are IDB (intensional database; defined by the CQ’s only). One IDB predicate is the answer = least fixedpoint of the CQ’s.

40
40 Example path(X,Y) :- edge(X,Y) path(X,Y) :- path(X,Z) & edge(Z,Y) Value of path is all pairs of nodes such that there is a path from the first to the second according to EDB predicate edge.

41
41 Optimization of Datalog Programs Problem: often the query asks for only a fraction of the answer. –e.g., find path(0,Y) for fixed node 0. Bottom-up methods (“seminaive”) evaluate the whole least-fixedpoint, throw most away. Top-down (goal seeking) can get stuck in left-recursion.

42
42 Linear-Recursion Methods Recursion is linear if at most one IDB atom in the body. –Henschen-Naqvi. –Magic-sets for linear (Bancilhon, Maier, Sagiv, Ullman). –Left- and right-linear special cases (Naughton). –Conversion of nonlinear to linear (Ramakrishnan, Sagiv, Ullman, Vardi).

43
43 Magic Sets Several similar techniques for rewriting or executing Datalog programs in a way that avoids generation of useless facts. –Rohmer, Lescoeur, and Kerisit (earliest exposition). –Beeri and Ramakrishnan (reordering of atoms). –Sacca and Zaniolo (simplification of rules). –Dietrich and Warren (tabulation). –Vielle (query-subquery).

44
44 Bounded Recursion Sometimes a recursive Datalog program is equivalent to a nonrecursive program. Sufficient conditions (Naughton, Ioannidis, Naughton and Sagiv). Undecidability (Gaifman, Mairson, Sagiv, and Vardi). Decidable cases (Vardi, Cosmodakis, Gaifman, Kanellakis, and Vardi)

45
45 Containment for Datalog Undecidable if one Datalog program is contained in another (Shmueli). NP-complete whether a CQ is contained in a Datalog program. Triply exponential whether a Datalog program is contained in a CQ (Chaudhuri and Vardi).

46
46 What Is It All Good For? Dependencies and normalization are now familiar to most CS graduates. –Difference between BCNF and 3NF. –Tradeoffs: lossless joins versus maintaining dependencies. Magic-sets, recursion used in IBM’s DB/2. Universal-relation interfaces despite Codd.

47
47 More “What Is It All Good For?” Information integration. While logical query languages have not caught on, logic has been used to specify how legacy databases are combined into a uniform whole. –Tsimmis (Papakonstantinou, Vassalos). –Information Manifold (Levy). –Infomaster (Duschka, Genesereth).

Similar presentations

© 2017 SlidePlayer.com Inc.

All rights reserved.

Ads by Google