Fundamentals/ICY: Databases 2012/13 WEEK 9 John Barnden Professor of Artificial Intelligence School of Computer Science University of Birmingham, UK.

Fundamentals/ICY: Databases 2012/13 WEEK 9 John Barnden Professor of Artificial Intelligence School of Computer Science University of Birmingham, UK

Reminder of Week 7 and 8

symmetric (1:1) recursive relationship: redundant & non-redundant implemntns 1) As previously—redundant. 2) MARRIED_V1 is just a bridging entity type: still redundant. 3) MARRIAGE together with MARPART act as a sort of bridge. Non-redundant.

Summary: Creating ERMs/ERDs uDesigning an ER model for a database is an iterative process, because, e.g.: l As you proceed, you think of new ways of conceiving what’s going on (much as in ordinary programming) l Multivalued attributes need to be re-represented eventually l M:N relationships can be included as such at an early stage, but usually need to be replaced by bridging entity types later l 1:1 relationships raise a red flag: may indicate poor design, especially when mandatory both ways. But standard in supertype/subtype representation, and natural in recursive relationships like “married-to”. l Special supertype/subtype notation needs eventually to be converted into more standard diagram notation l Conversion of ERM portions to a “Normal Form” – LATER.

MATHEMATICAL BACKGROUND TO TABLES

Mathematical “sets”: Basics uA “set” is an unordered collection of items of any sorts (people, numbers, numerals, shoes, atoms, strings of characters, databases, sets, blades of grass, …) without any duplication of items. The items are called “elements” or “members”. uS = {34, JAB, 59, UoB}, where “JAB” is a name for me and “UoB” is a name for this university, means that S is the set consisting of (exactly) the following four items: the abstract number 34, me, the abstract number 59, this university.

Basics, contd u{34, JAB, 59, UoB} = {UoB, 59, 34, JAB, JAB, 34} l Order of writing the members doesn’t matter; duplication in the writing doesn’t duplicate the member. uA set can be infinite (e.g., the set of all whole numbers). uA set can contain just one member (e.g. the set whose only element is your favourite pencil). Singleton set. l It’s different from the member itself..  uThere’s a set with no members at all: the “empty set”, usually notated as , but can also be written { }. l Somewhat analogous to zero, or a new committee which has no members yet. l There is only one empty set (rather than an empty set of numbers, an empty set of pencils, etc.)

Another Notation |> u{n | n is an integer, n > 301} = “The set of n such that n is an integer and n > 301.” (Actually, this notation is a slight simplification.) The set is the same as that denoted by, for instance,  {n | n is an integer, n  302}.

New for Week 8

Some More Examples u{JAB, “JAB”} has 2 members: me, and a 3-char string. u{3, {4,5}, 4, 6}has 4 members, one of which is a set. u{3, {5,4}, 4, 6} is that same set. u{ {4,5} } has 1 member, which is a set. {4,5} has 2 members, both numbers. u{  } has 1 member, which is the empty set. u{{  }} is a different singleton set.

Membership Relationship ua  A means that a is a member of A.  5  {4,5}  {5,4}  {3, {4,5}, 4, 6} ua  A means that a is not a member of A.  5  {3, {4,5}, 4, 6}  {5}  {3, {4,5}, 4, 6}  {4,6}  {3, {4,5}, 4, 6}  {3,4,5}  {3, {4,5}, 4, 6}

Subsets and Supersets  uA  B means that A is a “subset” of B (and that B is a “superset” of A). I.e., every member of A is also a member of B. l Carefully distinguish between subset-of and member-of !!!  l The symbol  means the same as    does NOT mean that there cannot be equality. uExamples:  l   {4,5}    {5}  {4,5,6}, {6,4}  {4,5,6,7}, {6,4,7,5}  {4,5,6,7}  {n | n is an EVEN whole number}  {n | n is a whole number}

Subsets and Supersets  u   A for any set A.  uA  A for any set A. (Reflexivity)  uIf A  B and B  A then A = B. (Antisymmetry)  uIf A  B and B  C then A  C. (Transitivity)

— Back to Database Design — NORMALIZATION

Normalization uNormalization is often used within ER modeling, to help produce a good database design. Evaluates and adjusts table structures (mainly) to minimize certain types of data redundancy, (and in some cases to avoid certain types of complexity) uSome situations require non-normalization or denormalization for efficiency reasons: Normalization generally increases the number of tables and makes many queries more elaborate (in straightforward ways, though).

Normal Forms uNormalization works through a series of stages called normal forms, giving more and more protection: l First normal form (1NF) l Second normal form (2NF) l Third normal form (3NF) l Boyce-Codd normal form (BCNF) l ((Fourth normal form (4NF) )) l Yet others! u1NF is mandatory and we have implicitly already covered it.

First Normal Form (1NF) uJust insists on some restrictions we have already explicitly or implicitly imposed on tables: l There is a candidate key whose attributes never have NULL values, and one such key has been chosen as the primary key. l There are no “repeating groups” in the table: A repeating group is a group of related rows that have some empty cells that are to be thought of as copying values from some other row. That’s my definition. More usually expressed in terms of having cells with multiple values, but I think this is misleading.

A Sample Report Layout with “repeated groups”

Another Unusual Feature of that Table uThe table has another feature that departs from DB- style tables. uWhat is it?

(Partially) Corresponding Attempt at a DB-Style Table

The Problem with Repeating Groups uQ: Why are they a problem? uA: First reason: l Rows in a DB-style table are unordered, so how do you know which row(s) to “copy” PROJ_NUM and PROJ_NAME values from/to? (Previous diagram is deceptive.) uA: Second reason: l Even if you could work out which row(s) to copy from/to, the copying would make many queries much more complex.

That Table put into 1NF (assuming there is a PK)

Dependencies and Determinants uThese concepts are needed for most of the remaining normal forms. uAny set S of attributes in a table “determines” each attribute within it, i.e.: Each attribute in S is “functionally dependent” on the whole set S. But in the following discussion of normalization… uWhen we say X is functionally dependent on S – i.e. S determines X – we will mainly be talking about non-trivial cases—cases where X is outside S (though still in the same table). uA [non-trivial] “determinant” will be a set of attributes D in a table such that it determines some attribute X outside D in the same table.

1NF can have Undesirable Dependencies u1NF tables can contain “partial,” “transitive” and other generally undesirable functional dependencies of an attribute X on a determinant D. uBy “undesirable” I will mean mainly that the determinant D is not a superkey, so that at least one attribute Y in the table is not determined by D, so Y can have different values in the table for equal D values, so redundancy (repetition of the association between D and X values) can arise.

Partial and Transitive Dependencies

1NF can have Partial Dependencies uPartial dependency: where the determinant is part but not all of the primary key (and NB: is therefore not a superkey) The determined attribute X is necessarily outside the whole PK—exercise: why?

Second Normal Form uA table is in second normal form (2NF) if: l It is in 1NF and l It includes no partial dependencies

Conversion to 2NF uFor each determinant D involved in a partial dependency in the original table T, use D as the PK for a new table NT(D) and move out the attributes X determined by D into NT(D). uD itself stays in T as well as being copied into NT(D).

Reminder: Partial and Transitive Dependencies

Second Normal Form (2NF) Conversion results on example on previous slide

But 2NF can still have Undesirable Dependencies uA prime attribute is one that is within some candidate key (not necessarily the primary key). uA transitive dependency is where the determinant D is at least partially outside the PK and is not a superkey, and the determined attribute X is non-prime (and therefore in particular is not inside the PK; the reason for this restriction is on a later slide). l E.g.: previous Figure for simple case of a simple (= one- attribute) determinant. l Above definition is partly based on Garcia-Molina, Ullman & Widom 2009 – see later ref. More general than the account in R&C and R,C&C.

Third Normal Form uA table is in third normal form (3NF) if: l It is in 2NF and l It contains no transitive dependencies

Conversion to 3NF uFor each determinant D involved in a transitive dependency in the original table T, use D as the PK for a new table NT(D) and move out the attributes X transitively determined by D into NT(D). uNB: the determinants themselves stay in T as well.

In 2NF but not in 3NF

Third Normal Form (3NF) Conversion Results on previous example

Fundamentals/ICY: Databases 2012/13 WEEK 9 John Barnden Professor of Artificial Intelligence School of Computer Science University of Birmingham, UK.

Similar presentations

Presentation on theme: "Fundamentals/ICY: Databases 2012/13 WEEK 9 John Barnden Professor of Artificial Intelligence School of Computer Science University of Birmingham, UK."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Fundamentals/ICY: Databases 2012/13 WEEK 9 John Barnden Professor of Artificial Intelligence School of Computer Science University of Birmingham, UK.

Similar presentations

Presentation on theme: "Fundamentals/ICY: Databases 2012/13 WEEK 9 John Barnden Professor of Artificial Intelligence School of Computer Science University of Birmingham, UK."— Presentation transcript:

Similar presentations

About project

Feedback