
© D. Wong 2003

1 Normalization
• Purpose: a process to eliminate redundancy in relations caused by functional or multivalued dependencies.
• Decompose the relation schema into normal forms:
  - Boyce-Codd Normal Form (BCNF)
  - Third Normal Form (3NF)
  - Fourth Normal Form (4NF)
• To obtain the new relations, project the original relation (e.g. Movie) onto the new schemas.
• To recover the information (i.e. Movie) from the new relations, take their natural join.

2 BCNF Decomposition (Example 3.24, p. 104)
• Relation: Movie(title, year, length, filmType, studioName, starName)
• Key: {title, year, starName}
• FD: title year → length filmType studioName is a BCNF violation because {title, year} is not a superkey, so Movie is not in BCNF.
• Decomposition:
  Schema 1: {title, year, length, filmType, studioName}
  Schema 2: {title, year, starName}
• To obtain the new relations, project Movie onto the two schemas.
• To recover the information (i.e. Movie) from the new relations, take their natural join. This does not lose information.
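The project-then-join round trip on this slide can be sketched in a few lines of Python. The tuples below are a hypothetical Movie instance, not data from the slides, and relations are modelled as plain sets of tuples.

```python
# Hypothetical Movie instance:
# (title, year, length, filmType, studioName, starName)
movie = {
    ("Star Wars", 1977, 124, "color", "Fox", "Carrie Fisher"),
    ("Star Wars", 1977, 124, "color", "Fox", "Mark Hamill"),
    ("Wayne's World", 1992, 95, "color", "Paramount", "Dana Carvey"),
}

# Project Movie onto Schema 1 and Schema 2 (a set drops duplicate tuples).
schema1 = {(t, y, l, f, s) for (t, y, l, f, s, _) in movie}
schema2 = {(t, y, star) for (t, y, _, _, _, star) in movie}

# Natural join on the shared attributes (title, year).
rejoined = {
    (t, y, l, f, s, star)
    for (t, y, l, f, s) in schema1
    for (t2, y2, star) in schema2
    if (t, y) == (t2, y2)
}

print(rejoined == movie)
```

Schema 1 stores the length, film type and studio of "Star Wars" once instead of once per star, which is the redundancy the decomposition removes.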

3 Functional Dependencies (FD)
• Given: a relation schema R(A1, …, An), and let X and Y be subsets of {A1, …, An}.
• FD: X → Y means X functionally determines Y, e.g. A1 A2 … An → B1 B2 … Bm.
• A1 A2 … An → B1 B2 … Bm is an assertion about R that the B's depend on the A's: any two tuples of R that agree on all the A's must also agree on all the B's.
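As a rough illustration, an FD can be tested against one relation instance by checking that no two tuples agree on the left side but differ on the right side. The function name `fd_holds` and the sample rows are hypothetical. Passing this check on one instance does not prove that the FD holds for the schema in general.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds in rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[b] for b in rhs)
        # The first tuple with this left side fixes the expected right side.
        if seen.setdefault(key, val) != val:
            return False
    return True


# Hypothetical sample tuples.
rows = [
    {"title": "Star Wars", "year": 1977, "length": 124, "starName": "Carrie Fisher"},
    {"title": "Star Wars", "year": 1977, "length": 124, "starName": "Mark Hamill"},
    {"title": "Wayne's World", "year": 1992, "length": 95, "starName": "Dana Carvey"},
]

print(fd_holds(rows, ["title", "year"], ["length"]))
print(fd_holds(rows, ["title", "year"], ["starName"]))
```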

4 Multivalued Dependencies (MVD)
• Given: a relation schema R, and let A1 A2 … An and B1 B2 … Bm be subsets of the attributes of R.
• MVD: A1 A2 … An ↠ B1 B2 … Bm holds in R if, for each pair of tuples t and u of R that agree on all the A's, we can find in R some tuple v that agrees:
  1. with both t and u on the A's,
  2. with t on the B's, and
  3. with u on all attributes of R that are not among the A's or B's.
• A1 A2 … An ↠ B1 B2 … Bm is an assertion about R that two attributes or sets of attributes in R are independent of one another.
• MVDs cause redundancy that is not related to FDs, even in a BCNF schema.
• Most common source: putting two or more many-many relationships in a single relation.
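The three-part definition above translates fairly directly into a brute-force check over one relation instance. This is a minimal sketch: the function name `mvd_holds`, the attribute names and the sample tuples are assumptions for illustration only.

```python
def mvd_holds(rows, attrs, lhs, rhs):
    """Check lhs ->> rhs on rows (list of dicts) over the attribute list attrs."""
    rest = [a for a in attrs if a not in lhs and a not in rhs]
    present = {tuple(r[a] for a in attrs) for r in rows}
    for t in rows:
        for u in rows:
            if all(t[a] == u[a] for a in lhs):
                # Build v: A's and B's taken from t, all other attributes from u.
                v = {a: t[a] for a in lhs}
                v.update({b: t[b] for b in rhs})
                v.update({c: u[c] for c in rest})
                if tuple(v[a] for a in attrs) not in present:
                    return False
    return True


attrs = ["name", "street", "city", "title", "year"]
# Hypothetical instance: one star with two addresses and two movies.
rows = [
    dict(zip(attrs, ("C. Fisher", "123 Maple St.", "Hollywood", "Star Wars", 1977))),
    dict(zip(attrs, ("C. Fisher", "5 Locust Ln.", "Malibu", "Star Wars", 1977))),
    dict(zip(attrs, ("C. Fisher", "123 Maple St.", "Hollywood", "Empire Strikes Back", 1980))),
    dict(zip(attrs, ("C. Fisher", "5 Locust Ln.", "Malibu", "Empire Strikes Back", 1980))),
]

print(mvd_holds(rows, attrs, ["name"], ["street", "city"]))
# Dropping one combination breaks the independence of addresses and movies.
print(mvd_holds(rows[:3], attrs, ["name"], ["street", "city"]))
```

All four address/movie combinations must be present for the MVD to hold, which shows the redundancy such a relation carries.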

5 MVD Rules
• Trivial dependencies rule: if A1 A2 … An ↠ B1 B2 … Bm holds for R, then A1 A2 … An ↠ C1 C2 … Ck holds, where the C's are the B's plus one or more of the A's. The converse also holds.
• Transitive rule: if A1 A2 … An ↠ B1 B2 … Bm and B1 B2 … Bm ↠ C1 C2 … Ck, then A1 A2 … An ↠ C1 C2 … Ck.
• The splitting rule does not hold. E.g. name ↠ street city holds, but name ↠ street does not. So always start with the whole set of attributes on the right side, because the splitting rule does not hold.

6 More MVD Rules
• Every FD is an MVD: if the FD A1 A2 … An → B1 B2 … Bm holds, then swapping the B's between tuples that agree on the A's does not create new tuples.
• Complementation rule: if X ↠ Y, then X ↠ Z, where Z is all attributes not in X or Y. E.g. in Star_Star_In {name, street, city, title, year}, name ↠ street city implies name ↠ title year.

7 Nontrivial MVD
A1 A2 … An ↠ B1 B2 … Bm for a relation R is nontrivial if:
1. B1 B2 … Bm is not a subset of A1 A2 … An, and
2. {A1, A2, …, An} ∪ {B1, B2, …, Bm} is not all the attributes of R.

8 Fourth Normal Form (4NF)
• Decompose relations that have MVDs into 4NF to eliminate the redundancy the MVDs cause.
• Definition: R is in 4NF if, whenever A1 A2 … An ↠ B1 B2 … Bm is a nontrivial MVD, {A1, A2, …, An} is a superkey.
• Since every FD is an MVD, 4NF is more stringent than BCNF.
• Only nontrivial MVDs have the potential to violate 4NF.

9 4NF Decomposition
Given: a relation R, and a nontrivial MVD X ↠ Y that violates 4NF.
1. Decompose the schema of R into XY and X ∪ (R − Y).
2. Produce the relations by projecting R onto XY and X ∪ (R − Y).
3. Reconstruct R from the new relations using natural join.
E.g. Star_Star_In {name, street, city, title, year} and name ↠ street city: decompose Star_Star_In using name ↠ street city into {name, street, city} and {name, title, year}.
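The three steps can be traced on a small instance. The tuples below are hypothetical; X is {name} and Y is {street, city}, as in the slide's example.

```python
# Hypothetical instance of Star_Star_In(name, street, city, title, year).
r = {
    ("C. Fisher", "123 Maple St.", "Hollywood", "Star Wars", 1977),
    ("C. Fisher", "5 Locust Ln.", "Malibu", "Star Wars", 1977),
    ("C. Fisher", "123 Maple St.", "Hollywood", "Empire Strikes Back", 1980),
    ("C. Fisher", "5 Locust Ln.", "Malibu", "Empire Strikes Back", 1980),
}

# Step 2: project R onto XY and onto X ∪ (R − Y).
addresses = {(n, s, c) for (n, s, c, _, _) in r}
movies = {(n, t, y) for (n, _, _, t, y) in r}

# Step 3: the natural join on name reconstructs R.
rejoined = {
    (n, s, c, t, y)
    for (n, s, c) in addresses
    for (n2, t, y) in movies
    if n == n2
}

print(len(r), len(addresses), len(movies), rejoined == r)
```

The four original tuples become two address tuples and two movie tuples, and nothing is lost on the way back.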

10 Relationships among normal forms
• 4NF is the most stringent.
• 4NF ⇒ BCNF ⇒ 3NF (every relation in 4NF is in BCNF, and every relation in BCNF is in 3NF).

11 Lossless-join decomposition
Given: a relation R decomposed into schemes R1, R2, …, Rk, and a set of dependencies D.
Definition: R1, R2, …, Rk is a lossless-join decomposition (w.r.t. D) if for every relation r for R satisfying D:
    r = π_R1(r) ⋈ π_R2(r) ⋈ … ⋈ π_Rk(r)
i.e. every such relation r for R is the natural join of its projections onto the Ri's.
The lossless-join property is necessary if the decomposed relation is to be recoverable from its decomposition. However, joins are expensive, so don't over-decompose!
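A small counterexample may help show what "lossless" rules out. In this sketch, the helpers `project`, `natural_join` and `as_set` and the relation r(A, B, C) are all hypothetical. Joining the projections of a badly chosen decomposition produces spurious tuples.

```python
def project(rows, attrs):
    """Project a list of dict rows onto attrs, removing duplicates."""
    return [dict(t) for t in {tuple((a, row[a]) for a in attrs) for row in rows}]


def natural_join(r1, r2):
    """Natural join of two lists of dict rows on their shared attribute names."""
    out = []
    for a in r1:
        for b in r2:
            if all(a[k] == b[k] for k in a.keys() & b.keys()):
                out.append({**a, **b})
    return out


def as_set(rows):
    """Order-independent view of a relation, for comparison."""
    return {tuple(sorted(row.items())) for row in rows}


r = [{"A": 1, "B": 2, "C": 3}, {"A": 4, "B": 2, "C": 5}]

# Splitting on B, which determines nothing here, is lossy.
lossy = natural_join(project(r, ["A", "B"]), project(r, ["B", "C"]))
# Splitting on A, which determines B and C in this instance, is lossless.
lossless = natural_join(project(r, ["A", "B"]), project(r, ["A", "C"]))

print(len(as_set(lossy)), as_set(lossless) == as_set(r))
```

The lossy join contains every original tuple plus two that were never in r, so the original relation cannot be recovered from it.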

12 Structured Query Language (SQL)
• A DDL and DML for relational DBMSs
• History: ANSI SQL, SQL-92 (SQL2), SQL-99 (SQL3)
• SQL-99 extends SQL2 with object-relational features and other new features
• Most DBMS vendors implement the core, then add bells and whistles and variations
• Query capability is close to relational algebra, with lots of extensions
• Case insensitive, except for characters inside quoted strings ' ', e.g. 'Smith' ≠ 'SMITH'
• ; is the statement delimiter

13 Example database schema
    Movie(title, year, length, inColor, studioName, producerC#)
    StarsIn(movieTitle, movieYear, starName)
    MovieStar(name, address, gender, birthdate)
    MovieExec(name, address, cert#, netWorth)
    Studio(name, address, presC#)

14 SQL Queries - basic form
    SELECT attribute(s)
    FROM relations / views / subquery
    WHERE conditional expression;

15 SQL query examples
1. Example 1:
    SELECT * FROM Movie;    -- * => all attributes of Movie
2. Example 2:
    SELECT * FROM Movie
    WHERE studioName = 'Disney' AND year = 1990;
3. Example 3:
    SELECT title, length FROM Movie
    WHERE studioName = 'Disney' AND year = 1990;

16 Duplicates
• SQL generally operates on bags instead of sets
• Exception: the UNION, INTERSECT and EXCEPT operations
• To eliminate duplicates, add the keyword DISTINCT to the SELECT clause, e.g.
    SELECT DISTINCT starName FROM StarsIn;
• Duplicate elimination is costly. Use it judiciously.
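The difference between bag and set results can be seen by running the slide's query with and without DISTINCT. This sketch assumes SQLite through Python's standard `sqlite3` module, since the slides name no particular DBMS, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE StarsIn (movieTitle TEXT, movieYear INT, starName TEXT)")
conn.executemany(
    "INSERT INTO StarsIn VALUES (?, ?, ?)",
    [
        ("Star Wars", 1977, "Harrison Ford"),
        ("Raiders of the Lost Ark", 1981, "Harrison Ford"),
        ("Star Wars", 1977, "Carrie Fisher"),
    ],
)

# Bag semantics: one output row per input row, duplicates included.
bag = conn.execute("SELECT starName FROM StarsIn").fetchall()
# Set semantics: DISTINCT removes the repeated star name.
distinct = conn.execute("SELECT DISTINCT starName FROM StarsIn").fetchall()

print(len(bag), sorted(distinct))
```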

17 SQL Correspondence to Relational Algebra
    SELECT L     -- corresponds to R.A. projection
    FROM R       -- corresponds to R.A. operands
    WHERE C;     -- corresponds to R.A. selection
R.A. expression: π_L(σ_C(R))
When reading and writing queries:
1. FROM   -- which relations are involved
2. WHERE  -- what the tuple selection criteria are
3. SELECT -- which columns to output

18 Union, Intersection, Difference of Queries
• UNION: R1 UNION R2 or (Q1) UNION (Q2), e.g.
    (SELECT title, year FROM Movie)
    UNION
    (SELECT movieTitle AS title, movieYear AS year FROM StarsIn);
• INTERSECT: R1 INTERSECT R2 or (Q1) INTERSECT (Q2)
• EXCEPT (difference): R1 EXCEPT R2 or (Q1) EXCEPT (Q2)

19 Union, Intersection, Difference of Queries (continued)
• Q1 and Q2 are queries that produce relations
• R1 and R2, or the results of Q1 and Q2, should have the same list of attributes and attribute types. Rename if necessary.
• Duplicates are eliminated automatically
• Add the keyword ALL after UNION, INTERSECT, or EXCEPT to prevent duplicate elimination
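The set operations and the effect of ALL can be tried on a toy database. This is a sketch under two assumptions: SQLite through Python's `sqlite3` module stands in for the DBMS, and the rows are hypothetical. SQLite does not accept parentheses around the operand queries and supports ALL only with UNION, so the queries are written without parentheses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movie (title TEXT, year INT)")
conn.execute("CREATE TABLE StarsIn (movieTitle TEXT, movieYear INT, starName TEXT)")
conn.executemany("INSERT INTO Movie VALUES (?, ?)",
                 [("Star Wars", 1977), ("Alien", 1979)])
conn.executemany("INSERT INTO StarsIn VALUES (?, ?, ?)",
                 [("Star Wars", 1977, "Carrie Fisher"),
                  ("Star Wars", 1977, "Mark Hamill")])

q1 = "SELECT title, year FROM Movie"
q2 = "SELECT movieTitle AS title, movieYear AS year FROM StarsIn"


def run(op):
    """Combine q1 and q2 with the given set operator and return sorted rows."""
    return sorted(conn.execute(f"{q1} {op} {q2}").fetchall())


union = run("UNION")          # duplicates eliminated
union_all = run("UNION ALL")  # duplicates kept
intersect = run("INTERSECT")
difference = run("EXCEPT")

print(union, len(union_all), intersect, difference)
```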

20 SQL and Relational Algebra
• The six independent relational algebra operations are implemented by SQL
• SQL is relationally complete

21 Some data values in SQL
1. Strings
2. Dates and Times
3. Null values
4. Truth value of Unknown

22 1. Strings
• Comparison operators (according to lexicographical order): =, <>, <, >, <=, >=
• LIKE -- pattern matching
  - % matches any sequence of 0 or more characters
  - _ matches any one character
• E.g.: title LIKE 'Star ____'
• E.g.: title LIKE '%''s%'
• An escape character can be specified
• E.g.: title LIKE 'x%x%' ESCAPE 'x' (matches the literal string %%)
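The three LIKE patterns above can be checked against a few sample titles. This sketch assumes SQLite through Python's `sqlite3` module, and the titles are hypothetical. SQLite's LIKE is case-insensitive for ASCII letters by default, which other systems may not share.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movie (title TEXT)")
conn.executemany(
    "INSERT INTO Movie VALUES (?)",
    [("Star Wars",), ("Star Trek",), ("Stargate",), ("Wayne's World",), ("%%",)],
)


def titles(condition):
    """Return the titles satisfying a WHERE condition, in sorted order."""
    sql = "SELECT title FROM Movie WHERE " + condition + " ORDER BY title"
    return [row[0] for row in conn.execute(sql)]


# 'Star' + space + exactly four characters.
four_after_star = titles("title LIKE 'Star ____'")
# Contains 's preceded by an apostrophe; '' is an escaped single quote.
possessive = titles("title LIKE '%''s%'")
# x is the escape character, so x% means a literal percent sign.
literal_percent = titles("title LIKE 'x%x%' ESCAPE 'x'")

print(four_after_star, possessive, literal_percent)
```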

23 2. Dates and Times
• Date constant: DATE '2002-10-01'
• Time constant: TIME '15:00:02.5'
• Timestamp (combines date and time): TIMESTAMP '2002-10-01 15:00:02.5' (beware of implementation differences!)
• Comparison operators apply

24 3. Null Values
• NULL is used to represent:
  1. Value unknown
  2. Value inapplicable
  3. Value withheld
• Operations involving NULL:
  1. Arithmetic operation: the result is NULL
  2. Comparison: the result is UNKNOWN
• NULL is not a constant, therefore NULL cannot be used explicitly as an operand.
• Use IS NULL and IS NOT NULL to check for nulls
• Read "Pitfalls Regarding Nulls", p. 250

25 4. UNKNOWN
• Consider TRUE = 1, FALSE = 0, UNKNOWN = 0.5
  1. AND of two truth values = the minimum of the two values
  2. OR of two truth values = the maximum of the two values
  3. Negation of v = 1 − v
• Refer to Fig. 6.2, p. 250, for the truth table for 3-valued logic
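The numeric encoding above is easy to try out directly. In this small sketch the names `and3`, `or3` and `not3` are hypothetical helpers, not SQL functions.

```python
TRUE, UNKNOWN, FALSE = 1.0, 0.5, 0.0


def and3(a, b):
    """Three-valued AND: the minimum of the two truth values."""
    return min(a, b)


def or3(a, b):
    """Three-valued OR: the maximum of the two truth values."""
    return max(a, b)


def not3(v):
    """Three-valued NOT: 1 - v, so UNKNOWN stays UNKNOWN."""
    return 1 - v


print(and3(TRUE, UNKNOWN), or3(TRUE, UNKNOWN), not3(UNKNOWN))
```

A WHERE clause keeps a tuple only when its condition evaluates to TRUE, so a condition that comes out UNKNOWN rejects the tuple just as FALSE does.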

26 The Six Clauses in SQL Queries
1. SELECT   -- required
2. FROM     -- required
3. WHERE
4. GROUP BY
5. HAVING   -- if used, must follow a GROUP BY clause
6. ORDER BY
• Subqueries may appear in the FROM clause and the WHERE clause
• Comments begin with '--'

27 Table level SQL (ref. 6.6, p. 292)
• Create table - to define the schema of a base table (ref. 6.6.1 for data type syntax). E.g.
    create table EMP (
        empno int not null,
        lastName varchar(30) not null,
        firstName varchar(30) not null,
        num_of_children int,
        constraint pk_EMP primary key (empno));
• Drop table - to destroy a base table, e.g.
    drop table EMP;

28 Tuple Modification Statements (ref. 6.5, p. 286)
• Insert - to add a row
  Syntax: insert into R(A1, …, An) values (v1, …, vn)
  - E.g. insert into emp (empno, lastName, firstName, num_of_children) values (12345, 'Doe', 'John', 1)
  - Or: insert into emp values (12345, 'Doe', 'John', 1)
• Delete - to remove a row
  Syntax: delete from R where <condition>
  - E.g. delete from emp where empno = 12345
• Update - to modify the contents of a row
  Syntax: update R set Ai = value where Aj = targetValue
  - E.g. update emp set num_of_children = 2 where empno = 12345
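The insert, update and delete statements can be run end to end against the EMP table from the previous slide. This sketch assumes SQLite through Python's `sqlite3` module. SQLite accepts the slide's `create table` statement as written, though other systems may differ in type names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table emp (
        empno int not null,
        lastName varchar(30) not null,
        firstName varchar(30) not null,
        num_of_children int,
        constraint pk_EMP primary key (empno))
""")

# Insert: add one row.
conn.execute(
    "insert into emp (empno, lastName, firstName, num_of_children) "
    "values (12345, 'Doe', 'John', 1)"
)

# Update: change a column of the row selected by the where clause.
conn.execute("update emp set num_of_children = 2 where empno = 12345")
after_update = conn.execute(
    "select num_of_children from emp where empno = 12345"
).fetchone()

# Delete: remove the row again.
conn.execute("delete from emp where empno = 12345")
remaining = conn.execute("select count(*) from emp").fetchone()[0]

print(after_update, remaining)
```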

29 Some JOINS in SQL (ref. p. 270)
• CROSS JOIN -- corresponds to the R.A. Cartesian product, e.g.
    Movie CROSS JOIN StarsIn;
• JOIN … ON -- corresponds to the R.A. theta-join, e.g.
    Movie JOIN StarsIn ON title = movieTitle AND year = movieYear;
• [NATURAL] JOIN -- corresponds to the R.A. natural join, e.g.
    MovieStar NATURAL JOIN MovieExec;
  or
    MovieStar JOIN MovieExec;
  (the form without NATURAL is dialect-dependent; standard SQL requires NATURAL or an ON/USING clause)
• OUTERJOINS -- joins that include dangling tuples

30 OUTERJOINS
• An operator that augments the result of a join with the dangling tuples, padded with null values.
• The FULL outerjoin of R1 and R2 is a join that includes all rows from R1 and R2, matched or not. Unmatched rows are padded with NULLs.
• The LEFT outerjoin of R1 and R2 is a join that includes all rows from R1, matched or not, plus the matching values from R2. Unmatched rows are padded with NULLs.
• The RIGHT outerjoin of R1 and R2 is a join that includes all rows from R2, matched or not, plus the matching values from R1. Unmatched rows are padded with NULLs.
• The join may be a NATURAL join or a theta-join

31 Outerjoins Syntax
1. R1 NATURAL {FULL | LEFT | RIGHT} OUTER JOIN R2;
   E.g. 1. MovieStar NATURAL FULL OUTER JOIN MovieExec;
   E.g. 2. MovieStar NATURAL LEFT OUTER JOIN MovieExec;
   E.g. 3. MovieStar NATURAL RIGHT OUTER JOIN MovieExec;

32 Outerjoins Syntax (continued)
1. R1 {FULL | LEFT | RIGHT} OUTER JOIN R2 ON conditional expression;
   E.g. 1. Movie FULL OUTER JOIN StarsIn ON title = movieTitle AND year = movieYear;
   E.g. 2. Movie LEFT OUTER JOIN StarsIn ON title = movieTitle AND year = movieYear;
   E.g. 3. Movie RIGHT OUTER JOIN StarsIn ON title = movieTitle AND year = movieYear;
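The NULL padding of dangling tuples shows up clearly on a two-row example. This sketch assumes SQLite through Python's `sqlite3` module and uses hypothetical rows. It demonstrates only the LEFT variant, because older SQLite releases do not support RIGHT and FULL outer joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movie (title TEXT, year INT)")
conn.execute("CREATE TABLE StarsIn (movieTitle TEXT, movieYear INT, starName TEXT)")
conn.executemany("INSERT INTO Movie VALUES (?, ?)",
                 [("Star Wars", 1977), ("Alien", 1979)])
# Only one movie has a matching StarsIn row in this toy data.
conn.execute("INSERT INTO StarsIn VALUES ('Star Wars', 1977, 'Carrie Fisher')")

rows = conn.execute("""
    SELECT title, year, starName
    FROM Movie LEFT OUTER JOIN StarsIn
         ON title = movieTitle AND year = movieYear
    ORDER BY title
""").fetchall()

# The unmatched Movie row is kept, with starName padded by NULL (None in Python).
print(rows)
```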

33 Use result of joins as subqueries in queries
• E.g.
    SELECT title, year, length, inColor, studioName, producerC#, starName
    FROM Movie JOIN StarsIn ON title = movieTitle AND year = movieYear;

