© D. Wong 2003

Normalization
• Purpose: a process to eliminate redundancy in relations caused by functional or multivalued dependencies.
• Decompose the relation schema into normal forms:
  – Boyce-Codd Normal Form (BCNF)
  – Third Normal Form (3NF)
  – Fourth Normal Form (4NF)
• To obtain the new relations, project the original relation (e.g. Movie) onto the new schemas.
• To recover the information (i.e. Movie) from the new relations, take their natural join.

BCNF Decomposition
Example 3.24, p. 104
• Relation: Movie(title, year, length, filmType, studioName, starName)
• Key: {title, year, starName}
• FD: title year → length filmType studioName is a BCNF violation because {title, year} is not a superkey, so Movie is not in BCNF.
• Decomposition:
  – Schema 1: {title, year, length, filmType, studioName}
  – Schema 2: {title, year, starName}
• To obtain the new relations, project Movie onto the two schemas.
• To recover the information (i.e. Movie) from the new relations, take their natural join. The decomposition does not lose information.
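
The project-then-join round trip for this decomposition can be sketched in Python. This is a minimal illustration, not part of the original slides: the three Movie tuples are made-up sample data, and `project` and `natural_join` are hypothetical helper names.

```python
# Hypothetical sample tuples for
# Movie(title, year, length, filmType, studioName, starName).
ATTRS = ("title", "year", "length", "filmType", "studioName", "starName")
movie = {
    ("Star Wars", 1977, 124, "color", "Fox", "Carrie Fisher"),
    ("Star Wars", 1977, 124, "color", "Fox", "Mark Hamill"),
    ("Mighty Ducks", 1991, 104, "color", "Disney", "Emilio Estevez"),
}

def project(rows, attrs, onto):
    """Project a set of tuples with schema `attrs` onto the attributes `onto`."""
    idx = [attrs.index(a) for a in onto]
    return {tuple(row[i] for i in idx) for row in rows}

def natural_join(r1, attrs1, r2, attrs2):
    """Natural join of two sets of tuples; returns (rows, schema)."""
    common = [a for a in attrs1 if a in attrs2]
    extra = [a for a in attrs2 if a not in attrs1]
    out = set()
    for t in r1:
        for u in r2:
            if all(t[attrs1.index(a)] == u[attrs2.index(a)] for a in common):
                out.add(t + tuple(u[attrs2.index(a)] for a in extra))
    return out, tuple(attrs1) + tuple(extra)

S1 = ("title", "year", "length", "filmType", "studioName")
S2 = ("title", "year", "starName")
r1 = project(movie, ATTRS, S1)   # movie facts, stored once per movie
r2 = project(movie, ATTRS, S2)   # one tuple per (movie, star) pair
joined, joined_attrs = natural_join(r1, S1, r2, S2)
print(joined == movie)           # the join gives back the original relation
```

In the sample data the length, film type, and studio of Star Wars appear twice in Movie but only once in the projection onto Schema 1, which is the redundancy the decomposition removes.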

Functional Dependencies (FD)
• Given: relation schema R(A1, …, An), and let X and Y be subsets of {A1, …, An}.
• FD: X → Y means X functionally determines Y, i.e. any two tuples of R that agree on all attributes of X must also agree on all attributes of Y. E.g. A1 A2 … An → B1 B2 … Bm.
• A1 A2 … An → B1 B2 … Bm is an assertion about R that two attributes or sets of attributes in R are dependent on one another.
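
A functional dependency can be tested against a particular relation instance with a few lines of Python. The sketch below is an illustration only: `fd_holds` is a hypothetical helper and the rows are made-up sample data. Passing the check on one instance does not prove the FD holds for the schema in general; failing it does show a violation.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds in `rows` (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # Two rows with the same lhs values must carry the same rhs values.
        if seen.setdefault(key, val) != val:
            return False
    return True

# Hypothetical sample rows.
rows = [
    {"title": "Star Wars", "year": 1977, "length": 124, "starName": "Carrie Fisher"},
    {"title": "Star Wars", "year": 1977, "length": 124, "starName": "Mark Hamill"},
    {"title": "Mighty Ducks", "year": 1991, "length": 104, "starName": "Emilio Estevez"},
]
length_fd = fd_holds(rows, ["title", "year"], ["length"])    # holds
star_fd = fd_holds(rows, ["title", "year"], ["starName"])    # violated
print(length_fd, star_fd)
```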

Multivalued Dependencies (MVD)
• Given: relation schema R, and let A1 A2 … An and B1 B2 … Bm be subsets of the attributes of R.
• MVD: A1 A2 … An ↠ B1 B2 … Bm holds in R if, for each pair of tuples t and u of relation R that agree on all the A's, we can find in R some tuple v that agrees:
  1. with both t and u on the A's,
  2. with t on the B's, and
  3. with u on all attributes of R that are not among the A's or B's.
• A1 A2 … An ↠ B1 B2 … Bm is an assertion about R that two attributes or sets of attributes in R are independent of one another.
• MVDs cause redundancy that is not related to FDs, even in a BCNF schema.
• Most common source: putting two or more many-many relationships in a single relation.
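
The three-part condition on t, u, and v translates directly into a check over a relation instance. The following Python sketch is illustrative only: `mvd_holds` is a hypothetical helper, and the star/address/movie tuples are made-up sample data (one star, two addresses, two movies).

```python
def mvd_holds(rows, attrs, A, B):
    """Return True if the MVD A ->> B holds in the set of tuples `rows`."""
    rel = set(rows)
    ia = [attrs.index(x) for x in A]
    ib = [attrs.index(x) for x in B]
    for t in rel:
        for u in rel:
            if all(t[i] == u[i] for i in ia):
                # v agrees with t on the A's and B's, and with u elsewhere.
                v = tuple(t[i] if (i in ia or i in ib) else u[i]
                          for i in range(len(attrs)))
                if v not in rel:
                    return False
    return True

ATTRS = ("name", "street", "city", "title", "year")
stars = {
    ("C. Fisher", "123 Maple St.", "Hollywood", "Star Wars", 1977),
    ("C. Fisher", "5 Locust Ln.", "Malibu", "Star Wars", 1977),
    ("C. Fisher", "123 Maple St.", "Hollywood", "Empire Strikes Back", 1980),
    ("C. Fisher", "5 Locust Ln.", "Malibu", "Empire Strikes Back", 1980),
}
full = mvd_holds(stars, ATTRS, ["name"], ["street", "city"])
split = mvd_holds(stars, ATTRS, ["name"], ["street"])
missing_one = mvd_holds(
    stars - {("C. Fisher", "5 Locust Ln.", "Malibu", "Empire Strikes Back", 1980)},
    ATTRS, ["name"], ["street", "city"])
print(full, split, missing_one)
```

On this data name ↠ street city holds, name ↠ street alone does not, and removing any one of the four tuples breaks the MVD because the required tuple v is then missing.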

MVD Rules
• Trivial dependencies rule: if A1 A2 … An ↠ B1 B2 … Bm holds for R, then A1 A2 … An ↠ C1 C2 … Ck holds, where the C's are the B's plus one or more of the A's. The converse also holds.
• Transitive rule: if A1 A2 … An ↠ B1 B2 … Bm and B1 B2 … Bm ↠ C1 C2 … Ck, then A1 A2 … An ↠ C1 C2 … Ck (strictly, any C's that are also B's must be dropped from the right side).
• The splitting rule does not hold. E.g. name ↠ street city holds, but name ↠ street does not. So always start with the full set of attributes on the right side.

More MVD Rules
• Every FD is an MVD: if the FD A1 A2 … An → B1 B2 … Bm holds, then swapping the B's between tuples that agree on the A's doesn't create new tuples.
• Complementation rule: if X ↠ Y, then X ↠ Z, where Z is all attributes not in X or Y. E.g. in Star_Star_In{name, street, city, title, year}, name ↠ street city, so name ↠ title year.

Nontrivial MVD
A1 A2 … An ↠ B1 B2 … Bm for a relation R is nontrivial if:
  1. B1 B2 … Bm is not a subset of A1 A2 … An, and
  2. A1 A2 … An ∪ B1 B2 … Bm is not all attributes of R.

Fourth Normal Form (4NF)
• Decompose relations that have MVDs into 4NF to eliminate the MVDs.
• Definition: R is in 4NF if, whenever A1 A2 … An ↠ B1 B2 … Bm is a nontrivial MVD, {A1, A2, …, An} is a superkey.
• Since every FD is an MVD, 4NF is more stringent than BCNF.
• Only nontrivial MVDs have the potential to violate 4NF.

4NF Decomposition
Given: relation R, and a nontrivial MVD X ↠ Y that violates 4NF.
  1. Decompose R into the schemas XY and X ∪ (R − Y).
  2. Produce the relations by projecting R onto XY and X ∪ (R − Y).
  3. Reconstruct R from the new relations using natural join.
E.g. Star_Star_In{name, street, city, title, year} and name ↠ street city: decompose Star_Star_In using name ↠ street city into {name, street, city} and {name, title, year}.
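
Step 1 is plain set arithmetic on attribute names, which a short Python sketch makes concrete. `decompose_4nf` is a hypothetical helper name used only for illustration.

```python
def decompose_4nf(R, X, Y):
    """Split schema R on the violating MVD X ->> Y into XY and X ∪ (R − Y)."""
    R, X, Y = set(R), set(X), set(Y)
    return X | Y, X | (R - Y)

s1, s2 = decompose_4nf(
    ["name", "street", "city", "title", "year"],   # Star_Star_In
    ["name"],                                      # X
    ["street", "city"],                            # Y
)
print(sorted(s1), sorted(s2))
```

The two results are the schemas {name, street, city} and {name, title, year} from the example; each may need to be checked again for further 4NF violations.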

Relationships among normal forms
• 4NF is the most stringent.
• 4NF implies BCNF, and BCNF implies 3NF.

Lossless-join decomposition
Given: relation R, decomposed into schemas R1, R2, …, Rk, and a set of dependencies D.
Definition: R1, R2, …, Rk is a lossless-join decomposition (w.r.t. D) if, for every relation r for R satisfying D:
    $r = \pi_{R_1}(r) \bowtie \pi_{R_2}(r) \bowtie \cdots \bowtie \pi_{R_k}(r)$
i.e. every relation r for R that satisfies D is the natural join of its projections onto the Ri's.
The lossless-join property is necessary if the decomposed relation is to be recoverable from its decomposition.
However, joins are expensive. So, don't over-decompose!
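
A small Python experiment shows what goes wrong when the property fails. It is an illustration on made-up data: the relation r(A, B, C) and the helper names `rel`, `project`, and `natural_join` are assumptions, not from the slides. In r the FD A → B C holds, but B determines nothing.

```python
def rel(*dicts):
    """Build a relation as a set of frozensets of (attribute, value) pairs."""
    return {frozenset(d.items()) for d in dicts}

def project(r, attrs):
    """Project every tuple of r onto the attribute set `attrs`."""
    return {frozenset((a, v) for a, v in row if a in attrs) for row in r}

def natural_join(r1, r2):
    """Combine every pair of tuples that agree on their shared attributes."""
    out = set()
    for t in r1:
        for u in r2:
            merged = dict(t)
            if all(merged.get(a, v) == v for a, v in u):
                merged.update(dict(u))
                out.add(frozenset(merged.items()))
    return out

r = rel({"A": 1, "B": 2, "C": 3}, {"A": 4, "B": 2, "C": 5})

# Splitting on A is lossless here because A -> B C holds in r.
lossless = natural_join(project(r, {"A", "B"}), project(r, {"A", "C"}))
# Splitting on B is lossy: the join produces spurious tuples.
lossy = natural_join(project(r, {"A", "B"}), project(r, {"B", "C"}))
print(lossless == r, len(lossy))
```

The lossy join contains the two original tuples plus two that were never in r, so the original relation can no longer be told apart from the spurious combinations.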

Structured Query Language (SQL)
• A DDL and DML for relational DBMSs
• History: ANSI SQL, SQL-92 (SQL2), SQL-99 (SQL3)
• SQL-99 extends SQL2 with object-relational features and other new features.
• Most DBMS vendors implement the core, and then add bells and whistles and variations.
• Query capability is close to relational algebra, with lots of extensions.
• SQL is case insensitive except for characters inside quoted strings ' ', e.g. 'Smith' ≠ 'SMITH'.
• ; is the statement delimiter.

Example database schema
    Movie(title, year, length, inColor, studioName, producerC#)
    StarsIn(movieTitle, movieYear, starName)
    MovieStar(name, address, gender, birthdate)
    MovieExec(name, address, cert#, netWorth)
    Studio(name, address, presC#)

SQL Queries – basic form
    SELECT attribute(s)
    FROM relations / views / subquery
    WHERE conditional expression;

SQL query examples
1. Example 1:
    SELECT * FROM Movie;   -- * => all attributes of Movie
2. Example 2:
    SELECT * FROM Movie
    WHERE studioName = 'Disney' AND year = 1990;
3. Example 3:
    SELECT title, length FROM Movie
    WHERE studioName = 'Disney' AND year = 1990;
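
The three example queries can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the three Movie rows are made-up sample data, and the column producerC# is renamed producerC so that it needs no quoting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Movie (
    title TEXT, year INTEGER, length INTEGER,
    inColor INTEGER, studioName TEXT, producerC INTEGER)""")
# Hypothetical sample rows.
conn.executemany("INSERT INTO Movie VALUES (?, ?, ?, ?, ?, ?)", [
    ("Pretty Woman", 1990, 119, 1, "Disney", 999),
    ("Star Wars", 1977, 124, 1, "Fox", 555),
    ("Mighty Ducks", 1991, 104, 1, "Disney", 777),
])

# Example 1: every attribute of every tuple.
all_rows = conn.execute("SELECT * FROM Movie").fetchall()
# Example 2: only the tuples that satisfy the WHERE condition.
disney_1990 = conn.execute(
    "SELECT * FROM Movie WHERE studioName = 'Disney' AND year = 1990").fetchall()
# Example 3: same selection, projected onto two columns.
title_length = conn.execute(
    "SELECT title, length FROM Movie "
    "WHERE studioName = 'Disney' AND year = 1990").fetchall()
print(title_length)
```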

Duplicates
• SQL generally operates on bags instead of sets.
• Exceptions: the UNION, INTERSECT, and EXCEPT operations, which eliminate duplicates.
• To eliminate duplicates, add the keyword DISTINCT to the SELECT clause, e.g.
    SELECT DISTINCT starName
    FROM StarsIn;
• Duplicate elimination is costly. Use judiciously.
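
The difference between the bag result and the DISTINCT result is easy to see on a tiny table. The sketch below uses Python's sqlite3 module with made-up StarsIn rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE StarsIn (movieTitle TEXT, movieYear INTEGER, starName TEXT)")
# Hypothetical sample rows: one star appears in two movies.
conn.executemany("INSERT INTO StarsIn VALUES (?, ?, ?)", [
    ("Star Wars", 1977, "Harrison Ford"),
    ("Raiders of the Lost Ark", 1981, "Harrison Ford"),
    ("Star Wars", 1977, "Carrie Fisher"),
])

bag = conn.execute("SELECT starName FROM StarsIn").fetchall()
distinct = conn.execute("SELECT DISTINCT starName FROM StarsIn").fetchall()
print(len(bag), len(distinct))
```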

SQL Correspondence to Relational Algebra
    SELECT L     -- corresponds to R.A. project
    FROM R       -- corresponds to R.A. operands
    WHERE C;     -- corresponds to R.A. select
R.A. expression: $\pi_L(\sigma_C(R))$
When reading and writing queries:
  1. FROM -- what relations are involved
  2. WHERE -- what the tuple selection criteria are
  3. SELECT -- what columns to output

Union, Intersection, Difference of Queries
• UNION: R1 UNION R2 or (Q1) UNION (Q2), e.g.
    (SELECT title, year FROM Movie)
    UNION
    (SELECT movieTitle AS title, movieYear AS year FROM StarsIn);
• INTERSECT: R1 INTERSECT R2 or (Q1) INTERSECT (Q2)
• EXCEPT: R1 EXCEPT R2 or (Q1) EXCEPT (Q2) -- difference

Union, Intersection, Difference of Queries (continued)
• Q1 and Q2 are queries that produce relations.
• R1 and R2, or the results of Q1 and Q2, should have the same list of attributes and attribute types. Rename if necessary.
• Duplicates are eliminated automatically.
• Add the keyword ALL after UNION, INTERSECT, or EXCEPT to prevent duplicate elimination.
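
The set operations and the effect of ALL can be checked with Python's sqlite3 module. Two caveats about this sketch: the rows are made-up sample data, and SQLite differs from the slide syntax in that it does not accept parentheses around the operand queries and supports ALL only with UNION, so only UNION ALL is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movie (title TEXT, year INTEGER)")
conn.execute(
    "CREATE TABLE StarsIn (movieTitle TEXT, movieYear INTEGER, starName TEXT)")
# Hypothetical sample rows.
conn.executemany("INSERT INTO Movie VALUES (?, ?)", [
    ("Star Wars", 1977), ("Mighty Ducks", 1991)])
conn.executemany("INSERT INTO StarsIn VALUES (?, ?, ?)", [
    ("Star Wars", 1977, "Carrie Fisher"), ("Star Wars", 1977, "Mark Hamill")])

q1 = "SELECT title, year FROM Movie"
q2 = "SELECT movieTitle AS title, movieYear AS year FROM StarsIn"

union = conn.execute(f"{q1} UNION {q2}").fetchall()          # duplicates removed
union_all = conn.execute(f"{q1} UNION ALL {q2}").fetchall()  # duplicates kept
intersect = conn.execute(f"{q1} INTERSECT {q2}").fetchall()
except_ = conn.execute(f"{q1} EXCEPT {q2}").fetchall()
print(len(union), len(union_all), intersect, except_)
```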

SQL and Relational Algebra
• The six independent operations are implemented by SQL.
• SQL is relationally complete.

Some data values in SQL
  1. Strings
  2. Dates and Times
  3. Null values
  4. Truth value of Unknown

Strings
• Comparison operators (according to lexicographical order): =, <>, <, >, <=, >=
• LIKE -- pattern matching
  – % -- matches any sequence of 0 or more characters
  – _ -- matches any one character
  – E.g.: title LIKE 'Star ____'
  – E.g.: title LIKE '%''s%'
• Can specify an escape character
  – E.g.: title LIKE 'x%x%' ESCAPE 'x'
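
The three LIKE patterns can be exercised with Python's sqlite3 module. The titles are made-up sample data. One implementation difference to keep in mind: SQLite's LIKE ignores case for ASCII letters by default, which the SQL standard does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movie (title TEXT)")
# Hypothetical sample titles.
conn.executemany("INSERT INTO Movie VALUES (?)", [
    ("Star Wars",), ("Star Trek",), ("Stargate",), ("Logan's Run",)])

# 'Star ' followed by exactly four characters.
star = conn.execute(
    "SELECT title FROM Movie WHERE title LIKE 'Star ____' ORDER BY title"
).fetchall()
# Two single quotes inside a string literal stand for one apostrophe.
possessive = conn.execute(
    "SELECT title FROM Movie WHERE title LIKE '%''s%'").fetchall()
# With ESCAPE 'x', the pattern 'x%x%' matches only the literal string '%%'.
escaped = conn.execute(
    "SELECT '%%' LIKE 'x%x%' ESCAPE 'x', 'ab' LIKE 'x%x%' ESCAPE 'x'"
).fetchone()
print(star, possessive, escaped)
```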

Dates and Times
• Date constant: DATE ' '
• Time constant: TIME '15:00:02.5'
• Timestamp (combines dates and times): TIMESTAMP ' :00:02.5' (beware of implementation differences!)
• Comparison operators apply.

Null Values
• NULL is used to represent:
  1. Value unknown
  2. Value inapplicable
  3. Value withheld
• Operations involving NULL:
  1. Arithmetic operation: result is NULL
  2. Comparison: result is UNKNOWN
• NULL is not a constant, therefore NULL cannot be used explicitly as an operand.
• Use IS NULL and IS NOT NULL to check for NULL.
• Read "Pitfalls Regarding Nulls", p. 250
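
These rules can be observed with Python's sqlite3 module. The sketch relies on two assumptions: the small emp table is made-up data, and SQLite is more lenient than the slide describes, accepting the literal NULL as an operand, which is what makes the behaviour visible. SQLite reports an UNKNOWN comparison result as NULL, which Python shows as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Arithmetic with NULL yields NULL; comparing with NULL yields UNKNOWN.
arith, compare = conn.execute("SELECT NULL + 1, NULL = NULL").fetchone()

conn.execute("CREATE TABLE emp (empno INTEGER, num_of_children INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", [(1, 2), (2, None)])

# '= NULL' is never TRUE, so no row is selected.
eq_null = conn.execute(
    "SELECT empno FROM emp WHERE num_of_children = NULL").fetchall()
# IS NULL / IS NOT NULL are the proper tests.
is_null = conn.execute(
    "SELECT empno FROM emp WHERE num_of_children IS NULL").fetchall()
is_not_null = conn.execute(
    "SELECT empno FROM emp WHERE num_of_children IS NOT NULL").fetchall()
print(arith, compare, eq_null, is_null, is_not_null)
```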

UNKNOWN
• Consider TRUE = 1, FALSE = 0, UNKNOWN = 1/2:
  1. AND of 2 truth values = min. of the 2 values
  2. OR of 2 truth values = max. of the 2 values
  3. Negation of v = 1 − v
• Refer to Fig. 6.2, p. 250 for the truth table for 3-valued logic.
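
The min/max/1 − v encoding is short enough to write out in Python. This is an illustrative sketch; the function names `and3`, `or3`, and `not3` are made up here.

```python
from fractions import Fraction

TRUE, FALSE, UNKNOWN = Fraction(1), Fraction(0), Fraction(1, 2)

def and3(a, b):
    """AND is the minimum of the two truth values."""
    return min(a, b)

def or3(a, b):
    """OR is the maximum of the two truth values."""
    return max(a, b)

def not3(v):
    """Negation is 1 - v, so UNKNOWN stays UNKNOWN."""
    return 1 - v

names = {TRUE: "TRUE", FALSE: "FALSE", UNKNOWN: "UNKNOWN"}
for a in (TRUE, UNKNOWN, FALSE):
    for b in (TRUE, UNKNOWN, FALSE):
        print(names[a], names[b], names[and3(a, b)], names[or3(a, b)])
```

A WHERE clause keeps a tuple only when its condition evaluates to TRUE, so both FALSE and UNKNOWN cause the tuple to be dropped.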

The Six Clauses in SQL Queries
  1. SELECT -- required
  2. FROM -- required
  3. WHERE
  4. GROUP BY
  5. HAVING -- if used, must follow a GROUP BY clause
  6. ORDER BY
• Subqueries may appear in the FROM clause and the WHERE clause.
• Comments begin with '--'.

Table level SQL (ref. 6.6, p. 292)
• Create table – to define the schema of a base table (Ref for data types syntax)
  E.g.
    create table EMP (
        empno int not null,
        lastName varchar(30) not null,
        firstName varchar(30) not null,
        num_of_children int,
        constraint pk_EMP primary key (empno)
    );
• Drop table – to destroy a base table
  E.g.
    drop table EMP;

Tuple Modification Statements (ref. 6.5, p. 286)
• Insert – to add a row
  Syntax: insert into R(A1, …, An) values (v1, …, vn)
  – E.g. insert into emp(empno, lastName, firstName, num_of_children) values (12345, 'Doe', 'John', 1)
  – Or: insert into emp values (12345, 'Doe', 'John', 1)
• Delete – to remove a row
  Syntax: delete from R where condition
  – E.g. delete from emp where empno =
• Update – to modify the contents of a row
  Syntax: update R set Ai = value where Aj = targetValue
  – E.g. update emp set num_of_children = 2 where empno = 12345
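
The insert, update, and delete statements can be run end to end against the EMP table with Python's sqlite3 module. One assumption in this sketch: the key value in the slide's delete example is missing, so 12345 is used here to match the other examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""create table EMP (
    empno int not null,
    lastName varchar(30) not null,
    firstName varchar(30) not null,
    num_of_children int,
    constraint pk_EMP primary key (empno))""")

conn.execute("insert into emp (empno, lastName, firstName, num_of_children) "
             "values (12345, 'Doe', 'John', 1)")
after_insert = conn.execute("select * from emp").fetchall()

conn.execute("update emp set num_of_children = 2 where empno = 12345")
after_update = conn.execute("select * from emp").fetchall()

# Assumed key value; the slide's delete example does not show one.
conn.execute("delete from emp where empno = 12345")
after_delete = conn.execute("select * from emp").fetchall()
print(after_insert, after_update, after_delete)
```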

Some JOINS in SQL (ref. p. 270)
• CROSS JOIN -- corresponds to R.A. Cartesian product
  e.g. Movie CROSS JOIN StarsIn;
• JOIN … ON -- corresponds to R.A. theta-join
  e.g. Movie JOIN StarsIn ON title = movieTitle AND year = movieYear;
• NATURAL JOIN -- corresponds to R.A. natural join
  e.g. MovieStar NATURAL JOIN MovieExec;
  (The keyword NATURAL is required: in standard SQL a bare JOIN without an ON clause is not a natural join.)
• OUTERJOINS -- joins that include dangling tuples
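
The three inner join forms can be compared side by side with Python's sqlite3 module. The sketch uses trimmed-down versions of the example tables with made-up rows; cert# is renamed cert to avoid quoting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movie (title TEXT, year INTEGER, studioName TEXT);
CREATE TABLE StarsIn (movieTitle TEXT, movieYear INTEGER, starName TEXT);
CREATE TABLE MovieStar (name TEXT, address TEXT);
CREATE TABLE MovieExec (name TEXT, address TEXT, cert INTEGER, netWorth INTEGER);
-- Hypothetical sample rows.
INSERT INTO Movie VALUES ('Star Wars', 1977, 'Fox'), ('Mighty Ducks', 1991, 'Disney');
INSERT INTO StarsIn VALUES ('Star Wars', 1977, 'Carrie Fisher'),
                           ('Star Wars', 1977, 'Mark Hamill');
INSERT INTO MovieStar VALUES ('Jane Doe', '1 Main St'),
                             ('Carrie Fisher', '123 Maple St');
INSERT INTO MovieExec VALUES ('Jane Doe', '1 Main St', 111, 1000000),
                             ('George Lucas', 'Oak Rd', 555, 2000000);
""")

# Cartesian product: every Movie row paired with every StarsIn row.
cross = conn.execute("SELECT * FROM Movie CROSS JOIN StarsIn").fetchall()
# Theta-join: only the pairs that satisfy the ON condition.
theta = conn.execute(
    "SELECT title, year, starName FROM Movie JOIN StarsIn "
    "ON title = movieTitle AND year = movieYear ORDER BY starName").fetchall()
# Natural join: equate the same-named columns (name, address), listed once.
natural = conn.execute("SELECT * FROM MovieStar NATURAL JOIN MovieExec").fetchall()
print(len(cross), theta, natural)
```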

OUTERJOINS
• An operator to augment the result of a join by the dangling tuples, padded with null values.
• FULL outerjoin of R1 and R2 is a join that includes all rows from R1 and R2, matched or not. Unmatched rows are padded with NULLs.
• LEFT outerjoin of R1 and R2 is a join that includes all rows from R1, matched or not, plus the matching values from R2. Unmatched rows are padded with NULLs.
• RIGHT outerjoin of R1 and R2 is a join that includes all rows from R2, matched or not, plus the matching values from R1. Unmatched rows are padded with NULLs.
• The joining may be a NATURAL join or a theta join.

Outerjoins Syntax
1. R1 NATURAL {FULL | LEFT | RIGHT} OUTER JOIN R2;
   E.g. 1. MovieStar NATURAL FULL OUTER JOIN MovieExec;
   E.g. 2. MovieStar NATURAL LEFT OUTER JOIN MovieExec;
   E.g. 3. MovieStar NATURAL RIGHT OUTER JOIN MovieExec;

Outerjoins Syntax (continued)
2. R1 {FULL | LEFT | RIGHT} OUTER JOIN R2 ON conditional expression;
   E.g. 1. Movie FULL OUTER JOIN StarsIn ON title = movieTitle AND year = movieYear;
   E.g. 2. Movie LEFT OUTER JOIN StarsIn ON title = movieTitle AND year = movieYear;
   E.g. 3. Movie RIGHT OUTER JOIN StarsIn ON title = movieTitle AND year = movieYear;
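
The padding of dangling tuples can be seen with Python's sqlite3 module. The rows are made-up sample data, and only LEFT OUTER JOIN is shown because older SQLite versions do not support the RIGHT and FULL variants.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movie (title TEXT, year INTEGER);
CREATE TABLE StarsIn (movieTitle TEXT, movieYear INTEGER, starName TEXT);
-- Hypothetical sample rows: Mighty Ducks has no matching StarsIn tuple.
INSERT INTO Movie VALUES ('Star Wars', 1977), ('Mighty Ducks', 1991);
INSERT INTO StarsIn VALUES ('Star Wars', 1977, 'Carrie Fisher'),
                           ('Star Wars', 1977, 'Mark Hamill');
""")

inner = conn.execute(
    "SELECT title, year, starName FROM Movie JOIN StarsIn "
    "ON title = movieTitle AND year = movieYear "
    "ORDER BY title, starName").fetchall()
left = conn.execute(
    "SELECT title, year, starName FROM Movie LEFT OUTER JOIN StarsIn "
    "ON title = movieTitle AND year = movieYear "
    "ORDER BY title, starName").fetchall()
print(inner)
print(left)
```

The inner join drops Mighty Ducks because it has no star on record; the left outerjoin keeps it and pads starName with NULL, which Python shows as None.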

Use result of joins as subqueries in queries
• E.g.
    SELECT title, year, length, inColor, studioName, producerC#, starName
    FROM Movie JOIN StarsIn ON title = movieTitle AND year = movieYear;