
Presentation transcript:

Midterm Review II

Anomalies
Redundancy.
– Information may be repeated unnecessarily in several tuples.
– E.g. length and filmType.
Update anomalies.
– We may change information in one tuple but leave it unchanged in other tuples.
– E.g. we could change the length of Star Wars to 125 in the first tuple and forget to do the same in the second and third tuples.
Deletion anomalies.
– If a set of values becomes empty, we may lose other information as a side effect.
– E.g. if we delete Emilio Estevez, we will lose all the information about Mighty Ducks.
The Movie relation:
title          year  length  filmType  studioName  starName
Star Wars                    color     Fox         Carrie Fisher
Star Wars                    color     Fox         Mark Hamill
Star Wars                    color     Fox         Harrison Ford
Mighty Ducks                 color     Disney      Emilio Estevez
Wayne’s World  1992  95      color     Paramount   Dana Carvey
Wayne’s World  1992  95      color     Paramount   Mike Meyers

Decomposing Relations - Example
Movie1 relation:
title          year  length  filmType  studioName
Star Wars                    color     Fox
Mighty Ducks                 color     Disney
Wayne’s World  1992  95      color     Paramount
Movie2 relation:
title          year  starName
Star Wars            Carrie Fisher
Star Wars            Mark Hamill
Star Wars            Harrison Ford
Mighty Ducks         Emilio Estevez
Wayne’s World  1992  Dana Carvey
Wayne’s World  1992  Mike Meyers
No true redundancy!
The update anomaly disappeared. If we change the length of a movie, it is done only once.
The deletion anomaly disappeared. If we delete all the stars of a movie from Movie2, we still have the other information about that movie in Movie1.

Boyce-Codd Normal Form
The goal of decomposition is to replace a relation by several that do not exhibit anomalies.
There is a simple condition under which the anomalies can be guaranteed not to exist. This condition is called Boyce-Codd Normal Form, or BCNF.
A relation R is in BCNF if:
– Whenever there is a nontrivial dependency A1 A2 … An → B1 B2 … Bm for R, the left-hand side {A1, A2, …, An} is a superkey for R.

Boyce-Codd Normal Form - Example
Relation Movie in the previous figure is not in BCNF.
Consider the FD: title year → length filmType studioName
Unfortunately, the left side of this dependency is not a superkey.
– In particular, we know that title and year do not functionally determine starName.
On the other hand, Movie1 is in BCNF.
– The only key is {title, year}, and
– title year → length filmType studioName is the only nontrivial FD that holds in the relation.

Decomposition into BCNF
The decomposition strategy is:
– Find a nontrivial FD A1 A2 … An → B1 B2 … Bm that violates BCNF, i.e. A1 A2 … An is not a superkey.
– Decompose the relation schema into two overlapping relation schemas: one consists of all the attributes involved in the violating dependency, and the other consists of the left side plus all the attributes not involved in the dependency.
By repeatedly choosing suitable decompositions, we can break any relation schema into a collection of smaller schemas in BCNF.
The data in the original relation is represented faithfully by the data in the relations that result from the decomposition.
– I.e. we can reconstruct the original relation exactly from the decomposed relations.

Boyce-Codd Normal Form - Example
Consider relation schema:
    Movies(title, year, studioName, president, presAddr)
and functional dependencies:
    title year → studioName
    studioName → president
    president → presAddr
The last two FDs violate BCNF. Why?
Compute {title, year}+, {studioName}+, and {president}+ and check whether each closure contains all the attributes of the relation. If it does not, the left side is not a superkey, so that FD violates BCNF and the relation must be decomposed.

Boyce-Codd Normal Form – Example
Let’s decompose starting with: studioName → president
Let’s add to the right-hand side any other attributes in the closure of studioName (optional “rule of thumb”).
1. X = {studioName}; apply studioName → president.
2. X = {studioName, president}; apply president → presAddr.
3. X = {studioName}+ = {studioName, president, presAddr}.
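The closure computation walked through above can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: the names `closure`, `violates_bcnf`, `MOVIES`, and `FDS` are made up here, and each FD is represented as a pair of attribute sets.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its left side is already covered and it adds something.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violates_bcnf(lhs, rhs, attrs, fds):
    """A nontrivial FD violates BCNF when its left side is not a superkey."""
    nontrivial = not rhs <= lhs
    return nontrivial and not attrs <= closure(lhs, fds)


# Schema and FDs of the Movies example.
MOVIES = {"title", "year", "studioName", "president", "presAddr"}
FDS = [
    ({"title", "year"}, {"studioName"}),
    ({"studioName"}, {"president"}),
    ({"president"}, {"presAddr"}),
]

for lhs, rhs in FDS:
    print(sorted(lhs), "->", sorted(rhs),
          "violates BCNF:", violates_bcnf(lhs, rhs, MOVIES, FDS))
```

Only {title, year}+ reaches every attribute, which is why the last two FDs are the violating ones.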

Boyce-Codd Normal Form – Example
From the closure we get: studioName → president presAddr
We decompose the relation schema into the following two schemas:
    Movies1(studioName, president, presAddr)
    Movies2(title, year, studioName)
Movies2 is in BCNF, because we can’t find a “bad” FD holding there.
What about Movies1? The following dependency violates BCNF:
    president → presAddr
Why is it bad to leave the Movies1 table as is? If many studios share the same president, then we would have redundancy, because presAddr would be repeated for each of those studios.

Boyce-Codd Normal Form – Example
We must decompose Movies1, using the FD: president → presAddr
The resulting relation schemas, both in BCNF, are:
    Movies11(president, presAddr)
    Movies12(studioName, president)
In general, we must keep applying the decomposition rule as many times as needed, until all our relations are in BCNF.
So, finally we get Movies11, Movies12, and Movies2.
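The repeated decomposition can be sketched as a small recursive procedure. This is an illustrative sketch under stated assumptions, not the slides' own algorithm: the function names are hypothetical, violations are found by brute-force search over attribute subsets (exponential, fine only for tiny schemas), and the search order differs from the order used in the slides. For the Movies example it ends with the three schemas (president, presAddr), (studioName, president), and (title, year, studioName); in general, different choices of violating FD can lead to different decompositions.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_violation(attrs, fds):
    """Return (X, X+ restricted to attrs) for a BCNF violation, or None."""
    attrs = set(attrs)
    for size in range(1, len(attrs)):
        for combo in combinations(sorted(attrs), size):
            x = set(combo)
            x_plus = closure(x, fds) & attrs
            # X determines something new but not everything: not a superkey.
            if x_plus != x and x_plus != attrs:
                return x, x_plus
    return None


def bcnf_decompose(attrs, fds):
    """Split attrs on violating FDs until every schema is in BCNF."""
    violation = find_violation(attrs, fds)
    if violation is None:
        return [set(attrs)]
    x, x_plus = violation
    first = x_plus                      # attributes involved in the violating FD
    second = x | (set(attrs) - x_plus)  # left side plus everything else
    return bcnf_decompose(first, fds) + bcnf_decompose(second, fds)


FDS = [
    ({"title", "year"}, {"studioName"}),
    ({"studioName"}, {"president"}),
    ({"president"}, {"presAddr"}),
]
MOVIES = {"title", "year", "studioName", "president", "presAddr"}

schemas = bcnf_decompose(MOVIES, FDS)
for schema in schemas:
    print(sorted(schema))
```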

Finding FDs for the decomposed relations
When we decompose a relation, we need to check that the resulting schemas are in BCNF.
We can’t tell whether a relation is in BCNF unless we can determine the FDs that hold for that relation.

Finding FDs for the decomposed relations
Suppose S is one of the resulting relations in a decomposition of R. To find the FDs that hold in S:
– Consider each subset X of the attributes of S.
– Compute X+ using the FDs on R.
– At the end, throw out the attributes of R that aren’t in S.
Then, for each attribute B such that
– B is an attribute of S, and
– B is in X+,
the functional dependency X → B holds in S.
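The procedure above translates almost directly into code. The following is a sketch with hypothetical names (`project_fds` is not from the slides); it enumerates every nonempty subset X of S, so it is exponential in the number of attributes, and it reports only nontrivial FDs with a single attribute on the right.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def project_fds(s_attrs, fds):
    """Nontrivial FDs X -> B that hold in S, derived from the FDs on R."""
    s_attrs = set(s_attrs)
    projected = []
    for size in range(1, len(s_attrs) + 1):
        for combo in combinations(sorted(s_attrs), size):
            x = set(combo)
            # Keep only attributes of S, and drop the trivial ones already in X.
            for b in sorted((closure(x, fds) & s_attrs) - x):
                projected.append((frozenset(x), b))
    return projected


FDS_ON_R = [
    ({"title", "year"}, {"studioName"}),
    ({"studioName"}, {"president"}),
    ({"president"}, {"presAddr"}),
]

movies1_fds = project_fds({"studioName", "president", "presAddr"}, FDS_ON_R)
movies2_fds = project_fds({"title", "year", "studioName"}, FDS_ON_R)
for lhs, b in movies1_fds:
    print(sorted(lhs), "->", b)
```

For Movies2 the only FD found is title year → studioName, whose left side is a key; for Movies1 the list includes president → presAddr, whose left side is not.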

Relational Algebra Operations
Operations of relational algebra fall into four broad classes:
1. The usual set operations: union, intersection, and difference.
2. Operations that remove parts of a relation:
   – selection eliminates some rows (tuples);
   – projection eliminates some columns.
3. Operations that combine the tuples of two relations:
   – Cartesian product pairs the tuples of two relations in all possible ways;
   – join selectively pairs tuples from two relations.
4. An operation called “renaming.”

Conditions for Set Operations on Relations
1. R and S must have schemas with identical sets of attributes.
2. Before applying the operations, the columns of R and S must be ordered so that the order of attributes is the same for both relations.

Projection
Produces from a relation R a new relation that has only some of R’s columns.
π A1, A2, …, An (R) is a relation that has only the columns for attributes A1, A2, …, An of R.
Example: Compute the expression π title, year, length (Movies) on the table:
title          year  length  filmType  studioName  producerC#
Star Wars                    color     Fox
Mighty Ducks                 color     Disney      67890
Wayne’s World  1992  95      color     Paramount   99999

Example (Continued)
Resulting relation:
title          year  length
Star Wars
Mighty Ducks
Wayne’s World  1992  95
What about π filmType (Movies)?

Selection
Selection, applied to a relation R, produces a new relation with a subset of R’s tuples. The tuples in the result are those that satisfy some condition C.
Denote it with σ C (R). The schema for the resulting relation is the same as R’s schema.
Example: The expression σ length ≥ 100 (Movie) is:
title          year  length  filmType  studioName  producerC#
Star Wars                    color     Fox
Mighty Ducks                 color     Disney      67890

Cartesian Product
The Cartesian product of two relations R and S is the set of pairs that can be formed by choosing the first element of the pair to be any tuple of R and the second any tuple of S. It is denoted R × S.
Example: R has schema (A, B), S has schema (B, C, D), and R × S has schema (A, R.B, S.B, C, D).

Natural Join
Denoted R ⋈ S.
Let A1, A2, …, An be the attributes in both the schema of R and the schema of S. Then a tuple r from R and a tuple s from S are successfully paired if and only if r and s agree on each of the attributes A1, A2, …, An.
Example: The natural join of the relations R and S from the previous example has schema (A, B, C, D).

Combining Operations to Form Queries
“What are the titles and years of movies made by Fox that are at least 100 minutes long?”
One way to compute the answer to this query is:
1. Select those Movie tuples that have length ≥ 100.
2. Select those Movie tuples that have studioName = ‘Fox’.
3. Compute the intersection of the first and second steps.
4. Project the relation from the third step onto attributes title and year.

Another Example
Consider two relations Movie1 and Movie2, with schemas:
    Movie1(title, year, length, filmType, studioName)
    Movie2(title, year, starName)
Suppose we want to know: “Find the stars of the movies that are at least 100 minutes long.”
1. First we join the two relations: Movie1 ⋈ Movie2.
2. Second we select movies with length at least 100 minutes.
3. Then we project onto starName.
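The three steps can be tried out on small in-memory relations. The sketch below is illustrative only: relations are lists of dictionaries, the helper names `natural_join`, `select`, and `project` are made up here, and the year and length values for Star Wars and Mighty Ducks are sample data assumed for the demonstration.

```python
def natural_join(r, s):
    """Pair tuples that agree on every attribute the two relations share."""
    out = []
    for t in r:
        for u in s:
            common = t.keys() & u.keys()
            if all(t[a] == u[a] for a in common):
                out.append({**t, **u})
    return out


def select(r, condition):
    """Keep the tuples that satisfy the condition."""
    return [t for t in r if condition(t)]


def project(r, attrs):
    """Keep only the listed attributes, dropping duplicate tuples (set semantics)."""
    seen, out = set(), []
    for t in r:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


# Sample data; years and lengths other than Wayne's World are assumed values.
movie1 = [
    {"title": "Star Wars", "year": 1977, "length": 124, "filmType": "color", "studioName": "Fox"},
    {"title": "Mighty Ducks", "year": 1991, "length": 104, "filmType": "color", "studioName": "Disney"},
    {"title": "Wayne's World", "year": 1992, "length": 95, "filmType": "color", "studioName": "Paramount"},
]
movie2 = [
    {"title": "Star Wars", "year": 1977, "starName": "Carrie Fisher"},
    {"title": "Star Wars", "year": 1977, "starName": "Mark Hamill"},
    {"title": "Star Wars", "year": 1977, "starName": "Harrison Ford"},
    {"title": "Mighty Ducks", "year": 1991, "starName": "Emilio Estevez"},
    {"title": "Wayne's World", "year": 1992, "starName": "Dana Carvey"},
    {"title": "Wayne's World", "year": 1992, "starName": "Mike Meyers"},
]

joined = natural_join(movie1, movie2)                 # step 1
long_movies = select(joined, lambda t: t["length"] >= 100)  # step 2
stars = project(long_movies, ["starName"])            # step 3
print([t["starName"] for t in stars])
```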

Relational Algebra on Bags
A bag is like a set, but an element may appear more than once.
Example: {1,2,1,3} is a bag. {1,2,3} is also a bag that happens to be a set.
Bags also resemble lists, but order in a bag is unimportant.
– Example: {1,2,1} = {1,1,2} as bags, but [1,2,1] != [1,1,2] as lists.

Operations on Bags
Selection applies to each tuple, so its effect on bags is like its effect on sets.
Projection also applies to each tuple, but as a bag operator, it does not eliminate duplicates.
Products and joins are done on each pair of tuples, so duplicates in bags have no effect on how we operate.

Bag Union, Intersection, Difference
An element appears in the union of two bags the sum of the number of times it appears in each bag.
Example: {1,2,1} ∪ {1,1,2,3,1} = {1,1,1,1,1,2,2,3}
An element appears in the intersection of two bags the minimum of the number of times it appears in either.
Example: {1,2,1} ∩ {1,2,3} = {1,2}
An element appears in the difference A – B of bags as many times as it appears in A, minus the number of times it appears in B.
– But never less than 0 times.
Example: {1,2,1} – {1,2,3} = {1}
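These three rules happen to match the `+`, `&`, and `-` operators of Python's `collections.Counter`, which makes the slide's examples easy to check. A small sketch (the variable names are illustrative):

```python
from collections import Counter

# A Counter maps each element to the number of times it occurs in the bag.
union = Counter([1, 2, 1]) + Counter([1, 1, 2, 3, 1])   # counts are added
intersection = Counter([1, 2, 1]) & Counter([1, 2, 3])  # minimum of the counts
difference = Counter([1, 2, 1]) - Counter([1, 2, 3])    # subtract, never below 0

print(sorted(union.elements()))         # [1, 1, 1, 1, 1, 2, 2, 3]
print(sorted(intersection.elements()))  # [1, 2]
print(sorted(difference.elements()))    # [1]
```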

The Extended Algebra
1. δ = eliminate duplicates from bags.
2. τ = sort tuples.
3. Extended projection: arithmetic, duplication of columns.
4. γ = grouping and aggregation.
5. OUTERJOIN: avoids “dangling tuples” = tuples that do not join with anything.

Example: Outerjoin
R has schema (A, B) and S has schema (B, C).
(1,2) joins with (2,3), but the other two tuples are dangling.
R OUTERJOIN S =
A     B     C
1     2     3
            NULL
NULL  6     7
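A full outerjoin can be sketched in Python by padding dangling tuples with `None` in place of NULL. The function name `full_outer_join` is hypothetical, and the second tuple of R, (4,5), is an assumed value chosen for the demonstration; the tuples (1,2), (2,3), and (6,7) come from the example.

```python
def full_outer_join(r, s, r_attrs, s_attrs):
    """Natural join of r and s that keeps dangling tuples, padded with None."""
    common = [a for a in r_attrs if a in s_attrs]
    all_attrs = r_attrs + [a for a in s_attrs if a not in r_attrs]
    out = []
    matched_s = set()
    for t in r:
        matched = False
        for i, u in enumerate(s):
            if all(t[a] == u[a] for a in common):
                out.append({**t, **u})
                matched = True
                matched_s.add(i)
        if not matched:
            # Dangling tuple of r: missing attributes become None (NULL).
            out.append({a: t.get(a) for a in all_attrs})
    for i, u in enumerate(s):
        if i not in matched_s:
            # Dangling tuple of s.
            out.append({a: u.get(a) for a in all_attrs})
    return out


R = [{"A": 1, "B": 2}, {"A": 4, "B": 5}]  # (4, 5) is an assumed dangling tuple
S = [{"B": 2, "C": 3}, {"B": 6, "C": 7}]

result = full_outer_join(R, S, ["A", "B"], ["B", "C"])
for row in result:
    print(row)
```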

Aggregation Operators
They apply to entire columns of a table and produce a single result.
The most important examples:
– SUM
– AVG
– COUNT
– MIN
– MAX

Example: Aggregation
R has schema (A, B).
SUM(A) = 7
COUNT(A) = 3
MAX(B) = 4
MIN(B) = 2
AVG(B) = 3

Grouping Operator
R1 := γ L (R2). L is a list of elements that are either:
1. Grouping attributes.
2. AGG(A), where AGG is one of the aggregation operators and A is an attribute.
Semantics:
– Group R2 according to all the grouping attributes on list L. That is, form one group for each distinct list of values for those attributes in R2.
– Within each group, compute AGG(A) for each aggregation on list L.
– The result has the grouping attributes and the aggregations as attributes, with one tuple for each list of values for the grouping attributes and their group’s aggregations.

Example: Grouping/Aggregation
R has schema (A, B, C).
γ A, B, AVG(C) (R) = ??
First, group R by A and B.
Then, average C within groups. The result has schema (A, B, AVG(C)).
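The two steps, group and then aggregate within each group, can be sketched as follows. The helper `gamma` and its argument layout are illustrative choices, and the three tuples of R are assumed sample data, since any small relation over (A, B, C) shows the idea.

```python
def gamma(rows, group_attrs, aggregations):
    """Group rows by group_attrs, then compute each (name, function, attribute)."""
    groups = {}
    for t in rows:
        key = tuple(t[a] for a in group_attrs)
        groups.setdefault(key, []).append(t)  # step 1: form the groups
    out = []
    for key, members in groups.items():
        result = dict(zip(group_attrs, key))
        for name, func, attr in aggregations:
            # step 2: aggregate the attribute's values within the group
            result[name] = func([m[attr] for m in members])
        out.append(result)
    return out


def avg(values):
    return sum(values) / len(values)


# Assumed sample relation R(A, B, C).
R = [
    {"A": 1, "B": 2, "C": 3},
    {"A": 4, "B": 5, "C": 6},
    {"A": 1, "B": 2, "C": 5},
]

grouped = gamma(R, ["A", "B"], [("AVG(C)", avg, "C")])
for row in grouped:
    print(row)
```

With this data the two tuples with A = 1 and B = 2 fall into one group, so the result has two tuples rather than three.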

Example: Grouping/Aggregation
StarsIn(title, year, starName)
We want, for each star who has appeared in at least three movies, the earliest year in which he or she appeared.
First we group, using starName as a grouping attribute.
Then, we have to compute MIN(year) for each group.
However, we also need to compute the COUNT(title) aggregate for each group, in order to filter out those stars with fewer than three movies.
    σ ctTitle ≥ 3 [γ starName, MIN(year) → minYear, COUNT(title) → ctTitle (StarsIn)]

Aggregations in SQL
SUM, AVG, COUNT, MIN, and MAX can be applied to a column in a SELECT clause to produce that aggregation on the column.
Find the average length of movies from Disney.
    SELECT AVG(length)
    FROM Movie
    WHERE studioName = 'Disney';

Eliminating Duplicates in an Aggregation
DISTINCT inside an aggregation causes duplicates to be eliminated before the aggregation.
Example: Find the number of different producers for Disney movies:
    SELECT COUNT(DISTINCT producerc)
    FROM Movie
    WHERE studioname = 'Disney';
This is not the same as:
    SELECT DISTINCT COUNT(producerc)
    FROM Movie
    WHERE studioname = 'Disney';
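The difference between the two queries shows up as soon as one producer has more than one Disney movie. The sketch below runs both queries against an in-memory SQLite database through Python's `sqlite3` module; the cut-down table layout and the rows are made-up sample data, not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movie (title TEXT, studioname TEXT, producerc INTEGER)")
conn.executemany(
    "INSERT INTO Movie VALUES (?, ?, ?)",
    [
        ("Film A", "Disney", 1),  # producer 1 made two Disney movies
        ("Film B", "Disney", 1),
        ("Film C", "Disney", 2),
        ("Film D", "Fox", 3),
    ],
)

# Duplicates are removed before counting: the number of different producers.
distinct_producers = conn.execute(
    "SELECT COUNT(DISTINCT producerc) FROM Movie WHERE studioname = 'Disney'"
).fetchone()[0]

# The count is computed first; DISTINCT then acts on the single result row.
count_then_distinct = conn.execute(
    "SELECT DISTINCT COUNT(producerc) FROM Movie WHERE studioname = 'Disney'"
).fetchone()[0]

print(distinct_producers, count_then_distinct)  # 2 3
```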

NULL’s Ignored in Aggregation
NULL never contributes to a sum, average, or count, and can never be the minimum or maximum of a column.
    SELECT SUM(networth)
    FROM movieexec;

Example: Effect of NULL’s
The number of movies from Disney:
    SELECT count(*)
    FROM Movie
    WHERE studioName = 'Disney';
The number of movies from Disney with a known length:
    SELECT count(length)
    FROM Movie
    WHERE studioName = 'Disney';
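A quick way to see the effect is to run both counts over a table in which one Disney movie has no length recorded. This sketch uses Python's `sqlite3` module with an in-memory database; the rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movie (title TEXT, studioName TEXT, length INTEGER)")
conn.executemany(
    "INSERT INTO Movie VALUES (?, ?, ?)",
    [
        ("Film A", "Disney", 104),
        ("Film B", "Disney", None),  # length unknown, stored as NULL
        ("Film C", "Disney", 90),
        ("Film D", "Fox", 124),
    ],
)

all_movies = conn.execute(
    "SELECT count(*) FROM Movie WHERE studioName = 'Disney'"
).fetchone()[0]
known_length = conn.execute(
    "SELECT count(length) FROM Movie WHERE studioName = 'Disney'"
).fetchone()[0]
total, average = conn.execute(
    "SELECT SUM(length), AVG(length) FROM Movie WHERE studioName = 'Disney'"
).fetchone()

print(all_movies, known_length)  # 3 2
print(total, average)            # 194 97.0
```

The average is taken over the two known lengths only, not over all three Disney rows.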

Grouping
We may follow a SELECT-FROM-WHERE expression by GROUP BY and a list of attributes.
The relation that results from the SELECT-FROM-WHERE is grouped according to the values of all those attributes, and any aggregation is applied only within each group.
From the Movie relation, find the average length for each studio:
    SELECT studioName, AVG(length)
    FROM Movie
    GROUP BY studioName;

Example: Grouping
Find, for each producer, the total length of the films produced.
    SELECT name, SUM(length)
    FROM Movie, MovieExec
    WHERE producerc = cert
    GROUP BY name;
The tuples that satisfy the WHERE condition are computed first; then they are grouped by name.

Restriction on SELECT Lists With Aggregation
If any aggregation is used, then each element of the SELECT list must be either:
1. Aggregated, or
2. An attribute on the GROUP BY list.

Illegal Query Example
We might think we could find the shortest movie of Disney as:
    SELECT title, MIN(length)
    FROM Movie
    WHERE studioName = 'Disney';
But this query is illegal in SQL. Why? Because title is neither aggregated nor on the GROUP BY list.
We should do instead:
    SELECT title, length
    FROM Movie
    WHERE studioName = 'Disney'
      AND length = (SELECT MIN(length)
                    FROM Movie
                    WHERE studioName = 'Disney');
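The subquery form can be checked with Python's `sqlite3` module. The rows below are made-up sample data with a tie for the shortest length, which shows that the query returns every shortest movie. One caveat: SQLite is known to accept non-aggregated columns next to an aggregate as an extension, so running the first query there will not reproduce the error that standard SQL requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movie (title TEXT, studioName TEXT, length INTEGER)")
conn.executemany(
    "INSERT INTO Movie VALUES (?, ?, ?)",
    [
        ("Film A", "Disney", 104),
        ("Film B", "Disney", 90),
        ("Film C", "Disney", 90),  # ties with Film B for the shortest length
        ("Film D", "Fox", 80),     # shorter, but not a Disney movie
    ],
)

rows = conn.execute(
    """
    SELECT title, length
    FROM Movie
    WHERE studioName = 'Disney'
      AND length = (SELECT MIN(length)
                    FROM Movie
                    WHERE studioName = 'Disney')
    """
).fetchall()

# Sort in Python because the query itself does not fix an output order.
shortest = sorted(rows)
print(shortest)  # [('Film B', 90), ('Film C', 90)]
```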

HAVING Clauses
HAVING may follow a GROUP BY clause. If so, the condition applies to each group, and groups not satisfying the condition are eliminated.
These conditions may refer to attributes that make sense within a group; i.e., they are either:
1. Grouping attributes, or
2. Aggregated attributes.

Example: HAVING
Suppose that we didn’t wish to include all the producers in our table of aggregated movie lengths. Suppose, for instance, we want only those producers who have at least one movie before 1973.
    SELECT name, SUM(length)
    FROM MovieExec, Movie
    WHERE producerc = cert
    GROUP BY name
    HAVING MIN(year) < 1973;
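The HAVING query can be run end to end with Python's `sqlite3` module. The table layouts are cut down to the columns the query uses, and the producer names, certificate numbers, and movies are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MovieExec (name TEXT, cert INTEGER)")
conn.execute("CREATE TABLE Movie (title TEXT, year INTEGER, length INTEGER, producerc INTEGER)")
conn.executemany(
    "INSERT INTO MovieExec VALUES (?, ?)",
    [("Producer A", 1), ("Producer B", 2)],
)
conn.executemany(
    "INSERT INTO Movie VALUES (?, ?, ?, ?)",
    [
        ("Film A", 1970, 100, 1),  # Producer A has a movie before 1973
        ("Film B", 1980, 120, 1),
        ("Film C", 1990, 90, 2),   # Producer B does not
    ],
)

totals = conn.execute(
    """
    SELECT name, SUM(length)
    FROM MovieExec, Movie
    WHERE producerc = cert
    GROUP BY name
    HAVING MIN(year) < 1973
    """
).fetchall()

print(totals)  # [('Producer A', 220)]
```

The sum for Producer A covers both movies, including the one from 1980: HAVING filters whole groups, not individual rows.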