ORCL1 Oracle 8i: SQL & PL/SQL, Session #2. Matthew P. Johnson, CISDD, CUNY, Fall 2004.



Agenda
Last time: E/R models, some design issues
This time:
More design: "carving at the joints"
- Redundancy
- Whether an element should be an attribute or an entity set
- Replacing a relationship with entity sets
Constraints
- Identifying and specifying key attributes of an entity set
- Recognizing other types of single-valued constraints
- Representing referential integrity constraints
- Identifying and representing general constraints
Weak entity sets

Design Principles
- Faithfulness
- Avoiding redundancy
- Simplicity
- Choice of relationships
- Picking elements

Avoiding redundancy
Say everything exactly once
- Minimizes database storage requirements
- More important: prevents possible update errors
- Simplest (but not the only) example: modifying data in one place but not the other (more later)
Example: spot the redundancy
[E/R diagram: Movies (Name, Length, StudioName) connected by Own to Studios (Name, Address, Phone)]
Redundancy: Movies "knows" the studio two ways

Spot more redundancy
Different redundancy: studio info is listed for every movie!
[E/R diagram: Movies with attributes Name, Length, StudioName, SAddress, SPhone]
Name | Length | Studio | SAddress | SPhone
Pulp Fiction | … | Miramax | NYC | 212-…
Sylvia | … | Miramax | NYC | 212-…
Jay & Sil. Bob | … | Miramax | NYC | 212-…
…

Don't add relationships that are implied
Suppose each course again has <= 1 TA
[E/R diagram: Students, Courses, TAs; relationships Enrolls, TA-of, Assist]
Q: Is the following good design?
A: If TAs other than the course's TA can help students, then yes. If not, then no: we can already connect Students and TAs by going through Courses, so the extra relationship is redundant.

Correct E/R models may contain loops
Person plays multiple roles:
- employee of company
- buyer of product
[E/R diagram: entity sets Person, Company, Product; relationships buys, makes, employs; attributes shown include ssn, name, address, price, category, stockprice]

More design
[E/R diagram: Students Enrolls Courses; attributes shown: Course-ID, CName, TA-Name, TA-ID, TA-Email]
Q: What's wrong with this design?
A:
- Repeating TA names and IDs is redundant
- If a TA is not TAing any course now, we lose the TA's data!
- TA should get its own entity set

Opposite problem: entity or attribute?
Some E/R designs are improved by removing entities
Can convert entity set E into attributes of F if:
1. R: F → E is many-one
   - one-one counts because it is a special case
2. Attributes of E are independent of each other
   - knowing one attribute value doesn't tell us another attribute value
Then:
- remove E
- add all attributes of E to F

Entity → attribute
[E/R diagrams. Before: Students, Enrolls, Courses (CName, Room), Assists, TA (TA-Name). After: Students, Enrolls, Courses (Course-ID, CName, Room, TA-Name).]

Convert TA entity again?
[E/R diagram: Students, Enrolls, Courses (Course-ID, CName, Room), Assists, TA (TA-Name)]
No! Multiple TAs allowed
- Violates condition (1)
- Redundant course data:
CName | CID | Room | TA-Name
DBMS | 46 | 123 | Howard
DBMS | 46 | 123 | Wesley
…

Convert TA entity again?
[E/R diagram: Students, Enrolls, Courses (Course-ID, CName, Room), Assists, TA (TA-Name, TA-ID, TA-Favorite-Color)]
No! TA has dependent fields
- Violates condition (2)
- How can we tell?
- Redundant TA data:
CName | TA-Name | TA-ID | TA-Color
DBMS | Ralph | 678 | Green
A.Soft. | Ralph | 678 | Green
…

Entity or attributes?
Should student address be an entity or an attribute?
If a student may have multiple addresses, it must be an entity
- campus address, permanent address
- attributes cannot be set-valued
If we need to examine the structure of the address, it must be an entity
- e.g., find all students from NYS but not NYC
If it is an attribute, then it's probably a simple string
- no structure!
NB: this choice is a microcosm of the entire miniworld
- much of the power of a DB comes from the structure imposed on the data

Larger example DB design
Application: library database. Authors have written books about various subjects; different libraries in the system may carry these books.
Entities (with attributes in parentheses):
- Authors (ssn, name, phone, birthdate)
- Books (ISBN, title)
- Subjects (sname, sid)
- Libraries (lname)
Relationships [associated entity sets in square brackets]:
- Wrote-on [Authors, Subjects]
- Cover [Libraries, Subjects]
- On [Books, Subjects]

E/R of DB design
[E/R diagram: Author (ssn, Name, phone, birthdate), wrote-on, Subject (SName), Carries, Library (LName), On, Book (ISBN, Title)]

Poor initial design
The first design is a poor model of this system
Some info is not captured:
- How many copies does a library have of a given book?
- What edition of a book does the library have?
Design problems:
- no direct relationship associating authors and books
- no direct relationship associating libraries and books
Common queries are complex and difficult/expensive:
- What libraries carry books by a given author?
- What books has a given author written?
- Who is the author of a given book?

Larger example DB design 2
Application: library database as before
Entities (with attributes in parentheses):
- Authors (ssn, name, phone, birthdate)
- Books (ISBN, title)
- Subjects (sname, sid)
- Libraries (lname)
Relationships [associated entity sets in square brackets] (attributes in parentheses):
- Wrote [Authors, Books]
- Carries [Libraries, Books] (quantity, edition)
- On [Books, Subjects]

E/R of improved DB design
Rule of thumb: if often queried together, make closely connected
[E/R diagram: Author (ssn, Name, phone, birthdate), wrote, Book (ISBN, Title), Carries (Edition, Quantity), Library (LName), On, Subject (SName)]

Next topic: Constraints
Review: programmer-defined rules stating what should always be true about consistent databases
Restrictions on data:
- Keys (e.g. SSNs uniquely identify people)
- Single-value constraints (e.g. everyone has 1 father)
- Referential integrity (e.g. a person's record refers to a father → the father must exist)
- Domain constraints (e.g. gender in M/F, age in 0..150)
- General constraints (e.g. no more than 10 customers per sales rep)
Can't infer constraints from data
- they may hold "accidentally"
- they are a part of the schema

E/R keys
A key uniquely identifies an entity in an entity set
Attribute or set of attributes
- Two entities cannot agree on all key attributes
- These attributes determine all others
Every entity set should have a key
- possibly including all attributes
Primary key attributes are underlined
More than one possible key:
- candidate keys, primary key
Practical tip: create an intentional key attribute
- e.g. SSN, course-id, employee-id, etc.
- SSN is likely shorter than (name, address)
- prevents quasi-redundancy
[E/R diagram: Person with attributes ssn, name, address]

Single-valued constraints
"At most one" value
- drawn with sharp arrows
E.g. attributes: could be null or one value
Many-one relationships: the "one" part is single-valued
Can think of key attributes as (non-null) single-valued
[E/R diagram: TA, Assists, Course]

Referential integrity
"Exactly one" value
NOT NULL attributes
Relationships:
- A non-null value refers to an entity that exists
- Refer to the entity with a foreign key
- HTML analogy: no broken links
- Programming analogy: no dangling pointers
- Ways of handling deletion:
  - prevent deletion as long as a referrer exists
  - enforce deletion of all referrers
[E/R diagram: Instructor, Taught, Course]
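The two deletion policies above can be tried out with a small sketch. It uses Python's built-in sqlite3 module rather than Oracle, and the table and column names (Instructor, Course, name, title) are assumptions made for illustration, not taken from the slides.

```python
import sqlite3


def build(policy):
    """Create a tiny Instructor/Course database using the given ON DELETE policy."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
    conn.execute("CREATE TABLE Instructor (name TEXT PRIMARY KEY)")
    # `policy` is one of two fixed strings below, so plain concatenation is acceptable in this sketch.
    conn.execute(
        "CREATE TABLE Course ("
        " title TEXT PRIMARY KEY,"
        " instructor TEXT NOT NULL REFERENCES Instructor(name) ON DELETE " + policy + ")"
    )
    conn.execute("INSERT INTO Instructor VALUES ('MPJ')")
    conn.execute("INSERT INTO Course VALUES ('Oracle', 'MPJ')")
    return conn


def delete_instructor(policy):
    """Try to delete the referenced instructor; report (deleted?, courses left)."""
    conn = build(policy)
    try:
        conn.execute("DELETE FROM Instructor WHERE name = 'MPJ'")
        deleted = True
    except sqlite3.IntegrityError:
        deleted = False
    remaining = conn.execute("SELECT COUNT(*) FROM Course").fetchone()[0]
    return deleted, remaining


print(delete_instructor("RESTRICT"))  # deletion prevented while a referrer exists
print(delete_instructor("CASCADE"))   # deletion of all referrers enforced
```

With RESTRICT the course keeps the instructor alive; with CASCADE the referring course row disappears along with the instructor.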

Referential integrity: E/R example
[E/R diagram: Students, Enrolls, Courses, Taught, Instructor]
Insertion must refer to an existing entity
Suppose we need to add
- course: "Oracle"
- instructor: MPJ
Q: Which order?
Q: What if the relationship were exactly-exactly?
- i.e., referential integrity in both directions?
A: Put both inserts in one transaction (later)

Other kinds of constraints
Domain constraints
- E.g. date: must be after 1980
- Enumerated type: grades A through F, no E
- No specific E/R notation: mention with the attribute or relationship
General constraints:
- A class may have no more than 100 students; a student may not have more than 6 courses:
[E/R diagram: Students, Enroll, Courses, with the edges labeled <=6 and <=100]
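As a rough illustration of how such constraints might end up in a schema, the sketch below declares the grade domain as a CHECK constraint and tests the two enrollment limits in application code. It uses Python's sqlite3 module; the table layout and the helper names `accepts` and `can_enroll` are assumptions for this example only.

```python
import sqlite3

MAX_PER_COURSE = 100   # a class may have no more than 100 students
MAX_PER_STUDENT = 6    # a student may not have more than 6 courses

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Enroll (
        student TEXT NOT NULL,
        course  TEXT NOT NULL,
        grade   TEXT CHECK (grade IN ('A', 'B', 'C', 'D', 'F')),  -- no 'E'
        PRIMARY KEY (student, course)
    )
""")


def accepts(grade):
    """Insert a fresh enrollment with this grade; False if the domain check rejects it."""
    try:
        conn.execute("INSERT INTO Enroll VALUES (?, ?, ?)", ("s-" + grade, "c1", grade))
        return True
    except sqlite3.IntegrityError:
        return False


def can_enroll(student, course):
    """General constraint, checked by hand: both counts must stay under their limits."""
    in_course = conn.execute(
        "SELECT COUNT(*) FROM Enroll WHERE course = ?", (course,)).fetchone()[0]
    by_student = conn.execute(
        "SELECT COUNT(*) FROM Enroll WHERE student = ?", (student,)).fetchone()[0]
    return in_course < MAX_PER_COURSE and by_student < MAX_PER_STUDENT
```

Calling `accepts("A")` succeeds while `accepts("E")` is rejected by the CHECK clause. The counting constraints are not column-level domains, which is why this sketch checks them outside the table definition.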

Next topic: Weak entity sets (2.4)
Definition:
- Some or all key attributes belong to another entity set
Why:
- An entity set is part of a hierarchy (not ISA)
- Connecting entity sets
The key consists of
- 0, 1 or more of its own attributes
- Key attributes of entity sets reached through supporting relationships

Conditions of supporting relationships
Supporting relationship R: E → F
- R is many-one (E-F) or one-one
- R is binary
- Referential integrity from E to F, i.e. a rounded arrow
- The attributes supplied to E are the key attributes of F
- F itself may be weak, supported by another entity set G, and so on recursively
[E/R diagram: E and F joined by R; attributes A1, A2]

Requirements for weak entity sets
If there are several supporting relationships from E to F
- Keys of several different entities from F appear as foreign keys of E
Other many-one relationships
- Not necessarily supporting
[E/R diagram: Purchases with relationships From, By, At-store to People and Stores; attributes A1, A2, A3]

Weak entity sets
Example: hierarchy of species and genus
Idea: species name is unique per genus only
[E/R diagram: Species (name), Belongs-to, Genus (name)]

Weak entity sets
The video-store connecting entity set example was a weak entity set
Key: date, MID, SID, CID
[E/R diagram: Rental (date) connected by StoreOf, MovieOf, BuyerOf to Store, Product, Customer; key attributes MID, SID, CID]

E/R design summary
Subject/design choices:
- Should a concept be modeled as an entity set or an attribute?
- Should a concept be modeled as an entity set or a relationship?
- Identifying relationships: binary or multiway?
Constraints in the E/R model:
- Important in determining the best design
- Much data semantics can (and should) be captured
- Normalization improves further (later)

Agenda
Last time: finished E/R models per se
This time:
- Intro to the relational model
- Converting E/R diagrams to relations
- Functional dependencies
  - Keys and superkeys in terms of FDs
  - Finding keys for relations
  - Rules of FDs
- Normalization

Next topic: the Relational Data Model
Database model (E/R, other): diagrams (E/R)
Relational schema: tables; column names are attributes, rows are tuples
Physical storage: complex file organization and index structures

Relations as tables
Table/relation: Product
Attribute names: Name, Price, Category, Manufacturer
Tuples/rows/records/entities:
Name | Price | Category | Manufacturer
gizmo | $19.99 | gadgets | GizmoWorks
Power gizmo | $29.99 | gadgets | GizmoWorks
SingleTouch | $149.99 | photography | Canon
MultiTouch | $203.99 | household | Hitachi

Relational terminology
A relation is composed of tuples
Tuples are composed of attribute values
- Attributes have atomic types
Relation schema: relation name + attribute names + attribute types
Relation instance: set of tuples
- order doesn't matter
Database schema: set of relation schemas
Database instance: a relation instance for every relation in the schema

Relations as sets
Remember: a math relation is a subset of the cross-product of the attribute value sets
- R subset-of S x T
- Product subset-of Name x Price x Cat x Mft
One member of the Product relation:
- (gizmo, $19.99, gadgets, GizmoWorks) in Product
DB relation instance = math relation
Q: If relations are sets, why call them "instances"?
A: R is a member of the powerset P(S x T)
- powerset = set of all subsets
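The subset-of-a-cross-product idea can be made concrete with a few lines of Python. This is only a sketch: the value sets below are cut down to two values each, and the variable names are assumptions for illustration.

```python
from itertools import product

# Small attribute value sets (domains), trimmed for the example.
names = {"gizmo", "SingleTouch"}
prices = {"$19.99", "$149.99"}
categories = {"gadgets", "photography"}
makers = {"GizmoWorks", "Canon"}

# Name x Price x Cat x Mft: every combination, 2 * 2 * 2 * 2 = 16 tuples.
cross_product = set(product(names, prices, categories, makers))

# One relation instance: just one of the many subsets of the cross-product.
product_relation = {
    ("gizmo", "$19.99", "gadgets", "GizmoWorks"),
    ("SingleTouch", "$149.99", "photography", "Canon"),
}

is_relation = product_relation <= cross_product  # subset test
print(len(cross_product), is_relation)
```

Any other subset of `cross_product` would be a different instance over the same schema, which is the sense in which an instance is one member of the powerset.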

More on tuples
Formally, a tuple can also be a mapping from attribute names to (correctly typed) values:
- name → gizmo
- price → $19.99
- category → gadgets
- manufacturer → GizmoWorks
NB: an ordered tuple is equivalent to a mapping
- Both ways are supported in SQL
Sometimes we refer to a tuple by itself (note the order of attributes):
- (gizmo, $19.99, gadgets, GizmoWorks) or
- Product(gizmo, $19.99, gadgets, GizmoWorks)

Updates/modifications
The database maintains a current database state
Modifications of data:
- add a tuple
- delete a tuple
- update an attribute value in a tuple
DB relation instance = math relation
Idea: we saw a particular Product DB instance
- adding or deleting rows gives different DB relation instances
- technically, these are different math relations
- to the DBMS, it is still the same relation/table
Modifications to the data are frequent
Updates to the schema are rare and painful (Why?)

E/R models to relations
Recall the justification:
- design is easier in E/R
- implementation is easier/faster in relations
Parallel to program compilation:
- design is easier in C/Java/whatever
- implementation is easier/faster in machine/byte code
Strategy:
1. apply semi-mechanical conversion rules
2. improve by combining some relations
3. improve by normalization, which involves finding functional dependencies

E/R conversion rules
Entity set → relation
- attributes: attributes of the entity set
- key: key of the entity set
Relationship → relation
- attributes: keys of the entity sets/roles
- key: depends on multiplicity
NB: the mapping of types is not one-one
- We'll see: the mapping of tokens is not one-one either
Special treatment:
- Weak entity sets
- Isa relations and subclasses

Entity Sets
[E/R diagram: entity set Students with attributes ssn, name, address]
Relation: Students
Name | Address | SSN
John | South Carolina | 444-555-6666
Howard | Park Avenue | 111-222-3333

Entity Sets
[E/R diagram: entity set Course with attributes CourseID, CourseName]

Binary many-to-many relationships
Key: keys of both entities
This is why we learned to recognize keys
[E/R diagram: Students (ssn, S_Name, S_addr), Enrolls, Course (CourseID, Course-Name)]
Relation: Enrolls
ssn | CourseID
111-222-3333 | C20.0046
111-222-3333 | C20.0056
444-555-6666 | C30.0046
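A minimal sketch of what "key: keys of both entities" means once the Enrolls relation becomes a table, using Python's sqlite3 module. The column names and the helper `enroll` are assumptions for illustration; the rows are the three shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Enrolls (
        ssn       TEXT NOT NULL,
        course_id TEXT NOT NULL,
        PRIMARY KEY (ssn, course_id)   -- union of the keys of Students and Course
    )
""")
conn.executemany(
    "INSERT INTO Enrolls VALUES (?, ?)",
    [("111-222-3333", "C20.0046"),
     ("111-222-3333", "C20.0056"),
     ("444-555-6666", "C30.0046")],
)


def enroll(ssn, course_id):
    """Add one enrollment; False if the (ssn, course_id) pair already exists."""
    try:
        conn.execute("INSERT INTO Enrolls VALUES (?, ?)", (ssn, course_id))
        return True
    except sqlite3.IntegrityError:
        return False
```

Neither column is a key on its own: one student appears with two courses, and a course may be taken by several students. Only a repeated pair is rejected.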

Many-to-one relationships
Key: key of the "many" entity set
[E/R diagram: Movies (MovieID, Title, Year), owns (CopyrightNo), Studios (StudioID, Name, Address)]
Movies
MovieID | Title | Year
M101 | Mr. Ripley | 1999
M202 | Sylvia | 2003
Studios
StudioID | Name | Address
S35 | Miramax | NYC
S73 | Disney | Orlando
Owns
MovieID | CopyrightNo | StudioID
M101 | CN11111 | S73
M202 | CN22222 | S35

Improving on many-one
Note the rules applied:
- Movies relation: all attributes from the Movies entity set
- Studios relation: all attributes from the Studios entity set
- Owns relation: key attributes from the Movies and Studios entity sets
But: Owns: Movies → Studios is many-one
- for each row in Movies, there is one row (or no row) in Owns
- so just add the Owns data to Movies

Many-to-one: a better design
Movies
MovieID | Title | Year
M101 | Mr. Ripley | 1999
M202 | Sylvia | 2003
Owns
MovieID | CopyrightNo | StudioID
M101 | CN11111 | S73
M202 | CN22222 | S35
Movies'
MovieID | Title | Year | StudioID | CopyrightNo
M101 | Talent Mr. Ripley | 1999 | S73 | CN11111
M202 | Sylvia | 2003 | S35 | CN22222
Q: What if a movie's Owns row were missing?

Many-to-many relationships again
This won't work for many-many relationships
[E/R diagram: Movies, acts, Stars]
Movies
MovieID | Title | Year
M101 | Mr. Ripley | 1999
M202 | Sylvia | 2003
M303 | P.D. Love | 2002
Stars
StarID | Name | Address
T400 | Gwyneth P. | Bev. Hills
T401 | P.S. Hoffman | Hollywood
T402 | Jude Law | Palm Springs
Acts
MovieID | StarID
M101 | T400
M202 | T400
M101 | T401
M101 | T402
M303 | T401

Many-to-many relationships again
And here's why:
MovieID | Title | Year | StarID
M101 | Talented Mr. Ripley | 1999 | T400
M101 | Talented Mr. Ripley | 1999 | T401
M101 | Talented Mr. Ripley | 1999 | T402
M202 | Sylvia | 2003 | T400
M303 | Punch Drunk Love | 2003 | T401

Multiway relationships & roles
Different roles are treated as different entity sets
Key: keys of the "many" entities
[E/R diagram: Students (SSN, Name), Courses (CourseID, Name), TAs (TA_SSN, Name); relationship enrolls, with TAs in the roles tutors and graders]

Multiway relationships & roles
Enrolls(S_SSN, Course_ID, Tutor_SSN, Grader_SSN)
Students
SSN | Name
111-11-1111 | George
222-22-2222 | Dick
TAs
TA_SSN | Name
333-33-3333 | Wesley
444-44-4444 | Howard
555-55-5555 | John
Courses
CourseID | Name
C20.0046 | Databases
C20.0056 | Software
Enrolls
S_SSN | CourseID | Tutor_SSN | Grader_SSN
111-11-1111 | C20.0046 | 333-33-3333 | 444-44-4444
222-22-2222 | C20.0046 | 444-44-4444 | 555-55-5555

Converting weak entity sets: differences
[E/R diagram: Crew (Crew_ID), Unit-of, Studio (StudioName, address)]
Attributes of the Crew relation are:
- attributes of Crew
- key attributes of supporting entity sets
Crew
Crew_ID | StudioName
C1 | Miramax
C1 | Disney
C2 | Miramax
Supporting relationships are omitted (why?)

Weak entity sets: relationships
[E/R diagram: Crew (Crew_ID), Unit-of, Studio (StudioName, address); Crew, Subscribes, Insurance (IName, Address)]
Insurance
IName | Address
Aetna | 1250 6th Av. NY
BlueCross | 1260 7th Av. NY

Weak entity sets: relationships
Non-supporting relationships for weak entity sets are converted
- keys include the entire weak entity set key
Subscribes
Crew_ID | StudioName | Insurer
C21 | Universal | Aetna
C22 | Disney | BlueCross
C21 | Universal | Aetna

Conversion example
Video store rental example, plus some attributes
[E/R diagram: multiway relationship Rental (date) among VideoStore, Customer, Movie; attributes shown: MID, MName, year, Cname, address]
Q: Conversion to relations?

Conversion example, continued
Resulting binary-relationship version
[E/R diagram: Rental (date) connected by StoreOf, MovieOf, BuyerOf to Store, Movie, Customer; attributes shown: MID, MName, year, Cname, address]
Q: Conversion to relations?

Converting inheritance hierarchies
No best way
Several non-ideal methods:
- E/R-style: each entity set → relation
- OO-style: each possible "object" → relation
- nulls-style: each rooted hierarchy → relation; non-applicable fields are filled in with nulls
Pros & cons
- for each method, there exist situations favoring it

Converting inheritance hierarchies
[E/R diagram: Movies (title, year, length) with isa subclasses Cartoons and Murder-Mysteries (Weapon); relationship Voices, stars; other labels shown: Lion King, Component]

Inheritance: E/R-style conversion
Each entity set → relation
Root entity set: Movies(title, year, length)
Title | Year | length
Star Wars | 1980 | 120
Scream | 1988 | 110
Roger Rabbit | 1990 | 115
Lion King | 1993 | 130
Subclass: MurderMysteries(title, year, murderWeapon)
Title | Year | murderWeapon
Scream | 1988 | Knife
R. Rabbit | 1990 | Knife
Subclass: Cartoons(title, year)
Title | Year
Roger Rabbit | 1990
Lion King | 1993

E/R-style & quasi-redundancy
Name and year of Roger Rabbit were listed in three different rows (in different tables)
Suppose the title changes ("Roger" → "Roget")
- must change it in all three places
Q: Is this redundancy?
A: No!
- name and year are independent
- multiple movies may have the same name
Real redundancy requires dependency
- two rows agree on SSN → they must agree on the rest
- conflicting hair colors in these rows is an error
- two rows agree on movie title → they may still disagree
- conflicting years may be correct, or may not be
Better: introduce a "movie-id" key attribute

Subclasses: object-oriented approach
Every possible "subtree" (what's this?):
1. Movies
2. Movies + Cartoons
3. Movies + Murder-Mysteries
4. Movies + Cartoons + Murder-Mysteries
Relation 1:
Title | Year | length
Star Wars | 1980 | 120
Relation 2:
Title | Year | length
Lion King | 1990 | 115
Relation 3:
Title | Year | length | Murder-Weapon
Scream | 1988 | 110 | Knife
Relation 4:
Title | Year | length | Murder-Weapon
Roger Rabbit | 1988 | 110 | Knife

Subclasses: nulls approach
One relation for the entire hierarchy
Any non-applicable fields are NULL
Title | Year | length | Murder-Weapon
Star Wars | 1980 | 120 | NULL
Lion King | 1993 | 130 | NULL
Scream | 1988 | 110 | Knife
Roger Rabbit | 1990 | 115 | Knife
Q: How do we know if a movie is a murder mystery?
Q: How do we know if a movie is a cartoon?

Agenda
Last time: relational model
Homework 1 is up, due next Tuesday
This time:
1. Functional dependencies
   - Keys and superkeys in terms of FDs
   - Finding keys for relations
2. Rules for combining FDs
Next time: anomalies & normalization

Next topic: Functional dependencies (3.4)
FDs are constraints
- part of the schema
- can't tell from particular relation instances
- an FD may hold for some instances "accidentally"
Finding all FDs is part of DB design
- Used in normalization

Functional dependencies
Definition: if two tuples agree on the attributes A_1, A_2, …, A_n, then they must also agree on the attributes B_1, B_2, …, B_m
Notation: A_1, A_2, …, A_n → B_1, B_2, …, B_m
Read: the A_i functionally determine the B_j

Typical examples of FDs
Product
- name → price, manufacturer
Person
- ssn → name, age
- father's/husband's-name → last-name
- zipcode → state
- phone → state (notwithstanding inter-state area codes)
Company
- name → stockprice, president
- symbol → name
- name → symbol

Functional dependencies
To check A → B, erase all other columns; then for each pair of rows t_1, t_2:
- if t_1, t_2 agree on A_1, …, A_m, then t_1, t_2 must agree on B_1, …, B_m
i.e., check whether the remaining relation is many-one
- no "divergences"
- i.e., whether A → B is a well-defined function
- thus, functional dependency

FDs example
Product(name, category, color, department, price)
Consider these FDs:
- name → color
- category → department
- color, category → price
What do they say?

FDs example
FDs are constraints:
- on some instances they hold
- on others they don't
FDs: name → color; category → department; color, category → price
name | category | color | department | price
Gizmo | Gadget | Green | Toys | 49
Tweaker | Gadget | Green | Toys | 99
Does this instance satisfy all the FDs?

FDs example
FDs: name → color; category → department; color, category → price
name | category | color | department | price
Gizmo | Gadget | Green | Toys | 49
Tweaker | Gadget | Black | Toys | 99
Gizmo | Stationery | Green | Office-supp. | 59
What about this one?

Recognizing FDs
EmpID | Name | Phone | Position
E0045 | Smith | 1234 | Clerk
E1847 | John | 9876 | Salesrep
E1111 | Smith | 9876 | Salesrep
E9999 | Mary | 1234 | Lawyer
Q: Is Position → Phone an FD here?
A: It holds for this instance, but presumably not in general
Other FDs?
- EmpID → Name, Phone, Position
- but Phone does not determine Position
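The "erase the other columns and look for divergences" check can be written as a short function. This is a sketch in Python; the function name `satisfies_fd` and the list-of-dicts representation are assumptions for illustration. It only tests one instance, so a True result says nothing about whether the FD belongs in the schema.

```python
def satisfies_fd(rows, lhs, rhs):
    """True if every two rows that agree on `lhs` also agree on `rhs`."""
    seen = {}  # maps each lhs value combination to the rhs values first seen with it
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[b] for b in rhs)
        if seen.setdefault(left, right) != right:
            return False  # a "divergence": same left side, different right side
    return True


employees = [
    {"EmpID": "E0045", "Name": "Smith", "Phone": "1234", "Position": "Clerk"},
    {"EmpID": "E1847", "Name": "John", "Phone": "9876", "Position": "Salesrep"},
    {"EmpID": "E1111", "Name": "Smith", "Phone": "9876", "Position": "Salesrep"},
    {"EmpID": "E9999", "Name": "Mary", "Phone": "1234", "Position": "Lawyer"},
]

print(satisfies_fd(employees, ["Position"], ["Phone"]))   # holds on this instance
print(satisfies_fd(employees, ["Phone"], ["Position"]))   # 1234 maps to Clerk and Lawyer
print(satisfies_fd(employees, ["EmpID"], ["Name", "Phone", "Position"]))
```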

Keys of relations
{A_1, A_2, …, A_n} is a key for relation R if
1. A_1, A_2, …, A_n functionally determine all other attributes
   - Usual notation: A_1 A_2 … A_n → B_1 B_2 … B_k
   - relations are sets → distinct rows can't agree on all A_i
2. A_1, A_2, …, A_n is minimal
   - No proper subset of A_1, A_2, …, A_n functionally determines all other attributes of R
Primary key: chosen if there are several possible keys

Keys example
Relation: Student(Name, Address, DoB, SSN, Email, Credits)
Which of the following are keys, and why?
- SSN
- Name, Address (on reasonable assumptions)
- Name, SSN
- Email, SSN
- Email
NB: minimal != smallest

Superkeys
A superkey is a set of attributes that contains a key
It satisfies the first condition:
- functionally determines every other attribute in the relation
It might not satisfy the second condition, minimality:
- it may be possible to peel away some attributes from the superkey
- keys are superkeys: a key is a special case of a superkey
- the set of superkeys is a superset of the set of keys
Example: {name, ssn} is a superkey but not a key

Discovering keys for relations
Relation from an entity set
- Key of relation = (minimized) key of entity set
Relation from a binary relationship
- Many-many: union of keys of both entity sets
- Many(M)-one(O): only key of M (Why?)
- One-one: key of either entity set (but not both!)

Example: entity sets
Key of entity set = (minimized) key of relation
[E/R diagram: Student with attributes Name, Address, DoB, SSN, Email, Credits]
Student(Name, Address, DoB, SSN, Email, Credits)

Example: many-many
Many-many key: union of both entity set keys
[E/R diagram: Student (SSN, Credits), Enrolls, Course (CourseID, Name)]
Enrolls(SSN, CourseID)

Example: many-one
Key of the "many" entity set but not of the "one" entity set
- keys from both would be non-minimal
[E/R diagram: Course (CourseID, Name), MeetsIn, Room (RoomNo, Capacity)]
MeetsIn(CourseID, RoomNo)

Example: one-one
Keys of both entity sets are included in the relation
Key is the key of either entity set (but not both!)
[E/R diagram: Husbands (SSN, Name), Married, Wives (SSN, Name)]
Married(HSSN, WSSN) with key HSSN, or Married(HSSN, WSSN) with key WSSN

Discovering keys: multiway
Multiway relationships:
- Multiple ways; may not be obvious
- If R: F, G, H → E is many-one, then E's key is included but is not part of the key
Recall that relationship attributes are implicitly many-one
[E/R diagram: Course (CourseID, Name), Enrolls, Student (SSN, Name), Section (RoomNo, Capacity)]
Enrolls(CourseID, SSN, RoomNo)

Combining FDs (3.5)
If some FDs are satisfied, then others are satisfied too
If all these FDs are true:
- name → color
- category → department
- color, category → price
Then this FD also holds:
- name, category → price
Why?

Splitting & combining FDs
Splitting rule: from
A_1, A_2, …, A_n → B_1, B_2, …, B_m
we get
A_1, A_2, …, A_n → B_1
A_1, A_2, …, A_n → B_2
…
A_1, A_2, …, A_n → B_m
Combining rule: from the m FDs above we get A_1, A_2, …, A_n → B_1, B_2, …, B_m
Q: Does it apply to the left side?
Note: it doesn't apply to the left side

Reflexive rule: trivial FDs
An FD A_1 A_2 … A_n → B_1 B_2 … B_k may be
- Trivial: the Bs are a subset of the As
- Nontrivial: >= 1 of the Bs is not among the As
- Completely nontrivial: none of the Bs is among the As
Trivial elimination rule:
- Eliminate common attributes from the Bs to get an equivalent completely nontrivial FD
A_1, A_2, …, A_n → A_i with i in 1..n is a trivial FD

Transitive rule
If
A_1, A_2, …, A_n → B_1, B_2, …, B_m
and
B_1, B_2, …, B_m → C_1, C_2, …, C_p
then
A_1, A_2, …, A_n → C_1, C_2, …, C_p
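One common way to apply these rules mechanically is to compute the closure of an attribute set: keep adding the right side of any FD whose left side is already covered. The sketch below does this in Python for the Product FDs given earlier; the function names `closure` and `implies` and the set-pair encoding of FDs are assumptions for illustration.

```python
def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already covered, the right side follows too.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs follows from the given FDs."""
    return set(rhs) <= closure(lhs, fds)


product_fds = [
    ({"name"}, {"color"}),
    ({"category"}, {"department"}),
    ({"color", "category"}, {"price"}),
]

print(sorted(closure({"name", "category"}, product_fds)))
print(implies(product_fds, {"name", "category"}, {"price"}))
print(implies(product_fds, {"name"}, {"price"}))
```

Starting from {name, category}, the first FD adds color, and then color together with category adds price, which is why name, category → price holds.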

Example
R(A, B, C)
Each of the three attributes determines the other two
Q: What are the FDs?
- Closure of singleton sets
- Closure of doubletons
Q: What are the keys?
Q: What are the minimal bases?

Examples of keys
Product(name, price, category, color)
- name, category → price
- category → color
Keys are: {name, category}
Enrollment(student, address, course, room, time)
- student → address
- room, time → course
- student, course → room, time
Keys are: [in class]
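The Product answer above can be checked by brute force: try attribute sets from smallest to largest and keep those whose closure is the whole schema and that contain no smaller key. This is a sketch in Python, not a method given in the slides; `closure` and `candidate_keys` are names chosen for illustration, and the search is exponential in the number of attributes.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes determined by `attrs`; `fds` is a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """All minimal attribute sets that determine every attribute."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key: a superkey, but not minimal
            if closure(candidate, fds) == set(attributes):
                keys.append(candidate)
    return keys


product_attrs = {"name", "price", "category", "color"}
product_fds = [
    ({"name", "category"}, {"price"}),
    ({"category"}, {"color"}),
]

print(candidate_keys(product_attrs, product_fds))
```

For Product this finds the single key {name, category}, matching the slide.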

Agenda
Last time: FDs
Project part 2 up soon
This time:
1. Anomalies
2. Normalization
Next time: Relational Algebra

Review examples: finding FDs
Product(name, price, category, color)
- name, category → price
- category → color
Keys are: {name, category}
Enrollment(student, address, course, room, time)
- student → address
- room, time → course
- student, course → room, time
Keys are: [in class]

Another review example
Relation R(A, B, C)
Each of the three attributes determines the other two
Q: What are the FDs?
- Closure of singleton sets
- Closure of doubletons
Q: What are the keys?
Q: What are the minimal bases?

Next topic: Anomalies (3.6)
- Identify anomalies in an existing schema
- How to decompose a relation
- Boyce-Codd Normal Form (BCNF)
- Recovering information from a decomposition
- Third Normal Form

Types of anomalies
Redundancy
- Repeat info unnecessarily in several tuples
Update anomalies:
- Change info in one tuple but not in another
Deletion anomalies:
- Delete some values and lose other values too
Insert anomalies:
- Inserting a row means NULL-ing some fields

Example of anomalies
Name | SSN | Mailing-address | Phone
Michael | 123 | NY | 212-111-1111
Michael | 123 | NY | 917-111-1111
Hilary | 456 | DC | 202-222-2222
Hilary | 456 | DC | 914-222-2222
Bill | 789 | Chappaqua | 914-222-2222
Bill | 789 | Chappaqua | 212-333-3333
SSN → Name, Mailing-address holds, but SSN does not determine Phone
Redundancy: name, address
Update anomaly: Bill moves
Delete anomaly: Bill doesn't pay bills and loses his phones → we lose Bill!
Underlying cause: SSN-phone is many-many
Effect: partial dependency ssn → name, address

Decomposition by projection
Solution: replace the anomalous R with projections of R onto two subsets of its attributes
Projection: an operation in Relational Algebra
- projection corresponds to the column list of SELECT in SQL
Projecting R onto attributes (A_1, …, A_n) means removing all other attributes
- The result of a projection is another relation
- It yields tuples whose fields are A_1, …, A_n
- Resulting duplicates are ignored
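A small sketch of projection as described above, written in Python with a relation held as a list of dicts. The function name `project` and the representation are assumptions for illustration; the rows are the ones from the anomalies example.

```python
def project(rows, attributes):
    """Keep only `attributes` from each row; using a set drops duplicate tuples."""
    return {tuple(row[a] for a in attributes) for row in rows}


people = [
    {"Name": "Michael", "SSN": 123, "Mailing-address": "NY", "Phone": "212-111-1111"},
    {"Name": "Michael", "SSN": 123, "Mailing-address": "NY", "Phone": "917-111-1111"},
    {"Name": "Hilary", "SSN": 456, "Mailing-address": "DC", "Phone": "202-222-2222"},
    {"Name": "Hilary", "SSN": 456, "Mailing-address": "DC", "Phone": "914-222-2222"},
    {"Name": "Bill", "SSN": 789, "Mailing-address": "Chappaqua", "Phone": "914-222-2222"},
    {"Name": "Bill", "SSN": 789, "Mailing-address": "Chappaqua", "Phone": "212-333-3333"},
]

persons = project(people, ["Name", "SSN", "Mailing-address"])  # one tuple per person
phones = project(people, ["SSN", "Phone"])                     # one tuple per (SSN, phone) pair

print(len(persons), len(phones))
```

Six rows collapse to three in the first projection because the name and address were repeated once per phone.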

Projection for decomposition
R(A_1, …, A_n, B_1, …, B_m, C_1, …, C_p)
R_1(A_1, …, A_n, B_1, …, B_m)
R_2(A_1, …, A_n, C_1, …, C_p)
R_1 = projection of R on A_1, …, A_n, B_1, …, B_m
R_2 = projection of R on A_1, …, A_n, C_1, …, C_p
A_1, …, A_n ∪ B_1, …, B_m ∪ C_1, …, C_p = all attributes, usually disjoint sets
R_1 and R_2 may (or may not) be reassembled to produce the original R

Decomposition example
Name | SSN | Mailing-address | Phone
Michael | 123 | NY | 212-111-1111
Michael | 123 | NY | 917-111-1111
Hilary | 456 | DC | 202-222-2222
Hilary | 456 | DC | 914-222-2222
Bill | 789 | Chappaqua | 914-222-2222
Bill | 789 | Chappaqua | 212-333-3333
Break the relation into two:
Name | SSN | Mailing-address
Michael | 123 | NY
Hilary | 456 | DC
Bill | 789 | Chappaqua
SSN | Phone
123 | 212-111-1111
123 | 917-111-1111
456 | 202-222-2222
456 | 914-222-2222
789 | 914-222-2222
789 | 212-333-3333
The anomalies are gone
- No more redundant data
- Easy for Bill to move
- Okay for Bill to lose all phones

Thus: high-level strategy
Conceptual model: [E/R diagram: Person (ssn, name), buys, Product (name, price)]
Relational model: plus FDs
Normalization: eliminates anomalies

Using FDs to produce good schemas
Start with a set of relations
Define FDs (and keys) for them based on the real world
Transform your relations to "normal form" (normalize them)
- Do this using "decomposition"
Intuitively, good design means
- No anomalies
- Can reconstruct all original information

97 Matthew P. Johnson, OCL1, CISDD CUNY, F2004 96 Decomposition terminology Projection: eliminating certain attributes from a relation Decomposition: separating a relation into two by projection Join: (re)assembling two relations • Whenever a row from R1 and a row from R2 have the same value for attribute A, join them to form a row of R3 If the original data can be reproduced after a decomposition by joining the relations, then the decomposition was lossless • We join on the attributes R1 and R2 have in common (the As) If it can’t, the decomposition was lossy
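The join described above can be sketched in a few lines of Python. This is an illustrative nested-loop natural join over lists of dicts, not how a database engine would do it; the function name `natural_join` and the sample rows are assumptions for the example.

```python
def natural_join(r1, r2):
    """Join two relations (lists of dicts) on every attribute they share."""
    out = []
    for t1 in r1:
        for t2 in r2:
            common = t1.keys() & t2.keys()
            if all(t1[a] == t2[a] for a in common):
                merged = {**t1, **t2}
                if merged not in out:  # keep set semantics: no duplicate rows
                    out.append(merged)
    return out

people = [{"ssn": 123, "name": "Michael"}, {"ssn": 789, "name": "Bill"}]
phones = [
    {"ssn": 123, "phone": "212-111-1111"},
    {"ssn": 123, "phone": "917-111-1111"},
    {"ssn": 789, "phone": "212-333-3333"},
]

joined = natural_join(people, phones)
print(len(joined))  # 3: one row per (person, phone) pair sharing an ssn
```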

98 Matthew P. Johnson, OCL1, CISDD CUNY, F2004 97 Lossless Decompositions A decomposition is lossless if we can recover the original: R(A,B,C) → (Decompose) → R1(B,C), R2(B,A) → (Recover) → R’(A,B,C) R’(A,B,C) should be the same as R(A,B,C) R’ is in general larger than R. Must ensure R’ = R

99 Matthew P. Johnson, OCL1, CISDD CUNY, F2004 98 Lossless decomposition Sometimes the data can be reproduced:
Name      Price  Category
MSOffice  100    WP
Oracle    1000   DB
MSOffice  100    DB
decomposed into
Name      Price
MSOffice  100
Oracle    1000
MSOffice  100
and
Name      Category
MSOffice  WP
Oracle    DB
MSOffice  DB
(MSOffice, 100) + (MSOffice, WP) → (MSOffice, 100, WP)
(MSOffice, 100) + (MSOffice, DB) → (MSOffice, 100, DB)
(Oracle, 1000) + (Oracle, DB) → (Oracle, 1000, DB)

100 Matthew P. Johnson, OCL1, CISDD CUNY, F2004 99 Lossy decomposition Sometimes it’s not:
Name      Price  Category
MSOffice  100    WP
Oracle    1000   DB
MSOffice  100    DB
decomposed into
Name      Category
MSOffice  WP
Oracle    DB
MSOffice  DB
and
Price  Category
100    WP
1000   DB
100    DB
(MSOffice, WP) + (100, WP) → (MSOffice, 100, WP)
(Oracle, DB) + (1000, DB) → (Oracle, 1000, DB)
(Oracle, DB) + (100, DB) → (Oracle, 100, DB)
(MSOffice, DB) + (1000, DB) → (MSOffice, 1000, DB)
(MSOffice, DB) + (100, DB) → (MSOffice, 100, DB)
What’s wrong?
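The two decompositions of the product relation can be replayed in Python to see where the extra rows come from. This is a sketch using sets of tuples (so duplicate projected rows collapse); the variable names are assumptions for the example.

```python
# The product relation as (name, price, category) tuples.
R = {("MSOffice", 100, "WP"), ("Oracle", 1000, "DB"), ("MSOffice", 100, "DB")}

name_price = {(n, p) for n, p, c in R}
name_cat = {(n, c) for n, p, c in R}
price_cat = {(p, c) for n, p, c in R}

# Decomposition 1: the two pieces share Name.
rejoin_on_name = {(n, p, c) for n, p in name_price for n2, c in name_cat if n == n2}

# Decomposition 2: the two pieces share Category.
rejoin_on_cat = {(n, p, c) for n, c in name_cat for p, c2 in price_cat if c == c2}

print(rejoin_on_name == R)        # True: nothing lost, nothing invented
print(sorted(rejoin_on_cat - R))  # the spurious rows produced by the lossy join
```

The join on Category returns every original row plus `("MSOffice", 1000, "DB")` and `("Oracle", 100, "DB")`. "Lossy" here means the extra rows make it impossible to tell which facts were in the original.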

101 Matthew P. Johnson, OCL1, CISDD CUNY, F2004 100 Ensuring lossless decomposition R(A1, ..., An, B1, ..., Bm, C1, ..., Cp) is decomposed into R1(A1, ..., An, B1, ..., Bm) and R2(A1, ..., An, C1, ..., Cp) If A1, ..., An → B1, ..., Bm Then the decomposition is lossless Note: don’t necessarily need A1, ..., An → C1, ..., Cp Examples: name → price, so first decomposition was lossless category ↛ name and category ↛ price, so second decomposition (which shares only category) was lossy
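The condition above can be checked mechanically with attribute closure. The sketch below uses the standard test for a two-way split: the shared attributes must determine all of R1 or all of R2. The function names `closure` and `is_lossless` and the FD encoding (pairs of sets) are assumptions for this example.

```python
def closure(attrs, fds):
    """Attribute closure: everything determined by attrs under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless(r1, r2, fds):
    """Binary decomposition test: the common attributes determine R1 or R2."""
    common = set(r1) & set(r2)
    determined = closure(common, fds)
    return set(r1) <= determined or set(r2) <= determined

fds = [({"name"}, {"price"})]  # the only FD assumed for the product relation

print(is_lossless({"name", "price"}, {"name", "category"}, fds))      # True
print(is_lossless({"name", "category"}, {"price", "category"}, fds))  # False
```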

102 Matthew P. Johnson, OCL1, CISDD CUNY, F2004 101 Quick lossless/lossy example
X  Y  Z
1  2  3
4  2  5
At a glance: can we decompose into R1(Y,X), R2(Y,Z)? At a glance: can we decompose into R1(X,Y), R2(X,Z)?

103 Matthew P. Johnson, OCL1, CISDD CUNY, F2004 102 Next topic: Normal Forms First Normal Form = all attributes are atomic • As opposed to set-valued • Assumed all along Second Normal Form (2NF) Third Normal Form (3NF) Boyce Codd Normal Form (BCNF) Fourth Normal Form (4NF)

104 Matthew P. Johnson, OCL1, CISDD CUNY, F2004 103 Most important: BCNF A simple condition for removing anomalies from relations: A relation R is in BCNF if: If As → Bs is a non-trivial dependency in R, then As is a superkey for R I.e.: The left side must always contain a key I.e.: If a set of attributes determines other attributes, it must determine all the attributes Codd: Ted Codd, IBM researcher, inventor of relational model, 1970 Boyce: Ray Boyce, IBM researcher, helped develop SQL in the 1970s
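The BCNF condition translates directly into a check using attribute closure: a left side is a superkey exactly when its closure covers every attribute. The following is a minimal sketch; the names `closure` and `bcnf_violations` and the shortened attribute names are assumptions. It only inspects the FDs it is given, which is enough when testing a whole relation against its own FD set.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the FDs (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(attrs, fds):
    """Return the non-trivial FDs whose left side is not a superkey."""
    bad = []
    for lhs, rhs in fds:
        nontrivial = not set(rhs) <= set(lhs)
        is_superkey = set(attrs) <= closure(lhs, fds)
        if nontrivial and not is_superkey:
            bad.append((lhs, rhs))
    return bad

fds = [({"ssn"}, {"name", "addr"})]
print(bcnf_violations({"name", "ssn", "addr", "phone"}, fds))  # ssn -> name, addr is reported
print(bcnf_violations({"name", "ssn", "addr"}, fds))           # []
```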

105 Matthew P. Johnson, OCL1, CISDD CUNY, F2004 104 BCNF decomposition algorithm Repeat: choose A1, …, Am → B1, …, Bn that violates the BCNF condition; split R into R1(A1, …, Am, B1, …, Bn) and R2(A1, …, Am, [others]); continue with both R1 and R2 Until no more violations // Heuristic: choose Bs as large as possible [Diagram: R1 covers the A’s and B’s; R2 covers the A’s and the others]
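The loop above can be sketched recursively in Python. This is a brute-force illustration, assuming a small schema: `find_violation` tries every proper subset X of a relation and reports one whose closure (restricted to that relation) is neither X itself nor the whole relation. Taking the full closure as R1 follows the "Bs as large as possible" heuristic. All function and attribute names are assumptions for the example.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under the FDs (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_violation(rel, fds):
    """Return (X, closure of X within rel) for some BCNF-violating X, else None."""
    for k in range(1, len(rel)):
        for xs in combinations(sorted(rel), k):
            x = set(xs)
            xplus = closure(x, fds) & rel
            if xplus != x and xplus != rel:  # X determines something, but not everything
                return x, xplus
    return None

def bcnf_decompose(rel, fds):
    rel = frozenset(rel)
    violation = find_violation(rel, fds)
    if violation is None:
        return [rel]
    x, xplus = violation
    r1 = xplus              # the As plus everything they determine
    r2 = x | (rel - xplus)  # the As plus the others
    return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)

fds = [
    ({"title", "year"}, {"studio"}),
    ({"studio"}, {"president"}),
    ({"president"}, {"pres_address"}),
]
movies = {"title", "year", "studio", "president", "pres_address"}
for r in bcnf_decompose(movies, fds):
    print(sorted(r))
```

On this schema the sketch ends with three relations: {president, pres_address}, {studio, president} and {studio, title, year}. A different choice of violating FD at each step could give a different, equally valid, result.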

106 Matthew P. Johnson, OCL1, CISDD CUNY, F2004 105 Boyce-Codd Normal Form Name/phone example is not BCNF: • {ssn, phone} is key • FD: ssn → name, mailing-address holds Violates BCNF: ssn is not a superkey
Name     SSN  Mailing-address  Phone
Michael  123  NY               212-111-1111
Michael  123  NY               917-111-1111
Its decomposition is BCNF • Only superkeys → anything else
Name     SSN  Mailing-address
Michael  123  NY
and
SSN  PhoneNumber
123  212-111-1111
123  917-111-1111

107 Matthew P. Johnson, OCL1, CISDD CUNY, F2004 106 BCNF Decomposition Larger example: multiple decompositions {Title, Year, Studio, President, Pres-Address} FDs: • Title, Year → Studio • Studio → President • President → Pres-Address • ⇒ Studio → President, Pres-Address (why?) No many-many this time Problem cause: transitive FDs: • Title, Year → Studio → President

108 Matthew P. Johnson, OCL1, CISDD CUNY, F2004 107 BCNF Decomposition Illegal: As → Bs, where As don’t include key Decompose: Studio → President, Pres-Address • As = {studio} • Bs = {president, pres-address} • Cs = {title, year} Result: 1. Studios(studio, president, pres-address) 2. Movies(studio, title, year) Is (2) in BCNF? Is (1) in BCNF? • Key: Studio • FD: President → Pres-Address • Q: Does president → studio? If so, president is a key • But if not, it violates BCNF

109 Matthew P. Johnson, OCL1, CISDD CUNY, F2004 108 BCNF Decomposition Studios(studio, president, pres-address) Illegal: As → Bs, where As don’t include key • Decompose: President → Pres-Address • As = {president} • Bs = {pres-address} • Cs = {studio} {Studio, President, Pres-Address} becomes • {President, Pres-Address} • {Studio, President}

110 Matthew P. Johnson, OCL1, CISDD CUNY, F2004 109 BCNF and two-att relations Must a two-attribute relation be in BCNF? • Case 1: there are no non-trivial FDs • Case 2: A → B but not B → A • Case 3: B → A but not A → B • Case 4: Both A → B and B → A Note that relations may have multiple keys BCNF requires a key on the left, not all keys

111 Matthew P. Johnson, OCL1, CISDD CUNY, F2004 110 Agenda Last time: Normalization Homework 1 due now Project part 2 is up, due on the 19th (Thurs.) This time: 1. Finish BCNF 2. 3NF 3. 4NF

112 Matthew P. Johnson, OCL1, CISDD CUNY, F2004 111 BCNF Review Q: What’s required for BCNF? Q: What’s the slogan for BCNF? Q: Who are B & C? Q: What are the two types of violations?

113 Matthew P. Johnson, OCL1, CISDD CUNY, F2004 112 BCNF Review Q: How do we fix a non-BCNF relation? Q: If As → Bs violates BCNF, what do we do? • Q: In this case, could the decomposition be lossy? Q: Under what circumstances could a decomposition be lossy? Q: How do we combine two relations?

114 Matthew P. Johnson, OCL1, CISDD CUNY, F2004 113 Decomposition algorithm example R(N,O,R,P) F = {N → O, O → R, R → N} Key: N,P
Name    Office  Residence  Phone
George  Pres.   WH         202-…
George  Pres.   WH         486-…
Dick    VP      NO         202-…
Dick    VP      NO         307-…
Violations of BCNF: N → O, O → R, N → OR • which kinds of violations are these? Pick N → OR (on board) Can we rejoin? (on board) What happens if we pick N → O instead? Can we rejoin? (on board)

115 Matthew P. Johnson, OCL1, CISDD CUNY, F2004 114 Lossless BCNF decomposition Consider simple relation: R(A,B,C) Only FD: A → B (assume C ↛ A) Key: A,C • Diff vars from text! Also goes through if assumption is false BCNF violation (which kind?): no key on the left Thus: Decomposition to BCNF: Create R1(A,B) and R2(A,C) Could this be lossy? We will join R1 and R2 on A to find out Q: If C → A, then what kind do we have? Q: Since C ↛ A, what kind of bad FD do we have?

116 Matthew P. Johnson, OCL1, CISDD CUNY, F2004 115 Lossless BCNF decomposition Suppose R contains (b,a,c) and (b’,a,c’) In projection onto (B,A): • (b,a,c) → (b,a), (b’,a,c’) → (b’,a) In projection onto (A,C): • (b,a,c) → (a,c), (b’,a,c’) → (a,c’) In joining, (b’,a), (a,c) → (b’,a,c) Q: Is/must/can this be correct? A: Yes! A → B, so b = b’ So this was lossless We assumed C ↛ A, but argument also goes through when C → A Moral: BCNF decomp alg is always lossless

117 Matthew P. Johnson, OCL1, CISDD CUNY, F2004 116 BCNF summary BCNF decomposition is lossless • Can reproduce original by joining Saw last time: Every 2-attribute relation is in BCNF Final set of decomposed relations might be different depending on • Order of bad FDs chosen Saw last time: But all results will be in BCNF

118 Matthew P. Johnson, OCL1, CISDD CUNY, F2004 117 A problem with BCNF Relation: R(Title, Theater, Neighborhood) FDs: • Title, N’hood → Theater (assume a movie can’t play twice in the same neighborhood) • Theater → N’hood Keys: • {Title, N’hood} • {Theater, Title}
Title        Theater   N’hood
City of God  Angelica  Village
Fog of War   Angelica  Village

119 Matthew P. Johnson, OCL1, CISDD CUNY, F2004 118 A problem with BCNF BCNF violation: Theater → N’hood Decompose: • {Theater, N’hood} • {Theater, Title} Resulting relations:
R1:
Theater   N’hood
Angelica  Village
R2:
Theater   Title
Angelica  City of God
Angelica  Fog of War

120 Matthew P. Johnson, OCL1, CISDD CUNY, F2004 119 Problem - continued Suppose we add new rows to R1 and R2:
R1:
Theater     N’hood
Angelica    Village
Film Forum  Village
R2:
Theater     Title
Angelica    City of God
Angelica    Fog of War
Film Forum  City of God
Their join (R’):
Theater     N’hood   Title
Angelica    Village  City of God
Angelica    Village  Fog of War
Film Forum  Village  City of God
Checks on R1 and R2 alone could not enforce the FD Title, N’hood → Theater
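The lost dependency can be made visible by joining the two relations and grouping by (Title, N’hood). The sketch below assumes the rows shown on this slide; the variable names are illustrative only.

```python
from collections import defaultdict

# R1(Theater, N'hood) and R2(Theater, Title) after the new rows are added.
r1 = {("Angelica", "Village"), ("Film Forum", "Village")}
r2 = {
    ("Angelica", "City of God"),
    ("Angelica", "Fog of War"),
    ("Film Forum", "City of God"),
}

# Natural join on Theater.
joined = {(title, theater, hood)
          for theater, hood in r1
          for theater2, title in r2
          if theater == theater2}

# Title, N'hood -> Theater demands one theater per (title, neighborhood) pair.
theaters = defaultdict(set)
for title, theater, hood in joined:
    theaters[(title, hood)].add(theater)

conflicts = {key: ts for key, ts in theaters.items() if len(ts) > 1}
print(conflicts)  # City of God plays at two theaters in the Village
```

Each insert looked legal locally: R1 still satisfies Theater → N’hood, and R2 has no non-trivial FDs to check. The violation only shows up after the join.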

121 Matthew P. Johnson, OCL1, CISDD CUNY, F2004 120 Third normal form: motivation There are some situations in which • BCNF is not dependency-preserving, and • Efficient checking for FD violation on updates is important In these cases BCNF is too severe a req. Solution: define a weaker normal form, called Third Normal Form • in which FDs can be checked on individual relations without performing a join (no inter-relational FDs) • to which relations can be converted, preserving both data and FDs

122 Matthew P. Johnson, OCL1, CISDD CUNY, F2004 121 Third Normal Form BCNF decomposition is not always dependency-preserving! We now define the (weaker) Third Normal Form • Turns out: this example was already in 3NF A relation R is in 3rd normal form if: For every nontrivial dependency A1, A2, ..., An → B for R, {A1, A2, ..., An} is a superkey for R, or B is part of a key, i.e., B is prime Tradeoff: BCNF = no FD anomalies, but may lose some FDs 3NF = keeps all FDs, but may have some anomalies

123 Matthew P. Johnson, OCL1, CISDD CUNY, F2004 122 BCNF: vices and virtues Be clear on the problem just described v. the arg. that BCNF decomp is lossless BCNF decomp does not lose data • Resulting relations can be rejoined to obtain the original But: it can lose dependencies • After decomp, possible to add rows whose corresponding rows would be illegal in (rejoined) original

124 Matthew P. Johnson, OCL1, CISDD CUNY, F2004 123 Recap: goals of normalization When we decompose a relation R with FDs F into R1..Rn we want: 1. Lossless-join decomposition: no data lost 2. No/little redundancy: the relations Ri should be in BCNF or at least 3NF 3. Dependency preservation: if Fi is the set of dependencies in F+ that include only attributes in Ri, then: • F is the “sum” of the FDs of the new relations • (F1 ∪ F2 ∪ F3 ∪ … ∪ Fn)+ = F+ • Otherwise checking updates for violation of FDs may require computing joins, which is expensive
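Dependency preservation can be tested without materializing F+. The sketch below uses a standard textbook procedure (it does not appear on the slides): starting from the left side of an FD, it repeatedly adds whatever each Ri can derive from the attributes it already sees. It is applied to the movie-theater schema, with hypothetical attribute names `title`, `theater`, `hood`.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the FDs (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_preserved(fd, parts, fds):
    """True if fd follows from the FDs that can be checked inside single parts."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for part in parts:
            gained = closure(result & part, fds) & part
            if not gained <= result:
                result |= gained
                changed = True
    return set(rhs) <= result

fds = [({"title", "hood"}, {"theater"}), ({"theater"}, {"hood"})]
parts = [{"theater", "hood"}, {"theater", "title"}]

for fd in fds:
    print(sorted(fd[0]), "->", sorted(fd[1]), is_preserved(fd, parts, fds))
```

For this decomposition the check reports that Theater → N’hood is preserved while Title, N’hood → Theater is not.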

125 Matthew P. Johnson, OCL1, CISDD CUNY, F2004 124 Dependency preservation Saw that last req. didn’t hold in movie-theater example Did it hold in R(N,O,R,P) example? (on board)

126 Matthew P. Johnson, OCL1, CISDD CUNY, F2004 125 Testing for 3NF For each dependency X → Y, use attribute closure to check if X is a superkey If X is not a superkey, verify that each attribute in Y is prime • This test is rather more expensive, since it involves finding candidate keys • Testing for 3NF is NP-complete (in what?) • Interestingly, decomposition into 3NF can be done in polynomial time • ⇒ Testing for 3NF is harder than decomposing into 3NF! Optimization: need to check only FDs in F, need not check all FDs in F+ (why?)
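The test described above can be sketched as follows. Candidate keys are found by brute force over all attribute subsets, which is exponential and only reasonable for toy schemas, in line with the remark that finding prime attributes is the expensive part. The function names and the two sample schemas (the movie-theater and studio examples, with shortened attribute names) are assumptions for this illustration.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under the FDs (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attrs, fds):
    """All minimal attribute sets whose closure is the whole relation."""
    keys = []
    for k in range(1, len(attrs) + 1):  # smallest sets first, so keys stay minimal
        for xs in combinations(sorted(attrs), k):
            x = set(xs)
            if set(attrs) <= closure(x, fds) and not any(key <= x for key in keys):
                keys.append(x)
    return keys

def is_3nf(attrs, fds):
    prime = set().union(*candidate_keys(attrs, fds))
    for lhs, rhs in fds:
        if set(attrs) <= closure(lhs, fds):
            continue                      # left side is a superkey
        if not (set(rhs) - set(lhs)) <= prime:
            return False                  # a non-prime attribute on the right
    return True

theater_fds = [({"title", "hood"}, {"theater"}), ({"theater"}, {"hood"})]
print(is_3nf({"title", "theater", "hood"}, theater_fds))  # True

movie_fds = [
    ({"title", "year"}, {"studio"}),
    ({"studio"}, {"president"}),
    ({"president"}, {"pres_address"}),
]
print(is_3nf({"title", "year", "studio", "president", "pres_address"}, movie_fds))  # False
```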

127 Matthew P. Johnson, OCL1, CISDD CUNY, F2004 126 3NF Example R = (J, K, L) F = {JK → L, L → K} Two candidate keys: JK and JL R is in 3NF • JK → L: JK is a superkey • L → K: K is prime BCNF decomposition yields R1 = (L,K), R2 = (L,J) • testing for JK → L requires a join There is some redundancy in R

128 Matthew P. Johnson, OCL1, CISDD CUNY, F2004 127 BCNF and 3NF Comparison Example of problems due to redundancy in 3NF • R = (J, K, L) • F = {JK → L, L → K}
J     K   L
j1    k1  l1
j2    k1  l1
j3    k1  l1
NULL  k2  l2
A schema that is in 3NF but not BCNF has the problems of: • redundancy (e.g., the relationship between l1 and k1) • need to use null values (if allowed!), e.g. to represent the relationship between l2 and k2 when there is no corresponding value for attribute J

129 Matthew P. Johnson, OCL1, CISDD CUNY, F2004 128 Comparison of BCNF and 3NF It is always possible to decompose a relation into relations in 3NF such that: • the decomposition is lossless • the dependencies are preserved It is always possible to decompose a relation into relations in BCNF such that: • the decomposition is lossless • but it may not be possible to preserve dependencies • But may eliminate more redundancy

130 Matthew P. Johnson, OCL1, CISDD CUNY, F2004 129 The Normal Forms (so far) 1NF: every attribute has an atomic value 2NF: 1NF and no partial dependencies 3NF: for each FD X → Y either it is trivial, or X is a superkey, or Y is a part of some key • I.e., 2NF and no transitive dependencies BCNF: 3NF and third 3NF option disallowed

131 Matthew P. Johnson, OCL1, CISDD CUNY, F2004 130 Distinguishing examples 1NF but not 2NF: R(Name, SSN, Mailing-address, Phone) • Key: SSN, Phone • Partial: ssn → name, address 2NF but not 3NF: R(Title, Year, Studio, Pres, Pres-Addr) • Key: Title, Year • Transitive: studio → president 3NF but not BCNF: R(Title, Theater, N’hood) • Title, N’hood → Theater • Prime-on-right: Theater → N’hood

132 Matthew P. Johnson, OCL1, CISDD CUNY, F2004 131 Design Goals Goal for a relational database design is: • No redundancy • Lossless Join • Dependency Preservation If we cannot achieve this, we accept one of • dependency loss • use of more expensive inter-relational methods to preserve dependencies • data redundancy due to use of 3NF Interesting: SQL does not provide a direct way of specifying FDs other than superkeys • can specify FDs using assertions, but they are expensive to test

133 Matthew P. Johnson, OCL1, CISDD CUNY, F2004 132 3NF 3NF means we may have anomalies Example: TEACH(student, teacher, subject) • student, subject → teacher (students not allowed in the same subject with two teachers) • teacher → subject (each teacher teaches one subject) • Subject is prime, so this is 3NF But we have anomalies: • Insertion: cannot insert a teacher until we have a student taking his subject If we convert to BCNF, we lose student, subject → teacher

134 Matthew P. Johnson, OCL1, CISDD CUNY, F2004 133 BCNF and over-normalization What is the problem? Schema overload: trying to capture two meanings: • 1) subject X can be taught by teacher Y • 2) student Z takes subject W from teacher V What to do? 3NF has anomalies, normalizing to BCNF loses FDs One soln: keep the 3NF TEACH and another (BCNF) relation SUBJECT-TAUGHT(teacher, subject) Still (more!) redundancy, but no more insert and delete anomalies

135 Matthew P. Johnson, OCL1, CISDD CUNY, F2004 134 Normalization Review Q: What’s required for BCNF? Q: What are the two types of violations? Q: What’s the loophole for 3NF? Q: How do we fix a non-BCNF relation?

136 Matthew P. Johnson, OCL1, CISDD CUNY, F2004 135 Normalization Review Q: If As → Bs violates BCNF, what do we do? • Q: In this case, could the decomposition be lossy? Q: How do we combine two relations? Q: Can BCNF decomp. lose FDs? Q: Can 3NF decomp. lose FDs?

137 Matthew P. Johnson, OCL1, CISDD CUNY, F2004 136 Redundancy in BCNF
Name     Streets               Cities      Jobs
Michael  111 East 60th Street  New York    Mayor
Michael  222 Brompton Road     London      Mayor
Michael  111 East 60th Street  New York    CEO
Michael  222 Brompton Road     London      CEO
Hilary   333 Some Street       Chappaqua   Senator
Hilary   444 Embassy Row       Washington  Senator
Hilary   333 Some Street       Chappaqua   First Lady
Hilary   444 Embassy Row       Washington  First Lady
Hilary   333 Some Street       Chappaqua   Lawyer
Hilary   444 Embassy Row       Washington  Lawyer
Lots of redundancy! Key? All fields • None determined by others! Non-trivial FDs? None! • In BCNF? Yes! Now what? New concept, leading to another normal form: Multivalued dependencies
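One way to see the redundancy in this table, sketched in Python as an illustration only: the relation happens to equal the join of its (name, street, city) projection with its (name, job) projection, so the same facts fit in 4 + 5 rows instead of 10. The variable names are assumptions for the example.

```python
# The ten rows of the table as (name, street, city, job) tuples.
R = {
    ("Michael", "111 East 60th Street", "New York", "Mayor"),
    ("Michael", "222 Brompton Road", "London", "Mayor"),
    ("Michael", "111 East 60th Street", "New York", "CEO"),
    ("Michael", "222 Brompton Road", "London", "CEO"),
    ("Hilary", "333 Some Street", "Chappaqua", "Senator"),
    ("Hilary", "444 Embassy Row", "Washington", "Senator"),
    ("Hilary", "333 Some Street", "Chappaqua", "First Lady"),
    ("Hilary", "444 Embassy Row", "Washington", "First Lady"),
    ("Hilary", "333 Some Street", "Chappaqua", "Lawyer"),
    ("Hilary", "444 Embassy Row", "Washington", "Lawyer"),
}

addresses = {(n, s, c) for n, s, c, j in R}
jobs = {(n, j) for n, s, c, j in R}

# Rejoin the two projections on name.
rejoined = {(n, s, c, j) for n, s, c in addresses for n2, j in jobs if n == n2}

print(len(addresses), len(jobs), len(R))  # 4 5 10
print(rejoined == R)                      # True: addresses and jobs vary independently
```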

138 Matthew P. Johnson, OCL1, CISDD CUNY, F2004 137 E/R to relational model [E/R diagram; labels: courses, Depts, Computer-allocation, room, number, givenBy, name, chair]

