Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 Lecture 10: Database Design and Relational Algebra Monday, October 20, 2003.

Similar presentations


Presentation on theme: "1 Lecture 10: Database Design and Relational Algebra Monday, October 20, 2003."— Presentation transcript:

1 1 Lecture 10: Database Design and Relational Algebra Monday, October 20, 2003

2 2 Outline Design of a Relational schema (3.6) Relational Algebra (5.2) Operations on bags (5.3, 5.4) –Reading assignment 5.3 and 5.4 (won’t have time to cover in class)

3 3 Relational Schema Design (or Logical Design) Main idea: Start with some relational schema Find out its FD’s Use them to design a better relational schema

4 4 Data Anomalies When a database is poorly designed we get anomalies: Redundancy: data is repeated Updated anomalies: need to change in several places Delete anomalies: may lose data when we don’t want

5 5 Relational Schema Design Anomalies: Redundancy = repeat data Update anomalies = Fred moves to “Bellevue” Deletion anomalies = Joe deletes his phone number: what is his city ? Recall set attributes (persons with several phones): SSN  Name, City NameSSNPhoneNumberCity Fred123-45-6789206-555-1234Seattle Fred123-45-6789206-555-6543Seattle Joe987-65-4321908-555-2121Westfield but not SSN  PhoneNumber

6 6 Relation Decomposition Break the relation into two: NameSSNCity Fred123-45-6789Seattle Joe987-65-4321Westfield SSNPhoneNumber 123-45-6789206-555-1234 123-45-6789206-555-6543 987-65-4321908-555-2121 Anomalies have gone: No more repeated data Easy to move Fred to “Bellevue” (how ?) Easy to delete all Joe’s phone number (how ?) NameSSNPhoneNumberCity Fred123-45-6789206-555-1234Seattle Fred123-45-6789206-555-6543Seattle Joe987-65-4321908-555-2121Westfield

7 7 Relational Schema Design Person buys Product name pricenamessn Conceptual Model: Relational Model: plus FD’s Normalization: Eliminates anomalies

8 8 Decompositions in General R 1 = projection of R on A 1,..., A n, B 1,..., B m R 2 = projection of R on A 1,..., A n, C 1,..., C p R(A 1,..., A n, B 1,..., B m, C 1,..., C p ) R 1 (A 1,..., A n, B 1,..., B m ) R 2 (A 1,..., A n, C 1,..., C p )

9 9 Decomposition Sometimes it is correct: NamePriceCategory Gizmo19.99Gadget OneClick24.99Camera Gizmo19.99Camera NamePrice Gizmo19.99 OneClick24.99 Gizmo19.99 NameCategory GizmoGadget OneClickCamera GizmoCamera Lossless decomposition

10 10 Incorrect Decomposition Sometimes it is not: NamePriceCategory Gizmo19.99Gadget OneClick24.99Camera Gizmo19.99Camera NameCategory GizmoGadget OneClickCamera GizmoCamera PriceCategory 19.99Gadget 24.99Camera 19.99Camera What’s incorrect ?? Lossy decomposition

11 11 Decompositions in General R(A 1,..., A n, B 1,..., B m, C 1,..., C p ) If A 1,..., A n  B 1,..., B m Then the decomposition is lossless R 1 (A 1,..., A n, B 1,..., B m ) R 2 (A 1,..., A n, C 1,..., C p ) Example: name  price, hence the first decomposition is lossless Note: don’t need necessarily A 1,..., A n  C 1,..., C p

12 12 Normal Forms First Normal Form = all attributes are atomic Second Normal Form (2NF) = old and obsolete Third Normal Form (3NF) = this lecture Boyce Codd Normal Form (BCNF) = this lecture Others...

13 13 Boyce-Codd Normal Form A simple condition for removing anomalies from relations: In English (though a bit vague): Whenever a set of attributes of R is determining another attribute, should determine all the attributes of R. A relation R is in BCNF if: If A 1,..., A n  B is a non-trivial dependency in R, then {A 1,..., A n } is a key for R A relation R is in BCNF if: If A 1,..., A n  B is a non-trivial dependency in R, then {A 1,..., A n } is a key for R

14 14 BCNF Decomposition Algorithm A’s Others B’s R1R1 Is there a 2-attribute relation that is not in BCNF ? Repeat choose A 1, …, A m  B 1, …, B n that violates the BNCF condition split R into R 1 (A 1, …, A m, B 1, …, B n ) and R 2 (A 1, …, A m, [others]) continue with both R 1 and R 2 Until no more violations R2R2

15 15 Example What are the dependencies? SSN  Name, City What are the keys? {SSN, PhoneNumber} Is it in BCNF? NameSSNPhoneNumberCity Fred123-45-6789206-555-1234Seattle Fred123-45-6789206-555-6543Seattle Joe987-65-4321908-555-2121Westfield Joe987-65-4321908-555-1234Westfield

16 16 Decompose it into BCNF NameSSNCity Fred123-45-6789Seattle Joe987-65-4321Westfield SSNPhoneNumber 123-45-6789206-555-1234 123-45-6789206-555-6543 987-65-4321908-555-2121 987-65-4321908-555-1234 SSN  Name, City Let’s check anomalies: Redundancy ? Update ? Delete ?

17 17 Summary of BCNF Decomposition Find a dependency that violates the BCNF condition: A’s Others B’s R1R2 Heuristics: choose B, B, … B “as large as possible” 12m Decompose: Is there a 2-attribute relation that is not in BCNF ? Continue until there are no BCNF violations left. A 1, A 2, …, A n  B 1, B 2, …, B m

18 18 Example Decomposition Person(name, SSN, age, hairColor, phoneNumber) SSN  name, age age  hairColor Decompose in BCNF (in class): Step 1: find all keys (How ? Compute S +, for various sets S) Step 2: now decompose

19 19 Other Example R(A,B,C,D) A  B, B  C Key: AD Violations of BCNF: A  B, A  C, A  BC Pick A  BC: split into R1(A,BC) R2(A,D) What happens if we pick A  B first ?

20 20 Lossless Decompositions A decomposition is lossless if we can recover: R(A,B,C) R1(A,B) R2(A,C) R’(A,B,C) should be the same as R(A,B,C) R’ is in general larger than R. Must ensure R’ = R Decompose Recover

21 21 Lossless Decompositions Given R(A,B,C) s.t. A  B, the decomposition into R1(A,B), R2(A,C) is lossless

22 22 3NF: A Problem with BCNF Unit Company Product Unit Company Unit Product FD’s: Unit  Company; Company, Product  Unit So, there is a BCNF violation, and we decompose. Unit  Company No FDs Notice: we loose the FD: Company, Product  Unit

23 23 So What’s the Problem? Unit Company Product Unit CompanyUnit Product Galaga99 UW Galaga99 databases Bingo UW Bingo databases No problem so far. All local FD’s are satisfied. Let’s put all the data back into a single table again: Galaga99 UW databases Bingo UW databases Violates the dependency: company, product -> unit!

24 24 Solution: 3rd Normal Form (3NF) A simple condition for removing anomalies from relations: A relation R is in 3rd normal form if : Whenever there is a nontrivial dependency A 1, A 2,..., A n  B for R, then {A 1, A 2,..., A n } a super-key for R, or B is part of a key. A relation R is in 3rd normal form if : Whenever there is a nontrivial dependency A 1, A 2,..., A n  B for R, then {A 1, A 2,..., A n } a super-key for R, or B is part of a key. Tradeoff: BCNF = no anomalies, but may lose some FDs 3NF = keeps all FDs, but may have some anomalies

25 25 Relational Algebra Formalism for creating new relations from existing ones Its place in the big picture: Declartive query language Algebra Implementation SQL, relational calculus Relational algebra Relational bag algebra

26 26 Relational Algebra Five operators: –Union:  –Difference: - –Selection:  –Projection:  –Cartesian Product:  Derived or auxiliary operators: –Intersection, complement –Joins (natural,equi-join, theta join, semi-join) –Renaming: 

27 27 1. Union and 2. Difference R1  R2 Example: –ActiveEmployees  RetiredEmployees R1 – R2 Example: –AllEmployees -- RetiredEmployees

28 28 What about Intersection ? It is a derived operator R1  R2 = R1 – (R1 – R2) Also expressed as a join (will see later) Example –UnionizedEmployees  RetiredEmployees

29 29 3. Selection Returns all tuples which satisfy a condition Notation:  c (R) Examples –  Salary > 40000 (Employee) –  name = “Smithh” (Employee) The condition c can be =,, , <>

30 30 Find all employees with salary more than $40,000.  Salary > 40000 (Employee)

31 31 4. Projection Eliminates columns, then removes duplicates Notation:  A1,…,An (R) Example: project social-security number and names: –  SSN, Name (Employee) –Output schema: Answer(SSN, Name)

32 32  SSN, Name (Employee)

33 33 5. Cartesian Product Each tuple in R1 with each tuple in R2 Notation: R1  R2 Example: –Employee  Dependents Very rare in practice; mainly used to express joins

34 34

35 35 Relational Algebra Five operators: –Union:  –Difference: - –Selection:  –Projection:  –Cartesian Product:  Derived or auxiliary operators: –Intersection, complement –Joins (natural,equi-join, theta join, semi-join) –Renaming: 

36 36 Renaming Changes the schema, not the instance Notation:  B1,…,Bn (R) Example: –  LastName, SocSocNo (Employee) –Output schema: Answer(LastName, SocSocNo)

37 37 Renaming Example Employee NameSSN John999999999 Tony777777777 LastNameSocSocNo John999999999 Tony777777777  LastName, SocSocNo (Employee)

38 38 Natural Join Notation: R1 ⋈ R2 Meaning: R1 ⋈ R2 =  A (  C (R1  R2)) Where: –The selection  C checks equality of all common attributes –The projection eliminates the duplicate common attributes

39 39 Natural Join Example Employee NameSSN John999999999 Tony777777777 Dependents SSNDname 999999999Emily 777777777Joe NameSSNDname John999999999Emily Tony777777777Joe Employee Dependents =  Name, SSN, Dname (  SSN=SSN2 (Employee x  SSN2, Dname (Dependents))

40 40 Natural Join R= S= R ⋈ S= AB XY XZ YZ ZV BC ZU VW ZV ABC XZU XZV YZU YZV ZVW

41 41 Natural Join Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R ⋈ S ? Given R(A, B, C), S(D, E), what is R ⋈ S ? Given R(A, B), S(A, B), what is R ⋈ S ?

42 42 Theta Join A join that involves a predicate R1 ⋈  R2 =   (R1  R2) Here  can be any condition

43 43 Eq-join A theta join where  is an equality R1 ⋈ A=B R2 =  A=B (R1  R2) Example: –Employee ⋈ SSN=SSN Dependents Most useful join in practice

44 44 Semijoin R ⋉ S =  A1,…,An (R ⋈ S) Where A 1, …, A n are the attributes in R Example: –Employee ⋉ Dependents

45 45 Semijoins in Distributed Databases Semijoins are used in distributed databases SSNName... SSNDnameAge... Employee Dependents network Employee ⋈ ssn=ssn (  age>71 (Dependents)) T =  SSN  age>71 (Dependents) R = Employee ⋉ T Answer = R ⋈ Dependents

46 46 Complex RA Expressions Person Purchase Person Product  name=fred  name=gizmo  pid  ssn seller-ssn=ssnpid=pidbuyer-ssn=ssn  name

47 47 Operations on Bags A bag = a set with repeated elements All operations need to be defined carefully on bags {a,b,b,c}  {a,b,b,b,e,f,f}={a,a,b,b,b,b,b,c,e,f,f} {a,b,b,b,c,c} – {b,c,c,c,d} = {a,b,b,d}  C (R): preserve the number of occurrences  A (R): no duplicate elimination Cartesian product, join: no duplicate elimination Important ! Relational Engines work on bags, not sets ! Reading assignment: 5.3 – 5.4

48 48 Finally: RA has Limitations ! Cannot compute “transitive closure” Find all direct and indirect relatives of Fred Cannot express in RA !!! Need to write C program Name1Name2Relationship FredMaryFather MaryJoeCousin MaryBillSpouse NancyLouSister


Download ppt "1 Lecture 10: Database Design and Relational Algebra Monday, October 20, 2003."

Similar presentations


Ads by Google