# 6.830/6.814 Lecture 3 Sam Madden Relational Algebra and Normalization Sept 10, 2014.


Relational Algebra

- Projection: π(R, c1, …, cn) = π c1…cn R. Selects a subset c1 … cn of the columns of R.
- Selection: σ(R, pred) = σ pred R. Selects the subset of rows of R that satisfy pred.
- Cross product: R1 × R2 (aka Cartesian product). Combines R1 and R2, producing a new relation with ||R1|| + ||R2|| attrs and |R1| * |R2| rows, where ||R|| = #attrs in R and |R| = #rows in R.
- Join: ⋈(R1, R2, pred) = R1 ⋈ pred R2 = σ pred (R1 × R2).
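The four operators can be sketched in a few lines of Python. This is only an illustration, not part of the lecture: relations are modeled as lists of dicts, `project` removes duplicate rows (set semantics), and `cross` assumes the two relations have disjoint attribute names. The function names and the sample relations `r1` and `r2` are hypothetical.

```python
def project(rel, cols):
    """π: keep only the named columns, dropping duplicate rows."""
    seen, out = set(), []
    for row in rel:
        kept = tuple((c, row[c]) for c in cols)
        if kept not in seen:
            seen.add(kept)
            out.append(dict(kept))
    return out


def select(rel, pred):
    """σ: keep only the rows for which pred(row) is true."""
    return [row for row in rel if pred(row)]


def cross(r1, r2):
    """×: pair every row of r1 with every row of r2 (disjoint attribute names assumed)."""
    return [{**a, **b} for a in r1 for b in r2]


def join(r1, r2, pred):
    """⋈: defined exactly as on the slide, a selection over the cross product."""
    return select(cross(r1, r2), pred)


# Hypothetical sample relations.
r1 = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]
r2 = [{"c": 1}, {"c": 3}, {"c": 5}]

product = cross(r1, r2)                              # 2 * 3 rows, 2 + 1 attrs
matched = join(r1, r2, lambda row: row["a"] == row["c"])
only_b = project(matched, ["b"])
```

Real systems do not evaluate a join by materializing the full cross product; the definition is used here only because it mirrors the algebraic identity.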

Relational Algebra ↔ SQL

- SELECT list ↔ projection
- FROM list ↔ all tables referenced
- WHERE ↔ selection and join

Many equivalent relational algebra expressions exist for any one SQL query (due to relational identities):

- Join reordering
- Select reordering
- Select pushdown
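As a rough illustration of select pushdown, the sketch below evaluates the same query two ways over small in-memory relations and checks that the answers agree. The sample rows are made up, and the keepers' name column is renamed to `kname` here (an assumption, not the lecture's schema) so that merged rows have no attribute clash.

```python
# Hypothetical sample data.
animals = [
    {"name": "tiger", "cageno": 1, "keptby": 10},
    {"name": "zebra", "cageno": 2, "keptby": 11},
    {"name": "lion", "cageno": 3, "keptby": 10},
]
keepers = [{"kid": 10, "kname": "joe"}, {"kid": 11, "kname": "ann"}]

# Plan A: join first, then apply the selection on the keeper's name.
pairs_a = [(a, k) for a in animals for k in keepers]          # pairs examined
joined = [{**a, **k} for a, k in pairs_a if a["keptby"] == k["kid"]]
plan_a = {row["cageno"] for row in joined if row["kname"] == "joe"}

# Plan B: push the selection below the join, so fewer pairs are examined.
joes = [k for k in keepers if k["kname"] == "joe"]
pairs_b = [(a, k) for a in animals for k in joes]
plan_b = {a["cageno"] for a, k in pairs_b if a["keptby"] == k["kid"]}

print(plan_a == plan_b, len(pairs_a), len(pairs_b))
```

Both plans return the same set of cages; the pushed-down plan simply feeds a smaller input to the join, which is why an optimizer would usually prefer it.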

Example

animals(name, age, species, cageno, keptby, feedtime)

keepers(kid, name)

Cages kept by Joe:

π cageno (σ keepers.name='joe' (animals ⋈ keptby=kid keepers))

SELECT cageno FROM keepers, animals WHERE keptby = kid AND keepers.name = 'joe'
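To try the query, the following sketch loads the two tables into an in-memory SQLite database using Python's standard `sqlite3` module. The column types and all sample rows are assumptions made for illustration; only the table and column names come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE keepers(kid INTEGER, name TEXT);
    CREATE TABLE animals(name TEXT, age INTEGER, species TEXT,
                         cageno INTEGER, keptby INTEGER, feedtime TEXT);
    """
)
# Hypothetical sample rows.
conn.executemany("INSERT INTO keepers VALUES (?, ?)", [(10, "joe"), (11, "ann")])
conn.executemany(
    "INSERT INTO animals VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("tiger", 5, "cat", 1, 10, "09:00"),
        ("zebra", 3, "horse", 2, 11, "12:00"),
        ("lion", 7, "cat", 3, 10, "17:00"),
    ],
)

# Both tables have a "name" column, so the predicate must be qualified.
query = """
    SELECT cageno
    FROM keepers, animals
    WHERE keptby = kid AND keepers.name = 'joe'
"""
cages = sorted(row[0] for row in conn.execute(query))
print(cages)
```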

Multiple Feedtimes

animals:(name STRING, cageno INT, keptby INT, age INT, feedtime TIME)

CREATE TABLE feedtimes(aname STRING, feedtime TIME);

ALTER TABLE animals RENAME TO animals2;

ALTER TABLE animals2 DROP COLUMN feedtime;

CREATE VIEW animals AS SELECT name, cageno, keptby, age, (SELECT feedtime FROM feedtimes WHERE aname=name LIMIT 1) AS feedtime FROM animals2

Views enable logical data independence by emulating the old schema in the new schema.
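The effect of the view can be checked with a small SQLite sketch. To keep it portable, it creates `animals2` and `feedtimes` directly in their final shape instead of running the `ALTER TABLE` steps, and it uses `TEXT` columns in place of `STRING` and `TIME`; those choices and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- New schema: feed times live in their own table.
    CREATE TABLE animals2(name TEXT, cageno INTEGER, keptby INTEGER, age INTEGER);
    CREATE TABLE feedtimes(aname TEXT, feedtime TEXT);

    -- The view emulates the old single-feedtime schema.
    CREATE VIEW animals AS
        SELECT name, cageno, keptby, age,
               (SELECT feedtime FROM feedtimes WHERE aname = name LIMIT 1) AS feedtime
        FROM animals2;
    """
)
# Hypothetical sample rows; the tiger now has two feed times.
conn.executemany("INSERT INTO animals2 VALUES (?, ?, ?, ?)",
                 [("tiger", 1, 10, 5), ("zebra", 2, 11, 3)])
conn.executemany("INSERT INTO feedtimes VALUES (?, ?)",
                 [("tiger", "09:00"), ("tiger", "17:00"), ("zebra", "12:00")])

# A query written against the old schema still runs unchanged.
old_style = conn.execute(
    "SELECT name, cageno, feedtime FROM animals ORDER BY name"
).fetchall()
print(old_style)
```

Because the subquery uses `LIMIT 1` without an `ORDER BY`, which of the tiger's feed times the view reports is not specified; the old schema can only show one of them.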

Study Break #1

Schema:

- classes: (cid, c_name, c_rid, …)
- rooms: (rid, bldg, …)
- students: (sid, s_name, …)
- takes: (t_sid, t_cid)

Questions

1) What SQL query is this expression equivalent to?

π bldg (rooms ⋈ rid=c_rid (σ c_name='6.830' classes))

2) Write an equivalent relational algebra expression to:

SELECT s_name FROM students, takes, classes WHERE t_sid = sid AND t_cid = cid AND c_name = '6.830'

a) Are there other possible expressions?

b) Do you think one would be more "efficient" to execute? Why?

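Hobby Schema

| SSN | Name | Address | Hobby | Cost |
|-----|------|----------|--------|------|
| 123 | john | main st | dolls | \$ |
| 123 | john | main st | bugs | \$ |
| 345 | mary | lake st | tennis | \$\$ |
| 456 | joe | first st | dolls | \$ |

"Wide" schema: has redundancy and anomalies in the presence of updates, inserts, and deletes.

Table key is (Hobby, SSN).

[Figure: entity-relationship diagram. Entities Person and Hobby in an n:n relationship; attributes shown: SSN, Name, Address, Cost.]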
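A small sketch of an update anomaly on the wide table: because john's address is stored once per hobby, updating only one of his rows leaves the table contradicting itself. The rows are the ones from the table; the new address "elm st" and the helper name `addresses_per_ssn` are made up for illustration.

```python
from collections import defaultdict

# Rows of the wide hobby table: (ssn, name, address, hobby, cost)
hobbies = [
    (123, "john", "main st", "dolls", "$"),
    (123, "john", "main st", "bugs", "$"),
    (345, "mary", "lake st", "tennis", "$$"),
    (456, "joe", "first st", "dolls", "$"),
]


def addresses_per_ssn(rows):
    """Map each SSN to the set of addresses recorded for it."""
    seen = defaultdict(set)
    for ssn, _name, address, _hobby, _cost in rows:
        seen[ssn].add(address)
    return dict(seen)


# Update anomaly: john moves, but only one of his two rows is updated.
updated = list(hobbies)
updated[0] = (123, "john", "elm st", "dolls", "$")  # hypothetical new address

conflicts = {
    ssn: addrs for ssn, addrs in addresses_per_ssn(updated).items() if len(addrs) > 1
}
print(conflicts)
```

Deletes have a similar problem: removing mary's only row (her tennis hobby) would also remove the only record of her address.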

Boyce-Codd Normal Form (BCNF)

A set of relations is in BCNF if, for every relation R in the set and every nontrivial functional dependency X → Y in the set of functional dependencies F over R, X is a superkey of R (where superkey means that X contains a key of R).
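One common way to test the superkey condition is attribute closure: X is a superkey of R exactly when the closure of X under F contains every attribute of R. The sketch below applies this to the hobby schema (S = SSN, H = Hobby, N = Name, A = Address, C = Cost). The function names are illustrative, not from the lecture.

```python
def closure(attrs, fds):
    """Return all attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, relation, fds):
    """X is a superkey of R when its closure covers all of R."""
    return set(relation) <= closure(attrs, fds)


relation = {"S", "H", "N", "A", "C"}
fds = [
    ({"S", "H"}, {"N", "A", "C"}),
    ({"S"}, {"N", "A"}),
    ({"H"}, {"C"}),
]

# An FD violates BCNF when its left side is not a superkey.
violations = [(lhs, rhs) for lhs, rhs in fds if not is_superkey(lhs, relation, fds)]
print(violations)
```

Here S, H is a superkey, while S alone and H alone are not, so S → N, A and H → C are the two violating dependencies.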

BCNFify

- Start with one "universal relation".
- While some relation R is not in BCNF:
  - Find an FD X → Y that violates BCNF on R.
  - Split R into R1 = (X ∪ Y) and R2 = R − Y.
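The loop above can be sketched in Python as follows. This is one possible reading, not the lecture's implementation: to find a violation on a sub-relation it tries every proper subset X of the relation's attributes and uses attribute closure, which is exponential and only sensible for small examples. All function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return all attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_violation(rel, fds):
    """Return (X, Y) for some FD X -> Y that violates BCNF on rel, or None."""
    attrs = sorted(rel)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            x = set(combo)
            implied = closure(x, fds) & rel
            # X determines something new inside rel, but not all of rel.
            if implied != x and implied != rel:
                return x, implied - x
    return None


def bcnfify(universal, fds):
    """Split the universal relation until every piece is in BCNF."""
    todo = [frozenset(universal)]
    done = []
    while todo:
        rel = todo.pop()
        violation = find_violation(rel, fds)
        if violation is None:
            done.append(rel)
        else:
            x, y = violation
            todo.append(frozenset(x | y))    # R1 = X ∪ Y
            todo.append(frozenset(rel - y))  # R2 = R − Y
    return done


# Hobby schema: S = SSN, H = Hobby, N = Name, A = Address, C = Cost.
fds = [
    ({"S", "H"}, {"N", "A", "C"}),
    ({"S"}, {"N", "A"}),
    ({"H"}, {"C"}),
]
result = bcnfify("SHNAC", fds)
print([sorted(r) for r in result])
```

Which violating FD is picked first depends on the search order, so the intermediate splits can differ from a hand-worked run even when the final relations are the same.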

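BCNFify Example for Hobbies

S = SSN, H = Hobby, N = Name, A = Addr, C = Cost

- Iter 1: Schema (S, H, N, A, C) with FDs S,H → N,A,C; S → N,A; H → C. S → N,A violates BCNF, so split into (S, N, A) and (S, H, C).
- Iter 2: Schema (S, N, A) with FD S → N,A is in BCNF. Schema (S, H, C) with FD H → C violates BCNF, so split into (H, C) and (S, H).
- Iter 3: Schema (H, C) with FD H → C and schema (S, H), whose key is all of S, H, are both in BCNF.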

Study Break #2

Patient database: we want to represent patients at hospitals with doctors.

- Patients have names, birthdates
- Doctors have names, specialties
- Hospitals have names, addresses
- One doctor can treat multiple patients; each patient has one doctor
- Each patient is in one hospital; hospitals have many patients
- Doctors work for one hospital; hospitals have many doctors

1) Draw an ER diagram.

2) What are the functional dependencies?

3) What is the normalized schema? Is it redundancy free?