Presentation is loading. Please wait.

Presentation is loading. Please wait.

Database Management Systems 1 Raghu Ramakrishnan Database Design Review Aug 24, 2015 Instructor: Xintao Wu.

Similar presentations


Presentation on theme: "Database Management Systems 1 Raghu Ramakrishnan Database Design Review Aug 24, 2015 Instructor: Xintao Wu."— Presentation transcript:

1 Database Management Systems 1 Raghu Ramakrishnan Database Design Review Aug 24, 2015 Instructor: Xintao Wu

2 Database Management Systems 2 Raghu Ramakrishnan Outline v E-R design v Relational model design v SQL conceptual evaluation v Normalization v Query optimization

3 Database Management Systems 3 Raghu Ramakrishnan Overview of db design v Requirement analysis –Data to be stored –Applications to be built –Operations (most frequent) subject to performance requirement v Conceptual db design –Description of the data (including constraints) –By high level model such as ER v Logical db design –Choose dbms to implement –Convert conceptual db design into database schema v Schema refinement ( normalization) v Physical db design –Analyze the workload –Refine db design to meet performance criteria (focus on Indexing ) v Security design

4 Database Management Systems 4 Raghu Ramakrishnan E-R Diagram Design lot dname budget did since name Works_In Departments Employees ssn

5 Database Management Systems 5 Raghu Ramakrishnan From E-R to Relational Tables v In translating a relationship set to a relation, attributes of the relation must include: – Keys for each participating entity set (as foreign keys). u This set of attributes forms a superkey for the relation. – All descriptive attributes. CREATE TABLE Works_In( ssn CHAR (1), did INTEGER, since DATE, PRIMARY KEY (ssn, did), FOREIGN KEY (ssn) REFERENCES Employees, FOREIGN KEY (did) REFERENCES Departments)

6 Database Management Systems 6 Raghu Ramakrishnan Constraints v Key constraint v Does every department have a manager? – If so, this is a participation constraint : the participation of Departments in Manages is said to be total (vs. partial ). lot name dname budgetdid since name dname budgetdid since Manages since Departments Employees ssn Works_In

7 Database Management Systems 7 Raghu Ramakrishnan Enforcing Constraints in Scheme CREATE TABLE Dept_Mgr( did INTEGER, dname CHAR(20), budget REAL, ssn CHAR(11) NOT NULL, since DATE, PRIMARY KEY (did), FOREIGN KEY (ssn) REFERENCES Employees, ON DELETE NO ACTION )

8 Database Management Systems 8 Raghu Ramakrishnan Outline v E-R design v Relational model design v SQL conceptual evaluation v Normalization v Query optimization

9 Database Management Systems 9 Raghu Ramakrishnan v relation-list A list of relation names (possibly with a range-variable after each name). v target-list A list of attributes of relations in relation-list v qualification Comparisons (Attr op const or Attr1 op Attr2, where op is one of ) combined using AND, OR and NOT. v DISTINCT is an optional keyword indicating that the answer should not contain duplicates. Default is that duplicates are not eliminated! Basic SQL Query SELECT [DISTINCT] target-list FROM relation-list WHERE qualification

10 Database Management Systems 10 Raghu Ramakrishnan Conceptual Evaluation Strategy v Semantics of an SQL query defined in terms of the following conceptual evaluation strategy: –Compute the cross-product of relation-list. –Discard resulting tuples if they fail qualifications. –Delete attributes that are not in target-list. –If DISTINCT is specified, eliminate duplicate rows. v This strategy is probably the least efficient way to compute a query! An optimizer will find more efficient strategies to compute the same answers.

11 Database Management Systems 11 Raghu Ramakrishnan Example Instances R1 S1 v We will use these instances of the Sailors and Reserves relations in our examples. v If the key for the Reserves relation contained only the attributes sid and bid, how would the semantics differ?

12 Database Management Systems 12 Raghu Ramakrishnan Example of Conceptual Evaluation SELECT S.sname FROM Sailors S, Reserves R WHERE S.sid=R.sid AND R.bid=103

13 Database Management Systems 13 Raghu Ramakrishnan Division in SQL SELECT S.sname FROM Sailors S WHERE NOT EXISTS (( SELECT B.bid FROM Boats B) EXCEPT ( SELECT R.bid FROM Reserves R WHERE R.sid=S.sid)) Find sailors who’ve reserved all boats.

14 Database Management Systems 14 Raghu Ramakrishnan Conceptual Evaluation Step v Cross product to get all rows v Select rows that satisfy the condition specified in the where clause. v Remove unnecessary fields. v From the remaining rows form groups according to the group- by clause. v Discard all groups that do not satisfy the condition in the having clause. v Apply aggregate function to each group. v Retrieve values for the columns and aggregations listed in the select clause.

15 Database Management Systems 15 Raghu Ramakrishnan Find the age of the youngest sailor with age 18, for each rating with at least 2 such sailors v Only S.rating and S.age are mentioned in the SELECT, GROUP BY or HAVING clauses; other attributes ` unnecessary ’. SELECT S.rating, MIN (S.age) FROM Sailors S WHERE S.age >= 18 GROUP BY S.rating HAVING COUNT (*) > 1 Answer relation

16 Database Management Systems 16 Raghu Ramakrishnan Outline v E-R design v Relational model design v SQL conceptual evaluation v Normalization v Query optimization

17 Database Management Systems 17 Raghu Ramakrishnan Design problems v Problems due to R W : – Update anomaly : Can we change W in just the 1st tuple of SNLRWH? – Insertion anomaly : What if we want to record rating- wage information if there is no employee with such rating? – Deletion anomaly : If we delete all employees with rating 5, we lose the information about the wage for rating 5!

18 Database Management Systems 18 Raghu Ramakrishnan Example (Contd.) v Do we have problems with the decomposed tables? – Update anomaly – Insertion anomaly – Deletion anomaly Hourly_Emps2 Wages

19 Database Management Systems 19 Raghu Ramakrishnan Functional Dependencies (FDs) v A functional dependency X Y holds over relation R if, for every allowable instance r of R: – t1 r, t2 r, ( t1 ) = ( t2 ) implies ( t1 ) = ( t2 ) –i.e., given two tuples in r, if the X values agree, then the Y values must also agree. (X and Y are sets of attributes.) v An FD is a statement about all allowable relations. –Must be identified based on semantics of enterprise. –Given some allowable instance r1 of R, we can check if it violates some FD f, but we cannot tell if f holds over R!

20 Database Management Systems 20 Raghu Ramakrishnan Example v A relation R has attributes (S, C, T, R, G) which denotes student, course, time, room, and grade respectively. From requirements, the following FDs hold. –SC G –ST R –C T –TR C

21 Database Management Systems 21 Raghu Ramakrishnan Attribute Closure v Computing the closure of a set of FDs can be expensive. (Size of closure is exponential in # attrs!) v Typically, we just want to check if a given FD X Y is in the closure of a set of FDs F. An efficient check: –Compute attribute closure of X (denoted ) wrt F: u Set of all attributes A such that X A is in u There is a linear time algorithm to compute this. –Check if Y is in

22 Database Management Systems 22 Raghu Ramakrishnan Attribute Closure v Algorithm –Closure = X; –Repeat until there is no change{ u If there is an FD in F such that U closure u Then set closure = closure V}

23 Database Management Systems 23 Raghu Ramakrishnan Boyce-Codd Normal Form (BCNF) v Reln R with FDs F is in BCNF if, for all X A in –A X (called a trivial FD), or –X contains a key for R. v In other words, R is in BCNF if the only non-trivial FDs that hold over R are key constraints. –No redundancy in R that can be predicted using FDs alone.

24 Database Management Systems 24 Raghu Ramakrishnan Third Normal Form (3NF) v Reln R with FDs F is in 3NF if, for all X A in –A X (called a trivial FD), or –X contains a key for R, or –A is part of some key for R. v Minimality of a key is crucial in third condition above! v If R is in BCNF, obviously in 3NF. v If R is in 3NF, some redundancy is possible. It is a compromise, used when BCNF not achievable (e.g., no “good’’ decomp, or performance considerations). – Lossless-join, dependency-preserving decomposition of R into a collection of 3NF relations always possible.

25 Database Management Systems 25 Raghu Ramakrishnan Decomposition of a Relation Scheme v Suppose that relation R contains attributes A1... An. A decomposition of R consists of replacing R by two or more relations such that: –Each new relation scheme contains a subset of the attributes of R (and no attributes that do not appear in R), and –Every attribute of R appears as an attribute of one of the new relations. v Intuitively, decomposing R means we will store instances of the relation schemes produced by the decomposition, instead of instances of R. v E.g., Can decompose SNLRWH into SNLRH and RW.

26 Database Management Systems 26 Raghu Ramakrishnan Problems with Decompositions v There are three potential problems to consider: ¶ Some queries become more expensive. u e.g., How much did sailor Joe earn? (salary = W*H) · Given instances of the decomposed relations, we may not be able to reconstruct the corresponding instance of the original relation! u Fortunately, not in the SNLRWH example. ¸ Checking some dependencies may require joining the instances of the decomposed relations. u Fortunately, not in the SNLRWH example. v Tradeoff : Must consider these issues vs. redundancy.

27 Database Management Systems 27 Raghu Ramakrishnan Lossless Join Decompositions v Decomposition of R into X and Y is lossless-join w.r.t. a set of FDs F if, for every instance r that satisfies F: – ( r ) ( r ) = r v Definition extended to decomposition into 3 or more relations in a straightforward way. v It is essential that all decompositions used to deal with redundancy be lossless! (Avoids Problem (2).)

28 Database Management Systems 28 Raghu Ramakrishnan More on Lossless Join v The decomposition of R into X and Y is lossless-join wrt F if and only if the closure of F contains: –X Y X, or –X Y Y v In particular, the decomposition of R into UV and R - V is lossless-join if U V is empty and U V holds over R. Not a Lossless Join

29 Database Management Systems 29 Raghu Ramakrishnan Decomposition into BCNF v Consider relation R with FDs F. If X Y violates BCNF and Y is single attribute, decompose R into R - Y and XY. –Repeated application of this idea will give us a collection of relations that are in BCNF; lossless join decomposition, and guaranteed to terminate. –e.g., CSJDPQV, key C, JP C, SD P, J S –To deal with SD P, decompose into SDP, CSJDQV. –To deal with J S, decompose CSJDQV into JS and CJDQV v In general, several dependencies may cause violation of BCNF. The order in which we ``deal with’’ them could lead to very different sets of relations!

30 Database Management Systems 30 Raghu Ramakrishnan Outline v E-R design v Relational model design v SQL conceptual evaluation v Normalization v Query optimization

31 Database Management Systems 31 Raghu Ramakrishnan Overview of Query Optimization v Plan : Tree of R.A. ops, with choice of alg for each op. – Each operator typically implemented using a `pull’ interface: when an operator is `pulled’ for the next output tuples, it `pulls’ on its inputs and computes them. v Two main issues: – For a given query, what plans are considered? u Algorithm to search plan space for cheapest (estimated) plan. – How is the cost of a plan estimated? v Ideally: Want to find best plan. Practically: Avoid worst plans! v We will study the System R approach.

32 Database Management Systems 32 Raghu Ramakrishnan Schema for Examples v Similar to old schema; rname added for variations. v Reserves: – Each tuple is 40 bytes long, 100 tuples per page, 1000 pages. v Sailors: – Each tuple is 50 bytes long, 80 tuples per page, 500 pages. Sailors ( sid : integer, sname : string, rating : integer, age : real) Reserves ( sid : integer, bid : integer, day : dates, rname : string)

33 Database Management Systems 33 Raghu Ramakrishnan Motivating Example v Cost: 1000+500*1000 I/Os v By no means the worst plan! v Misses several opportunities: selections could have been `pushed’ earlier, no use is made of any available indexes, etc. v Goal of optimization: To find more efficient plans that compute the same answer. SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid=S.sid AND R.bid=100 AND S.rating>5 Reserves Sailors sid=sid bid=100 rating > 5 sname Reserves Sailors sid=sid bid=100 rating > 5 sname (Simple Nested Loops) (On-the-fly) RA Tree: Plan:

34 Database Management Systems 34 Raghu Ramakrishnan Alternative Plans 1 (No Indexes) v Main difference: push selects. v With 5 buffers, cost of plan: – Scan Reserves (1000) + write temp T1 (10 pages, if we have 100 boats, uniform distribution). – Scan Sailors (500) + write temp T2 (250 pages, if we have 10 ratings). – Sort T1 (2*2*10), sort T2 (2*4*250), merge (10+250) – Total: 4060 page I/Os. v If we used BNL join, join cost = 10+4*250, total cost = 2770. v If we `push’ projections, T1 has only sid, T2 only sid and sname : – T1 fits in 3 pages, cost of BNL drops to under 250 pages, total < 2000. Reserves Sailors sid=sid bid=100 sname (On-the-fly) rating > 5 (Scan; write to temp T1) (Scan; write to temp T2) (Sort-Merge Join)

35 Database Management Systems 35 Raghu Ramakrishnan Alternative Plans 2 With Indexes v With clustered index on bid of Reserves, we get 100,000/100 = 1000 tuples on 1000/100 = 10 pages. v INL with pipelining (outer is not materialized). v Decision not to push rating>5 before the join is based on availability of sid index on Sailors. v Cost: Selection of Reserves tuples (10 I/Os); for each, must get matching Sailors tuple (1000*1.2); total 1210 I/Os. v Join column sid is a key for Sailors. –At most one matching tuple, unclustered index on sid OK. –Projecting out unnecessary fields from outer doesn’t help. Reserves Sailors sid=sid bid=100 sname (On-the-fly) rating > 5 (Use hash index; do not write result to temp) (Index Nested Loops, with pipelining ) (On-the-fly)

36 Database Management Systems 36 Raghu Ramakrishnan Statistics and Catalogs v Need information about the relations and indexes involved. Catalogs typically contain at least: – # tuples (NTuples) and # pages (NPages) for each relation. – # distinct key values (NKeys) and NPages for each index. – Index height, low/high key values (Low/High) for each tree index. v Catalogs updated periodically. – Updating whenever data changes is too expensive; lots of approximation anyway, so slight inconsistency ok. v More detailed information (e.g., histograms of the values in some field) are sometimes stored.


Download ppt "Database Management Systems 1 Raghu Ramakrishnan Database Design Review Aug 24, 2015 Instructor: Xintao Wu."

Similar presentations


Ads by Google