Presentation is loading. Please wait.

Presentation is loading. Please wait.

Department of Computer Science, HKUST 1 Comp 231 Database Management Systems Comp 231 Database Management Systems 3. Relational Data Model.

Similar presentations


Presentation on theme: "Department of Computer Science, HKUST 1 Comp 231 Database Management Systems Comp 231 Database Management Systems 3. Relational Data Model."— Presentation transcript:

1

2 Department of Computer Science, HKUST 1 Comp 231 Database Management Systems Comp 231 Database Management Systems 3. Relational Data Model

3 Department of Computer Science, HKUST 2 Basic Concepts Entities and relationships are stored in tables Relationships are captured by including key of one table into another Languages for manipulating the tables All popular DBMSs today are based on relational data model (or an extension of it, e.g., object-relational data model)

4 Department of Computer Science, HKUST 3 Why is it so good? Simplicity, everybody knows how to manipulate tables Tables are simple enough so that solutions to complicated problems such as concurrency control and query optimization can be obtained It has a theoretical basis for the studying of database design problems Tables are logical concepts; physically tables can be stored in different ways  support data independence

5 Department of Computer Science, HKUST 4 Terminology Relation  table; denoted by R(A 1, A 2,..., A n ) Attribute  column; denoted by A i Tuple  row Attribute value  value stored in a table cell Domain  legal type and range of values of an attribute denoted by dom(A i ) –Attribute: AgeDomain: [0-100] –Attribute: EmpNameDomain: 50 alphabetic chars –Attribute: SalaryDomain: non-negative integer Ideally, a domain can be defined in terms of another domain; e.g., the domain of EmpName is PersonName. However, this is NOT allowed in most basic DBMSs.

6 Department of Computer Science, HKUST 5 Attributes/Columns Tuples/Rows Relation/Table Name An Example Relation

7 Department of Computer Science, HKUST 6 Some Formal Definitions A relation is denoted by: R(A 1, A 2,..., A n ) –STUDENT(Name, Student-id, Age, CGA) Degree of a relation: the number of attributes n in the relation. Tuple t of R(A 1, A 2,..., A n ): An ordered set of values where each v i is an element of dom(A i ). Relation instance r(R): A set of tuples in R r(R) = {t 1, t 2,..., t m }, or alternatively r(R)  dom(A1)  dom(A2) ...  dom(An)

8 Department of Computer Science, HKUST 7 Relation and Cartesian Product A relation is the Cartesian product of sets of values Name = { Lee, Cheung } Grade = { A, B, C } A relation StudentGrade (Name, Grade) can be defined as any subset of the following maximal extent: Name  Grade ={  Lee, A ,  Lee, B ,  Lee, C ,  Cheung, A ,  Cheung, B ,  Cheung C  } r(StudentGrade)  Name  Grade = {  Lee, A   Cheung C  }

9 Department of Computer Science, HKUST 8 After all, why don’t we call them tables? Observe which of the following would you consider a table: ??? A empty table/relation is still a table/relation (can be denoted by R = {  }) A null table/relation is still a null table/relation (can be denoted by R =  )

10 Department of Computer Science, HKUST 9 Characteristics of Relations Tuples in a relation are not considered to be ordered, even though they appear to be in a tabular form. Ordering of attributes in a relation schema R are significant because the mathematical definition of a relation doesn’t include attribute names. Values in a tuple: All values are considered atomic. A special null value is used to represent values that are unknown or inapplicable to certain tuples.

11 Department of Computer Science, HKUST 10 Let K  R (I.e., K is a set of attributes which is a subset of R) K is a superkey of R if K can identify a unique tuple of each possible relation r(R). Keys Customer(CusNo, Name, Address, …) where customers have unique customer numbers and unique names. superkeys: CusNo Name {CusNo, Name} {CusNo, Name, Address} plus many others K is a candidate key if K is minimal There are two candidate keys: CusNo and Name Every relation is guarantee to (must) have at least one key. Why?

12 Department of Computer Science, HKUST 11 Relational Algebra Relational Algebra

13 Department of Computer Science, HKUST 12 It is a procedural language, I.e., sequence of operations is explicitly specified in the language Six basic operations: select, project, union, set difference, Cartesian product, rename E.g., R = S  T  U Each operation takes one or more relations as input and give a relation as output (a property known as closure allowing us to nest relational algebra expressions) Contrast relational algebra expressions with arithmetic expressions; the former operates on relations and the latter on numbers Relational Algebra

14 Department of Computer Science, HKUST 13 Notation:  p (r) Defined as:  p (r) = { t | t  r  P(t) } Read as: t is a tuple of r and t satisfies the predicate P. P is a formula of the form: attribute name op attribute name orattribute name op constant where op is one of       Predicates can be connected by  (and)  (or)  (not) Select Operation

15 Department of Computer Science, HKUST 14 Relation r: Select Operation - Example  A  B  D  5 (r) :

16 Department of Computer Science, HKUST 15 Notation:  A 1, A 2,..., A k (r) where A 1, A 2,..., A n are attributes defined in r After the projection, duplicates are assumed to be eliminated automatically since a relation is a set! Not true in real DMBSs: duplicates are not eliminated until explicitly told to do so Order of attributes in the projection list can be arbitrary. (One use of projection is to rearrange attributes) Project Operation

17 Department of Computer Science, HKUST 16 Project Operation - Example Relation r:  A,C (r):

18 Department of Computer Science, HKUST 17 Union Operation Notation: r  s Defined as: r  s = { t | t  r  t  s } For r  s to be valid (r and s must be compatible), I.e., –r, s must have the same number of attributes –domains must be compatible pair-wise between r and s. R(EmpName, Age)S(StudName, Age) T(ProdID, Name)U(StudName, Age, CGA) Compatible relations: R and S, R and  StudName, Age U, etc. Incompatible relations: R and T, R and U, etc.

19 Department of Computer Science, HKUST 18 Union Operation - Example Relation r, s: r s r  s : AB  1  3  1  2 AB  1  1  2 AB  3  2

20 Department of Computer Science, HKUST 19 Set Difference Operation Notation: r  s Defined as: r  s = { t | t  r  t  s } For r  s to be valid: –r, s must have the same number of attributes –domains must be compatible pair-wise between r and s.

21 Department of Computer Science, HKUST 20 Set Difference Operation - Example Relation r, s: r s r  s : AB  2  3 AB  1  1 AB  1  1  2

22 Department of Computer Science, HKUST 21 Cartesian-Product Operation Notation: r  s Defined as: r  s = { t q | t  r  q  s } Assume that attributes of r and s are disjoint (I.e., r and s don’t have attributes with the same name) If they are not disjoint, then make them disjoint by renaming the attributes concerned

23 Department of Computer Science, HKUST 22 Cartesian-Product Operation - Example Relation r, s: r s r  s :

24 Department of Computer Science, HKUST 23 Cartesian product is rarely used by its own; it is the basis of the join operation, which is more popular r s =  A  C (r  s) A and C are sets of attributes which have the same attribute names in r and s. Join Operation

25 Department of Computer Science, HKUST 24 R = ( A, B, C, D ) S = ( E, B, D ) R joins S has the result schema: T(A, B, C, D, E) and is defined by: r s =  r.A,r.B,r.C,r.D,s.E (  r.B=s.B  r.D=s.D (r  s)) This is also called natural join Join Operation - Example B DB D

26 Department of Computer Science, HKUST 25 Natural Join Operation - Example rs Relation r, s: r s : ABCD   γ   1 1 4 2 2 γ  a a b a b  γ  BDE 1 2 1 3 3 γ  a b a a b    ABCDE      1 1 1 1 2     a a a a b γ γ γ γ 

27 Department of Computer Science, HKUST 26 Assignment Operation The assignment operation(  ) provides a convenient way to express complex queries by storing intermediate results into temporary relations Assignment must always be made to a temporary relation variable. Example: Write r  s as temp1   R-S (r ) temp2   R-S ((temp1  s) -  R-S,S ( r)) result = temp1 - temp2 –The result to the right of the  is assigned to the relation variable on the left of the . –May use variable in subsequent expressions.

28 Department of Computer Science, HKUST 27 Example Queries Find all customers who have an account from both the “Downtown” and “Uptown” branches. depositor (customer-name, account-no) account (account-no, branch-name, branch-city) Query 1  CN (  BN=“Downtown” (depositor account))   CN (  BN=“Uptown” (depositor account)) where CN denotes customer-name and BN denotes branch- name. Query 2 (for the purpose of illustrating division only) temp  {“Downtown”, “Uptown”} [  customer-name, branch-name (depositor account) ]  temp

29 Department of Computer Science, HKUST 28 all branches in Brooklyn Returns all branches of a customer given a customer x and the set of branches a, b, c, check  x,a   x,b  and  x,c  are in …  customer-name, branch-name (depositor account)   branch-name (  branch-city = “Brooklyn” (branch)) Example Queries Dividend contains all customer-branch information Divisor contains all branches located in Brooklyn Find all customers who have an account at all branches located in Brooklyn. depositor (customer-name, account-no) account (account-no, branch-name, branch-city)

30 Department of Computer Science, HKUST 29 Tuple Relational Calculus

31 Department of Computer Science, HKUST 30 Tuple Relational Calculus A nonprocedural query language, where each query is of the form { t | P(t) } It is the set of all tuples t such that predicate P is true for t t is a tuple variable; t[A] denotes the value of tuple t on attribute A t  r denotes that tuple t is in relation r P is a formula similar to that of the predicate calculus

32 Department of Computer Science, HKUST 31 Predicate Calculus Formula 1. Set of attributes and constants 2. Set of comparison operators: (e.g.,,  ) 3. Set of connectives: and (  ), or (  ), not (  ) 4. Implication (  ): x  y, if x is true, then y is true and if x is NOT true, then y could be either true OR false x  y   x  y 5. Set of quantifiers:  t  r (Q(t))  “there exists” a tuple t in relation r such that predicate Q(t) is true  t  r (Q(t))  Q is true “for all” tuples t in relation r

33 Department of Computer Science, HKUST 32 Banking example branch(branch-name, branch-city, assets) customer (customer-name, customer-street, customer-city) account (branch-name, account-number, balance) loan (branch-name, loan-number, amount) depositor (customer-name, account-number) borrower (customer-name, loan-number)

34 Department of Computer Science, HKUST 33 t A Simple Selection Query Find the branch-name, loan-number, and amount for loans of over $1200 {t | t  loan  t[amount]  1200} loan table amount t[amount] all relevant attributes are contained in: loan (branch-name, loan-number, amount) all attributes in tuple t is returned; {t[name] | … …} to get only the name attribute Intuitively, you can consider t as a variable (or pointer) which iteratively loops through each tuple in loan t is called a tuple variable

35 Department of Computer Science, HKUST 34 s A Join Query Find the loan number for each loan of an amount greater than $1200 and return the customer info {t | t  customer  (  s  loan) ( t[loan-number] = s[loan- number]  s[amount]  1200 ) } loan amt customer loan-no t t[loan-no] loan-no s[loan-no] s[amount] It is a join operation A customer may have many loans, but as long as one of his loans exceeds 1200, his information will be output again, all attributes in tuple t is returned

36 Department of Computer Science, HKUST 35 Peculiarity from the Textbook borrower (customer-name, loan-number) loan (branch-name, loan-number, amount) {t |  s  loan (t[loan-number] = s[loan-number]  s[amount]  1200)} Same query as the previous one, except that t is not defined as a tuple in borrower, so t is an arbitrary tuple variable The whole expression only mentions “loan-number” as an attribute of t, so we can only assume that t has only one attribute, namely, “loan-number” This query only returns loan numbers Let’s make it simpler (same as the first example!) {s[loan-number] | s  loan  s[amount]  1200} It is better to define the scope of a tuple variable first! You need the t  borrower part if you want to return the customer-street or city anyway!

37 Department of Computer Science, HKUST 36 Example Queries t’s name can be found in the borrower table or t’s name can be found in the depositor table t has NOT been explicitly defined as a tuple in the customer relation only attribute customer-name of tuple t is returned consider adding t  customer Question: can I use s  depositor instead of u  depositor ? Find the names of all customers having a loan, an account, or both at the bank { t | (  s  borrower) ( t[customer-name] =s[customer-name] )  (  u  depositor) (t[customer-name] = u[customer-name]) }

38 Department of Computer Science, HKUST 37 Example Queries Find the names of all customers who have a loan and an account at the bank. {t |  s  borrower (t[customer-name] =s[customer-name])   u  depositor (t[customer-name] = u[customer-name])} Alternative: {t [customer-name] | t  customer  (  s  borrower (t[customer-name] = s[customer-name])   u  depositor (t[customer-name] = u[customer-name]) )}

39 Department of Computer Science, HKUST 38 t s Perryridge u Example Queries customer borrower loan { t |  s  borrower (t[customer-name] = s[customer-name]   u  loan (u[branch-name] = “Perryridge”  u[loan-name] = s[loan-number]))}[textbook] Find the names of all customers having a loan at the Perryridge branch { t[name] | t  customer   s  borrower ( t[customer-name] = s[customer-name]   u  loan (u[branch-name] = “Perryridge”  u[loan-number] = s[loan-number] ) )}

40 Department of Computer Science, HKUST 39 Example Queries The inner  u  loan () cannot be taken out of the  s  borrower ( ) quantifier. {t | t  customer   s  borrower ( t[customer-name] = s[customer-name] )   u  loan (u[branch-name] = “Perryridge”  u[loan-number] = s[loan-number] )} t customer borrower loans u Perryridge s {t | t  customer   s  borrower ( t[customer-name] = s[customer-name] )   u  loan  s  borrower (u[branch-name] = “Perryridge”  u[loan-number] = s[loan-number] )} The two “s” in the two unnested existential quantifiers are not the same. where is “s” defined?

41 Department of Computer Science, HKUST 40 NOT-EXIST Example but t is not a depositor anywhere in the bank Find the names of all customers who have a loan at the Perryridge branch, but no account at any branch of the bank t is a borrower at Perryridge branch {t |  s  borrower ( t[customer-name] = s[customer-name]   u  loan ( u[branch-name] = “Perryridge”  u[loan-number] = s[loan-number] )   v  depositor (v[customer-name] = t[customer-name] ) ) } t customer borrower s depositor v Perryridge loan u

42 Department of Computer Science, HKUST 41 EXIST vs. NOT-EXIST  v  depositor (v[customer-name] = t[customer-name] ) Consider a customer t, there does not exist any depositor whose name is the same as t’s name.  v  depositor (v[customer-name]  t[customer-name] ) Consider a customer t, there exist (any) one depositor whose name is not the same as t’s name. t customer c b a depositor v c First query returns a and b Second query returns a, b, and c (c is qualified because you can find e in depositor which is  c) First query is the same as:  v  depositor v[cust-nm]  t[cust-nm] e

43 Department of Computer Science, HKUST 42 Example Queries t is a borrower the borrower has a loan and the loan is from Perryridge this part is for finding t’s city; not needed if you use t  customer Find the names of all customers having a loan from the Perryridge branch, and also return the cities they live in {t |  s  loan (s[branch-name] =“Perryridge”   u  borrower (u[loan-number] = s[loan-number]  t[customer-name] = u[customer-name]   v  customer (u[customer-name] =v[customer-name]  t[customer-city] = v[customer-city])))}

44 Department of Computer Science, HKUST 43 t lives in the same city as the Perryridge branch is located t has borrowed the loan some loan issued from Perryridge Example Queries s[branch-name] is already bound to “Perryridge”; want to find the city where Perryridge is in Find the names of all customers having a loan from the Perryridge branch and live in the the city where Perryridge branch is located. { t | t  customer   s  loan ( s[branch-name] =“Perryridge”   u  borrower ( u[loan-number] = s[loan-number]  t[customer-name] = u[customer-name]   v  branch ( v[branch-name] = s[branch-name]  t[customer-city] = v[branch-city] ) ) ) } branch(branch-name, branch-city, assets)

45 Department of Computer Science, HKUST 44 Query with Universal Quantifier if s is a branch in Brooklyn then there is an account u in from that branch such that t is the depositor of account u Find the names of all customers who have an account at all branches located in Brooklyn: {t |  s  branch ( s[branch-city] =“Brooklyn”   u  account ( s[branch-name] = u[branch-name]   s  depositor ( t[customer-name] = s[customer-name]  s[account-number] = u[account-number]) ) )}

46 Department of Computer Science, HKUST 45 Other Relational Algebra Operators

47 Department of Computer Science, HKUST 46 Outer Join An extension of the join operation that avoids loss of information. Computes the join and then adds tuples from one relation that do not match tuples in the other relation to the result of the join. Uses null values: –null signifies that the value is unknown or does not exist. –All comparisons involving null are false by definition.

48 Department of Computer Science, HKUST 47 Outer Join - Example Relation loan Relation borrower branch-nameloan-numberamount Downtown Redwood Perryridge L-170 L-260 L-230 3000 1700 4000 cust-nameloan-number Jones Hayes Smith L-170 L-230 L-155

49 Department of Computer Science, HKUST 48 Outer Join - Example loan branch-nameloan-numberamount Downtown Redwood Perryridge L-170 L-260 L-230 3000 1700 4000 borrower cust-nameloan-number Jones Hayes Smith L-170 L-230 L-155 branch-nameloan-numberamount Downtown Redwood L-170 L-230 3000 4000 cust-name Jones Smith Loan Borrower Join returns only the matching (or “good”) tuples The fact that loan L-260 has no borrower is not explicit in the result Hayes has borrowed an non-existent loan L-155 is also undetected

50 Department of Computer Science, HKUST 49 Left Outer Join -Example Left outer join: Loan borrower Keep the entire left relation (Loan) and fill in information from the right relation, use null if information is missing. branch-nameloan-numberamount Downtown Redwood Perryridge L-170 L-260 L-230 3000 1700 4000 cust-name Jones Smith null

51 Department of Computer Science, HKUST 50 Right and Full Outer Join - example Loan Borrower Loan borrower branch-name Downtown null Redwood amount 3000 4000 null cust-nameloan-number Jones Hayes Smith L-170 L-230 L-155 branch-name Downtown Perryridge Redwood amount 3000 4000 1700 cust-nameloan-number Jones null Smith L-170 L-230 L-260 null Hayes L-155 null

52 Department of Computer Science, HKUST 51 Aggregate functions Aggregation operator  takes a collection of values and returns a single value as a result. avg: average value min: minimum value max: maximum value sum: sum of values count: number of values G 1,G 2, … G n  F 1 A 1, F 2 A 2,…, F m A m ( E ) – E is any relational-algebra expression –G 1, G 2, …,G n is a list of attributes on which to group –F i is an aggregate function (avg, min, max, etc.) –A i is an attribute name to which F i applies

53 Department of Computer Science, HKUST 52 Aggregate function - example Relation r: sum c ( r )     A     B 7 7 3 10 C

54 Department of Computer Science, HKUST 53 Aggregate function - example Relation account grouped by branch-name: Branch-name  sum balance (account)

55 Department of Computer Science, HKUST 54 Division Operation is for Reference Only (not covered in class)

56 Department of Computer Science, HKUST 55 Division Operation Notation: r  s Suited to queries that include the phrase “for all.” Let r and s be relations on schemas R and S respectively, where R = (A 1, ….., A m, B 1, ….., B n ) S = (B 1, …., B n ) The result of r  s is a relation with attributes R - S = (A 1, …, A m ) r  s = { t | t  R-S ( r )   u  s ( tu  r ) }

57 Department of Computer Science, HKUST 56 rs r  s The result consists of attribute A only but not all of the 5 values. How to find out? u = 1, 2 Check if:  u  s ( tu  r ) Division Operation - example r  s = { t | t  R-S ( r )   u  s ( tu  r ) } B 1 2 Is ,1  and ,2  tuples in r? Is ,1  and ,2  tuples in r? Is ,1  and ,2  tuples in r? check γ and δ …     γ δ δ δ δ ε ε A 1 2 3 1 1 1 3 4 6 1 2 B A  ε t  R-S ( r ) A   δ ε γ

58 Department of Computer Science, HKUST 57 Relations r, s: Another Division Example r  s ABC α γ A A γ γ

59 Department of Computer Science, HKUST 58 Relation r, s: rs q Properties of Division Operation Let q = r  s Then q is the largest relation satisfying: q  s  r

60 Department of Computer Science, HKUST 59 Alternative Definition of Division Division is not a fundamental relational algebra operation since it can be defined with other basic algebra operations Let r(R) and s(S) be relations, and let S  R r  s =  R-S (r )   R-S ((  R-S (r )  s)   R-S,S (r )) To see why: –  R-S,S (r ) simply reorders attributes of r so that non-overlapping attributes appear on the left of those in s –  R-S (r ) contains all candidates for r  s (need to find the unqualified values) –  R-S (r)  s gives all possible combinations of “r” and s –(  R-S (r )  s)   R-S,S (r ) returns combinations which are not in r –  R-S ((  R-S (r )  s)   R-S,S (r )) gives those tuples t in  R-S (r) such that for some tuple u  s, tu  r.


Download ppt "Department of Computer Science, HKUST 1 Comp 231 Database Management Systems Comp 231 Database Management Systems 3. Relational Data Model."

Similar presentations


Ads by Google