Presentation is loading. Please wait.

Presentation is loading. Please wait.

Database Systems 236363 Relational Calculus. Relational Algebra vs. Relational Calculus Relational algebra queries are relatively easy to implement in.

Similar presentations


Presentation on theme: "Database Systems 236363 Relational Calculus. Relational Algebra vs. Relational Calculus Relational algebra queries are relatively easy to implement in."— Presentation transcript:

1 Database Systems 236363 Relational Calculus

2 Relational Algebra vs. Relational Calculus Relational algebra queries are relatively easy to implement in a programming language, yet the translation of a query from a natural language (or even SQL) to relational algebra is not intuitive In the Domain Relational Calculus (DRC) language and its variant Tuple Relational Calculus (TRC), the terms of the query are expressed as first-order logic expression, which is more similar to the way humans compose questions, in contrast to the “algorithmic” structure of relational algebra

3 Domain Relational Calculus (DRC)

4 DRC – Basic Concept DRC enables writing logical expressions describing conditions on the records Each such expression will have free variables (defined later); the truth/false value of the expression will be evaluated with respect to these variables The expression  (x 1,...,x n ) describes a condition  whose free variables are x 1,...,x n –  is also called a formula The free variables of a formula describe the attributes of the records – For any set of possible values  1,...,  n for the free variables x 1,...,x n, the expression  returns a truth value Either  (  1,...,  n ) is satisfied or  (  1,...,  n ) is not satisfied – R[A 1,...,A n ] = { :  (x 1,...,x n )} stands for the relation consisting of exactly all n-tuples satisfying  (x 1,...,x n )

5 Atomic Expressions Belongs to formulas – If R[A 1,...,A n ] is a relation in the database, then the formula R(x 1,...,x n ) is a formula with n free variables For any sequence of values  1,...,  n, the formula R(  1,...,  n ) is satisfied iff the relation includes a record with attributes corresponding to these values – We can also write  R instead of R(  1,...,  n ) Comparisons – Between variables and between variables and constants using the usual comparison operators such as “z=2”, “x<y”, etc. – The free variables are the ones appearing in the formula

6 Composing Expressions – Boolean Negation – If  (x 1,...,x n ) is a valid expression, then  (x 1,...,x n )=  (x 1,...,x n ) is also a valid expression with the same free variables – Meaning:  (x 1,...,x n ) is satisfied iff  (x 1,...,x n ) is not satisfied Disjunction – If  1 (x 1,...,x n ) and  2 (x 1,...,x n ) are valid expressions, then  (x 1,...,x n )=  1 (x 1,...,x n )  2 (x 1,...,x n ) is a valid expression that is satisfied if either  1 or  2 is satisfied (or both) – It is possible that only some of the variables x 1,...,x n in either  1 or  2 are free variables in this expression It is possible to “declare” for a given expression the free variables that do not appear in it explicitly

7 Additional Boolean Operations Conjunction – The expression  1 (x 1,...,x n )  2 (x 1,...,x n ) is satisfied for a given sequence of values x 1,...,x n iff it is satisfied for both  1 (x 1,...,x n ) and  2 (x 1,...,x n ) – It is equivalent to  ((  1 (x 1,...,x n ))  (  2 (x 1,...,x n ))) The parenthesis describe the order in which operators are applied in the composed expression when it is not obvious Implication – Written as  1 (x 1,...,x n )  2 (x 1,...,x n ) – This expression is equivalent to ((  1 (x 1,...,x n ))  (  2 (x 1,...,x n ))) Equivalence – Written as  1 (x 1,...,x n )  2 (x 1,...,x n ) – This expression is equivalent to (  1 (x 1,...,x n )  2 (x 1,...,x n ))  (  2 (x 1,...,x n )  1 (x 1,...,x n ))

8 The Existential Quantifier If  (x 1,...,x n ) is a valid expression, then  (x 2,...,x n )=  x 1  (x 1,...,x n ) is a valid expression whose free variables are x 2,...,x n – The meaning of this expression is that  (  2,...,  n ) is satisfied iff there exists a value  1 for which  (  1,...,  n ) is satisfied Example: – For attributes whose value ranges are the natural numbers,  (x 2,x 3 )=  x 1 ((x 2 > x 1 )  (x 1 > x 3 ) is satisfied iff x 2,x 3 are a pair of decreasing non-consecutive numbers

9 The Universal Quantifier Given a valid expression  (x 1,...,x n ), the expression  (x 2,...,x n )=  x 1  (x 1,...,x n ) has the following meaning: for any given sequence of values  2,...,  n,  (  2,...,  n ) is satisfied iff for every possible value  1,  (  1,...,  n ) is satisfied How do we define “every possible value”? – Naturally, if the set of possible values is infinite, verifying such an expressions becomes problematic… – For now, we assume that this is restricted to the range of possible values for the corresponding attribute – If the range is not clear, we need to state it explicitly – A similar problem also exists for  due to negation. In particular,  x 1  (x 1,...,x n ) is equivalent to  x 1  (x 1,...,x n )

10 Quantified and Free Variables Consider the formula  (y,z)=(  x  1 (x,y))  (  x  2 (x,z)) – The free variables here are y and Z – What about x? x appears twice, once on each side of the  operator. Yet, there is no relation between these occurances In fact, it is equivalent to the formula  (y,z)=(  x 1  1 (x 1,y))  (  x 2  2 (x 2,z)) – Denote R 1 [A,B] the relation defined by  1 and R 2 [A,C] the relation defined by  2 The equivalent expression in RA is (  B R 1 )  (  C R 2 ) – To enforce the same value on both sides of the  operator, the formula should be written  (y,z)=  x(  1 (x,y)  2 (x,z)) This is equivalent in RA to  B,C ( R 1 ⋈ R 2 )

11 Implementing RA Expressions in DRC Let  (x 1,...,x n ) be a formula representing T[A 1,...,A n ] – The relation  A 1,...,A m T[A 1,...,A n ] (for m :  x m+1,...,x n  (x 1,...,x m,x m+1,...,x n )} – To obtain   T[A 1,...,A n ] for a given  expression, denote  (x 1,...,x n ) the expression obtained from  by replacing each A i with X i (for each 1  i  n) We get the following first order logic expression   T={ :  (x 1,...,x n )   (x 1,...,x n )}

12 Cartesian Product in DRC Let  1 (x 1,...,x n ) be a formula representing T 1 [A 1,...,A n ] and  2 (x 1,...,x m ) be a formula representing T 2 [B 1,...,B m ] – We can represent T 1 xT 2 by the following expression T 1  T 2 = { :  1 (x 1,...,x n )  2 (x n+1,...,x n+m )} Notice that we need to use different sets of variables for  1 and  2

13 Subtract, Union, Intersection Let  1(x 1,...,x n ) be a formula representing T 1 [A 1,...,A n ] and  2(x 1,...,x n ) be a formula representing T 2 [A 1,...,A n ] We have: – T 1 \ T 2 = { :  1 (x 1,...,x n )   2 (x 1,...,x n )} – T 1  T 2 = { :  1 (x 1,...,x n )   2 (x 1,...,x n )} – T 1  T 2 = { :  1 (x 1,...,x n )   2 (x 1,...,x n )}

14 Composed Expressions in DRC Let  1 (x 1,...,x n,y 1,...,y m ) be a formula representing T 1 [A 1,...,A n,B 1,...,B m ] and  2 (y 1,...,y m,z 1,...,z k ) a formula representing T 2 [B 1,...,B m,C 1,...,C c ] – The natural join of T 1 and T 2 can be expressed as follows T 1 ⋈ T 2 = { :  1 (x 1,...,x n,y 1,...,y m )   2 (y 1,...,y m,z 1,...,z k )} – The semi-join can be expressed as T 1 ⋉ T 2 = { :  z 1,...,z k (  1 (x 1,...,x n,y 1,...,y m )   2 (y 1,...,y m,z 1,...,z k ))}

15 Formal Equivalence Proof In order to exemplify how equivalence proofs are structured, we will formally prove that if T 1 corresponds to  1 (x 1,...,x n,y 1,...,y m ) and T 2 corresponds to  2 (y 1,...,y m,z 1,...,z k ) then for the expression  (x 1,...,x n,y 1,...,y m,z 1,...,z k ) =  1 (x 1,...,x n,y 1,...,y m )  2 (y 1,...,y m,z 1,...,z k ) the following holds: T 1 ⋈ T 2 ={ :  (x 1,...,z k )} To that end, we need to show that for every two relations T 1 and T 2 and every sequence of values t=(  1,...,  n,  1,...,  m,  1,...,  k ): – If t  T 1 ⋈ T 2 then  (t) is satisfied – If  (t) is satisfied then t  T 1 ⋈ T 2

16 First Direction If t=(  1,...,  n,  1,...,  m,  1,...,  k )  T 1 ⋈ T 2 then by the definition of join, (  1,...,  n,  1,...,  m )  T 1 meaning that  1 (  1,...,  n,  1,...,  m ) is satisfied and (  1,...,  m,  1,...,  k )  T 2 meaning that  2 (  1,...,  m,  1,...,  k ) is also satisfied Hence,  1 (  1,...,  n,  1,...,  m )  2 (  1,...,  m,  1,...,  k ) =  (  1,...,  n,  1,...,  m,  1,...,  k ) is satisfied, as needed

17 Second Direction If  (t)=  1 (  1,...,  n,  1,...,  m )  2 (  1,...,  m,  1,...,  k ) is satisfied, then by the definition of conjunction,  1 (  1,...,  n,  1,...,  m ) is satisfied, meaning that (  1,...,  n,  1,...,  m )  T 1 Similarly,  2 (  1,...,  m,  1,...,  k ) is satisfied, meaning that (  1,...,  m,  1,...,  k )  T 2 Hence, by the definition of natural join T 1 ⋈ T 2, we have (  1,...,  n,  1,...,  m,  1,...,  k )  T 1 ⋈ T 2, as needed

18 Boolean Queries If  is a DRC expression without any free variables (but uses the database relations), then the query “{<>:  }” is still meaningful The result will be a relation without any attributes – It is the empty relation if  is not satisfied over the database – Otherwise, it is the relation that includes a single empty record We sometimes use the notation “  ” for such a query For example, the following query checks whether the relations R and S satisfy R ⊆ S: {<>:  x 1,...,x n (R(x 1,...,x n ) → S(x 1,...,x n ))}

19 Problematic Expressions Consider the DRC expression: { :  R(x)} – Can it be written in RA? – Can it be implemented in a table? How about the expression: { : x=y} ? We should avoid extremely large (and in particular infinite) tables as well as queries (and sub-queries) that require addressing extremely large (and in particular infinite) domains – In other words, we would like to restrict ourselves to the content of the DB and avoid being dependent on the domains

20 Safe DRC Safe-DRC is intended to ensure that all expressions can be implemented without having to scan entire domains – This is both in order to prepare the input and in order to evaluate expressions with quantifiers A DRC expression is called safe if it satisfies some additional requirements that ensure that its result only depends on the relations in the database and not on the entire possible range of attribute values – Not every DRC expression has an equivalent Safe-DRC expression A more formal definition is given in the tutorial

21 DRC – Summarizing Example We are going to compare between a few RA and DRC queries Station Height S_Name S_Type Line L_Type L_Num Direction Serves Km Train T_Num Days Service T_Category Class Food Gives Arrives Platform D_TimeA_Time

22 The Relevant Relations Since in DRC the order of variables may impact the result, we will first define the order of attributes for each relation: – Station[S_Name, Height] Station_Type[S_Name,S_Type] – Line[L_Num,Direction,L_Type] Serves[S_Name,L_Num,Direction,Km] – Train[T_Num,Days] Service[T_Category,Class,Food] – Gives[T_Num,T_Category,Class] – Arrives[T_Num,S_Name,L_Num,Direction,A_Time,D_Time,Platform]

23 Sample Query in DRC Which stations are served on Line 1-south? – As in RA, we can extract the data from “Serves” { :  y,z,w ( Serves(x,y,z,w)  y=1  z=“south” )} – This translates directly to RA as  S_Name (  (L_Num=1)  (Direction=“south”) (Serves)) – Such an expression can be also written in a shortened manner as { :  w Serves(x,1,“south”,w)}

24 Another Example Which lines have stations below sea level? – For every pair of variables y,z representing a line (L_Num,Direction) for which there are stations, we can identify the stations using the variable x for which there is some w such that Serves(x,y,z,w) is true – Hence, the expression { :  x,w(Serves(x,y,z,w)  (x))} where  (x) is satisfied iff x is a station below sea level – For this, we have  (x) =  u(Station(x,u)  (u < 0 )) – Resulting in the following query { :  x,w(Serves(x,y,z,w)  (  u(Station(x,u)  (u<0)))}

25 The Other Direction What does the following query means? { :  y 1,y 2 (Station(y 1,y 2 )   z 1,z 2,z 3,z 4,z 5 (Arrives(x,y 1,z 1,z 2,z 3,z 4,z 5 )))} – The sub-expression  z 1,z 2,z 3,z 4,z 5 (Arrives(x,y 1,z 1,z 2,z 3,z 4,z 5 )) represents that train number x stops in station called y 1 of a certain line number, direction, etc. – The sub-expression around the  sign (without the quantifiers) indicates that if (y 1,y 2 ) is a station then x is a train that stops there – Hence, the complete expression gives the train numbers stopping in all stations Always? How can this be fixed? – Personal exercise: write an expression that will only take into account stations in which trains stop at

26 A More Complex Example What are the lines that share a change station? – First, here is an expression for obtaining all change stations  1 (x)=  y(Station(x,y)  Station_Type(x,“change”)) – Next, we locate pairs of lines appearing with the same change station  2 (u 1,v 1,u 2,v 2 )=  x,w 1,w 2 (  1 (x)  Serves(x,u 1,v 1,w 1 )  Serves(x,u 2,v 2,w 2 )) – The complete query, including line types, is then { : Line(u 1,v 1,t 1 )  Line(u 2,v 2,t 2 )   2 (u 1,v 1,u 2,v 2 )  (u 1 ≠u 2 )}

27 Tuple Relational Calculus (TRC)

28 TRC The main difference between TRC and DRC is that in TRC variables’ value are complete records rather than a single attribute Hence, to define a query that returns a relation, we need an expression  (t) that has a single free variable (although it may include sub-expressions that have quantified variables) The general query structure is {t[A 1,...,A n ]:  (t)}, which returns a relation with attributes A 1,...,A n including all possible records having corresponding attributes for which  is satisfied

29 Atomic Formulas in TRC Belongs to – For a relation R[A 1,...,A n ] and variable t, the atomic formula “t  R” is satisfied iff the value of t is a record inside R Comparisons – We denote t[A] the value of attribute A in the variable t – For example, “t[A]=2” is satisfied if that value of A in t is 2 – “r[B]<s[C]” is satisfied if the values of the corresponding attributes of s and r satisfy this condition

30 Composed Formulas in TRC Boolean Formulas – These are written in TRC exactly the same as in DRC and have the same meaning – For example, the formula  1 (t 1,...,t k )  2 (t 1,...,t k ) is satisfied iff both  1 and  2 are satisfied with respect to t 1,...,t k Recall that here each t i is a complete record (tuple) Quantifiers – Are also written as expected – For example,  t 1  (t 1,...,t k ) is satisfied iff there is a possible record (regardless of whether it is in the database or not) for which  (t 1,...,t k ) is satisfied – here t 2,...,t k are free variables – Yet, here the quantifier applies to complete records (lines) rather than single attributes – When it is not clear from the context, we write the attributes of the quantified variable explicitly, e.g.,  t 1 [A 1,…,A n ]  (t 1,...,t k )

31 Sample Queries in TRC In the train operation example, what is the formula for “which stations are served by line 1-south”? – {t[S_Name] :  s[S_Name,L_Num,Direction,Km] ( s  Serves  t[S_Name]=s[S_Name]   s[L_Num]=1  s[Direction]=“south” )} What about “which lines serve stations below sea level”? – {t[L_Num,Direction] :  r,s (r  Station  s  Serves   s[L_Num]=t[L_Num]  s[Direction]=t[Direction]   r[S_Name]=s[S_Name]  r[Height]<0 )}

32 The Expressive Power of TRC and DRC Every query that can be expressed in TRC can also be expressed in DRC and vice versa In particular, TRC also has the problem that it can be used to define queries that cannot be implemented in a database Similarly to DRC, TRC also has a safe version – Safe-TRC and Safe-DRC have the same expressive power, i.e., any formula written by Safe-TRC can be expressed in Safe-DRC and vice versa

33 The Expressive Power of Relational Calculus The expressive power of relational calculus is at least as strong as RA since as we saw for DRC (and it is true also for TRC), any query in RA can be expressed in DRC Yet, there are expressions in DRC (and TRC) that cannot be expressed in RA – Although DRC enables writing expressions that cannot be implemented in a database Safe relational calculus (both Safe-DRC and Safe-TRC) as studied in the tutorials have the same expressive power as RA Relational calculus (both safe and unsafe) cannot be used for expressing transitive closure – E.g., “What are all the stations that can be reached from station S in a finite number of train changes?”


Download ppt "Database Systems 236363 Relational Calculus. Relational Algebra vs. Relational Calculus Relational algebra queries are relatively easy to implement in."

Similar presentations


Ads by Google