# Dzielenie relacyjne (Relational division) Bazy i hurtownie danych, TWO1, 2010 https://ophelia.cs.put.poznan.pl/webdav/dbdw/students/ dbdw-winter_2010-11/

## Presentation on theme: "Dzielenie relacyjne (Relational division) Bazy i hurtownie danych, TWO1, 2010 https://ophelia.cs.put.poznan.pl/webdav/dbdw/students/ dbdw-winter_2010-11/"— Presentation transcript:

Dzielenie relacyjne (Relational division) Bazy i hurtownie danych, TWO1, 2010 https://ophelia.cs.put.poznan.pl/webdav/dbdw/students/ dbdw-winter_2010-11/

References 1.V.M. Matos, R. Grasser, Assessing performance of the relational division operator. Data Base Management 2001, 22-20-30, 1-15 2.V.M. Matos, R. Grasser, A Simpler (and Better) SQL Approach to Relational Division, Journal of Information System Education 2002, 13 (2), 85-88. 21/11/2010Bazy i hurtownie danych2

Relational Division  The basic operators of the relational algebra: – Union (UNION) – Difference (MINUS/EXCEPT) – Cartesian product – Projection & selection (SELECT... FROM...)  Additional operators added to the relational algebra: – Join  most popular in practice – Rename (AS) – Intersection (INTERSECT) – Division 21/11/2010Bazy i hurtownie danych3

Relational Division  The division operator is less common than select-project-join queries, however, it is applicable to many common queries: – Find suppliers who supply all the engine parts – Find students who have taken all the core courses – Find customers who have ordered all items from a given line of products  The division operator can be also employed in data mining algorithms (e.g., generation of association rules) 21/11/2010Bazy i hurtownie danych4

Informal Definition  The division operator allows verifying whether or not a candidate subject is related to each of the values held in the base set.  The base set is called the divisor (or denominator T2[B]), and the table holding the subject’s data is called the dividend (or nominator T1[A, B]).  The expression T1[A, B]/T2[B] selects the A values from the dividend table T1[A, B], whose B values are a superset of those B values held in the divisor table T2[B]. 21/11/2010Bazy i hurtownie danych5

Informal Definition 21/11/2010Bazy i hurtownie danych T3[A] = T1[A,B] / T2[B] 6

More Practical Example #1 Given the relations ORDERS[SID, PID, QTY] and PRODUCTS[PID, PRICE] find all the stores that ordered at least 10 of all products priced over 15 \$ 21/11/2010Bazy i hurtownie danych SIDPIDQTY s2p225 s2p320 s2p420 s1p125 s1p44 s1p312 s1p230 s3p412 s3p214 ORDERS PIDPRICE p110 p225 p318 p420 PRODUCTS ORDERS[SID,PID] / PRODUCTS[PID] = { s2 } 7

More Practical Example #2 Given the relations ORDERS[SID, PID, PMNT] and PRODUCTS[PID, PRICE] we focus on ordered products priced over 15\$ and paid either with cash or a credit card. What would be the result of ORDERS[SID, PID, PMNT] / PRODUCTS[PID]? 21/11/2010Bazy i hurtownie danych SIDPIDPMNT s1p1cash s1p2bank s1p2credit s1p3cash s1p4bank s1p3credit s2p2cash s2p2bank s2p3credit ORDERS PIDPRICE p110 p225 p318 p414 PRODUCTS ORDERS[SID,PID,PMNT] / PRODUCTS[PID] = { (s1, credit) } 8

Formal Definition: Relational Algebra Let’s assume that the numerator table T1 always consists of two columns A and B, and the denominator has only one B attribute. Then, the expression T1[A, B]/T2[B] is semantically equivalent to: T1[A, B]/T2[B] = T1[A] – ((T1[A] × T2[B]) – T1[A, B])[A] 21/11/2010Bazy i hurtownie danych9

Formal Definition: Relational Algebra 21/11/2010Bazy i hurtownie danych10

Formal Definition: Tuple-Calculus Using relational tuple-calculus language, the division operator can be rephrased as follows: T1[A, B]/T2[B] = { t1[A] / t1  T1 and for-all t2 (t2  T2  exists t3 (t3  T1 and (t1[A] = t3[A]) and (t2[B] = t3[B]))) } 21/11/2010Bazy i hurtownie danych11

Formal Definition: NFNF Databases (Non First-Normal Form)  Assumption of a NFNF format (fields with sets of atomic values) results in a much simplified definition of the division operator in tuple-calculus (attribute *B in T1 and T2 is defined as a set of atomic values): T1[A, B]/T2[B] = { t1[A] / t1  T1 and t2  T2 and t2[*B]  t1[*B] } 21/11/2010Bazy i hurtownie danych12

Formal Definition: NFNF Databases 21/11/2010Bazy i hurtownie danych13

SQL Implementation: Q0 SELECT A FROM T1 WHERE B IN (SELECT B FROM T2) GROUP BY A HAVING COUNT(*) = (SELECT COUNT(*) FROM T2) 21/11/2010Bazy i hurtownie danych14

SQL Implementation: Q1 (Byzantine Method )  Based on the formal predicate calculus definition modified to fit SQL: – The universal quantifier for-all x (f(x)) replaced by not exists x (not f(x)) – The implication X  Y replaced by (not(X) or Y) T1[A, B]/T2[B] = { t1[A] / t1  T1 and not exists t2 (not(not( t2  T2) or (exists t3 (t3  T1 and (t1[A] = t3[A]) and (t2[B] = t3[B]))))) } 21/11/2010Bazy i hurtownie danych15

SQL Implementation: Q1 Previous definition is equivalent (following De Morgan’s law  not (P or Q) = not P and not Q ) to: T1[A, B]/T2[B] = { t1[A] / t1  T1 and not exists t2 (( t2  T2) and (not exists t3 (t3  T1 and (t1[A] = t3[A]) and (t2[B] = t3[B])))) } 21/11/2010Bazy i hurtownie danych16

SQL Implementation: Q1 SELECT DISTINCT x.A FROM T1 AS x WHERE NOT EXISTS (SELECT * FROM T2 y WHERE NOT EXISTS (SELECT * FROM T1 AS z WHERE (z.A=x.A) AND (z.B=y.B))) 21/11/2010Bazy i hurtownie danych17

SQL Implementation: Q2 Based on the algebraic definition of the division operator and broken into two steps: SELECT DISTINCT y.A, z.B INTO T3 FROM T1 AS y, T2 AS z WHERE NOT EXISTS (SELECT * FROM T1 WHERE (T1.A = y.A) AND (T1.B=z.B)) SELECT DISTINCT A FROM T1 WHERE NOT EXISTS (SELECT * FROM T3 WHERE (T3.A=T1.A)) 21/11/2010Bazy i hurtownie danych T1[A, B]/T2[B] = T1[A] – ((T1[A] × T2[B]) – T1[A, B])[A] 18

SQL Implementation: Q3 Based on the definition for the NFNF and tuple-calculus: SELECT DISTINCT x.A FROM T1 AS x WHERE (SELECT COUNT(*) FROM T2) = (SELECT COUNT(*) FROM T1, T2 WHERE (T1.A=x.A) AND (T1.B=T2.B)) 21/11/2010Bazy i hurtownie danych19

Zero Division  The divide operator is defined in such a way that T1[A,B]/T2[B] produces exactly all A values in T1 each time that T2[B] is empty.  An empty set would be a more appropriate answer  this is how Q0 works. 21/11/2010Bazy i hurtownie danych20

Experimental Evaluation of Q0…Q3  Assume basic structure of tables (T1[A, B], T2[B], integer or char)  Conduct an experiment with the following settings: – number of A-values in T1 = 10 000, – number of B-values in T1 = 100, – number of B-values in T2 = 0, 20, 40, 60, 80, 100.  Use provided scripts to generate sample tables and to run specific queries (Q0…Q3).  Observer performance of specific queries (execution time – CPU time).  Collect the observations in a tabular and graphical form. 21/11/2010Bazy i hurtownie danych https://ophelia.cs.put.poznan.pl/webdav/dbdw/students/ dbdw-winter_2010-11/ 21

Checking Execution Time 21/11/2010Bazy i hurtownie danych Turn SET STATISTICS TIME on (Tools  Options) 22

Download ppt "Dzielenie relacyjne (Relational division) Bazy i hurtownie danych, TWO1, 2010 https://ophelia.cs.put.poznan.pl/webdav/dbdw/students/ dbdw-winter_2010-11/"

Similar presentations