1 How to Write Correct SQL and Know It: A Relational Approach to SQL
A technical seminar for DBAs, data architects, DBMS implementers, database application programmers, and other database professionals, by C. J. Date
Copyright © C. J. Date 2008. All rights reserved. No part of this material may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photographic, or otherwise, without the explicit written permission of the copyright owner.

2 THESIS :
1. You're an SQL professional
2. But SQL is complicated and difficult (much more so than SQL advocates would have you believe)
3. And testing can never be exhaustive
4. So to have a hope of writing correct SQL, you must follow some discipline
5. Q: What discipline? A: The discipline of using SQL relationally
6. So you must know relational theory thoroughly too (as well as SQL itself)

3 USING SQL RELATIONALLY :
Why is this a good idea? What does it mean? Isn't SQL relational anyway? And in any case... What does "SQL" mean?*
Objectives:
1. Cover relational theory thoroughly /* what it is but not always why */
2. Apply that theory to SQL practice /* and explain esoteric SQL features */
* Ignore (e.g.) OLAP, dynamic SQL, user defined types, and other nonrelational stuff

4 PREREQUISITES :
This seminar is not for complete beginners... but it's not just a refresher course, either!
Aimed at database professionals:
  Know SQL reasonably well
  Know that relational theory is A Good Thing
Sadly, if your "relational" knowledge derives from SQL alone, you won't know the relational model as well as you should, and you might know some things that ain't so
SQL ≠ the relational model !!!

5 FOR EXAMPLE :
What exactly is first normal form?
What's the connection between relations and predicates?
What's semantic optimization?
What's an image relation?
What's semidifference and why is it important?
Why doesn't deferred integrity checking make sense?

6 What's a relation variable?
What's prenex normal form?
Can a relation have an attribute whose values are themselves relations?
Is SQL relationally complete?
What's The Information Principle?
How does XML fit with the relational model?

7 TERMINOLOGY :
Relational terms when discussing relational theory: relation, tuple, attribute (etc.)
SQL terms when discussing SQL: table, row, column (etc.)
Note: The equivalences are not exact!
One term I'll use in connection with both relational theory and SQL: operator (SQL uses operator, function, procedure, routine, method, but they all mean the same thing, pretty much)
Thus, e.g., "=", ":=", "+", SELECT, DISTINCT, UNION, SUM, GROUP BY, etc., etc., are all operators

8 WHY DO YOU NEED TO KNOW RELATIONAL THEORY ???
Because it's PRINCIPLES... FOUNDATIONS...
Professionals should know the foundations of their field
Technology and products (and SQL) change all the time, but principles ENDURE...
Hence emphasis on:
  Principles, not products
  Foundations, not fads
Compromises and tradeoffs might be necessary in "the real world" but should always be made from a position of conceptual strength

9 SOME NICE QUOTES :
"Those who are enamored of practice without theory are like a pilot who goes into a ship without rudder or compass and never has any certainty where he [sic] is going. Practice should always be based on a sound knowledge of theory." Leonardo da Vinci
"Languages die... mathematical ideas do not." G. H. Hardy

10 THEORY IS PRACTICAL !

11 UNFORTUNATELY...
"The gap between theory and practice is not as wide in theory as it is in practice." Anon.

12 CODD'S ORIGINAL RELATIONAL MODEL : AN OVERVIEW
STRUCTURE:
  types ("domains")
  n-ary relations
  attributes, tuples
  keys: candidate, primary, foreign
INTEGRITY:
  entity integrity /* but I don't believe in nulls !!! */
  referential integrity
MANIPULATION:
  relational algebra: /* see later re relational calculus */
    intersection, union, difference, product
    restrict, project, join, divide
  relational assignment
[Figure: DEPT { DNO, LOC, BUDGET } and EMP { ENO, ENAME, DNO, SAL }]

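13 CODD'S ORIGINAL RELATIONAL ALGEBRA : AN OVERVIEW
[Figure: diagrams of restrict (select), project, product, union, intersect, difference, (natural) join, and divide]
Product: { a, b, c } times { x, y } gives { (a,x), (a,y), (b,x), (b,y), (c,x), (c,y) }
(Natural) join: { (a1,b1), (a2,b1), (a3,b2) } joined with { (b1,c1), (b2,c2), (b3,c3) } gives { (a1,b1,c1), (a2,b1,c1), (a3,b2,c2) }
Divide: { (a,x), (a,y), (a,z), (b,x), (c,y) } divided by { x, z } gives { a }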

14 THE SUPPLIERS-AND-PARTS DATABASE :
S    SNO  SNAME  STATUS  CITY
     S1   Smith  20      London
     S2   Jones  10      Paris
     S3   Blake  30      Paris
     S4   Clark  20      London
     S5   Adams  30      Athens

P    PNO  PNAME  COLOR  WEIGHT  CITY
     P1   Nut    Red    12.0    London
     P2   Bolt   Green  17.0    Paris
     P3   Screw  Blue   17.0    Oslo
     P4   Screw  Red    14.0    London
     P5   Cam    Blue   12.0    Paris
     P6   Cog    Red    19.0    London

SP   SNO  PNO  QTY
     S1   P1   300
     S1   P2   200
     S1   P3   400
     S1   P4   200
     S1   P5   100
     S1   P6   100
     S2   P1   300
     S2   P2   400
     S3   P2   200
     S4   P2   200
     S4   P4   300
     S4   P5   400

15 MODEL vs. IMPLEMENTATION :
Unfortunately the term "data model" is used in the IT world with two very different meanings:
Data model (first sense): An abstract, self-contained, logical definition of the objects, operators, and so forth, that together make up the abstract machine with which users interact. The objects allow us to model the structure of data. The operators allow us to model its behavior.
Implementation: The physical realization on a real machine of the components of the abstract machine that together constitute the data model in question.

16 MODEL vs. IMPLEMENTATION (cont.) :
Data model (second sense): A model of the persistent data of some particular enterprise (i.e., a logical DB design).
First meaning: Like a programming language, whose constructs can be used to solve many specific problems, but in and of themselves have no direct connection with any such specific problem
Second meaning: Like a specific program written in that language: it uses the facilities provided by the model (first meaning) to solve some specific problem

17 MODEL vs. IMPLEMENTATION (cont.) :
From here on "model" means the first sense (barring explicit statements to the contrary)
Don't confuse model vs. implementation !!!... e.g., don't confuse keys vs. unique indexes
Model vs. implementation implies (physical) data independence... Hence protection of investment
Everything to do with performance is primarily an implementation, not a model, issue!
/* and recommendations to follow are almost NEVER driven by performance concerns... */

18 E.g., "JOINS ARE SLOW" : MAKES NO SENSE !!!
    S JOIN SP /* good */
vs. /* bad */
    do for all tuples in S ;
       fetch S tuple into TS, TN, TT, TC ;
       do for all tuples in SP with SNO = TS ;
          fetch SP tuple into TS, TP, TQ ;
          emit tuple TS, TN, TT, TC, TP, TQ ;
       end ;
    end ;
Recommendation: Don't do this!
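The contrast on this slide can be sketched in Python with the standard sqlite3 module. The choice of SQLite, the column types, and the three-row sample are assumptions made purely for illustration; the seminar names no product. The declarative query and the hand-coded nested loop yield the same set of rows, but only the first leaves the access strategy to the DBMS.

```python
import sqlite3

# In-memory SQLite database; the product choice is an assumption made
# purely for illustration. Only a few rows of S and SP are loaded.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S  (SNO TEXT NOT NULL, SNAME TEXT NOT NULL,
                     STATUS INTEGER NOT NULL, CITY TEXT NOT NULL,
                     UNIQUE (SNO));
    CREATE TABLE SP (SNO TEXT NOT NULL, PNO TEXT NOT NULL,
                     QTY INTEGER NOT NULL, UNIQUE (SNO, PNO));
    INSERT INTO S  VALUES ('S1','Smith',20,'London'),
                          ('S2','Jones',10,'Paris'),
                          ('S5','Adams',30,'Athens');
    INSERT INTO SP VALUES ('S1','P1',300), ('S1','P2',200), ('S2','P1',300);
""")

# Declarative version: state WHAT is wanted; the DBMS decides HOW.
declarative = set(conn.execute("""
    SELECT S.SNO, S.SNAME, S.STATUS, S.CITY, SP.PNO, SP.QTY
    FROM   S, SP
    WHERE  S.SNO = SP.SNO
"""))

# Hand-coded version: the nested loop from the slide, which freezes one
# particular access strategy into application code.
hand_coded = set()
outer = conn.execute("SELECT SNO, SNAME, STATUS, CITY FROM S").fetchall()
for ts, tn, tt, tc in outer:
    inner = conn.execute("SELECT PNO, QTY FROM SP WHERE SNO = ?", (ts,))
    for tp, tq in inner:
        hand_coded.add((ts, tn, tt, tc, tp, tq))

print(declarative == hand_coded)  # True
```

Both versions denote the same result; whether one runs faster than the other is a property of the implementation, not of the join operator.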

19 PROPERTIES OF RELATIONS :
Every relation has a heading (a set of attribute names; more precisely, attribute-name:type-name pairs, but informally we often ignore the types) and a body (a set of tuples)
No. of attributes = degree, no. of tuples = cardinality
Relations never contain duplicate tuples /* SQL fails here */
The tuples of a relation are unordered, top to bottom
The attributes of a relation are unordered, left to right /* SQL fails here */

20 NOTE THAT :
Every subset of a tuple is a tuple... Every subset of a heading is a heading... Every subset of a body is a body
Tuple equality: Two tuples are EQUAL iff (= if and only if):
  They have the same attributes (i.e., same attribute-name/type-name pairs)
  And attributes with the same name have the same attribute value
  I.e., iff they're the same tuple !!!
Two tuples are duplicates iff they're equal
MANY features of the relational model rely on the above
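These properties can be mimicked with Python's built-in sets. The sketch below is only an informal model: the helper name `tup` is hypothetical, and attribute types are ignored (as the slides themselves often do informally). A tuple is a frozenset of (attribute name, value) pairs, and a relation body is a set of such tuples.

```python
# A tuple is modelled as a frozenset of (attribute name, value) pairs,
# so it has no left-to-right ordering of its components.
def tup(**components):
    return frozenset(components.items())

t1 = tup(SNO="S1", SNAME="Smith", STATUS=20, CITY="London")
t2 = tup(CITY="London", STATUS=20, SNAME="Smith", SNO="S1")  # same tuple

# A relation body is modelled as a set of tuples: no duplicates, no order.
body = {t1, t2, tup(SNO="S2", SNAME="Jones", STATUS=10, CITY="Paris")}

print(t1 == t2)    # True: same attributes, same values
print(len(body))   # 2: the duplicate collapses

# Every subset of a tuple is a tuple, including the empty tuple.
sub = frozenset(pair for pair in t1 if pair[0] in {"SNO", "CITY"})
empty = frozenset()
print(sub == tup(SNO="S1", CITY="London"))  # True
print(empty <= t1)                          # True
```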

21 MORE ON RELATIONS :
Relations are always normalized (i.e., in first normal form, 1NF)
A relation and a table aren't the same thing! A table can be regarded as a CONCRETE picture of an ABSTRACT idea (but it's a significant advantage of the relational model that its fundamental data objects have such a simple and easily understood concrete representation)
Base vs. derived relations /* see next page */

22 BASE vs. DERIVED RELATIONS :
Rel ops let us start with given rels and derive further rels (e.g., by doing queries)... Given rels are base ones, others are derived
Must be able to define base ones (CREATE TABLE in SQL), and base ones must be named
Certain derived rels, in particular views (aka virtual rels), are named too: e.g.,
    CREATE VIEW SST_PARIS AS
        SELECT SNO, STATUS
        FROM   S
        WHERE  CITY = 'Paris' ;

23 Value of view at time t = result of evaluating defining expression at time t
Can operate on views as if they were base rels...
Can think of view as being conceptually materialized at time of reference
But it isn't really materialized! /* at least, we hope not */
And materialization wouldn't work for updates anyway
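A small sketch of "value of view at time t", again using Python's sqlite3 module as a stand-in DBMS (an assumption; the column types and the three sample rows are also chosen only for illustration). The view SST_PARIS is queried before and after an update to the base table S, and its value changes accordingly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S (SNO TEXT NOT NULL, SNAME TEXT NOT NULL,
                    STATUS INTEGER NOT NULL, CITY TEXT NOT NULL,
                    UNIQUE (SNO));
    INSERT INTO S VALUES ('S1','Smith',20,'London'),
                         ('S2','Jones',10,'Paris'),
                         ('S3','Blake',30,'Paris');
    CREATE VIEW SST_PARIS AS
        SELECT SNO, STATUS FROM S WHERE CITY = 'Paris';
""")

def view_value():
    # The view's value "now" is whatever its defining expression yields now.
    return set(conn.execute("SELECT SNO, STATUS FROM SST_PARIS"))

before = view_value()
conn.execute("UPDATE S SET CITY = 'Paris' WHERE SNO = 'S1'")
after = view_value()

print(sorted(before))  # [('S2', 10), ('S3', 30)]
print(sorted(after))   # [('S1', 20), ('S2', 10), ('S3', 30)]
```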

24 POPULAR MISCONCEPTIONS :
What you often hear:
  Base rels "physically exist"
  Views don't "physically exist"
Wrong! RM deliberately has nothing to say about physical storage matters!
Also... it's all relations !!!

25 FROM A RECENT TEXTBOOK :
"[It] is important to make a distinction between stored relations, which are tables, and virtual relations, which are views... [We] shall use relation only where a table or a view could be used. When we want to emphasize that a relation is stored, rather than a view, we shall sometimes use the term base relation or base table."
How many confusions here?
No wonder there's so much confusion out there, if this is typical of the quality of the teaching (which it probably is)

26 ONE FURTHER (important) PRELIMINARY : RELATIONS vs. RELVARS
Historically there has been much confusion between relations as such (i.e., relation values) and relation variables
Consider:
    DECLARE N INTEGER ... /* programming language */
N is an integer variable whose values are integers per se
Likewise:
    CREATE TABLE T ... /* SQL */
T is a relation variable whose values are relations per se /* ignoring SQL quirks */
For example:

27 Relation variable S, with its current relation value:
    SNO  SNAME  STATUS  CITY
    S1   Smith  20      London
    S2   Jones  10      Paris
    S3   Blake  30      Paris
DELETE S WHERE CITY = 'Paris' ;
Shorthand for:
    S := S WHERE NOT ( CITY = 'Paris' ) ;
Relation variable S, with its new current relation value:
    SNO  SNAME  STATUS  CITY
    S1   Smith  20      London

28 HENCE :
INSERT / DELETE / UPDATE are all shorthand for some relational assignment, and (by definition) they all assign some relation value to some relation variable
A relation variable or relvar is a variable whose permitted values are relations
Base (or real) relvar: One that isn't virtual
Virtual relvar: One that's defined by means of some specified relational expression in terms of one or more other relvars
Henceforth: Relation means relation / relvar means relvar!... and we ought to start again
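The "update operators are shorthand for assignment" idea can be sketched with an ordinary Python variable whose values are sets of tuples. The helpers `tup` and `attr` are hypothetical, and the union form shown for INSERT is the commonly given expansion, used here as an illustrative assumption rather than a quotation from the slides.

```python
def tup(**components):
    # A tuple as a frozenset of (attribute name, value) pairs.
    return frozenset(components.items())

def attr(t, name):
    # Extract one attribute value from a tuple.
    return dict(t)[name]

# The variable s plays the role of relvar S; its value is a relation.
s = {
    tup(SNO="S1", SNAME="Smith", STATUS=20, CITY="London"),
    tup(SNO="S2", SNAME="Jones", STATUS=10, CITY="Paris"),
    tup(SNO="S3", SNAME="Blake", STATUS=30, CITY="Paris"),
}

# DELETE S WHERE CITY = 'Paris'
#   is shorthand for  S := S WHERE NOT ( CITY = 'Paris' )
s = {t for t in s if not (attr(t, "CITY") == "Paris")}
after_delete = s

# INSERT of one tuple, written as an assignment of a union.
s = s | {tup(SNO="S5", SNAME="Adams", STATUS=30, CITY="Athens")}
after_insert = s

print(len(after_delete), len(after_insert))  # 1 2
```

In each case a whole new relation value is assigned to the variable; the relation values themselves are never modified.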

29 BY THE WAY :
SQL doesn't support relational assignment as such... So the foregoing example
    S := S WHERE NOT ( CITY = 'Paris' ) ;
is expressed in Tutorial D... Self-explanatory (?) "toy" language used by Date and Darwen to illustrate the ideas of The Third Manifesto
In what follows, I'll use Tutorial D to illustrate relational concepts (as well as showing SQL analogs where applicable)

30 ASIDE : THE THIRD MANIFESTO
C. J. Date and Hugh Darwen: Databases, Types, and the Relational Model: The Third Manifesto (3rd edition, Addison-Wesley, 2006)
Proposal for future direction of data and DBMSs
D = any language that conforms to Manifesto principles (generic name)
Tutorial D = language used in Manifesto book as a basis for examples

31 VALUES vs. VARIABLES IN GENERAL :
VALUE : an "individual constant"
  no location in time or space
  can't be changed
  can be represented in memory (by some encoding)
VARIABLE : a holder for (the representation of) a value
  has location in time and space
  can be updated (i.e., current value can be replaced by another)
Important note: Values and variables (more fundamentally, types) can be arbitrarily complex
Hard to imagine people getting confused over such a basic distinction, but they do...

32 VALUE vs. VARIABLE CONFUSION : AN EXAMPLE
"We distinguish the declared type of a variable from... the type of the object that is the current value of the variable..."
  (so an object is a value)
"... we distinguish objects from values..."
  (so an object isn't a value after all) ???
"... a MUTATOR [is an operation such that it's] possible to observe its effect on some object."
  (in fact, an object is a variable) ?????

33 A GUIDING PRINCIPLE AND A GREAT AID TO CLEAR THINKING :
"All logical differences are big differences" (Wittgenstein)
Examples:
  Model vs. implementation
  Relation vs. table
  Value vs. variable
  Attribute vs. column
  Relation vs. relvar
  Tuple vs. row
  Base relvar vs. view
  SQL vs. relational model
  Data model (1st sense) vs. data model (2nd sense)
  DB vs. DBMS
  Expression vs. statement

34 STRUCTURE OF PRESENTATION :
1. Setting the scene
2. Types and domains
3. Tuples and relations, rows and tables
4. No duplicates, no nulls
5. Base relvars, base tables
6. SQL and algebra I: The original operators
7. SQL and algebra II: Additional operators
8. SQL and constraints
9. SQL and views
10. SQL and logic I: Relational calculus
11. SQL and logic II: Using logic to write SQL
12. Further SQL topics
13. Appendix: The relational model
14. Appendix: DB design

35 RELATIONS ARE DEFINED OVER TYPES :
RM implies support for user defined types (hence, user defined operators also); hence, an "object/relational" DBMS done right is just a relational DBMS done right!
RM attributes can be of any type whatsoever, except (a) no pointer valued attributes; (b) relation r cannot have an attribute of the same type as r itself (see later)
But whole point about user defined types is: They look just like system defined types to other users... So I'll just assume types are system defined (mostly)
RM prescribes type BOOLEAN... Assume CHAR, INTEGER, FIXED available too /* see later for SQL */

36 DOMAINS AND TYPES ARE THE SAME THING :
1. Equality comparisons and "domain check override" (DCO): domains really are types...
   Note: Assume for sake of discussion that SNO attribs in S and SP are of user defined type SNO... PNO attribs in P and SP are of user defined type PNO
   Caveat: Only fair to warn you that I discuss "DCO" only to dismiss it... as we'll see
2. Data value atomicity and first normal form: ... of arbitrary complexity

37 EQUALITY COMPARISONS :
"Everyone knows" that two values can be tested for equality only if they come from the same domain
E.g., with suppliers and parts:
    SP.SNO = S.SNO /* OK */
    SP.PNO = S.SNO /* not OK */
Any relational op (join, union, etc.) that calls for an explicit or implicit equality comparison between values from different domains should fail /* at compile time */
E.g.,
    SELECT S.SNO, S.SNAME, S.STATUS, S.CITY
    FROM   S
    WHERE  NOT EXISTS
         ( SELECT *
           FROM   SP
           WHERE  SP.PNO = S.SNO ) /* not OK: probably a typo */

38 EQUALITY COMPARISONS (cont.) :
Comparison "SP.PNO = S.SNO" is INVALID unless user insists... (Codd's "domain check override" ops)
BUT, according to Codd:
    P.WEIGHT = SP.QTY /* not OK */
    P.WEIGHT - SP.QTY = 0 /* OK... ?!?!? */
"... DBMS checks that the basic data types are the same" [Codd's book on RM/V2, p. 47, italics added]
So there's something strange about Codd-style domain checks in the first place, let alone "domain check override"

39 "DOMAIN CHECK OVERRIDE" :
Indeed, "domain check override" (DCO) is not the appropriate concept (in fact, it makes no sense AT ALL *)...
Consider comparisons:
    S.SNO = 'X4'     /* valid   */
    P.PNO = 'X4'     /* valid   */
    S.SNO = P.PNO    /* invalid */
What's going on ??? Well...
* Stems from failure to recognize another logical difference! (see next page)

40 SNO, PNO are types (represented internally in terms of type CHAR, say), but representation is (or should be) irrelevant and HIDDEN! (it's an implementation issue)... Logical difference between type and representation
Also selector operators SNO, PNO that effectively convert CHAR values to types SNO, PNO, invoked implicitly in:
    S.SNO = 'X4'
    P.PNO = 'X4'
(i.e., strings coerced to type SNO or PNO: see later)
Plus operators for inverse conversions too (in effect)
This mechanism provides domain checking and "DCO" capability in a clean, fully orthogonal, non ad hoc manner
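A rough Python sketch of this mechanism follows. Everything in it is hypothetical scaffolding: the base class, the `to_char` name for the inverse conversion, and the decision to raise TypeError at run time (the slides call for a compile-time failure, which Python cannot express directly). The selectors are also invoked explicitly here, whereas the slide has them invoked implicitly by coercion.

```python
class _UserDefinedScalar:
    """Hypothetical base: a value whose string representation is hidden."""

    def __init__(self, rep: str):
        self._rep = rep  # internal representation, not part of the type

    def __eq__(self, other):
        # Equality is defined only between values of the very same type.
        if type(other) is not type(self):
            raise TypeError(
                f"cannot compare {type(self).__name__} "
                f"with {type(other).__name__}"
            )
        return self._rep == other._rep

    def __hash__(self):
        return hash((type(self).__name__, self._rep))

    def to_char(self) -> str:
        # Explicit inverse conversion back to a character string.
        return self._rep


class SNO(_UserDefinedScalar):
    """Supplier numbers; SNO('X4') acts as a selector invocation."""


class PNO(_UserDefinedScalar):
    """Part numbers; PNO('X4') acts as a selector invocation."""


same_type = SNO("X4") == SNO("X4")  # valid comparison

try:
    SNO("X4") == PNO("X4")          # invalid: different types
    cross_type_rejected = False
except TypeError:
    cross_type_rejected = True

# The effect of "domain check override", obtained by explicit conversion:
explicit = SNO("X4").to_char() == PNO("X4").to_char()

print(same_type, cross_type_rejected, explicit)  # True True True
```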

41 What we're really talking about is STRONG TYPING
Which incidentally would correctly deal with expressions such as
    P.WEIGHT * SP.QTY      /* WEIGHT  */
    P.WEIGHT + SP.QTY      /* invalid */
    SPX.QTY + SPY.QTY      /* QTY (SPX and SPY both shipments) */
etc., etc.

42 DATA VALUE ATOMICITY :
First normal form (1NF) requires every attribute value in every tuple to be "atomic"
Codd defines atomic as "nondecomposable by the DBMS (excluding certain special functions)"
But this defn is a trifle puzzling, and/or not very precise... What about:
  strings (SUBSTR, LIKE, etc.)?
  numbers (INTEGER, FRACTION, etc.)?
  dates (YEAR, MONTH, DAY)?
  times (HOUR, MIN, SEC)?
Not to mention, e.g., view defns in the catalog

43 NOW WATCH VERY CAREFULLY !!!
R1   SNO  PNO
     S2   P1
     S2   P2
     S3   P2
     S4   P2
     S4   P4
     S4   P5
This one is clearly 1NF...

R2   SNO  PNO
     S2   P1,P2
     S3   P2
     S4   P2,P4,P5
This one is clearly NOT 1NF... PNO is "repeating group" or "multivalued" (?)

R3   SNO  PNO_SET
     S2   {P1,P2}
     S3   {P2}
     S4   {P2,P4,P5}
But this one is 1NF again !!!

44 Values of PNO_SET in R3 are no more and no less "decomposable by the DBMS" than are strings, dates, etc.
(R3 might not be a good DESIGN; that's a separate issue)
The real point: "Atomicity" has no absolute meaning!

45 A CLOSER LOOK AT R3 :
     SNO  PNO_REL /* note name change */
     S2   relation with attribute PNO and tuples P1, P2
     S3   relation with attribute PNO and tuple P2
Values in PNO_REL position are RELATIONS! ... PNO_REL is a relation-valued attribute (RVA)
/* no table valued columns in SQL, though SQL does support columns with values that are multisets of rows */
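The relationship between R1 and R3 can be sketched in Python. This is a simplification under stated assumptions: tuples are written positionally for brevity, and each PNO_REL value is modelled as a frozenset of part numbers rather than a full relation of one-attribute tuples.

```python
# R1 from the earlier slide: one (SNO, PNO) pair per tuple.
r1 = {
    ("S2", "P1"), ("S2", "P2"),
    ("S3", "P2"),
    ("S4", "P2"), ("S4", "P4"), ("S4", "P5"),
}

# Group: one tuple per supplier, whose second component is itself a
# (simplified) relation value, held as an immutable frozenset.
r3 = {
    (sno, frozenset(pno for s, pno in r1 if s == sno))
    for sno, _ in r1
}

# Ungroup: recover R1 from R3, so no information is lost either way.
ungrouped = {(sno, pno) for sno, pno_rel in r3 for pno in pno_rel}

print(len(r3))          # 3
print(ungrouped == r1)  # True
```

Each PNO_REL value is a single value of its type, just as a string or a date is, which is the sense in which R3 remains in first normal form.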

46 A DOMAIN IS A DATA TYPE (summary) :
DOMAIN = TYPE
Domains, and therefore attributes, can contain ABSOLUTELY ANYTHING !!! (any values, that is)
Arrays, lists, relations, XML docs, photos,...
I.e., values of ARBITRARY COMPLEXITY
Without violating first normal form!
Recap: RM implies support for user defined types (hence, user defined ops also); hence, an "O/R" DBMS done right is just a relational DBMS done right!
From here on, favor type over domain

47 TO SPELL IT OUT ONE MORE TIME :
THE QUESTION AS TO WHAT TYPES ARE SUPPORTED IS ORTHOGONAL TO THE QUESTION OF SUPPORT FOR THE RELATIONAL MODEL
More succinctly: TYPES ARE ORTHOGONAL TO TABLES
The relational model has NEVER prescribed data types (it's never been implemented either, but that's another matter)

48 SO WHAT'S A TYPE ???
Basically, a named set of values: e.g., all possible integers (INTEGER); all possible character strings (CHAR); all possible supplier numbers (SNO); all XML docs... all fingerprints... all X rays... etc., etc.
Every value (in partic, every relation) is of some type (in fact, exactly one type /* so types disjoint */, unless type inheritance is supported) and carries its type with it
Every variable (in partic, every relvar), every attribute of every relation, every operator that returns a result, and every parameter of every operator is declared to be of some type

49 To say that variable V is of type T is to say that every value v that can legally be assigned to V is of type T
Aside: To say that V is a variable is to say that V is "assignable to" (i.e., updatable)
Every expression denotes some value and is of some type
  = type of value in question
  = type of value returned by outermost operator
E.g., type of ( a + b ) * ( x - y ) is whatever the declared type of "*" is

50 Associated with type T is a set of ops for operating on values and variables of type T... ("associated with" means op in question has parameter of declared type T)
E.g., system-defined type INTEGER:
  System defines ":=", "=", "<", etc., for assigning and comparing integers
  And "+", "*", etc., for arithmetic on integers
  Perhaps CAST to convert integers to char strings
  But not "||", SUBSTR, etc.

51 E.g., user-defined type SNO:
  Type definer defines ":=", "=", and maybe "<" etc., for assigning and comparing supplier numbers
  But not "+", "*", etc.
Subscript ops for arrays
Special arith ops for dates and times
XQuery ops for XML docs
... and so on

52 DEFINING A NEW TYPE INVOLVES AT LEAST ALL OF THE FOLLOWING :
1. Specifying a name for the type
2. Specifying the values that make up the type /* see later */
3. Specifying the physical representation /* ignore */
4. Specifying a selector op for selecting values of the type /* see later */
5. Specifying ops that apply to values and variables of the type... Must include "=" and ":=" !!!
6. For those ops that return a result, specifying the type of the result (so DBMS knows which expressions are legal, and type of result of every legal expression)

53 EXAMPLE (Tutorial D) :
Define type:
    TYPE POINT ... /* geometric points in 2D space */ ;
Define op REFLECT that, given point (x,y), returns inverse point (-x,-y):
    OPERATOR REFLECT ( P POINT ) RETURNS POINT ;
       RETURN POINT ( - THE_X ( P ), - THE_Y ( P ) ) ;
       /* POINT selector invocation... takes two */
       /* arguments (unlike SNO selector earlier) */
    END OPERATOR ;
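For readers more at home in a general-purpose language, here is a loose Python analog of the Tutorial D example. The names `Point` and `reflect` and the use of float coordinates are assumptions; the frozen dataclass stands in for the idea that a point is a value and cannot be changed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """Geometric point in 2D space; Point(x, y) acts as the selector."""
    x: float
    y: float


def reflect(p: Point) -> Point:
    # Returns a NEW point value; the argument is not modified.
    return Point(-p.x, -p.y)


print(reflect(Point(1.0, -2.0)))  # Point(x=-1.0, y=2.0)
```

Here `p` is the parameter of `reflect`, while `Point(1.0, -2.0)` is the argument supplied in one particular invocation.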

54 POINTS ARISING (sorry) :
Another important logical difference: argument vs. parameter
And another: operator vs. invocation
Selector is a generalization of the familiar concept of a literal

55 NOTE TOO THAT :
The values that make up a given type exist BEFORE the DB exists, WHILE the DB exists, and AFTER the DB exists... Better: They "have no location in time or space"
Defining type T just means "now we're interested in a certain set of values and we want to call it T"
Similarly for dropping type T
Values and sets of values don't "belong" to any particular DB!

56 SCALAR vs. NONSCALAR /* informal distinction */ :
Type is scalar if no user visible components, nonscalar otherwise
Values, variables, etc., of type T are scalar if T is scalar, nonscalar otherwise
Nonscalar example (Tutorial D):
    VAR S BASE RELATION
        { SNO CHAR, SNAME CHAR, STATUS INTEGER, CITY CHAR }
        KEY { SNO } ;
RELATION {...} is a relation type (nonscalar) /* order in which attribs specified insignificant */

57 TYPE GENERATORS :
RELATION {...} is also a generated type... obtained by invoking RELATION type generator (not defined by separate TYPE statement)
Example involving TUPLE type generator:
    VAR SINGLE_SUPPLIER
        TUPLE { STATUS INTEGER, SNO CHAR, CITY CHAR, SNAME CHAR } ;
Code fragment /* illustrating "tuple extraction" */ :
    SINGLE_SUPPLIER := TUPLE FROM ( S WHERE SNO = 'S1' ) ;
Note logical difference between tuple t and relation r containing just tuple t !!!
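Tuple extraction, and the difference between a tuple and a relation containing just that tuple, can be sketched in Python as follows. The function name `tuple_from` and the helper `tup` are hypothetical; the cardinality check reflects the requirement that the relation contain exactly one tuple.

```python
def tup(**components):
    # A tuple as a frozenset of (attribute name, value) pairs.
    return frozenset(components.items())


def tuple_from(relation):
    # Analog of TUPLE FROM: defined only for relations of cardinality one.
    if len(relation) != 1:
        raise ValueError("relation must contain exactly one tuple")
    (t,) = relation
    return t


s = {
    tup(SNO="S1", SNAME="Smith", STATUS=20, CITY="London"),
    tup(SNO="S2", SNAME="Jones", STATUS=10, CITY="Paris"),
}

# S WHERE SNO = 'S1' is a relation (of cardinality one), not a tuple.
restricted = {t for t in s if dict(t)["SNO"] == "S1"}
single_supplier = tuple_from(restricted)

print(single_supplier == restricted)   # False: tuple vs. relation
print(single_supplier in restricted)   # True
```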

58 SCALAR TYPES IN SQL :
    BOOLEAN                 NUMERIC(p,q)    DATE
    CHARACTER(n)            DECIMAL(p,q)    TIME
    CHARACTER VARYING(n)    INTEGER         TIMESTAMP
    FLOAT(p)                SMALLINT        INTERVAL
1. Various defaults, abbreviations, alternative spellings
2. Literals (more or less conventional)
3. Scalar assignment: SET <scalar variable> = <scalar expression> ;
   Plus implicit scalar assignments on FETCH etc.

59 4. Scalar equality comparison: <scalar expression> = <scalar expression>
   Plus implicit comparisons on DISTINCT, UNION, etc.
Unfortunately "=" support is badly flawed!
  Can give TRUE even if comparands clearly distinguishable /* discuss in a moment */
  Can fail to give TRUE even if comparands not distinguishable /* see nulls, later */

60 5. BOOLEAN might not be supported... If it isn't:
   Boolean exps can still appear in WHERE, ON, HAVING
   But no table can have a column of type BOOLEAN, and no variable can be declared to be of type BOOLEAN
   So workarounds might be needed...
6. SQL also supports "domains"... But SQL domains aren't types at all... In fact, completely unnecessary, now that SQL does support user defined types... Use them if you like, but don't mistake them for true relational domains

61 SQL TYPE CHECKING AND COERCIONS :
SQL supports a weak form of strong typing (!) on assignment and equality comparisons:
  BOOLEAN : BOOLEAN
  Character string : Character string
  Number : Number
  (plus various rules for dates, times, etc.)
In other words, SQL often does coercions
One bizarre consequence: Certain unions (etc.) can yield result with rows not appearing in either operand!

62 FOR EXAMPLE :
[Figure: tables T1 ( X, Y ) and T2 ( X, Y ), with columns of types INTEGER and NUMERIC(5,1)]
    SELECT X, Y FROM T1
    UNION
    SELECT X, Y FROM T2
[Figure: result table with columns X, Y]

63 RECOMMENDATIONS :
1. Ensure that columns with the same name are always of the same type /* see later */
2. Avoid type conversions where possible
3. When they can't be avoided, do them explicitly:
    SELECT CAST ( X AS NUMERIC(5,1) ) AS X, Y FROM T1
    UNION
    SELECT X, CAST ( Y AS NUMERIC(5,1) ) AS Y FROM T2
I.e., avoid coercions! /* general good practice */

64 UNFORTUNATELY :
Certain coercions are built into the definition of SQL and can't be avoided! Just for the record:
If table exp tx is used as a row subquery, then the table t denoted by tx should have just one row r, and t is coerced to r
If table exp tx is used as a scalar subquery, then the table t denoted by tx should have just one column and just one row and hence contain just one value v, and t is doubly coerced to v
If the "row exp" rx in the ALL or ANY comparison rx theta sq (where theta is, e.g., >ALL or

65 SQL COLLATIONS :
Type checking and coercion for character strings are more complex than I've been pretending...
Given string consists of chars from one character set and has one collation
Given collation = rule for specific character set... Governs comparison of strings of chars from that set
Let C be a collation for character set S, and let a and b be any two characters from S. Then C must be such that exactly one of a < b, a = b, a > b gives TRUE and the other two give FALSE (under C)

66 COMPLICATIONS :
Either PAD SPACE or NO PAD can apply to collation C
Under PAD SPACE, distinct strings (e.g., 'AB' and 'AB ') can "compare equal"
Recommendation: Don't use PAD SPACE!
But distinct strings might still "compare equal" even with NO PAD... E.g., if C is CASE_INSENSITIVE
Recommendation: Don't do this... or if you must, then be very careful!

67 Call v1 and v2 "equal but distinguishable" if they're distinct but v1 = v2 gives TRUE
In UNION, JOIN, MATCH, LIKE, UNIQUE, etc., implicit equality rule is indeed "equal even if distinguishable"
In UNION, JOIN, GROUP BY, DISTINCT, etc., DBMS might have to choose which "equal but distinguishable" value is to appear in some column in some result row
SQL gives little guidance in such situations!
Hence, certain SQL expressions are indeterminate!... or "possibly nondeterministic" (SQL term)

68 For example,
    SELECT MAX ( Z ) FROM T
might return 'ZZZ' on one occasion and 'zzz' on another, even if T hasn't changed in the interim!
One important consequence: Many SQL table exps aren't allowed in constraints !!!
Strong recommendation: Avoid possibly nondeterministic expressions as much as you can!
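"Equal but distinguishable" values can be observed in a real product. The sketch below uses Python's sqlite3 module with SQLite's built-in NOCASE and RTRIM collations as stand-ins for a case-insensitive collation and for PAD SPACE; that mapping is an assumption, since SQLite does not implement the SQL standard's collation facilities as such. Which of the two spellings survives DISTINCT is deliberately not asserted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def scalar(sql):
    # Evaluate a one-row, one-column query and return its single value.
    return conn.execute(sql).fetchone()[0]

# Default (binary) comparison distinguishes the two strings...
eq_binary = scalar("SELECT 'ZZZ' = 'zzz'")
# ...but under a case-insensitive collation they "compare equal".
eq_nocase = scalar("SELECT 'ZZZ' = 'zzz' COLLATE NOCASE")
# RTRIM ignores trailing spaces, much as PAD SPACE would.
eq_rtrim = scalar("SELECT 'AB' = 'AB ' COLLATE RTRIM")

conn.executescript("""
    CREATE TABLE T (Z TEXT COLLATE NOCASE NOT NULL);
    INSERT INTO T VALUES ('ZZZ'), ('zzz');
""")

# DISTINCT must pick ONE of the two distinguishable values to return.
distinct_rows = conn.execute("SELECT DISTINCT Z FROM T").fetchall()

print(eq_binary, eq_nocase, eq_rtrim)  # 0 1 1
print(len(distinct_rows))              # 1
```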

69 SQL ROW TYPES :
Recall:
    VAR SINGLE_SUPPLIER
        TUPLE { STATUS INTEGER, SNO CHAR, CITY CHAR, SNAME CHAR } ;
SQL analog of TUPLE type generator = ROW type constructor
    DECLARE SINGLE_SUPPLIER /* SQL row variable */
        ROW ( SNO    VARCHAR(5),
              SNAME  VARCHAR(25),
              STATUS INTEGER,
              CITY   VARCHAR(20) ) ;
But "field" [sic] order matters!... 4 fields can be arranged into 24 distinct row types!

70 SQL ROW TYPES (cont.) :
Row assignment: e.g.,
    SET SINGLE_SUPPLIER = ( S WHERE SNO = 'S1' ) ; /* row subquery... */
Note the coercion here !!!
Row comparison: /* see later */

71 WHAT ABOUT SQL TABLE TYPES ???
SQL doesn't really have a TABLE type generator (or constructor) at all !!!
Recall:
    VAR S BASE RELATION
        { SNO CHAR, SNAME CHAR, STATUS INTEGER, CITY CHAR }
        KEY { SNO } ;
SQL analog:
    CREATE TABLE S
      ( SNO    VARCHAR(5)  NOT NULL,    /* note strange */
        SNAME  VARCHAR(25) NOT NULL,    /* jumble of    */
        STATUS INTEGER     NOT NULL,    /* column and   */
        CITY   VARCHAR(20) NOT NULL,    /* constraint   */
        UNIQUE ( SNO ) ) ;              /* defns        */

72 There's no sequence of linguistic tokens in that CREATE TABLE statement that can logically be labeled "an invocation of the TABLE type constructor"
If table S has any type at all, it's just bag of rows, where the rows are of type
    ROW ( SNO VARCHAR(5), SNAME VARCHAR(25), STATUS INTEGER, CITY VARCHAR(20) )

73 ASIDE : "TYPED TABLES"
Very bad term!... If "typed table" TT defined to be "of type T," then TT is not of type T, and nor are its rows!
Avoid such tables anyway, because they're inextricably intertwined with SQL's support for pointers...
RM prohibits pointers... But SQL allows a column in one table to have values that are pointers to rows in some other table... Pointers are reference values, columns containing them are of some REF type... Why?
Strong recommendation: Don't use such tables, nor any features related to them!

75 A SAMPLE TUPLE VALUE (tuple for short) :
    attribute name : type name    SNO : CHAR    SNAME : CHAR    STATUS : INTEGER    CITY : CHAR
    attribute value               S1            Smith           20                  London
degree = 4
Attribute : attribute name + type name
Component : attribute + attribute value
Heading : { SNO CHAR, SNAME CHAR, STATUS INTEGER, CITY CHAR }
Type : TUPLE { SNO CHAR, SNAME CHAR, STATUS INTEGER, CITY CHAR }

76 By definition, no left to right ordering to components (so ordering arbitrary in written form)
By definition, every tuple contains exactly one value, of approp type, for each attribute
No nulls !!! (nulls aren't values)
Recommendation: Never say "null value"!
Sample tuple selector invocation (tuple literal):
    TUPLE { SNO 'S1', SNAME 'Smith', STATUS 20, CITY 'London' }
    /* keyword TUPLE does double duty in Tutorial D */

77 Two tuples are equal ("duplicates") iff they're the very same tuple ("=" and "≠" make sense, "<" doesn't). Every subset of a heading is a heading, and every subset of a tuple is a tuple: e.g., the tuple with components SNO : CHAR S1 and CITY : CHAR London, and the tuple with the single component SNO : CHAR S1. The empty set is a subset of every set, so the empty tuple (or 0-tuple) is a valid tuple! (And there's only one.) Type and value are both TUPLE{} in Tutorial D. Tuple assignment and comparisons: Already discussed

78 ATTRIBUTE EXTRACTION : Note the logical difference between a value v and a tuple t (of degree one) that contains just v !!! Let t be a tuple, say the tuple for supplier S1 in the current value of the suppliers-and-parts DB.
Tutorial D : CITY FROM t /* "extracts" the CITY value from t */
SQL analog : t.CITY

79 SQL ROWS :
    Tutorial D term          SQL analog (approx.)
    tuple (value)            row
    TUPLE type generator     row type constructor
    tuple selector           row value constructor
    tuple variable           row variable (?)
But SQL rows have a left to right ordering to their "fields": e.g., ROW(1,2) ≠ ROW(2,1) *. Fields are identified by ordinal position, not by name. No "0-row."
* Keyword ROW is optional in row value constructors and usually omitted

80 ROW ASSIGNMENT : SET syntax (as for scalars) /* already discussed */. Row assignments are also involved (in effect) in UPDATE: e.g.,
    UPDATE S SET STATUS = 20, CITY = 'London' WHERE CITY = 'Paris' ;
Logically equivalent to:
    UPDATE S SET ( STATUS, CITY ) = ( 20, 'London' ) WHERE CITY = 'Paris' ;

81 ROW COMPARISONS : Believe it or not, most boolean exps in SQL, even simple "scalar" comparisons, are defined in terms of rows, not scalars! Example involving a "genuine" row comparison:
    SELECT SNO FROM S WHERE ( STATUS, CITY ) = ( 20, 'London' )
Logically equivalent to:
    SELECT SNO FROM S WHERE STATUS = 20 AND CITY = 'London'

82
    SELECT SNO FROM S WHERE ( STATUS, CITY ) <> ( 20, 'London' )
Logically equivalent to:
    SELECT SNO FROM S WHERE STATUS <> 20 OR CITY <> 'London'

83 Because row components have a left to right ordering, SQL can support "<" and ">" on rows:
    SELECT SNO FROM S WHERE ( STATUS, CITY ) > ( 20, 'London' )
Logically equivalent to:
    SELECT SNO FROM S WHERE STATUS > 20 OR ( STATUS = 20 AND CITY > 'London' ) /* hmmm... */

84 But most row comparisons involve rows of degree one:
    SELECT SNO FROM S WHERE ( STATUS ) = ( 20 )
Syntax rule: Parens can be dropped from row value constructors of degree one. Thus:
    SELECT SNO FROM S WHERE STATUS = 20
But this "scalar" comparison is still technically a row comparison (scalar comparands are coerced to rows)
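The equivalences on the preceding slides can be checked mechanically. The following is a minimal sketch, not part of the seminar: it assumes Python's standard sqlite3 module and an SQLite build recent enough to support row values (3.15 or later), and it loads the five sample suppliers used in the seminar. The helper name `snos` is purely illustrative.

```python
import sqlite3

# Sample suppliers (the seminar's usual S1..S5 values).
SUPPLIERS = [
    ("S1", "Smith", 20, "London"),
    ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"),
    ("S4", "Clark", 20, "London"),
    ("S5", "Adams", 30, "Athens"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE S (SNO TEXT NOT NULL, SNAME TEXT NOT NULL, "
    "STATUS INTEGER NOT NULL, CITY TEXT NOT NULL, UNIQUE (SNO))"
)
conn.executemany(
    "INSERT INTO S (SNO, SNAME, STATUS, CITY) VALUES (?, ?, ?, ?)", SUPPLIERS
)

def snos(where):
    """Hypothetical helper: sorted supplier numbers satisfying a WHERE condition."""
    rows = conn.execute("SELECT DISTINCT SNO FROM S WHERE " + where)
    return sorted(sno for (sno,) in rows)

# Each pair: (row comparison, its scalar expansion as given on the slides).
PAIRS = [
    ("(STATUS, CITY) = (20, 'London')", "STATUS = 20 AND CITY = 'London'"),
    ("(STATUS, CITY) <> (20, 'London')", "STATUS <> 20 OR CITY <> 'London'"),
    ("(STATUS, CITY) > (20, 'London')",
     "STATUS > 20 OR (STATUS = 20 AND CITY > 'London')"),
]
results = [(snos(row_form), snos(scalar_form)) for row_form, scalar_form in PAIRS]

for (row_form, _), (by_row, by_scalar) in zip(PAIRS, results):
    print(row_form, "->", by_row, "| same as expansion:", by_row == by_scalar)
```

Agreement on one sample table is evidence, not proof, of logical equivalence; the point is only to make the left to right dependence of ">" visible.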

85 RECOMMENDATION : Unless the rows being compared are of degree one (i.e., effectively scalars), don't use "<", "<=", ">", and ">=" comparisons:
They rely on left to right column ordering
They have no straightforward relational counterpart
They're error prone
In this connection, it's worth noting that the SQL standardizers took several iterations to get the semantics right!

86 A SAMPLE RELATION VALUE (relation for short) :
    SNO : CHAR    SNAME : CHAR    STATUS : INTEGER    CITY : CHAR
    S1            Smith           20                  London
    S2            Jones           10                  Paris
    S3            Blake           30                  Paris
    S4            Clark           20                  London
    S5            Adams           30                  Athens
Heading : { SNO CHAR, SNAME CHAR, STATUS INTEGER, CITY CHAR } /* tuple heading as previously defined */ /* … same attributes and same degree */
Type : RELATION { SNO CHAR, SNAME CHAR, STATUS INTEGER, CITY CHAR }
Body : { tuples all with specified heading }
Cardinality : cardinality of body

87 NOTE THAT : "Relations contain tuples" is only indirectly true! By definition:
No relation contains duplicate tuples, including results of relational operators
No top to bottom ordering to tuples, no left to right ordering to attributes
Every tuple of every relation contains exactly one value, of the appropriate type, for each attribute; i.e., relations are always normalized
No nulls!

88 Every subset of a body is a body (loosely, every subset of a rel is a rel), the empty subset included ("empty relation"). Given rel type RT, there's exactly one empty rel of type RT. Tuple extraction: Already discussed. t ∈ r : TRUE iff t appears in r. SQL example:
    SELECT SNO, SNAME, STATUS, CITY
    FROM   S
    WHERE  SNO IN /* SNO coerced to ROW(SNO) */
         ( SELECT SNO FROM SP )

89 ANOTHER POINT : RELATIONS ARE n-DIMENSIONAL (massive confusion on this simple point!) A couple of quotes:
1. "When you're well trained in relational modeling, you begin to believe the world is two-dimensional … You think you can get anything into the rows and columns of a table" (Douglas Barry, Executive Director, ODMG)
2. "There is simply no way to mask the complexities involved in assembling two-dimensional data into a multi-dimensional form" (Richard Finkelstein)

90 But a relation with n attributes (i.e., of degree n) represents points in n-dimensional space. It's n-dimensional, not 2-dimensional !!! Of course a relation looks flat when pictured in tabular form on paper … but a picture of a thing isn't the thing itself !!! A major logical difference here, in fact! Let's all vow never to say "flat relations" ever again

92 RELATIONAL COMPARISONS : Must be able to test rels for equality, of course: e.g., S { CITY } = P { CITY } /* FALSE */ Other useful comparison ops: "≠", "⊆", "⊂", "⊇", "⊃". Useful shorthands: IS_EMPTY ( r ), IS_NOT_EMPTY ( r )

93 RELATIONS OF DEGREE ZERO : The empty heading is a valid heading, so a relation can be of degree zero! Type is RELATION{} in Tutorial D. (Such rels are a little hard to draw.) Can a relation with no attributes have any tuples? Yes, it can have AT MOST ONE TUPLE (the 0-tuple):
One tuple : TABLE_DEE /* DEE for short */
No tuples : TABLE_DUM /* DUM for short */
Fundamentally important! (perhaps surprisingly) But not supported in SQL...

94 WHY ARE THEY SO IMPORTANT ? Because DEE corresponds to YES (or TRUE) and DUM corresponds to NO (or FALSE) !!! /* see later for further explanation */ Also, DEE and DUM (especially DEE) play a role in the relational algebra analogous to the role played by 0 in conventional arithmetic /* again, see later for further explanation */

95 SQL TABLES : I.e., table values, unless context demands otherwise /* see later re table variables */. SQL has no "table type" notion: an SQL table is just a bag of rows of some row type. Hence, no "TABLE type generator" (though SQL does support ROW, ARRAY, and MULTISET type generators). But an SQL table value constructor is analogous (somewhat) to a relation selector. E.g.: /* "table literal" */
    VALUES ( 1, 2 ), ( 2, 1 ), ( 1, 1 ), ( 1, 2 )
Denotes a table with 2 unnamed columns and 4 (not 3!) rows
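The "4 (not 3!)" point is easy to confirm. A minimal sketch, assuming Python's standard sqlite3 module (SQLite accepts a bare VALUES expression as a query); the variable names are illustrative only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The table literal from the slide: the row (1, 2) is written twice.
all_rows = conn.execute("VALUES (1, 2), (2, 1), (1, 1), (1, 2)").fetchall()

# The same literal with duplicates eliminated explicitly.
distinct_rows = conn.execute(
    "SELECT DISTINCT * FROM (VALUES (1, 2), (2, 1), (1, 1), (1, 2))"
).fetchall()

print(len(all_rows), "rows as written;", len(distinct_rows), "distinct rows")
```

The table literal is a bag: the repeated row survives unless duplicate elimination is requested.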

96 ANOTHER EXAMPLE :
    VALUES ( 'S1', 'Smith', 20, 'London' ),
           ( 'S2', 'Jones', 10, 'Paris' ),
           ( 'S3', 'Blake', 30, 'Paris' ),
           ( 'S4', 'Clark', 20, 'London' ),
           ( 'S5', 'Adams', 30, 'Athens' )
Recommendations:
1. For each column, ensure all values are of the same type
2. Don't specify the same row twice

97 TABLE COMPARISONS ??? No direct support, but workarounds are available. E.g., the SQL analog of S { CITY } = P { CITY } is:
    NOT EXISTS ( SELECT CITY FROM S EXCEPT SELECT CITY FROM P )
    AND
    NOT EXISTS ( SELECT CITY FROM P EXCEPT SELECT CITY FROM S )
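A minimal sketch of the workaround in action, assuming Python's standard sqlite3 module. The supplier cities are the seminar's sample values; the three part rows are assumed here purely for illustration, and `same_cities` is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S (SNO TEXT NOT NULL, CITY TEXT NOT NULL, UNIQUE (SNO));
    CREATE TABLE P (PNO TEXT NOT NULL, CITY TEXT NOT NULL, UNIQUE (PNO));
    INSERT INTO S (SNO, CITY) VALUES
        ('S1', 'London'), ('S2', 'Paris'), ('S3', 'Paris'),
        ('S4', 'London'), ('S5', 'Athens');
    -- Part rows are assumed for illustration only.
    INSERT INTO P (PNO, CITY) VALUES
        ('P1', 'London'), ('P2', 'Paris'), ('P3', 'Oslo');
""")

SAME_CITIES = """
    SELECT NOT EXISTS (SELECT CITY FROM S EXCEPT SELECT CITY FROM P)
       AND NOT EXISTS (SELECT CITY FROM P EXCEPT SELECT CITY FROM S)
"""

def same_cities():
    """Hypothetical helper: True iff S and P have the same set of cities."""
    (flag,) = conn.execute(SAME_CITIES).fetchone()
    return bool(flag)

before = same_cities()   # S has Athens but not Oslo; P has Oslo but not Athens
conn.execute("UPDATE P SET CITY = 'Athens' WHERE PNO = 'P3'")
after = same_cities()    # now both city sets are {London, Paris, Athens}
print(before, after)
```

Both NOT EXISTS tests are needed: each one checks inclusion in a single direction only.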

98 COLUMN NAMING (very important!) : RM attribute naming discipline:
No anonymous attributes
No duplicate attribute names
SQL enforces an analogous discipline for tables that are current values of table variables (CREATE TABLE or CREATE VIEW) but not for tables resulting from evaluation of some table expression. Very strong recommendation: /* Why? See later */ Use AS to enforce the discipline if SQL doesn't!*
* But you can't, with VALUES expressions

99 EXAMPLES :
    SELECT DISTINCT SNAME, 'Supplier' AS TAG FROM S
    SELECT DISTINCT SNAME, 2 * STATUS AS DOUBLE_STATUS FROM S
    CREATE VIEW SDS AS SELECT DISTINCT SNAME, 2 * STATUS AS DOUBLE_STATUS FROM S ;
    SELECT DISTINCT S.CITY AS SCITY, P.CITY AS PCITY FROM S, SP, P WHERE S.SNO = SP.SNO AND SP.PNO = P.PNO

100
    SELECT TEMP.* FROM ( S JOIN P ON S.CITY > P.CITY ) AS TEMP ( SNO, SNAME, STATUS, SCITY, PNO, PNAME, COLOR, WEIGHT, PCITY )
    SELECT MAX ( WEIGHT ) AS MBW FROM P WHERE COLOR = 'Blue'
Note: Can ignore the recommendation if there's no need to reference the column subsequently: e.g.,
    SELECT ... WHERE WEIGHT < ( SELECT MAX ( WEIGHT ) FROM P WHERE P.COLOR = 'Blue' )

101 WHY IS COLUMN NAMING IMPORTANT ??? Rel alg ops (e.g., UNION) rely on proper attrib naming. One reason: Avoids complexities caused by relying on ordinal position! To use SQL relationally, must apply the same discipline to SQL analogs. As a prereq, very strong recommendation:
If two columns represent "the same kind of information," give them the same name wherever possible! E.g., SNO and SNO, not (say) SNO and SNUM
If two columns represent different kinds of information, give them different names (usually)

102 Only situation where the foregoing recommendation can't be followed = when two columns in the same table represent the same kind of information. E.g.:
    CREATE TABLE EMP ( ENO ..., MNO ..., ... ) ;
So column renaming is sometimes necessary: e.g.,
    ( SELECT ENO, MNO FROM EMP ) AS TEMP1
    NATURAL JOIN
    ( SELECT ENO AS MNO, ... FROM EMP ) AS TEMP2
/* join EMP to itself on MNO in "1st copy" and ENO in "2nd copy" */

103 But what if the DB already violates the naming discipline? Possible strategy:
For each base table T, define a view V identical to T except for column renaming
Ensure V abides by the column naming discipline
Operate in terms of V instead of T
Referred to subsequently as the "operate via views strategy"

104 Impossible to ignore ordinal position 100 percent: columns still have an ordinal position even when they don't need to (in base tables and views in particular). Strong recommendation: Never write SQL code that relies on ordinal position! Contexts in which SQL attaches significance to ordinal position:
SELECT *
JOIN, UNION, INTERSECT, EXCEPT
VALUES
INSERT if column name commalist omitted
ALL and ANY comparisons
Column name commalist in CREATE VIEW and range variable definitions
BUT...


106 WHY DUPLICATE ROWS ARE BAD NEWS : I assume you know:
Relational DBMSs include an optimizer, whose purpose is to figure out the best way to implement user queries etc. ("best" = best performing)
Optimizers transform relational expressions ("query rewrite")*: they replace exp1 by exp2, where exp1 and exp2 are guaranteed to produce the same result when evaluated but exp2 has better performance (we hope)
* But watch out for this term (has other meanings too)

107 DUPLICATE ROWS (cont.) : If a table permits duplicates, IT'S NOT A RELATION. RM doesn't recognize duplicates. Example (with acknowledgments to Nat Goodman):
    P    PNO   PNAME          SP   SNO   PNO
         P1    Screw               S1    P1
         P1    Screw               S1    P1
         P1    Screw               S1    P2
         P2    Screw
No CKs !!! Violate the Information Principle !!! Meanings hidden !!!
Query: Find part nos. for parts that either are screws or are supplied by supplier S1

108 DUPLICATE ROWS (cont.) : Formulations of the query (left column of the slide first, then right column):
(1) SELECT P.PNO FROM P WHERE P.PNAME = 'Screw' OR P.PNO IN ( SELECT SP.PNO FROM SP WHERE SP.SNO = 'S1' )
(2) SELECT SP.PNO FROM SP WHERE SP.SNO = 'S1' OR SP.PNO IN ( SELECT P.PNO FROM P WHERE P.PNAME = 'Screw' )
(3) SELECT P.PNO FROM P, SP WHERE ( SP.SNO = 'S1' AND P.PNO = SP.PNO ) OR P.PNAME = 'Screw'
(4) SELECT SP.PNO FROM P, SP WHERE ( SP.SNO = 'S1' AND P.PNO = SP.PNO ) OR P.PNAME = 'Screw'
(5) SELECT P.PNO FROM P WHERE P.PNAME = 'Screw' UNION ALL SELECT SP.PNO FROM SP WHERE SP.SNO = 'S1'
(6) SELECT DISTINCT P.PNO FROM P WHERE P.PNAME = 'Screw' UNION ALL SELECT SP.PNO FROM SP WHERE SP.SNO = 'S1'
(7) SELECT P.PNO FROM P WHERE P.PNAME = 'Screw' UNION ALL SELECT DISTINCT SP.PNO FROM SP WHERE SP.SNO = 'S1'
(8) SELECT P.PNO FROM P WHERE P.PNAME = 'Screw' UNION SELECT SP.PNO FROM SP WHERE SP.SNO = 'S1'

109 DUPLICATE ROWS (cont.) : The same formulations, in the same order, with their results (P1*3 means P1 appears three times):
(1) P1*3 P2*1
(2) P1*2 P2*1
(3) P1*9 P2*3
(4) P1*8 P2*4
(5) P1*5 P2*2
(6) P1*3 P2*2
(7) P1*4 P2*2
(8) P1*1 P2*1
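The duplicate counts above can be reproduced directly. This is a minimal sketch, assuming Python's standard sqlite3 module; the table contents (three copies of the P1 row, one P2 row, two copies of the S1/P1 shipment, one S1/P2 shipment) are the ones implied by the result counts on the slide, and the names `FORMULATIONS` and `counts` are illustrative.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE P  (PNO TEXT, PNAME TEXT);   -- no key: duplicates permitted
    CREATE TABLE SP (SNO TEXT, PNO TEXT);     -- no key: duplicates permitted
    INSERT INTO P (PNO, PNAME) VALUES
        ('P1', 'Screw'), ('P1', 'Screw'), ('P1', 'Screw'), ('P2', 'Screw');
    INSERT INTO SP (SNO, PNO) VALUES
        ('S1', 'P1'), ('S1', 'P1'), ('S1', 'P2');
""")

SCREWS = "SELECT P.PNO FROM P WHERE P.PNAME = 'Screw'"
BY_S1 = "SELECT SP.PNO FROM SP WHERE SP.SNO = 'S1'"
JOIN_COND = ("FROM P, SP WHERE (SP.SNO = 'S1' AND P.PNO = SP.PNO) "
             "OR P.PNAME = 'Screw'")

# Left column of the slide first, then right column.
FORMULATIONS = [
    SCREWS + " OR P.PNO IN (" + BY_S1 + ")",
    BY_S1 + " OR SP.PNO IN (" + SCREWS + ")",
    "SELECT P.PNO " + JOIN_COND,
    "SELECT SP.PNO " + JOIN_COND,
    SCREWS + " UNION ALL " + BY_S1,
    SCREWS.replace("SELECT", "SELECT DISTINCT", 1) + " UNION ALL " + BY_S1,
    SCREWS + " UNION ALL " + BY_S1.replace("SELECT", "SELECT DISTINCT", 1),
    SCREWS + " UNION " + BY_S1,
]

# For each formulation, count how many times each part number is returned.
counts = [dict(Counter(pno for (pno,) in conn.execute(q))) for q in FORMULATIONS]

for query, result in zip(FORMULATIONS, counts):
    print(result, "<-", query)
```

Every formulation returns the same set of part numbers, {P1, P2}; only the degree of duplication differs, which is exactly what prevents an optimizer from treating them as interchangeable.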

110 DUPLICATE ROWS (cont.) : Either (a) the user cares about the degree of duplication, or (b) the user does not care … Expression transformation is inhibited!
Performance suffers
DBMS code quality suffers
Law-abiding users suffer
Particularly annoying if the user does NOT care !!!

111 DUPLICATE ROWS : FURTHER ISSUES
If a table is a plot of points in some n-dimensional space, duplicates don't add anything; they just mean plotting the same point twice
If table T permits duplicates, we can't distinguish "genuine" duplicates and duplicates arising from data entry errors!
If something is true, saying it twice doesn't make it more true
Much more could be said....
"Please write out one googol times: There's no such thing as a duplicate." (Anon.)

112 AVOIDING DUPLICATES IN SQL : RM prohibits duplicates, so to use SQL relationally we must prevent them from occurring.
Base tables: Specify at least one key /* see later */
Derived tables: SELECT ALL / UNION ALL / VALUES can all produce dup rows. VALUES already discussed. Regarding ALL vs. DISTINCT:
Can appear in SELECT / UNION / INTERSECT / EXCEPT / invocation of a "set function" such as SUM /* this case is a little special... see later */
DISTINCT is the default for UNION / INTERSECT / EXCEPT; ALL is the default in other cases

113 SELECT / UNION / etc. : Obvious recommendations: Always specify DISTINCT, preferably do so explicitly, and never specify ALL. Unfortunately... /* quote ex book */ :
At this point in the original draft, I added that if you find the discipline of always specifying DISTINCT annoying, don't complain to me; complain to the SQL vendors instead. But my reviewers reacted with almost unanimous horror to my suggestion that you should always specify DISTINCT. One wrote: "Those who really know SQL well will be shocked at the thought of coding SELECT DISTINCT by default." Well, I'd like to suggest, politely, that (a) those who are "shocked at the thought" probably know the implementations well, not SQL, and (b) their shock is probably due to their recognition that those implementations do such a poor job of optimizing away unnecessary DISTINCTs.

114 If I write SELECT DISTINCT SNO FROM S ..., that DISTINCT can safely be ignored. If I write either EXISTS ( SELECT DISTINCT ... ) or IN ( SELECT DISTINCT ... ), those DISTINCTs can safely be ignored. If I write SELECT DISTINCT SNO FROM SP ... GROUP BY SNO, that DISTINCT can safely be ignored. If I write SELECT DISTINCT ... UNION SELECT DISTINCT ..., those DISTINCTs can safely be ignored. And so on. Why should I, as a user, have to devote time and effort to figuring out whether some DISTINCT is going to be a performance hit and whether it's logically safe to omit it, and to remembering all of the details of SQL's inconsistent rules for when duplicates are automatically eliminated and when they're not?

115 Well, I could go on. However, I decided, against my own better judgment but in the interest of maintaining good relations (with my reviewers, I mean), not to follow my own advice elsewhere in this book but only to request duplicate elimination explicitly when it seemed to be logically necessary to do so. It wasn't always easy to decide when that was, either. But at least now I can add my voice to those complaining to the vendors, I suppose.

116 SADLY, THEREFORE : Recommendations:
Make sure you know when SQL eliminates duplicates without you asking it to
When you do have to ask, make sure you know whether it matters if you don't
When it does matter, specify DISTINCT /* but be annoyed about it */
And never specify ALL!

117 WHY NULLS ARE BAD NEWS : I assume you know: Any comparison in which at least one comparand is null evaluates to UNKNOWN, not TRUE or FALSE. Rationale: Null means "value unknown" … Hence three-valued logic (3VL). 3VL truth tables for NOT, AND, OR:
    NOT |          AND | T U F        OR | T U F
     T  | F         T  | T U F        T  | T T T
     U  | U         U  | U U F        U  | T U U
     F  | T         F  | F F F        F  | T U F
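The truth tables can be regenerated from an SQL engine. A minimal sketch, assuming Python's standard sqlite3 module: SQLite (like SQL generally) has no separate UNKNOWN value in query results and returns NULL, which Python sees as None. The mapping tables and the helper `evaluate` are illustrative names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# T, U, F written as SQL expressions; results mapped back to letters.
SQL_FOR = {"T": "1", "U": "NULL", "F": "0"}
LETTER_FOR = {1: "T", 0: "F", None: "U"}

def evaluate(expr):
    """Hypothetical helper: evaluate a boolean SQL expression to 'T', 'U', or 'F'."""
    (value,) = conn.execute("SELECT " + expr).fetchone()
    return LETTER_FOR[value]

not_table = {p: evaluate(f"NOT ({SQL_FOR[p]})") for p in "TUF"}
and_table = {(p, q): evaluate(f"({SQL_FOR[p]}) AND ({SQL_FOR[q]})")
             for p in "TUF" for q in "TUF"}
or_table = {(p, q): evaluate(f"({SQL_FOR[p]}) OR ({SQL_FOR[q]})")
            for p in "TUF" for q in "TUF"}

# A comparison with a null comparand yields UNKNOWN, even null "=" null.
null_equals_null = evaluate("(NULL) = (NULL)")

for p in "TUF":
    print(p, "| NOT:", not_table[p],
          "| AND:", " ".join(and_table[(p, q)] for q in "TUF"),
          "| OR:", " ".join(or_table[(p, q)] for q in "TUF"))
```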

118 NULLS (cont.) :
    S    SNO   CITY           P    PNO   CITY
         S1    London              P1          /* "null" */
Nothing at all in the CITY slot for part P1 !!! Get SNO/PNO pairs where either the supplier and part cities are different or the part city isn't Paris (or both):
    SELECT DISTINCT S.SNO, P.PNO
    FROM   S, P
    WHERE  S.CITY <> P.CITY
    OR     P.CITY <> 'Paris'

119 NULLS (cont.) : Boolean expression in the WHERE clause:
    ( S.CITY <> P.CITY ) OR ( P.CITY <> 'Paris' )
For the only data we have, this becomes
    ( S.CITY <> null ) OR ( null <> 'Paris' )
    UNKNOWN OR UNKNOWN
    UNKNOWN
Nothing retrieved!

120 NULLS (cont.) : But part P1 does have some corresponding city … i.e., the null does stand for some real value, say c. Either c is Paris or it is not:
If it is, the boolean expression becomes ( 'London' <> 'Paris' ) OR ( 'Paris' <> 'Paris' ) : TRUE
If it is not, the boolean expression becomes ( 'London' <> c ) OR ( c <> 'Paris' ) : TRUE because c is not Paris
So TRUE is the right answer … hence, 3VL DOES NOT MATCH REALITY !!! (Showstopper !!!)

121 EVEN MORE TRIVIAL EXAMPLE :
    SELECT PNO FROM P WHERE CITY = CITY
Message: If you have nulls in your DB, you're getting wrong answers !!! Note: The foregoing arguments apply to nulls and 3VL in general. But SQL manages to introduce additional flaws of its own! In particular, SQL represents "the third truth value" by NULL, not UNKNOWN (even though it does support an UNKNOWN keyword). Just as bad as representing zero by NULL !!!
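Both null examples can be run as written. A minimal sketch, assuming Python's standard sqlite3 module, with the one supplier and one part from the slides; the loop then substitutes a few candidate real cities for the null (the city list is an arbitrary choice for illustration) to show that every one of them would have put the pair in the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S (SNO TEXT NOT NULL, CITY TEXT);
    CREATE TABLE P (PNO TEXT NOT NULL, CITY TEXT);
    INSERT INTO S (SNO, CITY) VALUES ('S1', 'London');
    INSERT INTO P (PNO, CITY) VALUES ('P1', NULL);
""")

QUERY = """
    SELECT DISTINCT S.SNO, P.PNO
    FROM   S, P
    WHERE  S.CITY <> P.CITY
    OR     P.CITY <> 'Paris'
"""

# With the null in place, both queries retrieve nothing.
with_null = conn.execute(QUERY).fetchall()
self_equal = conn.execute("SELECT PNO FROM P WHERE CITY = CITY").fetchall()

# Replace the null by some real city c: the pair is retrieved whatever c is.
with_real_city = {}
for city in ("Paris", "London", "Oslo"):   # arbitrary candidates for c
    conn.execute("UPDATE P SET CITY = ? WHERE PNO = 'P1'", (city,))
    with_real_city[city] = conn.execute(QUERY).fetchall()

print(with_null, self_equal, with_real_city)
```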

122 TO SUM UP : By definition, a null isn't a value … THEREFORE:
A "type" that contains a null isn't a type
A "tuple" that contains a null isn't a tuple
A "relation" that contains a null isn't a relation
In fact, nulls violate The Information Principle /* see later */. Which means the entire edifice crumbles, and ALL BETS ARE OFF !!! MUCH more could be said, but not here...

123 AVOIDING NULLS IN SQL : RM prohibits nulls, so to use SQL relationally we must prevent them from occurring.
Base tables: Specify NOT NULL for every column
Derived tables: Many ops can produce nulls:
"Set functions" such as SUM all return null if the argument is empty (except for COUNT and COUNT(*), which correctly return zero)
If a scalar subquery evaluates to an empty table, that table is coerced to null

124 If a row subquery evaluates to an empty table, that table is coerced to a row of all nulls /* not a null row! */
Outer join, union join
If ELSE is omitted from CASE, ELSE NULL is assumed
If x = y, NULLIF(x,y) returns null
ON DELETE SET NULL, ON UPDATE SET NULL

125 STRONG RECOMMENDATIONS :
Base tables: Specify NOT NULL for every column /* is this a duplicate recommendation? */
Don't use the NULL keyword in any other context
Don't use the UNKNOWN keyword anywhere
Don't omit ELSE from CASE
Don't use NULLIF
Don't use outer join except as noted below
Don't use union join
Don't specify PARTIAL or FULL on MATCH

126 STRONG RECOMMENDATIONS (cont.) :
Don't use MATCH on foreign key constraints
Don't use IS DISTINCT FROM
Don't use IS [NOT] TRUE or IS [NOT] FALSE
Do use COALESCE on every exp that might otherwise "evaluate to null": e.g.,
    SELECT S.SNO, ( SELECT COALESCE ( SUM ( ALL QTY ), 0 ) /* this ALL is OK! */
                    FROM   SP
                    WHERE  SP.SNO = S.SNO ) AS TOTQ
    FROM   S
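A minimal sketch of why the COALESCE matters, assuming Python's standard sqlite3 module. The shipment quantities are assumed for illustration (supplier S5 deliberately has no shipments), and the sketch writes SUM(QTY) rather than SUM(ALL QTY) only to stay within syntax that is certain to be accepted by SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S  (SNO TEXT NOT NULL, UNIQUE (SNO));
    CREATE TABLE SP (SNO TEXT NOT NULL, PNO TEXT NOT NULL,
                     QTY INTEGER NOT NULL, UNIQUE (SNO, PNO));
    INSERT INTO S (SNO) VALUES ('S1'), ('S2'), ('S5');
    -- Quantities are assumed for illustration; S5 supplies nothing.
    INSERT INTO SP (SNO, PNO, QTY) VALUES
        ('S1', 'P1', 300), ('S1', 'P2', 200), ('S2', 'P1', 300);
""")

# SUM over an empty argument yields null; COUNT(*) correctly yields zero.
raw = conn.execute(
    "SELECT SUM(QTY), COUNT(*) FROM SP WHERE SNO = 'S5'"
).fetchone()

# The recommended form: COALESCE turns the would-be null into a proper value.
totals = dict(conn.execute("""
    SELECT S.SNO,
           (SELECT COALESCE(SUM(QTY), 0)
            FROM   SP
            WHERE  SP.SNO = S.SNO) AS TOTQ
    FROM   S
"""))

print(raw, totals)
```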

127 A REMARK ON OUTER JOIN : Should generally be avoided (shotgun marriage): Forces tables into a kind of union [sic!] even when they fail to conform to the requirements for union /* see later */ by, in effect, padding with nulls before doing the union. But why not pad with proper values?
    SELECT SNO, PNO
    FROM   SP
    UNION
    SELECT SNO, 'nil' AS PNO
    FROM   S
    WHERE  SNO NOT IN ( SELECT SNO FROM SP )
Result:
    SNO   PNO
    S1    P1
    S1    P2
    S1    P3
    ..    ..
    S5    nil

128 A REMARK ON OUTER JOIN (cont.) : Could achieve the same result via disciplined (clean) use of explicit outer join plus COALESCE:
    SELECT SNO, COALESCE ( PNO, 'nil' ) AS PNO
    FROM ( S NATURAL LEFT OUTER JOIN SP ) AS POINTLESS
/* re that POINTLESS... don't even ask (yet?) */
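The two formulations can be compared side by side. A minimal sketch, assuming Python's standard sqlite3 module and a small assumed set of shipments in which S5 supplies nothing. One deliberate simplification: the outer join is written without the parenthesized AS POINTLESS alias, to stay within syntax SQLite is certain to accept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S  (SNO TEXT NOT NULL, SNAME TEXT NOT NULL, UNIQUE (SNO));
    CREATE TABLE SP (SNO TEXT NOT NULL, PNO TEXT NOT NULL, UNIQUE (SNO, PNO));
    INSERT INTO S (SNO, SNAME) VALUES
        ('S1', 'Smith'), ('S2', 'Jones'), ('S5', 'Adams');
    -- Shipments are assumed for illustration; S5 has none.
    INSERT INTO SP (SNO, PNO) VALUES
        ('S1', 'P1'), ('S1', 'P2'), ('S1', 'P3'), ('S2', 'P1');
""")

UNION_FORM = """
    SELECT SNO, PNO FROM SP
    UNION
    SELECT SNO, 'nil' AS PNO FROM S
    WHERE  SNO NOT IN (SELECT SNO FROM SP)
"""

OUTER_JOIN_FORM = """
    SELECT SNO, COALESCE(PNO, 'nil') AS PNO
    FROM   S NATURAL LEFT OUTER JOIN SP
"""

union_rows = sorted(conn.execute(UNION_FORM))
outer_rows = sorted(conn.execute(OUTER_JOIN_FORM))
print(union_rows == outer_rows, union_rows)
```

Either way, no null ever reaches the result: the supplier with no shipments is paired with the proper value 'nil'.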


130 BASE RELVARS, BASE TABLES : Assume for simplicity until further notice that:
All relvars are base relvars
All table variables are base table variables
Special considerations* that apply to other kinds of relvars / other kinds of table variables (to views in particular) will be covered later
* Such as they are

131 DATA DEFINITIONS :
Tutorial D:
    VAR S BASE RELATION
      { SNO CHAR, SNAME CHAR, STATUS INTEGER, CITY CHAR }
      KEY { SNO } ;
    VAR P BASE RELATION
      { PNO CHAR, PNAME CHAR, COLOR CHAR, WEIGHT FIXED, CITY CHAR }
      KEY { PNO } ;
SQL:
    CREATE TABLE S
      ( SNO    VARCHAR(5)   NOT NULL,
        SNAME  VARCHAR(25)  NOT NULL,
        STATUS INTEGER      NOT NULL,
        CITY   VARCHAR(20)  NOT NULL,
        UNIQUE ( SNO ) ) ;
    CREATE TABLE P
      ( PNO    VARCHAR(6)   NOT NULL,
        PNAME  VARCHAR(25)  NOT NULL,
        COLOR  CHAR(10)     NOT NULL,
        WEIGHT NUMERIC(5,1) NOT NULL,
        CITY   VARCHAR(20)  NOT NULL,
        UNIQUE ( PNO ) ) ;

132
Tutorial D:
    VAR SP BASE RELATION
      { SNO CHAR, PNO CHAR, QTY INTEGER }
      KEY { SNO, PNO }
      FOREIGN KEY { SNO } REFERENCES S
      FOREIGN KEY { PNO } REFERENCES P ;
SQL:
    CREATE TABLE SP
      ( SNO VARCHAR(5) NOT NULL,
        PNO VARCHAR(6) NOT NULL,
        QTY INTEGER    NOT NULL,
        UNIQUE ( SNO, PNO ),
        FOREIGN KEY ( SNO ) REFERENCES S ( SNO ),
        FOREIGN KEY ( PNO ) REFERENCES P ( PNO ) ) ;

133 UPDATING IS SET LEVEL /* actually ALL rel ops are set level */ :
INSERT inserts a set of tuples / DELETE deletes a set of tuples / UPDATE updates a set of tuples
Thus, e.g., "UPDATE tuple t" really means "update a set of tuples that happens to be of cardinality one" ...
... and isn't always possible! Suppose suppliers S1 and S4 must be in the same city (integrity constraint for relvar S). Then updating, e.g., just the city for S1 must fail. Instead (e.g.):

134
Tutorial D:
    UPDATE S WHERE SNO = 'S1' OR SNO = 'S4' : { CITY := 'New York' } ;
SQL:
    UPDATE S SET CITY = 'New York' WHERE SNO = 'S1' OR SNO = 'S4' ;
Implications: (a) Integrity checking and triggered actions mustn't be done till all updating has been done (a set level op is not a sequence of tuple level ops) /* more on integrity later */; (b) UPDATE / DELETE via cursor make no sense! Recommendation: Avoid row level ops (cursor updates in particular) unless you know integrity problems won't occur

135 WHAT'S MORE : Tuples are values and CAN'T be updated! "Updating a set of tuples" really means replacing one set of tuples by another:
    R := ( R MINUS old ) UNION new ;
where old and new are relations (of the same type as R) containing the old and new tuples, respectively. Likewise: "Updating attribute A within tuple t" is also sloppy (though useful!) shorthand

136 RELATIONAL ASSIGNMENT :
    R := rx ; /* generic form */
"INSERT R rx ;" is shorthand for:
    R := R D_UNION rx ; /* D_UNION = "disjoint union" */
"DELETE R WHERE bx ;" is shorthand for:
    R := R WHERE NOT ( bx ) ;
"UPDATE R WHERE bx : { ... } ;" /* { ... } = attribute assignment commalist */ is shorthand for: /* see later */
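These shorthands can be mimicked with ordinary sets. What follows is a minimal sketch in plain Python, not Tutorial D: a relation is modeled as a frozenset of tuples, and a tuple as an unordered frozenset of (attribute name, value) components. All function names are illustrative, and the UPDATE expansion used is the "( R MINUS old ) UNION new" form given on the earlier slide.

```python
def tup(**components):
    """A tuple as an unordered set of (attribute name, value) components."""
    return frozenset(components.items())

def d_union(r, rx):
    """Disjoint union: like union, but an error if the operands share a tuple."""
    if r & rx:
        raise ValueError("D_UNION operands have tuples in common")
    return r | rx

def insert(r, rx):
    # "INSERT R rx ;"  is shorthand for  R := R D_UNION rx ;
    return d_union(r, rx)

def delete(r, bx):
    # "DELETE R WHERE bx ;"  is shorthand for  R := R WHERE NOT ( bx ) ;
    return frozenset(t for t in r if not bx(dict(t)))

def update(r, bx, **assignments):
    # "Updating" replaces one set of tuples by another:
    #   R := ( R MINUS old ) UNION new ;
    old = frozenset(t for t in r if bx(dict(t)))
    new = frozenset(tup(**{**dict(t), **assignments}) for t in old)
    return (r - old) | new

# Each "assignment" yields a new relation value; nothing is changed in place.
S = frozenset({
    tup(SNO="S1", CITY="London"),
    tup(SNO="S4", CITY="London"),
    tup(SNO="S5", CITY="Athens"),
})
after_update = update(S, lambda t: t["SNO"] in {"S1", "S4"}, CITY="New York")
after_delete = delete(after_update, lambda t: t["CITY"] == "Athens")
after_insert = insert(after_delete, frozenset({tup(SNO="S2", CITY="Paris")}))

cities = {dict(t)["SNO"]: dict(t)["CITY"] for t in after_update}
print(cities)
```

Note that the update moves S1 and S4 together, as one set level operation, which is what the "same city" constraint discussed earlier requires.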

137 UPDATING IN SQL : INSERT / DELETE / UPDATE are directly analogous to their Tutorial D counterparts. Two points on INSERT:
    INSERT INTO T [ ( column name commalist ) ] tx ;
1. tx is often but not always a VALUES exp; INSERT really does insert a set of rows /* not true historically! */
2. Recommendation: State column names explicitly. E.g.:
    INSERT INTO SP ( PNO, SNO, QTY ) /* good */
    VALUES ( 'P6', 'S4', 700 ), ( 'P6', 'S5', 250 ) ;
    INSERT INTO SP /* bad: relies on column ordering */
    VALUES ( 'S4', 'P6', 700 ), ( 'S5', 'P6', 250 ) ;
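A minimal sketch of what "relies on column ordering" means in practice, assuming Python's standard sqlite3 module. The second INSERT is a deliberately mistaken variant (not from the slide) that supplies the values in PNO, SNO, QTY order while omitting the column name commalist; SQL accepts it silently and stores the part number as the supplier number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SP (SNO TEXT NOT NULL, PNO TEXT NOT NULL, "
    "QTY INTEGER NOT NULL, UNIQUE (SNO, PNO))"
)

# Good: column names stated explicitly, so the value order is self-describing.
conn.execute(
    "INSERT INTO SP (PNO, SNO, QTY) VALUES ('P6', 'S4', 700), ('P6', 'S5', 250)"
)
named = sorted(conn.execute("SELECT SNO, PNO, QTY FROM SP"))

# Bad: the same values in the same order but with no column names.
# Matching is now by ordinal position, so 'P6' lands in SNO.
conn.execute("DELETE FROM SP")
conn.execute("INSERT INTO SP VALUES ('P6', 'S4', 700), ('P6', 'S5', 250)")
positional = sorted(conn.execute("SELECT SNO, PNO, QTY FROM SP"))

print(named)
print(positional)
```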

138 No SQL counterpart to relational assignment as such. Best approximation to R := rx ; is:
    DELETE FROM T ;
    INSERT INTO T ( ... ) tx ;
SQL could fail where Tutorial D succeeds. The Assignment Principle: After assignment of v to V, v = V must give TRUE. Very simple... but far reaching consequences!

139 EVERY RELVAR HAS AT LEAST ONE CANDIDATE KEY (why?) : Let K be a subset of the heading of relvar R. Then K is a candidate key (or just key) for R iff:
1. Uniqueness: No possible value of R has two distinct tuples with the same value for K
2. Irreducibility: No proper subset of K has the uniqueness property
E.g., {SNO}, {PNO}, {SNO,PNO} for relvars S, P, SP, resp.
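The two conditions can be explored on a sample value. This is a minimal sketch in plain Python with illustrative names and an assumed three-tuple shipments value. An important caveat: keys are a property of the relvar (all possible values), so a check against one value can only rule candidate keys out; it can never establish them.

```python
from itertools import combinations

def has_uniqueness(body, attrs):
    """True if no two tuples in this one relation value agree on all of attrs."""
    seen = set()
    for t in body:
        k = tuple(t[a] for a in attrs)
        if k in seen:
            return False
        seen.add(k)
    return True

def unique_irreducible_sets(heading, body):
    """Attribute sets that are unique in `body` and have no unique proper subset."""
    found = []
    for n in range(len(heading) + 1):          # smallest sets first
        for attrs in combinations(sorted(heading), n):
            reducible = any(set(k) <= set(attrs) for k in found)
            if not reducible and has_uniqueness(body, attrs):
                found.append(attrs)
    return found

# An assumed sample value for the shipments relvar SP.
SP_HEADING = {"SNO", "PNO", "QTY"}
SP_BODY = [
    {"SNO": "S1", "PNO": "P1", "QTY": 300},
    {"SNO": "S1", "PNO": "P2", "QTY": 300},
    {"SNO": "S2", "PNO": "P1", "QTY": 300},
]

consistent_keys = unique_irreducible_sets(SP_HEADING, SP_BODY)
print(consistent_keys)
```

Because a relation never contains duplicate tuples, the full heading always has the uniqueness property, so the search always finds at least one irreducible set; that is one way to see why every relvar has at least one candidate key.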

140 POINTS ARISING :
Strong recommendation: Every CREATE TABLE should have at least one UNIQUE and/or PRIMARY KEY specification
Note: We don't insist on primary keys as such, but do usually follow PK discipline ourselves (marked by double underlining)
Key values are tuples! Key uniqueness relies on tuple equality! Number of attributes is the degree of the key
Keys apply to relvars, not relations (why?)
Note: System can enforce uniqueness but can't enforce irreducibility

141 Why irreducibility? Because if the system knows only that, e.g., {SNO,CITY} values have the uniqueness property, it will be enforcing the WRONG INTEGRITY CONSTRAINT. Recommendation: Never lie to the DBMS! A subset SK of the heading of R that's unique but not necessarily irreducible is a superkey. Uniqueness of SK implies that the functional dependence /* see later */ SK → A is satisfied by R for all subsets A of the heading of R; i.e., we ALWAYS have "arrows out of superkeys"

142 RELVARS CAN HAVE N KEYS (N > 1) :
    VAR TAX_BRACKET BASE RELATION
      { LOW MONEY, HIGH MONEY, PERCENTAGE INTEGER }
      KEY { LOW }
      KEY { HIGH }
      KEY { PERCENTAGE } ;
    VAR ROSTER BASE RELATION
      { DAY DAY, HOUR HOUR, GATE GATE, PILOT NAME }
      KEY { DAY, HOUR, GATE }
      KEY { DAY, HOUR, PILOT } ;
    VAR MARRIAGE BASE RELATION
      { SPOUSE_A NAME, SPOUSE_B NAME, DATE_OF_MARRIAGE DATE }
      KEY { SPOUSE_A, DATE_OF_MARRIAGE }
      KEY { DATE_OF_MARRIAGE, SPOUSE_B }
      KEY { SPOUSE_B, SPOUSE_A } ;

143 SOME RELVARS HAVE FOREIGN KEYS :
Let R1 and R2 be relvars, not necessarily distinct, and let K be a key for R1
Let FK be a subset of the heading of R2 such that there exists a possibly empty sequence of attribute renamings on R1 that maps K into K' (say), where K' and FK contain exactly the same attributes
Let R2 and R1 be subject to the constraint that, at all times, every tuple t2 in R2 has an FK value that's the K' value for some (necessarily unique) tuple t1 in R1 at the time in question
Then FK is a foreign key (with the same degree as K); the associated constraint is a referential constraint; and R2 and R1 are the referencing relvar and the corresponding referenced relvar, respectively, for that constraint

144 E.g., {SNO} and {PNO} in relvar SP. Referential integrity rule: DB must never contain any unmatched FK values. Note reliance on tuple equality again. Another example:
Tutorial D:
    VAR EMP BASE RELATION
      { ENO CHAR, MNO CHAR, ... }
      KEY { ENO }
      FOREIGN KEY { MNO } REFERENCES EMP { ENO } RENAME ( ENO AS MNO ) ;
SQL:
    CREATE TABLE EMP
      ( ENO VARCHAR(6) NOT NULL,
        MNO VARCHAR(6) NOT NULL,
        .....,
        UNIQUE ( ENO ),
        FOREIGN KEY ( MNO ) REFERENCES EMP ( ENO ) ) ;
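The referential integrity rule can be seen being enforced on the SP example. A minimal sketch, assuming Python's standard sqlite3 module; note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON (an SQLite-specific detail), and that the column types are simplified to TEXT and INTEGER. The helper `try_insert` is an illustrative name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite-specific: off by default
conn.executescript("""
    CREATE TABLE S  (SNO TEXT NOT NULL, UNIQUE (SNO));
    CREATE TABLE P  (PNO TEXT NOT NULL, UNIQUE (PNO));
    CREATE TABLE SP (SNO TEXT NOT NULL,
                     PNO TEXT NOT NULL,
                     QTY INTEGER NOT NULL,
                     UNIQUE (SNO, PNO),
                     FOREIGN KEY (SNO) REFERENCES S (SNO),
                     FOREIGN KEY (PNO) REFERENCES P (PNO));
    INSERT INTO S (SNO) VALUES ('S1');
    INSERT INTO P (PNO) VALUES ('P1');
""")

def try_insert(sno, pno, qty):
    """Hypothetical helper: True if the shipment row was accepted."""
    try:
        with conn:   # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO SP (SNO, PNO, QTY) VALUES (?, ?, ?)",
                (sno, pno, qty),
            )
        return True
    except sqlite3.IntegrityError:
        return False

matched = try_insert("S1", "P1", 300)     # both FK values match existing keys
unmatched = try_insert("S9", "P1", 100)   # no supplier S9: unmatched FK value
print(matched, unmatched)
```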

145 Column matching in SQL is done by ordinal position, not by name, so renaming is not necessary, though corresponding columns must be of the same type (no coercion). Recommendation: Nevertheless, ensure that corresponding columns do have the same name if possible. Can't follow this recommendation if either:
Table T has an FK matching a key of T itself (as in EMP)
Table T2 has two distinct FKs both matching the same key in table T1 (as in bill of materials)
So do the best you can...

146 REFERENTIAL ACTIONS /* e.g., cascade delete */ : Not part of RM as such. Supported by SQL but not by Tutorial D /* yet */. RM = foundation of the DB field, but only the foundation. Nothing wrong with additional features, so long as they don't violate RM and are in the spirit of RM and are useful:
Type theory
Recovery and concurrency (?)
Triggered procedures
Referential actions are a special case, though specified declaratively. OK so long as set level not row level (?). OK so long as they don't violate The Assignment Principle (but they usually do)

147 WAY OF THINKING ABOUT RELVARS : Heading corresponds to a predicate (truth valued function): e.g.,
    Supplier SNO is under contract, is named SNAME, has status STATUS, and is located in CITY
Parameters (SNO, SNAME, STATUS, CITY in the example) stand for values of the relevant types. Tuples represent true propositions ("instantiations" of the predicate that evaluate to TRUE), obtained by substituting arguments for the parameters: e.g.,
    Supplier S1 is under contract, is named Smith, has status 20, and is located in London
(Very important!)

148 THUS : Every relvar has an associated relvar predicate (or meaning or intended interpretation or intension). If relvar R has predicate P, then every tuple t in R at time x represents a proposition p, derived by invoking (or instantiating) P at time x with t's attrib values as arguments. Body of R at time x is the extension of P at time x. The Closed World Assumption: Relvar R contains, at any given time, all and only the tuples that represent true propositions (true instantiations of the predicate for R) at the time in question. Loosely: Everything the DB says (or implies) is true, everything else is false

149 RELATIONS vs. TYPES : TYPES are sets of things we can talk about; RELATIONS are (true) statements about those things! Note three very important corollaries...

150
1. Types and relations are both NECESSARY
2. They're not the same thing (logical difference!)
3. They're SUFFICIENT (as well as necessary)*
A DB (with ops) is a logical system! This was Codd's great insight... and it's why RM is rock solid, and "right," and will endure... and why other "data models" are just not in the same ballpark
* Need relvars too for changes over time

151 A NICE ANALOGY :
TYPES are to RELATIONS as NOUNS are to SENTENCES

152 STRUCTURE OF PRESENTATION :
    1. Setting the scene
    2. Types and domains
    3. Tuples and relations, rows and tables
    4. No duplicates, no nulls
    5. Base relvars, base tables
    6. SQL and algebra I: The original operators
    7. SQL and algebra II: Additional operators
    8. SQL and constraints
    9. SQL and views
    10. SQL and logic I: Relational calculus
    11. SQL and logic II: Using logic to write SQL
    12. Further SQL topics
    13. Appendix: The relational model
    14. Appendix: DB design

153 SOME PRELIMINARIES :
Reminder re closure and nested exps
Ops are generic and read-only
But exps (op invocations) can include relvar refs: e.g.,
    R1 UNION R2 /* R1 and R2 are relvar names */
Relvar ref is itself a rel exp* (op is "return value of")
INSERT / DELETE / UPDATE / relational assignment are rel ops but not rel algebra ops: Caveat lector!
* Not in SQL, though! E.g., T1 UNION T2 illegal and so is T (must say, e.g., SELECT * FROM T)

154 Tutorial D vs. SQL :
Overriding point = when correspondence needs to be established between operand attributes (as in JOIN):
Tutorial D requires corresponding attributes to be, formally, the very same attribute... E.g.:
    P JOIN S /* join P and S "on CITY" */
SQL uses different techniques in different contexts: ordinal position, explicit specification, same name (not always same type)... E.g.:
    SELECT P.PNO, P.PNAME, P.COLOR, P.WEIGHT, P.CITY /* or S.CITY */,
           S.SNO, S.SNAME, S.STATUS
    FROM   P, S
    WHERE  P.CITY = S.CITY /* explicit specification */

155 OR :
    SELECT P.PNO, P.PNAME, P.COLOR, P.WEIGHT, P.CITY /* or S.CITY */,
           S.SNO, S.SNAME, S.STATUS
    FROM   P JOIN S ON P.CITY = S.CITY
Or:
    SELECT P.PNO, P.PNAME, P.COLOR, P.WEIGHT, CITY /* not P.CITY or S.CITY! */,
           S.SNO, S.SNAME, S.STATUS
    FROM   P JOIN S USING ( CITY )
Or:
    SELECT P.PNO, P.PNAME, P.COLOR, P.WEIGHT, CITY,
           S.SNO, S.SNAME, S.STATUS
    FROM   P NATURAL JOIN S

156 POINTS ARISING :
SQL permits, and sometimes requires, dot qualified names; Tutorial D doesn't
Tutorial D sometimes needs to rename attributes to avoid naming clashes or mismatches; SQL usually doesn't (though it does support column renaming for other reasons)
Tutorial D has no need for "correlation names" /* see later */
SQL supports features of rel calculus as well as features of rel algebra; Tutorial D doesn't /* see later */
SQL requires most queries to conform to SELECT - FROM - WHERE template; Tutorial D has nothing analogous /* see later */

157 MORE ON CLOSURE :
Result of every rel op is a relation... Any op that produces a result that's not a rel isn't a rel op!*
E.g., in SQL, any op that produces a result with:
    Duplicate rows
    Anonymous columns
    Nulls
    Duplicate column names
    Left to right column ordering
Strong recommendation: Don't use any op that violates closure if you want the result to be amenable to further relational processing
* Except for relational inclusion (?)

158 Closure doesn't mean intermediate results have to be materialized (popular misconception!)... E.g.:
Tutorial D:
    ( P JOIN S ) WHERE PNAME > SNAME
SQL:
    SELECT P.*, SNO, SNAME, STATUS
    FROM   P, S
    WHERE  P.CITY = S.CITY
    AND    P.PNAME > S.SNAME
Can pipeline join result to restriction op
But another important point here: "PNAME > SNAME" applies to result of P JOIN S... so names PNAME and SNAME refer to attributes of that result !!!

159 How do we know that result has such attributes? What is the heading of that result?
More generally: What's the heading for the result of any algebraic operation?
Need relation type inference rules such that, given headings (and hence types) of input rels, we can infer heading (and hence type) of output rel
RM includes such rules... E.g., P JOIN S is of type:
    RELATION { PNO CHAR, PNAME CHAR, COLOR CHAR, WEIGHT FIXED, CITY CHAR, SNO CHAR, SNAME CHAR, STATUS INTEGER }
In fact need for such rules is implied by closure

160 RENAME :
Tutorial D:
    S RENAME ( CITY AS SCITY )
SQL:
    SELECT SNO, SNAME, STATUS, S.CITY AS SCITY
    FROM   S
Result identical to current value of S except for renaming:
    SNO  SNAME  STATUS  SCITY
    S1   Smith  20      London
    S2   Jones  10      Paris
    S3   Blake  30      Paris
    S4   Clark  20      London
    S5   Adams  30      Athens
Note: Relvar S not changed in the DB!... not like ALTER TABLE in SQL
Needed primarily as a preliminary to performing, e.g., UNION or JOIN /* see later */

161 HOW DOES SQL HANDLE "TABLE TYPE" INFERENCE ???
Answer: Not very well!
    No proper notion of table type anyway
    Result can have anonymous columns
    Result can have duplicate column names
    Result has left to right column ordering
Strong recommendation: Use column renaming discipline described earlier (which effectively relied on SQL-style column renaming, i.e., AS specifications) to ensure that SQL conforms as far as possible to relational rules

162 EXAMPLE REVISITED : ANOTHER POINT
Tutorial D:
    ( P JOIN S ) WHERE PNAME > SNAME
SQL:
    SELECT P.*, SNO, SNAME, STATUS
    FROM   P, S
    WHERE  P.CITY = S.CITY
    AND    P.PNAME > S.SNAME
P.PNAME > S.SNAME applies to result of join... ???
Actually quite difficult to explain this at all... The standard does explain it, but the machinations involved are much more complicated than RM type inference rules... Details beyond the scope of this seminar !!!
In any case, you're supposed to know SQL, so you already know how this works (right?)... Or had you never thought about this issue before?

163 THE ORIGINAL OPERATORS :
    restriction /* aka selection */
    projection
    JOIN, TIMES
    theta join /* see later */
    UNION, INTERSECT, MINUS
    DIVIDEBY /* see much later */

164 RESTRICT :
Tutorial D:
    P WHERE WEIGHT < 12.5
SQL:
    SELECT P.*
    FROM   P
    WHERE  WEIGHT < 12.5
WEIGHT < 12.5 is a boolean exp in which every attrib ref identifies an attrib of P and there are no relvar refs
Note: WHERE in Tutorial D is more general
Result has same heading as P and body = tuples of P for which boolean exp evaluates to TRUE:
    PNO  PNAME  COLOR  WEIGHT  CITY
    P1   Nut    Red    12.0    London
    P5   Cam    Blue   12.0    Paris

165 PROJECT :
Tutorial D:
    P { COLOR, CITY }
SQL:
    SELECT DISTINCT COLOR, CITY
    FROM   P
Result has heading as specified:
    COLOR  CITY
    Red    London
    Green  Paris
    Blue   Oslo
    Blue   Paris
Note: Duplicates eliminated!
Tutorial D also supports projection on ALL BUT specified attribs... Similarly for other ops where it makes sense

166 (Natural) JOIN :
Rels r1 and r2 joinable iff attribs with same name are of same type (i.e., iff set theory union of headings is a legal heading) /* concept relevant to other ops as well as join */
Tutorial D:
    P JOIN S
SQL:
    SELECT P.*, SNO, SNAME, STATUS
    FROM   P, S
    WHERE  P.CITY = S.CITY
Result heading = set theory union of headings of P and S...
Result body = set of all tuples t where t is the set theory union of tuple from P and tuple from S:
    PNO  PNAME  COLOR  WEIGHT  CITY    SNO  SNAME  STATUS
    P1   Nut    Red    12.0    London  S1   Smith  20
    ...
    P6   Cog    Red    19.0    London  S4   Clark  20

167 ALTERNATIVE SQL FORMULATION :
    SELECT * FROM P NATURAL JOIN S
Result heading has columns CITY, PNO, PNAME, COLOR, WEIGHT, SNO, SNAME, STATUS in that order... but don't write code that relies on this ordering!

168 Copyright C. J. Date 2008page 167 POINTS ARISING : Let r1 and r2 be joinable Let common attributes (set theory intersection of headings) be {Y}... Let other attributes of r1 and r2 be {X} and {Z}, resp.... Join has heading = set theory union of {X}, {Y}, and {Z} If {X} and {Z} are empty, {Y} = entire heading of r1 and r2, and r1 JOIN r2 degenerates to r1 INTERSECT r2 E.g.: S { CITY } JOIN P { CITY } same as S { CITY } INTERSECT P { CITY }

169 Copyright C. J. Date 2008page 168 If {Y} is empty, r1 and r2 have no common attrib names, and r1 JOIN r2 degenerates to r1 TIMES r2 E.g.: S { ALL BUT CITY } JOIN P { ALL BUT CITY } same as S { ALL BUT CITY } TIMES P { ALL BUT CITY } Direct support for TIMES included for psychological reasons rather than logical ones (likewise for INTERSECT) Note: For TIMES, operand rels must have no common attrib names

170 Can usefully define n-adic JOIN also (n ≥ 0)*
    JOIN { r1, r2,..., rn }
    JOIN { r } ≡ r
    JOIN { } ≡ ???
Answer: TABLE_DEE !!!
TABLE_DEE is the identity with respect to JOIN /* important! */
* Why exactly is this possible? See later...

171 EXPLICIT JOINS IN SQL :
    1. t1 NATURAL JOIN t2 /* already explained */
    2. t1 JOIN t2 ON bx
    3. t1 JOIN t2 USING ( C1, C2,..., Cn )
    4. t1 CROSS JOIN t2 /* ( SELECT * FROM t1, t2 ) */
Case 2, t1 JOIN t2 ON bx, is logically equivalent to:
    ( SELECT * FROM t1, t2 WHERE bx )

172 EXPLICIT JOINS IN SQL (cont.) :
Case 3, t1 JOIN t2 USING ( C1, C2,..., Cn ), is equivalent to:
    ( SELECT * FROM t1, t2 WHERE t1.C1 = t2.C1 AND... AND t1.Cn = t2.Cn )
except that columns C1, C2,..., Cn appear only once in result, and result column ordering is:
    first C1, C2,..., Cn (in that order)
    then other columns of t1 (in same order as in t1),
    then other columns of t2 (in same order as in t2)
/* Do you begin to see what a pain this left to right ordering business is ??? */

173 RECOMMENDATIONS :
    1. NATURAL JOIN: First choice... Usually most succinct if other recommendations followed... But make sure columns with same name are of same type (joinability)
    2. Avoid JOIN ON: Virtually guaranteed to produce duplicate column names (unless... ???)... If you must use it, do renaming as well
    3. JOIN USING: Make sure columns with same name are of same type
    4. CROSS JOIN: Make sure no common column names
    5. WHERE (original syntax): As Case 2 (JOIN ON)

174 UNION, INTERSECT, MINUS :
Operands must be of same type, result is of same type also... Suppose parts have extra attribute STATUS, of type INTEGER:
Tutorial D:
    P { STATUS, CITY } UNION S { CITY, STATUS }
SQL:
    SELECT STATUS, CITY FROM P
    UNION CORRESPONDING
    SELECT CITY, STATUS FROM S
Note: Duplicates eliminated! (unless ALL specified, in SQL); result has attributes (columns) STATUS and CITY (in that order, in SQL)
If CORRESPONDING not specified, column matching done on basis of ordinal position... Don't do this!

175 UNION, INTERSECT, MINUS (cont.) :
Tutorial D:
    P { STATUS, CITY } INTERSECT S { CITY, STATUS }
SQL:
    SELECT STATUS, CITY FROM P
    INTERSECT CORRESPONDING
    SELECT CITY, STATUS FROM S
Tutorial D:
    P { STATUS, CITY } MINUS S { CITY, STATUS }
SQL:
    SELECT STATUS, CITY FROM P
    EXCEPT CORRESPONDING
    SELECT CITY, STATUS FROM S

176 RECOMMENDATIONS :
Make sure corresponding columns have same name and type
Always specify CORRESPONDING if possible; otherwise, make sure columns line up properly (because matching done by ordinal position): e.g.,
    SELECT STATUS, CITY FROM P
    UNION
    SELECT STATUS, CITY FROM S /* note reordering */
Don't use "BY (column name commalist)"
Never specify ALL! Note: Usual "justification" for ALL is performance...
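
The ordinal-position hazard behind these recommendations can be seen in a small sketch. It runs the SQL through Python's sqlite3 module, which is a choice made for this illustration; SQLite does not offer CORRESPONDING, so only the positional behavior is shown. The one-row P and S tables, including the extra STATUS attribute on parts, are made up for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE P (PNO TEXT PRIMARY KEY, STATUS INTEGER NOT NULL, CITY TEXT NOT NULL);
    CREATE TABLE S (SNO TEXT PRIMARY KEY, STATUS INTEGER NOT NULL, CITY TEXT NOT NULL);
    INSERT INTO P VALUES ('P1', 20, 'London');
    INSERT INTO S VALUES ('S1', 20, 'London');
""")

# Same column order on both sides: the two identical rows collapse into one.
aligned = con.execute(
    "SELECT STATUS, CITY FROM P UNION SELECT STATUS, CITY FROM S"
).fetchall()

# Different column order: columns are paired by position, so STATUS is
# matched with CITY. SQLite's loose typing lets this through silently.
misaligned = con.execute(
    "SELECT STATUS, CITY FROM P UNION SELECT CITY, STATUS FROM S"
).fetchall()

print(aligned)
print(misaligned)
```

A more strictly typed product might reject the second query outright; either way the column names play no part in the matching.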

177 Copyright C. J. Date 2008page 176 Tutorial D also supports: Disjoint union (D_UNION) /* see defn of INSERT earlier */ n-adic UNION, INTERSECT, D_UNION (n > 0) /* but not MINUS !!! */ ONE LAST POINT :

178 WHICH OPERATORS ARE PRIMITIVE ???
Already seen that INTERSECT and TIMES can be defined in terms of join... i.e., not all ops primitive
Difference between primitive and useful !!!
One possible primitive set:
    restrict
    project
    join
    union
    difference
But what about rename?

179 "WITH" SPECIFICATIONS /* very useful feature */ :
Get pairs of supplier numbers such that the suppliers are colocated (i.e., in same city):
    ( ( ( S RENAME ( SNO AS SA ) ) { SA, CITY } JOIN
        ( S RENAME ( SNO AS SB ) ) { SB, CITY } )
      WHERE SA < SB ) { SA, SB }
Or:
    WITH ( S RENAME ( SNO AS SA ) ) { SA, CITY } AS R1,
         ( S RENAME ( SNO AS SB ) ) { SB, CITY } AS R2,
         R1 JOIN R2 AS R3,
         R3 WHERE SA < SB AS R4 :
    R4 { SA, SB }

180 "WITH" IN SQL :
Operands the other way around: WITH name AS exp
No colon separator
In Tutorial D, WITH can be used with exps of any kind; in SQL, WITH can be used with table exps only
    WITH T1 AS ( SELECT SNO AS SA, CITY FROM S ),
         T2 AS ( SELECT SNO AS SB, CITY FROM S ),
         T3 AS ( SELECT * FROM T1 NATURAL JOIN T2 ),
         T4 AS ( SELECT * FROM T3 WHERE SA < SB )
    SELECT SA, SB
    FROM   T4
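
As a check on the SQL WITH formulation, the following sketch runs it against the five supplier tuples shown on the RENAME slide. Hosting the query in Python's sqlite3 module is an assumption of this example, not something the seminar prescribes.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE S (SNO TEXT PRIMARY KEY, SNAME TEXT, STATUS INTEGER, CITY TEXT)")
con.executemany("INSERT INTO S VALUES (?, ?, ?, ?)", [
    ("S1", "Smith", 20, "London"),
    ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"),
    ("S4", "Clark", 20, "London"),
    ("S5", "Adams", 30, "Athens"),
])

# Each WITH element names one step; T3 joins T1 and T2 on their only common column, CITY.
pairs = sorted(con.execute("""
    WITH T1 AS (SELECT SNO AS SA, CITY FROM S),
         T2 AS (SELECT SNO AS SB, CITY FROM S),
         T3 AS (SELECT * FROM T1 NATURAL JOIN T2),
         T4 AS (SELECT * FROM T3 WHERE SA < SB)
    SELECT SA, SB FROM T4
""").fetchall())

print(pairs)  # colocated supplier pairs, each pair listed once
```

The restriction SA < SB removes pairs such as (S1, S1) and keeps only one of (S1, S4) and (S4, S1).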

181 Copyright C. J. Date 2008page 180 Recall:Every relvar has a relvar predicate (i.e., what the relvar means) This notion extends naturally to arbitrary rel exps! E.g., consider projection S {SNO,SNAME,STATUS}... Denotes rel containing all tuples of the form TUPLE { SNO sno, SNAME sn, STATUS st } such that a tuple of the form TUPLE { SNO sno, SNAME sn, STATUS st, CITY sc } currently exists in relvar S for some CITY value sc... In other words: WHAT DO RELATIONAL EXPRESSIONS MEAN?

182 Copyright C. J. Date 2008page 181 Specified exp denotes current extension of predicate: There exists some city CITY such that supplier SNO is under contract, is named SNAME, has status STATUS, and is located in city CITY Or just:Supplier SNO is under contract, is named SNAME, has status STATUS, and is located somewhere This predicate = meaning of S {SNO,SNAME,STATUS}... Has three parameters (relation has three attributes); CITY is a bound variable, not a param /* see later */ Pred for arb rel exp can be determined from preds for relvars involved plus semantics of rel ops involved

183 THETA JOIN :
E.g.: "unequal" join of S and P on cities /* SQL only */ :
    SELECT SNO, SNAME, STATUS, S.CITY AS SCITY,
           PNO, PNAME, COLOR, WEIGHT, P.CITY AS PCITY /* 3. "project" */
    FROM   S, P /* 1. cartesian product */
    WHERE  S.CITY <> P.CITY /* 2. restrict */
Note the conceptual algorithm for evaluating a SELECT - FROM - WHERE exp (i.e., formal definition of semantics of such exps)
By the way: What if theta had been "=" ???

184 EXPRESSION TRANSFORMATION ("query rewrite") :
Example: Suppliers who supply part P2, with corresp quantities (Tutorial D):
    ( ( S JOIN SP ) WHERE PNO = 'P2' ) { ALL BUT PNO }
DB: 100 suppliers, 100,000 shipments (500 for P2)
No optimization at all (worst case):
    1. Join: 10,000,100 reads, 100,000 writes
    2. Restrict (result 500 tuples): 100,000 reads, no writes
    3. Project: No reads, no writes
    TOTAL: 10,200,100 tuple I/Os

185 AN OBVIOUS IMPROVEMENT :
    1. Restrict (result 500 tuples): 100,000 reads, no writes
    2. Join (result 500 tuples): 100 reads, no writes
    3. Project: No reads, no writes
    TOTAL: 100,100 tuple I/Os (100 times better)

186 In effect, optimizer has transformed original exp into
    S JOIN ( SP WHERE PNO = 'P2' ) /* ignore projection */
Such transformations are one of the two great ideas at the heart of optimization
Other = cost based optimizing: E.g., index or hash on SP.PNO will reduce 100,000 reads in Step 1 to 500, and overall procedure now 20,000 times better than the original
But such optimizing has little to do with RM per se, except for strong logical vs. physical separation, which keeps access strategies out of applications
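
The tuple I/O totals quoted for this example can be reproduced with a few lines of arithmetic. The accounting below is an assumption made for this sketch: the unoptimized join reads each supplier once and scans all shipments for every supplier, the join result is written out and read back by the restriction, and in the improved plans the 500 restricted tuples stay in memory.

```python
SUPPLIERS = 100
SHIPMENTS = 100_000
P2_SHIPMENTS = 500

# No optimization: join first, then restrict, then project.
join_reads = SUPPLIERS + SUPPLIERS * SHIPMENTS  # every supplier against every shipment
join_writes = SHIPMENTS                         # one joined tuple per shipment
restrict_reads = SHIPMENTS                      # read the join result back
unoptimized = join_reads + join_writes + restrict_reads

# Restrict first: scan SP once, then read the suppliers.
improved = SHIPMENTS + SUPPLIERS

# Index or hash on SP.PNO: only the P2 shipments are read in the restrict step.
indexed = P2_SHIPMENTS + SUPPLIERS

print(unoptimized, improved, indexed)
print(unoptimized / improved, unoptimized / indexed)
```

Under this accounting the restrict-first plan comes out about 100 times better, and the indexed plan about 17,000 times better, which is the same order of magnitude as the round figure quoted on the slide.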

187 THE DISTRIBUTIVE LAW :
E.g., SQRT ( a * b ) ≡ SQRT ( a ) * SQRT ( b )
"SQRT distributes over multiplication" /* but not over addition */
In RM, restrict distributes over UNION / INTERSECT / MINUS... also JOIN if restriction condition = AND of two separate conditions, one for each join operand
I.e., ( r1 WHERE bx1 ) JOIN ( r2 WHERE bx2 ) ≡ ( r1 JOIN r2 ) WHERE bx1 AND bx2
This law was used in the example
Net effect: Can do restrictions early

188 Project distributes over UNION
I.e., ( r1 UNION r2 ) { X } ≡ r1 { X } UNION r2 { X }
Also distributes over JOIN provided all joining attribs are included in the projection
Can do projections early

189 THE COMMUTATIVE LAW :
Dyadic Op is commutative iff a Op b ≡ b Op a
In arith, "+" and "*" are commutative, "-" and "/" aren't
In RM, UNION / INTERSECT / JOIN are commutative, MINUS isn't
Hence, in (e.g.) r1 JOIN r2, system is free to choose smaller of r1 and r2 (say) as "outer" rel and other as "inner" rel

190 THE ASSOCIATIVE LAW :
Dyadic Op is associative iff a Op ( b Op c ) ≡ ( a Op b ) Op c
In arith, "+" and "*" are associative, "-" and "/" aren't
In RM, UNION / INTERSECT / JOIN are associative, MINUS isn't
Hence, in (e.g.) r1 JOIN r2 JOIN r3:
    No parens necessary
    System is free to choose join sequence

191 THE IDEMPOTENCE AND ABSORPTION LAWS :
Dyadic Op is idempotent iff a Op a ≡ a
    In logic, AND and OR are idempotent
    In RM, UNION / INTERSECT / JOIN are idempotent, MINUS isn't
Absorption laws:
    r1 UNION ( r1 INTERSECT r2 ) ≡ r1
    r1 INTERSECT ( r1 UNION r2 ) ≡ r1
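
The commutative, associative, idempotence, and absorption laws can be spot-checked on small relations. The sketch below models a relation as a Python frozenset of tuples, each tuple being a frozenset of (attribute, value) pairs; the helper names rel and join, and the relation c with its COUNTRY attribute, are inventions of this example. Checking a few values illustrates the laws but of course does not prove them.

```python
def rel(*rows):
    """Build a relation: a set of tuples, each a set of (attribute, value) pairs."""
    return frozenset(frozenset(row.items()) for row in rows)

def join(r1, r2):
    """Natural join: merge every pair of tuples that agree on their common attributes."""
    result = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                result.add(frozenset({**d1, **d2}.items()))
    return frozenset(result)

s = rel({"SNO": "S1", "CITY": "London"}, {"SNO": "S2", "CITY": "Paris"})
p = rel({"PNO": "P1", "CITY": "London"}, {"PNO": "P2", "CITY": "Paris"},
        {"PNO": "P3", "CITY": "Oslo"})
c = rel({"CITY": "London", "COUNTRY": "UK"})  # hypothetical third relation
a = rel({"CITY": "London"}, {"CITY": "Paris"})
b = rel({"CITY": "Paris"}, {"CITY": "Oslo"})

commutative = join(s, p) == join(p, s)
associative = join(join(s, p), c) == join(s, join(p, c))
idempotent = join(s, s) == s and (a | a) == a and (a & a) == a
absorption = (a | (a & b)) == a and (a & (a | b)) == a
minus_commutes = (a - b) == (b - a)            # expected to fail
join_is_intersect = join(a, b) == (a & b)      # same heading: JOIN degenerates to INTERSECT

print(commutative, associative, idempotent, absorption, minus_commutes, join_is_intersect)
```

Because tuples are sets of attribute/value pairs rather than positional rows, none of these checks depends on attribute order, which is one reason such laws hold for relations but get harder to state for SQL tables.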

192 Copyright C. J. Date 2008page 191 All such transformations can be done without regard for actual data values or access paths! Important note: Many such transformations available for sets... But fewer for bags... And fewer still if column ordinal position has to be taken into account... And far fewer if nulls and 3VL have to be taken into account... What do you conclude?

193 BUT DOESN'T RELYING ON ATTRIBUTE NAMES MAKE FOR FRAGILE CODE ???
E.g., P JOIN S... What if STATUS attribute added to P?
Popular misconception!
1960s/1970s: [diagram: pgm, DB def, DB] Not much data independence
Today: [diagram: pgm, DB def, DB] More data independence (but...)

194 The right way: [diagram: pgm, DB def, DB def, DB] Full data independence*
Note: Views should have solved this problem but didn't... because mapping specified as part of the view definition instead of separately
Recommendation: Adopt the "operate via views strategy"!
* Full logical data independence, to be precise

196 ADDITIONAL OPERATORS :
    MATCHING, NOT MATCHING
    EXTEND
    image relations
    DIVIDEBY
    aggregate operators
    SUMMARIZE
    GROUP, UNGROUP
    "what if"
    ORDER BY (?)

197 SEMIJOIN AND SEMIDIFFERENCE :
Most exps involving join or difference really require semijoin or semidifference
r1 MATCHING r2 ≡ ( r1 JOIN r2 ) { H1 } where {H1} = heading of r1
Tutorial D:
    S MATCHING SP
SQL:
    SELECT S.* FROM S
    WHERE  SNO IN ( SELECT SNO FROM SP )
r1 NOT MATCHING r2 ≡ r1 MINUS ( r1 MATCHING r2 )
Tutorial D:
    S NOT MATCHING SP
SQL:
    SELECT S.* FROM S
    WHERE  SNO NOT IN ( SELECT SNO FROM SP )
If r1 and r2 of same type, r1 NOT MATCHING r2 degenerates to r1 MINUS r2 /* analogous remark NOT true of semijoin */
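
The SQL analogs of MATCHING and NOT MATCHING can be tried out as follows. The sketch uses Python's sqlite3 module (a choice made for this illustration) and the customary suppliers-and-parts sample values; the SP rows are assumed here, since this part of the transcript shows them only partially.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE S  (SNO TEXT PRIMARY KEY, SNAME TEXT, STATUS INTEGER, CITY TEXT);
    CREATE TABLE SP (SNO TEXT NOT NULL, PNO TEXT NOT NULL, QTY INTEGER NOT NULL,
                     PRIMARY KEY (SNO, PNO));
    INSERT INTO S VALUES ('S1','Smith',20,'London'), ('S2','Jones',10,'Paris'),
                         ('S3','Blake',30,'Paris'),  ('S4','Clark',20,'London'),
                         ('S5','Adams',30,'Athens');
    INSERT INTO SP VALUES ('S1','P1',300), ('S1','P2',200), ('S1','P3',400),
                          ('S1','P4',200), ('S1','P5',100), ('S1','P6',100),
                          ('S2','P1',300), ('S2','P2',400), ('S3','P2',200),
                          ('S4','P2',200), ('S4','P4',300), ('S4','P5',400);
""")

def snos(sql):
    """Run a query and return the sorted supplier numbers from its first column."""
    return sorted(row[0] for row in con.execute(sql))

# S MATCHING SP: suppliers with at least one shipment.
matching = snos("SELECT S.* FROM S WHERE SNO IN (SELECT SNO FROM SP)")

# S NOT MATCHING SP: suppliers with no shipments at all.
not_matching = snos("SELECT S.* FROM S WHERE SNO NOT IN (SELECT SNO FROM SP)")

# The defining expansion: join, then project back onto the heading of S.
via_join = snos("""SELECT DISTINCT S.SNO, S.SNAME, S.STATUS, S.CITY
                   FROM S, SP WHERE S.SNO = SP.SNO""")

print(matching, not_matching, via_join)
```

Note that the NOT IN formulation is only safe here because SP.SNO is declared NOT NULL.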

198 EXTEND :
Tutorial D:
    EXTEND P ADD ( WEIGHT * 454 AS GMWT )
SQL:
    SELECT P.*, WEIGHT * 454 AS GMWT
    FROM   P
Result:
    PNO  PNAME  COLOR  WEIGHT  CITY    GMWT
    P1   Nut    Red    12.0    London  5448.0
    P2   Bolt   Green  17.0    Paris   7718.0
    P3   Screw  Blue   17.0    Oslo    7718.0
    P4   Screw  Red    14.0    London  6356.0
    P5   Cam    Blue   12.0    Paris   5448.0
    P6   Cog    Red    19.0    London  8626.0
Note: Relvar P not changed in the DB!... not like ALTER TABLE in SQL

199 Copyright C. J. Date 2008page 198 Get PNO and gram weight for parts with gram weight > : ( ( EXTEND P ADD ( WEIGHT * 454 AS GMWT ) ) WHERE GMWT > ) { PNO, GMWT } Contrast SQL: SELECT PNO, ( WEIGHT * 454 ) AS GMWT FROM P WHERE ( WEIGHT * 454 ) > /* not GMWT > */ SELECT - FROM - WHERE template too rigid! (Lack of orthogonality)... Need to apply WHERE to SELECT result, not FROM result HENCE :

200 Actually the standard does allow:
    SELECT TEMP.PNO, TEMP.GMWT
    FROM   ( SELECT P.PNO, ( WEIGHT * 454 ) AS GMWT
             FROM   P ) AS TEMP
    WHERE  TEMP.GMWT >
But does your favorite product support subqueries in the FROM clause?
Also, this style leads to references appearing (possibly a long way) before definitions...

201 IMAGE RELATIONS :
Image relation = "image" in some rel of some tuple (usually a tuple in some other rel)
E.g., image in SP of tuple in S for S4:
    ( SP WHERE SNO = 'S4' ) { ALL BUT SNO }
    PNO  QTY
    P2   200
    P4   300
    P5   400
Very useful and widely applicable concept! So we define a shorthand...

202 S WHERE ( !!SP ) { PNO } = P { PNO }
    !!SP is the image in SP of the "current" tuple in S
    "=" is relational equality
I.e., get suppliers who supply all parts!
    SNO  SNAME  STATUS  CITY
    S1   Smith  20      London
Image relation ref can't appear wherever rel exp in general can appear, only in contexts where pertinent tuple well defined (e.g., WHERE clause)

203 SQL has no direct support for image rels... SQL analog of foregoing example: /* can be simplified */
    SELECT *
    FROM   S
    WHERE  NOT EXISTS ( SELECT PNO FROM SP WHERE SP.SNO = S.SNO
                        EXCEPT
                        SELECT PNO FROM P )
    AND    NOT EXISTS ( SELECT PNO FROM P
                        EXCEPT
                        SELECT PNO FROM SP WHERE SP.SNO = S.SNO )
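
The "suppliers who supply all parts" query amounts to a relational equality test written as two EXCEPT checks, one in each direction. The sketch below runs that formulation through Python's sqlite3 module (chosen for this illustration). P is cut down to its PNO column for brevity, and the SP rows are the customary sample values, assumed here.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE S  (SNO TEXT PRIMARY KEY, SNAME TEXT, STATUS INTEGER, CITY TEXT);
    CREATE TABLE P  (PNO TEXT PRIMARY KEY);  -- other part columns omitted
    CREATE TABLE SP (SNO TEXT NOT NULL, PNO TEXT NOT NULL, PRIMARY KEY (SNO, PNO));
    INSERT INTO S VALUES ('S1','Smith',20,'London'), ('S2','Jones',10,'Paris'),
                         ('S3','Blake',30,'Paris'),  ('S4','Clark',20,'London'),
                         ('S5','Adams',30,'Athens');
    INSERT INTO P VALUES ('P1'), ('P2'), ('P3'), ('P4'), ('P5'), ('P6');
    INSERT INTO SP VALUES ('S1','P1'), ('S1','P2'), ('S1','P3'), ('S1','P4'),
                          ('S1','P5'), ('S1','P6'), ('S2','P1'), ('S2','P2'),
                          ('S3','P2'), ('S4','P2'), ('S4','P4'), ('S4','P5');
""")

# Two sets are equal iff neither difference contains anything:
# (parts supplied by this supplier) EXCEPT (all parts) is empty, and vice versa.
supplies_all = [row[0] for row in con.execute("""
    SELECT *
    FROM   S
    WHERE  NOT EXISTS (SELECT PNO FROM SP WHERE SP.SNO = S.SNO
                       EXCEPT
                       SELECT PNO FROM P)
    AND    NOT EXISTS (SELECT PNO FROM P
                       EXCEPT
                       SELECT PNO FROM SP WHERE SP.SNO = S.SNO)
""")]

print(supplies_all)
```

With a foreign key from SP.PNO to P.PNO the first NOT EXISTS can never fail, which is presumably the simplification the slide's comment alludes to.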

204 ANOTHER EXAMPLE :
    S  { SNO }       /* suppliers */
    SP { SNO, PNO }  /* supplier supplies part */
    PJ { PNO, JNO }  /* part is used in project */
    J  { JNO }       /* projects */
Get all sno/jno pairs such that:
    SNO sno currently appears in S
    JNO jno currently appears in J
    Supplier sno supplies all parts used in project jno
( S JOIN J ) WHERE !!PJ ⊆ !!SP
Easy... but try it in SQL!

205 DIVIDEBY :
Should be dropped, IMHO /* so can skip this topic if you like */
Any query that can be done via divide can be done better via image rels
There are at least seven different divides!
Doesn't solve the problem it was originally, and specifically, meant to address
Original and simplest version: Let heading of r2 be subset of heading of r1 (so r1 and r2 definitely joinable, by the way)

206 Dividend r1 has attributes X and Y; divisor r2 has attribute Y; result has attribute X
r1 DIVIDEBY r2 ≡ r1 { X } NOT MATCHING ( ( r1 { X } JOIN r2 ) NOT MATCHING r1 )
E.g., let RP be ( P WHERE COLOR = 'Red' )... Then
    SP { SNO, PNO } DIVIDEBY RP { PNO }
Loosely (?): SNOs for suppliers who supply all red parts... Probably needs to be joined to S (?)
    SNO
    S1

207 AGGREGATE OPERATORS /* digression (?) */ :
In RM, agg op = op that derives a single value from the bag or set of values of some attribute of some relation (or, for COUNT, from the entire rel). E.g.:
Tutorial D:
    X := COUNT ( S ) ; /* X = 5 */
    Y := COUNT ( S { STATUS } ) ; /* Y = 3 */
SQL:
    SELECT COUNT ( * ) AS X FROM S
    SELECT COUNT ( DISTINCT STATUS ) AS Y FROM S
Tutorial D syntax: <agg op name> ( <rel exp> [, <exp> ] )

208 Tutorial D EXAMPLES :
    SUM ( SP { QTY } ) /* 1000 */
    SUM ( SP, QTY ) /* 3100 */
    AVG ( SP, 3 * QTY ) /* 775 */
Legal <agg op name>s include: COUNT SUM AVG MAX MIN AND OR XOR
The <exp> can include <attribute ref>s (in practice, almost always does)
The <exp> must be omitted for COUNT... Otherwise, can be omitted only if rel denoted by <rel exp> is of degree one, as in first example above
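
The three Tutorial D results quoted above (1000, 3100, 775) can be reproduced in SQL. The point of the first one is that SUM ( SP { QTY } ) sums a projection, so duplicate quantities count once; SUM ( DISTINCT QTY ) is offered here as the nearest SQL counterpart. The sketch uses Python's sqlite3 module and the customary SP sample values, both assumptions of this illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE SP (SNO TEXT NOT NULL, PNO TEXT NOT NULL,
                                QTY INTEGER NOT NULL, PRIMARY KEY (SNO, PNO))""")
con.executemany("INSERT INTO SP VALUES (?, ?, ?)", [
    ("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400), ("S1", "P4", 200),
    ("S1", "P5", 100), ("S1", "P6", 100), ("S2", "P1", 300), ("S2", "P2", 400),
    ("S3", "P2", 200), ("S4", "P2", 200), ("S4", "P4", 300), ("S4", "P5", 400),
])

# SUM over the projection on QTY, SUM over all tuples, and AVG of a computed expression.
sum_of_projection, sum_of_all, avg_triple = con.execute(
    "SELECT SUM(DISTINCT QTY), SUM(QTY), AVG(3 * QTY) FROM SP"
).fetchone()

print(sum_of_projection, sum_of_all, avg_triple)
```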

209 WHAT ABOUT SQL ???
    SELECT COUNT ( * ) AS X FROM S
    SELECT COUNT ( DISTINCT STATUS ) AS Y FROM S
SQL doesn't really support agg ops at all! Foregoing exps are summarizations, not aggregations; they don't evaluate to 5 and 3, resp.... instead, they evaluate to tables containing those counts:
    X        Y
    5        3
/* COUNT invocations are agg op invocations, perhaps... but they can't appear as "stand alone" exps... only inside table exps */

210 IN OTHER WORDS :
Aggregation is treated in SQL as a special case of summarization (i.e., loosely, what's represented by a SELECT exp with a GROUP BY)... Note that the foregoing SELECT exps do have implicit GROUP BYs:
    SELECT COUNT ( * ) AS X FROM S GROUP BY ( )
    SELECT COUNT ( DISTINCT STATUS ) AS Y FROM S GROUP BY ( )
SQL "aggregation" is, loosely, a SELECT exp without an explicit GROUP BY

211 Aggregation and summarization are often confused!... Perhaps you can begin to see why
Picture confused still further because SQL often coerces table resulting from an "aggregation" to the single row it contains, or even doubly coerces it to the single value that row contains, as here:
    SET X = ( SELECT COUNT ( * ) FROM S ) ;
    SET Y = ( SELECT COUNT ( DISTINCT STATUS ) FROM S ) ;
Another oddity: Logical error in connection with SQL-style aggregation and empty tables (I don't mean the nulls problem)... Details beyond the scope of this seminar
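
The distinction between a table containing a count and the count itself shows up directly in a host program. In the sketch below (Python's sqlite3 module, chosen for this illustration), the COUNT query returns a one-row, one-column table, and a parenthesized subquery used in arithmetic shows SQL performing the double coercion on its own. S is cut down to the two columns the queries need.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE S (SNO TEXT PRIMARY KEY, STATUS INTEGER NOT NULL)")
con.executemany("INSERT INTO S VALUES (?, ?)",
                [("S1", 20), ("S2", 10), ("S3", 30), ("S4", 20), ("S5", 30)])

# The "aggregation" evaluates to a table: a list holding one single-column row.
count_table = con.execute("SELECT COUNT(*) AS X FROM S").fetchall()

# Getting at the value means unwrapping twice: table -> row -> value.
x = count_table[0][0]
y = con.execute("SELECT COUNT(DISTINCT STATUS) AS Y FROM S").fetchall()[0][0]

# Inside SQL, a scalar subquery is coerced to its single value automatically.
x_plus_one = con.execute("SELECT (SELECT COUNT(*) FROM S) + 1").fetchone()[0]

print(count_table, x, y, x_plus_one)
```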

212 BACK TO Tutorial D :
Image rels can be very useful in connection with agg ops... e.g.:
Suppliers for whom total shipment quantity, taken over all shipments, is less than 1000
    S WHERE SUM ( !!SP, QTY ) < 1000
SQL "analog" (but note the trap!):
    SELECT S.SNO, S.SNAME, S.STATUS, S.CITY
    FROM   S, SP
    WHERE  S.SNO = SP.SNO
    GROUP  BY S.SNO, S.SNAME, S.STATUS, S.CITY
    HAVING SUM ( SP.QTY ) < 1000

213 Copyright C. J. Date 2008page 212 Suppliers with fewer than three shipments: S WHERE COUNT ( !!SP ) < 3 Suppliers where maximum shipment quantity < twice minimum shipment quantity: S WHERE MAX ( !!SP, QTY ) < 2 * MIN ( !!SP, QTY ) Update suppliers where total shipment quantity < 1000, halving their status: UPDATE S WHERE SUM ( !!SP, QTY ) < 1000 : { STATUS := 0.5 * STATUS } ;

214 SUMMARIZE :
    SUMMARIZE SP PER ( S { SNO } ) ADD ( COUNT ( PNO ) AS PCT )
    /* Tutorial D (see later for SQL analog)... */
    /* call this "SX1" for subsequent reference */
    SNO  PCT
    S1   6
    S2   2
    S3   1
    S4   3
    S5   0    /* Hmm... note this tuple in particular! */
Note: COUNT ( PNO ) is not an invocation of the agg op called COUNT! (which takes a rel as its argument)... So what is it ???

215 SUMMARIZE (cont.) :
Heading of PER rel must = that of some projection of SUMMARIZE rel... If it actually is such a projection, can replace PER spec by BY spec, as in SX2 here:
    SUMMARIZE SP BY { SNO } ADD ( COUNT ( PNO ) AS PCT )
    SNO  PCT
    S1   6
    S2   2
    S3   1
    S4   3
Misses S5, with count of 0... because BY { SNO } is shorthand for PER ( SP { SNO } )

216 EXAMPLE SX2 HAS A DIRECT SQL ANALOG :
    SELECT SNO, COUNT ( ALL PNO ) AS PCT
    FROM   SP
    GROUP  BY SNO
Summarizations typically formulated in SQL by means of SELECT exp with explicit GROUP BY /* but see later */
(Recall that "aggregations" typically have implicit GROUP BY)
But what about Example SX1 ??? Straightforward GROUP BY doesn't do the job... Instead:

217 EXAMPLE SX1 IN SQL :
    SELECT S.SNO, ( SELECT COUNT ( ALL PNO ) /* AS PCT ??? */
                    FROM   SP
                    WHERE  SP.SNO = S.SNO ) AS PCT
    FROM   S /* double coercion */
Example SX2 could be done the same way:
    SELECT DISTINCT SPX.SNO, ( SELECT COUNT ( ALL SPY.PNO )
                               FROM   SP AS SPY
                               WHERE  SPY.SNO = SPX.SNO ) AS PCT
    FROM   SP AS SPX
GROUP BY is logically redundant!
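
The difference between SX1 and SX2 comes down to which relation supplies the supplier numbers. The sketch below runs both SQL formulations through Python's sqlite3 module (chosen for this illustration); S is cut down to SNO, and the SP rows are the customary sample values, assumed here.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE S (SNO TEXT PRIMARY KEY)")  # other supplier columns omitted
con.execute("""CREATE TABLE SP (SNO TEXT NOT NULL, PNO TEXT NOT NULL,
                                QTY INTEGER NOT NULL, PRIMARY KEY (SNO, PNO))""")
con.executemany("INSERT INTO S VALUES (?)", [("S1",), ("S2",), ("S3",), ("S4",), ("S5",)])
con.executemany("INSERT INTO SP VALUES (?, ?, ?)", [
    ("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400), ("S1", "P4", 200),
    ("S1", "P5", 100), ("S1", "P6", 100), ("S2", "P1", 300), ("S2", "P2", 400),
    ("S3", "P2", 200), ("S4", "P2", 200), ("S4", "P4", 300), ("S4", "P5", 400),
])

# SX2: grouping SP can only produce supplier numbers that appear in SP.
sx2 = sorted(con.execute(
    "SELECT SNO, COUNT(PNO) AS PCT FROM SP GROUP BY SNO"))

# SX1: driving the query from S gives every supplier a row, including S5.
sx1 = sorted(con.execute("""
    SELECT S.SNO, (SELECT COUNT(PNO) FROM SP WHERE SP.SNO = S.SNO) AS PCT
    FROM   S"""))

print(sx2)
print(sx1)
```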

218 /* SX3 : Slight variation on SX1 */
    SUMMARIZE SP PER ( S { SNO } ) ADD ( SUM ( QTY ) AS TOTQ )
/* SQL analog... or is it? */
    SELECT S.SNO, ( SELECT SUM ( ALL QTY )
                    FROM   SP
                    WHERE  SP.SNO = S.SNO ) AS TOTQ
    FROM   S
/* SX4 : Slight variation on SX3 */
    ( SUMMARIZE SP PER ( S { SNO } ) ADD ( SUM ( QTY ) AS TOTQ ) )
    WHERE TOTQ > 250

219 SQL ANALOG /* or is it? */ :
    SELECT SNO, SUM ( ALL QTY ) AS TOTQ
    FROM   SP
    GROUP  BY SNO
    HAVING SUM ( ALL QTY ) > 250 /* not TOTQ > 250 !!! */
Or:
    SELECT DISTINCT SPX.SNO, ( SELECT SUM ( ALL SPY.QTY )
                               FROM   SP AS SPY
                               WHERE  SPY.SNO = SPX.SNO ) AS TOTQ
    FROM   SP AS SPX
    WHERE  ( SELECT SUM ( ALL SPY.QTY )
             FROM   SP AS SPY
             WHERE  SPY.SNO = SPX.SNO ) > 250
HAVING is logically redundant!

220 GROUP BY / HAVING formulations often more succinct
On the other hand, they sometimes give the "wrong" answer, or at least not the answer really wanted
Recommendations:
    If you use GROUP BY or HAVING, make sure you're summarizing the right table (typically suppliers rather than shipments, in terms of our example)
    Watch out for empty sets... Use COALESCE wherever necessary
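
Both recommendations can be illustrated with the "total shipment quantity less than 1000" query. In the sketch below (Python's sqlite3 module, chosen for this illustration; SP rows are the customary sample values, assumed here), the GROUP BY / HAVING form summarizes the join and so silently drops the supplier with no shipments, while the form driven from S uses COALESCE to turn the null that SQL's SUM returns for an empty set into 0.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE S (SNO TEXT PRIMARY KEY)")  # other supplier columns omitted
con.execute("""CREATE TABLE SP (SNO TEXT NOT NULL, PNO TEXT NOT NULL,
                                QTY INTEGER NOT NULL, PRIMARY KEY (SNO, PNO))""")
con.executemany("INSERT INTO S VALUES (?)", [("S1",), ("S2",), ("S3",), ("S4",), ("S5",)])
con.executemany("INSERT INTO SP VALUES (?, ?, ?)", [
    ("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400), ("S1", "P4", 200),
    ("S1", "P5", 100), ("S1", "P6", 100), ("S2", "P1", 300), ("S2", "P2", 400),
    ("S3", "P2", 200), ("S4", "P2", 200), ("S4", "P4", 300), ("S4", "P5", 400),
])

# The trap: S5 has no shipments, so it never reaches the HAVING clause.
group_by_form = sorted(row[0] for row in con.execute("""
    SELECT S.SNO FROM S, SP
    WHERE  S.SNO = SP.SNO
    GROUP  BY S.SNO
    HAVING SUM(SP.QTY) < 1000"""))

# Summarizing the right table: every supplier is considered, and an empty
# set of shipments is treated as a total of 0.
per_supplier_form = sorted(row[0] for row in con.execute("""
    SELECT S.SNO FROM S
    WHERE  COALESCE((SELECT SUM(SP.QTY) FROM SP WHERE SP.SNO = S.SNO), 0) < 1000"""))

print(group_by_form)
print(per_supplier_form)
```

The second result agrees with the Tutorial D expression S WHERE SUM ( !!SP, QTY ) < 1000, in which the sum of no quantities is 0.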

221 Copyright C. J. Date 2008page 220 BACK TO Tutorial D : Image rels can be very useful in connection with summarization... In fact, they make SUMMARIZE logically redundant! SUMMARIZE SP PER ( S { SNO } ) ADD ( COUNT ( PNO ) AS PCT ) Or: EXTEND S { SNO } ADD ( COUNT ( !!SP ) AS PCT ) For each supplier, get supplier details and total, maximum, and minimum shipment quantity: EXTEND S ADD ( SUM ( !!SP, QTY ) AS TOTQ, MAX ( !!SP, QTY ) AS MAXQ, MIN ( !!SP, QTY ) AS MINQ ) /* note use of "multiple EXTEND" */

222 For each supplier, get supplier details, total shipment quantity, and grand total shipment quantity:
    EXTEND S ADD ( SUM ( !!SP, QTY ) AS TOTQ,
                   SUM ( SP, QTY ) AS GTOTQ )
Result heading: SNO ... TOTQ GTOTQ
For each city c, get c and total and average shipment quantities for all shipments for which supplier and part city are both c
    WITH ( S JOIN SP JOIN P ) AS TEMP :
    EXTEND TEMP { CITY } ADD ( SUM ( !!TEMP, QTY ) AS TOTQ,
                               AVG ( !!TEMP, QTY ) AS AVGQ )

223 RECALL THESE RELATIONS :
R1:
    SNO  PNO
    S2   P1
    S2   P2
    S3   P2
    S4   P2
    S4   P4
    S4   P5
R4 (each PNO_REL value is itself a relation with the single attribute PNO):
    SNO  PNO_REL
    S2   PNO: P1, P2
    S3   PNO: P2
    S4   PNO: P2, P4, P5
Type of R4 = RELATION { SNO CHAR, PNO_REL RELATION { PNO CHAR } }

224 GROUP AND UNGROUP :
    R1 GROUP ( { PNO } AS PNO_REL ) : gives R4
    R4 UNGROUP ( PNO_REL ) : gives R1
SQL has no direct counterparts
Exercise: What does this do?
    EXTEND R1 { SNO } ADD ( !!R1 AS PNO_REL )
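
Since SQL has no direct counterpart, GROUP and UNGROUP are sketched here in plain Python, with a relation modeled as a frozenset of rows and a relation-valued attribute as a nested frozenset. The function names group and ungroup are inventions of this example and handle only the two-attribute case of R1 and R4.

```python
from collections import defaultdict

# R1 as on the slide: (SNO, PNO) pairs.
R1 = frozenset({
    ("S2", "P1"), ("S2", "P2"), ("S3", "P2"),
    ("S4", "P2"), ("S4", "P4"), ("S4", "P5"),
})

def group(r):
    """R1 GROUP ( { PNO } AS PNO_REL ): one row per SNO, carrying its set of PNOs."""
    images = defaultdict(set)
    for sno, pno in r:
        images[sno].add(pno)
    return frozenset((sno, frozenset(pnos)) for sno, pnos in images.items())

def ungroup(r):
    """R4 UNGROUP ( PNO_REL ): flatten each nested set back into (SNO, PNO) rows."""
    return frozenset((sno, pno) for sno, pnos in r for pno in pnos)

R4 = group(R1)
round_trip = ungroup(R4)

print(sorted((sno, sorted(pnos)) for sno, pnos in R4))
print(round_trip == R1)
```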

225 "WHAT IF" QUERIES :
What if parts in Paris were in Nice and their weight doubled?
Tutorial D:
    UPDATE P WHERE CITY = 'Paris' :
       { CITY := 'Nice', WEIGHT := 2 * WEIGHT } /* read-only op !!! */
SQL:
    WITH T1 AS ( SELECT P.*
                 FROM   P
                 WHERE  CITY = 'Paris' ),
         T2 AS ( SELECT T1.*, 'Nice' AS NC, 2 * WEIGHT AS NW
                 FROM   T1 )
    SELECT PNO, PNAME, COLOR, NW AS WEIGHT, NC AS CITY
    FROM   T2

226 Tutorial D EXPRESSION IS SHORTHAND FOR :
    WITH ( P WHERE CITY = 'Paris' ) AS R1,
         ( EXTEND R1 ADD ( 'Nice' AS NC, 2 * WEIGHT AS NW ) ) AS R2,
         R2 { ALL BUT CITY, WEIGHT } AS R3 :
    R3 RENAME ( NC AS CITY, NW AS WEIGHT )
/* can now explain expansion of UPDATE statement: */
    UPDATE P WHERE CITY = 'Paris' :
       { CITY := 'Nice', WEIGHT := 2 * WEIGHT } ;
Expansion:
    P := ( P WHERE CITY ≠ 'Paris' )
         UNION
         ( UPDATE P WHERE CITY = 'Paris' :
              { CITY := 'Nice', WEIGHT := 2 * WEIGHT } ) ;
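
The "what if" query is a read-only expression: it returns the hypothetical tuples and leaves P alone. The sketch below runs an SQL version through Python's sqlite3 module (chosen for this illustration), using the P values from the EXTEND slide, and then confirms that the Paris parts are still in Paris. The second WITH element selects T1.* because T1 is the only table in scope at that point.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE P (PNO TEXT PRIMARY KEY, PNAME TEXT, COLOR TEXT,
                               WEIGHT REAL, CITY TEXT)""")
con.executemany("INSERT INTO P VALUES (?, ?, ?, ?, ?)", [
    ("P1", "Nut", "Red", 12.0, "London"),
    ("P2", "Bolt", "Green", 17.0, "Paris"),
    ("P3", "Screw", "Blue", 17.0, "Oslo"),
    ("P4", "Screw", "Red", 14.0, "London"),
    ("P5", "Cam", "Blue", 12.0, "Paris"),
    ("P6", "Cog", "Red", 19.0, "London"),
])

# Restrict, extend with the new values, then project and rename them into place.
what_if = sorted(con.execute("""
    WITH T1 AS (SELECT P.* FROM P WHERE CITY = 'Paris'),
         T2 AS (SELECT T1.*, 'Nice' AS NC, 2 * WEIGHT AS NW FROM T1)
    SELECT PNO, PNAME, COLOR, NW AS WEIGHT, NC AS CITY
    FROM   T2"""))

# The base table is untouched by the query.
still_in_paris = con.execute("SELECT COUNT(*) FROM P WHERE CITY = 'Paris'").fetchone()[0]

print(what_if)
print(still_in_paris)
```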

227 WHAT ABOUT "ORDER BY" ???
Not a relational op (because result is not a relation)... So not legal in relational exps, and hence not in view definitions etc.
Produces ordered list or sequence of tuples
Also, not a function
    Result indeterminate (in general) /* like many SQL expressions, in fact */
Also, produces a sequence of tuples, yet "<" and ">" aren't defined for tuples!

229 INTEGRITY CONSTRAINTS :
An integrity constraint is, loosely, a boolean expression that must evaluate to TRUE
Two basic kinds: Type constraints / database constraints
Constraints = really what DB management is all about! Talking of poor quality of education...
Constraints are vital, and proper DBMS support for them is vital as well
I don't care how fast your system runs if I can't trust the answers it's giving me!

230 TYPE CONSTRAINTS :
Define values that make up a given type... For system defined types, not much to say... So suppose for sake of example that quantities are of a user defined type, say QTY:
    TYPE QTY /* quantities */
       POSSREP QPR { Q INTEGER CONSTRAINT Q ≥ 0 AND Q ≤ 5000 } ;
    TYPE POINT /* geometric points in 2D space */
       POSSREP CARTESIAN { X FIXED, Y FIXED CONSTRAINT SQRT ( X ** 2 + Y ** 2 ) ... } ;
Checked "immediately" (actually during selector operator invocations... see next page)

231 SELECTORS AND THE_ OPERATORS :
One selector per possrep
One THE_ op per possrep component
Examples:
QPR ( 250 )             /* selector invocation */
                        /* ... actually a literal */
Simplify QTY type def:
TYPE QTY POSSREP { Q INTEGER CONSTRAINT Q > 0 AND Q < 5000 } ;
Selector invocation becomes:
QTY ( 250 )

232 SELECTORS AND THE_ OPERATORS (cont.) :
Examples (cont.):
THE_Q ( QZ )            /* THE_ op invocation */
                        /* (QZ is of type QTY) */
Simplify POINT type def:
TYPE POINT POSSREP { X FIXED , Y FIXED CONSTRAINT ... } ;
POINT ( PX , PY )       /* POINT selector invocation */
POINT ( 5.7 , -3.9 )    /* POINT literal */
THE_X ( P )             /* THE_ op invocation */

233 WHAT ABOUT SQL ???
SQL doesn't support type constraints at all! E.g.:
CREATE TYPE QTY AS INTEGER FINAL ;
/* all available integers denote valid quantities ?!? */
So to constrain quantities further, must specify approp database constraint on every use of the type... E.g.:
CREATE TABLE SP
     ( SNO VARCHAR(5) NOT NULL ,
       PNO VARCHAR(6) NOT NULL ,
       QTY QTY NOT NULL ,
       … ,
       CONSTRAINT SPQC CHECK ( QTY >= QTY(0) AND QTY <= QTY(5000) ) ) ;

234 SQL does support selectors and THE_ ops (in effect), but doesn't use these terms, and support is not entirely straightforward... Further details beyond scope of this seminar
POINT example in SQL:
CREATE TYPE POINT AS ( X NUMERIC(5,1) , Y NUMERIC(5,1) ) NOT FINAL ;
Recommendation: Use database constraints to make up for SQL's lack of type constraints
Duplication of effort much better than having bad data in the database!

235 DATABASE CONSTRAINTS :
Tutorial D:
CONSTRAINT CX1 IS_EMPTY ( S WHERE STATUS < 1 OR STATUS > 100 ) ;
SQL:
CREATE ASSERTION CX1 CHECK
     ( NOT EXISTS ( SELECT * FROM S WHERE STATUS < 1 OR STATUS > 100 ) ) ;
Tutorial D:
CONSTRAINT CX2 IS_EMPTY ( S WHERE CITY = 'London' AND STATUS ≠ 20 ) ;
SQL:
CREATE ASSERTION CX2 CHECK
     ( NOT EXISTS ( SELECT * FROM S WHERE CITY = 'London' AND STATUS <> 20 ) ) ;
CX1 and CX2 are "tuple" (or "row") constraints: Deprecated terms

236 Tutorial D:
CONSTRAINT CX3 COUNT ( S ) = COUNT ( S { SNO } ) ;
SQL:
CREATE ASSERTION CX3 CHECK ( UNIQUE ( SELECT SNO FROM S ) ) ;
{SNO} is a superkey for S
In practice would use KEY or UNIQUE shorthand
Note: UNIQUE in SQL returns TRUE iff every row in its argument table is distinct /* more later */
Alternative SQL formulation:
CREATE ASSERTION CX3 CHECK
     ( ( SELECT COUNT ( SNO ) FROM S ) = ( SELECT COUNT ( DISTINCT SNO ) FROM S ) ) ;

237 Tutorial D:
CONSTRAINT CX4 COUNT ( S { SNO } ) = COUNT ( S { SNO , CITY } ) ;
SQL:
CREATE ASSERTION CX4 CHECK
     ( NOT EXISTS ( SELECT * FROM S AS SX
                    WHERE EXISTS ( SELECT * FROM S AS SY
                                   WHERE SX.SNO = SY.SNO
                                   AND SX.CITY <> SY.CITY ) ) ) ;
Functional dependence {SNO} → {CITY}
In practice this FD is implied by the fact that {SNO} is a superkey, so no need to state CX4 explicitly... but not all FDs are consequences of keys
But most will be, if DB well designed!
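
The cardinality comparison behind CX4 can be tried out on a small in-memory relation. The following is a minimal Python sketch, not part of the seminar: the helper names `project` and `fd_holds` and the sample tuples are assumptions made for illustration, and a relation is modeled as a list of dicts.

```python
# Supplier tuples as dicts; attribute names follow the slides, values are made up.
S = [
    {"SNO": "S1", "CITY": "London"},
    {"SNO": "S2", "CITY": "Paris"},
    {"SNO": "S3", "CITY": "Paris"},
]

def project(rel, attrs):
    """Projection: a set of tuples over the given attributes (duplicates vanish)."""
    return {tuple(t[a] for a in attrs) for t in rel}

def fd_holds(rel, lhs, rhs):
    """The FD lhs -> rhs holds iff COUNT ( rel { lhs } ) = COUNT ( rel { lhs, rhs } )."""
    return len(project(rel, lhs)) == len(project(rel, lhs + rhs))

cx4_ok = fd_holds(S, ["SNO"], ["CITY"])

# Hypothetical violation: the same supplier number appears with two cities.
S_bad = S + [{"SNO": "S1", "CITY": "Athens"}]
cx4_bad = fd_holds(S_bad, ["SNO"], ["CITY"])
```

The same helper shows that the FD does not hold in the other direction for this sample: two suppliers share the city Paris.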

238 Tutorial D:
CONSTRAINT CX5 IS_EMPTY ( ( S JOIN SP ) WHERE STATUS < 20 AND PNO = 'P6' ) ;
SQL:
CREATE ASSERTION CX5 CHECK
     ( NOT EXISTS ( SELECT * FROM S NATURAL JOIN SP
                    WHERE STATUS < 20 AND PNO = 'P6' ) ) ;
"Multi-relvar" constraint: Slightly deprecated term
CX1-CX4 were single-relvar constraints, or just relvar constraints for short: Slightly deprecated terms

239 Tutorial D:
CONSTRAINT CX6 SP { SNO } ⊆ S { SNO } ;
SQL:
CREATE ASSERTION CX6 CHECK
     ( NOT EXISTS ( SELECT SNO FROM SP
                    EXCEPT
                    SELECT SNO FROM S ) ) ;
Foreign key constraint from SP to S
In practice would use FOREIGN KEY shorthand (at least in SQL)
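
The two formulations of CX6 (set inclusion on one side, an empty EXCEPT on the other) can be compared directly with Python sets. This is only a sketch: the function names and the sample supplier numbers are assumptions, not anything defined in the seminar.

```python
# Hypothetical sample values for the SNO projections of S and SP.
S_SNO = {"S1", "S2", "S3", "S4", "S5"}
SP_SNO = {"S1", "S2", "S3", "S4"}

def fk_by_inclusion(referencing, referenced):
    """Inclusion style: the referencing projection is a subset of the referenced one."""
    return referencing <= referenced

def fk_by_except(referencing, referenced):
    """SQL style: NOT EXISTS ( referencing EXCEPT referenced )."""
    return len(referencing - referenced) == 0

cx6_ok = fk_by_inclusion(SP_SNO, S_SNO)

# A shipment for supplier S9, who does not exist in S, violates the constraint.
dangling = SP_SNO | {"S9"}
cx6_dangling = fk_by_inclusion(dangling, S_SNO)
```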

240 DATABASE CONSTRAINTS IN SQL :
Any DB constraint expressible in Tutorial D can be expressed in SQL via CREATE ASSERTION (unless "possibly nondeterministic" ???)
But SQL also supports base table constraints... e.g.:
CREATE TABLE SP
     ( ... ,
       CONSTRAINT CX5 CHECK
            ( PNO <> 'P6' OR ( SELECT STATUS FROM S WHERE SNO = SP.SNO ) >= 20 ) ) ;
Equivalent formulation could be specified on base table S instead, or on any base table in the database!
Useful for "row constraints" but not for other kinds

241 CREATE TABLE S ( ... , CONSTRAINT CX1 CHECK ( STATUS >= 1 AND STATUS <= 100 ) ) ;
CREATE TABLE S ( ... , CONSTRAINT CX2 CHECK ( STATUS = 20 OR CITY <> 'London' ) ) ;
SQL also supports column constraints... e.g., NOT NULL, and key constraints for keys of degree one
Note: Base table constraint for T automatically satisfied if T is empty (!)
(Important) Most current products support simple row constraints (plus key and FK constraints) only !!!
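
Row constraints such as CX1 and CX2 are the kind most products do enforce. As a hedged illustration, the sketch below declares both on one table in SQLite through Python's `sqlite3` module. SQLite is used only because it ships with Python, the column types are assumptions, and the helper `try_insert` is hypothetical.

```python
import sqlite3

# In-memory database; constraint names follow the slides.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE S (
        SNO    TEXT    NOT NULL PRIMARY KEY,
        SNAME  TEXT    NOT NULL,
        STATUS INTEGER NOT NULL,
        CITY   TEXT    NOT NULL,
        CONSTRAINT CX1 CHECK (STATUS >= 1 AND STATUS <= 100),
        CONSTRAINT CX2 CHECK (STATUS = 20 OR CITY <> 'London')
    )
""")

def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO S VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

ok_row = try_insert(("S1", "Smith", 20, "London"))      # satisfies CX1 and CX2
bad_status = try_insert(("S2", "Jones", 200, "Paris"))  # violates CX1
bad_london = try_insert(("S3", "Blake", 30, "London"))  # violates CX2
row_count = conn.execute("SELECT COUNT(*) FROM S").fetchone()[0]
```

Both checks fire at statement time, so the rejected rows never become visible.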

242 OK, so I saved the bad news till last...
Recommendations:
State constraints declaratively wherever possible
Use triggered procedures to enforce constraints that can't be stated declaratively
See Applied Mathematics for Database Professionals, by Lex de Haan and Toon Koppelaars (Apress, 2007)
Lobby the vendors!

243 Distinction single- vs. multi-relvar constraints is more pragmatic than logical... because:
Like single-relvar constraints, multi-relvar constraints must be checked "immediately" !!!
All constraints must be satisfied at statement boundaries: no "deferred" or COMMIT-time checking at all! (contrary to SQL standard and some commercial products)
In order to explain this unorthodox view, I need to digress for a moment and talk about transactions...

244 THE "ACID" PROPERTIES :
Atomicity: Transactions are "all or nothing"
Consistency: Transactions transform a consistent state of the DB into another consistent state, without necessarily preserving consistency at all intermediate points
Isolation: Any given transaction's updates are concealed from all other transactions until the given transaction commits
Durability: Once a transaction commits, its updates survive in the DB, even if there's a subsequent system crash

245 One argument in favor of transactions has always been that transactions are supposed to be a unit of integrity (see "Consistency" on previous page)
But I no longer believe this argument! I now think statements have to be that "unit of integrity" (i.e., to repeat, constraints must be satisfied at statement boundaries)
Why have I changed my mind? For at least five reasons:

246 FIRST AND MOST IMPORTANT :
As we have seen, a DB can be regarded as a collection of propositions, assumed by convention to be ones that evaluate to TRUE
And if that collection is ever allowed to include any inconsistencies, then all bets are off! I'll come back to this point later...
The "I" property might mean that only one transaction ever sees any particular inconsistency, but that particular transaction does see the inconsistency and can thus produce wrong answers

247 SECOND :
I don't agree that any given inconsistency can be seen by only one transaction, anyway... E.g.:
Suppose transaction TX1 obtains some incorrect information from the DB and writes it to file F
Suppose transaction TX2 now reads that same information from file F
TX1 has "infected" TX2... TX1 and TX2 aren't really isolated from each other... Even if they run at totally different times!
I don't believe in the "I" property of transactions

248 THIRD :
Don't want every program or other code unit to have to cater for the possibility that the DB might be inconsistent when it runs!
Severe loss of orthogonality if a procedure that assumes consistency becomes unsafe to use when checking is deferred
Desirable to be able to specify a code unit independently of whether that unit is to run as a transaction per se or as part of a transaction
In fact, I'd like nested transactions... but that's a topic for another day

249 FOURTH :
The Principle of Interchangeability (of base relvars and views; see later) implies that the very same constraint might be a single-relvar constraint with one design for the DB and a multi-relvar constraint with another
E.g.,
VAR LS VIRTUAL ( S WHERE CITY = 'London' ) ;
VAR NLS VIRTUAL ( S WHERE CITY ≠ 'London' ) ;
Instead of S being real and LS and NLS virtual, we could make LS and NLS real and S virtual! S is the union of restrictions LS and NLS, and mapping works both ways /* more on interchangeability later */

250 SNO unique in S: single-relvar constraint
SNO unique across LS and NLS: multi-relvar constraint
Tutorial D:
CONSTRAINT CX7 IS_EMPTY ( LS { SNO } JOIN NLS { SNO } ) ;
SQL:
CREATE ASSERTION CX7 CHECK
     ( NOT EXISTS ( SELECT * FROM LS , NLS WHERE LS.SNO = NLS.SNO ) ) ;

251 FIFTH :
Semantic optimization uses constraints to simplify queries (for performance reasons)... E.g.:
Constraint: All red parts must be stored in London
Query: Find suppliers who supply only red parts and are located in the same city as at least one of the parts they supply
Simplifies to: Find London suppliers who supply only red parts
Payoff could be orders of magnitude greater than that from conventional optimization... but it requires DB to be consistent at all times, not just transaction boundaries (if constraints aren't satisfied, simplifications will be invalid, and answers will be wrong)

252 BUT DOESN'T SOME CHECKING HAVE TO BE DEFERRED ???
E.g., "Supplier S1 and part P1 are in the same city": If supplier S1 moves from London to Paris, then part P1 must move from London to Paris as well
Conventional solution /* SQL */ :
START TRANSACTION ;
UPDATE S SET CITY = 'Paris' WHERE SNO = 'S1' ;
UPDATE P SET CITY = 'Paris' WHERE PNO = 'P1' ;
COMMIT ; /* integrity check done here */
If this transaction asks "Are supplier S1 and part P1 in the same city?" between the two UPDATEs, it will get the answer no

253 Tutorial D SOLUTION :
The multiple assignment operator lets us carry out several assignments as a single operation, without any integrity checking being done until all assignments have been executed:
UPDATE S WHERE SNO = 'S1' : { CITY := 'Paris' } ,
UPDATE P WHERE PNO = 'P1' : { CITY := 'Paris' } ;
Note comma separator … One statement, not two!
Shorthand for: S := … , P := … ;

254 SEMANTICS /* slightly simplified */ :
1. Evaluate source expressions
2. Execute individual assignments in sequence
3. Do integrity checking
No individual assignment depends on any other...
No way for the transaction to see an inconsistent state of the DB between the two UPDATEs, because notion of "between the two UPDATEs" has no meaning...
Now no need for deferred checking at all!
Note: I'm not saying we don't need transactions !!!
By the way: SQL already has some multiple assignment!
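
The three steps above can be mimicked in a few lines of Python. This is a sketch under stated assumptions, not Tutorial D semantics in full: the database is a plain dict, each assignment is a function from the current state to a new value, and `multiple_assign` and `same_city` are hypothetical names.

```python
def multiple_assign(db, assignments, constraint):
    """Apply several assignments as one operation, checking integrity only at the end."""
    # 1. Evaluate every source expression against the current (unchanged) state.
    new_values = {name: expr(db) for name, expr in assignments.items()}
    # 2. Execute the individual assignments on a candidate state.
    candidate = {**db, **new_values}
    # 3. Do integrity checking; on failure the original state is left untouched.
    if not constraint(candidate):
        raise ValueError("constraint violated; database unchanged")
    return candidate

# Hypothetical state: supplier S1 and part P1 are both in London.
db = {"S": {"S1": "London"}, "P": {"P1": "London"}}

def same_city(state):
    """Supplier S1 and part P1 are in the same city."""
    return state["S"]["S1"] == state["P"]["P1"]

# Both moves in one statement: the constraint is satisfied at the statement boundary.
moved = multiple_assign(
    db,
    {
        "S": lambda d: {**d["S"], "S1": "Paris"},
        "P": lambda d: {**d["P"], "P1": "Paris"},
    },
    same_city,
)

# Moving only the supplier is rejected immediately.
try:
    multiple_assign(db, {"S": lambda d: {**d["S"], "S1": "Paris"}}, same_city)
    single_update_rejected = False
except ValueError:
    single_update_rejected = True
```

Because no intermediate state is ever exposed, there is nothing for a query to observe "between" the two updates.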

255 Recommendation: Given the state of today's SQL products, some constraint checking will probably have to be deferred... In which case, you should do whatever it takes (probably terminate the transaction) to force the check to be done before performing any operation that might rely on the constraint being satisfied

256 CONSTRAINTS AND PREDICATES :
Relvar predicate for R is "intended interpretation" for R … but it (and corresp propositions) aren't and can't be understood by the system
System can't know what it means for a "supplier" to "be located" somewhere, etc.; that's interpretation
System can't know a priori whether what the user tells it is true! It can only check the integrity constraints... If OK, system accepts user assertion as true from this point forward
System can't enforce truth, only consistency !!!

257 Correct implies consistent; converse not true
Inconsistent implies incorrect; converse not true
DB is correct iff it fully reflects the true state of affairs in the real world... but the best the system can do is ensure the DB is consistent (= satisfies all known integrity constraints)

258 Let C1, C2, ..., Cn be all of the DB constraints that mention base relvar R. Then:
( C1 ) AND ( C2 ) AND ... AND ( Cn ) AND TRUE
is THE (total) relvar constraint for R
Let R1, R2, ..., Rm be all of the base relvars in DB, and let corresp (total) relvar constraints be RC1, RC2, ..., RCm, respectively. Then:
( RC1 ) AND ( RC2 ) AND ... AND ( RCm ) AND TRUE
is THE (total) database constraint for DB
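
One way to read the trailing AND TRUE is that it keeps the conjunction well defined even when no constraints mention R at all. The Python sketch below illustrates that reading; `total_constraint` is a hypothetical helper, and the two sample constraints are modeled on CX1 and CX2 with made-up data.

```python
def total_constraint(constraints):
    """Return a predicate equivalent to ( C1 ) AND ... AND ( Cn ) AND TRUE."""
    def check(state):
        # all() over an empty collection is True, which plays the role of AND TRUE.
        return all(c(state) for c in constraints)
    return check

# Hypothetical single-relvar constraints on suppliers, modeled on CX1 and CX2.
def cx1(S):
    return all(1 <= t["STATUS"] <= 100 for t in S)

def cx2(S):
    return all(t["STATUS"] == 20 or t["CITY"] != "London" for t in S)

relvar_constraint_S = total_constraint([cx1, cx2])
unconstrained = total_constraint([])

good = [{"STATUS": 20, "CITY": "London"}, {"STATUS": 10, "CITY": "Paris"}]
bad = [{"STATUS": 30, "CITY": "London"}]
```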

259 The Golden Rule:
No database is ever allowed to violate its total DB constraint
/* and therefore: */
No relvar is ever allowed to violate its total relvar constraint
Criterion for acceptability of updates... Total relvar constraint for R is system's best approximation to relvar predicate for R

260 CONSTRAINTS ARE VITAL !!!
Recall that a DB can be regarded as a collection of propositions... and if that collection is ever allowed to include any inconsistencies, all bets are off!
Proof:
Suppose DB implies both p and NOT p are TRUE (there's the inconsistency)
Let q be any arbitrary proposition
From truth of p, infer truth of p OR q
From truth of p OR q and truth of NOT p, infer truth of q... but q was arbitrary !!!
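
Each inference step in this proof can be checked mechanically by running through all truth assignments. The following Python sketch does so; the helper `implies` is an assumption introduced here to model material implication.

```python
from itertools import product

def implies(a, b):
    """Material implication: FALSE only when a is TRUE and b is FALSE."""
    return (not a) or b

rows = list(product([False, True], repeat=2))  # every (p, q) combination

# Step 1: from p, infer p OR q.
step1_valid = all(implies(p, p or q) for p, q in rows)

# Step 2: from (p OR q) and NOT p, infer q.
step2_valid = all(implies((p or q) and (not p), q) for p, q in rows)

# Net effect: (p AND NOT p) implies q, whatever q is.
explosion_valid = all(implies(p and (not p), q) for p, q in rows)
```

All three checks come out TRUE, which is exactly why a single inconsistency lets an arbitrary q be "derived".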

262 VIRTUAL RELVARS ("VIEWS") :
A view is a relvar that "looks and feels" just like a base relvar but doesn't exist independently of other relvars (it's defined in terms of them)
Repeat: A view is a relvar! ("CREATE TABLE" vs. "CREATE VIEW" was at least a psychological mistake)
A view is a derived relvar
All virtual relvars are derived but some derived ones aren't virtual /* see snapshots, later */
A view is a window into underlying relvars... Ops on view are "really" ops on those underlying relvars
A view is a "canned query" (i.e., named rel exp)

263 VIEWS ARE RELVARS :
A view V is a relvar whose value at time t = result of evaluating certain rel exp at time t... View defining expression specified when V is defined and must mention at least one relvar
Tutorial D:
VAR LS VIRTUAL ( S WHERE CITY = 'London' ) ;
SQL:
CREATE VIEW LS AS
     ( SELECT * FROM S WHERE CITY = 'London' )
     WITH CHECK OPTION ;
Tutorial D:
VAR NLS VIRTUAL ( S WHERE CITY ≠ 'London' ) ;
SQL:
CREATE VIEW NLS AS
     ( SELECT * FROM S WHERE CITY <> 'London' )
     WITH CHECK OPTION ;

264 CREATE VIEW allows parenthesized column name commalist after view name... E.g.
CREATE VIEW SDS ( SNAME , DOUBLE_STATUS ) AS
     ( SELECT DISTINCT SNAME , 2 * STATUS FROM S ) ;
Recommendation: Don't do this. Instead:
CREATE VIEW SDS AS
     ( SELECT DISTINCT SNAME , 2 * STATUS AS DOUBLE_STATUS FROM S ) ;
Tell DBMS once, not twice, that SNAME column is called SNAME!

265 THE PRINCIPLE OF INTERCHANGEABILITY :
Instead of S being real and LS and NLS virtual, we could make LS and NLS real and S virtual. S is the union of restrictions LS and NLS, and mapping works both ways:
VAR LS BASE RELATION { SNO CHAR , SNAME CHAR , STATUS INTEGER , CITY CHAR } KEY { SNO } ;
VAR NLS BASE RELATION { SNO CHAR , SNAME CHAR , STATUS INTEGER , CITY CHAR } KEY { SNO } ;
VAR S VIRTUAL ( LS D_UNION NLS ) ; /* disjoint union */
/* plus certain constraints on, e.g., CITY */
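
The two-way mapping between the designs can be illustrated with Python sets standing in for relations. This is a rough sketch only: `d_union` is a hypothetical stand-in for D_UNION that treats overlapping operands as an error, and the (SNO, CITY) tuples are sample data.

```python
def d_union(r1, r2):
    """Disjoint union: like UNION, but an error if the operands share a tuple."""
    if r1 & r2:
        raise ValueError("D_UNION operands are not disjoint")
    return r1 | r2

# Tuples are (SNO, CITY) pairs; values are hypothetical sample data.
S = {("S1", "London"), ("S2", "Paris"), ("S4", "London")}

# Design 1: S is real; LS and NLS are derived by restriction.
LS = {t for t in S if t[1] == "London"}
NLS = {t for t in S if t[1] != "London"}

# Design 2: LS and NLS are real; S is derived by disjoint union.
S_derived = d_union(LS, NLS)

# Overlapping operands are rejected rather than silently merged.
try:
    d_union(LS, LS)
    overlap_rejected = False
except ValueError:
    overlap_rejected = True
```

Starting from either design, the other can be recovered without loss, which is what makes the choice of base relvars arbitrary.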

266 Designs are information equivalent... So: Which relvars are base ones and which virtual is arbitrary (formally speaking, at least)... Hence:
The Principle of Interchangeability: There must be no arbitrary and unnecessary distinctions between base and virtual relvars... Virtual relvars should "look and feel" just like base ones to the user
Having keys or not
Integrity in general
"Entity integrity"
Tuple IDs
... and we MUST be able to "update views" !!!

267 RELATION CONSTANTS /* digression */ :
View defining exp must mention at least one relvar... Otherwise the "variable" isn't a variable! Consider, e.g., following SQL view defn:
CREATE VIEW S_CONST ( SNO , SNAME , STATUS , CITY ) AS
     VALUES ( 'S1' , 'Smith' , 20 , 'London' ) ,
            ( 'S2' , 'Jones' , 10 , 'Paris' ) ,
            ( 'S3' , 'Blake' , 30 , 'Paris' ) ,
            ( 'S4' , 'Clark' , 20 , 'London' ) ,
            ( 'S5' , 'Adams' , 30 , 'Athens' ) ;
Not updatable! Really a named relation constant

268 NAMED CONSTANTS ARE USEFUL :
CONST PERIODIC_TABLE INIT ( RELATION
     { TUPLE { ELEMENT 'Hydrogen' , SYMBOL 'H' , ATOMICNO 1 } ,
       TUPLE { ELEMENT 'Helium' , SYMBOL 'He' , ATOMICNO 2 } ,
       TUPLE { ELEMENT 'Uranium' , SYMBOL 'U' , ATOMICNO 92 } } ) ;
Note: TABLE_DUM and TABLE_DEE are system defined "relcons"
Can simulate relcons via view mechanism, but there's a logical difference between variables and constants (also between constants and literals)

269 VIEWS AND PREDICATES :
A view is a relvar and has a relvar predicate, derived from preds for underlying relvars... E.g., view LS:
Supplier SNO is under contract, is named SNAME, has status STATUS, and is located in city CITY AND city CITY is London
More colloquially:
Supplier SNO is under contract, is named SNAME, has status STATUS, and is located in London
But latter obscures fact that CITY is a parameter... It is a parameter, but corresp argument is constant (in practice, would probably project away CITY attribute)

270 RETRIEVAL OPERATIONS :
User operates on views as if they were real... DBMS maps operations into corresponding operations on base relvars in terms of which views are (ultimately) defined
Read-only operations are straightforward: e.g.,
SELECT SNO
FROM   LS
WHERE  STATUS > 10
maps to
SELECT LS.SNO
FROM   ( SELECT S.* FROM S WHERE S.CITY = 'London' ) AS LS
WHERE  LS.STATUS > 10
and then (?) to
SELECT S.SNO
FROM   S
WHERE  S.CITY = 'London' AND S.STATUS > 10
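
The substitution can be spot-checked on a real engine by running all three forms and comparing results. The sketch below uses SQLite through Python's `sqlite3` module purely as a convenient test bed; the column types are assumptions, and the rows are the five sample suppliers used in the S_CONST example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S (SNO TEXT PRIMARY KEY, SNAME TEXT, STATUS INTEGER, CITY TEXT);
    INSERT INTO S VALUES
        ('S1', 'Smith', 20, 'London'),
        ('S2', 'Jones', 10, 'Paris'),
        ('S3', 'Blake', 30, 'Paris'),
        ('S4', 'Clark', 20, 'London'),
        ('S5', 'Adams', 30, 'Athens');
    CREATE VIEW LS AS SELECT * FROM S WHERE CITY = 'London';
""")

def run(sql):
    """Execute a query and return its rows in sorted order for comparison."""
    return sorted(conn.execute(sql).fetchall())

# The query as the user writes it, against the view.
via_view = run("SELECT SNO FROM LS WHERE STATUS > 10")

# The view name replaced by its defining expression.
substituted = run("""
    SELECT LS.SNO
    FROM (SELECT S.* FROM S WHERE S.CITY = 'London') AS LS
    WHERE LS.STATUS > 10
""")

# The simplified form over the base table alone.
simplified = run("SELECT S.SNO FROM S WHERE S.CITY = 'London' AND S.STATUS > 10")
```

Agreement on one data set is of course only a sanity check, not a proof that the rewrite is valid in general.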

271 RETRIEVAL OPERATIONS (cont.) :
Foregoing substitution procedure works because of closure! Didn't always work in early versions of SQL... E.g.:
CREATE VIEW V AS
     ( SELECT CITY , SUM ( STATUS ) AS ST FROM S GROUP BY CITY ) ;
SELECT CITY
FROM   V
WHERE  ST > 25
maps to (???)
SELECT S.CITY
FROM   S
WHERE  SUM ( S.STATUS ) > 25
GROUP  BY S.CITY
So some products implement some view retrievals by materialization instead of substitution (!)

272 VIEWS AND CONSTRAINTS :
A view is a relvar and has a (total) relvar constraint, derived from constraints for underlying relvars
E.g., view LS: {SNO} is a key... AND CITY = 'London'
Even though derived, nice to be able to declare such view constraints explicitly... (a) DBMS might not be able to do the derivation; (b) documentation (explain semantics); (c) another reason to come! E.g.:
VAR LS VIRTUAL ( S WHERE CITY = 'London' ) KEY { SNO } ;

273 Recommendation: In SQL, include such specifications as comments. E.g.:
CREATE VIEW LS AS
     ( SELECT * FROM S WHERE CITY = 'London' )
     /* UNIQUE ( SNO ) */
     WITH CHECK OPTION ;
Note: "View constraints" can always be formulated via CREATE ASSERTION (if supported!)
Of course, we don't want "the same" constraint to be checked twice...

274 A MORE COMPLEX EXAMPLE :
CREATE TABLE FDH ( FLIGHT ... , DESTINATION ... , HOUR ... , UNIQUE ( FLIGHT ) ) ;
CREATE TABLE DFGP ( DAY ... , FLIGHT ... , GATE ... , PILOT ... , UNIQUE ( DAY , FLIGHT ) ) ;
Constraints:
BTCX1: IF ( f1,n1,h ) , ( f2,n2,h ) IN FDH AND ( d,f1,g,p1 ) , ( d,f2,g,p2 ) IN DFGP THEN f1 = f2 AND p1 = p2
BTCX2: IF ( f1,n1,h ) , ( f2,n2,h ) IN FDH AND ( d,f1,g1,p ) , ( d,f2,g2,p ) IN DFGP THEN f1 = f2 AND g1 = g2

275 CREATE ASSERTION BTCX1 CHECK
     ( NOT ( EXISTS ( SELECT * FROM FDH AS FX WHERE
             EXISTS ( SELECT * FROM FDH AS FY WHERE
             EXISTS ( SELECT * FROM DFGP AS DX WHERE
             EXISTS ( SELECT * FROM DFGP AS DY WHERE
                      FY.HOUR = FX.HOUR AND
                      DX.FLIGHT = FX.FLIGHT AND
                      DY.FLIGHT = FY.FLIGHT AND
                      DY.DAY = DX.DAY AND
                      DY.GATE = DX.GATE AND
                      ( FX.FLIGHT <> FY.FLIGHT OR
                        DX.PILOT <> DY.PILOT ) ) ) ) ) ) ) ;
BTCX2 is analogous

276 BUT :
CREATE VIEW V AS
     ( FDH NATURAL JOIN DFGP ,
       UNIQUE ( DAY , HOUR , GATE ) ,      /* hypothetical */
       UNIQUE ( DAY , HOUR , PILOT ) ) ;   /* syntax !!! */
Or /* valid syntax */ :
CREATE VIEW V AS FDH NATURAL JOIN DFGP ;
CREATE ASSERTION VCX1 CHECK ( UNIQUE ( SELECT DAY , HOUR , GATE FROM V ) ) ;
CREATE ASSERTION VCX2 CHECK ( UNIQUE ( SELECT DAY , HOUR , PILOT FROM V ) ) ;
/* Could replace "V" by defn */

277 UPDATE OPERATIONS :
The Principle of Interchangeability implies that views must be updatable! (What? Really? Even views like S JOIN P?)
Well, certain updates on certain base relvars can't be done, either!... Fail on violations of either The Golden Rule or The Assignment Principle (ignore latter possibility for simplicity)
So to support updates on view V, DBMS needs to know total relvar constraint VC for V... i.e., needs to do constraint inference
Today's products don't, and are therefore very weak on view updating

278 UPDATE OPERATIONS (cont.) :
Today's products typically don't allow updating views any more complex than simple restrictions and/or projections of a single underlying base table (and even here there are problems)... e.g., DELETE on view LS probably OK... but what about INSERT ???
Recommendation: Specify WITH CASCADED CHECK OPTION on view definitions whenever possible
Note: SQL's support for view updating is not only limited and ad hoc, it's also extremely hard to understand
From the SQL standard:

279 [The] <query expression> QE1 is updatable if and only if for every <query expression> or <query specification> QE2 that is simply contained in QE1:
a) QE1 contains QE2 without an intervening <non-join query expression> that specifies UNION DISTINCT, EXCEPT ALL, or EXCEPT DISTINCT.
b) If QE1 simply contains a <non-join query expression> NJQE that specifies UNION ALL, then:
   i) NJQE immediately contains a <query expression> LO and a <query term> RO such that no leaf generally underlying table of LO is also a leaf generally underlying table of RO.
(cont.)

280    ii) For every column of NJQE, the underlying columns in the tables identified by LO and RO, respectively, are either both updatable or not updatable.
c) QE1 contains QE2 without an intervening <non-join query term> that specifies INTERSECT.
d) QE2 is updatable.

281 OBSERVE THAT :
Foregoing is just one of many rules that have to be taken in combination in order to determine whether a given SQL view is updatable
Rules scattered over many different parts of the document
Rules rely on many additional concepts and constructs (e.g., updatable columns, leaf generally underlying tables, <non-join query term>s) defined in still further parts of the document

282 LOOSELY, FOLLOWING SQL VIEWS ARE UPDATABLE :
1. Restriction and/or projection of single base table
2. One to one or one to many join of two base tables (many side only, in latter case)
3. UNION ALL or INTERSECT of two distinct base tables
4. Certain combinations of Cases 1-3 above
Even these cases are treated incorrectly, because of (a) lack of constraint inference; (b) duplicates; (c) nulls

283 Picture complicated still further... A view can be:
Updatable
Potentially updatable
Simply updatable
Insertable into
Note implication that some views might permit some updates but not others... and further implication that DELETE and INSERT might not be inverses
Recommendation: Lobby the vendors!

284 WHAT ARE VIEWS FOR ???
1. User U1 who defines view V is aware of exp X that defines V... U1 can use name V wherever exp X is intended, but such uses are really just shorthand
E.g., U1 might have perception S and SP (for updates) plus V = S JOIN SP (for retrievals), but U1 knows these relvars aren't all independent
2. User U2 who is merely informed that V is available for use should typically not be aware of exp X... To U2, V should look just like a base relvar (logical data independence) /* have been assuming this case */

285 VIEWS AND SNAPSHOTS :
Contrast views and snapshots (also derived, but real not virtual)... e.g.:
VAR LSS SNAPSHOT ( S WHERE CITY = 'London' ) KEY { SNO }
     REFRESH EVERY DAY ;
SQL has CREATE TABLE AS... but no REFRESH
Many applications can tolerate (might even require) data "as of" some point in time (e.g., end of an accounting period)
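
The practical difference shows up as soon as the underlying table changes. The sketch below contrasts a view with a CREATE TABLE AS copy in SQLite, via Python's `sqlite3` module; the two-column table and its rows are simplifying assumptions, and SQLite stands in for any SQL product that supports both statements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S (SNO TEXT PRIMARY KEY, CITY TEXT);
    INSERT INTO S VALUES ('S1', 'London'), ('S2', 'Paris');

    -- Virtual: re-evaluated against S every time it is referenced.
    CREATE VIEW LS AS SELECT * FROM S WHERE CITY = 'London';

    -- Snapshot: a real copy taken now, with no automatic refresh.
    CREATE TABLE LSS AS SELECT * FROM S WHERE CITY = 'London';

    -- Supplier S2 moves to London after the snapshot was taken.
    UPDATE S SET CITY = 'London' WHERE SNO = 'S2';
""")

view_rows = sorted(conn.execute("SELECT SNO FROM LS").fetchall())
snapshot_rows = sorted(conn.execute("SELECT SNO FROM LSS").fetchall())
```

The view reflects the move; the snapshot still shows the state as of its creation.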

286 WATCH OUT FOR TERMINOLOGY !
Much current DB literature refers to snapshots as "materialized views"... which is a contradiction in terms, pretty much (whole point about views as far as RM is concerned is that they're virtual)
And then typically goes on to abbreviate "materialized view" to just view (!)... So ubiquitously, in fact, that the unqualified term view has come to mean, almost always, a snapshot instead (at least in the academic world), and we no longer have a good term for view in its original sense
Recommendations: Never use the term view, unqualified, to mean a snapshot; never use the term materialized view; and watch out for violations of these recommendations!

288 SQL AND LOGIC :
Relational calculus: Alternative to relational algebra
Queries, constraints, view definitions, etc. can be stated in calculus terms as well as algebraic ones /* sometimes one is easier, sometimes the other */
Applied form of predicate calculus (aka predicate logic)
RDB language can be based on either algebra or calculus... Tutorial D? SQL?

289 LOGIC : PROPOSITIONS
A proposition is a declarative sentence, or statement, that's categorically either true or false. Examples:
1. 2 + 3 = 5
2. 2 + 3 > 7
3. Jupiter is a star
4. Mars has two moons
5. Venus is between Earth and Mercury

290 POINTS ARISING :
Don't fall into the common trap of thinking propositions are always true... A false proposition is still a valid proposition
Informally, P is a valid proposition if and only if the following is a valid question: "Is it true that P?"
Very fine point (which I'm mostly going to ignore): The proposition isn't really the declarative sentence as such; rather, it's the assertion made by that sentence... E.g., "It's hot" and "Il fait chaud" denote the same proposition

291 SO HOW MANY OF THE FOLLOWING ARE PROPOSITIONS ???
1. Bach is the greatest musician who ever lived.
2. What's the time?
3. Supplier S2 is located in some city x.
4. Some countries have a female president.
5. All politicians are corrupt.
6. Supplier S1 is located in London.
7. We both have the same favorite author x.
8. Nothing is heavier than lead.
9. It will rain tomorrow.
10. Supplier S6's city is unknown.

292 LOGIC : CONNECTIVES
Operators for combining propositions to make further (compound) propositions... Simple proposition = one with no connectives...
Truth tables:
Negation: E.g., NOT (Jupiter is a star): TRUE
Disjunction: E.g., (Mars has two moons) OR (2 + 3 > 7): TRUE
Conjunction: E.g., (Mars has two moons) AND (2 + 3 > 7): FALSE

293 Implication (IMPLIES, also written IF... THEN...):
E.g., IF (Mars has two moons) THEN (Venus is between Earth and Mercury): TRUE /* see later */
Bi-implication (BI-IMPLIES, also written IF AND ONLY IF or IFF or "≡"):
E.g., (2 + 3 = 5) IFF (Jupiter is a star): FALSE
In practice we use symbols for the connectives (usually) and adopt precedence rules that allow us to drop parens

294 CAVEAT :
Connectives are close but not identical to their natural language counterparts... because they're meant to be context independent
E.g., p AND q ≡ q AND p
But "and" is not necessarily commutative in natural language... Contrast:
I voted for a change in leadership and I was seriously disappointed
I was seriously disappointed and I voted for a change in leadership

295 A NOTE ON IMPLICATION :
Truth table not symmetric (i.e., op not commutative):
IF p THEN q is TRUE if p is FALSE and q is TRUE
IF p THEN q is FALSE if p is TRUE and q is FALSE
FALSE implies anything!
( IF p THEN q ) ≡ ( ( NOT p ) OR q )
Aside: This latter is a tautology... Evaluates to TRUE no matter what p and q stand for*
And here's a contradiction: p AND NOT p
* Tautologies of form a ≡ b are particularly important

296 RE "FALSE IMPLIES ANYTHING" :
Consider integrity constraint on suppliers:
If supplier s is located in London, then supplier s must have status 20
Formally, this is an implication:*
IF s.CITY = 'London' THEN s.STATUS = 20
Don't want the check to fail if the city isn't London!
* Slightly simplified for sake of the example

297 Again consider following constraint:
IF s.CITY = 'London' THEN s.STATUS = 20
Following is logically equivalent:
IF NOT ( s.STATUS = 20 ) THEN NOT ( s.CITY = 'London' )
i.e.,
IF s.STATUS ≠ 20 THEN s.CITY ≠ 'London'
Contrapositive of original... More generally:
( IF p THEN q ) ≡ ( IF NOT q THEN NOT p )

298 HOW MANY OF THE FOLLOWING PROPOSITIONS ARE LOGICALLY DISTINCT ???
1. ( P.WEIGHT > 17.0 ) IMPLIES ( P.CITY ≠ 'Paris' )
2. ( P.CITY = 'Paris' ) IMPLIES ( P.WEIGHT ≤ 17.0 )
3. ( P.WEIGHT ≤ 17.0 ) OR ( P.CITY ≠ 'Paris' )
4. NOT ( ( P.CITY = 'Paris' ) AND ( P.WEIGHT > 17.0 ) )

299 HOW MANY OF THE FOLLOWING PROPOSITIONS ARE LOGICALLY DISTINCT ??? (cont.)
Let x = ( P.WEIGHT > 17.0 ), y = ( P.CITY ≠ 'Paris' )
1. IF x THEN y
2. IF NOT y THEN NOT x
3. ( NOT x ) OR y
4. NOT ( ( NOT y ) AND x )
Lessons learned:
Manipulations can be done purely formally!
Equivalences not always immediately obvious!
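
That the four forms in x and y are one and the same proposition can be confirmed by comparing their truth tables. A minimal Python sketch follows; the helper `implies` and the list `forms` are names introduced here for illustration.

```python
from itertools import product

def implies(a, b):
    """Material implication: FALSE only when a is TRUE and b is FALSE."""
    return (not a) or b

# The four forms from the slide, written in terms of x and y.
forms = [
    lambda x, y: implies(x, y),            # IF x THEN y
    lambda x, y: implies(not y, not x),    # IF NOT y THEN NOT x
    lambda x, y: (not x) or y,             # ( NOT x ) OR y
    lambda x, y: not ((not y) and x),      # NOT ( ( NOT y ) AND x )
]

rows = list(product([False, True], repeat=2))

# One output column per form; identical columns mean logically equivalent forms.
tables = [tuple(f(x, y) for x, y in rows) for f in forms]
distinct_count = len(set(tables))
```

Only one distinct column results, so the answer to the slide's question is: just one.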

300 MORE CONNECTIVES :
XOR: p or q but not both
NOR: NOT ( p OR q ) = neither p nor q; Peirce arrow, p ↓ q
NAND: NOT ( p AND q ) = not both p and q; Sheffer stroke*, p | q
Exactly 4 monadic / 16 dyadic connectives in total (not all named)
* Slightly unfortunate because "|" is also used for OR

301 THE 4 MONADICS :

302 THE 16 DYADICS :
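
The counts of 4 and 16 follow from enumerating every possible output column of a truth table. The Python sketch below generates them; the variable names are assumptions, and each connective is represented simply by its column of outputs.

```python
from itertools import product

TRUTH_VALUES = (False, True)

# A monadic connective is fixed by its output for each of the 2 possible inputs.
monadics = list(product(TRUTH_VALUES, repeat=2))

# A dyadic connective is fixed by its output for each of the 4 input pairs.
input_pairs = list(product(TRUTH_VALUES, repeat=2))
dyadics = list(product(TRUTH_VALUES, repeat=len(input_pairs)))

# Familiar connectives appear among the enumerated columns.
and_column = tuple(p and q for p, q in input_pairs)
or_column = tuple(p or q for p, q in input_pairs)
nand_column = tuple(not (p and q) for p, q in input_pairs)
```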

303 COMPLETENESS :
A logical system is truth functionally complete if and only if all possible connectives can be expressed in terms of the given ones
The 20 possible connectives are not all primitive
Primitive sets: { NOT, OR }   { NOT, AND }   { NOR }   { NAND }
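
As a partial illustration of why { NAND } alone suffices, NOT, AND, and OR can each be built from NAND and checked against Python's own operators. The function names below are hypothetical, and the sketch covers only these three connectives rather than all 20.

```python
from itertools import product

def nand(p, q):
    """Sheffer stroke: NOT ( p AND q )."""
    return not (p and q)

def not_(p):
    return nand(p, p)

def and_(p, q):
    return nand(nand(p, q), nand(p, q))

def or_(p, q):
    return nand(nand(p, p), nand(q, q))

pairs = list(product([False, True], repeat=2))

# Compare each derived connective with the built-in one on every input.
not_matches = all(not_(p) == (not p) for p in (False, True))
and_matches = all(and_(p, q) == (p and q) for p, q in pairs)
or_matches = all(or_(p, q) == (p or q) for p, q in pairs)
```

Since { NOT, OR } and { NOT, AND } are themselves listed as primitive sets, recovering NOT, AND, and OR from NAND is enough to reach everything else.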

304 TRUTH TABLES REVISITED :
Alternative style (example):
This style can be used to show truth value of arb log exp in terms of truth values of components: e.g., (NOT q) IMPLIES (NOT p)

305 EXAMPLES :
Prove ( NOT p ) OR q ≡ p IMPLIES q
Prove ( NOT p ) AND ( p OR q ) IMPLIES q is a tautology

306 CONNECTIVES REVISITED :
OR and AND are fundamentally dyadic... but n-adic versions can be defined (why, exactly?). Let p1, p2, ..., pn (n > 0) be propositions. Then:
OR { p1, p2, ..., pn } is equivalent to: FALSE OR ( p1 ) OR ( p2 ) OR ... OR ( pn )
Note: If none of the p's involves any ORs, this prop is in disjunctive normal form (DNF)
AND { p1, p2, ..., pn } is equivalent to: TRUE AND ( p1 ) AND ( p2 ) AND ... AND ( pn )
Note: If none of the p's involves any ANDs, this prop is in conjunctive normal form (CNF)
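
The n-adic definitions translate directly into a fold that starts from FALSE for OR and from TRUE for AND. The Python sketch below uses `functools.reduce`; the names `n_or` and `n_and` are assumptions. The empty-input calls go beyond the slide's n > 0 and are included only to show why those starting values are the natural ones.

```python
from functools import reduce

def n_or(props):
    """OR { p1, ..., pn }: FALSE OR ( p1 ) OR ... OR ( pn )."""
    return reduce(lambda acc, p: acc or p, props, False)

def n_and(props):
    """AND { p1, ..., pn }: TRUE AND ( p1 ) AND ... AND ( pn )."""
    return reduce(lambda acc, p: acc and p, props, True)

some_true = n_or([False, False, True])
all_true = n_and([True, True, False])

# Starting values are the identities for OR and AND respectively.
empty_or = n_or([])
empty_and = n_and([])
```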

307 LOGIC : PREDICATES
A predicate is a truth valued function. Examples:
1. x is a star
2. x has two moons
3. x has m moons
4. x is between Earth and y
5. x is between y and z
Note parameters (or placeholders or free variables)... Invoking ("instantiating") predicate involves replacing parameters by arguments and yields a proposition (which evaluates to TRUE or FALSE, by definition)

308 Copyright C. J. Date 2008page 307 Arguments satisfy predicate iff resulting proposition evaluates to TRUE... E.g., the sun satisfies "x is a star," the moon doesn't
Predicate with n parameters is n-place or n-adic (and if n = 0 the predicate is a proposition)
Connectives apply to predicates as well as propositions... Simple/compound terminology applies too
Terminology: Predicate logic (aka predicate calculus) = study of predicates, connectives, and logical inferences that can be made using such predicates and connectives

309 Copyright C. J. Date 2008page 308 LOGIC : INFERENCE Logic includes rules of inference by which new truths (theorems) can be inferred from given truths (axioms and/or previously proved theorems) Modus Ponens: If p IMPLIES q is true and p is true, we can infer that q is true ("direct reasoning") E.g., given the truth of both "If I have no money then I will have to wash dishes" and "I have no money," we can infer truth of "I will have to wash dishes" Modus Tollens: If p IMPLIES q is true and q is false, we can infer that p is false ("indirect reasoning")

310 Copyright C. J. Date 2008page 309 LOGIC : QUANTIFICATION Another way to get a proposition from a predicate... Consider monadic predicate p(x) (parameter shown for clarity). Then these are propositions:
  EXISTS x ( p ( x ) )   /* existential quantifier; the symbol is a "backward E" */
  Meaning: At least one value a exists such that p(a) evaluates to TRUE
  FORALL x ( p ( x ) )   /* universal quantifier; the symbol is an "upside down A" */
  Meaning: All possible values a are such that p(a) evaluates to TRUE

311 Copyright C. J. Date 2008page 310 EXAMPLES :
  EXISTS x ( x is a logician ): TRUE (e.g., take x to be Bertrand Russell). Single example suffices to show truth
  FORALL x ( x is a logician ): FALSE (e.g., take x to be George W. Bush). Single counterexample suffices to show falsity
Note: Parameter x must "range over" some set of permissible values (see later)

312 Copyright C. J. Date 2008page 311 LET x AND y RANGE OVER PERSONS : Consider dyadic predicate "x is taller than y" Quantify over x (using EXISTS, for definiteness): EXISTS x ( x is taller than y ) Monadic predicate... Invoke ("instantiate") with argument Steve: EXISTS x ( x is taller than Steve ) Proposition:TRUE iff there exists at least one person, say Arnold, taller than Steve

313 Copyright C. J. Date 2008page 312 ALTERNATIVELY : Quantify over both parameters (using EXISTS, again for definiteness):
  EXISTS x ( EXISTS y ( x is taller than y ) )
Proposition: TRUE iff there are at least two persons not of the same height
Given an n-adic predicate, quantifying over m parameters (m ≤ n) yields a k-adic predicate, where k = n - m
  EXISTS x ( EXISTS y ( x is taller than y ) ) ≡ EXISTS y ( EXISTS x ( x is taller than y ) )
Similarly for FORALL... Series of like quantifiers can be written in any sequence without changing semantics

314 Copyright C. J. Date 2008page 313 SIX POSSIBLE "FULL QUANTIFICATIONS" (and six distinct meanings) : Assuming at least two distinct persons:
1. EXISTS x EXISTS y ( x is taller than y )
   Meaning: Somebody is taller than somebody else; TRUE, unless everybody is the same height
2. EXISTS x FORALL y ( x is taller than y )
   Meaning: Somebody is taller than everybody; FALSE
3. FORALL x EXISTS y ( x is taller than y )
   Meaning: Everybody is taller than somebody; FALSE

315 Copyright C. J. Date 2008page 314
4. EXISTS y FORALL x ( x is taller than y )
   Meaning: Somebody is shorter than everybody; FALSE
   /* But need to explain that predicates "x is taller than y" and "y is shorter than x" are logically equivalent! */
5. FORALL y EXISTS x ( x is taller than y )
   Meaning: Everybody is shorter than somebody; FALSE
6. FORALL x FORALL y ( x is taller than y )
   Meaning: Everybody is taller than everybody; FALSE
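Over a finite set of persons, the six full quantifications can be evaluated mechanically with nested `any` (EXISTS) and `all` (FORALL). The sketch below uses three made-up persons and heights purely for illustration; note that x and y both range over the whole set, so x and y can denote the same person.

```python
# Hypothetical sample data: heights in centimetres.
heights = {"Arnold": 188, "Steve": 180, "Amy": 165}
people = list(heights)

def taller(x, y):
    return heights[x] > heights[y]

# Outer quantifier is the outer call; inner quantifier is the inner call.
results = {
    1: any(any(taller(x, y) for y in people) for x in people),  # EXISTS x EXISTS y
    2: any(all(taller(x, y) for y in people) for x in people),  # EXISTS x FORALL y
    3: all(any(taller(x, y) for y in people) for x in people),  # FORALL x EXISTS y
    4: any(all(taller(x, y) for x in people) for y in people),  # EXISTS y FORALL x
    5: all(any(taller(x, y) for x in people) for y in people),  # FORALL y EXISTS x
    6: all(all(taller(x, y) for y in people) for x in people),  # FORALL x FORALL y
}
print(results)
```

Cases 2 and 4 come out FALSE here because nobody is taller than themselves, which is the same reasoning the slides rely on.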

316 Copyright C. J. Date 2008page 315 LOGIC : FREE AND BOUND VARIABLES Recap: A free variable is just a parameter. Quantifying over a free variable makes it bound. E.g.:
- x is taller than y                              /* x, y both free   */
- EXISTS x ( x is taller than y )                 /* x bound, y free  */
- EXISTS x EXISTS y ( x is taller than y )        /* x, y both bound  */
So a proposition is a predicate with no free variables!

317 Copyright C. J. Date 2008page 316 THE TERMINOLOGY ISN'T VERY GOOD : Free variables = parameters; but bound variables have no exact counterpart in conventional programming terms... They serve as a kind of dummy, linking the predicate inside the parens to the quantifier outside. E.g.:
  EXISTS x ( x > 3 )   vs.   EXISTS y ( y > 3 )
By contrast, consider:
  EXISTS x ( x > 3 ) AND x < 0   /* two different x's !!! */
  EXISTS y ( y > 3 ) AND x < 0
  EXISTS y ( y > 3 ) AND y < 0
"Free" and "bound" really apply to variable occurrences in expressions, not to variables as such... (sigh)

318 Copyright C. J. Date 2008page 317 EXERCISE (Honest Abe) : "You can fool some of the people some of the time, and some of the people all the time, but you cannot fool all the people all of the time." Is this statement unambiguous? What does it mean? Analysis: Statement involves three simple predicates (or propositions?) ANDed together: you can fool some of the people some of the time AND you can fool some of the people all the time AND /* but maps to AND */ you cannot fool all the people all of the time

319 Copyright C. J. Date 2008page 318 EXERCISE (cont.) : Denote "you can fool person x at time y" by fool(x,y)
"You can fool some of the people some of the time":
  EXISTS x EXISTS y ( fool ( x, y ) )    /* easy enough */
"You can fool some of the people all the time":
  FORALL y EXISTS x ( fool ( x, y ) ) ???
  EXISTS x FORALL y ( fool ( x, y ) ) ???
"You cannot fool all the people all of the time": I'll leave this one to you!

320 Copyright C. J. Date 2008page 319 RELATIONAL CALCULUS : SNO and STATUS for suppliers in Paris who supply part P2:
  ( S WHERE CITY = 'Paris' ) { SNO, STATUS } MATCHING ( SP WHERE PNO = 'P2' )
Relational calculus:
  RANGEVAR SX RANGES OVER S ;
  RANGEVAR SPX RANGES OVER SP ;
  { SX.SNO, SX.STATUS } WHERE SX.CITY = 'Paris'
                        AND EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = 'P2' )
Generic form /* of rel calc exp per se */ : proto tuple WHERE predicate

321 Copyright C. J. Date 2008page 320 SQL ANALOG OF EXAMPLE :
  SELECT SX.SNO, SX.STATUS
  FROM   S AS SX
  WHERE  SX.CITY = 'Paris'
  AND    EXISTS ( SELECT *
                  FROM   SP AS SPX
                  WHERE  SPX.SNO = SX.SNO
                  AND    SPX.PNO = 'P2' )
So SQL does support range variables /* see next page */
SQL also supports EXISTS, but indirectly: EXISTS sq gives TRUE if table denoted by sq nonempty, FALSE otherwise* /* sq usually "correlated" */
* Never UNKNOWN !!!
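The query can be tried out directly. The sketch below runs it against an in-memory SQLite database through Python's `sqlite3` module; the table definitions and the handful of rows are assumptions made up for the demonstration, not data from the slides.

```python
import sqlite3

def paris_suppliers_of_p2():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE S  (SNO TEXT PRIMARY KEY, SNAME TEXT, STATUS INTEGER, CITY TEXT);
        CREATE TABLE SP (SNO TEXT, PNO TEXT, QTY INTEGER, PRIMARY KEY (SNO, PNO));
        -- Illustrative rows only (assumed for this sketch)
        INSERT INTO S  VALUES ('S1','Smith',20,'London'), ('S2','Jones',10,'Paris'),
                              ('S3','Blake',30,'Paris');
        INSERT INTO SP VALUES ('S1','P2',200), ('S2','P2',400), ('S3','P1',200);
    """)
    rows = db.execute("""
        SELECT SX.SNO, SX.STATUS
        FROM   S AS SX
        WHERE  SX.CITY = 'Paris'
        AND    EXISTS ( SELECT *
                        FROM   SP AS SPX
                        WHERE  SPX.SNO = SX.SNO
                        AND    SPX.PNO = 'P2' )
        ORDER  BY SX.SNO
    """).fetchall()
    db.close()
    return rows

print(paris_suppliers_of_p2())
```

With these rows, S1 is excluded because it is in London and S3 because it supplies only P1, so only S2 qualifies.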

322 Copyright C. J. Date 2008page 321 SQL RANGE VARIABLES CAN BE IMPLICIT :
  SELECT S.SNO, S.STATUS
  FROM   S                      /* implicit: AS S */
  WHERE  S.CITY = 'Paris'
  AND    EXISTS ( SELECT *
                  FROM   SP     /* implicit: AS SP */
                  WHERE  SP.SNO = S.SNO
                  AND    SP.PNO = 'P2' )
"S." and "SP." do not refer to tables S and SP !!! They refer to implicit range variables (implicit correlation names, in SQL terms)

323 Copyright C. J. Date 2008page 322 MORE EXAMPLES : SNAMEs for suppliers who supply all parts /* range variable defns omitted */ :
  { SX.SNAME } WHERE FORALL PX ( EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO ) )
Quantifier order important! SQL analog ??? /* see later */
SNAMEs for suppliers who supply all red parts:
  { SX.SNAME } WHERE FORALL PX ( IF PX.COLOR = 'Red' THEN
                                 EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO ) )

324 Copyright C. J. Date 2008page 323 PRENEX NORMAL FORM :
  { SX.SNAME } WHERE FORALL PX ( EXISTS SPX ( IF PX.COLOR = 'Red' THEN
                                              SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO ) )
A predicate is in prenex normal form (PNF) iff (a) it's quantifier free or (b) it's of the form EXISTS x (p) or FORALL x (p), where p is in PNF in turn:
  Q1 x1 ( Q2 x2 ( ... ( Qn xn ( q ) ) ... ) )
where n ≥ 0, each Qi is either EXISTS or FORALL, and q is quantifier free
PNF is no more correct than any other form, but often easiest to write

325 Copyright C. J. Date 2008page 324 MORE QUERIES : Pairs of SNOs where the suppliers are colocated:
  { SX.SNO AS SA, SY.SNO AS SB } WHERE SX.CITY = SY.CITY AND SX.SNO < SY.SNO
SNAMEs for suppliers who don't supply part P2:
  { SX.SNAME } WHERE NOT EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = 'P2' )
For each shipment, shipment details, including total shipment weight:
  { SPX, PX.WEIGHT * SPX.QTY AS SHIPWT } WHERE PX.PNO = SPX.PNO

326 Copyright C. J. Date 2008page 325 For each part, PNO and total shipment quantity:
  { PX.PNO, SUM ( SPX WHERE SPX.PNO = PX.PNO, QTY ) AS TOTQ } [ WHERE TRUE ]
Cities that store more than five red parts:
  { PX.CITY } WHERE COUNT ( PY WHERE PY.CITY = PX.CITY AND PY.COLOR = 'Red' ) > 5

327 Copyright C. J. Date 2008page 326 CONSTRAINTS : STATUS must be in the range 1 to 100 inclusive:
  CONSTRAINT CX1 FORALL SX ( SX.STATUS > 0 AND SX.STATUS < 101 ) ;
SQL base table constraint (on base table S):
  CONSTRAINT CX1 CHECK ( STATUS > 0 AND STATUS < 101 )
Elides the quantifier (and explicit range variable)
Suppliers in London must have status 20:
  CONSTRAINT CX2 FORALL SX ( IF SX.CITY = 'London' THEN SX.STATUS = 20 ) ;

328 Copyright C. J. Date 2008page 327 No two suppliers have same SNO:
  CONSTRAINT CX3 FORALL SX ( FORALL SY ( IF SX.SNO = SY.SNO
                                         THEN SX.SNAME = SY.SNAME AND SX.STATUS = SY.STATUS AND SX.CITY = SY.CITY ) ) ;
No supplier with status less than 20 can supply part P6:
  CONSTRAINT CX5 FORALL SX ( IF SX.STATUS < 20 THEN
                             NOT EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = 'P6' ) ) ;

329 Copyright C. J. Date 2008page 328 Every SNO in SP must appear in S:
  CONSTRAINT CX6 FORALL SPX ( EXISTS SX ( SX.SNO = SPX.SNO ) ) ;   /* more on this one later */
No SNO appears in both LS and NLS:
  CONSTRAINT CX7 FORALL LX ( FORALL NX ( LX.SNO ≠ NX.SNO ) ) ;
There must always be at least one supplier:
  CONSTRAINT CX9 EXISTS SX ( TRUE ) ;

330 Copyright C. J. Date 2008page 329 MORE ON THE QUANTIFIERS : 1. WE DON'T NEED BOTH
  EXISTS x ( x is taller than Steve )
  NOT FORALL x ( NOT x is taller than Steve )
Say the same thing! More generally:
  EXISTS x ( p ( x ) ) ≡ NOT FORALL x ( NOT p ( x ) )
Likewise:
  FORALL x ( p ( x ) ) ≡ NOT EXISTS x ( NOT p ( x ) )
So we don't need both... but it's nice to have both. E.g.:

331 Copyright C. J. Date 2008page 330 "GET SUPPLIERS WHO SUPPLY ALL PARTS" : Compare and contrast:
  SX WHERE FORALL PX ( EXISTS SPX ( SX.SNO = SPX.SNO AND SPX.PNO = PX.PNO ) )
vs.
  SELECT SX.*
  FROM   S AS SX
  WHERE  NOT EXISTS
       ( SELECT PX.*
         FROM   P AS PX
         WHERE  NOT EXISTS
              ( SELECT SPX.*
                FROM   SP AS SPX
                WHERE  SX.SNO = SPX.SNO
                AND    SPX.PNO = PX.PNO ) )
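The double NOT EXISTS formulation is easy to misread, so it can help to run it once. This sketch uses an in-memory SQLite database with a tiny set of assumed rows (two suppliers, two parts); only the query text comes from the slide.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE S  (SNO TEXT PRIMARY KEY, SNAME TEXT);
    CREATE TABLE P  (PNO TEXT PRIMARY KEY);
    CREATE TABLE SP (SNO TEXT, PNO TEXT, PRIMARY KEY (SNO, PNO));
    -- Assumed sample rows: S1 supplies both parts, S2 supplies only P1
    INSERT INTO S  VALUES ('S1','Smith'), ('S2','Jones');
    INSERT INTO P  VALUES ('P1'), ('P2');
    INSERT INTO SP VALUES ('S1','P1'), ('S1','P2'), ('S2','P1');
""")

# "There is no part that this supplier does not supply."
suppliers_of_all_parts = [row[0] for row in db.execute("""
    SELECT SX.SNO
    FROM   S AS SX
    WHERE  NOT EXISTS
         ( SELECT PX.*
           FROM   P AS PX
           WHERE  NOT EXISTS
                ( SELECT SPX.*
                  FROM   SP AS SPX
                  WHERE  SX.SNO = SPX.SNO
                  AND    SPX.PNO = PX.PNO ) )
    ORDER  BY SX.SNO
""")]
print(suppliers_of_all_parts)
```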

332 Copyright C. J. Date 2008page 331 MORE ON THE QUANTIFIERS : 2. EMPTY RANGES
  EXISTS x ( p ( x ) ) ≡ NOT FORALL x ( NOT p ( x ) )
Suppose there are no x's; then LHS evaluates to FALSE
- So RHS evaluates to FALSE
- So FORALL x ( NOT p ( x ) ) evaluates to TRUE
- But p was arbitrary...
- So FORALL x ( q ( x ) ) evaluates to TRUE, regardless of the predicate q(x) !

333 Copyright C. J. Date 2008page 332 SOME CONSEQUENCES :
- Business rule or constraint of the form FORALL x (...) is "automatically" satisfied if there aren't any x's. E.g., "all taxpayers with taxable income > $1 billion must pay supertax" automatically satisfied if no taxpayer has such a large taxable income
- Certain queries produce "unexpected" results (if you don't know logic). E.g., "get suppliers who supply all purple parts"
    SX WHERE FORALL PX ( IF PX.COLOR = 'Purple' THEN
                         EXISTS SPX ( SX.SNO = SPX.SNO AND SPX.PNO = PX.PNO ) )
  returns all suppliers if there are no purple parts (!)

334 Copyright C. J. Date 2008page 333 MORE ON THE QUANTIFIERS : 3. DEFINITIONS Consider p(x); let x range over {x1,x2,...,xn}. Then:
  EXISTS x ( p ( x ) ) ≡ FALSE OR p ( x1 ) OR p ( x2 ) OR ... OR p ( xn )
  FORALL x ( p ( x ) ) ≡ TRUE AND p ( x1 ) AND p ( x2 ) AND ... AND p ( xn )
E.g.: let p(x) = x has a moon; let x range over {Mercury, Venus, Earth, Mars}
But foregoing definitions are valid only because the sets are all finite! (And even though the quantifiers are thus "just shorthand," they're very useful shorthand!)
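These definitions translate almost word for word into a fold over the range, starting from FALSE for EXISTS and from TRUE for FORALL. The sketch below is one way to write that down; the function names `exists` and `forall` are illustrative, and the moon counts are ordinary astronomical facts rather than data from the slides.

```python
from functools import reduce

def exists(rng, p):
    # FALSE OR p(x1) OR p(x2) OR ... OR p(xn)
    return reduce(lambda acc, x: acc or p(x), rng, False)

def forall(rng, p):
    # TRUE AND p(x1) AND p(x2) AND ... AND p(xn)
    return reduce(lambda acc, x: acc and p(x), rng, True)

moons = {"Mercury": 0, "Venus": 0, "Earth": 1, "Mars": 2}

def has_moon(x):
    return moons[x] > 0

some_planet_has_moon = exists(moons, has_moon)
every_planet_has_moon = forall(moons, has_moon)

# Empty range: the starting values FALSE and TRUE are all that is left.
exists_on_empty = exists([], has_moon)
forall_on_empty = forall([], has_moon)

# Duality: EXISTS x ( p(x) ) is the same as NOT FORALL x ( NOT p(x) )
duality_holds = exists(moons, has_moon) == (not forall(moons, lambda x: not has_moon(x)))
```

The empty-range behaviour falls out of the starting values, which is the same conclusion the previous slide reaches by argument.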

335 Copyright C. J. Date 2008page 334 MORE ON THE QUANTIFIERS : 4. ADDITIONAL KINDS Possibilities include:
- There exist at least three x's such that
- A majority of x's are such that
- An odd number of x's are such that
and so on... One important one:
- There exists exactly one x such that ("UNIQUE")
E.g.: UNIQUE x ( x has social security number y )
Meaning: Exactly one person has social security number y

336 Copyright C. J. Date 2008page 335 CONSTRAINT CX6 REVISITED : Every shipment must have a supplier: CONSTRAINT CX6 FORALL SPX ( EXISTS SX ( SX.SNO = SPX.SNO ) ) ; Better: CONSTRAINT CX6 FORALL SPX ( UNIQUE SX ( SX.SNO = SPX.SNO ) ) ; SQL has very indirect support: UNIQUE sq where sq is (SELECT * FROM T WHERE bx) gives TRUE if at most one row in T satisfies bx, else FALSE So CX6 becomes:

337 Copyright C. J. Date 2008page 336
  CREATE ASSERTION CX6 CHECK
       ( NOT EXISTS
           ( SELECT *
             FROM   SP AS SPX
             WHERE  NOT EXISTS
                  ( SELECT *
                    FROM   S AS SX
                    WHERE  SX.SNO = SPX.SNO )
             OR     NOT UNIQUE
                  ( SELECT *
                    FROM   S AS SX
                    WHERE  SX.SNO = SPX.SNO ) ) ) ;
  /* but "OR ... (...)" could be dropped because (SNO) is key for S */

338 Copyright C. J. Date 2008page 337 SOME EQUIVALENCES : If IS_EMPTY supported, quantifiers need not be:   /* x ranges over X */
  EXISTS x ( p ) ≡ NOT ( IS_EMPTY ( X WHERE p ) )
  FORALL x ( p ) ≡ IS_EMPTY ( X WHERE NOT ( p ) )
These equivalences explain SQL's EXISTS (which is really an operator, not a quantifier, in SQL)... and SQL's lack of support for FORALL
  EXISTS x ( p ) ≡ COUNT ( X WHERE p ) > 0
  FORALL x ( p ) ≡ COUNT ( X WHERE p ) = COUNT ( X )
  UNIQUE x ( p ) ≡ COUNT ( X WHERE p ) = 1
Recommendation: Don't use COUNT in preference to EXISTS

339 Copyright C. J. Date 2008page 338 RELATIONAL COMPLETENESS : For every expression of the rel algebra, there exists an expression of the rel calculus that's logically equivalent (i.e., has same semantics)... So rel calculus is at least as powerful (better: expressive) as rel algebra
Not obvious (?), but converse is true too
Both are relationally complete /* basic measure of expressive power of lang */
What about SQL ???

340 Copyright C. J. Date 2008page 339 TO SUM UP : DB professionals in general and SQL practitioners in particular should have at least a basic understanding of logic or relational calculus (it comes to the same thing) !!! Here's a quote: Surely it's worth investing a little effort up front in becoming familiar with [basic logic] in order to avoid the problems associated with ambiguous business rules. Ambiguity in business rules leads to implementation delays at best or implementation errors at worst (possibly both). And such delays and errors certainly have costs associated with them, costs that are likely to outweigh those initial learning costs many times over. In other words, framing business rules properly is a serious matter, and it requires a certain level of technical competence.

341 Copyright C. J. Date 2008page 340 These remarks are set in the context of business rules specifically, but they're of wider applicability, as we'll see
Yes, I know the counterarguments... but I don't agree with them
Reviewer: "Counterarguments to what? Surely not to the assertion that it would be better if the rule designer were trained in logic? If so, I'd like to be told them, and perhaps some others would feel the same."

342 Copyright C. J. Date 2008page 341 Yes, that's what I meant... Claim is: Logic is simply too difficult for most people to deal with
Might be true in general (big subject!)... but don't need to understand the whole of logic for the purpose at hand... and the benefits are so huge! Small effort up front pays for itself many times over in avoiding errors in rules, and constraints, and queries, and so on

343 Copyright C. J. Date 2008page 342 A FINAL REMARK : Logic is very solid !!! Began with the ancient Greeks:
  Aristotle (BCE)
  Leibniz : Laid foundations of modern logic
  Boole : Laws of Thought (1854)
  Frege : Quantifiers (1879)
  Wittgenstein : Truth tables (1922)
  Etc., etc., etc.

344 Copyright C. J. Date 2008page 343 STRUCTURE OF PRESENTATION :
1. Setting the scene
2. Types and domains
3. Tuples and relations, rows and tables
4. No duplicates, no nulls
5. Base relvars, base tables
6. SQL and algebra I: The original operators
7. SQL and algebra II: Additional operators
8. SQL and constraints
9. SQL and views
10. SQL and logic I: Relational calculus
11. SQL and logic II: Using logic to write SQL
12. Further SQL topics
13. Appendix: The relational model
14. Appendix: DB design

345 Copyright C. J. Date 2008page 344 HOW TO WRITE CORRECT SQL AND KNOW IT : SQL is complicated and difficult, much more so than SQL advocates would have you believe... In fact, it's unteachable !!! (so my title might be an overclaim)
So to have a hope of writing correct SQL, you must follow some discipline
Logic is a HUGE help!
- Formulate query (or...) in logic or rel calc
- Map that formulation systematically to SQL
In other words, expression transformation once again

346 Copyright C. J. Date 2008page 345 SOME IMPORTANT TRANSFORMATION LAWS : Law of the form exp1 ≡ exp2 implies that if some exp contains an occurrence of exp1, it can be rewritten as an exp containing an occurrence of exp2 without changing the meaning /* crucial point */... E.g.
  SELECT SNO
  FROM   S
  WHERE  ( STATUS > 10 AND CITY = 'London' )
  OR     ( STATUS > 10 AND CITY = 'Athens' )
Boolean exp here clearly equivalent to:
  STATUS > 10 AND ( CITY = 'London' OR CITY = 'Athens' )
Thanks to distributivity (of AND over OR)

347 Copyright C. J. Date 2008page 346
- The distributive laws:
    p AND ( q OR r ) ≡ ( p AND q ) OR ( p AND r )
    p OR ( q AND r ) ≡ ( p OR q ) AND ( p OR r )
Here and elsewhere p, q, r denote arb boolean exps

348 Copyright C. J. Date 2008page 347
- The implication law:
    IF p THEN q ≡ ( NOT p ) OR q
- The double negation law:
    NOT ( NOT p ) ≡ p
- De Morgan's laws:
    NOT ( p AND q ) ≡ ( NOT p ) OR ( NOT q )
    NOT ( p OR q ) ≡ ( NOT p ) AND ( NOT q )

349 Copyright C. J. Date 2008page 348
- The quantification law:
    FORALL x ( p ( x ) ) ≡ NOT EXISTS x ( NOT p ( x ) )   /* repeated application of De Morgan */
- De Morgan's "first" law revisited:
    NOT ( p AND q ) ≡ ( NOT p ) OR ( NOT q )
  Often applied to result of prior application of implication law... So restate, replacing q by NOT q:
    NOT ( p AND NOT q ) ≡ ( NOT p ) OR q
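Since p, q, and r can each only be TRUE or FALSE, every propositional law listed on these slides can be confirmed by checking all eight combinations. The sketch below does that, and also checks the quantification law over one small finite range; the dictionary of laws and the helper names are just an illustrative arrangement.

```python
from itertools import product

def implies(p, q):
    # IF p THEN q, by its truth table: q when p is TRUE, otherwise TRUE
    return q if p else True

# Each entry pairs the left-hand side of a law with its right-hand side.
laws = {
    "AND distributes over OR": (lambda p, q, r: p and (q or r),
                                lambda p, q, r: (p and q) or (p and r)),
    "OR distributes over AND": (lambda p, q, r: p or (q and r),
                                lambda p, q, r: (p or q) and (p or r)),
    "implication":             (lambda p, q, r: implies(p, q),
                                lambda p, q, r: (not p) or q),
    "double negation":         (lambda p, q, r: not (not p),
                                lambda p, q, r: p),
    "De Morgan (AND)":         (lambda p, q, r: not (p and q),
                                lambda p, q, r: (not p) or (not q)),
    "De Morgan (OR)":          (lambda p, q, r: not (p or q),
                                lambda p, q, r: (not p) and (not q)),
    "De Morgan restated":      (lambda p, q, r: not (p and not q),
                                lambda p, q, r: (not p) or q),
}

def holds(lhs, rhs):
    return all(lhs(p, q, r) == rhs(p, q, r)
               for p, q, r in product([True, False], repeat=3))

verified = {name: holds(lhs, rhs) for name, (lhs, rhs) in laws.items()}

# Quantification law on a finite range: FORALL x (p(x)) vs NOT EXISTS x (NOT p(x))
rng = range(5)
def p(x):
    return x < 3
quantification_law = all(p(x) for x in rng) == (not any(not p(x) for x in rng))
```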

350 Copyright C. J. Date 2008page 349 EXAMPLE 1: LOGICAL IMPLICATION All red parts must be stored in London... i.e.:
  IF COLOR = 'Red' THEN CITY = 'London'   /* for given part */
Apply implication law /* add parens for clarity */ :
  ( NOT ( COLOR = 'Red' ) ) OR CITY = 'London'
Map to base table constraint (SQL):
  CONSTRAINT BTCX1 CHECK ( NOT ( COLOR = 'Red' ) OR CITY = 'London' )
Simplify /* i.e., more transformations! */ :
  CONSTRAINT BTCX1 CHECK ( COLOR <> 'Red' OR CITY = 'London' )
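The simplified constraint can be exercised as written. The following sketch declares it on a cut-down parts table in an in-memory SQLite database (the table layout and the three test rows are assumptions for the demonstration) and confirms that only the red part outside London is rejected.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE P (
        PNO   TEXT PRIMARY KEY,
        COLOR TEXT NOT NULL,
        CITY  TEXT NOT NULL,
        CONSTRAINT BTCX1 CHECK ( COLOR <> 'Red' OR CITY = 'London' )
    )
""")

def try_insert(row):
    """Return True if the row is accepted, False if the CHECK constraint rejects it."""
    try:
        db.execute("INSERT INTO P VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

red_in_london = try_insert(("P1", "Red", "London"))   # satisfies CITY = 'London'
blue_in_paris = try_insert(("P2", "Blue", "Paris"))   # satisfies COLOR <> 'Red'
red_in_paris = try_insert(("P3", "Red", "Paris"))     # violates both disjuncts
```

The columns are declared NOT NULL here on purpose: with nulls allowed, a CHECK constraint that evaluates to UNKNOWN is treated as satisfied, which is a separate complication.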

351 Copyright C. J. Date 2008page 350 EXAMPLE 2: UNIVERSAL QUANTIFICATION
  FORALL PX ( IF PX.COLOR = 'Red' THEN PX.CITY = 'London' )
Apply quantification law:
  NOT EXISTS PX ( NOT ( IF PX.COLOR = 'Red' THEN PX.CITY = 'London' ) )   /* henceforth add/drop parens freely */
Implication law:
  NOT EXISTS PX ( NOT ( NOT ( PX.COLOR = 'Red' ) OR PX.CITY = 'London' ) )
Could now map to SQL, but let's tidy it up first:

352 Copyright C. J. Date 2008page 351 De Morgan:
  NOT EXISTS PX ( NOT ( NOT ( ( PX.COLOR = 'Red' ) AND NOT ( PX.CITY = 'London' ) ) ) )
Double negation (and drop some parens):
  NOT EXISTS PX ( PX.COLOR = 'Red' AND NOT ( PX.CITY = 'London' ) )
One more obvious transformation:
  NOT EXISTS PX ( PX.COLOR = 'Red' AND PX.CITY ≠ 'London' )

353 Copyright C. J. Date 2008page 352 TRANSFORM FINAL EXP TO SQL :
  NOT                maps to   NOT
  EXISTS PX ( bx )   maps to   EXISTS ( SELECT * FROM P AS PX WHERE ( sbx ) )   /* sbx is SQL analog of bx */
- Parens around sbx can be dropped
- Wrap up entire exp inside CREATE ASSERTION
  CREATE ASSERTION ... CHECK
       ( NOT EXISTS ( SELECT *
                      FROM   P AS PX
                      WHERE  PX.COLOR = 'Red'
                      AND    PX.CITY <> 'London' ) ) ;

354 Copyright C. J. Date 2008page 353 EXAMPLE 3: IMPLIES AND FORALL PNAMEs for parts whose weight is different from that of every part in Paris:
  { PX.PNAME } WHERE FORALL PY ( IF PY.CITY = 'Paris' THEN PY.WEIGHT ≠ PX.WEIGHT )
Quantification law:
  { PX.PNAME } WHERE NOT EXISTS PY ( NOT ( IF PY.CITY = 'Paris' THEN PY.WEIGHT ≠ PX.WEIGHT ) )
Implication law:
  { PX.PNAME } WHERE NOT EXISTS PY ( NOT ( NOT ( PY.CITY = 'Paris' ) OR ( PY.WEIGHT ≠ PX.WEIGHT ) ) )

355 Copyright C. J. Date 2008page 354 De Morgan:
  { PX.PNAME } WHERE NOT EXISTS PY ( NOT ( NOT ( ( PY.CITY = 'Paris' ) AND NOT ( PY.WEIGHT ≠ PX.WEIGHT ) ) ) )
Tidy up:
  { PX.PNAME } WHERE NOT EXISTS PY ( PY.CITY = 'Paris' AND PY.WEIGHT = PX.WEIGHT )
Map to SQL:

356 Copyright C. J. Date 2008page 355
  SELECT DISTINCT PX.PNAME   /* DISTINCT needed here! */
  FROM   P AS PX
  WHERE  NOT EXISTS
       ( SELECT *
         FROM   P AS PY
         WHERE  PY.CITY = 'Paris'
         AND    PY.WEIGHT = PX.WEIGHT )
But... suppose there's at least one part in Paris, but such parts all have a null weight
Original query now can't be answered... Any definite result is a lie!
But foregoing SQL exp will return all PNAMEs in table P

357 Copyright C. J. Date 2008page 356 WHAT'S MORE :
  SELECT DISTINCT PX.PNAME
  FROM   P AS PX
  WHERE  PX.WEIGHT NOT IN
       ( SELECT PY.WEIGHT
         FROM   P AS PY
         WHERE  PY.CITY = 'Paris' )
Looks equivalent... Is equivalent in 2VL... But gives different but equally incorrect result: viz., empty table! (under same conditions as before)
Moral ???
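The difference between the two formulations shows up as soon as the Paris weights are null. The sketch below sets up exactly that situation in an in-memory SQLite database (three assumed rows, with both Paris parts having a null weight) and runs the NOT EXISTS and NOT IN versions side by side.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE P (PNO TEXT PRIMARY KEY, PNAME TEXT, WEIGHT REAL, CITY TEXT);
    -- Assumed rows: every part in Paris has a null weight
    INSERT INTO P VALUES ('P1','Nut', 12.0,'London'),
                         ('P2','Bolt',NULL,'Paris'),
                         ('P5','Cam', NULL,'Paris');
""")

not_exists_result = sorted(row[0] for row in db.execute("""
    SELECT DISTINCT PX.PNAME
    FROM   P AS PX
    WHERE  NOT EXISTS
         ( SELECT *
           FROM   P AS PY
           WHERE  PY.CITY = 'Paris'
           AND    PY.WEIGHT = PX.WEIGHT )
"""))

not_in_result = sorted(row[0] for row in db.execute("""
    SELECT DISTINCT PX.PNAME
    FROM   P AS PX
    WHERE  PX.WEIGHT NOT IN
         ( SELECT PY.WEIGHT
           FROM   P AS PY
           WHERE  PY.CITY = 'Paris' )
"""))

print(not_exists_result, not_in_result)
```

In the first query the comparison with a null weight is never TRUE, so the subquery is always empty and every part qualifies; in the second, NOT IN against a list of nulls is never TRUE, so no part qualifies.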

358 Copyright C. J. Date 2008page 357 EXAMPLE 4: CORRELATED SUBQUERIES Names of suppliers who supply both part P1 and part P2:
  { SX.SNAME } WHERE EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = 'P1' )
               AND   EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = 'P2' )

  SELECT DISTINCT SX.SNAME
  FROM   S AS SX
  WHERE  EXISTS ( SELECT *
                  FROM   SP AS SPX
                  WHERE  SPX.SNO = SX.SNO
                  AND    SPX.PNO = 'P1' )
  AND    EXISTS ( SELECT *
                  FROM   SP AS SPX
                  WHERE  SPX.SNO = SX.SNO
                  AND    SPX.PNO = 'P2' )

359 Copyright C. J. Date 2008page 358 Correlated subqueries often contraindicated from a performance point of view,* because (conceptually, at least) they have to be evaluated once for each row in the outer table, instead of just once and for all
So eliminate them?... Easy (for subqueries in EXISTS):
  SELECT DISTINCT SX.SNAME
  FROM   S AS SX
  WHERE  SX.SNO IN ( SELECT SPX.SNO
                     FROM   SP AS SPX
                     WHERE  SPX.PNO = 'P1' )
  AND    SX.SNO IN ( SELECT SPX.SNO
                     FROM   SP AS SPX
                     WHERE  SPX.PNO = 'P2' )
* Mirabile dictu...

360 Copyright C. J. Date 2008page 359
  SELECT sic   /* "select item commalist" */
  FROM   T1
  WHERE  [ NOT ] EXISTS ( SELECT *
                          FROM   T2
                          WHERE  T2.C = T1.C
                          AND    bx )
Maps to:
  SELECT sic
  FROM   T1
  WHERE  T1.C [ NOT ] IN ( SELECT T2.C
                           FROM   T2
                           WHERE  bx )
But what if there are nulls?

361 Copyright C. J. Date 2008page 360 EXAMPLE 5: NAMING SUBEXPRESSIONS Get supplier details for suppliers who supply all purple parts
  { SX } WHERE FORALL PX ( IF PX.COLOR = 'Purple' THEN
                           EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO ) )
Implication law:
  { SX } WHERE FORALL PX ( NOT ( PX.COLOR = 'Purple' ) OR
                           EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO ) )
De Morgan:
  { SX } WHERE FORALL PX ( NOT ( ( PX.COLOR = 'Purple' ) AND
                           NOT EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO ) ) )

362 Copyright C. J. Date 2008page 361 Quantification law:
  { SX } WHERE NOT EXISTS PX ( NOT ( NOT ( ( PX.COLOR = 'Purple' ) AND
                               NOT EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO ) ) ) )
Double negation:
  { SX } WHERE NOT EXISTS PX ( ( PX.COLOR = 'Purple' ) AND
                               NOT EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO ) )

363 Copyright C. J. Date 2008page 362 Drop some parens and map to SQL:
  SELECT *
  FROM   S AS SX
  WHERE  NOT EXISTS
       ( SELECT *
         FROM   P AS PX
         WHERE  PX.COLOR = 'Purple'
         AND    NOT EXISTS
              ( SELECT *
                FROM   SP AS SPX
                WHERE  SPX.SNO = SX.SNO
                AND    SPX.PNO = PX.PNO ) )
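Running the final SQL against a database with no purple parts shows the "unexpected" result mentioned earlier: every supplier comes back. The sketch below uses an in-memory SQLite database with a few assumed rows, and parameterizes the color (a hypothetical convenience, not part of the slide) so the same query can be compared for 'Purple' and 'Red'.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE S  (SNO TEXT PRIMARY KEY, SNAME TEXT);
    CREATE TABLE P  (PNO TEXT PRIMARY KEY, COLOR TEXT);
    CREATE TABLE SP (SNO TEXT, PNO TEXT, PRIMARY KEY (SNO, PNO));
    -- Assumed rows: no purple parts at all; S2 supplies nothing
    INSERT INTO S  VALUES ('S1','Smith'), ('S2','Jones');
    INSERT INTO P  VALUES ('P1','Red'), ('P2','Green');
    INSERT INTO SP VALUES ('S1','P1');
""")

def suppliers_of_all_parts_colored(color):
    """Suppliers for whom no part of the given color goes unsupplied."""
    rows = db.execute("""
        SELECT SX.SNO
        FROM   S AS SX
        WHERE  NOT EXISTS
             ( SELECT *
               FROM   P AS PX
               WHERE  PX.COLOR = ?
               AND    NOT EXISTS
                    ( SELECT *
                      FROM   SP AS SPX
                      WHERE  SPX.SNO = SX.SNO
                      AND    SPX.PNO = PX.PNO ) )
        ORDER  BY SX.SNO
    """, (color,))
    return [row[0] for row in rows]

purple = suppliers_of_all_parts_colored("Purple")
red = suppliers_of_all_parts_colored("Red")
print(purple, red)
```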

364 Copyright C. J. Date 2008page 363 A BETTER APPROACH : Introduce names for subexpressions:
  exp1 : PX.COLOR = 'Purple'
  exp2 : EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO )
  /* both map fairly directly to SQL */
Original rel calc formulation:
  { SX } WHERE FORALL PX ( IF exp1 THEN exp2 )
Can see the forest as well as the trees!... and can apply usual transformations, but in a different sequence, because we now have better grasp of the big picture

365 Copyright C. J. Date 2008page 364 Quantification law:
  { SX } WHERE NOT EXISTS PX ( NOT ( IF exp1 THEN exp2 ) )
Implication law:
  { SX } WHERE NOT EXISTS PX ( NOT ( NOT ( exp1 ) OR ( exp2 ) ) )
De Morgan:
  { SX } WHERE NOT EXISTS PX ( NOT ( NOT ( exp1 AND NOT exp2 ) ) )
Double negation:
  { SX } WHERE NOT EXISTS PX ( exp1 AND NOT ( exp2 ) )
Can now expand exp1 and exp2 and map to SQL

366 Copyright C. J. Date 2008page 365 EXAMPLE 6: NAMING SUBEXPRESSIONS bis Get suppliers such that every part they supply is in the same city as that supplier
  { SX } WHERE FORALL PX ( IF EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO )
                           THEN PX.CITY = SX.CITY )
  { SX } WHERE FORALL PX ( IF exp1 THEN exp2 )
  { SX } WHERE NOT EXISTS PX ( NOT ( IF exp1 THEN exp2 ) )
  { SX } WHERE NOT EXISTS PX ( NOT ( NOT ( exp1 ) OR exp2 ) )
  { SX } WHERE NOT EXISTS PX ( NOT ( NOT ( exp1 AND NOT ( exp2 ) ) ) )
  { SX } WHERE NOT EXISTS PX ( exp1 AND NOT ( exp2 ) )

367 Copyright C. J. Date 2008page 366 Expand exp1 and exp2 and map to SQL:
  SELECT *
  FROM   S AS SX
  WHERE  NOT EXISTS
       ( SELECT *
         FROM   P AS PX
         WHERE  EXISTS
              ( SELECT *
                FROM   SP AS SPX
                WHERE  SPX.SNO = SX.SNO
                AND    SPX.PNO = PX.PNO )
         AND    PX.CITY <> SX.CITY )

368 Copyright C. J. Date 2008page 367 EXAMPLE 7: DEALING WITH AMBIGUITY Get suppliers such that every part they supply is in the same city
Possible interpretations include:
- Get suppliers SX such that for all parts PX and PY, if SX supplies both of them, then PX.CITY = PY.CITY
- Get suppliers SX such that for all parts PX and PY, if SX supplies both of them and they're distinct, then PX.CITY = PY.CITY
Assume first interpretation...

369 Copyright C. J. Date 2008page 368
  { SX } WHERE FORALL PX ( FORALL PY (
           IF EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO ) AND
              EXISTS SPY ( SPY.SNO = SX.SNO AND SPY.PNO = PY.PNO )
           THEN PX.CITY = PY.CITY ) )
  { SX } WHERE FORALL PX ( FORALL PY ( IF exp1 AND exp2 THEN exp3 ) )
  { SX } WHERE NOT EXISTS PX ( NOT FORALL PY ( IF exp1 AND exp2 THEN exp3 ) )
  { SX } WHERE NOT EXISTS PX ( NOT ( NOT EXISTS PY ( NOT ( IF exp1 AND exp2 THEN exp3 ) ) ) )

370 Copyright C. J. Date 2008page 369
  { SX } WHERE NOT EXISTS PX ( EXISTS PY ( NOT ( IF exp1 AND exp2 THEN exp3 ) ) )
  { SX } WHERE NOT EXISTS PX ( EXISTS PY ( NOT ( NOT ( exp1 AND exp2 ) OR exp3 ) ) )
  { SX } WHERE NOT EXISTS PX ( EXISTS PY ( NOT ( NOT ( exp1 ) OR NOT ( exp2 ) OR ( exp3 ) ) ) )
  { SX } WHERE NOT EXISTS PX ( EXISTS PY ( ( exp1 AND exp2 AND NOT ( exp3 ) ) ) )

371 Copyright C. J. Date 2008page 370
  SELECT *
  FROM   S AS SX
  WHERE  NOT EXISTS
       ( SELECT *
         FROM   P AS PX
         WHERE  EXISTS
              ( SELECT *
                FROM   P AS PY
                WHERE  EXISTS
                     ( SELECT *
                       FROM   SP AS SPX
                       WHERE  SPX.SNO = SX.SNO
                       AND    SPX.PNO = PX.PNO )
                AND    EXISTS
                     ( SELECT *
                       FROM   SP AS SPY
                       WHERE  SPY.SNO = SX.SNO
                       AND    SPY.PNO = PY.PNO )
                AND    PX.CITY <> PY.CITY ) )

372 Copyright C. J. Date 2008page 371 EXAMPLE 8: USING COUNT Get suppliers such that every part they supply is in the same city /* same as Example 7 */... Or:
- Get suppliers SX such that the number of cities for parts supplied by SX is less than or equal to one
  { SX } WHERE COUNT ( PX.CITY WHERE EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO ) ) ≤ 1

  SELECT *
  FROM   S AS SX
  WHERE  ( SELECT COUNT ( DISTINCT PX.CITY )
           FROM   P AS PX
           WHERE  EXISTS
                ( SELECT *
                  FROM   SP AS SPX
                  WHERE  SPX.SNO = SX.SNO
                  AND    SPX.PNO = PX.PNO ) ) <= 1
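One way to gain some confidence that the COUNT formulation and the Example 7 formulation say the same thing is to run both on the same data. The sketch below does so in an in-memory SQLite database; the three suppliers, three parts, and four shipments are assumed sample rows, and agreement on one data set is of course only a sanity check, not a proof of equivalence.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE S  (SNO TEXT PRIMARY KEY, CITY TEXT);
    CREATE TABLE P  (PNO TEXT PRIMARY KEY, CITY TEXT);
    CREATE TABLE SP (SNO TEXT, PNO TEXT, PRIMARY KEY (SNO, PNO));
    -- Assumed rows: S1's parts are in two cities, S3's in one, S5 supplies nothing
    INSERT INTO S  VALUES ('S1','London'), ('S3','Paris'), ('S5','Athens');
    INSERT INTO P  VALUES ('P1','London'), ('P2','Paris'), ('P3','Paris');
    INSERT INTO SP VALUES ('S1','P1'), ('S1','P2'), ('S3','P2'), ('S3','P3');
""")

def snos(sql):
    return [row[0] for row in db.execute(sql)]

# Example 7 style: no two supplied parts are in different cities.
by_exists = snos("""
    SELECT SX.SNO FROM S AS SX
    WHERE  NOT EXISTS
         ( SELECT * FROM P AS PX
           WHERE  EXISTS
                ( SELECT * FROM P AS PY
                  WHERE  EXISTS ( SELECT * FROM SP AS SPX
                                  WHERE  SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO )
                  AND    EXISTS ( SELECT * FROM SP AS SPY
                                  WHERE  SPY.SNO = SX.SNO AND SPY.PNO = PY.PNO )
                  AND    PX.CITY <> PY.CITY ) )
    ORDER  BY SX.SNO
""")

# Example 8 style: at most one distinct city among supplied parts.
by_count = snos("""
    SELECT SX.SNO FROM S AS SX
    WHERE  ( SELECT COUNT ( DISTINCT PX.CITY )
             FROM   P AS PX
             WHERE  EXISTS ( SELECT * FROM SP AS SPX
                             WHERE  SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO ) ) <= 1
    ORDER  BY SX.SNO
""")
print(by_exists, by_count)
```

S5 appears in both results even though it supplies nothing, which is the empty-range behaviour of FORALL again.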

373 Copyright C. J. Date 2008page 372
- Reminder: Don't use COUNT when EXISTS is what you mean
- Is that DISTINCT in the COUNT invocation necessary?
- Can you formulate the query in terms of GROUP BY and HAVING?
- If so, what are the logical steps involved in constructing that formulation?

374 Copyright C. J. Date 2008page 373 EXAMPLE 11*: ALL OR ANY COMPARISON
General form: rx theta sq
  rx    : row expression (usually scalar in practice: coercion)
  theta : =, <, (etc.) followed by ALL or ANY
  sq    : subquery, denoting table t
E.g., P.WEIGHT >ALL ( SELECT ... )
ALL : TRUE iff comparison without ALL returns TRUE for all rows in t (hence, TRUE if t empty)
ANY : TRUE iff comparison without ANY returns TRUE for at least one row in t (hence, FALSE if t empty)
* For Examples 9 and 10, see the book

375 Copyright C. J. Date 2008page 374 PNAMEs for parts with weight > that of every blue part:
  SELECT DISTINCT PX.PNAME
  FROM   P AS PX
  WHERE  PX.WEIGHT >ALL ( SELECT PY.WEIGHT
                          FROM   P AS PY
                          WHERE  PY.COLOR = 'Blue' )
Recommendation: Don't use ALL or ANY comparisons! Error prone (e.g., replace "every" by "any" in example?) Redundant... e.g., consider:
  SELECT DISTINCT SNAME
  FROM   S
  WHERE  CITY <>ANY ( SELECT CITY FROM P )

376 Copyright C. J. Date 2008page 375 SNAMEs for suppliers whose city isn't equal to any part city? Wrong! Actually equivalent* to:
  SELECT DISTINCT SNAME
  FROM   S
  WHERE  EXISTS ( SELECT *
                  FROM   P
                  WHERE  P.CITY <> S.CITY )
ALL or ANY comparisons can always be transformed into equivalent exps involving EXISTS (as above)... Can also usually be transformed into exps involving MAX or MIN
* Is it? What if cities could be null?

377 Copyright C. J. Date 2008page 376 =ANY equivalent to IN; <>ALL equivalent to NOT IN; =ALL, <>ANY... Use EXISTS

          ANY        ALL
  =       IN
  <>                 NOT IN
  <       < MAX      < MIN
  <=      <= MAX     <= MIN
  >       > MIN      > MAX
  >=      >= MIN     >= MAX

378 Copyright C. J. Date 2008page 377 FOR EXAMPLE :
1. SELECT DISTINCT PX.PNAME
   FROM   P AS PX
   WHERE  PX.WEIGHT >ALL ( SELECT PY.WEIGHT
                           FROM   P AS PY
                           WHERE  PY.COLOR = 'Blue' )
2. SELECT DISTINCT PX.PNAME
   FROM   P AS PX
   WHERE  PX.WEIGHT > ( SELECT MAX ( PY.WEIGHT )
                        FROM   P AS PY
                        WHERE  PY.COLOR = 'Blue' )
Exercise: What coercions are involved in the above?

379 Copyright C. J. Date 2008page 378 BUT : MAX gives null if argument is empty
1. SELECT DISTINCT PX.PNAME
   FROM   P AS PX
   WHERE  PX.WEIGHT >ALL ( SELECT PY.WEIGHT
                           FROM   P AS PY
                           WHERE  PY.COLOR = 'Blue' )
2. SELECT DISTINCT PX.PNAME
   FROM   P AS PX
   WHERE  PX.WEIGHT > ( SELECT MAX ( PY.WEIGHT )
                        FROM   P AS PY
                        WHERE  PY.COLOR = 'Blue' )
No blue parts: Exp 1 gives all PNAMEs... Exp 2 gives empty !!!

380 Copyright C. J. Date 2008page 379
3. SELECT DISTINCT PX.PNAME
   FROM   P AS PX
   WHERE  PX.WEIGHT > ( SELECT COALESCE ( MAX ( PY.WEIGHT ), 0.0 )
                        FROM   P AS PY
                        WHERE  PY.COLOR = 'Blue' )
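The three expressions can be compared on a table with no blue parts. One caveat for this sketch: SQLite, used here through Python's `sqlite3`, does not support ALL or ANY comparisons, so Exp 1 is stood in for by a NOT EXISTS rewrite (an assumption of this sketch, adequate here because the sample weights contain no nulls). Exps 2 and 3 are run as written; the two rows are assumed sample data.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE P (PNO TEXT PRIMARY KEY, PNAME TEXT, COLOR TEXT, WEIGHT REAL);
    -- Assumed rows: deliberately no blue parts
    INSERT INTO P VALUES ('P1','Nut','Red',12.0), ('P2','Bolt','Green',17.0);
""")

def names(sql):
    return sorted(row[0] for row in db.execute(sql))

# Stand-in for Exp 1 (>ALL): no blue part fails the comparison.
exp1 = names("""
    SELECT DISTINCT PX.PNAME
    FROM   P AS PX
    WHERE  NOT EXISTS ( SELECT *
                        FROM   P AS PY
                        WHERE  PY.COLOR = 'Blue'
                        AND    NOT ( PX.WEIGHT > PY.WEIGHT ) )
""")

# Exp 2: MAX over an empty argument is null, so the comparison is never TRUE.
exp2 = names("""
    SELECT DISTINCT PX.PNAME
    FROM   P AS PX
    WHERE  PX.WEIGHT > ( SELECT MAX ( PY.WEIGHT )
                         FROM   P AS PY
                         WHERE  PY.COLOR = 'Blue' )
""")

# Exp 3: COALESCE replaces that null by 0.0.
exp3 = names("""
    SELECT DISTINCT PX.PNAME
    FROM   P AS PX
    WHERE  PX.WEIGHT > ( SELECT COALESCE ( MAX ( PY.WEIGHT ), 0.0 )
                         FROM   P AS PY
                         WHERE  PY.COLOR = 'Blue' )
""")
print(exp1, exp2, exp3)
```

Exp 3 agrees with Exp 1 here only because every weight in the sample is greater than 0.0.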

381 Copyright C. J. Date 2008page 380 EXAMPLE 12: GROUP BY AND HAVING For each part supplied by no more than two suppliers, get PNO and city and total quantity supplied
  { PX.PNO, PX.CITY, SUM ( SPX WHERE SPX.PNO = PX.PNO, QTY ) AS TPQ }
    WHERE COUNT ( SPY WHERE SPY.PNO = PX.PNO ) ≤ 2

  SELECT PX.PNO, PX.CITY,
         ( SELECT COALESCE ( SUM ( SPX.QTY ), 0 )
           FROM   SP AS SPX
           WHERE  SPX.PNO = PX.PNO ) AS TPQ
  FROM   P AS PX
  WHERE  ( SELECT COUNT ( * )
           FROM   SP AS SPY
           WHERE  SPY.PNO = PX.PNO ) <= 2

382 Copyright C. J. Date 2008page 381 OR :
  SELECT PX.PNO, PX.CITY, COALESCE ( SUM ( SPX.QTY ), 0 ) AS TPQ
  FROM   P AS PX, SP AS SPX
  WHERE  PX.PNO = SPX.PNO
  GROUP  BY PX.PNO
  HAVING COUNT ( * ) <= 2
Easier to understand? Is PX.CITY in SELECT clause legal? Correct for parts supplied by no suppliers at all? /* No */ Are formulations equivalent in presence of nulls? Or duplicates?
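The "/* No */" can be seen by running both formulations on data that includes a part with no shipments. The sketch below does this in an in-memory SQLite database with assumed rows (part P7 is never supplied). Note that SQLite happens to accept PX.CITY in the SELECT list when grouping by PX.PNO alone; that leniency is a property of SQLite and does not settle the legality question the slide asks.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE P  (PNO TEXT PRIMARY KEY, CITY TEXT);
    CREATE TABLE SP (SNO TEXT, PNO TEXT, QTY INTEGER, PRIMARY KEY (SNO, PNO));
    -- Assumed rows: P1 has two suppliers, P2 has three, P7 has none
    INSERT INTO P  VALUES ('P1','London'), ('P2','Paris'), ('P7','Oslo');
    INSERT INTO SP VALUES ('S1','P1',300), ('S2','P1',300),
                          ('S1','P2',200), ('S2','P2',400), ('S3','P2',200);
""")

subquery_form = db.execute("""
    SELECT PX.PNO, PX.CITY,
           ( SELECT COALESCE ( SUM ( SPX.QTY ), 0 )
             FROM   SP AS SPX
             WHERE  SPX.PNO = PX.PNO ) AS TPQ
    FROM   P AS PX
    WHERE  ( SELECT COUNT ( * )
             FROM   SP AS SPY
             WHERE  SPY.PNO = PX.PNO ) <= 2
    ORDER  BY PX.PNO
""").fetchall()

group_by_form = db.execute("""
    SELECT PX.PNO, PX.CITY, COALESCE ( SUM ( SPX.QTY ), 0 ) AS TPQ
    FROM   P AS PX, SP AS SPX
    WHERE  PX.PNO = SPX.PNO
    GROUP  BY PX.PNO
    HAVING COUNT ( * ) <= 2
    ORDER  BY PX.PNO
""").fetchall()
print(subquery_form)
print(group_by_form)
```

P7 drops out of the GROUP BY version because the join with SP produces no row for it, so there is no group for HAVING to test.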


384 Copyright C. J. Date 2008page 383 FURTHER SQL TOPICS :
- Implementation defined vs. implementation dependent
- SELECT *
- Explicit tables
- Dot qualification
- Range variables
- Subqueries
- "Possibly nondeterministic" expressions
- Empty sets
- BNF grammar for SQL table expressions

385 Copyright C. J. Date 2008page 384 THANK YOU FOR LISTENING !!!

386 Copyright C. J. Date 2008page 385 THESIS (stake in the ground) : DBs are not just "data stores" !!! I claim, if you think about the issue at the approp level of abstraction, you're inexorably led to the position: DBs must be relational
All other "models" (inverted lists, IMS-style hierarchies, CODASYL-style networks, objects (= CODASYL warmed over), XML or "semistructured model" (= IMS warmed over), etc., etc.) are simply ad hoc storage structures that have been elevated above their station and will not endure

387 Copyright C. J. Date 2008page 386 JUSTIFICATION : Want to record "true facts": e.g., Joe's salary is 50K... i.e., true propositions Easily encoded as ordered pairs: e.g., (Joe, 50K), a value of type NAME paired with a value of type MONEY But not just arbitrary propositions... Rather, all true instantiations of certain predicates... In the example: x's salary is y, encoded as (x, y), where x is a value of type NAME and y is a value of type MONEY

388 Copyright C. J. Date 2008page 387 JUSTIFICATION (cont.) : In other words, we want to record extension of "x's salary is y", i.e., a set of ordered pairs, i.e., a binary relation!... which we can depict as a table:
values of type NAME   values of type MONEY
Joe                   50K
Amy                   60K
Sue                   45K
Ron                   60K
Actually a function, because each person has just one salary
Subset of cartesian product of set of all names ("type NAME") and set of all money values ("type MONEY"), in that order
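The two claims on this slide (the relation is a subset of a cartesian product, and it happens to be a function) can be sketched in a few lines of Python. The finite sets NAME and MONEY below are hypothetical stand-ins for the types, which in reality are not limited to these few values.

```python
from itertools import product

# Hypothetical finite stand-ins for "type NAME" and "type MONEY".
NAME = {"Joe", "Amy", "Sue", "Ron"}
MONEY = {"45K", "50K", "60K"}

# Extension of the predicate "x's salary is y" as a set of ordered pairs.
salary = {("Joe", "50K"), ("Amy", "60K"), ("Sue", "45K"), ("Ron", "60K")}

# A binary relation is a subset of the cartesian product, in that order.
is_subset = salary <= set(product(NAME, MONEY))

def is_function(pairs):
    """True if no first component is paired with two different second components."""
    firsts = [x for x, _ in pairs]
    return len(firsts) == len(set(firsts))

print(is_subset)                                # True
print(is_function(salary))                      # True: one salary per person
print(is_function(salary | {("Joe", "60K")}))   # False: Joe would have two salaries
```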

389 Copyright C. J. Date 2008page 388 JUSTIFICATION (cont.) : Humble (but very solid) beginnings! But Codd realized: 1. Need n-adic predicates and propositions (not just dyadic); hence n-ary relations (not just binary) and n-tuples (not just pairs), or tuples for short 2. Ordering OK for pairs but soon gets unwieldy for n > 2... So replace attribute ordinal positions by attribute names and (re)define relation concept accordingly 3. Representation obviously not the end of the story... Need operators for deriving further relations from given ("base") ones, for queries etc. (e.g., "Find all persons with salary 60K")... Hence relational calculus (logic) / relational algebra (set theory)

390 Copyright C. J. Date 2008page 389 EXAMPLE REVISITED :
           attribute of type NAME   attribute of type MONEY
heading    PERSON                   SALARY
body       Joe                      50K
           Amy                      60K
           Sue                      45K
           Ron                      60K
No "first" or "second" attribute
Note logical difference between attribute and underlying type
From this point forward relation means a relation in above sense, barring explicit statements to the contrary
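One way to picture a relation whose attributes are identified by name rather than position is sketched below in Python. The helper names tup and restrict are hypothetical, introduced only for this illustration; each tuple is modeled as a set of (attribute name, value) pairs, so there is no "first" or "second" attribute.

```python
def tup(**attrs):
    """Model a tuple as an unordered set of (attribute name, value) pairs."""
    return frozenset(attrs.items())

# The body of the relation; the heading is {PERSON, SALARY}.
PERSON_SALARY = {
    tup(PERSON="Joe", SALARY="50K"),
    tup(PERSON="Amy", SALARY="60K"),
    tup(PERSON="Sue", SALARY="45K"),
    tup(PERSON="Ron", SALARY="60K"),
}

# Attribute order is irrelevant: both spellings denote the same tuple.
order_free = tup(PERSON="Joe", SALARY="50K") == tup(SALARY="50K", PERSON="Joe")

def restrict(relation, predicate):
    """Relational restriction: keep the tuples for which the predicate holds."""
    return {t for t in relation if predicate(dict(t))}

# "Find all persons with salary 60K"
earning_60k = restrict(PERSON_SALARY, lambda t: t["SALARY"] == "60K")
names_60k = {dict(t)["PERSON"] for t in earning_60k}

print(order_free)         # True
print(sorted(names_60k))  # ['Amy', 'Ron']
```

The result of restrict is itself a set of tuples over the same heading, which is the sense in which operators derive further relations from given ones.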

391 Copyright C. J. Date 2008page 390 THE RELATIONAL MODEL DEFINED : 1. An open-ended collection of scalar types (including in particular the type boolean or truth value) 2. A relation type generator and an intended interpretation for relations of types generated thereby 3. Facilities for defining relation variables of such generated relation types 4. A relational assignment operation for assigning relation values to such relation variables 5. An open-ended collection of generic relational operators for deriving relation values from other relation values

392 Copyright C. J. Date 2008page 391 SOME IMPLICATIONS : 1. User defined types and user defined operators 2. Users can specify individual relation types 3. Relvars are the only variables allowed inside an RDB in accordance with Codd's Information Principle: Entire information content of the DB is represented in one and only one way: as explicit attribute values within tuples within relations 4. INSERT / DELETE / UPDATE just shorthand 5. System defined operators (plus user-defined ones?) used for many purposes, including constraints in particular

393 Copyright C. J. Date 2008page 392 WHAT REMAINS TO BE DONE ??? Proper implementation; The Third Manifesto; The TransRelational™ Model; Further foundation issues, e.g.: Constraint inference; Database design; "Missing information"; Etc.

394 Copyright C. J. Date 2008page 393 Higher level abstractions: PACK and UNPACK; "U_" ops, keys, etc.; Etc. Higher level interfaces: Propositions; Data mining, decision support, etc. What about SQL ???

395 Copyright C. J. Date 2008page 394 STRUCTURE OF PRESENTATION : 1. Setting the scene 2. Types and domains 3. Tuples and relations, rows and tables 4. No duplicates, no nulls 5. Base relvars, base tables 6. SQL and algebra I: The original operators 7. SQL and algebra II: Additional operators 8. SQL and constraints 9. SQL and views 10. SQL and logic I: Relational calculus 11. SQL and logic II: Using logic to write SQL 12. Further SQL topics 13. Appendix: The relational model 14. Appendix: DB design

396 Copyright C. J. Date 2008page 395 SOME REMARKS ON DATABASE DESIGN : DB design theory is not part of RM as such; rather, it builds on RM Obviously true for physical design! But true of logical design too, to some extent Concepts such as further normalization on which design theory is based are themselves based on more fundamental concepts that are part of RM So I'll be brief... Quick look at: Normalization; Denormalization; Orthogonality

397 Copyright C. J. Date 2008page 396 FUNCTIONAL DEPENDENCIES : "Everyone knows" that 2NF, 3NF, BCNF all depend on functional dependencies (FDs) Let A and B be subsets of the heading of R; then R satisfies the FD A → B iff, whenever two tuples of R agree on A, they also agree on B E.g., given EMP { ENO, SALARY, DNO, MNO } : { DNO } → { MNO }
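The definition translates almost word for word into a check over a given relation value. The sketch below is in Python; the function name satisfies_fd and the EMP rows are hypothetical, chosen only to exercise the definition. Note that it tests one value of EMP, whereas the FD itself is a statement about every legal value of the relvar.

```python
def satisfies_fd(rows, lhs, rhs):
    """True if any two rows that agree on lhs also agree on rhs."""
    seen = {}
    for row in rows:
        a = tuple(row[attr] for attr in lhs)
        b = tuple(row[attr] for attr in rhs)
        # Remember the first rhs value seen for each lhs value; any later
        # row with the same lhs value must carry the same rhs value.
        if seen.setdefault(a, b) != b:
            return False
    return True

# Hypothetical sample value for EMP { ENO, SALARY, DNO, MNO }.
EMP = [
    {"ENO": "E1", "SALARY": 50, "DNO": "D1", "MNO": "M1"},
    {"ENO": "E2", "SALARY": 60, "DNO": "D1", "MNO": "M1"},
    {"ENO": "E3", "SALARY": 45, "DNO": "D2", "MNO": "M2"},
]

holds = satisfies_fd(EMP, ["DNO"], ["MNO"])

# Adding an employee in D1 with a different manager violates { DNO } → { MNO }.
violating = EMP + [{"ENO": "E4", "SALARY": 55, "DNO": "D1", "MNO": "M2"}]
broken = satisfies_fd(violating, ["DNO"], ["MNO"])

print(holds)   # True
print(broken)  # False
```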

398 Copyright C. J. Date 2008page 397 Reminder: If SK is a superkey for R and A is any subset of the heading of R, then R satisfies SK → A The fact that a given FD holds for R is a relvar constraint on R (of course): e.g., for EMP as on previous page, CONSTRAINT FDX COUNT ( EMP { DNO } ) = COUNT ( EMP { DNO, MNO } ) ; Likewise for multi-valued dependencies (MVDs), which are relevant to "4NF", and join dependencies (JDs), which are relevant to "5NF" (CONSTRAINT formulations left as an exercise)
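Constraint FDX compares the cardinalities of two projections. A rough Python rendering, with hypothetical helper names and hypothetical EMP rows, might look like this; because projections are sets, duplicates vanish, which is what makes the count comparison work.

```python
def project(rows, attrs):
    """Relational projection: a set of tuples, so duplicates are eliminated."""
    return {tuple(row[a] for a in attrs) for row in rows}

def fdx_holds(emp):
    """COUNT ( EMP { DNO } ) = COUNT ( EMP { DNO, MNO } )"""
    return len(project(emp, ["DNO"])) == len(project(emp, ["DNO", "MNO"]))

# Hypothetical sample rows (SALARY omitted, since FDX does not mention it).
emp_ok = [
    {"ENO": "E1", "DNO": "D1", "MNO": "M1"},
    {"ENO": "E2", "DNO": "D1", "MNO": "M1"},
    {"ENO": "E3", "DNO": "D2", "MNO": "M2"},
]
# Same rows plus a second manager for department D1.
emp_bad = emp_ok + [{"ENO": "E4", "DNO": "D1", "MNO": "M2"}]

print(fdx_holds(emp_ok))   # True:  2 departments, 2 (DNO, MNO) pairs
print(fdx_holds(emp_bad))  # False: 2 departments, 3 (DNO, MNO) pairs
```

If some department had two managers, the projection on { DNO, MNO } would contain more tuples than the projection on { DNO }, and the two counts would differ.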

399 Copyright C. J. Date 2008page 398 NORMAL FORMS : 1NF : All relvars are in 1NF, even with relation-valued attributes (RVAs), though RVAs usually contraindicated 2NF, 3NF : Mainly historical interest BCNF : R is in BCNF iff for every nontrivial FD X → A satisfied by R, X is a superkey Loosely: Every fact is a fact about the key, the whole key, and nothing but the key (The FD A → B is trivial iff it can't possibly be violated, i.e., iff B ⊆ A) 4NF : Mainly historical interest
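The BCNF definition can be tested mechanically once the FDs are written down. The Python sketch below uses the standard attribute-closure computation to decide whether a determinant is a superkey. The function names are hypothetical, and the check assumes the supplied list covers every FD that holds in the relvar; it inspects only the listed FDs.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(heading, fds):
    """Listed FDs that are nontrivial and whose determinant is not a superkey."""
    heading = set(heading)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)               # nontrivial: rhs is not a subset of lhs
        and not heading <= closure(lhs, fds)      # lhs does not determine the whole heading
    ]

# Suppliers, with an FD { CITY } → { STATUS } assumed to hold for illustration.
S_HEADING = {"SNO", "SNAME", "STATUS", "CITY"}
S_FDS = [({"SNO"}, {"SNAME", "STATUS", "CITY"}), ({"CITY"}, {"STATUS"})]

# The two projections obtained by splitting on that FD.
SNC_FDS = [({"SNO"}, {"SNAME", "CITY"})]
CS_FDS = [({"CITY"}, {"STATUS"})]

print(bcnf_violations(S_HEADING, S_FDS))                   # the CITY → STATUS FD
print(bcnf_violations({"SNO", "SNAME", "CITY"}, SNC_FDS))  # []
print(bcnf_violations({"CITY", "STATUS"}, CS_FDS))         # []
```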

400 Copyright C. J. Date 2008page 399 JOIN DEPENDENCIES : Let R be a relvar, and let A, B,..., C be subsets of the heading of R. Then R satisfies the join dependency * { A, B, …, C } if and only if every legal value of R is equal to the join of its projections on A, B,..., C (i.e., if and only if R can be nonloss decomposed into those projections) E.g.: Relvar S satisfies JD * { SN, SS, SC } where SN = { SNO, SNAME }, etc. Note: UNION { A, B, …, C } must equal heading Every MVD is a JD, every FD is an MVD

401 Copyright C. J. Date 2008page 400 EVERY FD IS A JD (example) : Suppose relvar S satisfies additional FD: { CITY } → { STATUS } /* see next page */ Then S can be nonloss decomposed into projections on: { SNO, SNAME, CITY } and { CITY, STATUS } In other words, S satisfies following JD: * { SNC, CS } where SNC = { SNO, SNAME, CITY } and CS = { CITY, STATUS }

402 Copyright C. J. Date 2008page 401 SAMPLE VALUE OF RELVAR S SATISFYING { CITY } → { STATUS } :
S
SNO  SNAME  STATUS  CITY
S1   Smith  20      London
S2   Jones  30      Paris    <-- note the change
S3   Blake  30      Paris
S4   Clark  20      London
S5   Adams  30      Athens

403 Copyright C. J. Date 2008page 402 NONLOSS DECOMPOSE :
SNC
SNO  SNAME  CITY
S1   Smith  London
S2   Jones  Paris
S3   Blake  Paris
S4   Clark  London
S5   Adams  Athens
CS
CITY    STATUS
Athens  30
London  20
Paris   30
S = SNC JOIN CS... In other words, S satisfies * { { SNO, SNAME, CITY }, { CITY, STATUS } }
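The nonloss claim can be verified directly on the sample value: project, join back, and compare. The Python sketch below uses positional tuples (SNO, SNAME, STATUS, CITY) purely for brevity. The contrasting value S_VIOLATING, in which S2 has status 10, is included as an assumption to show what goes wrong when { CITY } → { STATUS } does not hold.

```python
# Sample value of S from the slides, as (SNO, SNAME, STATUS, CITY).
S = {
    ("S1", "Smith", 20, "London"),
    ("S2", "Jones", 30, "Paris"),
    ("S3", "Blake", 30, "Paris"),
    ("S4", "Clark", 20, "London"),
    ("S5", "Adams", 30, "Athens"),
}

def join_of_projections(s):
    """Project s on {SNO, SNAME, CITY} and {CITY, STATUS}, then join on CITY."""
    snc = {(sno, sname, city) for sno, sname, status, city in s}
    cs = {(city, status) for sno, sname, status, city in s}
    return {
        (sno, sname, status, city)
        for sno, sname, city in snc
        for cs_city, status in cs
        if cs_city == city
    }

nonloss = join_of_projections(S) == S

# Contrast: a value violating { CITY } → { STATUS } (Paris has two statuses).
S_VIOLATING = (S - {("S2", "Jones", 30, "Paris")}) | {("S2", "Jones", 10, "Paris")}
rejoined = join_of_projections(S_VIOLATING)

print(nonloss)                              # True
print(len(S_VIOLATING), len(rejoined))      # 5 7: the join yields spurious tuples
```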

404 Copyright C. J. Date 2008page 403 NORMAL FORMS (cont.) : 5NF : The "final" normal form!* R is in 5NF iff, for every nontrivial JD * {A,B,...,C} satisfied by R, each of A,B,...,C is a superkey [and keys can be ordered such that each adjacent pair is included in at least one of A, B,..., C] The JD * {A,B,...,C} is trivial iff at least one of A, B,..., C = heading Theorem (Date & Fagin 1991): 3NF and no composite keys implies 5NF And another: BCNF and not all key implies 5NF * Well.... except for 6NF

405 Copyright C. J. Date 2008page 404 NORMAL FORMS (cont.) : 6NF : The true final normal form. R is in 6NF iff the only JDs it satisfies are trivial ones E.g., SP (but not S or P) R is in 6NF iff the only JDs it satisfies are of the form * {...,{H},...}, where {H} is the heading R is in 6NF iff it's in 5NF, is of degree n, and has no key of degree less than n-1 6NF implies 5NF E.g., PLUS{A,B,C} : 6NF (every key is of degree two) Note: 6NF has extended defn in temporal DB context

406 Copyright C. J. Date 2008page 405 OBJECTIVES OF NORMALIZATION : Reduce redundancy Avoid update anomalies "Better" representation of semantics Easier enforcement of constraints (normalization to 5NF gives us a simple way of enforcing certain important and commonly occurring constraints) Only need to enforce KEY UNIQUENESS All JDs (and so all MVDs and all FDs) will then be enforced automatically

407 Copyright C. J. Date 2008page 406 SOME REASONS WHY NORMALIZATION IS NOT A PANACEA : Enforces certain constraints very simply, but JDs etc. are not the ONLY kind of constraint Decomposition is not unique, in general Not all redundancies can be removed by taking projections BCNF and "dependency preservation" objectives can conflict In fact, normalization can cause some FDs (etc.) to cease to be FDs (etc.), since they now span relvars! Some design issues are simply not addressed Nevertheless... DENORMALIZE ONLY AS A LAST RESORT !!!

408 Copyright C. J. Date 2008page 407 DENORMALIZATION CONSIDERED HARMFUL : Almost always, anything less than full normalization is strongly contraindicated, even in a "direct image" implementation !!! /* big topic in its own right */ Fully normalized design is a "good" representation of the real world: intuitively easy to understand, good base for future growth Everyone knows denormalization makes update harder... but it can make retrieval harder too (see next page) Can be bad for performance as well! Usually means improving the performance of one application at the expense of others

409 Copyright C. J. Date 2008page 408 DENORMALIZATION BAD FOR RETRIEVAL (example) : Again suppose suppliers satisfy { CITY } → { STATUS }:
S
SNO  SNAME  STATUS  CITY
S1   Smith  20      London
S2   Jones  30      Paris    <-- note the change
S3   Blake  30      Paris
S4   Clark  20      London
S5   Adams  30      Athens
Can be regarded as denormalization of SNC and CS /* see earlier */
"Find average city status" (i.e., 26.667)

410 Copyright C. J. Date 2008page 409
SELECT DISTINCT AVG (STATUS) AS REQD FROM S
   result (incorrect): 26
SELECT DISTINCT AVG (DISTINCT STATUS) AS REQD FROM S
   result (incorrect): 25
SELECT DISTINCT CITY, AVG (STATUS) AS REQD FROM S GROUP BY CITY
   gives avg status per city, not overall avg
SELECT DISTINCT CITY, AVG (AVG (STATUS)) AS REQD FROM S GROUP BY CITY
   syntax error
SELECT DISTINCT AVG (STATUS) AS REQD FROM ( SELECT DISTINCT CITY, STATUS FROM S ) AS POINTLESS
   correct (at last!)... but is it supported?
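Three of these formulations can be run against the sample value of S to see the different answers. The sketch below uses Python's sqlite3 module; the column types in the CREATE TABLE statement are assumptions, and the nested-aggregate query is left out. SQLite does accept the final formulation, though that settles nothing about other products.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S (SNO TEXT PRIMARY KEY, SNAME TEXT NOT NULL,
                    STATUS INTEGER NOT NULL, CITY TEXT NOT NULL);
    INSERT INTO S VALUES
        ('S1', 'Smith', 20, 'London'),
        ('S2', 'Jones', 30, 'Paris'),
        ('S3', 'Blake', 30, 'Paris'),
        ('S4', 'Clark', 20, 'London'),
        ('S5', 'Adams', 30, 'Athens');
""")

def scalar(sql):
    """Run a query that yields one row with one column and return that value."""
    return conn.execute(sql).fetchone()[0]

# Averages over suppliers, so London and Paris are each counted twice.
per_supplier = scalar("SELECT DISTINCT AVG(STATUS) AS REQD FROM S")

# Averages over distinct status values, so Paris and Athens collapse into one 30.
per_status = scalar("SELECT DISTINCT AVG(DISTINCT STATUS) AS REQD FROM S")

# Averages over distinct (CITY, STATUS) pairs: one status per city.
per_city = scalar("""
    SELECT DISTINCT AVG(STATUS) AS REQD
    FROM (SELECT DISTINCT CITY, STATUS FROM S) AS POINTLESS
""")

print(per_supplier)        # 26.0
print(per_status)          # 25.0
print(round(per_city, 3))  # 26.667
```

Only the last query averages one status per city: (30 + 20 + 30) / 3.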

411 Copyright C. J. Date 2008page 410 ORTHOGONALITY (a little more science!) : Design theory is about reducing redundancy (true fact!) but what's redundancy ??? Well, certainly: If DB is such that if tuple t appears at all it must appear more than once, then DB clearly involves some redundancy Note that normalization is precisely about eliminating redundant appearances of the same tuple! E.g., suppose once again that suppliers satisfy FD { CITY } → { STATUS }

412 Copyright C. J. Date 2008page 411
S
SNO  SNAME  STATUS  CITY
S1   Smith  20      London
S2   Jones  30      Paris    <-- note the change
S3   Blake  30      Paris
S4   Clark  20      London
S5   Adams  30      Athens
(Sub)tuples (CITY London, STATUS 20) and (CITY Paris, STATUS 30) both appear twice (and do represent redundancy)... /* recall that every subset of a tuple is a tuple */ So normalize

413 Copyright C. J. Date 2008page 412
SNC
SNO  SNAME  CITY
S1   Smith  London
S2   Jones  Paris
S3   Blake  Paris
S4   Clark  London
S5   Adams  Athens
CS
CITY    STATUS
Athens  30
London  20
Paris   30
Now (CITY London, STATUS 20) and (CITY Paris, STATUS 30) both appear just once

414 Copyright C. J. Date 2008page 413 BUT WHAT ABOUT :
LP   /* part weight <= 17.0 pounds */
P#  PNAME  COLOR  WEIGHT  CITY
P1  Nut    Red    12.0    London
P2  Bolt   Green  17.0    Paris
P3  Screw  Blue   17.0    Oslo
P4  Screw  Red    14.0    London
P5  Cam    Blue   12.0    Paris
HP   /* part weight >= 17.0 pounds */
P#  PNAME  COLOR  WEIGHT  CITY
P2  Bolt   Green  17.0    Paris
P3  Screw  Blue   17.0    Oslo
P6  Cog    Red    19.0    London

415 Copyright C. J. Date 2008page 414 Normalization doesn't help … but problem is easy to see! Relvar predicates for LP and HP "overlap" I.e., they require tuples for parts with weight 17.0 pounds to appear in both relvars:
CONSTRAINT LP_AND_HP_OVERLAP
   ( LP WHERE WEIGHT = 17.0 ) = ( HP WHERE WEIGHT = 17.0 ) ;
So:
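The overlap constraint is easy to evaluate against the sample values of LP and HP. The Python sketch below represents each part as a positional tuple (P#, PNAME, COLOR, WEIGHT, CITY); the helper name where_weight is hypothetical.

```python
# Sample values from the slides, as (P#, PNAME, COLOR, WEIGHT, CITY).
LP = {
    ("P1", "Nut", "Red", 12.0, "London"),
    ("P2", "Bolt", "Green", 17.0, "Paris"),
    ("P3", "Screw", "Blue", 17.0, "Oslo"),
    ("P4", "Screw", "Red", 14.0, "London"),
    ("P5", "Cam", "Blue", 12.0, "Paris"),
}
HP = {
    ("P2", "Bolt", "Green", 17.0, "Paris"),
    ("P3", "Screw", "Blue", 17.0, "Oslo"),
    ("P6", "Cog", "Red", 19.0, "London"),
}

def where_weight(relation, weight):
    """Restriction: the tuples whose WEIGHT attribute equals the given value."""
    return {t for t in relation if t[3] == weight}

# ( LP WHERE WEIGHT = 17.0 ) = ( HP WHERE WEIGHT = 17.0 )
overlap_constraint_holds = where_weight(LP, 17.0) == where_weight(HP, 17.0)

# Tuples recorded in both relvars: this is the redundancy.
stored_twice = LP & HP

print(overlap_constraint_holds)              # True
print(sorted(t[0] for t in stored_twice))    # ['P2', 'P3']
```

Every tuple in stored_twice is recorded in both relvars, which is redundancy that taking projections cannot remove.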

416 Copyright C. J. Date 2008page 415 THE PRINCIPLE OF ORTHOGONAL DESIGN : First version: No two base relvars should be such that their relvar constraints might require the same tuple to appear in both (McGoveran & Date 1994, but somewhat revised here) Solves the LP / HP problem Remember that (as far as the user is concerned) all relvars in the DB are base relvars! Orthogonality principle as stated applies to relvars of the same type … But what about:

417 Copyright C. J. Date 2008page 416
SX
SNO  SNAME  STATUS
S1   Smith  20
S2   Jones  10
S3   Blake  30
S4   Clark  20
S5   Adams  30
SY
SNO  SNAME  CITY
S1   Smith  London
S2   Jones  Paris
S3   Blake  Paris
S4   Clark  London
S5   Adams  Athens
Second version: Let A and B be distinct relvars. Then there should not exist nonloss decompositions of A and B into projections A1, …, Am and B1, …, Bn, respectively, such that the relvar constraints for some Ai and some Bj might require the same tuple to appear in both. Subsumes first version … But what about:

418 Copyright C. J. Date 2008page 417
SX
SNO  SNAME  STATUS
S1   Smith  20
S2   Jones  10
S3   Blake  30
S4   Clark  20
S5   Adams  30
SY
ID   LABEL  CITY
S1   Smith  London
S2   Jones  Paris
S3   Blake  Paris
S4   Clark  London
S5   Adams  Athens
Oh, all right...

419 Copyright C. J. Date 2008page 418 THE PRINCIPLE OF ORTHOGONAL DESIGN (final version) : Let A and B be distinct relvars. Replace A and B by nonloss decompositions into projections A1, …, Am and B1, …, Bn, respectively, such that every Ai (i = 1, …, m) and every Bj (j = 1, …, n) is in 6NF. Let some i and j be such that there exists a sequence of zero or more attribute renamings with the property that (a) when applied to Ai, it produces Ak, and (b) Ak and Bj are of the same type. Then there must not exist a constraint to the effect that, at all times, (Ak WHERE ax) = (Bj WHERE bx), where ax and bx are restriction conditions, neither of which is a contradiction. Subsumes second version