
© D. Wong 2002 © D. Wong 2003 1 CS610 / CS710 Database Systems I Daisy Wong.


Week 1

Objectives
• Modeling and design of databases
• Programming: queries and DB operations

Database
• A large collection of data that:
  – is stored in mass storage
  – exists over a long period of time
  – can take on a variety of appearances depending on the requirements at the time
  – can serve as the data source for a variety of applications

Database Examples
• Examples:
  – Wal-Mart: records every item purchased in every store
  – L. L. Bean: records detailed information about each customer and their purchases
  – Hospitals: record patient demographics, conditions and progress, test results, etc.
  – …
• In this course, "database" refers to a collection of data managed by a Database Management System (DBMS)

DBMS
• A combination of software, data, and the structure of the data that supports:
  – users creating databases and specifying their schemas
  – users querying and modifying the data
  – storing very large amounts of data, securing the data against accidents or unauthorized use, and allowing efficient access
  – controlling concurrent access from many users, presenting correct data to each user, and preventing accidental corruption of the data from simultaneous accesses

DBMS lingo
• Schema: the logical structure of the data, specified with a Data Definition Language (DDL)
• Query: a question about the data, expressed in a Data Manipulation Language (DML)
• Transaction (or atomic transaction): a logical unit of work that must be completed as a whole or not at all

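Three levels of database schemas
[Figure: the external level (External View 1, External View 2, External View 3) sits above the logical level (logical schema), which sits above the internal level (internal schema, disk). Logical-to-external mappings link the logical schema to the external views. Internal-to-logical mappings link the internal schema to the logical schema.]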

DBMS User Roles
• End users: access the database for information to do their jobs. Casual users work through simple user interfaces; sophisticated users use the DML.
• Database designers: specify the information content (using the DDL) to create database systems
• Application developers: design and develop applications that extend the functionality of the DBMS, e.g. user interfaces, data analysis and data mining, various business services
• Database administrators (DBAs): administer databases by controlling access, maintaining data accuracy and integrity, and monitoring and improving database performance

A little history
• First attempt: file systems
• Hierarchical model (tree based)
• Network model (graph based)
• Relational model
  – Proposed by E. F. Codd (1970)
  – Data should be presented to the user as tables (relations)
  – Queries are expressed in a very high-level language
  – SQL (Structured Query Language): the most important language based on the relational model

Relational Model
• A conceptual model that represents data as relations
• Relations: tables of data
• Query using SQL: SELECT balance FROM Accounts WHERE accountNo = 34567;
• The relational DBMS finds an efficient way to answer the query

Accounts
accountNo | name  | balance
12345     | Sally | 1000.21
34567     | Sue   | 285.48
…         | …     | …
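The query above can be tried against a throwaway in-memory SQLite database from Python's standard library. This is only a sketch. SQLite, the column types, and the `conn`/`row` names are assumptions made for illustration. The two rows come from the Accounts table on this slide.

```python
import sqlite3

# Throwaway in-memory database (SQLite is an assumption; any relational DBMS would do).
conn = sqlite3.connect(":memory:")

# DDL: declare the schema of the Accounts relation (column types are assumed).
conn.execute(
    "CREATE TABLE Accounts (accountNo INTEGER PRIMARY KEY, name TEXT, balance REAL)"
)

# Load the two rows shown on the slide.
conn.executemany(
    "INSERT INTO Accounts VALUES (?, ?, ?)",
    [(12345, "Sally", 1000.21), (34567, "Sue", 285.48)],
)

# DML: the declarative query from the slide. It states *what* is wanted;
# the DBMS decides *how* to find it.
row = conn.execute("SELECT balance FROM Accounts WHERE accountNo = 34567;").fetchone()
print(row)  # (285.48,)
```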

Major components of a DBMS (simplified; Figure 1.1)
[Figure: the DBA issues DDL commands to the DDL compiler. Users/applications issue queries/updates to the query manager and transaction commands to the transaction manager. The buffer/file storage manager accesses the stored data and metadata. The diagram is layered into external, logical, and internal levels.]

Figure 1.1: Database management system components (Ref. FCDS, 2nd ed., by Ullman)

Storage Manager
• Obtains the requested information from data storage
• Modifies the information if requested and stores it again
• Uses indexes: data structures that help find data items quickly given part of their value. Advanced data structures such as B-trees are used for efficient access.
• Indexes are part of the data; their description is part of the metadata
• Consists of 2 components, the file manager and the buffer manager:
  1. The file manager keeps track of the location of files on disk and obtains the blocks containing the requested data.
  2. The buffer manager handles main memory. It manages the memory blocks, obtains disk blocks from the file manager, and tries to optimize access to data.

Query Manager
• Parses and optimizes the query using a query compiler
• Executes the resulting query plan (a sequence of actions for the DBMS to perform)
  – Issues a sequence of requests to the storage manager for small pieces of data
• Returns the result to the requester

Transaction Manager
• Ensures that all transactions are executed properly
• ACID properties of "proper" execution:
  – Atomicity: either all of the updates of a transaction succeed, or none take place
  – Consistency: each transaction should leave the database in a consistent state
  – Isolation: each transaction, when executed concurrently with other transactions, should have the same effect as if it had been executed by itself
  – Durability: once a transaction has completed successfully, its changes to the database should be permanent; even serious failures should not affect the permanence of a transaction

Techniques to enforce ACID
• Locking: the granularity of locks is important
• Logging: write a log to nonvolatile storage to ensure durability
• Transaction commitment: for durability and atomicity, transactions are computed "tentatively" and recorded, but no changes are made to the database until the transaction commits. Changes are copied first to the log and then to the database.
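Atomicity can be observed with SQLite through Python's standard `sqlite3` module. This is a minimal sketch that assumes a `CHECK (balance >= 0)` rule, which is not in the slides, and uses a hypothetical `transfer` helper. The credit to Sue succeeds, but the debit from Sally violates the constraint. The rollback then undoes the credit as well, so neither update survives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rule for illustration: a CHECK constraint forbids negative balances.
conn.execute(
    "CREATE TABLE Accounts ("
    "accountNo INTEGER PRIMARY KEY, name TEXT, "
    "balance REAL CHECK (balance >= 0))"
)
with conn:  # commit the initial load
    conn.executemany(
        "INSERT INTO Accounts VALUES (?, ?, ?)",
        [(12345, "Sally", 1000.21), (34567, "Sue", 285.48)],
    )


def transfer(conn, src, dst, amount):
    """Credit dst and debit src as one transaction; return True if committed."""
    try:
        # Used as a context manager, the connection commits if the block
        # finishes normally and rolls back if an exception escapes it.
        with conn:
            conn.execute(
                "UPDATE Accounts SET balance = balance + ? WHERE accountNo = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE Accounts SET balance = balance - ? WHERE accountNo = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:  # the CHECK constraint rejected the debit
        return False


ok = transfer(conn, 12345, 34567, 2000.00)  # would overdraw Sally
balances = dict(conn.execute("SELECT name, balance FROM Accounts ORDER BY accountNo"))
print(ok, balances)  # False {'Sally': 1000.21, 'Sue': 285.48}
```

The same helper commits normally when the constraint holds. For example, `transfer(conn, 12345, 34567, 100.0)` returns `True` and both balances change together.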

Trends
• Object-oriented DB
  – Richer data types
  – Classes and class hierarchies enable sharing and reuse of software and schemas
  – Protection against misuse through abstract data types
• Object-relational DB
• Handling of constraints and triggers
• Multimedia data
• Data integration to support advanced data analysis such as data mining, e.g. data warehouses and data marts
• Multi-tier client-server architecture, moving more processing to the client
• Parallel processing
• Support for Web sites

Knowledge Discovery in Databases (KDD)
[Figure: Data → Selection → Target Data → Preprocessing → Preprocessed Data → Transformation → Transformed Data → Data Mining → Patterns → Interpretation / Evaluation → Knowledge]
Reference: Fayyad; Smyth: "From Data Mining to Knowledge Discovery: An Overview" 1996

Ch. 2 Entity-Relationship Data Model
• Data models
• Entity-Relationship diagrams
• Design principles
• Modeling of constraints
• Weak entity sets

Data Modeling
• Used for conceptual database design
[Figure: design paths from Ideas, through ODL or E/R, to either a relational schema (implemented on a relational DBMS) or classes/objects (implemented on an object-oriented DBMS).]

Data Models

                           | Value-oriented                 | Object-oriented
Examples                   | Relational, Logic              | Object-oriented, Network, Hierarchical
Objects distinguished by   | Data values                    | Object identity
Redundancy handling        | Reduced by design capabilities | Use pointers to objects
Many-to-many relationships | Handled the same               | Only binary relationships

Relational Model
• Based on mathematically defined relations of entities
• Consists of:
  – Attributes (fields)
  – Domains: the legitimate values of attributes (data ranges)
  – Views of data presented in table format
  – New views created by projections of the database
  – Records that are n-tuples, where n = # of attributes

Entity-Relationship (E/R) Diagrams
• Represent the schema of a database
• Useful for designing the conceptual model
• Entity set: a class of objects
• Entity: a member of an entity set (data)
• Attributes: properties of the entities in an entity set
• Relationship: describes how entities relate to each other

Entity-Relationship Diagrams
[Figure: symbols for entity sets, attributes, and relationships; cardinality notation for one-one, many-one (m), and many-many (m, n) relationships.]

Entity-Relationship Diagrams (continued)
• Entity
  – Has a set of attributes
  – The key is underlined
• Key
  – Attributes that uniquely identify an entity in an entity set
  – An inherent property of the data (e.g. movie title and year)
  – Serves as a constraint
[Figure: example key attribute SS#, underlined.]

Week 2

Ch. 2 Entity-Relationship Data Model (continued)
• Data models
• Entity-Relationship diagrams
• Design principles
• Modeling of constraints
• Weak entity sets

Entity-Relationship Diagrams
[Figure: symbols for entity sets, attributes, and relationships; cardinality (multiplicity) notation for one-one, many-one (m), and many-many (m, n) relationships.]

Entity-Relationship Diagrams (continued 1)
• Entity
  – Has a set of attributes
  – The key is underlined
• Key
  – Attributes that uniquely identify an entity in an entity set
  – An inherent property of the data (e.g. movie title and year)
  – Serves as a constraint
[Figure: example key attribute SS#, underlined.]

Entity-Relationship Diagrams (continued 2)
• Multiway relationships
• Relationships with roles
• Attributes on relationships
  – E.g. the salary of a star for a movie
  – Not strictly necessary: the same information can be modeled by adding a new entity set whose entities have these attributes

E/R Diagrams: Multiway relationships
• Can be converted to a collection of binary, many-one relationships:
  – Introduce a new entity set (connecting entity set) whose entities are the tuples of the relationship set for the multiway relationship
  – Introduce many-one relationships from the connecting entity set to each of the entity sets that provide components of tuples in the original multiway relationship

E/R Diagrams: Inheritance
• Subclasses: special-case entity sets with their own special attributes and/or relationships
• isa relationship: a one-one relationship that connects an entity set to its subclasses, used to handle inheritance
  – E.g. every Cartoon is a Movie
• E/R symbol for isa: [Figure: isa symbol linking superclass and subclass (arrows not drawn)]

Design Principles
• Faithfulness: reflect the reality of the data and the problem with the appropriate:
  – attributes
  – data ranges
  – relationships
  – cardinality
• E.g. [Figure: Stars and Movies connected by the Stars-in relationship]

Design Principles (2)
• Avoid redundancy
  – Redundancy: duplicate representation of data, which leads to:
    • inconsistency, violating data integrity
    • wasted space
• E.g. [Figure: Studios and Movies connected by the Owns relationship, with Studio_name also shown as an attribute of Movies]

Design Principles (3)
• Keep it simple: avoid more elements than needed
• Pick the right kind of element
  – attribute or entity set?
  – what should the relationships be?

DB design
• Centers on the entity sets and relationships making up the database
• Users and designers must work together to:
  – specify the entity sets and relationships and their meanings
  – specify the attributes and their meanings
  – specify the constraints
  – identify the key attributes of entity sets
  – specify how the data will be viewed
  – specify who can view and modify what, and when
• It is better to do a good job in the original design
  – This needs collaboration between users and designers
• Reminder: changing the database design later is usually very costly

Constraints
• Constraints are part of the schema, not of an instance of the database
• Classification of constraints:
  – Keys
  – Single-value constraints
  – Referential integrity constraints
  – Domain constraints
  – General constraints

Keys in the E/R Model
• A key for an entity set E is a set K of one or more attributes such that, given any two distinct entities $e_1$ and $e_2$ in E, $e_1$ and $e_2$ cannot have identical values for each of the attributes in K. (Ref. 2.3.2)
  – E.g. Movie(title, year, length, filmType) with E1 = (titanic, 1960, 120, color) and E2 = (titanic, 1997, 180, color). Here title alone cannot be the key, but {title, year} can.
  – A key can consist of more than one attribute
  – There can be more than one possible key (candidate keys), but one is designated as the "primary key"
  – Attributes that constitute the primary key cannot be null
  – If the entity set is in an isa-hierarchy, the root entity set must have all the attributes needed for a key
  – In an E/R diagram, the attributes of a key are underlined
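A key is a constraint on the schema, so a single instance can only refute a candidate key, never prove one. The hypothetical helper `violates_key` below is a sketch of that refutation test on the slide's E1/E2 example. It looks for two tuples that agree on every attribute of the proposed key.

```python
def violates_key(tuples, schema, key):
    """Return True if two tuples in this instance agree on every attribute of key."""
    positions = [schema.index(attr) for attr in key]
    seen = set()
    for t in tuples:
        key_value = tuple(t[i] for i in positions)
        if key_value in seen:
            return True  # two distinct entities share the same key value
        seen.add(key_value)
    return False


schema = ["title", "year", "length", "filmType"]
movies = [
    ("titanic", 1960, 120, "color"),  # E1 from the slide
    ("titanic", 1997, 180, "color"),  # E2 from the slide
]

print(violates_key(movies, schema, ["title"]))          # True
print(violates_key(movies, schema, ["title", "year"]))  # False
```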

Single-Value Constraints
• Assertion: at most one value exists in a given context
• Ways to express single-value constraints in E/R:
  – Each attribute of an entity set has a single value
    • Some attributes may allow null
    • Key attributes should not be null
    • There is no formal representation in the E/R diagram; make a notation beside the attribute
  – Many-one relationship [Figure: many-one relationship R from E to F]

Referential Integrity
• Assertion: exactly one value exists in a given role
• Enforced at database implementation by either:
  – forbidding the deletion of a referenced entity, or
  – requiring that if a referenced entity is deleted, all entities that reference it are deleted as well
• Use a rounded arrow in the diagram [Figure: Movies connected to Studios by Owned by, with a rounded arrow]

Other kinds of constraints
• Domain constraints: restrict the value of an attribute to a limited set
  – No formal notation in E/R; place a notation next to the attribute
  – E.g. declaring the type of an attribute
  – E.g. specifying a range for the attribute value
• General constraints
  – E.g. the degree of a relationship [Figure: Movies and Stars connected by Stars-in, annotated "<= 10"]

Weak Entity Sets
• Definition: an entity set whose key is composed of attributes, some or all of which belong to another entity set
• Causes:
  1. Entity sets in a hierarchy based on classification
  2. Connecting entity sets introduced to eliminate a multiway relationship
[Figure: weak entity set Crew (attribute number) linked by the Unit-of relationship to Studios (attributes name, addr)]

Weak Entity Sets Requirements
1. The key consists of zero or more of its own attributes, and
2. key attributes from entity sets that are reached by supporting (many-one) relationships
3. If R is a supporting relationship for a weak entity set E to some entity set F, then:
  – R must be a binary, many-one relationship from E to F
  – R must have referential integrity from E to F
  – The attributes that F supplies for the key of E must be key attributes of F
  – If F is weak, then the key attributes of F supplied to E may be attributes of some entity set to which F is connected by a supporting relationship (recursive)
  – If there are multiple supporting relationships from E to F, each relationship supplies its own copy of the key attributes of F

Week 3

Ch. 3 (part 1)
• Relational model basics
• From E/R diagrams to relations

Relational Model
• Based on mathematically defined relations of entities
• Consists of:
  – Attributes (fields)
  – Domains: the legitimate values of attributes (data ranges)
  – Views of data presented in table format
  – New views created by projections of the database
  – Records that are n-tuples, where n = # of attributes (arity)
• Relational algebra operations are used to compute data
• The results of the operations are also relations

Relations
• Sets of tuples presented in a table (the order of tuples is immaterial)
• Attributes are the column headers
• Schema of a relation: its name and its set of attributes, e.g. Movies(title, year, length, filmType)
• Database schema: the set of schemas for the relations in a design
• Domain: the elementary data type of each attribute
• Relation instance, e.g.:

Movies
title         | year | length | filmType
Star Wars     | 1977 | 124    | color
Mighty Ducks  | 1991 | 104    | color
Wayne's World | 1992 | 95     | color

E/R Diagram to Relational Designs
1. Turn each entity set into a relation with the same set of attributes, e.g. Star(name, address)
2. Represent relationships by relations
3. Combine relations when appropriate
4. Handle weak entity sets
5. Convert subclass structures to relations

E/R Relationships to Relations
Replace a relationship by a relation (say R):
1. For each entity set involved in the relationship, take its key attribute(s) as part of the schema of R.
2. If the relationship has attributes, add them to R's attribute set.
3. If one entity set plays multiple roles in a relationship, its key attributes appear as many times as there are roles. Rename the attributes to avoid duplicate names, e.g. Contracts(starName, title, year, studioOfStar, producingStudio) (Ref. Fig. 3.6)

Discovering Keys for Relations
If relation R is constructed from:
• an entity set: the key of R is the key attributes of the entity set
• a relationship: the key of R is
  – many-many: composed of the keys of both connected entity sets
  – many-one from E1 to E2: the key of E1
  – one-one between E1 and E2: either the key of E1 or the key of E2

Combining Relations
• In general, if 2 or more relations have exactly the same key, combine them
• For a many-one relationship R from E to F:
  – E(ea1, ea2)
  – F(fa1, fa2, fa3)
  – R(ea1, fa2, ra1), whose key comes from the "many" side (E)
  – Result: E(ea1, ea2, fa2, ra1)
[Figure: many-one relationship R from E to F]

Handling weak entity sets
1. The relation for the weak entity set includes its own attributes and the attributes of the other entity sets that help form its key
  • e.g. 1: Contracts(starName, studioName, title, year, salary)
  • e.g. 2: Crew(number, studioName)
2. Relations for other relationships are created as usual (rename attributes to avoid confusion)
3. No relation is needed for supporting relationships

Weak entity set handling example
• Studios(name, addr)
• Crews(number, studioName)
• Unit-of(number, studioName, name): not needed because it is a supporting relationship
[Figure: weak entity set Crew (attribute number) linked by Unit-of to Studios (attributes name, addr)]

Converting Subclass Structures to Relations
Strategies:
1. E/R approach
2. OO approach
3. Use null values

E/R approach
• For each entity set E in the hierarchy, create a relation that includes the key attributes from the root and any attributes belonging to E
• Do not create a relation for the isa relationship
• Example:
  – Movies(title, year, length, filmType)
  – MurderMysteries(title, year, weapon)
  – Cartoons(title, year)

OO approach
• Treat entities as objects belonging to a single class: for each possible subtree that includes the root, create one relation whose schema includes all the attributes of all the entity sets in the subtree
• Example:
  – Movies(title, year, length, filmType)
  – MoviesC(title, year, length, filmType)
  – MoviesMM(title, year, length, filmType, weapon)
  – MoviesCMM(title, year, length, filmType, weapon)
  – Voices(title, year, starName)
• Then combine relations that have the same set of attributes

Use null values
• Create one relation with all the attributes of all the entity sets in the hierarchy. Each entity is represented by one tuple, and that tuple has a null value for whatever attributes the entity does not have.
• Example:
  – Movie(title, year, length, filmType, weapon)

Comparison of the 3 approaches
• Each approach has pros and cons. Factors to consider:
  – the number of relations involved in a query (depends on the query of interest)
  – the total number of relations in the schema
  – space usage

Operations in the Relational Data Model
• Algebraic notation (relational algebra)
  – sets
  – bags
• Constraints on relations
• Extensions to the relational model

Ch. 5 Relational Algebra
• Relational algebra expressions are composed of:
  – Operands: relations, represented by
    • the relation's name, or
    • the list of tuples in the relation
  – Operators
• The results of the operations are also relations
• Complex expressions are built from simple ones
• Queries are relational algebra expressions

Classes of Relational Algebra Operations
1. Set operations: union, difference, intersection
2. Removing parts of a relation:
  – Selection: eliminates rows (tuples)
  – Projection: eliminates columns
3. Combining tuples of two relations: Cartesian product and joins
4. Renaming: changes the relation schema (i.e. the relation name and/or attribute names)

Example relations 1

R
A | B | C
a | b | c
d | a | f
c | b | d

S
A | B | C
b | g | a
d | a | f

Set Operations
• Union, difference, intersection
• Given relations R and S:
  – R and S must have schemas with identical sets of attributes and the same domain for each attribute (=> same arity)
  – The columns of R and S must be ordered so that the order of attributes is the same for both relations

Union
• $R \cup S$
  – Result: the set of tuples that are in R or S or both
  – A tuple appears only once in the result even if it is present in both R and S
  – Reminder: R and S have the same arity
• Example:

A | B | C
a | b | c
d | a | f
c | b | d
b | g | a

Difference
• $R - S$
  – Result: the set of tuples in R but not in S
  – Reminder: not commutative; in general $R - S \neq S - R$
  – Reminder: R and S have the same arity
• Example:

A | B | C
a | b | c
c | b | d

Intersection
• $R \cap S$
  – Result: the set of tuples in both R and S, i.e. $R - (R - S)$
  – Reminder: R and S have the same arity
• Example:

A | B | C
d | a | f
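The three set operations map directly onto Python's built-in set operators. This sketch assumes Example relations 1 are stored as sets of (A, B, C) tuples. It also assumes R and S already share that schema, because Python does not check union compatibility. The sketch reproduces the three result tables and confirms that intersection can be derived from difference.

```python
# Example relations 1 as Python sets of (A, B, C) tuples.
# Using sets assumes both relations already share the schema (A, B, C).
R = {("a", "b", "c"), ("d", "a", "f"), ("c", "b", "d")}
S = {("b", "g", "a"), ("d", "a", "f")}

union = R | S          # in R, in S, or in both; duplicates collapse
difference = R - S     # in R but not in S
intersection = R & S   # in both R and S

# Intersection is not independent: R & S == R - (R - S).
assert intersection == R - (R - S)

print(sorted(union))         # [('a', 'b', 'c'), ('b', 'g', 'a'), ('c', 'b', 'd'), ('d', 'a', 'f')]
print(sorted(difference))    # [('a', 'b', 'c'), ('c', 'b', 'd')]
print(sorted(intersection))  # [('d', 'a', 'f')]
```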

Projection
• $\pi_{A_1, A_2, \ldots, A_n}(R)$
• Result:
  – a new relation that has only the columns for attributes $A_1, A_2, \ldots, A_n$ of R
  – the schema of the new relation is $\{A_1, A_2, \ldots, A_n\}$
• Example 1: $\pi_{A,C}(R)$

A | C
a | c
d | f
c | d

• Example 2: $\pi_{B}(R)$

B
b
a

Selection
• $\sigma_C(R)$, where C is a conditional expression:
  – operands are constants or attributes of R
  – operators are $=, \neq, <, \leq, >, \geq$, AND, OR, NOT
• Result:
  – a new relation with the subset of R's tuples that satisfy condition C
  – the schema is the same as R's schema
• Example: $\sigma_{B=b}(R)$

A | B | C
a | b | c
c | b | d
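Projection and selection can each be sketched as a one-line set comprehension. The helpers `project` and `select` below are hypothetical names, and the schema is carried as a separate tuple of attribute names. Because the result of `project` is a set, it drops duplicate rows, as set semantics requires. This is why $\pi_{B}(R)$ has only two rows.

```python
R_SCHEMA = ("A", "B", "C")
R = {("a", "b", "c"), ("d", "a", "f"), ("c", "b", "d")}


def project(rel, schema, attrs):
    """pi_attrs(rel): keep only the listed columns; the set drops duplicate rows."""
    positions = [schema.index(a) for a in attrs]
    return {tuple(t[i] for i in positions) for t in rel}


def select(rel, schema, cond):
    """sigma_cond(rel): keep the tuples for which cond(tuple-as-dict) is true."""
    return {t for t in rel if cond(dict(zip(schema, t)))}


pi_ac = project(R, R_SCHEMA, ("A", "C"))
pi_b = project(R, R_SCHEMA, ("B",))
sigma_b = select(R, R_SCHEMA, lambda t: t["B"] == "b")

print(sorted(pi_ac))    # [('a', 'c'), ('c', 'd'), ('d', 'f')]
print(sorted(pi_b))     # [('a',), ('b',)]
print(sorted(sigma_b))  # [('a', 'b', 'c'), ('c', 'b', 'd')]
```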

Cartesian Product
• $R \times S$
• Let the arity of R be $k_1$ and the arity of S be $k_2$
• Result: the set of all possible $(k_1 + k_2)$-tuples whose first $k_1$ components form a tuple in R and whose last $k_2$ components form a tuple in S
• Example:

R.A | R.B | R.C | S.A | S.B | S.C
a   | b   | c   | b   | g   | a
a   | b   | c   | d   | a   | f
d   | a   | f   | b   | g   | a
d   | a   | f   | d   | a   | f
c   | b   | d   | b   | g   | a
c   | b   | d   | d   | a   | f
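A minimal sketch of the product on Example relations 1. The code concatenates every R-tuple with every S-tuple. The qualified column names such as `R.A` are carried alongside the tuples, as in the table, to keep the two A, B, C columns apart.

```python
from itertools import product

R = {("a", "b", "c"), ("d", "a", "f"), ("c", "b", "d")}  # arity k1 = 3
S = {("b", "g", "a"), ("d", "a", "f")}                   # arity k2 = 3

# Every pairing of an R-tuple with an S-tuple, concatenated into a (k1 + k2)-tuple.
R_times_S = {r + s for r, s in product(R, S)}
schema = ("R.A", "R.B", "R.C", "S.A", "S.B", "S.C")  # qualified names keep A, B, C apart

print(len(R_times_S))  # 6 = 3 tuples in R times 2 tuples in S
```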

Example Relation 2

R
A | B | C
1 | 2 | 3
4 | 5 | 6
7 | 8 | 9

S
D | E
3 | 1
6 | 2

Theta-Joins
• $R \bowtie_C S$, where C is an arbitrary condition
• Result: the tuples in $R \times S$ that satisfy C, i.e. $\sigma_C(R \times S)$
• To construct the result:
  1. Take the product of R and S
  2. Select from the product only those tuples that satisfy the condition C
• Arity of result = arity of R + arity of S
• Example: $R \bowtie_C S$ where C is $B < D$
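The two construction steps can be followed literally: form each tuple of the product, then test the condition. This sketch assumes Example Relation 2 is R(A, B, C) = {(1, 2, 3), (4, 5, 6), (7, 8, 9)} and S(D, E) = {(3, 1), (6, 2)}. It also assumes the two schemas share no attribute names. `theta_join` is a hypothetical helper.

```python
R_SCHEMA, R = ("A", "B", "C"), {(1, 2, 3), (4, 5, 6), (7, 8, 9)}
S_SCHEMA, S = ("D", "E"), {(3, 1), (6, 2)}


def theta_join(r_rel, r_schema, s_rel, s_schema, cond):
    """sigma_cond(R x S): form the product, then keep pairs satisfying cond."""
    schema = r_schema + s_schema
    result = set()
    for r in r_rel:
        for s in s_rel:
            t = r + s                        # step 1: a tuple of R x S
            if cond(dict(zip(schema, t))):  # step 2: the selection
                result.add(t)
    return schema, result


schema, joined = theta_join(R, R_SCHEMA, S, S_SCHEMA, lambda t: t["B"] < t["D"])
print(schema)          # ('A', 'B', 'C', 'D', 'E')
print(sorted(joined))  # [(1, 2, 3, 3, 1), (1, 2, 3, 6, 2), (4, 5, 6, 6, 2)]
```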

Example Relation 3

R
A | B | C
a | b | c
d | b | c
b | b | f
c | a | d

S
B | C | D
b | c | d
b | c | e
a | d | b

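Natural Joins
• $R \bowtie S$
• Result: $\pi_{i_1, i_2, \ldots, i_m}\big(\sigma_{R.A_1 = S.A_1 \wedge \cdots \wedge R.A_k = S.A_k}(R \times S)\big)$, where $A_1, \ldots, A_k$ are the attributes common to R and S, and $i_1, i_2, \ldots, i_m$ is the list of all components of $R \times S$, in order, except the components $S.A_1, \ldots, S.A_k$
• Example:

A | B | C | D
a | b | c | d
a | b | c | e
d | b | c | d
d | b | c | e
c | a | d | b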
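The natural-join formula can be sketched by running its selection and projection together. The code keeps a pair only when it agrees on every shared attribute, and it keeps each shared attribute only once in the output. `natural_join` is a hypothetical helper applied to Example Relation 3.

```python
def natural_join(r_rel, r_schema, s_rel, s_schema):
    """Join on equal values of all shared attributes, keeping one copy of each."""
    common = [a for a in r_schema if a in s_schema]
    s_extra = [i for i, a in enumerate(s_schema) if a not in r_schema]
    schema = r_schema + tuple(s_schema[i] for i in s_extra)
    result = set()
    for r in r_rel:
        r_vals = dict(zip(r_schema, r))
        for s in s_rel:
            s_vals = dict(zip(s_schema, s))
            if all(r_vals[a] == s_vals[a] for a in common):  # the sigma part
                result.add(r + tuple(s[i] for i in s_extra))  # the pi part
    return schema, result


R = {tuple(w) for w in ["abc", "dbc", "bbf", "cad"]}  # R(A, B, C)
S = {tuple(w) for w in ["bcd", "bce", "adb"]}         # S(B, C, D)
schema, joined = natural_join(R, ("A", "B", "C"), S, ("B", "C", "D"))
print(schema)                              # ('A', 'B', 'C', 'D')
print(sorted("".join(t) for t in joined))  # ['abcd', 'abce', 'cadb', 'dbcd', 'dbce']
```

The dangling tuple (b, b, f) of R has no partner in S, so it does not appear in the result.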

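Renaming
• $\rho_{S(A_1, A_2, \ldots, A_n)}(T)$ or $\rho_S(T)$
• Result: renames relation T to S (and, in the first form, its attributes to $A_1, A_2, \ldots, A_n$)
• Example: $R \times \rho_{S(X,Y,Z)}(S)$

A | B | C | X | Y | Z
a | b | c | b | g | a
a | b | c | d | a | f
d | a | f | b | g | a
d | a | f | d | a | f
c | b | d | b | g | a
c | b | d | d | a | f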

Week 4

• The rest of Ch. 5: Relational Algebra
• Ch. 10: Logical query languages

Combining Operations to Form Queries
• Construct more complex expressions by applying operators to subexpressions
• Use parentheses to indicate the grouping of operands
• There are multiple ways to write equivalent queries
• Expression trees help visualize complex expressions
• A query optimizer chooses among equivalent expressions
• Example (ref. Fig. 5.8):
  $\pi_{title,year}\big(\sigma_{length \geq 100}(Movie) \cap \sigma_{studioName = 'Fox'}(Movie)\big)$
  or
  $\pi_{title,year}\big(\sigma_{length \geq 100 \text{ AND } studioName = 'Fox'}(Movie)\big)$
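The two expressions can be checked for equivalence on a small instance. The code computes the intersection of two selections and compares it with a single selection that uses a compound condition. The titles, years, and lengths come from the Movies table in these notes. The studio names and producer numbers are illustrative placeholders, and the helper names `sigma` and `pi_title_year` are hypothetical.

```python
# Movie(title, year, length, inColor, studioName, producerC#); studio names and
# producer numbers are illustrative placeholders, not course data.
MOVIE = {
    ("Star Wars", 1977, 124, True, "Fox", 1),
    ("Mighty Ducks", 1991, 104, True, "Disney", 2),
    ("Wayne's World", 1992, 95, True, "Paramount", 3),
}
TITLE, YEAR, LENGTH, STUDIO = 0, 1, 2, 4  # column positions


def sigma(pred):
    return {t for t in MOVIE if pred(t)}


def pi_title_year(rel):
    return {(t[TITLE], t[YEAR]) for t in rel}


# Two selections combined with intersection ...
q1 = pi_title_year(sigma(lambda t: t[LENGTH] >= 100) & sigma(lambda t: t[STUDIO] == "Fox"))
# ... versus one selection with a compound condition.
q2 = pi_title_year(sigma(lambda t: t[LENGTH] >= 100 and t[STUDIO] == "Fox"))

print(q1 == q2, q1)  # True {('Star Wars', 1977)}
```

Agreement on one instance does not prove equivalence in general. The algebraic laws are what guarantee it.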

Algebraic Laws
• Associative: e.g. $R \cup (S \cup T) = (R \cup S) \cup T$
• Commutative: e.g. $R \cup S = S \cup R$

Dependent and Independent Operations
1. Set operations: union, difference, intersection
2. Removing parts of a relation:
  – Selection: eliminates rows (tuples)
  – Projection: eliminates columns
3. Combining tuples of two relations: Cartesian product and joins
4. Renaming: changes the relation schema (i.e. the relation name and/or attribute names)

Constraints on Relations
Two ways to express constraints in relational algebra:
1. $R = \emptyset$
2. $R \subseteq S$, or equivalently $R - S = \emptyset$
where R and S are relational algebra expressions

Referential Integrity Constraints
• Assertion: a value appearing in one context also appears in another, related context
• Example:
  Movie(title, year, length, inColor, studioName, producerC#)
  MovieExec(name, address, cert#, networth)
  Constraint: the producer of every movie is a certified movie executive, i.e. appears in the MovieExec relation:
  $\pi_{producerC\#}(Movie) \subseteq \pi_{cert\#}(MovieExec)$
  or
  $\pi_{producerC\#}(Movie) - \pi_{cert\#}(MovieExec) = \emptyset$
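The "difference is empty" form can be checked directly. The code projects both relations onto the certificate numbers, subtracts, and reports any leftover values as violations. Every row below is made-up data for illustration. Only the two schemas come from the slide.

```python
# Hypothetical instances: Movie(title, year, length, inColor, studioName, producerC#)
# and MovieExec(name, address, cert#, networth). All values are made up.
movie = {
    ("Movie One", 2001, 110, True, "Studio X", 100),
    ("Movie Two", 2002, 95, True, "Studio Y", 200),
}
movie_exec = {("Exec A", "Addr A", 100, 20_000_000)}

producers = {m[5] for m in movie}       # pi_{producerC#}(Movie)
certified = {e[2] for e in movie_exec}  # pi_{cert#}(MovieExec)

# The constraint holds iff the difference is empty.
violations = producers - certified
print(violations)  # {200}
```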

Other Constraints
• Domain constraint example:
  MovieStar(name, address, gender, birthdate)
  Constraint: the acceptable values for the "gender" attribute are 'F' and 'M':
  $\sigma_{gender \neq 'F' \text{ AND } gender \neq 'M'}(MovieStar) = \emptyset$
• Other constraint example:
  MovieExec(name, address, cert#, networth)
  Studio(name, address, presC#)
  Constraint: the president of a movie studio must have a net worth of at least 10,000,000 dollars:
  $\sigma_{networth < 10000000}(Studio \bowtie_{presC\# = cert\#} MovieExec) = \emptyset$
• Functional dependency constraints
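Both constraints follow the "expression = ∅" pattern. The code computes the offending selection, and the constraint holds when the result is empty. This sketch uses made-up rows and plain lists. The equality test `st[2] == ex[2]` plays the role of the theta-join condition presC# = cert#.

```python
# Hypothetical instances, made up for illustration.
movie_star = [  # MovieStar(name, address, gender, birthdate)
    ("Star A", "Addr 1", "F", "1970-01-01"),
    ("Star B", "Addr 2", "M", "1965-05-05"),
    ("Star C", "Addr 3", "X", "1980-08-08"),
]
studio = [("Studio X", "Addr 4", 100), ("Studio Y", "Addr 5", 200)]  # Studio(name, address, presC#)
movie_exec = [  # MovieExec(name, address, cert#, networth)
    ("Exec A", "Addr 6", 100, 20_000_000),
    ("Exec B", "Addr 7", 200, 5_000_000),
]

# sigma_{gender != 'F' AND gender != 'M'}(MovieStar) must be empty.
bad_gender = [s for s in movie_star if s[2] != "F" and s[2] != "M"]

# sigma_{networth < 10000000}(Studio JOIN_{presC# = cert#} MovieExec) must be empty.
poor_presidents = [
    st + ex
    for st in studio
    for ex in movie_exec
    if st[2] == ex[2] and ex[3] < 10_000_000
]

print(len(bad_gender), len(poor_presidents))  # 1 1
```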

Relational Operations on Bags
• Bag: a "set" that is allowed to have more than one occurrence of an element
  – => duplicate tuples in a relation
  – Constraint representations work with bags
• Reasons:
  – implementation efficiency when duplication is acceptable
  – the actual number of tuples is needed for aggregation
• Example:

A | B
1 | 2
3 | 4
1 | 2
1 | 2

Relational Operations on Bags (continued)
Given: R and S are bags, and tuple t appears in R n times and in S m times
• $R \cup S$: contains n + m copies of t
• $R - S$: contains $\max(0, n - m)$ copies of t
• $R \cap S$: contains $\min(n, m)$ copies of t
• $\pi_{A,B}(R)$: each tuple is processed independently; the resulting duplicate tuples are not eliminated
• $\sigma_C(R)$: the selection condition is applied to each tuple independently; the resulting duplicate tuples are not eliminated

Product and Joins of Bags
Given: R and S are bags, tuple r appears in R m times, and tuple s appears in S n times
• $R \times S$: the resulting tuple rs appears mn times
• $R \bowtie S$: each tuple of R is compared to each tuple of S to decide whether the pair joins successfully; duplicates are not eliminated
• $R \bowtie_C S$: each tuple of R is compared to each tuple of S to decide whether the condition C is met; duplicates are not eliminated
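`collections.Counter` models a bag as a map from tuple to multiplicity. Its `+`, `-`, and `&` operators give exactly the n + m, max(0, n − m), and min(n, m) counts listed above. This sketch takes R from the bag example on the "Relational Operations on Bags" slide. S is a hypothetical second bag.

```python
from collections import Counter

# A bag maps each tuple to its number of occurrences.
R = Counter({(1, 2): 3, (3, 4): 1})  # the bag from the "Relational Operations on Bags" slide
S = Counter({(1, 2): 1, (5, 6): 2})  # hypothetical second bag

union = R + S         # n + m copies
difference = R - S    # max(0, n - m) copies (Counter drops counts <= 0)
intersection = R & S  # min(n, m) copies

# Projection onto A: each tuple is processed independently, duplicates kept.
pi_a = Counter()
for t, n in R.items():
    pi_a[(t[0],)] += n

# Product: tuple rs appears m * n times.
product_bag = Counter()
for r, m in R.items():
    for s, n in S.items():
        product_bag[r + s] += m * n

print(union[(1, 2)], difference[(1, 2)], intersection[(1, 2)])  # 4 2 1
print(pi_a[(1,)], product_bag[(1, 2, 1, 2)])                    # 3 3
```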

Extended Operations of Relational Algebra
• Duplicate elimination: converts a bag to a set
• Aggregation: count, sum, max, min, average
• Grouping
• Extended projection
• Sorting
• Outerjoins

Outerjoins
• Dangling tuples: tuples that fail to match any tuple of the other relation in the common attributes
• Outerjoin: an operator that augments the result of a join with the dangling tuples, padded with null values
• Full outerjoin of R and S: a join that includes all rows from R and S, matched or not. Unmatched rows are padded with the special null symbol ⊥.
• Left outerjoin of R and S: a join that includes all rows from R, matched or not, plus the matching values from S. Unmatched rows are padded with ⊥.
• Right outerjoin of R and S: a join that includes all rows from S, matched or not, plus the matching values from R. Unmatched rows are padded with ⊥.
• The join may be natural or a theta-join

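Outer Join Example

R
A | B | C
a | b | c
d | b | c
b | b | f
c | a | d

S
B | C | D
b | c | d
b | c | e
a | d | b
c | d | b

Natural join
A | B | C | D
a | b | c | d
a | b | c | e
d | b | c | d
d | b | c | e
c | a | d | b

Full outer join
A | B | C | D
a | b | c | d
a | b | c | e
d | b | c | d
d | b | c | e
b | b | f | ⊥
c | a | d | b
⊥ | c | d | b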
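A full natural outerjoin is the natural join plus the dangling tuples from each side, padded with null. This sketch uses Python's `None` in place of ⊥ and a hypothetical helper `full_outer_natural_join`, and it reproduces the full-outer-join table above. A left (or right) outerjoin would keep only the R-side (or S-side) padding.

```python
NULL = None  # stands in for the null symbol


def full_outer_natural_join(r_rel, r_schema, s_rel, s_schema):
    """Natural join plus dangling tuples from both sides, padded with NULL."""
    common = [a for a in r_schema if a in s_schema]
    schema = r_schema + tuple(a for a in s_schema if a not in r_schema)
    result, matched_s = set(), set()
    for r in r_rel:
        r_vals = dict(zip(r_schema, r))
        dangling = True
        for s in s_rel:
            s_vals = dict(zip(s_schema, s))
            if all(r_vals[a] == s_vals[a] for a in common):
                dangling = False
                matched_s.add(s)
                merged = {**s_vals, **r_vals}
                result.add(tuple(merged[a] for a in schema))
        if dangling:  # pad the S-only attributes
            result.add(tuple(r_vals.get(a, NULL) for a in schema))
    for s in s_rel - matched_s:  # dangling S tuples: pad the R-only attributes
        s_vals = dict(zip(s_schema, s))
        result.add(tuple(s_vals.get(a, NULL) for a in schema))
    return schema, result


R = {tuple(w) for w in ["abc", "dbc", "bbf", "cad"]}  # R(A, B, C)
S = {tuple(w) for w in ["bcd", "bce", "adb", "cdb"]}  # S(B, C, D)
schema, full = full_outer_natural_join(R, ("A", "B", "C"), S, ("B", "C", "D"))
for t in sorted(full, key=lambda t: tuple("" if v is None else v for v in t)):
    print(t)
```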

Extensions to the Relational Model
• Modifications: insert, delete, update
• Views: relational expressions with a name, applied to real relations to produce the relation defined by the expression. Views can be used as arguments to other expressions.
• Null values: common interpretations:
  – value unknown
  – value inapplicable
  – value withheld

Ch. 10 Logical Query Languages
• Motivation
• Datalog
• Relational algebra to Datalog
• Recursion in Datalog
• Negation in recursive rules

Motivation
• Logical rules are more natural for representing recursive queries
• Logical rules form the basis of many information-integration applications

A Datalog rule example
• Relation: Movie(title, year, length, inColor, studioName, producerC#)
• Rule: LongMovie(t, y) ← Movie(t, y, l, c, s, p) AND l ≥ 100
  – head: LongMovie(t, y)
  – body: the two subgoals Movie(t, y, l, c, s, p) and l ≥ 100
• Equivalent relational algebra: LongMovie = $\pi_{title,year}(\sigma_{length \geq 100}(Movie))$
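A Python set comprehension has almost the same shape as the rule. Iterating over Movie binds the variables of the relational subgoal, the `if` clause is the arithmetic subgoal, and the produced pair is the head. The titles, years, and lengths come from the Movies table in these notes. The inColor, studio, and producer values are placeholders.

```python
# Movie(title, year, length, inColor, studioName, producerC#).
# Titles, years, and lengths come from the Movies table in these notes;
# the inColor, studio, and producer values are placeholders.
movie = {
    ("Star Wars", 1977, 124, True, "studio1", 1),
    ("Mighty Ducks", 1991, 104, True, "studio2", 2),
    ("Wayne's World", 1992, 95, True, "studio3", 3),
}

# LongMovie(t, y) <- Movie(t, y, l, c, s, p) AND l >= 100
long_movie = {
    (t, y)                           # head: LongMovie(t, y)
    for (t, y, l, c, s, p) in movie  # relational subgoal binds t, y, l, c, s, p
    if l >= 100                      # arithmetic subgoal
}
print(sorted(long_movie))  # [('Mighty Ducks', 1991), ('Star Wars', 1977)]
```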

Datalog rule
• Relational atoms: a predicate followed by its arguments
• Arithmetic atoms: a comparison between two arithmetic expressions (e.g. x ≠ y)
• Predicate: a relation name or an arithmetic comparison predicate (e.g. =, <, ≠)
• Head: a relational atom
• Body: one or more atoms (subgoals) connected by AND
• Subgoals (not the head) may optionally be negated by NOT
• Local variables: variables that appear in the body but not in the head

Datalog
• A logic-based data model
• The underlying mathematical model of data is essentially that of the relational model
• Predicate symbols denote relations
• Relational algebra operations are described by rules
• Query: a collection of one or more rules
• The relation in the rule head is the answer to the query

Extensional and Intensional Predicates
• Extensional predicates (EDB): the relations that ARE defined as part of the actual database (i.e. physically stored), e.g. R = {1}
• Intensional predicates (IDB): the relations that are NOT defined as part of the actual database but are instead derived from logical rules, e.g.
  P(x) ← Q(x)
  Q(x) ← R(x)
• A predicate must be IDB or EDB, but not both
  – IDB predicates can appear in the body or the head of a rule
  – EDB predicates can appear only in rule bodies

3 Different Interpretations of Logical Rules
• Proof-theoretic interpretation
• Model-theoretic interpretation
• Computational interpretation
The following discussions use the EDB R = {1}.

Proof-Theoretic Interpretation
Rules are treated as axioms to be used in a proof.
  – From the facts in the database, see what other facts can be proved using the rules in all possible ways
  – All facts derivable using the rules are derivable by applying the rules in the forward direction only
  – Example: P = {1}, Q = {1}

99 Model-Theoretic Interpretation
The rules define the possible worlds, or models.
  – To be a model, an interpretation must make the rules true, no matter what values are assigned to the variables in each rule.
  – Multiple models are possible.
  – With no negation, a unique minimal model exists, and it gives the same result as the proof-theoretic interpretation.
  – Minimal model: no true fact can be made false while still having a model consistent with the EDB.
  – Example models: (1) P = {1, 2, 3}, Q = {1, 2}; (2) P = {1}, Q = {1}. Model (2) is the minimal model.

100 Computational Interpretation
The rules are given meaning by an algorithm that "executes" them to determine whether a potential fact is true or false.
  – E.g. Prolog uses a particular algorithm that searches for proofs of the potential fact.
  – Drawbacks:
    • The set of facts for which Prolog finds a proof is not necessarily the same as the set of all facts for which a proof exists.
    • The set of facts Prolog finds true is not necessarily a model.

101 Week 5

102 Example
1. sibling(X, Y) ← parent(X, Z) AND parent(Y, Z) AND X ≠ Y.
2. cousin(X, Y) ← parent(X, Xp) AND parent(Y, Yp) AND sibling(Xp, Yp).
3. cousin(X, Y) ← parent(X, Xp) AND parent(Y, Yp) AND cousin(Xp, Yp).
4. related(X, Y) ← sibling(X, Y).
5. related(X, Y) ← related(X, Z) AND parent(Y, Z).
6. related(X, Y) ← related(Z, Y) AND parent(X, Z).

103 Meaning of Logical Rules
• The head is true of its arguments if there exist values for the local variables that make all the subgoals true.
• If a rule contains only non-negated relational subgoals, then the value of the head is the natural join of the subgoals, projected onto the head variables.
• Example (rule 2, with P = parent, S = sibling, and each argument naming an attribute):
  $cousin(X, Y) = \pi_{X,Y}\big(P(X, Xp) \bowtie P(Y, Yp) \bowtie S(Xp, Yp)\big)$

104 Evaluation of Rules – Variable Based
• Consider all possible assignments of values to the variables. If all subgoals are true under an assignment, add the corresponding head tuple to the result relation.
• Example: S(x, y) ← R(x, z) AND R(z, y) AND NOT R(x, y), with R = {(1, 2), (2, 3)}
  – The only assignments that make the first subgoal true are:
    Case 1: x = 1, z = 2
    Case 2: x = 2, z = 3
  – In case 1, y = 3 makes the second subgoal true. Since (1, 3) is not in R, the third subgoal is also true, so add (x, y) = (1, 3) to S.
  – In case 2, no value of y makes the second subgoal true.
  – So, S = {(1, 3)}.
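The variable-based strategy can be sketched in Python as a brute-force loop over assignments. The sketch assumes the rule S(x, y) ← R(x, z) AND R(z, y) AND NOT R(x, y) with R = {(1, 2), (2, 3)}. It restricts the candidate values to the constants that appear in R, which is enough for a safe rule like this one.

```python
from itertools import product

R = {(1, 2), (2, 3)}


def eval_variable_based(r):
    """S(x, y) <- R(x, z) AND R(z, y) AND NOT R(x, y), trying every assignment."""
    domain = sorted({v for t in r for v in t})  # constants that appear in R
    s = set()
    for x, y, z in product(domain, repeat=3):
        if (x, z) in r and (z, y) in r and (x, y) not in r:
            s.add((x, y))  # all three subgoals are true: add the head tuple
    return s


print(eval_variable_based(R))  # {(1, 3)}
```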

105 Evaluation of Rules – Tuple Based
• Consider all assignments of tuples to subgoals that make each subgoal true. If the variables are assigned consistent values, add the head to the result.
• Start with the non-negated relational subgoals only.
• Example: the same rule, S(x, y) ← R(x, z) AND R(z, y) AND NOT R(x, y), with R = {(1, 2), (2, 3)}
  – Start with R(x, z) and R(z, y). Each can be assigned either (1, 2) or (2, 3), giving four possible assignments:
      R(x, z)   R(z, y)
      (1, 2)    (1, 2)
      (1, 2)    (2, 3)
      (2, 3)    (1, 2)
      (2, 3)    (2, 3)
  – Only one assignment, (1, 2) for R(x, z) and (2, 3) for R(z, y), gives a consistent value to z (z = 2).
  – It also makes the third subgoal, NOT R(1, 3), true.
  – So, S = {(1, 3)}.
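Here is a matching sketch of the tuple-based strategy, under the same assumed rule and R. It assigns tuples of R to the two non-negated subgoals and keeps only the assignments that give z a single value. Only then does it check the negated subgoal.

```python
from itertools import product

R = {(1, 2), (2, 3)}


def eval_tuple_based(r):
    s = set()
    # Assign a tuple of R to each non-negated subgoal: (x, z) to R(x, z), (z2, y) to R(z, y).
    for (x, z), (z2, y) in product(sorted(r), repeat=2):
        if z != z2:
            continue              # z must receive one consistent value
        if (x, y) not in r:       # now check the negated subgoal NOT R(x, y)
            s.add((x, y))
    return s


print(eval_tuple_based(R))  # {(1, 3)}
```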

106 Safe Rules
• A rule is safe if all of its variables are limited.
• A variable X in a rule is limited if:
  1. X appears as an argument in a non-negated relational subgoal of the body;
  2. X appears in a subgoal X = a or a = X, where a is a constant; or
  3. X appears in a subgoal X = Y or Y = X, where Y is a variable already known to be limited.
• Rules must be safe to make sense, i.e. to produce a finite relation as the result.

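107 Safe and Unsafe Rule Examples
1. biggerThan(X, Y) ← X > Y.   Unsafe: neither X nor Y is limited.
2. S(X) ← R(Y).   Unsafe: X is not limited.
3. S(X) ← R(Y) AND X < Y.   Unsafe: a comparison such as X < Y does not limit X.
4. P(X, Y) ← Q(X, Z) AND W = a AND Y = W.   Safe: X and Z are limited by Q(X, Z), W by W = a, and then Y by Y = W.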
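The limited-variable conditions of slide 106 can be checked mechanically, as in the small Python sketch below. The rule encoding is an assumption made for illustration: argument lists hold names, with variables capitalized and constants in lowercase. Arithmetic comparisons are passed only so that their variables are counted, since they never limit anything.

```python
def is_var(term):
    # Assumed convention: variables start with an uppercase letter; constants do not.
    return term[:1].isupper()


def limited_vars(positive_atoms, equalities):
    # Condition 1: arguments of non-negated relational subgoals are limited.
    limited = {t for args in positive_atoms for t in args if is_var(t)}
    changed = True
    while changed:  # Conditions 2 and 3: propagate through X = a and X = Y subgoals.
        changed = False
        for lhs, rhs in equalities:
            for var, other in ((lhs, rhs), (rhs, lhs)):
                if is_var(var) and var not in limited and (not is_var(other) or other in limited):
                    limited.add(var)
                    changed = True
    return limited


def is_safe(head_args, positive_atoms, equalities=(), comparisons=(), negated_atoms=()):
    # comparisons: (left, right) operand pairs of subgoals such as X < Y.
    terms = list(head_args)
    for args in list(positive_atoms) + list(negated_atoms):
        terms.extend(args)
    for left, right in list(equalities) + list(comparisons):
        terms.extend((left, right))
    all_vars = {t for t in terms if is_var(t)}
    return all_vars <= limited_vars(positive_atoms, equalities)


# The four rules of slide 107.
rule1 = is_safe(["X", "Y"], [], comparisons=[("X", "Y")])
rule2 = is_safe(["X"], [["Y"]])
rule3 = is_safe(["X"], [["Y"]], comparisons=[("X", "Y")])
rule4 = is_safe(["X", "Y"], [["X", "Z"]], equalities=[("W", "a"), ("Y", "W")])
print(rule1, rule2, rule3, rule4)  # False False False True
```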

108 Datalog Rules and Relational Algebra Summary
• Intersection: AND
• Union: multiple rules with the same IDB predicate as the rule head
• Difference: NOT
• Projection: the head atom has as arguments only the projected attributes
• Selection: the condition is represented by arithmetic subgoals
• Product: AND of two subgoals, with the head atom containing all attributes of the two subgoals
• Joins: AND of relational subgoals that share variables

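109 A Datalog Rule Example
• Relation: parent(C, P), where P is a parent of C
• Rule: sibling(X, Y) ← parent(X, Z) AND parent(Y, Z) AND X ≠ Y.
  – Head: sibling(X, Y)
  – Subgoals: parent(X, Z), parent(Y, Z), and X ≠ Y; together they form the body.
• Equivalent relational algebra (P = parent, S = sibling):
  $S(X, Y) = \pi_{X,Y}\big(\sigma_{X \neq Y}(P(X, Z) \bowtie P(Y, Z))\big)$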
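The relational-algebra reading of the sibling rule can be sketched in Python over a hypothetical set of parent(C, P) facts. The sketch performs a join on Z, then the selection X ≠ Y, then a projection onto X, Y.

```python
# Hypothetical parent(C, P) facts: P is a parent of C.
PARENT = {("ann", "carl"), ("bob", "carl"), ("dan", "ann")}


def sibling(parent):
    # Natural join of parent(X, Z) and parent(Y, Z) on the shared variable Z.
    joined = {(x, z, y) for (x, z) in parent for (y, z2) in parent if z == z2}
    selected = {(x, z, y) for (x, z, y) in joined if x != y}  # sigma_{X != Y}
    return {(x, y) for (x, _, y) in selected}                  # pi_{X, Y}


print(sorted(sibling(PARENT)))  # [('ann', 'bob'), ('bob', 'ann')]
```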

110 Recursion
• A rule P ← … Q … means that P depends on Q.
• Draw a graph: the nodes are the IDB predicates, and an arc P → Q means that P depends on Q.
• A cycle in the graph means recursion.
• Example: cousin(X, Y) ← parent(X, Xp) AND parent(Y, Yp) AND cousin(Xp, Yp).

111 Dependency Graph
[Figure: dependency graph of the rules on slide 102, with nodes sibling, parent, related, and cousin]
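The following Python sketch builds the dependency graph for the rules of slide 102 and finds the recursive predicates, i.e. those that lie on a cycle. Each rule is encoded only as its head and body predicates, leaving out the arithmetic subgoal X ≠ Y. This encoding is an assumption made for illustration.

```python
from collections import defaultdict

# Rules of slide 102 as (head predicate, body predicates).
RULES = [
    ("sibling", ["parent", "parent"]),
    ("cousin", ["parent", "parent", "sibling"]),
    ("cousin", ["parent", "parent", "cousin"]),
    ("related", ["sibling"]),
    ("related", ["related", "parent"]),
    ("related", ["related", "parent"]),
]


def dependency_graph(rules):
    graph = defaultdict(set)
    for head, body in rules:
        graph[head].update(body)  # an arc head -> each predicate it depends on
    return graph


def reachable(graph, start):
    seen, stack = set(), list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def recursive_predicates(rules):
    graph = dependency_graph(rules)
    return {p for p in graph if p in reachable(graph, p)}  # p lies on a cycle


print(sorted(recursive_predicates(RULES)))  # ['cousin', 'related']
```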

112 Least Fixedpoint
• The least fixedpoint is the smallest set of IDB relations that contains all the tuples the rules require.
• Iterative fixedpoint evaluation of recursive rules:
  1. Start with every IDB relation empty (IDB = ∅).
  2. Apply the rules to the current IDB and EDB relations.
  3. If the IDB changed, go back to step 2; otherwise, done.
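Below is a naive iterative evaluation of the slide 102 rules, sketched in Python over a hypothetical family stored as parent(C, P) facts. Each round recomputes every IDB relation from the previous round's values, starting from empty relations. The loop stops when a round changes nothing.

```python
# Hypothetical parent(C, P) facts: P is a parent of C.
PARENT = {("ann", "carl"), ("bob", "carl"), ("dan", "ann"),
          ("eve", "bob"), ("fay", "dan"), ("gus", "eve")}


def apply_rules(parent, sibling, cousin, related):
    """One round: evaluate every rule body against the current relations."""
    new_sibling = {(x, y) for (x, z) in parent for (y, z2) in parent
                   if z == z2 and x != y}                                   # rule 1
    new_cousin = {(x, y) for (x, xp) in parent for (y, yp) in parent
                  if (xp, yp) in sibling or (xp, yp) in cousin}            # rules 2, 3
    new_related = set(sibling)                                             # rule 4
    new_related |= {(x, y) for (x, z) in related for (y, z2) in parent if z == z2}  # rule 5
    new_related |= {(x, y) for (z, y) in related for (x, z2) in parent if z == z2}  # rule 6
    return new_sibling, new_cousin, new_related


def naive_fixpoint(parent):
    sibling, cousin, related = set(), set(), set()  # start with empty IDB relations
    while True:
        new = apply_rules(parent, sibling, cousin, related)
        if new == (sibling, cousin, related):       # no change: least fixedpoint reached
            return new
        sibling, cousin, related = new


SIBLING, COUSIN, RELATED = naive_fixpoint(PARENT)
print(sorted(COUSIN))
```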

113 Fixedpoint Computation Example (ref. text)
1. FollowOn(x, y) ← SequelOf(x, y)
2. FollowOn(x, y) ← SequelOf(x, z) AND FollowOn(z, y)
Evaluation note: at round i, the only new tuples added to any IDB relation come from matches involving tuples that were added in round i − 1.
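The evaluation note above is the idea behind semi-naive evaluation. This Python sketch applies it to the FollowOn rules, assuming SequelOf(x, y) means that y is a sequel of x and using a hypothetical chain of titles. Each round joins SequelOf only with the FollowOn tuples that were new in the previous round.

```python
# Hypothetical SequelOf(x, y) facts: y is a sequel of x.
SEQUEL_OF = {("Rocky", "Rocky II"), ("Rocky II", "Rocky III"), ("Rocky III", "Rocky IV")}


def semi_naive_follow_on(sequel_of):
    follow_on = set(sequel_of)  # rule 1
    delta = set(sequel_of)      # tuples added in the previous round
    while delta:
        # Rule 2, matched only against last round's new FollowOn tuples.
        candidates = {(x, y) for (x, z) in sequel_of for (z2, y) in delta if z == z2}
        delta = candidates - follow_on
        follow_on |= delta
    return follow_on


print(len(semi_naive_follow_on(SEQUEL_OF)))  # 6
```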

114 Negation in Recursive Rules
• We can compute a minimal fixedpoint only for stratifiable recursion with negation.
• To determine stratifiability:
  1. Draw a graph whose nodes are the IDB predicates.
  2. Draw an arc labeled "−" from node A to node B for each rule A ← … NOT B ….
  3. Draw an unlabeled arc from node A to node B for each rule A ← … B ….

115 Negation in Recursive Rules (continued)
• If the graph has a cycle containing one or more negative arcs, the recursion is NOT stratified. (Negation then appears inside the least-fixedpoint operation.)
• Stratum of IDB predicate A = the largest number of negative arcs on any path beginning at A.
• If there are no negative arcs on any path from predicate A, the stratum of A is 0.
• Evaluate the IDB predicates in the order of their strata, starting with the lowest.
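Here is a sketch of the stratum computation in Python. Arcs are given as hypothetical (A, B, negative) triples, meaning that A depends on B. Strata are raised until they stop changing. A simple path can contain each negative arc at most once, so a stratum larger than the number of negative arcs signals a cycle through a negative arc, i.e. non-stratified recursion.

```python
def strata(arcs):
    """arcs: (a, b, negative) meaning a depends on b. Returns {predicate: stratum} or None."""
    arcs = list(arcs)
    nodes = {a for a, _, _ in arcs} | {b for _, b, _ in arcs}
    stratum = {n: 0 for n in nodes}
    bound = sum(1 for _, _, neg in arcs if neg)  # max negative arcs on a simple path
    changed = True
    while changed:
        changed = False
        for a, b, neg in arcs:
            candidate = stratum[b] + (1 if neg else 0)
            if candidate > stratum[a]:
                stratum[a] = candidate
                changed = True
                if candidate > bound:
                    return None  # a cycle through a negative arc: not stratified
    return stratum


# Hypothetical rules p <- q AND NOT r, r <- s:
print(strata([("p", "q", False), ("p", "r", True), ("r", "s", False)]))
```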

116 Week 6

117 Ch. 3 (continued)
• Database design problems
• Functional dependencies
• Keys of relations
• Decompositions based on functional dependencies (normalization)
  – Boyce-Codd Normal Form (BCNF)
  – Third Normal Form (3NF)
  – Recovering information from a decomposition
• Multivalued dependencies and decomposition
  – Fourth Normal Form (4NF)
• Relationships among normal forms

118 Database Design Problems
• The principal kinds of anomalies that indicate bad database design:
  – Redundancy
  – Update anomalies
  – Deletion anomalies
  – Insertion anomalies
• Goal: find a good relation schema design for the relational model.

119 Redundancy
• Information is repeated unnecessarily in several tuples.
• Example: in the Movie relation, the length and filmType of a movie are repeated for every star of that movie.

120 Update Anomalies
• Information is changed in one tuple but left unchanged in another tuple that holds the same information.
• A consequence of redundancy.
• Causes potential inconsistency.
• Example: changing the length of Star Wars in one tuple but not in the others.

121 Deletion Anomalies
• When a set of values becomes empty (is deleted), other information may be lost as a side effect.
• Example: deleting the only star listed for the movie Mighty Ducks also loses the movie's length, film type, and studio.

122 Insertion Anomalies
• A tuple cannot be inserted because some of its data is not yet available.
• The inverse of deletion anomalies.
• Problems with using null values to fill in missing or unavailable data:
  – When the data becomes available, will we remember to delete the tuple with nulls?
  – If the missing data is part of a key, nulls cannot be used.
• Example: we cannot start keeping track of a new movie while its cast is not yet determined.

123 Dependencies
• A dependency is an assertion that only a subset of all possible relations are "legal". It is an assertion about the real world and cannot be proved.
• It is a form of constraint.
• Dependencies in a relation imply some sort of redundancy in the legal relations.
• Example: title, year → length. In the relation below, this FD forces ??? to be 124:
  title     | year | length | filmType | studioName | starName
  Star Wars | 1977 | 124    | Color    | Fox        | Carrie Fisher
  Star Wars | 1977 | ???    | Color    | Fox        | Mark Hamill
• Kinds of dependencies:
  – Functional dependencies (FD)
  – Multivalued dependencies (MVD)

124 Functional Dependencies (FD)
• Given a relation schema R(A1, …, An), let X and Y be subsets of {A1, …, An}. The FD X → Y means that X functionally determines Y, e.g. $A_1 A_2 \cdots A_n \to B_1 B_2 \cdots B_m$.
• X → Y is an assertion about R: whenever two tuples agree on all the attributes of X, they must also agree on all the attributes of Y.
• It is a constraint on the data that may appear within a relation (i.e. control of data at the schema level).
• It is a restriction on relations that depends only on the equality or inequality of values, i.e. it is value-oblivious.
• FDs are the most important type of value-oblivious constraint.
• Value-oblivious constraints have the greatest impact on the design of database schemas.

125 Significance of Functional Dependencies
• Keys:
  1. If R(A1, …, An) comes from an entity set and X ⊆ {A1, …, An} forms a key for R, then we may assert X → Y, where Y is any subset of {A1, …, An}.
  2. If R(A1, …, An) represents a many-one relationship from entity set E1 to entity set E2, and among the Ai's are attributes that form a key X for E1 and a key Y for E2, then we may assert X → Y.
• FDs are a tool for explaining the process of normalization.

126 Representing an FD Using Relational Algebra
• Example 5.31: MovieStar(name, address, gender, birthdate), with the FD name → address
• The FD holds exactly when no two tuples agree on name but differ on address:
  $\sigma_{MS1.name = MS2.name \text{ AND } MS1.address \neq MS2.address}\big(\rho_{MS1}(MovieStar) \times \rho_{MS2}(MovieStar)\big) = \emptyset$
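The relational-algebra test translates directly into a check on a relation instance. Form the self-product, select the pairs that agree on the left side but differ on the right side, and require the result to be empty. Below is a Python sketch over hypothetical MovieStar tuples stored as dictionaries.

```python
from itertools import product


def fd_violations(rows, lhs, rhs):
    """Pairs (t, u) from rows x rows that agree on lhs but differ somewhere on rhs."""
    return [(t, u) for t, u in product(rows, repeat=2)
            if all(t[a] == u[a] for a in lhs) and any(t[b] != u[b] for b in rhs)]


def fd_holds(rows, lhs, rhs):
    # The FD holds exactly when the selection over the self-product is empty.
    return not fd_violations(rows, lhs, rhs)


# Hypothetical MovieStar tuples.
stars = [
    {"name": "Jane Doe", "address": "1 Main St.", "gender": "F", "birthdate": "1970-01-01"},
    {"name": "John Roe", "address": "2 Oak Ave.", "gender": "M", "birthdate": "1965-06-15"},
]
ok = fd_holds(stars, ["name"], ["address"])
stars.append({"name": "Jane Doe", "address": "9 Elm Rd.", "gender": "F", "birthdate": "1970-01-01"})
broken = fd_violations(stars, ["name"], ["address"])
print(ok, len(broken))  # True 2
```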

127 Keys in the E/R Model
• A key for an entity set E is a set K of one or more attributes such that, given any two distinct entities e1 and e2 in E, e1 and e2 cannot have identical values for each of the attributes in K.
  – A key can consist of more than one attribute.
  – There can be more than one possible key (candidate keys); one is designated as the "primary key".
  – Attributes that constitute the primary key cannot be null.
  – If the entity set is in an isa-hierarchy, the root entity set must have all the attributes needed for a key.
  – In an E/R diagram, the attributes of a key are underlined.

128 Keys of Relations
K is a key for relation R if:
1. K functionally determines all attributes of R, i.e. K → all attributes of R; and
2. No proper subset of K satisfies (1), i.e. a key must be minimal.
If K satisfies at least (1), then K is a superkey. Equivalently, a set of attributes that contains a key is called a superkey.
Example: {title, year, starName} forms a key for the Movie relation.

129 Discovering Keys for Relations
Consider a relation R:
1. If R comes from an entity set, then the key for the relation is the key attributes of that entity set.
2. If R comes from a many-many relationship, then the key attributes of both connected entity sets together are the key attributes for R.
3. If R comes from a many-one relationship from entity set E1 to entity set E2, then the key attributes of E1 are the key attributes for R.
4. If R comes from a one-one relationship, then the key attributes of either connected entity set are key attributes of R; there is more than one candidate key in this case.

130 FD Rules
• Splitting / combining rule
• Trivial dependencies
• Armstrong's axioms:
  1. Reflexivity
  2. Augmentation
  3. Transitivity
• Computing the closure of attributes
• Finding all implied FDs (ref. section 3.5.7)

131 Splitting / Combining Rule
• Compare the single FD (I) $A_1 A_2 \cdots A_n \to B_1 B_2 \cdots B_m$ with the set of FDs (II) $A_1 A_2 \cdots A_n \to B_1$, …, $A_1 A_2 \cdots A_n \to B_m$.
• Splitting rule: replace FD (I) by the set of FDs (II).
• Combining rule: replace the set of FDs (II) by FD (I).

132 Trivial Dependencies
• An FD $A_1 A_2 \cdots A_n \to B$ is trivial if B is one of the A's.
• Every trivial dependency holds in every relation.
• For an FD $A_1 A_2 \cdots A_n \to B_1 B_2 \cdots B_m$:
  – Trivial if the B's are a subset of the A's, e.g. title year → title
  – Nontrivial if at least one of the B's is not among the A's
  – Completely nontrivial if none of the B's is also one of the A's
• Trivial-dependency rule: $A_1 A_2 \cdots A_n \to B_1 B_2 \cdots B_m$ is equivalent to $A_1 A_2 \cdots A_n \to C_1 C_2 \cdots C_k$, where the C's are all those B's that are not also A's.

133 Armstrong's Axioms
A set of inference rules (pg. 135; 2nd ed. pg. 99):
1. Reflexivity: if $\{B_1, B_2, \ldots, B_m\} \subseteq \{A_1, A_2, \ldots, A_n\}$, then $A_1 A_2 \cdots A_n \to B_1 B_2 \cdots B_m$. (These are the trivial dependencies.)
2. Augmentation: if $A_1 A_2 \cdots A_n \to B_1 B_2 \cdots B_m$, then $A_1 A_2 \cdots A_n C_1 \cdots C_k \to B_1 B_2 \cdots B_m C_1 \cdots C_k$ for any set of attributes $C_1 \cdots C_k$.
3. Transitivity: if $A_1 A_2 \cdots A_n \to B_1 B_2 \cdots B_m$ and $B_1 B_2 \cdots B_m \to C_1 \cdots C_k$, then $A_1 A_2 \cdots A_n \to C_1 \cdots C_k$.

134 Week 7

135 Functional Dependencies (FD)
• Given a relation schema R(A1, …, An), let X and Y be subsets of {A1, …, An}. The FD X → Y means that X functionally determines Y, e.g. $A_1 A_2 \cdots A_n \to B_1 B_2 \cdots B_m$.
• Significance (keys):
  1. If R(A1, …, An) comes from an entity set and X ⊆ {A1, …, An} forms a key for R, then we may assert X → Y, where Y is any subset of {A1, …, An}.
  2. If R(A1, …, An) represents a many-one relationship from entity set E1 to entity set E2, and among the Ai's are attributes that form a key X for E1 and a key Y for E2, then we may assert X → Y.

138 Given vs. Derived FDs
• Given FDs: the FDs stated initially for a relation; these are known to hold for R.
• Derived FDs: FDs that follow logically from the given FDs, either by the inference rules or by the attribute-closure algorithm.
• Basis: any set of given FDs from which all the FDs of a relation can be inferred.
• Minimal basis: a basis such that no proper subset of its FDs can also derive the complete set of FDs.
• Example: 3.22, pp. 98

139 Closure of Attributes
• Let $\{A_1, A_2, \ldots, A_n\}$ be a set of attributes and S a set of FDs.
• The closure of $\{A_1, A_2, \ldots, A_n\}$ under the FDs in S is the set of attributes B such that every relation that satisfies all the FDs in S also satisfies $A_1 A_2 \cdots A_n \to B$; i.e. it is the set of attributes functionally determined by $\{A_1, A_2, \ldots, A_n\}$.
• Notation: $\{A_1, A_2, \ldots, A_n\}^+$
• Because trivial dependencies are allowed, $A_1, A_2, \ldots, A_n$ are always in $\{A_1, A_2, \ldots, A_n\}^+$.

140 Closure Algorithm
To compute $X = \{A_1, A_2, \ldots, A_n\}^+$:
1. Initialize $X = \{A_1, A_2, \ldots, A_n\}$.
2. Search for some FD $B_1 B_2 \cdots B_m \to C$ such that $\{B_1, B_2, \ldots, B_m\} \subseteq X$ but C is not in X. If one is found, add C to X.
3. Repeat step 2 until no more attributes can be added to X.
4. The resulting X is the correct value of $\{A_1, A_2, \ldots, A_n\}^+$.
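Here is the closure algorithm as a short Python sketch. For brevity it adds an FD's whole right side at once, which is equivalent to splitting the FD into single-attribute right sides. Attributes are single letters, so a string such as "AB" stands for {A, B}. The data is the FD set of Example 3.19.

```python
def closure(attrs, fds):
    """Attribute closure; fds is a list of (left side, right side) attribute collections."""
    x = set(attrs)                      # step 1: X = {A1, ..., An}
    changed = True
    while changed:                      # step 3: repeat until X stops growing
        changed = False
        for lhs, rhs in fds:            # step 2: an FD whose left side lies inside X
            if set(lhs) <= x and not set(rhs) <= x:
                x |= set(rhs)           # add its right side
                changed = True
    return x                            # step 4


# FDs of Example 3.19.
FDS = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CF", "B")]
print(sorted(closure("AB", FDS)))  # ['A', 'B', 'C', 'D', 'E']
print(sorted(closure("D", FDS)))   # ['D', 'E']
```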

141 Attribute Closure Example (Example 3.19, pp. 93)
Given: R(A, B, C, D, E, F) and the FDs AB → C, BC → AD, D → E, CF → B
To find: $\{A, B\}^+$
Procedure:
1. Let X = {A, B}.
2. Add C to X (by AB → C), so X = {A, B, C}.
3. Add D to X (by BC → AD), so X = {A, B, C, D}.
4. Add E to X (by D → E), so X = {A, B, C, D, E}.
5. No more attributes can be added, so $\{A, B\}^+ = \{A, B, C, D, E\}$.

142 Inferring Whether an FD Follows from a Set of FDs
• Does AB → D follow from the FDs of Example 3.19?
  – Compute $\{A, B\}^+$; if D is in the closure, then AB → D follows from the FDs.
  – Since $\{A, B\}^+ = \{A, B, C, D, E\}$, AB → D follows.
• Does D → A follow from the FDs of Example 3.19?
  – Compute $\{D\}^+$; if A is in the closure, then D → A follows from the FDs.
  – But $\{D\}^+ = \{D, E\}$, so D → A does NOT follow.

143 Attribute Closures and Keys
To test whether $\{A_1, A_2, \ldots, A_n\}$ is a key for a relation R, compute $\{A_1, A_2, \ldots, A_n\}^+$ and:
1. Check whether $\{A_1, A_2, \ldots, A_n\}^+$ is the set of all attributes of R.
2. If it is, check that there is no proper subset S of $\{A_1, A_2, \ldots, A_n\}$ such that $S^+$ is the set of all attributes of R.
If $\{A_1, A_2, \ldots, A_n\}$ satisfies 1 but not 2, then it is a superkey (but not a key).
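The key test above can be sketched in Python using the FDs of Example 3.19 over R(A, B, C, D, E, F). Closure is monotone, so step 2 only needs to try removing one attribute at a time. The helper names are illustrative.

```python
def closure(attrs, fds):
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= x and not set(rhs) <= x:
                x |= set(rhs)
                changed = True
    return x


def is_superkey(attrs, relation, fds):
    # Step 1: the closure must contain every attribute of R.
    return closure(attrs, fds) >= set(relation)


def is_key(attrs, relation, fds):
    attrs = set(attrs)
    if not is_superkey(attrs, relation, fds):
        return False
    # Step 2: no proper subset may be a superkey; by monotonicity, dropping
    # one attribute at a time is enough.
    return all(not is_superkey(attrs - {a}, relation, fds) for a in attrs)


R = "ABCDEF"
FDS = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CF", "B")]
print(is_key("ABF", R, FDS), is_key("ABCF", R, FDS))  # True False
```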

144 Finding All Implied FDs
• Motivation: given a relation R with FDs F, when we project R to form a new relation S, we want to know which FDs hold in S.
• Method: compute all FDs that
  1. follow from F, and
  2. involve only attributes of S.
• Problem: the number of such FDs may be large (potentially exponential in the number of attributes of S).
• So, simplify the attribute-closure computations:
  – Ignore the empty set and the set of all attributes (they yield only trivial FDs).
  – Drop XY → A if X → A holds.
  – If the closure of some set X is all attributes, ignore the supersets of X.

145 Algorithm to Find All Implied FDs
1. For each set of attributes X, compute $X^+$. Start with the closures of singleton sets, then move on to doubletons if necessary, and so on.
2. Add X → A for each A in $X^+ - X$.
3. Ignore "obvious" dependencies that follow from others, as described in the simplification guidelines.

146 Finding All Implied FDs: Example
• Given: R(A, B, C, D) and the FDs F: AB → C, C → D, D → A
• Which FDs follow from F?
  $A^+ = A$; $B^+ = B$: nothing
  $C^+ = ACD$: add C → A
  $D^+ = AD$: nothing new
  $(AB)^+ = ABCD$: add AB → D; skip all supersets of AB
  $(BC)^+ = ABCD$: nothing new; skip all supersets of BC
  $(BD)^+ = ABCD$: add BD → C; skip all supersets of BD
  $(AC)^+ = ACD$; $(AD)^+ = AD$; $(CD)^+ = ACD$: nothing new
  $(ACD)^+ = ACD$: nothing new
  All other sets contain AB, BC, or BD, so skip them.
• The only interesting FDs that follow from F are: C → A, AB → D, BD → C.
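Below is a Python sketch of the algorithm on slide 145 with the three simplifications, run on the example above. It enumerates subsets by size and skips supersets of sets whose closure is everything. It drops X → A when some proper subset of X already determines A, and it produces only single-attribute right sides.

```python
from itertools import combinations


def closure(attrs, fds):
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= x and not set(rhs) <= x:
                x |= set(rhs)
                changed = True
    return x


def implied_fds(attributes, fds):
    """Nontrivial FDs X -> A (single A) that follow from fds, after the simplifications."""
    attributes = sorted(attributes)
    closures = {}   # X -> X+, for every X that was examined
    full_sets = []  # sets whose closure is all attributes
    found = set()
    for size in range(1, len(attributes)):  # skip the empty set and the full set
        for combo in combinations(attributes, size):
            x = frozenset(combo)
            if any(base < x for base in full_sets):
                continue  # superset of a set whose closure is everything
            x_plus = closure(x, fds)
            closures[x] = x_plus
            for a in x_plus - x:
                # Drop XY -> A when a proper subset of the left side already determines A.
                smaller = (frozenset(s) for k in range(1, size)
                           for s in combinations(sorted(x), k))
                if not any(a in closures[s] for s in smaller):
                    found.add((x, a))
            if x_plus == set(attributes):
                full_sets.append(x)
    return found


FDS = [("AB", "C"), ("C", "D"), ("D", "A")]
given = {(frozenset(lhs), rhs) for lhs, rhs in FDS}
new = implied_fds("ABCD", FDS) - given
print(sorted(("".join(sorted(lhs)), a) for lhs, a in new))
# [('AB', 'D'), ('BD', 'C'), ('C', 'A')]
```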

147 Normalization
• Purpose: a process that eliminates redundancy in relations caused by functional or multivalued dependencies.
• Normal forms:
  – Boyce-Codd Normal Form (BCNF)
  – Third Normal Form (3NF)
  – Fourth Normal Form (4NF)

148 Boyce-Codd Normal Form (BCNF)
• Definition: a relation R is in BCNF iff whenever a nontrivial dependency $A_1 A_2 \cdots A_n \to B$ holds for R, $\{A_1, A_2, \ldots, A_n\}$ is a superkey for R.
• Alternative definition, after applying the combining rule to all FDs with a common left side: R is in BCNF iff whenever a nontrivial dependency $A_1 A_2 \cdots A_n \to B_1 B_2 \cdots B_m$ holds for R, $\{A_1, A_2, \ldots, A_n\}$ is a superkey for R.
• Guarantees that there is no redundancy caused by FDs, and prevents update, deletion, and insertion anomalies.
• E.g. title year → length filmType studioName violates BCNF, because the key of Movie is {title, year, starName} and {title, year} is not a superkey. So Movie is not in BCNF.

149 Decomposition into BCNF
Given: a relation R and FDs F.
Decomposition strategy:
1. Look for a nontrivial FD $A_1 A_2 \cdots A_n \to B_1 B_2 \cdots B_m$ that violates BCNF. (Heuristic: add to the right side as many attributes as are functionally determined by $A_1 A_2 \cdots A_n$.)
2. Compute $\{A_1, A_2, \ldots, A_n\}^+$: all the attributes involved in the violating dependency.
3. Decompose R into:
   (1) $\{A_1, A_2, \ldots, A_n\}^+$, and
   (2) $(R - \{A_1, A_2, \ldots, A_n\}^+) \cup \{A_1, A_2, \ldots, A_n\}$: the left side plus all attributes not involved in the violating dependency.
[Figure: R divided into the A's, the B's, and the other attributes]
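Here is a recursive Python sketch of the decomposition strategy, checked on the Movie example that follows. FDs of a projected schema are not given explicitly. The sketch therefore tests each subset X of the current schema S by computing X+ under the original FDs and intersecting it with S; this way of projecting FDs is an assumption of the sketch. Enumerating all subsets is exponential, so the code is only meant for small examples.

```python
from itertools import combinations


def closure(attrs, fds):
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= x and not set(rhs) <= x:
                x |= set(rhs)
                changed = True
    return x


def bcnf_violation(schema, fds):
    """A left side X inside `schema` that violates BCNF for the projected FDs, or None."""
    for size in range(1, len(schema)):
        for combo in combinations(sorted(schema), size):
            x = set(combo)
            x_plus = closure(x, fds) & schema  # what X determines within this schema
            if x_plus != x and x_plus != schema:
                return x  # nontrivial FD whose left side is not a superkey
    return None


def bcnf_decompose(schema, fds):
    schema = set(schema)
    x = bcnf_violation(schema, fds)
    if x is None:
        return [schema]
    x_plus = closure(x, fds) & schema
    r1 = x_plus                 # {A1, ..., An}+
    r2 = (schema - x_plus) | x  # (R - {A1, ..., An}+) plus {A1, ..., An}
    return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)


MOVIE = {"title", "year", "length", "filmType", "studioName", "starName"}
FDS = [({"title", "year"}, {"length", "filmType", "studioName"})]
for schema in bcnf_decompose(MOVIE, FDS):
    print(sorted(schema))
```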

150 BCNF Decomposition Example (Example 3.24, pp. 104)
• Relation: Movie(title, year, length, filmType, studioName, starName)
• Key: {title, year, starName}
• FD: title year → length filmType studioName is a BCNF violation, so Movie is not in BCNF.
• Decomposition:
  – Schema 1: {title, year, length, filmType, studioName}
  – Schema 2: {title, year, starName}
• To obtain the new relations, project Movie onto the schemas.
• To recover the information (i.e. Movie) from the new relations, take the natural join of the new relations. No information is lost.

151 Additional Points about BCNF
• A relation may have more than one key.
• The left side of any nontrivial FD must contain some key.
• Only nontrivial FDs are potential BCNF violations.
• Any two-attribute relation is in BCNF.
• A decomposition must be based on an FD that holds in the relation; otherwise, the original relation cannot be recovered from the new relations.
• BCNF eliminates all redundancy that is due to FDs.
• Some decompositions may not preserve the FDs (e.g. Example 3.32, pp. 114).

152 Third Normal Form (3NF)
• Definition: a relation R is in 3NF if, whenever a nontrivial dependency $A_1 A_2 \cdots A_n \to B$ holds for R, either $\{A_1, A_2, \ldots, A_n\}$ is a superkey for R, or B is a member of some key (i.e. B is prime).
• 3NF is BCNF with a relaxed condition.
• Some redundancy may remain in 3NF if the resulting relations are not in BCNF.
• Preserves all the FDs.
• The process is similar to BCNF decomposition, with the added condition.
• To recover the original relation from the new relations, take their natural join. No information is lost.
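The following Python sketch checks both definitions for a relation. It assumes the relation is the original one, so only the given FDs need to be tested. Bookings(title, theater, city), with theater → city and title city → theater, is a hypothetical example of a relation in 3NF but not in BCNF: city is prime because {title, city} is a key.

```python
from itertools import combinations


def closure(attrs, fds):
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= x and not set(rhs) <= x:
                x |= set(rhs)
                changed = True
    return x


def keys(attributes, fds):
    """All keys, found smallest-first so that each key found is minimal."""
    attributes = set(attributes)
    found = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            x = set(combo)
            if closure(x, fds) >= attributes and not any(k <= x for k in found):
                found.append(x)
    return found


def check_normal_forms(attributes, fds):
    """Return (in_bcnf, in_3nf), testing only the given FDs of the relation."""
    attributes = set(attributes)
    prime = set().union(*keys(attributes, fds))  # attributes that belong to some key
    in_bcnf = in_3nf = True
    for lhs, rhs in fds:
        superkey = closure(lhs, fds) >= attributes
        for b in set(rhs) - set(lhs):            # nontrivial parts only
            if not superkey:
                in_bcnf = False
                if b not in prime:
                    in_3nf = False
    return in_bcnf, in_3nf


BOOKINGS = {"title", "theater", "city"}
BOOKING_FDS = [({"theater"}, {"city"}), ({"title", "city"}, {"theater"})]
MOVIE = {"title", "year", "length", "filmType", "studioName", "starName"}
MOVIE_FDS = [({"title", "year"}, {"length", "filmType", "studioName"})]
print(check_normal_forms(BOOKINGS, BOOKING_FDS))  # (False, True)
print(check_normal_forms(MOVIE, MOVIE_FDS))       # (False, False)
```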

153 Week 9

154 Normalization
• Purpose: a process that eliminates redundancy in relations caused by functional or multivalued dependencies.
• Decompose the relation schema into normal forms:
  – Boyce-Codd Normal Form (BCNF)
  – Third Normal Form (3NF)
  – Fourth Normal Form (4NF)
• To obtain the new relations, project the original relation (e.g. Movie) onto the new schemas.
• To recover the information (i.e. the original relation, e.g. Movie) from the new relations, take the natural join of the new relations.

156 Functional Dependencies (FD)
• Given a relation schema R(A1, …, An), let X and Y be subsets of {A1, …, An}. The FD X → Y means that X functionally determines Y, e.g. $A_1 A_2 \cdots A_n \to B_1 B_2 \cdots B_m$.
• $A_1 A_2 \cdots A_n \to B_1 B_2 \cdots B_m$ is an assertion about R that the B's are dependent on the A's.

157 Multivalued Dependencies (MVD)
• Given a relation schema R, let $A_1 A_2 \cdots A_n$ and $B_1 B_2 \cdots B_m$ be subsets of the attributes of R.
• The MVD $A_1 A_2 \cdots A_n \twoheadrightarrow B_1 B_2 \cdots B_m$ holds in R if, for each pair of tuples t and u of R that agree on all the A's, we can find in R some tuple v that agrees:
  1. with both t and u on the A's,
  2. with t on the B's, and
  3. with u on all attributes of R that are not among the A's or B's.
• $A_1 A_2 \cdots A_n \twoheadrightarrow B_1 B_2 \cdots B_m$ is an assertion about R that, for fixed values of the A's, the B's are independent of the attributes that are not among the A's or B's.
• MVDs cause redundancy that is not related to FDs, even in a BCNF schema.
• Most common source: putting two or more many-many relationships in a single relation.
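The MVD definition becomes a check on a relation instance, as in the Python sketch below. For each pair t, u that agrees on the A's, the required tuple v is fully determined: its A's and B's come from t and the rest from u, so the check just looks v up. The Star_Star_In rows are hypothetical. The same instance also illustrates two later rules: the complementation rule holds, and the splitting rule fails.

```python
# Hypothetical Star_Star_In instance: one star, two addresses, three movies.
ATTRS = ("name", "street", "city", "title", "year")
ADDRESSES = [("123 Maple St.", "Hollywood"), ("5 Locust Ln.", "Malibu")]
MOVIES = [("Star Wars", 1977), ("Empire Strikes Back", 1980), ("Return of the Jedi", 1983)]
ROWS = {("Carrie Fisher",) + a + m for a in ADDRESSES for m in MOVIES}


def mvd_holds(attributes, rows, lhs, rhs):
    """Check lhs ->> rhs on a set of tuples over `attributes`."""
    rows = set(rows)
    for t in rows:
        for u in rows:
            if any(t[i] != u[i] for i, a in enumerate(attributes) if a in lhs):
                continue  # t and u must agree on all the A's
            # v: A's and B's taken from t, every other attribute taken from u.
            v = tuple(t[i] if (a in lhs or a in rhs) else u[i]
                      for i, a in enumerate(attributes))
            if v not in rows:
                return False
    return True


print(mvd_holds(ATTRS, ROWS, ["name"], ["street", "city"]))  # True
print(mvd_holds(ATTRS, ROWS, ["name"], ["street"]))          # False: no splitting
fewer = ROWS - {("Carrie Fisher", "123 Maple St.", "Hollywood", "Star Wars", 1977)}
print(mvd_holds(ATTRS, fewer, ["name"], ["street", "city"]))  # False
```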

158 MVD Rules
• Trivial-dependencies rule: if $A_1 A_2 \cdots A_n \twoheadrightarrow B_1 B_2 \cdots B_m$ holds for R, then $A_1 A_2 \cdots A_n \twoheadrightarrow C_1 C_2 \cdots C_k$ holds, where the C's are the B's plus one or more of the A's. The converse also holds.
• Transitive rule: if $A_1 A_2 \cdots A_n \twoheadrightarrow B_1 B_2 \cdots B_m$ and $B_1 B_2 \cdots B_m \twoheadrightarrow C_1 C_2 \cdots C_k$, then $A_1 A_2 \cdots A_n \twoheadrightarrow C_1 C_2 \cdots C_k$.
• The splitting rule does NOT hold for MVDs. E.g. name ↠ street city holds, but name ↠ street does not.
• So always keep the whole set of attributes on the right side, because the splitting rule does not hold.

159 More MVD Rules
• Every FD is an MVD: if the FD $A_1 A_2 \cdots A_n \to B_1 B_2 \cdots B_m$ holds, then swapping the B's between tuples that agree on the A's does not create new tuples.
• Complementation rule: if X ↠ Y, then X ↠ Z, where Z is all attributes not in X or Y.
  E.g. Star_Star_In(name, street, city, title, year): since name ↠ street city, also name ↠ title year.
[Figure: tuples t and u that agree on the A's, with their B's swapped]

160 Nontrivial MVD
An MVD $A_1 A_2 \cdots A_n \twoheadrightarrow B_1 B_2 \cdots B_m$ for a relation R is nontrivial if:
1. $\{B_1, B_2, \ldots, B_m\}$ is not a subset of $\{A_1, A_2, \ldots, A_n\}$, and
2. $\{A_1, \ldots, A_n\} \cup \{B_1, \ldots, B_m\}$ is not the set of all attributes of R.

161 Fourth Normal Form (4NF)
• Decompose relations that have nontrivial MVDs into 4NF to eliminate the redundancy the MVDs cause.
• Definition: R is in 4NF if, whenever $A_1 A_2 \cdots A_n \twoheadrightarrow B_1 B_2 \cdots B_m$ is a nontrivial MVD, $\{A_1, A_2, \ldots, A_n\}$ is a superkey.
• Since every FD is an MVD, 4NF is more stringent than BCNF.
• Only nontrivial MVDs can violate 4NF.

162 4NF Decomposition
Given: a relation R and a nontrivial MVD X ↠ Y that violates 4NF.
1. Decompose R into the schemas X ∪ Y and X ∪ (R − Y).
2. Produce the new relations by projecting R onto X ∪ Y and X ∪ (R − Y).
3. Reconstruct R from the new relations by natural join.
E.g. Star_Star_In(name, street, city, title, year) with name ↠ street city: decompose Star_Star_In into {name, street, city} and {name, title, year}.
[Figure: R divided into X, Y, and the remaining attributes]
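The three steps can be sketched in Python on a hypothetical Star_Star_In instance with one star, two addresses, and three movies. Projecting onto {name, street, city} and {name, title, year} stores 2 + 3 tuples instead of 6. The natural join then rebuilds the original relation.

```python
# Hypothetical Star_Star_In instance.
ATTRS = ("name", "street", "city", "title", "year")
ADDRESSES = [("123 Maple St.", "Hollywood"), ("5 Locust Ln.", "Malibu")]
MOVIES = [("Star Wars", 1977), ("Empire Strikes Back", 1980), ("Return of the Jedi", 1983)]
ROWS = {("Carrie Fisher",) + a + m for a in ADDRESSES for m in MOVIES}


def project(attrs, rows, keep):
    idx = [attrs.index(a) for a in keep]
    return {tuple(r[i] for i in idx) for r in rows}


def natural_join(attrs1, rows1, attrs2, rows2):
    common = [a for a in attrs1 if a in attrs2]
    extra = [a for a in attrs2 if a not in attrs1]
    out = set()
    for r in rows1:
        for s in rows2:
            if all(r[attrs1.index(a)] == s[attrs2.index(a)] for a in common):
                out.add(r + tuple(s[attrs2.index(a)] for a in extra))
    return tuple(attrs1) + tuple(extra), out


# X = {name}, Y = {street, city}: schemas X ∪ Y and X ∪ (R - Y).
XY = ("name", "street", "city")
X_REST = ("name", "title", "year")
r1 = project(ATTRS, ROWS, XY)
r2 = project(ATTRS, ROWS, X_REST)
joined_attrs, joined = natural_join(XY, r1, X_REST, r2)
print(len(ROWS), len(r1), len(r2), joined == ROWS)  # 6 2 3 True
```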

163 Relationships Among Normal Forms
• 4NF is the most stringent.
• 4NF ⇒ BCNF ⇒ 3NF: a relation in 4NF is also in BCNF, and a relation in BCNF is also in 3NF.

164 Lossless-Join Decomposition
Given: a relation schema R decomposed into schemas R1, R2, …, Rk, and a set of dependencies D.
Definition: the decomposition R1, R2, …, Rk is lossless-join (w.r.t. D) if, for every relation r for R satisfying D:
  $r = \pi_{R_1}(r) \bowtie \pi_{R_2}(r) \bowtie \cdots \bowtie \pi_{R_k}(r)$
i.e. every relation r for R is the natural join of its projections onto the Ri's.
The lossless-join property is necessary if the decomposed relation is to be recoverable from its decomposition. However, joins are expensive, so don't over-decompose!
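Here is a small Python check of the definition for a hypothetical R(A, B, C) split into AB and BC. When B → C holds, the join returns r exactly. When it does not, the join contains spurious tuples, so the decomposition is not lossless for that instance.

```python
ATTRS = ("A", "B", "C")


def project(attrs, rows, keep):
    idx = [attrs.index(a) for a in keep]
    return {tuple(r[i] for i in idx) for r in rows}


def natural_join(attrs1, rows1, attrs2, rows2):
    common = [a for a in attrs1 if a in attrs2]
    extra = [a for a in attrs2 if a not in attrs1]
    out = set()
    for r in rows1:
        for s in rows2:
            if all(r[attrs1.index(a)] == s[attrs2.index(a)] for a in common):
                out.add(r + tuple(s[attrs2.index(a)] for a in extra))
    return tuple(attrs1) + tuple(extra), out


def lossless(rows, r1_attrs, r2_attrs):
    """True if joining the two projections gives back exactly `rows`."""
    r1 = project(ATTRS, rows, r1_attrs)
    r2 = project(ATTRS, rows, r2_attrs)
    _, joined = natural_join(r1_attrs, r1, r2_attrs, r2)
    return joined == set(rows)


print(lossless({(1, 2, 3), (4, 2, 3)}, ("A", "B"), ("B", "C")))  # True  (B -> C holds)
print(lossless({(1, 2, 3), (4, 2, 5)}, ("A", "B"), ("B", "C")))  # False (spurious tuples)
```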

165 Structured Query Language (SQL)
• A DDL and DML for relational DBMSs
• History: ANSI SQL, SQL-92 (SQL2), SQL-99 (SQL3)
• SQL-99 extends SQL2 with object-relational features and other new features.
• Most DBMS vendors implement the core, then add bells and whistles and variations.
• Query capability is close to relational algebra, with many extensions.
• Keywords and names are case-insensitive, but characters inside quoted strings ' ' are not, e.g. 'Smith' ≠ 'SMITH'.
• ; is the statement delimiter.

166 Example Database Schema
Movie(title, year, length, inColor, studioName, producerC#)
StarsIn(movieTitle, movieYear, starName)
MovieStar(name, address, gender, birthdate)
MovieExec(name, address, cert#, netWorth)
Studio(name, address, presC#)

167 SQL Queries: Basic Form
SELECT attribute(s)
FROM relations / views / subquery
WHERE conditional expression;

168 SQL Query Examples
1. Example 1:
   SELECT * FROM Movie;   -- * => all attributes of Movie
2. Example 2:
   SELECT * FROM Movie WHERE studioName = 'Disney' AND year = 1990;
3. Example 3:
   SELECT title, length FROM Movie WHERE studioName = 'Disney' AND year = 1990;
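The three examples can be tried with Python's built-in sqlite3 module on a few hypothetical Movie rows. SQLite is used only for convenience, and its dialect differs from the standard in places.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "producerC#" is quoted because # is not a regular identifier character.
conn.execute('CREATE TABLE Movie (title TEXT, year INT, length INT, '
             'inColor INT, studioName TEXT, "producerC#" INT)')
conn.executemany(
    "INSERT INTO Movie VALUES (?, ?, ?, ?, ?, ?)",
    [("Pretty Woman", 1990, 119, 1, "Disney", 999),  # hypothetical rows
     ("Star Wars", 1977, 124, 1, "Fox", 555),
     ("Mighty Ducks", 1992, 104, 1, "Disney", 444)],
)

example1 = conn.execute("SELECT * FROM Movie;").fetchall()
example2 = conn.execute(
    "SELECT * FROM Movie WHERE studioName = 'Disney' AND year = 1990;").fetchall()
example3 = conn.execute(
    "SELECT title, length FROM Movie WHERE studioName = 'Disney' AND year = 1990;").fetchall()
print(example3)  # [('Pretty Woman', 119)]
```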

169 Duplicates
• SQL generally operates on bags (multisets) instead of sets.
• Exceptions: the UNION, INTERSECT, and EXCEPT operations.
• To eliminate duplicates, add the keyword DISTINCT to the SELECT clause, e.g.
  SELECT DISTINCT starName
  FROM StarsIn;
• Duplicate elimination is costly; use it judiciously.
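Here is a quick sqlite3 check of bag versus set semantics on hypothetical StarsIn rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE StarsIn (movieTitle TEXT, movieYear INT, starName TEXT)")
conn.executemany("INSERT INTO StarsIn VALUES (?, ?, ?)", [
    ("Star Wars", 1977, "Carrie Fisher"),  # hypothetical rows
    ("Star Wars", 1977, "Mark Hamill"),
    ("Empire Strikes Back", 1980, "Carrie Fisher"),
])
bag = conn.execute("SELECT starName FROM StarsIn;").fetchall()            # keeps duplicates
as_set = conn.execute("SELECT DISTINCT starName FROM StarsIn;").fetchall()
print(len(bag), len(as_set))  # 3 2
```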

170 SQL Correspondence to Relational Algebra
SELECT L    -- projection in R.A.
FROM R      -- the R.A. operands
WHERE C;    -- selection in R.A.
R.A. expression: $\pi_L(\sigma_C(R))$
When reading and writing queries:
1. FROM: which relations are involved
2. WHERE: what the tuple-selection criteria are
3. SELECT: which columns to output

171 Union, Intersection, Difference of Queries
• UNION: R1 UNION R2, or (Q1) UNION (Q2), e.g.
  (SELECT title, year FROM Movie)
  UNION
  (SELECT movieTitle AS title, movieYear AS year FROM StarsIn);
• INTERSECT: R1 INTERSECT R2, or (Q1) INTERSECT (Q2)
• EXCEPT (difference): R1 EXCEPT R2, or (Q1) EXCEPT (Q2)

172 Union, Intersection, Difference of Queries (continued)
• Q1 and Q2 are queries that produce relations.
• R1 and R2 (or the results of Q1 and Q2) must have the same list of attributes and attribute types; rename attributes if necessary.
• Duplicates are eliminated automatically.
• Add the keyword ALL after UNION, INTERSECT, or EXCEPT to prevent duplicate elimination.
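The set operators can be tried in SQLite through Python on hypothetical rows. Two SQLite-specific caveats are assumed here. First, its grammar does not accept parentheses around the operands, so (Q1) UNION (Q2) is written Q1 UNION Q2. Second, it supports UNION ALL but not INTERSECT ALL or EXCEPT ALL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movie (title TEXT, year INT)")
conn.execute("CREATE TABLE StarsIn (movieTitle TEXT, movieYear INT, starName TEXT)")
conn.executemany("INSERT INTO Movie VALUES (?, ?)",
                 [("Star Wars", 1977), ("Mighty Ducks", 1992)])  # hypothetical rows
conn.executemany("INSERT INTO StarsIn VALUES (?, ?, ?)", [
    ("Star Wars", 1977, "Carrie Fisher"),
    ("Star Wars", 1977, "Mark Hamill"),
    ("Empire Strikes Back", 1980, "Carrie Fisher"),
])

Q1 = "SELECT title, year FROM Movie"
Q2 = "SELECT movieTitle AS title, movieYear AS year FROM StarsIn"  # renamed to match Q1


def run(op):
    # Written as "Q1 op Q2" because SQLite rejects "(Q1) op (Q2)".
    return sorted(conn.execute(f"{Q1} {op} {Q2}").fetchall())


union, union_all = run("UNION"), run("UNION ALL")
intersect, difference = run("INTERSECT"), run("EXCEPT")
print(len(union), len(union_all))  # 3 5
```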

173 SQL and Relational Algebra
• SQL implements the six independent operations of relational algebra (selection, projection, product, union, difference, and renaming).
• SQL is relationally complete.

174 Some Data Values in SQL
1. Strings
2. Dates and times
3. Null values
4. The truth value UNKNOWN

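175 1. Strings
• The comparison operators =, <>, <, >, <=, >= compare strings in lexicographic order.
• LIKE: pattern matching
  – % matches any sequence of 0 or more characters.
  – _ matches any one character.
  – E.g. title LIKE 'Star ____' (four underscores) matches titles consisting of "Star " followed by exactly four characters.
  – E.g. title LIKE '%''s%' matches titles containing 's; two consecutive quotes inside a string denote one quote character.
• An escape character can be specified:
  – E.g. title LIKE 'x%x%' ESCAPE 'x': each x% denotes a literal %, so this pattern matches exactly the string '%%'.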
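The LIKE examples can be run in SQLite on hypothetical titles. Note that SQLite's LIKE is case-insensitive for ASCII letters by default, unlike the standard; the test data avoids relying on case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movie (title TEXT)")
conn.executemany("INSERT INTO Movie VALUES (?)",  # hypothetical titles
                 [("Star Wars",), ("Star Trek",), ("Star Wars II",), ("Logan's Run",), ("%%",)])


def titles(where):
    return sorted(row[0] for row in conn.execute(f"SELECT title FROM Movie WHERE {where}"))


four_chars = titles("title LIKE 'Star ____'")         # "Star " plus exactly four characters
possessive = titles("title LIKE '%''s%'")             # '' inside a literal is one quote
literal_pct = titles("title LIKE 'x%x%' ESCAPE 'x'")  # each x% is a literal %
print(four_chars, possessive, literal_pct)
```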

176 2. Dates and Times
• Date constant: DATE '2002-10-01'
• Time constant: TIME '15:00:02.5'
• Timestamp (combines a date and a time): TIMESTAMP '2002-10-01 15:00:02.5' (beware of implementation differences!)
• The comparison operators apply.

177 3. Null Values
• NULL can represent:
  1. a value that is unknown,
  2. a value that is inapplicable, or
  3. a value that is withheld.
• Operations involving NULL:
  1. Arithmetic operation: the result is NULL.
  2. Comparison: the result is UNKNOWN.
• NULL is not a constant, so NULL cannot be used explicitly as an operand.
• Use IS NULL and IS NOT NULL to test for nulls.
• Read "Pitfalls Regarding Nulls", pp. 250.
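Here is NULL behaviour in SQLite, shown on a hypothetical one-column table. The comparison x = x is UNKNOWN for the NULL row, so WHERE drops that row, while IS NULL is the proper test. The example avoids writing NULL literally as an operand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (x INT)")
conn.executemany("INSERT INTO T VALUES (?)", [(1,), (None,)])  # None is stored as NULL

arithmetic = set(conn.execute("SELECT x + 1 FROM T").fetchall())        # NULL + 1 is NULL
self_equal = conn.execute("SELECT x FROM T WHERE x = x").fetchall()     # UNKNOWN for NULL
null_rows = conn.execute("SELECT x FROM T WHERE x IS NULL").fetchall()  # the proper test
print(arithmetic, self_equal, null_rows)
```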

178 4. UNKNOWN
• Let TRUE = 1, FALSE = 0, and UNKNOWN = 1/2. Then:
  1. AND of two truth values = the minimum of the two values.
  2. OR of two truth values = the maximum of the two values.
  3. NOT v = 1 − v.
• Refer to Fig. 6.2, pp. 250, for the truth table of three-valued logic.
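The min/max/1 − v rules fit in a tiny Python sketch that prints the AND/OR table for three-valued logic. The numeric encoding is exactly the one on the slide.

```python
TRUE, FALSE, UNKNOWN = 1.0, 0.0, 0.5
NAMES = {TRUE: "TRUE", FALSE: "FALSE", UNKNOWN: "UNKNOWN"}


def and_(x, y):
    return min(x, y)


def or_(x, y):
    return max(x, y)


def not_(x):
    return 1 - x


for x in (TRUE, UNKNOWN, FALSE):
    for y in (TRUE, UNKNOWN, FALSE):
        print(f"{NAMES[x]:7} {NAMES[y]:7}  AND={NAMES[and_(x, y)]:7}  OR={NAMES[or_(x, y)]}")
```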

179 The Six Clauses in SQL Queries
1. SELECT (required)
2. FROM (required)
3. WHERE
4. GROUP BY
5. HAVING (if used, must follow a GROUP BY clause)
6. ORDER BY
• Subqueries may appear in the FROM clause and the WHERE clause.
• Comments begin with '--'.

180 Table level SQL (ref. 6.6, p. 292)
• Create table – to define the schema of a base table (ref. 6.6.1 for data type syntax), e.g.
    create table EMP (
        empno int not null,
        lastName varchar(30) not null,
        firstName varchar(30) not null,
        num_of_children int,
        constraint pk_EMP primary key (empno)
    );
• Drop table – to destroy a base table, e.g.
    drop table EMP;

181 Tuple Modification Statements (ref. 6.5, p. 286)
• Insert – to add a row
  Syntax: insert into R(A1, ..., An) values (v1, ..., vn)
  – E.g. insert into emp(empno, lastName, firstName, num_of_children) values (12345, 'Doe', 'John', 1)
  – Or, giving values in the table's column order: insert into emp values (12345, 'Doe', 'John', 1)
• Delete – to remove the rows that satisfy a condition
  Syntax: delete from R where <condition>
  – E.g. delete from emp where empno = 12345
• Update – to modify the contents of rows
  Syntax: update R set Ai = value where Aj = targetValue
  – E.g. update emp set num_of_children = 2 where empno = 12345

182 Some JOINS in SQL (ref. p. 270)
• CROSS JOIN – R.A. Cartesian product, e.g. Movie CROSS JOIN StarsIn;
• JOIN … ON – R.A. theta-join, e.g. Movie JOIN StarsIn ON title = movieTitle AND year = movieYear;
• NATURAL JOIN – R.A. natural join, e.g. MovieStar NATURAL JOIN MovieExec; (in standard SQL the keyword NATURAL is required; a plain JOIN needs an ON condition)
• OUTERJOINS – joins that include dangling tuples

183 OUTERJOINS
• An operator that augments the result of a join with the dangling tuples, padded with null values.
• Full outerjoin of R1 and R2: includes all rows from R1 and R2, matched or not. Unmatched rows are padded with NULLs.
• LEFT outerjoin of R1 and R2: includes all rows from R1, matched or not, plus the matching values from R2. Unmatched rows are padded with NULLs.
• RIGHT outerjoin of R1 and R2: includes all rows from R2, matched or not, plus the matching values from R1. Unmatched rows are padded with NULLs.
• The join may be a NATURAL join or a theta join.
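The padding rules above can be traced with a nested-loop sketch. In this hypothetical Java class, Java null stands in for SQL NULL, and the join condition is equality on one column of each relation. Both joined rows are kept side by side (theta-join style), so the join column appears twice. Two flags select between inner, LEFT, RIGHT, and FULL behavior. The sample titles are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical nested-loop sketch of outer joins on one equality condition R[rk] = S[sk].
// Rows are String arrays; Java null stands in for SQL NULL.
class OuterJoin {
    // keepDanglingR = LEFT behavior, keepDanglingS = RIGHT behavior, both = FULL, neither = inner.
    static List<String[]> join(List<String[]> r, int rk, int rWidth,
                               List<String[]> s, int sk, int sWidth,
                               boolean keepDanglingR, boolean keepDanglingS) {
        List<String[]> out = new ArrayList<>();
        boolean[] sMatched = new boolean[s.size()];
        for (String[] x : r) {
            boolean xMatched = false;
            for (int j = 0; j < s.size(); j++) {
                String[] y = s.get(j);
                // NULL = anything is UNKNOWN, so a null join value never matches.
                if (x[rk] != null && x[rk].equals(y[sk])) {
                    out.add(concat(x, y));
                    xMatched = true;
                    sMatched[j] = true;
                }
            }
            if (!xMatched && keepDanglingR) out.add(concat(x, new String[sWidth])); // pad with NULLs
        }
        if (keepDanglingS) {
            for (int j = 0; j < s.size(); j++) {
                if (!sMatched[j]) out.add(concat(new String[rWidth], s.get(j))); // pad with NULLs
            }
        }
        return out;
    }

    private static String[] concat(String[] a, String[] b) {
        String[] c = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, c, a.length, b.length);
        return c;
    }

    public static void main(String[] args) {
        List<String[]> movie = List.of(new String[] {"Star Trek", "1979"}, new String[] {"Title X", "1990"});
        List<String[]> starsIn = List.of(new String[] {"Star Trek", "Star A"}, new String[] {"Title Y", "Star B"});
        for (String[] row : join(movie, 0, 2, starsIn, 0, 2, true, true)) { // FULL outer join
            System.out.println(Arrays.toString(row));
        }
    }
}
```

A NATURAL outer join would additionally merge the two copies of each common attribute into one column.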

184 Outerjoins Syntax
1. R1 NATURAL {FULL | LEFT | RIGHT} OUTER JOIN R2;
   E.g. 1. MovieStar NATURAL FULL OUTER JOIN MovieExec;
   E.g. 2. MovieStar NATURAL LEFT OUTER JOIN MovieExec;
   E.g. 3. MovieStar NATURAL RIGHT OUTER JOIN MovieExec;

185 Outerjoins Syntax (continued)
2. R1 {FULL | LEFT | RIGHT} OUTER JOIN R2 ON conditional expression;
   E.g. 1. Movie FULL OUTER JOIN StarsIn ON title = movieTitle AND year = movieYear;
   E.g. 2. Movie LEFT OUTER JOIN StarsIn ON title = movieTitle AND year = movieYear;
   E.g. 3. Movie RIGHT OUTER JOIN StarsIn ON title = movieTitle AND year = movieYear;

186 Use result of joins as subqueries in queries
• E.g. SELECT title, year, length, inColor, studioName, producerC#, starName
       FROM Movie JOIN StarsIn ON title = movieTitle AND year = movieYear;

187 Week 10

188 SQL (continued)
• SQL continued
• Oracle sequences
• Views

189 SQL Correspondence to Relational Algebra
    SELECT L   -- R.A. projection
    FROM R     -- R.A. operands
    WHERE C;   -- R.A. selection
• R.A. expression: π_L(σ_C(R))
• When reading and writing queries:
  1. FROM -- which relations are involved
  2. WHERE -- what the tuple selection criteria are
  3. SELECT -- which columns to output

193 SQL data types (p. 292, 6.6.1)
• Character strings:
  – char(n) -- fixed length
  – varchar(n) -- varying length, up to n characters
• Integers: int or integer
• Floating-point numbers: float, real
• Fixed-point numbers: dec(m, n) or decimal(m, n) -- m digits in total, n of them after the decimal point, e.g. dec(7, 2) holds values such as 12345.67
• Dates and times: date, time, timestamp (datetime in some systems)
• Refer to the handout for descriptions of more SQL data types.

194 SQL syntax: the Six Clauses in SQL Queries
1. SELECT -- required
2. FROM -- required
3. WHERE
4. GROUP BY
5. HAVING -- if used, must follow a GROUP BY clause; chooses groups based on some aggregate property of the group itself
6. ORDER BY -- sorts the output
• E.g.
    Select name, sum(length)
    From MovieExec, Movie
    Where producerC# = cert#
    Group by name
    Having min(year) < 1930
    Order by name desc
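The order in which the clauses of the example query take effect can be traced in Java. This hypothetical sketch assumes FROM and WHERE have already produced joined (name, year, length) rows. It then applies GROUP BY name, HAVING min(year) < 1930, SELECT name, sum(length), and ORDER BY name DESC. The producer names and numbers in main are made up.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of GROUP BY / HAVING / ORDER BY evaluation for the example query.
// Input rows are assumed to be the already-joined (name, year, length) tuples.
class GroupByHaving {
    static LinkedHashMap<String, Integer> run(List<Object[]> joinedRows) {
        // GROUP BY name; each group keeps {sum(length), min(year)}.
        // The TreeMap's reverse order doubles as ORDER BY name DESC.
        TreeMap<String, int[]> groups = new TreeMap<>(Comparator.<String>reverseOrder());
        for (Object[] row : joinedRows) {
            String name = (String) row[0];
            int year = (Integer) row[1];
            int length = (Integer) row[2];
            int[] agg = groups.computeIfAbsent(name, k -> new int[] {0, Integer.MAX_VALUE});
            agg[0] += length;                // sum(length)
            agg[1] = Math.min(agg[1], year); // min(year)
        }
        // HAVING min(year) < 1930 filters whole groups; SELECT outputs name and sum(length).
        LinkedHashMap<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, int[]> g : groups.entrySet()) {
            if (g.getValue()[1] < 1930) result.put(g.getKey(), g.getValue()[0]);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Object[]> rows = List.of(
            new Object[] {"Exec A", 1927, 90},
            new Object[] {"Exec A", 1960, 109},
            new Object[] {"Exec B", 1931, 117},
            new Object[] {"Exec C", 1923, 136});
        System.out.println(run(rows)); // {Exec C=136, Exec A=199}
    }
}
```

HAVING tests a property of the whole group. "Exec A" qualifies because one of its movies is from 1927, and the sum still includes its 1960 movie.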

195 SQL syntax (continued)
• Subqueries may appear in the FROM clause and the WHERE clause.
• Comments begin with '--'.
• Tuple variables: aliases for a relation, defined in the FROM clause
  – E.g. Select Star1.name, Star2.name
         From MovieStar Star1, MovieStar Star2
         Where Star1.address = Star2.address and Star1.name < Star2.name

196 Interpreting Multi-relation Queries in SQL
• To define the meaning of select-from-where, consider MovieStar(name, addr, gender, bDate) and the query:
    SELECT Star1.name, Star2.name
    FROM MovieStar Star1, MovieStar Star2
    WHERE Star1.addr = Star2.addr AND Star1.name < Star2.name;
• Sample MovieStar instance:
    name         addr       gender  bDate
    J. Anniston  Addr ABC   F       8/88/88
    H. Ford      Addr LMN   M       7/7/77
    B. Pitt      Addr ABC   M       8/1/88

197 3 Equivalent Ways of Interpretation
1. Nested loops
2. Parallel assignment
   – No explicitly created nested loops
   – Consider all possible assignments of tuples from the relations to the tuple variables
   – Each assignment that makes the WHERE clause true contributes a tuple to the answer
   – Construct the resulting tuples using the attributes of the SELECT clause
   – Similar to tuple-based Datalog rule evaluation; the WHERE clause serves the purpose of the arithmetic subgoals

198 Evaluation of Rules – Tuple based
• Consider all assignments of tuples to subgoals that make each subgoal true. If the variables are assigned consistent values, add the head to the result.
• Start with the non-negated relational subgoals only.
• Example, with R = {(1, 2), (2, 3)}:
  – Start with R(x, z) and R(z, y). Possible tuple assignments: R(x, z) may be (1, 2) or (2, 3), and R(z, y) may be (1, 2) or (2, 3).
  – Only one assignment, R(x, z) = (1, 2) with R(z, y) = (2, 3), gives a consistent value to z (z = 2).
  – It also makes the third subgoal true.
  – So, S = {(1, 3)}.

199 3. Conversion to R.A.
• FROM: Cartesian product of the relations (rename if necessary)
• WHERE: the selection condition
• SELECT: the attributes of the projection
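The nested-loops interpretation can be run on the sample MovieStar data of slide 196. There is one loop per tuple variable in FROM and the WHERE test sits in the innermost loop. The SELECT clause builds each output tuple. The result is the same as π_{Star1.name, Star2.name}(σ_C(MovieStar × MovieStar)). This is a hypothetical sketch. It uses Java's String.compareTo for "<", which may differ from a DBMS's collation order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: nested-loops reading of
//   SELECT Star1.name, Star2.name FROM MovieStar Star1, MovieStar Star2
//   WHERE Star1.addr = Star2.addr AND Star1.name < Star2.name;
class NestedLoops {
    // Each row is {name, addr, gender, bDate}.
    static List<String> sameAddressPairs(String[][] movieStar) {
        List<String> answer = new ArrayList<>();
        for (String[] star1 : movieStar) {          // tuple variable Star1
            for (String[] star2 : movieStar) {      // tuple variable Star2
                boolean where = star1[1].equals(star2[1])        // Star1.addr = Star2.addr
                        && star1[0].compareTo(star2[0]) < 0;     // Star1.name < Star2.name
                if (where) answer.add(star1[0] + ", " + star2[0]); // SELECT Star1.name, Star2.name
            }
        }
        return answer;
    }

    public static void main(String[] args) {
        String[][] movieStar = {
            {"J. Anniston", "Addr ABC", "F", "8/88/88"},
            {"H. Ford", "Addr LMN", "M", "7/7/77"},
            {"B. Pitt", "Addr ABC", "M", "8/1/88"},
        };
        System.out.println(sameAddressPairs(movieStar)); // [B. Pitt, J. Anniston]
    }
}
```

The condition Star1.name < Star2.name removes two kinds of output. It drops a star paired with itself, and it drops the same pair listed a second time in reverse order.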

200 Unintuitive Result of SQL Interpretation
• Given: unary relations R, S, T
• To compute: R ∩ (S ∪ T)
• Consider the query:
    SELECT R.A FROM R, S, T WHERE R.A = S.A OR R.A = T.A;
• Consider the case where T is empty: the product R × S × T is then empty, so the query returns nothing even if R and S have values in common.
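The pitfall can be reproduced by contrasting the literal nested-loops meaning of the query with the intended R ∩ (S ∪ T). This is a hypothetical Java sketch on unary relations held as integer lists.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: the nested-loops meaning of
//   SELECT R.A FROM R, S, T WHERE R.A = S.A OR R.A = T.A;
// versus the intended R intersect (S union T).
class EmptyTPitfall {
    static List<Integer> sqlMeaning(List<Integer> r, List<Integer> s, List<Integer> t) {
        List<Integer> out = new ArrayList<>();
        for (Integer ra : r)
            for (Integer sa : s)
                for (Integer ta : t)      // if T is empty, this body never runs
                    if (ra.equals(sa) || ra.equals(ta)) out.add(ra);
        return out;
    }

    static Set<Integer> intended(List<Integer> r, List<Integer> s, List<Integer> t) {
        Set<Integer> sOrT = new LinkedHashSet<>(s);
        sOrT.addAll(t);                   // S union T
        Set<Integer> out = new LinkedHashSet<>(r);
        out.retainAll(sOrT);              // R intersect (S union T)
        return out;
    }

    public static void main(String[] args) {
        List<Integer> r = List.of(1, 2), s = List.of(1), t = List.of();
        System.out.println(sqlMeaning(r, s, t)); // []
        System.out.println(intended(r, s, t));   // [1]
    }
}
```

The SQL version can also return duplicates when T is non-empty, because each matching combination of S and T tuples contributes one copy of R.A.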

203 Outerjoins Syntax
1. R1 NATURAL {FULL | LEFT | RIGHT} OUTER JOIN R2;
   E.g. 1. MovieStar NATURAL FULL OUTER JOIN MovieExec;
   E.g. 2. MovieStar NATURAL LEFT OUTER JOIN MovieExec;
   E.g. 3. MovieStar NATURAL RIGHT OUTER JOIN MovieExec;

204 Outerjoins Syntax (continued)
2. R1 {FULL | LEFT | RIGHT} OUTER JOIN R2 ON conditional expression;
   E.g. 1. Movie FULL OUTER JOIN StarsIn ON title = movieTitle AND year = movieYear;
   E.g. 2. Movie LEFT OUTER JOIN StarsIn ON title = movieTitle AND year = movieYear;
   E.g. 3. Movie RIGHT OUTER JOIN StarsIn ON title = movieTitle AND year = movieYear;

206 Conditions involving Relations
1. EXISTS R – true iff R is not empty.
2. S IN R – true iff S equals one of the values in R; S NOT IN R is its negation.
3. S > ALL R – true iff S > every value in R (> can be replaced by the other comparison operators). NOT applies.
4. S > ANY R – true iff S > at least one value in R (> can be replaced by the other comparison operators). NOT applies.
• Case 1: S is a scalar, R is unary.
• Case 2: S is a tuple, R is a multi-attribute relation (the arity of S must equal the arity of R).
• When comparing tuples, the first components have the highest priority and later components break ties, e.g. (1, 7, 8) < (2, 0, 1).
• Ref: 6.3-6.4.
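The conditions above have simple loop implementations once tuple comparison is defined. This hypothetical Java sketch ignores NULLs. It treats a scalar as a tuple of arity 1 and compares tuples lexicographically. It also shows the empty-relation edge cases: > ALL over an empty R is true, and > ANY over an empty R is false.

```java
// Hypothetical sketch of EXISTS, IN, > ALL and > ANY for integer tuples (NULLs ignored).
// A scalar is a tuple of arity 1; a relation is an array of tuples.
class Conditions {
    // Lexicographic order: the first component decides, later ones break ties.
    static int compareTuples(int[] a, int[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("arity mismatch");
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) return Integer.compare(a[i], b[i]);
        }
        return 0;
    }

    static boolean exists(int[][] r) { return r.length > 0; }           // EXISTS R

    static boolean in(int[] s, int[][] r) {                               // S IN R
        for (int[] t : r) if (compareTuples(s, t) == 0) return true;
        return false;
    }

    static boolean greaterThanAll(int[] s, int[][] r) {                   // S > ALL R
        for (int[] t : r) if (compareTuples(s, t) <= 0) return false;
        return true;                                                      // vacuously true if R is empty
    }

    static boolean greaterThanAny(int[] s, int[][] r) {                   // S > ANY R
        for (int[] t : r) if (compareTuples(s, t) > 0) return true;
        return false;                                                     // false if R is empty
    }

    public static void main(String[] args) {
        System.out.println(compareTuples(new int[] {1, 7, 8}, new int[] {2, 0, 1}) < 0); // true
        System.out.println(greaterThanAll(new int[] {5}, new int[0][]));                 // true
    }
}
```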

208 Tuple Modification Statements (ref. 6.5, p. 286)
• Insert – to add a row
  Syntax: insert into R(A1, ..., An) values (v1, ..., vn)
  – E.g. insert into emp(empno, lastName, firstName, num_of_children) values (12345, 'Doe', 'John', 1)
  – Or, giving values in the table's column order: insert into emp values (12345, 'Doe', 'John', 1)
  – Read: p. 288 – Timing of Insertion.
• Delete – to remove the rows that satisfy a condition
  Syntax: delete from R where <condition>
  – E.g. delete from emp where empno = 12345
• Update – to modify the contents of rows
  Syntax: update R set Ai = value where Aj = targetValue
  – E.g. update emp set num_of_children = 2 where empno = 12345

209 Modifying Relation Schema
• DROP TABLE removes the table together with all of its data.
• The alternative is ALTER TABLE:
  1. To add a column: ALTER TABLE tablename ADD columnName typespec;
  2. To delete a column: ALTER TABLE tablename DROP columnName; (Oracle requires DROP COLUMN columnName)
  3. To change a column's type or size (Oracle syntax): ALTER TABLE tablename MODIFY (columnName typespec);

210 Default Value
• Used when no value is supplied for an attribute when a tuple is created. If no default is declared, the system assumes NULL, except for NOT NULL attributes, which must then be given a value.
• Defined with the data declaration, e.g. Phone CHAR(16) DEFAULT 'unlisted'

211 Views
• Relations that are defined by an expression, like a query.
• Views can be queried, and can even be modified in some cases.
• Declaring views in SQL: CREATE VIEW <view-name> AS <view-definition>;
• Use views to present what the user is allowed to see.
• Only the view's definition is stored; its tuples are not stored but are computed from the base tables whenever the view is used.

212 View Example
• Given: base table Movie(title, year, length, inColor, studioName, producerC#)
    CREATE VIEW ParamountMovie AS
        SELECT title, year
        FROM Movie
        WHERE studioName = 'Paramount';

213 SQL queries to View
1. A query on a view has the same syntax as a query on a stored relation, e.g.
    Select title From ParamountMovie Where year = 1979;
2. When a view is queried, the tuples are obtained from the base table.
3. The query in item 1 is equivalent to the following query without the view:
    Select title From Movie Where studioName = 'Paramount' and year = 1979;
• Ref. p. 308, 6.7.5 (Interpreting queries involving views)

214 Update views
• A view update is translated into a modification of the base table.
• Only updatable views can be modified.
• Views that can be updated:
  – Simple views
  – Views created by SELECT (not SELECT DISTINCT) of some attributes from one relation R, where:
    1. The WHERE clause does not involve R in a subquery.
    2. The SELECT clause includes enough attributes that, for every tuple inserted into the view, the other attributes can be filled with NULL or their default values, giving a tuple of the base relation that yields the inserted tuple of the view.

215 Update view problem e.g.
• E.g. INSERT INTO ParamountMovie VALUES ('Star Trek', 1979);
  This does not meet criterion 2: the new tuple of the base relation Movie would have NULL rather than 'Paramount' as its studioName, so it would not appear in the view.

216 Other operations on Views
• DELETE tuples from a view = delete from the base table, with the condition defining the view added to the condition of the DELETE statement.
• UPDATE a view = update the base table, with the condition defining the view added to the condition of the UPDATE statement.
• Dropping a view does not affect the base table: DROP VIEW <view-name>;

217 Oracle's Sequences
• Sequential lists of numbers that are generated automatically by the database.
• Useful for creating unique surrogate key values for a primary key attribute when no suitable attribute exists to use as the primary key.
• The data type of attributes using sequences is NUMBER.

218 Oracle's Sequences (continued 1)
• To create a sequence (simple form):
    CREATE SEQUENCE <sequence-name> [INCREMENT BY <n>] [START WITH <n>];
  e.g. CREATE SEQUENCE ClassSequence INCREMENT BY 1 START WITH 1;
• To access the next sequence value: <sequence-name>.NEXTVAL
• To use NEXTVAL in an insert statement:
    INSERT INTO <table-name> VALUES (<sequence-name>.NEXTVAL, … );
  e.g. INSERT INTO class (classkey, classid, description, section, location)
       VALUES (ClassSequence.NEXTVAL, ?, ?, ?, ?);

219 Oracle's Sequences (continued 2)
• To access the current sequence value: <sequence-name>.CURRVAL
  – May be used in select and insert statements.
  – Can be used only after NEXTVAL has been referenced in the same database session.
• To view the properties of a sequence, query the "user_sequences" data dictionary view: SELECT * FROM user_sequences;
• To delete a sequence: DROP SEQUENCE <sequence-name>;
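The NEXTVAL/CURRVAL rules can be summarized as a toy model. This is a hypothetical Java sketch, not Oracle's implementation, and the class names Sequence and Session are made up. NEXTVAL advances one counter shared by all sessions. CURRVAL is per session: it returns that session's last NEXTVAL and is an error before the session has used NEXTVAL.

```java
// Toy model of Oracle-style sequence semantics (hypothetical classes, not Oracle code).
class Sequence {
    private final long increment;
    private long next;

    Sequence(long startWith, long incrementBy) {  // CREATE SEQUENCE ... START WITH ... INCREMENT BY ...
        this.next = startWith;
        this.increment = incrementBy;
    }

    synchronized long nextVal() {                   // shared by all sessions
        long v = next;
        next += increment;
        return v;
    }
}

class Session {
    private final Sequence seq;
    private Long lastValue;                         // null until NEXTVAL is used in this session

    Session(Sequence seq) { this.seq = seq; }

    long nextVal() {                                // <sequence-name>.NEXTVAL
        lastValue = seq.nextVal();
        return lastValue;
    }

    long currVal() {                                // <sequence-name>.CURRVAL
        if (lastValue == null) throw new IllegalStateException("CURRVAL used before NEXTVAL in this session");
        return lastValue;
    }
}
```

In this model, two sessions that interleave NEXTVAL calls each see their own CURRVAL. That property is why CURRVAL can safely be reused, for example as a foreign key value in a follow-up insert in the same session. A real sequence may also leave gaps, for example when cached values are lost, so the model should not be read as guaranteeing consecutive keys.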

220 Week 11

221
• Indexes
• JDBC
• JDBC in J2EE (Java 2 Enterprise Edition)

222 Indexes
• An index on an attribute A is a data structure that makes queries with conditions on A more efficient.
• Reason: for large relations, it is not efficient to scan all tuples in order to find the few that meet a given condition.
• E.g. SELECT * FROM Movie WHERE studioName = 'Disney' AND year = 1990;
• Indexes are not part of the SQL standard.

223 Indexes Typical Syntax
• To create an index: CREATE INDEX indexName ON R(A1, …, An)
  E.g. 1. CREATE INDEX YearIndex ON Movie(year);
  E.g. 2. CREATE INDEX KeyIndex ON Movie(title, year);
• To delete an index: DROP INDEX YearIndex;

224 Selection of Indexes
• Index design requires an estimate of the typical mix of queries and other operations on the database.
• Examples of good use of indexes:
  1. An attribute frequently compared to a constant in the WHERE clause of a query
  2. An attribute that appears frequently in join operations, e.g.
       SELECT name FROM Movie, MovieExec
       WHERE title = 'status' AND producerC# = cert#;

225 Decision factors
• It is important to strike a balance.
• Factors:
  1. Given an attribute A, an index on A will:
     – greatly speed up queries with a condition on that attribute
     – possibly speed up joins involving A
  2. Indexes make insertions, deletions, and updates more complex and time-consuming.

226 Indexes (continued)
• Techniques to execute SQL queries are intimately associated with storage structures. Typically, a relation is stored in many disk blocks.
• An index is an auxiliary structure, perhaps stored in a separate file, that supports fast access to the rows of a table.
• The main cost of a query or modification is I/O: the number of disk blocks read into memory and written onto disk.

227 Index Selection Example
• Given relation: StarsIn(movieTitle, movieYear, starName)
• 3 operations to perform:
  1. Q1: select movieTitle, movieYear from StarsIn where starName = s;
  2. Q2: select starName from StarsIn where movieTitle = t and movieYear = y;
  3. I: insert into StarsIn values(t, y, s);
  where s, t, y are some constants

228 Example's assumptions
1. The cost of examining 1 disk block = 1 unit.
2. StarsIn is stored in 10 disk blocks.
3. Average number of stars in a movie = 3.
4. Average number of movies that a star appeared in = 3.
5. Tuples for a given star or movie are likely to be spread over the 10 disk blocks of StarsIn.
6. One disk access is needed to read a block of the index every time the index is used to locate tuples with a given value for the indexed attribute(s).

229 Example's Estimated Cost of actions
    Action        No Index        Star Index   Movie Index   Both Indexes
    Q1            10              4            10            4
    Q2            10              10           4             4
    I             2               4            4             6
    Average cost  2 + 8p1 + 8p2   4 + 6p2      4 + 6p1       6 - 2p1 - 2p2
• Star index is an index on starName; Movie index is an index on movieTitle and movieYear.
• The Q1, Q2, and I rows give the number of disk accesses for each action. The last row is the average cost per operation, p1·Q1 + p2·Q2 + (1 - p1 - p2)·I, with p1 and p2 as defined on the next slide.
• Costs associated with the three actions, as a function of which indexes are selected (ref. Fig. 6.17, 2nd ed.)

230 Example's usage scenarios
• Fraction of operations that are Q1 = p1, Q2 = p2, I = 1 - p1 - p2.
• Consider:
  – Case 1: p1 = p2 = 0.1
  – Case 2: p1 = p2 = 0.4
  – Case 3: p1 = 0.5, p2 = 0.1
  What is the best index strategy for each case?
• Create only the index that helps the most frequently used query type (e.g. queries about stars => create the Star index).
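The three cases can be worked out mechanically. The average cost of each option is p1·cost(Q1) + p2·cost(Q2) + (1 - p1 - p2)·cost(I), using the per-action disk-access counts from the cost table on slide 229. The Java class below is a hypothetical helper that evaluates this for each option and picks the cheapest.

```java
// Worked version of the index-selection example (hypothetical helper class).
// Per-action costs are the disk-access counts from the cost table.
class IndexChoice {
    static final String[] OPTIONS = {"No index", "Star index", "Movie index", "Both indexes"};
    static final int[] Q1 = {10, 4, 10, 4};
    static final int[] Q2 = {10, 10, 4, 4};
    static final int[] INS = {2, 4, 4, 6};

    // Average cost per operation when Q1, Q2 and I occur with fractions p1, p2, 1 - p1 - p2.
    static double averageCost(int option, double p1, double p2) {
        return p1 * Q1[option] + p2 * Q2[option] + (1 - p1 - p2) * INS[option];
    }

    static String best(double p1, double p2) {
        int bestOption = 0;
        for (int o = 1; o < OPTIONS.length; o++) {
            if (averageCost(o, p1, p2) < averageCost(bestOption, p1, p2)) bestOption = o;
        }
        return OPTIONS[bestOption];
    }

    public static void main(String[] args) {
        double[][] cases = {{0.1, 0.1}, {0.4, 0.4}, {0.5, 0.1}};
        for (double[] c : cases) {
            System.out.printf("p1=%.1f p2=%.1f:", c[0], c[1]);
            for (int o = 0; o < OPTIONS.length; o++) {
                System.out.printf(" %s=%.1f", OPTIONS[o], averageCost(o, c[0], c[1]));
            }
            System.out.println(" -> best: " + best(c[0], c[1]));
        }
    }
}
```

Evaluating the formulas gives the following results:
• Case 1: no index wins, 3.6 against 4.6, 4.6, and 5.6. Inserts dominate the workload.
• Case 2: both indexes win, with a cost of 4.4.
• Case 3: the Star index wins, 4.6 against 4.8 for both indexes. This matches the rule of thumb on the slide.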

231 JDBC (Java DataBase Connectivity)
• An API to the database driver manager
• Provides a call-level interface for executing SQL statements from a Java program
• Developed by Sun Microsystems
• An integral part of the Java platform
• Java applications use the JDBC dialect of SQL, independent of the DBMS used
• Supports applications requesting information about the schema from the DBMS at run time

232 JDBC Interaction with SQL DB
1. Make a connection to the database.
2. Create an SQL statement.
3. Execute the SQL statement.
4. The result table of an SQL "select" statement is returned as a Java object. JDBC provides methods to access its rows and columns.
5. Other SQL statements (e.g. insert) return a simple integer result that represents the number of affected rows.
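The five steps map onto standard java.sql calls: DriverManager.getConnection, Connection.prepareStatement, executeQuery with ResultSet, and executeUpdate. This is a hypothetical sketch for plain Java outside J2EE. It assumes a JDBC driver on the classpath and takes the JDBC URL, user, and password from the command line. It also assumes Movie and emp tables like the ones used in these slides.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical sketch of steps 1-5 with plain JDBC. The URL, credentials, and the
// Movie/emp tables are assumptions; a suitable JDBC driver must be on the classpath.
class JdbcSteps {
    static final String SELECT_SQL = "SELECT title, year FROM Movie WHERE studioName = ?";
    static final String UPDATE_SQL = "UPDATE emp SET num_of_children = ? WHERE empno = ?";

    static void run(String url, String user, String password) throws SQLException {
        // 1. Make a connection; try-with-resources closes it at the end.
        try (Connection con = DriverManager.getConnection(url, user, password)) {
            // 2. Create an SQL statement (precompiled, with ? parameters).
            try (PreparedStatement select = con.prepareStatement(SELECT_SQL)) {
                select.setString(1, "Disney");
                // 3. Execute it; 4. a SELECT returns its result table as a ResultSet.
                try (ResultSet rs = select.executeQuery()) {
                    while (rs.next()) { // advance one row at a time
                        System.out.println(rs.getString("title") + " (" + rs.getInt("year") + ")");
                    }
                }
            }
            // 5. An UPDATE (or INSERT/DELETE) returns the number of affected rows.
            try (PreparedStatement update = con.prepareStatement(UPDATE_SQL)) {
                update.setInt(1, 2);
                update.setInt(2, 12345);
                int affected = update.executeUpdate();
                System.out.println(affected + " row(s) updated");
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length != 3) {
            System.out.println("usage: java JdbcSteps <jdbc-url> <user> <password>");
            return;
        }
        run(args[0], args[1], args[2]);
    }
}
```

PreparedStatement with ? parameters is used here instead of concatenating values into the SQL text. Values are then passed separately from the statement, which avoids quoting mistakes and SQL injection.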

233 Connecting to a db through JDBC
[Figure: the Application calls the JDBC Driver Manager, which uses one of several drivers (Driver 1, Driver 2, Driver 3) to reach the DBMS server; the driver manager and the drivers make up the JDBC modules.]

234 JDBC - java.sql package
• Defines a collection of interfaces and classes that allow programs to interact with a database.
• Interfaces for primary SQL execution:
  – Driver: supports creating data connections
  – Connection: represents a connection between a Java client and an SQL database server
  – Statement: includes methods for executing text queries
  – PreparedStatement: represents a precompiled and stored query
  – CallableStatement: used to execute SQL stored procedures
  – ResultSet: contains the results of a query
  – ResultSetMetaData: information about a ResultSet, including the attribute names and types

235 Client-server Architectures
[Figure: three tiers – the user and application tier (user interface / application), the middle tier (middleware application), and the database server tier (database server).]

236 JDBC in J2EE
• J2EE – Java 2 Enterprise Edition, a platform for middle-tier servers
• Connection to a DBMS using JDBC (e.g. Cloudscape, Oracle, MS SQL)
• J2EE Platform – services and architecture
• Enterprise JavaBeans (EJB)
  – Session Beans vs. Entity Beans
• EJB access to databases using JDBC
  – Database connection
  – Persistence management (Entity Bean e.g.)
  – Transaction management (Session Bean e.g.)

237 J2EE Services
• HTTP - enables Web browsers to access servlets and JavaServer Pages (JSP) files
• EJB - allows clients to invoke methods on enterprise beans
• Authentication - enforces security by requiring users to log in
• Naming and Directory - allows programs to locate services and components through the Java Naming and Directory Interface (JNDI) API

238 J2EE Architecture
[Figure: J2EE architecture. Ref. Java 2 Enterprise Edition Developer's Guide, Figure 1-2]

239 Enterprise JavaBeans (EJB)
• Server-side Java components
• Contain the business logic of an enterprise application
• Support database access
• Transactional
• Multi-user and secure
• Managed by the EJB container
• Prohibited from performing a set of operations

240 Session Bean vs. Entity Bean
                   Session Bean                   Entity Bean
    Purpose        Performs a task for a client   Represents a business entity object that exists in persistent storage
    Shared Access  May have one client            May be shared by multiple clients
    Persistence    Not persistent                 Persistent; entity state remains in a database
• Ref. Java 2 Enterprise Edition Developer's Guide, Table 1-1

241 EJB Access to Databases Using JDBC API
• J2EE uses:
  1. JDBC 2.0 (java.sql) and
  2. the JDBC 2.0 Optional Package (javax.sql)
• To make a connection to a database in J2EE:
  1. Do not hardcode the actual name (URL) of the database in the EJB.
  2. Refer to the database with a logical name.
  3. Use a JNDI lookup when obtaining the database connection.

242 Driver and Data source properties
• In the J2EE configuration file, resource.properties, specify:
• Driver
  e.g. 1 Cloudscape, which is packaged with Sun's J2EE:
    jdbcDriver.0.name=COM.cloudscape.core.RmiJdbcDriver
  e.g. 2 Oracle:
    jdbcDriver.0.name=oracle.jdbc.driver.OracleDriver
• JDBC URL
  e.g. 1 Cloudscape:
    jdbcDataSource.0.name=jdbc/Cloudscape
    jdbcDataSource.0.url=jdbc:cloudscape:rmi:CloudscapeDB;create=true
  e.g. 2 Oracle:
    jdbcDataSource.0.name=jdbc/Oracle
    jdbcDataSource.0.url=jdbc:oracle:thin:@bigoh.cis.uab.edu:1521:cs610

243 Making a connection to database example
1. Specify the logical database name:
    private String dbName = "java:comp/env/jdbc/AccountDB";
2. Obtain the DataSource associated with the logical name (javax.naming.InitialContext, javax.sql.DataSource):
    InitialContext ic = new InitialContext();
    DataSource ds = (DataSource) ic.lookup(dbName);   // throws javax.naming.NamingException if the name is not bound
3. Get the Connection (java.sql.Connection) from the DataSource:
    Connection con = ds.getConnection(username, password);   // throws java.sql.SQLException

244 Specifying JNDI name for deployment
Step 1: Enter the coded name

245 Step 2: Map the coded name to the JNDI name

246 Persistence Management
• Container-Managed Persistence: the entity bean code does not contain database access calls. The EJB container generates the SQL statements.
• Bean-Managed Persistence: the entity bean code contains the database access calls (SQL), i.e. you write the code!

247
• Container-managed example: Product entity bean – ProductEJB.java, ProductHome.java, Product.java, ProductClient.java
• Bean-managed example: Account entity bean – AccountEJB.java, AccountHome.java, Account.java, AccountClient.java

251 Week 12

262 Transaction Management
• Ref. Java 2 Enterprise Edition Developer's Guide

263 Container-Managed Transactions
• Description:
  – The code does not include statements that begin and end the transaction.
  – The transaction begins immediately before an EJB method starts and commits just before the method exits.
  – Each method can be associated with a single transaction.
  – Prohibited methods, e.g. the commit, setAutoCommit, and rollback methods of java.sql.Connection.
• Limitation: while a method is executing, it can be associated with either a single transaction or no transaction at all.

264 Bean Managed Transaction
• Session bean code invokes methods that mark the boundaries of the transaction: setAutoCommit(), commit(), rollback().
• An entity bean may not have bean-managed transactions.
• JDBC transaction example (ref. Java 2 Enterprise Edition Developer's Guide):
    public void ship(String productId, String orderId, int quantity) {
        try {
            con.setAutoCommit(false);                // start a transaction: both updates succeed or neither does
            updateOrderItem(productId, orderId);
            updateInventory(productId, quantity);
            con.commit();                            // make both updates permanent
        } catch (Exception ex) {
            try {
                con.rollback();                      // undo any partial work
                throw new EJBException("Transaction failed: " + ex.getMessage());
            } catch (SQLException sqx) {
                throw new EJBException("Rollback failed: " + sqx.getMessage());
            }
        }
    }

265 J2EE Resources
• Java 2 SDK, Enterprise Edition Technical Documentation
• Java 2 Enterprise Edition Developer's Guide
  http://java.sun.com/j2ee/j2sdkee/
  http://archives.java.sun.com/archives/j2ee-interest.html
• Developing Enterprise Applications with the Java 2 Platform, Enterprise Edition
  http://java.sun.com/j2ee/blueprints/
• Major commercial implementations: WebLogic - BEA; WebSphere - IBM

266 .NET Data Providers
• Purpose:
  – connect to, read from, and execute commands against data sources
  – other functions, such as the management of input and output parameters, security, transactions, and database server errors

267 .NET data provider classes
1. SqlConnection
2. SqlCommand
3. SqlDataReader - can be used for read-only applications reading from a SQL Server data source
4. SqlDataAdapter - acts as a bridge between a remote SQL Server data source and a DataSet class instance inside a Visual Basic .NET solution

268 Semi-structured Data Model (Ref. 4.6)
• Blends class and relation for:
  1. Flexibility, which makes it suitable for integrating databases.
  2. Serving as a document model in notations such as XML.
  3. Being schemaless – the data is "self-describing": the schema is attached to the data itself and can vary arbitrarily over time and within a single database.

269 Semi-structured Data Model Utility
• An information integration tool to integrate legacy databases
• Legacy database problem: the problem of integrating multiple databases that were designed independently and used over time for many different applications.

270 Semi-structured Data Representation
• A database of semi-structured data is a collection of nodes.
• Node types:
  – Leaf – has associated data of an atomic type (e.g. numbers, strings)
  – Interior – has 1 or more arcs out
  – Root – an interior node that has no arcs entering it and represents the entire database (i.e. the entry point)
• Every node must be reachable from the root.

271 Semistructured Data Representation (continued)
• Arc – has a label to indicate the relationship between 2 nodes. For an arc labeled L from node M to node N, the arc serves one of 2 roles:
  1. If M is an object and N is an attribute, then L is the name of the attribute.
  2. If M and N are both objects, then L is the name of a relationship from M to N.
• The structure is a graph, not necessarily a tree.
• E.g. Figure 4.19
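The node-and-arc model above can be held in a few maps. In this hypothetical Java sketch, each node maps to its outgoing (label, target) arcs, leaves carry atomic values, and a path query follows a sequence of labels from the root. The model requires every node to be reachable from the root, and a reachability check tests that. Node ids, labels, and values in the usage example are made up.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory sketch of the semistructured model: nodes, labeled arcs,
// atomic values on leaves, and a root.
class SemiGraph {
    private final Map<String, List<String[]>> arcs = new HashMap<>(); // node -> {label, target}
    private final Map<String, Object> values = new HashMap<>();       // leaf -> atomic value
    private final Set<String> nodes = new LinkedHashSet<>();
    private final String root;

    SemiGraph(String root) { this.root = root; nodes.add(root); }

    void arc(String from, String label, String to) {
        nodes.add(from);
        nodes.add(to);
        arcs.computeIfAbsent(from, k -> new ArrayList<>()).add(new String[] {label, to});
    }

    void leaf(String node, Object value) { nodes.add(node); values.put(node, value); }

    Object value(String node) { return values.get(node); }

    // Nodes reached from the root by following arcs with the given labels, in order.
    Set<String> follow(String... labels) {
        Set<String> current = new LinkedHashSet<>(List.of(root));
        for (String label : labels) {
            Set<String> next = new LinkedHashSet<>();
            for (String n : current)
                for (String[] a : arcs.getOrDefault(n, List.of()))
                    if (a[0].equals(label)) next.add(a[1]);
            current = next;
        }
        return current;
    }

    // Every node must be reachable from the root.
    boolean allReachableFromRoot() {
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>(List.of(root));
        while (!todo.isEmpty()) {
            String n = todo.pop();
            if (!seen.add(n)) continue;
            for (String[] a : arcs.getOrDefault(n, List.of())) todo.push(a[1]);
        }
        return seen.containsAll(nodes);
    }
}
```

For example, consider arcs db -star-> s1, s1 -starredIn-> m1, and db -movie-> m1. Node m1 has two incoming arcs, so the structure is a graph rather than a tree. A call such as follow("star", "starredIn", "title") collects the title nodes along that path.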

272 XML
• XML: eXtensible Markup Language
• Tag-based notation for "marking" documents
• Plain text
• Formats of tags:
  1. A matched pair: <tag>string of data</tag>, e.g. a phone number 123-456-7890 enclosed by an opening and a closing tag
  2. A single empty tag: <tag/>
• Two modes:
  – Well-formed XML
  – Valid XML
• Comment format: <!-- comment -->

273 HTML vs. XML
• HTML tags describe how a web browser should present the data (e.g. a formatting tag around the text abc).
• XML tags describe the meaning of the data (e.g. a tag identifying 123-456-7890 as a phone number).

274 Well-formed XML
Requirements:
• The document begins with a declaration (prologue), e.g. <?xml version="1.0"?> (the XML 1.0 specification recommends, rather than strictly requires, this declaration).
• It has a single root element.
• Every opening tag is followed by a matching closing tag, and the elements are properly nested inside each other.
• Any attribute can occur at most once in a given opening tag; its value must be provided and quoted.
• E.g. Figure 4.21
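Well-formedness is what any XML parser checks before anything else. This hypothetical Java helper uses the JDK's built-in DOM parser (javax.xml.parsers.DocumentBuilder) in its default non-validating mode, so no DTD is involved. A document that breaks the rules above makes parse throw a SAXException.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: checks well-formedness with the JDK's (non-validating) DOM parser.
class WellFormed {
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // fatal errors still throw; no console noise
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {          // the document is not well-formed
            return false;
        } catch (Exception e) {             // parser configuration or I/O problems
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<?xml version=\"1.0\"?><Star><Name>H. Ford</Name></Star>")); // true
        System.out.println(isWellFormed("<Star><Name>H. Ford</Star></Name>"));                      // false
    }
}
```

Any tag names can be used, and no schema is consulted. This matches the "invent your own tags" property of well-formed XML.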

275 Well-formed XML (continued)
• Invent your own tags.
• Resembles the semi-structured data model:
  – Schemaless
  – Self-describing
  – Object-like

276 Valid XML
• Uses a Document Type Definition (DTD) to specify:
  – Allowable tags
  – Grammar for how the tags may be nested
• More flexible than a strict-schema model
  – E.g. allows optional or missing fields
• More restrictive than a completely schemaless model

277 XML elements and Entities/Objects (1)
• XML evolved from a document markup language (SGML) rather than a database language, so text can be mixed with tagged elements, e.g.
    John lives on Main St, house number <number>123</number>, in a remote township
• Mixed data and text is a hindrance for databases.

278 XML elements and Entities/Objects (2)
• XML elements are ordered, whereas the attributes and the tuples of a relation are not.
• E.g. <Number>123-4567</Number><Street>Main St.</Street> and <Street>Main St.</Street><Number>123-4567</Number> are different XML content.
• But these tuples are equivalent:
  (Number, Street) = (123-4567, Main St.)
  (Street, Number) = (Main St., 123-4567)
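The ordering point can be seen with the JDK's DOM parser. In this sketch (the `ElementOrder` class and the `Address` wrapper element are hypothetical), two documents list the same children in different orders: read as an ordered child list they differ, but read as a relational tuple (attribute name mapped to value) they are equal.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class ElementOrder {
    // Child elements of the root in document order: XML preserves this order.
    static List<String> childNames(String xml) {
        List<String> names = new ArrayList<>();
        for (Node n = root(xml).getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                names.add(n.getNodeName());
            }
        }
        return names;
    }

    // The same children viewed as a relational tuple: name -> value, order ignored.
    static Map<String, String> asTuple(String xml) {
        Map<String, String> tuple = new HashMap<>();
        for (Node n = root(xml).getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                tuple.put(n.getNodeName(), n.getTextContent());
            }
        }
        return tuple;
    }

    private static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }
}
```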

279 XML Element types
• Complex types
  – Elements that contain sub-elements or carry attributes, e.g. <message to="acb@xyz.com" from="aa@here.com" subject="XML">XML basics …</message>
• Simple types
  – Elements that contain numbers (or strings, dates, etc.) but do not contain any sub-elements
  – Built-in or author-defined
• Attributes always have simple types.

280 Document Type Definition (DTD)
• A DTD is a set of rules for constructing an XML document,
  – i.e. a grammar that specifies the schema of a legal XML document.
• Specified by XML authors:
  – As part of the document itself, or
  – Stored separately from the document; the document refers to the URL where the DTD is stored.
• An XML document that conforms to its DTD is said to be valid.

281 DTD Structure
• Basic DTD structure:
  <!DOCTYPE root-tag [
    <!ELEMENT element-name (components)>
    more elements
  ]>
• The name of the DTD must coincide with the tag name of the root element of any document that conforms to that DTD.
• One ELEMENT statement for each allowed tag, including the root tag.
• For each tag that can have attributes, an ATTLIST statement specifies the allowed attributes and their types.
• A component of #PCDATA means simple text (i.e. no tags nested within).

282 DTD Structure (continued)
• Component occurrence indicators:
  – * : zero or more
  – + : one or more
  – ? : zero or one time
  – | : either/or, e.g. (#PCDATA | (STREET, CITY))
• Empty tags: <!ELEMENT element-name EMPTY>
• E.g. Figure 4.22 (2nd ed.): a DTD for movie stars
• Figure 4.23 (2nd ed.): a document following the DTD in Figure 4.22

283 Using a DTD
• Two ways:
  1. Include the DTD at the beginning of the document.
  2. In the document's opening lines, refer to a DTD stored separately in a file system accessible to the application that processes the document, e.g.
     <?xml version="1.0" standalone="no"?>
     <!DOCTYPE root-tag SYSTEM "file.dtd">
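A sketch of way 1 (the DTD included at the beginning of the document), assuming a cut-down, hypothetical DTD in the spirit of the movie-star example rather than Figure 4.22 itself. Turning on `setValidating(true)` makes the JDK's parser check the document against the DTD; since `DefaultHandler.error()` ignores validity errors, the sketch rethrows them so an invalid document is rejected. The `Star*` and `Address+` occurrence indicators and the `(Street, City)` sequence are all enforced.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class DtdCheck {
    // Hypothetical DTD placed at the beginning of the document.
    static final String DTD =
            "<!DOCTYPE Stars [\n"
            + "  <!ELEMENT Stars (Star*)>\n"
            + "  <!ELEMENT Star (Name, Address+)>\n"
            + "  <!ELEMENT Name (#PCDATA)>\n"
            + "  <!ELEMENT Address (Street, City)>\n"
            + "  <!ELEMENT Street (#PCDATA)>\n"
            + "  <!ELEMENT City (#PCDATA)>\n"
            + "]>\n";

    // True if DTD + body is well-formed AND valid with respect to the DTD.
    static boolean isValid(String body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // check the document against its DTD while parsing
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e; // DefaultHandler would ignore validity errors; treat them as failures
                }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```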

284 Attribute Lists
• Format: <!ATTLIST ElementName attributeName type [Modifier] ... >

285 Some attribute types
1. CDATA
  • Character data (any string)
2. ID
  • An attribute of type ID must have a unique value throughout the document.
  • E.g. if attr1 and attr2 are of type ID, then giving them the same value on two elements (attr1="x" on one, attr2="x" on another) is illegal.
  • Imitates a key in a relational db.
3. IDREF
  • An attribute of type IDREF must refer to a valid ID declared in the same document.
  • Imitates a foreign key.
4. IDREFS
  • Refers to a list of valid IDs declared in the document.

286 Some Modifier Types
• #IMPLIED
  – The attribute is optional.
  – It can remain unspecified.
• #REQUIRED
  – The attribute is mandatory.
  – E.g. movieId ID #REQUIRED
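The ID, IDREF(S), #REQUIRED, and #IMPLIED rules can be exercised with a validating JDK parser. This sketch assumes a hypothetical DTD in which `Star` and `Movie` elements carry ID attributes and `starredIn` is an optional IDREFS list. The parser then rejects duplicate IDs (even across different attributes), references to IDs that do not exist, and missing required attributes.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class IdCheck {
    // Hypothetical DTD: ID attributes imitate keys, IDREFS imitates a multi-valued foreign key.
    static final String DTD =
            "<!DOCTYPE Db [\n"
            + "  <!ELEMENT Db (Star*, Movie*)>\n"
            + "  <!ELEMENT Star (#PCDATA)>\n"
            + "  <!ATTLIST Star starId ID #REQUIRED starredIn IDREFS #IMPLIED>\n"
            + "  <!ELEMENT Movie (#PCDATA)>\n"
            + "  <!ATTLIST Movie movieId ID #REQUIRED>\n"
            + "]>\n";

    // True if DTD + body satisfies every validity constraint of the DTD.
    static boolean isValid(String body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e; // ID/IDREF and #REQUIRED violations are reported as (non-fatal) errors
                }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```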

287 Example: Use a DTD to alleviate the order problem
• E.g.:
• But it's awkward!

288 XML Schema
• A DDL for XML documents
• Purpose: to define a class of XML documents
• Instance document: an XML document that conforms to a particular schema (schema-valid)
• Instances and schemas may exist as:
  – documents in files
  – streams of bytes sent between applications
  – fields in a database record
  – collections of XML "Infoset" information items

289 Triggers
• Available in Oracle and SQL99
• Event-Condition-Action rules: define an action the db should take when some db-related event occurs

290 Trigger vs. other constraints
1. Triggers are awakened when certain events, specified by the db programmer, occur, e.g. an update, insert, or delete on a relation, or the end of a transaction.
2. Other constraints immediately prevent the event if the constraint is violated. For a trigger, its condition is tested when the event occurs; if the condition does not hold, the action associated with the trigger does not happen (i.e. the trigger is not fired).
3. If the trigger condition is true, the action is performed by the DBMS (i.e. the trigger is fired), so it is transparent to the user.
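To make the event-condition-action sequence concrete, here is a hypothetical in-memory simulation in Java (not how a DBMS implements triggers). It mimics a row-level BEFORE UPDATE trigger: the update is the event, the condition compares the old and new values, and the action rewrites the new value before it is stored (roughly like assigning to :NEW in an Oracle BEFORE trigger). The `NetWorthTable` class and its "never lower a net worth" rule are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiPredicate;
import java.util.function.BinaryOperator;

// Hypothetical table mapping an executive's name to a net worth, with one trigger attached.
final class NetWorthTable {
    private final Map<String, Integer> netWorth = new HashMap<>();
    // Condition (the WHEN clause), evaluated on the old and new values of the updated row.
    private final BiPredicate<Integer, Integer> condition = (oldValue, newValue) -> newValue < oldValue;
    // Action: the value to store instead when the trigger fires (here, keep the old value).
    private final BinaryOperator<Integer> action = (oldValue, newValue) -> oldValue;
    private int timesFired = 0;

    void insert(String name, int value) {
        netWorth.put(name, value);
    }

    void update(String name, int newValue) {
        Integer oldValue = netWorth.get(name);
        if (oldValue == null) {
            return; // no such row, so there is no UPDATE event
        }
        int stored = newValue;
        if (condition.test(oldValue, newValue)) {      // event occurred: test the condition
            stored = action.apply(oldValue, newValue); // condition true: the trigger fires
            timesFired++;
        }
        netWorth.put(name, stored);
    }

    int netWorthOf(String name) { return netWorth.get(name); }

    int timesFired() { return timesFired; }
}
```

An increase passes through untouched; a decrease fires the trigger, and the stored value stays at the old net worth.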

291 Trigger Syntax
CREATE [OR REPLACE] TRIGGER <trigger name>
  {BEFORE | AFTER | INSTEAD OF}
  {DELETE | INSERT | UPDATE [OF column, …]}
  [OR {DELETE | INSERT | UPDATE [OF column, …]}] …
  ON {tableName | viewName}
  [REFERENCING {OLD [AS] <old name> | NEW [AS] <new name>} …]
  FOR EACH {ROW | STATEMENT}
  [WHEN (condition)]
  <trigger action>;

292 Oracle Triggers
• The trigger action is a PL/SQL block. The simplest is: BEGIN <statements> END;
• In the PL/SQL action block, the variables OLD and NEW are preceded by a colon, e.g. :OLD
• Follow the CREATE TRIGGER statement with a dot (.) on a line by itself and then RUN to store the definition in the db.
• The action cannot change the relation that triggers the action, nor relations connected to the triggering relation by a constraint (e.g. an FK constraint).
• Read 7.4.2 – 7.4.4

293 SQL3 Triggers
• The trigger action can be:
  1. a single SQL statement, or
  2. a sequence of SQL statements, separated by ; and enclosed in BEGIN … END;

294 Constraints Summary
1. Primary key declaration
2. Foreign key: referential integrity constraint
3. Constraints within relations:
  • Attribute constraints: 1. NOT NULL; 2. CHECK
  • Tuple-based CHECKs
4. Schema-level constraints: SQL2 assertions (not in Oracle)
5. Triggers: Oracle's and SQL99's

295 Week 13

296 Java API for XML Processing (JAXP)
• For processing XML data using applications written in Java
• JAXP APIs: javax.xml.parsers package
• Leverages 2 parser standards:
  – SAX (Simple API for XML Parsing): parses the data as a stream of events; a serial-access mechanism that does element-by-element processing
  – DOM (Document Object Model) (easier to use): builds a tree structure of objects to represent the data; org.w3c.dom
• Detailed review of the enrollment example (files posted on the course web page as the JAXP example)
• http://java.sun.com/j2ee/1.4/docs/tutorial/doc/JAXPIntro5.html
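A minimal sketch contrasting the two parser styles with the JDK's JAXP classes. The SAX method receives a stream of `startElement` events and records them as they arrive; the DOM method builds the whole tree first and then queries it with `getElementsByTagName`. The `JaxpDemo` class and the small enrollment-style document used to exercise it are hypothetical, not the course's posted example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

final class JaxpDemo {
    // SAX: the parser pushes events; we record each element name as its start tag is seen.
    static List<String> saxElementNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                    new DefaultHandler() {
                        @Override
                        public void startElement(String uri, String localName, String qName, Attributes atts) {
                            names.add(qName);
                        }
                    });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    // DOM: the whole document becomes a tree of objects, which we then search by tag name.
    static List<String> domTexts(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName(tag);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

SAX never holds the whole document in memory, which suits large inputs; DOM allows random access to any part of the tree once it is built.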

297 Ch. 7 Constraints and Triggers
• Keys as constraints
• Foreign-key constraints and referential integrity
• Constraints on attributes
• Constraints on tuples
• Assertions
• Triggers

298 Keys as Constraints
• One primary key per relation
• Declared in CREATE TABLE
• 2 ways:
  1. List it with the attribute (when there is only 1 attribute in the key), e.g. snum CHAR(5) PRIMARY KEY
  2. Add it to the list of items declared in the schema, e.g.
     CREATE TABLE supplier (
       snum   CHAR(3),
       sname  VARCHAR(15),
       status INT,
       city   VARCHAR(15),
       CONSTRAINT pk_supplier PRIMARY KEY (snum)
     );

299 Keys (continued)
• Primary key attributes cannot be NULL.
• 2 tuples in R cannot agree on all of the attributes in the primary key; the DBMS rejects violating insertions or updates.
• A key can also be declared with UNIQUE:
  – Can be used instead of PRIMARY KEY
  – A table can have multiple UNIQUE declarations
  – UNIQUE attributes permit NULL
• Key constraints are checked on inserts and updates.
• An index created on the key attributes makes key-enforcement checks efficient.
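The key rules can be sketched with a hash set standing in for the index the DBMS builds on the key attributes. This is a hypothetical single-attribute model (`KeyConstraint` is not a real API), and it follows the SQL-standard treatment in which NULLs never collide under UNIQUE; some DBMSs allow only one NULL in a UNIQUE column.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical check performed on insert for a single-attribute key.
final class KeyConstraint {
    private final boolean primaryKey;                 // true: PRIMARY KEY, false: UNIQUE
    private final Set<String> index = new HashSet<>(); // stands in for the key index

    KeyConstraint(boolean primaryKey) {
        this.primaryKey = primaryKey;
    }

    // Returns true if the insert is accepted, false if the DBMS would reject it.
    boolean tryInsert(String keyValue) {
        if (keyValue == null) {
            // PRIMARY KEY forbids NULL; UNIQUE permits it, and NULLs are never duplicates.
            return !primaryKey;
        }
        return index.add(keyValue); // add() returns false when the value is already present
    }
}
```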

300 Foreign-Key Constraints
• Enforce referential integrity constraints
• E.g. SPJ (type 1: referencing), S, P, J (type 2: referenced)
• The referenced attribute of the type-2 relation must be declared PRIMARY KEY or UNIQUE.
• Again, 2 ways to declare it in CREATE TABLE:
  1. Inline with the attribute declaration
  2. Add it to the list of items declared in the schema

301 FK as a list item declared in the schema
E.g.
CREATE TABLE spj (
  snum CHAR(3),
  pnum CHAR(3),
  jnum CHAR(3),
  qty  INT,
  CONSTRAINT fk_spj_1 FOREIGN KEY (snum) REFERENCES S(snum)
    DEFERRABLE INITIALLY DEFERRED,
  CONSTRAINT fk_spj_2 FOREIGN KEY (pnum) REFERENCES P(pnum)
    DEFERRABLE INITIALLY IMMEDIATE,
  CONSTRAINT fk_spj_3 FOREIGN KEY (jnum) REFERENCES J(jnum)
    -- this one uses the default, i.e. NOT DEFERRABLE
);

302 FK: maintaining referential integrity
• 3 policies:
  1. Default: reject violating modifications
  2. Cascade: changes to the referenced attributes are mimicked at the foreign key
  3. Set Null: set the foreign key to NULL when the referenced attributes are changed (deleted / updated)
• Declare the policy for delete and for update separately.
• Declare it together with the foreign-key declaration.
• Dangling tuples violate the referential integrity constraint for the foreign key; they are dealt with by the 3 policies.
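A hypothetical in-memory sketch of the 3 policies, with `S` holding supplier keys and `SPJ` rows holding the foreign key `snum`. Each modification of the referenced relation is either rejected (default), propagated to the foreign key (cascade), or answered by setting the foreign key to NULL (set null); inserting an SPJ row with a dangling `snum` is always rejected. Class and method names are assumptions.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model: S(snum) is the referenced relation, SPJ(snum, pnum) the referencing one.
final class FkPolicies {
    enum Policy { REJECT, CASCADE, SET_NULL }

    static final class SpjRow {
        String snum;       // foreign key into S (may be NULL)
        final String pnum;
        SpjRow(String snum, String pnum) { this.snum = snum; this.pnum = pnum; }
    }

    final Set<String> sKeys = new HashSet<>();
    final List<SpjRow> spj = new ArrayList<>();

    // INSERT INTO SPJ: a non-NULL snum must match an existing S key.
    boolean insertSpj(String snum, String pnum) {
        if (snum != null && !sKeys.contains(snum)) {
            return false; // would create a dangling tuple
        }
        spj.add(new SpjRow(snum, pnum));
        return true;
    }

    private boolean referenced(String key) {
        return spj.stream().anyMatch(r -> key.equals(r.snum));
    }

    // DELETE FROM S WHERE snum = key; returns false if the modification is rejected.
    boolean delete(String key, Policy onDelete) {
        if (referenced(key) && onDelete == Policy.REJECT) {
            return false;
        }
        sKeys.remove(key);
        if (onDelete == Policy.CASCADE) {
            spj.removeIf(r -> key.equals(r.snum)); // dependent tuples are deleted too
        } else if (onDelete == Policy.SET_NULL) {
            for (SpjRow r : spj) {
                if (key.equals(r.snum)) r.snum = null;
            }
        }
        return true;
    }

    // UPDATE S SET snum = newKey WHERE snum = oldKey; returns false if rejected.
    boolean update(String oldKey, String newKey, Policy onUpdate) {
        if (referenced(oldKey) && onUpdate == Policy.REJECT) {
            return false;
        }
        if (sKeys.remove(oldKey)) {
            sKeys.add(newKey);
        }
        for (SpjRow r : spj) {
            if (oldKey.equals(r.snum)) {
                r.snum = (onUpdate == Policy.CASCADE) ? newKey : null; // CASCADE copies, SET_NULL clears
            }
        }
        return true;
    }
}
```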

303 Maintaining referential integrity - Example
CREATE TABLE spj (
  snum CHAR(3),
  pnum CHAR(3),
  jnum CHAR(3),
  qty  INT,
  CONSTRAINT fk_spj_1 FOREIGN KEY (snum) REFERENCES S(snum)
    ON UPDATE CASCADE
    DEFERRABLE INITIALLY DEFERRED,
  CONSTRAINT fk_spj_2 FOREIGN KEY (pnum) REFERENCES P(pnum)
    ON DELETE SET NULL
    DEFERRABLE INITIALLY IMMEDIATE,
  CONSTRAINT fk_spj_3 FOREIGN KEY (jnum) REFERENCES J(jnum)
    -- just uses the default
);

304 Constraints on Attributes and Tuples
• Limit the values of some attributes
• Constraints within relations
• Defined in the relation's schema as:
  1. Constraints on an attribute
  2. Constraints on tuples

305 Constraints on the attributes
• Not-null constraints, e.g.
    pnum CHAR(3) NOT NULL,
    CONSTRAINT fk_spj_2 FOREIGN KEY (pnum) REFERENCES P(pnum),
• Attribute-based CHECK
  – Checked whenever any tuple gets a new value for this attribute (e.g. update, insert)
  – Violating operations are rejected
  – The check is expressed as a conditional expression, e.g. qty INT CHECK (qty <= 10000),
  – Invisible to other relations (so it can be violated if the relations referenced in the condition are changed)
  – More examples in the text, pp. 328, Ex. 7.8, 7.9

306 Tuple-Based CHECK Constraints
• Applies to one or more attributes of a table
• Checked when a tuple is inserted or updated; a violating operation is rejected.
• Invisible to other relations (so it can be violated if the relations referenced in the condition are changed)
• E.g.
  CREATE TABLE MovieStar (
    name   CHAR(30) PRIMARY KEY,
    gender CHAR(1),
    CHECK (gender = 'F' OR name NOT LIKE 'Ms.%')
  );
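The MovieStar CHECK can be mirrored as a Java predicate. A sketch under two stated assumptions: `LIKE 'Ms.%'` is treated as "starts with Ms.", and SQL's three-valued logic is modeled with a nullable `Boolean`, because a CHECK constraint rejects a tuple only when its condition is FALSE (an UNKNOWN result, e.g. from a NULL gender, is accepted). `MovieStarCheck` is a hypothetical name.

```java
// Hypothetical mirror of CHECK (gender = 'F' OR name NOT LIKE 'Ms.%').
final class MovieStarCheck {
    // Returns true if the CHECK would accept a tuple with these values.
    static boolean accepts(String name, String gender) {
        Boolean genderIsF = (gender == null) ? null : gender.equals("F");
        Boolean notMs = (name == null) ? null : !name.startsWith("Ms."); // LIKE 'Ms.%'
        Boolean result = or(genderIsF, notMs);
        return result == null || result; // only a FALSE condition violates the CHECK
    }

    // SQL three-valued OR: TRUE if either side is TRUE, FALSE if both are FALSE, else UNKNOWN (null).
    private static Boolean or(Boolean a, Boolean b) {
        if (Boolean.TRUE.equals(a) || Boolean.TRUE.equals(b)) {
            return true;
        }
        if (Boolean.FALSE.equals(a) && Boolean.FALSE.equals(b)) {
            return false;
        }
        return null;
    }
}
```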

307 Modification of Constraints
• Only named constraints can be modified.
• Examples of naming constraints:
  1. snum CHAR(5) CONSTRAINT pk_supplier PRIMARY KEY,
  2. CONSTRAINT fk_spj_1 FOREIGN KEY (snum) REFERENCES S(snum) DEFERRABLE,
  3. gender CHAR(1) CONSTRAINT NoAndro CHECK (gender IN ('F', 'M')),

308 Modifications
• Delete a constraint:
  ALTER TABLE <table name> DROP CONSTRAINT <constraint name>;
• Add a constraint:
  ALTER TABLE <table name> ADD CONSTRAINT <constraint name> <constraint definition>;
  e.g. ALTER TABLE student ADD CONSTRAINT PK_student PRIMARY KEY (studentkey);
• Set constraint: change a deferrable constraint from immediate to deferred and vice versa:
  SET CONSTRAINT <constraint name> DEFERRED;

309 Assertions
• An assertion is a schema-level CHECK constraint.
• It exists independently of any particular table.
• Its condition can refer to any table in the db schema.
• Declaring an assertion:
  CREATE ASSERTION <assertion name> CHECK (<condition>) [constraint attributes];
• Constraint attributes: [NOT] DEFERRABLE; INITIALLY IMMEDIATE; INITIALLY DEFERRED
• The constraint is violated if the condition is false.
• Any db modification that causes a violation is rejected.

310 Triggers
• Available in Oracle and SQL99
• Event-Condition-Action rules: define an action the db should take when some db-related event occurs
• Triggers are useful:
  1. To maintain complex integrity constraints
  2. To audit changes made to a table
  3. To signal to other programs that changes were made to a table

316 Ch. 8 - System Aspects of SQL
• SQL in programming environments
  – Statement-level interface
    • Embedded SQL, SQLJ
    • Dynamic SQL
  – PSM
    • Oracle's PL/SQL
  – Call-level interface
• SQL environment

317 Host Language Interface
• Host language: a conventional language in which applications are written, e.g. C, Java, C#, Visual Basic .NET
• Statement-level interface
  – Static: Embedded SQL, SQLJ
  – Dynamic SQL
• Call-level interface (CLI)
• Using a CLI (e.g. JDBC) for static transactions is less efficient at run time than a statement-level interface, because the CLI's preparation and execution of a statement generally involve separate communications with the DBMS, which are costly.

318 Embedded SQL
• Shared variables
  – Host-language variables that can be read/written by SQL statements
  – Serve as the interface between the host language and the SQL execution system
  – When used in embedded SQL statements, the variable is preceded by a colon, e.g. :name
• Embedded SQL statements are prefixed with EXEC SQL
• A preprocessor translates the EXEC SQL statements into function calls in the host language; the program is then compiled and linked with the SQL-related library to form executable code. Fig. 8.1 pp. 351

319 Embedded SQL (continued)
• SQLSTATE: a special variable (an array of 5 characters)
  – Serves to connect the host-language program with the SQL execution system
  – Updated each time an SQL library function is called
  – Some of the codes are:
    • '00000' = no error
    • '02000' = no tuple found

320 Embeddable SQL statements
• Any SQL statement that does not return a result (i.e. not a select-from-where query) can be embedded, e.g. insert, delete, create, update.
• Select-from-where queries are not embeddable directly.
• To connect the result of a query with a host-language program, use one of the following methods:
  1. For a query that produces a single tuple, use a single-row select: select-into-from-where. Ref. Fig. 8.3 pp. 355
     EXEC SQL SELECT A INTO :var FROM R WHERE <condition>;
  2. For a query producing more than one tuple, use a CURSOR. Fig. 8.4 pp. 357

321 CURSOR - Ref. Fig. 7.4 pp. 379 (Fig. 8.4 pp. 357)
• An object used to store the output of a query for processing in an application. It provides the mechanism to reference the current position in a result set, and to do positioned updates and deletes.
• Declare a cursor in embedded SQL by:
  EXEC SQL DECLARE <cursor name> CURSOR FOR <query>;
• Open the cursor before use:
  EXEC SQL OPEN <cursor name>;
• Use a FETCH statement to get the next tuple of the relation over which the cursor ranges:
  EXEC SQL FETCH FROM <cursor name> INTO <list of shared variables>;
  An SQLSTATE of '02000' means no more tuples were found.
• Close the cursor when done:
  EXEC SQL CLOSE <cursor name>;
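The OPEN/FETCH/CLOSE protocol and the '02000' test can be simulated in plain Java. This hypothetical `SimCursor` walks over an in-memory list that stands in for the query result; it is not embedded SQL or JDBC, only the control flow a host program follows.

```java
import java.util.List;

// Hypothetical simulation of an embedded-SQL cursor over a query result.
final class SimCursor<T> {
    static final String NO_ERROR = "00000";
    static final String NO_MORE_TUPLES = "02000";

    private final List<T> queryResult; // stands in for the relation the cursor ranges over
    private int position = -1;         // index of the current tuple; -1 = before the first
    private boolean open = false;
    private String sqlstate = NO_ERROR;

    SimCursor(List<T> queryResult) { this.queryResult = queryResult; }

    void open() { open = true; position = -1; sqlstate = NO_ERROR; }

    // Moves to the next tuple and returns it; sets SQLSTATE to '02000' when none is left.
    T fetch() {
        if (!open) {
            throw new IllegalStateException("cursor is not open");
        }
        if (position + 1 >= queryResult.size()) {
            sqlstate = NO_MORE_TUPLES;
            return null;
        }
        position++;
        sqlstate = NO_ERROR;
        return queryResult.get(position);
    }

    String sqlstate() { return sqlstate; }

    void close() { open = false; }

    // Typical host-program loop: fetch until SQLSTATE reports no more tuples.
    static int sumAll(SimCursor<Integer> cursor) {
        int sum = 0;
        cursor.open();
        while (true) {
            Integer qty = cursor.fetch();
            if (NO_MORE_TUPLES.equals(cursor.sqlstate())) {
                break;
            }
            sum += qty;
        }
        cursor.close();
        return sum;
    }
}
```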

322 Cursor options
• INSENSITIVE: guarantees that changes to the underlying relation made between one opening and closing of the cursor will not affect the set of tuples already fetched.
• FOR READ ONLY: the DBMS will prevent modifications to the underlying relation through this cursor.
• SCROLL: the cursor may be accessed with any of:
  FETCH {NEXT | PRIOR | FIRST | LAST | RELATIVE n | ABSOLUTE n}
• A ResultSet in JDBC is a cursor.

323 SQLJ
• ANSI-standard statement-level interface to Java
• A dialect of embedded SQL that can be included in a Java program
• Goal: to obtain the run-time efficiency of embedded SQL for Java applications while retaining the advantage of accessing the DBMS through JDBC
• Embedded SQLJ constructs are replaced by calls to an SQLJ run-time package, which accesses the database using JDBC.
• Benefit: the pre-compiler (the translator, in Oracle) can check SQL syntax and the number and types of arguments and results.

324 Differences between SQLJ, Embedded SQL and JDBC
1. SQLJ supports essentially SQL-92 and is much more portable across DBMS vendors; for embedded SQL, each DBMS vendor supports its own proprietary version of SQL.
2. An SQLJ clause in a Java program begins with #sql instead of EXEC SQL, and can contain a select-from-where statement inside { }.
3. Any Java variable can be included as a parameter in an SQL statement (prefix the variable with :), as in embedded SQL.
4. In SQLJ, a query returns an SQLJ iterator object instead of a ResultSet object (as in JDBC); like a ResultSet, it provides a cursor mechanism.
5. SQLJ statements and JDBC calls can both be included in the same Java program.

325 Oracle's SQLJ script
• To use: sqlj file.sqlj
• Many command-line options are available; refer to the Oracle developer guide at http://otn.oracle.com/tech/java/sqlj_jdbc/pdf/a96655.pdf
• Invokes the translator to preprocess Java programs with SQLJ clauses (the files may have the suffix .sqlj or .java). The translator generates a .java file that contains the JDBC calls for the SQL statements.
• Invokes javac to compile
• Invokes the JVM to execute

326 Dynamic SQL
• Used when the SQL statement is not known at compile time
• E.g. an application that prompts the user for an SQL query, reads it, and then executes it (can you think of such an application?)
• The host-language program must instruct SQL to:
  1. Take a string and turn it into an executable SQL statement:
     EXEC SQL PREPARE sqlvar FROM :sharedVar;
  2. Execute the prepared statement:
     EXEC SQL EXECUTE sqlvar;

327 Dynamic SQL (continued)
• Steps 1 and 2 can be combined by:
  EXEC SQL EXECUTE IMMEDIATE :sharedVar;
• The 2-step approach is beneficial if a prepared statement is executed multiple times.
• Ref. Fig. 8.7 pp. 368

328 Week 14

329 Persistent Stored Modules (PSM) (Ref. 8.2)
• A way to create and store procedures or functions with a database schema
• The procedures/functions can be used in SQL queries or other SQL statements to perform computations within the database that cannot be expressed in the SQL query language.
• Like a simple, general-purpose language
• SQL/PSM standard (PSM-96). Commercial DBMSs offer such capabilities with variations (e.g. Oracle, MS SQL Server).

330 PSM Modules
• A collection of function and procedure definitions, temporary relation declarations, and other declarations (PSM modules in Oracle are packages).
• Parameters: mode-name-type triples
• Modes: IN, OUT, INOUT
PSM procedure:
  CREATE PROCEDURE <name> (<parameters>)
    <local declarations>
    <procedure body>;
PSM function:
  CREATE FUNCTION <name> (<parameters>) RETURNS <type>
    <local declarations>
    <function body>;

331 Statement Forms in PSM
1. CALL statement (for procedures only): CALL <procedure name> (<argument list>);
  i. From a host-language program, e.g. EXEC SQL CALL proc(:x, 3); // x is a shared variable
  ii. As a statement of another PSM function or procedure
  iii. As an SQL command in an SQL client (e.g. SQLPLUS), e.g. CALL proc(1, 3);

332 Statement Forms in PSM (continued)
2. RETURN <expression>; statement (for functions only)
3. Declaration of local variables (precedes executable statements): DECLARE <name> <type>;
4. Assignment statement: SET <variable> = <expression>;
5. Statement groups in a BEGIN … END block
6. Statement labels: labelName:
7. Branching statements (ref. Fig. 8.9, 8.10)

333 Statement Forms in PSM (continued 2)
8. Queries (select-from-where)
9. Loops:
  a) LOOP <statement list> END LOOP;
  b) Only to iterate over a cursor:
     FOR <loop name> AS <cursor name> CURSOR FOR <query> DO <statement list> END FOR;
  c) WHILE <condition> DO <statement list> END WHILE;
  d) REPEAT <statement list> UNTIL <condition> END REPEAT;

334 PL/SQL of Oracle
• Oracle's procedural language (PL). It is a superset of SQL.
• Used to create stored procedures/functions and packages, to define triggers on database events, or to add programming logic to SQL commands
• Ref.: Ullman and students' introduction http://www-db.stanford.edu/~ullman/fcdb/oracle/or-plsql.html

335 Call-level Interface (CLI)
• The application is written entirely in the host language.
• SQL statements are the values of string variables constructed at run time.
• The string variables are passed as arguments to host-language procedures provided by the CLI library of functions.
• No special syntax and no pre-compiler are needed.
• System independence, in principle!
• SQL99's CLI standard is an adaptation of ODBC (Open Database Connectivity), and the host language is C.
• The Java counterpart of SQL/CLI is JDBC (discussed in week 9).

336 The SQL Environment
• The framework in which data may exist and SQL operations on data may be executed, i.e. a DBMS running at an installation.
• SQL-standard hierarchy of elements (Fig. 8.15, 2nd ed.):
  1. Schemas: collections of tables, views, assertions, triggers, PSM modules, etc.
  2. Catalogs: collections of schemas
  3. Clusters: collections of catalogs. Each user has an associated cluster.

337 The SQL Environment (continued)
• SQL clients and servers: processes that run on the same or different hosts.
  – Server: supports operations on database elements
  – Client: allows a user to connect to a server and operate on a database (e.g. SQLPLUS)
• Connection: a connection must be made between the client and the server by executing the CONNECT statement on the client host. A default connection is made by an SQL client (e.g. SQLPLUS).
• Session: the SQL operations performed while a connection is active. Each session has a current catalog, a current schema, and an authorized user. Reminder: temporary elements such as views exist only within the environment of a session.

338 The SQL Environment (continued 2)
• Modules
  – An SQL term for an application program
  – 3 kinds of modules in the SQL standard:
    1. Generic SQL interface, e.g. SQLPLUS
    2. Embedded SQL
    3. True modules, e.g. PSM
  – SQL agent: an SQL module in execution
• Ref. Fig. 8.17 pp. 384

339
• SQL Transactions
  – Example in JDBC
• Security and User Authorization in SQL
• Object-Oriented Model
  – ODL
  – OQL
• Object-Relational Systems
• Recursion in SQL

340 Transactions in SQL (8.6, 2nd ed.)
• Concerned with enforcing the ACID properties of transactions
• Review: ACID properties of "proper" execution:
  – Atomicity: all of the updates of a transaction succeed, or no update takes place
  – Consistency: each transaction should leave the database in a consistent state
  – Isolation: each transaction, when executed concurrently with other transactions, should have the same effect as if it had been executed by itself (serializable behavior)
  – Durability: once a transaction has completed successfully, its changes to the database should be permanent. Even serious failures should not affect the permanence of a transaction.

341 Serializability and Atomicity
• Serializable execution: function executions are serializable if they behave as if they were run serially, even though their executions may overlap in time.
• Example of a problem with concurrent function executions that are not serialized: Ex. 8.26, Fig. 8.22 and Fig. 8.23 pp. 398
• Example of a problem when a failure (system, network, power) occurs during a database operation: Ex. 8.27, Fig. 8.24 pp. 400

342 Transactions
• Solve the serializability and atomicity problems.
• Def.: a collection of one or more operations on the database that must be executed atomically (i.e. either all are done or none are) and in a serializable manner (i.e. isolation).
• The end of an SQL transaction is marked by one of two operations:
  1. COMMIT: the database is permanently updated
  2. ROLLBACK: no changes made by the transaction appear in the database (the transaction is aborted)
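A hypothetical sketch of the atomicity half of the definition: writes are buffered, COMMIT applies all of them together, and ROLLBACK discards them, so a transfer never leaves only one of its two updates in the database. It deliberately ignores isolation and durability; `TinyTransaction` is an illustrative name, not a real API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical all-or-nothing transaction over an in-memory "database" map.
final class TinyTransaction {
    private final Map<String, Integer> db;                     // committed state
    private final Map<String, Integer> pending = new HashMap<>(); // this transaction's writes

    TinyTransaction(Map<String, Integer> db) {
        this.db = db;
    }

    // A transaction sees its own uncommitted writes first.
    int read(String key) {
        return pending.containsKey(key) ? pending.get(key) : db.getOrDefault(key, 0);
    }

    void write(String key, int value) {
        pending.put(key, value);
    }

    // COMMIT: every buffered write becomes visible at once.
    void commit() {
        db.putAll(pending);
        pending.clear();
    }

    // ROLLBACK: none of the buffered writes ever reach the database.
    void rollback() {
        pending.clear();
    }
}
```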

343 Transactions (continued)
• In a query interface (e.g. SQLPLUS): transactions are single queries or modification statements.
• In a program interface (e.g. JDBC):
  1. Single SQL statement: use auto-commit mode. A transaction:
     – starts when the statement begins execution
     – ends when an automatic commit or rollback is completed
  2. More than 1 SQL statement:
     – Disable auto-commit
     – After all SQL statements in the transaction are executed, call commit; or call rollback if an error occurs

344 Deferred constraints
• When the commit method is executed, if there are deferred constraints that need to be checked and they are now violated, the transaction is not committed.
• Instead, the transaction is rolled back and the error is reported via the SQLSTATE variable.
• In JDBC, this SQLSTATE is available through the SQLException object's getSQLState() method.
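A sketch of inspecting the SQLSTATE carried by a (possibly chained) `SQLException`, using only `getSQLState()` and `getNextException()` from `java.sql`. Which code a driver reports for a deferred-constraint failure at commit varies by DBMS; the SQL standard defines class 23 (integrity constraint violation) and class 40 (transaction rollback), so this hypothetical helper checks for either.

```java
import java.sql.SQLException;

final class SqlStates {
    // True if any exception in the chain carries a SQLSTATE in class 23 or class 40.
    static boolean constraintOrRollback(SQLException first) {
        for (SQLException e = first; e != null; e = e.getNextException()) {
            String state = e.getSQLState(); // may be null if the driver supplied none
            if (state != null && (state.startsWith("23") || state.startsWith("40"))) {
                return true;
            }
        }
        return false;
    }
}
```

In a program, this would be called in the `catch (SQLException ex)` block around `con.commit()` to decide whether the failure was a constraint violation rather than, say, a lost connection.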

345 Code segment to show transaction control in JDBC:
Connection con = null;
try {
    //... add code here for making connection...
    PreparedStatement updateSales = con.prepareStatement(
        "update COFFEES set SALES = ? where COF_NAME like ?");
    PreparedStatement updateTotal = con.prepareStatement(
        "update COFFEES set TOTAL = TOTAL + ? where COF_NAME like ?");
    con.setAutoCommit(false);        // start a multi-statement transaction
    updateSales.setInt(1, 50);
    updateSales.setString(2, "Colombian");
    updateSales.executeUpdate();     // first update
    updateTotal.setInt(1, 50);
    updateTotal.setString(2, "Colombian");
    updateTotal.executeUpdate();     // second update
    con.commit();                    // both updates become permanent together
    con.setAutoCommit(true);
} catch (SQLException ex) {
    if (con != null) {
        try {
            System.err.println("Error detected! Transaction is being rolled back!");
            con.rollback();          // undo whatever part of the transaction ran
        } catch (SQLException excep) {
            System.err.println("SQLException: " + excep.getMessage());
        }
    } // end if there is a connection
}

346 Transaction Isolation Levels
• Determine whether and how a transaction is affected by concurrent transactions, i.e. transactions from different SQL sessions that access the same data.
• SQL isolation levels address:
  1. Dirty read: a read of dirty data. Dirty data is data written by a transaction that has not yet committed.
  2. Non-repeatable read: one transaction reads a row; another transaction modifies or deletes it and commits before the first one does. If the first transaction then performs the same read again, it gets a different result.
  3. Phantom tuples: tuples that are inserted into the database by another transaction while one's transaction is running.

347 Isolation levels in JDBC
• The 5 isolation levels:
  1. TRANSACTION_NONE: transactions not supported (this level is not in the SQL standard)
  2. TRANSACTION_READ_UNCOMMITTED: allows dirty reads, non-repeatable reads, and phantom reads
  3. TRANSACTION_READ_COMMITTED: prevents dirty reads, but does not prevent non-repeatable reads or phantom reads
  4. TRANSACTION_REPEATABLE_READ: disallows dirty reads and non-repeatable reads, but allows phantom reads
  5. TRANSACTION_SERIALIZABLE: disallows dirty reads, non-repeatable reads, and phantom reads (the highest level, and the default in the SQL standard; a JDBC driver's default depends on the DBMS)
• Connection methods to get and set them:
  1. getTransactionIsolation(): finds out the connection's current transaction isolation level
  2. setTransactionIsolation(): sets the appropriate isolation level
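The list above can be encoded against the real `java.sql.Connection` constants, which makes the "which anomalies does this level still allow?" question checkable in code. A sketch; the `IsolationLevels` class and `Anomaly` enum are hypothetical names.

```java
import java.sql.Connection;
import java.util.EnumSet;
import java.util.Set;

final class IsolationLevels {
    enum Anomaly { DIRTY_READ, NON_REPEATABLE_READ, PHANTOM }

    // Anomalies each JDBC isolation level still permits, following the SQL-standard definitions.
    static Set<Anomaly> permitted(int level) {
        switch (level) {
            case Connection.TRANSACTION_READ_UNCOMMITTED:
                return EnumSet.allOf(Anomaly.class);
            case Connection.TRANSACTION_READ_COMMITTED:
                return EnumSet.of(Anomaly.NON_REPEATABLE_READ, Anomaly.PHANTOM);
            case Connection.TRANSACTION_REPEATABLE_READ:
                return EnumSet.of(Anomaly.PHANTOM);
            case Connection.TRANSACTION_SERIALIZABLE:
                return EnumSet.noneOf(Anomaly.class);
            default: // includes TRANSACTION_NONE: no transactions, so the question does not apply
                throw new IllegalArgumentException("not a transactional isolation level: " + level);
        }
    }
}
```

With a live connection, `con.getMetaData().supportsTransactionIsolationLevel(level)` reports whether the DBMS supports a level before `con.setTransactionIsolation(level)` is called.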

348 Choosing Isolation Levels
• A compromise between performance and a slightly inconsistent view of the data (because data is being updated concurrently by other users).
• Ref. Ex. 8.32, 8.33 pp. 408

349 DBMS Techniques to enforce ACID
• Locking: the granularity of locks is important. Locks are obtained at the beginning of a transaction and released at the end of commit or rollback.
• Logging: write a log to nonvolatile storage to assure durability.
• Transaction commitment: for durability and atomicity, transactions are computed "tentatively" and recorded, but no changes are made to the db until the transaction commits. Changes are copied to the log, then copied to the db.

350 J2EE Transaction Management in Enterprise JavaBeans (EJBs)
Ref. Java™ 2 Enterprise Edition Developer's Guide

351 Container-Managed Transactions (see note page of this slide)
• Description
  – Code does not include statements that begin and end the transaction
  – Immediately before an EJB method starts, the transaction begins
  – Just before the method exits, the transaction commits
  – Each method can be associated with a single transaction
  – Prohibited methods, e.g. the commit, setAutoCommit, and rollback methods of java.sql.Connection
• Limitation: when a method is executing, it can be associated with either a single transaction or no transaction at all

352 Bean-Managed Transaction (see note page of this slide)
• Session bean code invokes methods that mark the boundaries of the transaction: setAutoCommit(), commit(), rollback()
• An entity bean may not have bean-managed transactions
public void ship(String productId, String orderId, int quantity) {
    try {
        con.setAutoCommit(false);
        updateOrderItem(productId, orderId);
        updateInventory(productId, quantity);
        con.commit();
    } catch (Exception ex) {
        try {
            con.rollback();
            throw new EJBException("Transaction failed: " + ex.getMessage());
        } catch (SQLException sqx) {
            throw new EJBException("Rollback failed: " + sqx.getMessage());
        }
    }
}
Ref. Java™ 2 Enterprise Edition Developer's Guide, JDBC Transaction Example

353 Week 15

354 Security and User Authorization in SQL (8.7 pp. 410)
• Authorization ID = user name.
• Special authorization ID: PUBLIC (a privilege granted to PUBLIC is available to every user).
• Privileges exist for: SELECT, INSERT, UPDATE, DELETE, REFERENCES, USAGE, TRIGGER, EXECUTE, UNDER.
• SELECT, INSERT, and UPDATE privileges may also be specified at the attribute level.
• Privileges are also needed on the relations used in subqueries, e.g. Fig. 8.25 pp. 411.
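
A hypothetical sketch of these checks:
• A privilege granted to PUBLIC counts for every authorization ID.
• A statement may run only if the user holds the needed privilege on every relation it mentions, including relations that appear only in subqueries.

`PrivilegeTable` and its methods are illustrative names, and attribute-level privileges are not modeled.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical privilege table keyed by (authorization ID, privilege, relation).
class PrivilegeTable {
    private final Set<String> grants = new HashSet<>();

    private static String key(String authId, String privilege, String relation) {
        return authId + "|" + privilege + "|" + relation;
    }

    void grant(String authId, String privilege, String relation) {
        grants.add(key(authId, privilege, relation));
    }

    // A privilege granted to PUBLIC is held by every authorization ID.
    boolean holds(String authId, String privilege, String relation) {
        return grants.contains(key(authId, privilege, relation))
                || grants.contains(key("PUBLIC", privilege, relation));
    }

    // "needed" maps each privilege to every relation the statement uses it on,
    // including relations that appear only inside subqueries.
    boolean mayExecute(String authId, Map<String, List<String>> needed) {
        for (Map.Entry<String, List<String>> entry : needed.entrySet()) {
            for (String relation : entry.getValue()) {
                if (!holds(authId, entry.getKey(), relation)) {
                    return false;
                }
            }
        }
        return true;
    }
}

class PrivilegeDemo {
    public static void main(String[] args) {
        // Hypothetical statement:
        // INSERT INTO Studio(name) SELECT studioName FROM Movie
        //   WHERE studioName NOT IN (SELECT name FROM Studio);
        Map<String, List<String>> needed = new HashMap<>();
        needed.put("INSERT", Arrays.asList("Studio"));
        needed.put("SELECT", Arrays.asList("Movie", "Studio")); // Studio is read in the subquery
        PrivilegeTable t = new PrivilegeTable();
        t.grant("janeway", "INSERT", "Studio");
        t.grant("janeway", "SELECT", "Movie");
        System.out.println(t.mayExecute("janeway", needed)); // false: no SELECT on Studio
        t.grant("PUBLIC", "SELECT", "Studio");
        System.out.println(t.mayExecute("janeway", needed)); // true
    }
}
```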

355 Creating privileges
• The owner of a schema or module has all privileges on it.
• Ownership is established at three points:
  1. When a schema is created.
  2. When a session is initiated by a CONNECT statement, e.g. CONNECT TO ABC_server AS conn1 AUTHORIZATION smith;
  3. When a module is created, through an optional AUTHORIZATION clause.

356 Granting privileges
• The owner of a relation has the GRANT privilege on it.
• If you hold a set of privileges with the grant option, you may grant them to any user:
  GRANT <privilege list> ON <database element> TO <user list> [WITH GRANT OPTION]
  e.g.
  GRANT SELECT, INSERT ON Studio TO kirk, picard WITH GRANT OPTION; -- by Janeway
  GRANT SELECT, INSERT ON Studio TO sisko; -- by picard
  GRANT SELECT, INSERT(name) ON Studio TO sisko; -- by kirk
• Grant diagram, e.g. Fig. 8.26 pp. 417
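
The rule that only a holder of the grant option may pass privileges on can be sketched as follows. The demo replays the relation-level grants from the example above; the attribute-level INSERT(name) grant is not modeled. `GrantTable` and the user `worf` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: for each (user, privilege, relation) remember whether the
// privilege is held WITH GRANT OPTION. Attribute-level privileges are not modeled.
class GrantTable {
    private final Map<String, Boolean> held = new HashMap<>(); // value: has grant option

    private static String key(String user, String privilege, String relation) {
        return user + "|" + privilege + "|" + relation;
    }

    // The owner of a relation holds its privileges with the grant option.
    void setOwner(String owner, String relation, String... privileges) {
        for (String p : privileges) {
            held.put(key(owner, p, relation), true);
        }
    }

    boolean has(String user, String privilege, String relation) {
        return held.containsKey(key(user, privilege, relation));
    }

    // GRANT <privileges> ON <relation> TO <grantees> [WITH GRANT OPTION]
    void grant(String grantor, String[] privileges, String relation,
               String[] grantees, boolean withGrantOption) {
        // Check every privilege first, so a refused GRANT changes nothing.
        for (String p : privileges) {
            if (!Boolean.TRUE.equals(held.get(key(grantor, p, relation)))) {
                throw new IllegalStateException(
                        grantor + " lacks the grant option for " + p + " on " + relation);
            }
        }
        for (String grantee : grantees) {
            for (String p : privileges) {
                // A later grant without the option never removes an earlier one.
                held.merge(key(grantee, p, relation), withGrantOption, Boolean::logicalOr);
            }
        }
    }
}

class GrantDemo {
    public static void main(String[] args) {
        GrantTable t = new GrantTable();
        t.setOwner("janeway", "Studio", "SELECT", "INSERT");
        String[] selectInsert = {"SELECT", "INSERT"};
        t.grant("janeway", selectInsert, "Studio", new String[] {"kirk", "picard"}, true);
        t.grant("picard", selectInsert, "Studio", new String[] {"sisko"}, false);
        System.out.println(t.has("sisko", "SELECT", "Studio")); // true
        try {
            // sisko received the privileges without the grant option
            t.grant("sisko", selectInsert, "Studio", new String[] {"worf"}, false);
        } catch (IllegalStateException e) {
            System.out.println("refused: " + e.getMessage());
        }
    }
}
```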

357 Revoking Privileges
• Privileges can be revoked:
  REVOKE [GRANT OPTION FOR] <privilege list> ON <database element> FROM <user list> {CASCADE | RESTRICT}
  e.g. REVOKE SELECT, INSERT ON Studio FROM picard CASCADE;
• If A has been given a privilege on the same element by several different users, all of them have to revoke it before A loses the privilege.
• If A granted privilege P to B, who granted P to C, then when A revokes P from B with CASCADE, P is also revoked from C (unless C also received P along another path). e.g. Fig. 8.29 pp. 420
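
A simplified grant-diagram sketch for a single privilege on a single element. After a CASCADE revoke, every grant whose grantor can no longer trace the privilege (with grant option) back to the owner is dropped. This covers both rules above. `GRANT OPTION FOR` and `RESTRICT` are not modeled, and `Grant` and `GrantDiagram` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// One edge of the grant diagram: grantor gave the privilege to grantee.
class Grant {
    final String grantor;
    final String grantee;
    final boolean withGrantOption;

    Grant(String grantor, String grantee, boolean withGrantOption) {
        this.grantor = grantor;
        this.grantee = grantee;
        this.withGrantOption = withGrantOption;
    }
}

// Hypothetical grant diagram for ONE privilege on ONE element (e.g. SELECT on Studio).
class GrantDiagram {
    private final String owner;
    private final List<Grant> grants = new ArrayList<>();

    GrantDiagram(String owner) {
        this.owner = owner;
    }

    void grant(String grantor, String grantee, boolean withGrantOption) {
        grants.add(new Grant(grantor, grantee, withGrantOption));
    }

    // Users who may still pass the privilege on: the owner, plus anyone reached
    // from the owner through a chain of grants made WITH GRANT OPTION.
    private Set<String> canGrant() {
        Set<String> result = new HashSet<>();
        result.add(owner);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Grant g : grants) {
                if (g.withGrantOption && result.contains(g.grantor) && result.add(g.grantee)) {
                    changed = true;
                }
            }
        }
        return result;
    }

    // A user holds the privilege if some remaining grant to them comes from a valid grantor.
    boolean holds(String user) {
        if (user.equals(owner)) {
            return true;
        }
        Set<String> granters = canGrant();
        for (Grant g : grants) {
            if (g.grantee.equals(user) && granters.contains(g.grantor)) {
                return true;
            }
        }
        return false;
    }

    // REVOKE ... FROM grantee CASCADE, issued by revoker.
    void revokeCascade(String revoker, String grantee) {
        grants.removeIf(g -> g.grantor.equals(revoker) && g.grantee.equals(grantee));
        // Cascade: drop every grant whose grantor can no longer pass the privilege on.
        Set<String> granters = canGrant();
        grants.removeIf(g -> !granters.contains(g.grantor));
    }
}

class RevokeDemo {
    public static void main(String[] args) {
        GrantDiagram d = new GrantDiagram("A");
        d.grant("A", "B", true);   // A grants P to B WITH GRANT OPTION
        d.grant("A", "D", true);
        d.grant("B", "C", false);  // C receives P from two different users
        d.grant("D", "C", false);
        d.revokeCascade("A", "B");
        System.out.println(d.holds("B") + " " + d.holds("C")); // false true
        d.revokeCascade("A", "D");
        System.out.println(d.holds("C"));                      // false
    }
}
```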

358 Object-Oriented Data Model
• ODMG
  – Object Database Management Group
  – Deals with the OO standard for databases
  – Also deals with ORDBMS (Object-Relational DBMS)
• Major parts of the ODMG standard:
  – ODL: Object Definition Language, how to specify the db schema
  – OQL: the SQL-like Object Query Language
  – Host language binding: how to use ODL and OQL from within procedural languages. The standard defines bindings for C++, Smalltalk, and Java. In ODMG, the host language also serves as the object manipulation language.

359 ODMG database management system
• The application is written in a host language, e.g. C++ or Java.
• To access the db, the application must be linked with the ODBMS libraries and with the code that implements its class methods.
• Much of the code that manipulates objects is part of the database itself.
• Each class has a set of methods. Method signatures are specified in the schema using ODL.
• The code for these methods is stored on the database server.
• The ODBMS invokes the appropriate code whenever a method is called.
• ODMG database data is modified directly in the host language, e.g.
  Stud.Name = "Joe"; // Stud contains the OID of a persistent Student object

360 Architecture of an ODMG database
[Figure components: schema specification in ODL (embedded in C++, Java, etc.); ODL preprocessor; source code for class methods in the host language (C++, Java, …); host language compiler; method implementation object code; ODBMS libraries; linker; method implementation binaries stored in the DBMS; information stored at the server (metadata, object data); ODBMS software; data access.]
Ref. "Databases and Transaction Processing" – Lewis, Addison Wesley

361 Structure of ODMG Applications
[Figure components: application source code in the host language; host language compiler; application object code; ODBMS library; linker; executable code; ODBMS; method implementation binaries stored in the DBMS.]
Ref. "Databases and Transaction Processing" – Lewis, Addison Wesley

362 Object Definition Language (ODL)
• A conceptual model to describe the attributes, methods, and relationships of each object type (class), including its inheritance properties.
• ODL classes describe three kinds of elements:
  1. Attributes: values associated with the object.
  2. Relationships: connections between the object itself and other objects.
  3. Methods: functions that may be applied to objects of the class. A method is specified by its signature: name, arguments (names, order, and types), return type, and the names of any exceptions it can raise. e.g. Fig. 4.2 pp. 137

363 Object Definition Language (ODL) (continued)
• Class declaration. A class declaration includes:
  1. Class name
  2. Key declaration(s). Optional.
  3. Extent declaration = name for the set of currently existing objects of the class (i.e. the relation instance in the relational model)
  4. Element declarations: attributes, relationships, methods
  class <class name> [(extent <extent name> [key <key list>])] { <element declarations> }

364 Object Definition Language (ODL) (continued 2)
• Attribute declaration (non-object values): attribute <type> <name>;
  e.g. 1: attribute string name;
  e.g. 2: attribute Struct Addr {string street, string city} address;
• Relationship (and inverse relationship) declaration (objects): relationship <range type> <name> inverse <class name>::<inverse name>;
  e.g. relationship Set<Star> stars inverse Star::starredIn;

365
• Method declaration: <return type> <method name>(<arguments>) raises (<exception list>);
  e.g. 1: float lengthInHours() raises (noLengthFound);
  e.g. 2: void starName(out Set<String>);
• Arguments:
  – in: read-only
  – out: for returning values
  – inout: for both
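
Java, one of the ODMG host languages, has no out parameters, so a hand translation of these ODL method declarations might look like the sketch below. It is an assumption, not the ODMG Java binding:
• `raises` becomes a checked exception.
• `lengthInHours` is assumed to return a float.
• The out Set<String> argument becomes a set that the caller passes in for the method to fill.

`NoLengthFound` and the `Movie` fields are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical exception standing in for ODL's "raises (noLengthFound)".
class NoLengthFound extends Exception {
    NoLengthFound(String message) {
        super(message);
    }
}

class Movie {
    private final String title;
    private final Integer lengthInMinutes;          // null when the length is unknown
    private final Set<String> starNames = new HashSet<>();

    Movie(String title, Integer lengthInMinutes) {
        this.title = title;
        this.lengthInMinutes = lengthInMinutes;
    }

    void addStar(String name) {
        starNames.add(name);
    }

    // ODL: lengthInHours() raises (noLengthFound), assumed here to return a float.
    float lengthInHours() throws NoLengthFound {
        if (lengthInMinutes == null) {
            throw new NoLengthFound("no length recorded for " + title);
        }
        return lengthInMinutes / 60.0f;
    }

    // ODL: void starName(out Set<String>);
    // "out" is emulated by filling a set supplied by the caller.
    void starName(Set<String> out) {
        out.clear();
        out.addAll(starNames);
    }
}

class MovieMethodsDemo {
    public static void main(String[] args) throws NoLengthFound {
        Movie m = new Movie("Movie X", 90);
        m.addStar("Ann");
        Set<String> names = new HashSet<>();
        m.starName(names);                     // names now holds the result
        System.out.println(names);             // [Ann]
        System.out.println(m.lengthInHours()); // 1.5
    }
}
```

An inout argument would correspond to a mutable object that the method both reads and updates.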

366 ODL Relationships
• Only binary relationships are supported.
  – Use a connecting class to represent multiway relationships. Fig. 2.9 pp. 34.
• Relationships are defined in inverse pairs. Fig. 4.3 pp. 140
  1. Many-many: a set type of the other class in each direction.
  2. Many-one: a set type on the "one" side, and a simple class name on the "many" side.
  3. One-one: a simple class name in both directions.
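
Because relationships come in inverse pairs, a host-language implementation must keep both directions consistent. Below is a hand-written Java sketch of the many-many pair Movie.stars / Star.starredIn. It is an assumption, not code generated by an ODMG tool, and it updates the inverse whenever one side changes. For a many-one pair, the "many" side would hold a single reference instead of a set.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// ODL being mirrored:
//   class Movie { relationship Set<Star> stars inverse Star::starredIn; }
//   class Star  { relationship Set<Movie> starredIn inverse Movie::stars; }
class Movie {
    final String title;
    private final Set<Star> stars = new HashSet<>();

    Movie(String title) {
        this.title = title;
    }

    Set<Star> getStars() {
        return Collections.unmodifiableSet(stars);
    }

    void addStar(Star s) {
        if (stars.add(s)) {
            s.addMovie(this);    // keep the inverse Star.starredIn in step
        }
    }

    void removeStar(Star s) {
        if (stars.remove(s)) {
            s.removeMovie(this);
        }
    }
}

class Star {
    final String name;
    private final Set<Movie> starredIn = new HashSet<>();

    Star(String name) {
        this.name = name;
    }

    Set<Movie> getStarredIn() {
        return Collections.unmodifiableSet(starredIn);
    }

    void addMovie(Movie m) {
        if (starredIn.add(m)) {
            m.addStar(this);     // the round trip ends because the element is already present
        }
    }

    void removeMovie(Movie m) {
        if (starredIn.remove(m)) {
            m.removeStar(this);
        }
    }
}

class InverseDemo {
    public static void main(String[] args) {
        Movie movie = new Movie("Movie X");
        Star star = new Star("Ann");
        movie.addStar(star);                                     // update one side only
        System.out.println(star.getStarredIn().contains(movie)); // true
    }
}
```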

367
• Subclass (C is a subclass of D): class C extends D { C's declarations }
  e.g. class Cartoon extends Movie { relationship Set<Star> voices; }
• Multiple inheritance: separate the superclasses with ":" in the extends declaration,
  e.g. class CartoonMurderMystery extends MurderMystery : Cartoon
• Name conflict resolution with multiple inheritance: pp. 151
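
Java classes cannot extend more than one class. A hand translation of this hierarchy (a sketch, not the ODMG Java binding) can therefore model each ODL class as an interface. Java also shows one way to resolve a name conflict: if two superinterfaces supply the same default method, the subclass must override it and may call either parent's version. The `weapon` attribute and the `genre` method are hypothetical.

```java
import java.util.Collections;
import java.util.Set;

// Each ODL class becomes a Java interface (an assumption made for this sketch).
interface Movie {
    String title();
}

interface Cartoon extends Movie {
    Set<String> voices();                    // ODL: relationship Set<Star> voices (names only here)
    default String genre() { return "cartoon"; }
}

interface MurderMystery extends Movie {
    String weapon();                         // hypothetical attribute
    default String genre() { return "murder mystery"; }
}

// ODL: class CartoonMurderMystery extends MurderMystery : Cartoon
class CartoonMurderMystery implements MurderMystery, Cartoon {
    private final String title;
    private final String weapon;
    private final Set<String> voices;

    CartoonMurderMystery(String title, String weapon, Set<String> voices) {
        this.title = title;
        this.weapon = weapon;
        this.voices = Collections.unmodifiableSet(voices);
    }

    public String title() { return title; }        // declared once, in Movie: no conflict
    public String weapon() { return weapon; }
    public Set<String> voices() { return voices; }

    // genre() is inherited from both parents, so Java forces an explicit choice here.
    @Override
    public String genre() {
        return Cartoon.super.genre() + " " + MurderMystery.super.genre();
    }
}

class InheritanceDemo {
    public static void main(String[] args) {
        Movie m = new CartoonMurderMystery("Movie Y", "rope", Collections.singleton("Ann"));
        System.out.println(m instanceof Cartoon && m instanceof MurderMystery); // true
        System.out.println(((Cartoon) m).genre()); // cartoon murder mystery
    }
}
```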

368 ODL data types
• Basis:
  1. Atomic types: integer, float, character, string, boolean, enum
  2. Class names
• Structured types:
  1. Set: Set<T> // finite sets of elements of type T
  2. Bag: Bag<T> // finite bags of elements of type T
  3. List: List<T> // finite lists of 0 or more elements of type T
  4. Array: Array<T, i> // T = element type, i = number of elements
  5. Dictionary: Dictionary<T, S> // T is the key type, S is the range type; each pair has a unique key value
  6. Structures: Struct N {field1, …}
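
One plausible Java correspondence for these types (an assumption, not the official binding):
• Set<T> → java.util.Set<T>
• List<T> → java.util.List<T>
• Array<T, i> → a T[] of length i
• Dictionary<T, S> → java.util.Map<T, S>
• Struct → a class with fields

The Java collections library has no bag. The sketch below is therefore a minimal hypothetical `Bag<T>`: like a set, except that it keeps a count of each element.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Minimal hypothetical Bag<T>: unordered like a set, but duplicates are counted.
class Bag<T> {
    private final Map<T, Integer> counts = new HashMap<>();
    private int size = 0;

    void add(T item) {
        counts.merge(item, 1, Integer::sum);
        size++;
    }

    // Removes one occurrence; returns false if the item was not present.
    boolean remove(T item) {
        Integer c = counts.get(item);
        if (c == null) {
            return false;
        }
        if (c == 1) {
            counts.remove(item);
        } else {
            counts.put(item, c - 1);
        }
        size--;
        return true;
    }

    int count(T item) {
        return counts.getOrDefault(item, 0);
    }

    int size() {
        return size;
    }
}

class CollectionTypesDemo {
    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "a");
        Set<String> set = new HashSet<>(input);     // ODL Set<string>: duplicates dropped
        List<String> list = new ArrayList<>(input); // ODL List<string>: order and duplicates kept
        Bag<String> bag = new Bag<>();              // ODL Bag<string>: duplicates kept, no order
        for (String s : input) {
            bag.add(s);
        }
        Map<String, Integer> dictionary = new HashMap<>(); // ODL Dictionary<string, integer>
        dictionary.put("a", bag.count("a"));
        System.out.println(set.size() + " " + list.size() + " " + bag.size()); // 2 3 3
        System.out.println(dictionary);                                        // {a=2}
    }
}
```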

369 Keys declaration in ODL
• Optional, because each object is identified by an internal OID.
• One or more keys may be declared in the extent declaration, e.g.
  class Movie (extent Movies key (title, year)) {
      attribute string title;
      attribute integer year;
      …
  }
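
A sketch of what the key declaration means, assuming a hypothetical `MovieExtent` class that plays the extent Movies. Every object still has its own identity (its OID), but the extent rejects a second object with the same (title, year) values.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Movie {
    final String title;
    final int year;

    Movie(String title, int year) {
        this.title = title;
        this.year = year;
    }
}

// Hypothetical stand-in for the extent Movies with key (title, year).
// Two Movie objects are always distinct objects (distinct "OIDs"); the key
// only forbids two of them having equal (title, year) values in the extent.
class MovieExtent {
    private final Map<List<Object>, Movie> byKey = new HashMap<>();

    private static List<Object> key(String title, int year) {
        return Arrays.asList(title, year);
    }

    void add(Movie m) {
        List<Object> k = key(m.title, m.year);
        if (byKey.containsKey(k)) {
            throw new IllegalArgumentException("duplicate key " + k);
        }
        byKey.put(k, m);
    }

    Movie find(String title, int year) {
        return byKey.get(key(title, year));
    }

    int size() {
        return byKey.size();
    }
}

class ExtentDemo {
    public static void main(String[] args) {
        MovieExtent movies = new MovieExtent();
        movies.add(new Movie("Movie X", 1990));
        movies.add(new Movie("Movie X", 1995));     // same title, different year: allowed
        try {
            movies.add(new Movie("Movie X", 1990)); // violates key (title, year)
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());     // duplicate key [Movie X, 1990]
        }
    }
}
```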

370 ODL to Relational Design
• Invent a new attribute to serve as the key when the ODL design has no key.
• Non-atomic ODL attributes are converted into relation attributes, which usually need to be redesigned through normalization.
• Methods are not converted in a relational design, but methods can be kept in an object-relational design.

371 Object-Relational DB (ORDB)
• SQL-99 adopted a limited subset of the object-relational model.
• An ORDBMS is a conservative extension of the existing RDBMS.
• In general, an ORDB consists of:
  – A set of relations (which can be viewed as classes)
  – Each relation consists of a set of tuples (which can be viewed as instances of the class that represents the relation)
  – Each tuple is of the form (oid, val), where oid is an object id and val is a tuple value whose components can be arbitrary values (e.g. primitive values, sets of tuples, and references to other objects)

372 ORDB, ODB, RDB
• Difference between ORDB and ODB:
  – In an ORDB, the top-level structure of each object instance is always a tuple. In an ODB, the top-level structure can be an arbitrary value.
• Difference between ORDB and RDB:
  – RDB tuple components must be primitive values.
  – ORDB tuple components can be arbitrary values.

373 Oracle Object example
  create type ADDRESS_TY as object
  (Street    VARCHAR2(50),
   City      VARCHAR2(25),
   State     CHAR(2),
   Zip       NUMBER);
  /
  create type PERSON_TY as object
  (Name      VARCHAR2(25),
   BirthDate DATE,
   Address   ADDRESS_TY,
   member function AGE_DAYS (BirthDate IN DATE) return NUMBER);
  /

374 Oracle Object example (continued)
• Defining methods for user-defined types using PL/SQL:
  create type body PERSON_TY as
    member function AGE_DAYS (BirthDate IN DATE) return NUMBER is
    begin
      RETURN ROUND(SysDate - BirthDate);
    end;
    -- if the type has more methods, define them here
  end;
  /
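
For comparison, here is a hypothetical Java analogue of ADDRESS_TY, PERSON_TY, and AGE_DAYS. It differs from the PL/SQL version in a few ways:
• The current date is passed in instead of using SysDate, so the result is reproducible.
• The method uses the object's own birth date rather than a parameter.
• Because SysDate also carries a time of day, ROUND(SysDate - BirthDate) can differ by one from this whole-day count.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Hypothetical analogue of ADDRESS_TY.
final class Address {
    final String street;
    final String city;
    final String state;
    final int zip;

    Address(String street, String city, String state, int zip) {
        this.street = street;
        this.city = city;
        this.state = state;
        this.zip = zip;
    }
}

// Hypothetical analogue of PERSON_TY, with an Address nested inside it.
final class Person {
    final String name;
    final LocalDate birthDate;
    final Address address;

    Person(String name, LocalDate birthDate, Address address) {
        this.name = name;
        this.birthDate = birthDate;
        this.address = address;
    }

    // Analogue of AGE_DAYS: whole days from birthDate up to 'today'.
    long ageDays(LocalDate today) {
        return ChronoUnit.DAYS.between(birthDate, today);
    }
}

class PersonDemo {
    public static void main(String[] args) {
        Person joe = new Person("Joe Smith", LocalDate.of(1990, 1, 1),
                new Address("10 Spring ST", "BHM", "AL", 35110));
        System.out.println(joe.address.street);                      // path access, like Person.Address.Street
        System.out.println(joe.ageDays(LocalDate.of(1990, 12, 31))); // 364
    }
}
```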

375 Oracle Object example (continued 2)
• Create a table with user-defined abstract data types:
  create table CUSTOMER
  (Customer_ID NUMBER,
   Person      PERSON_TY);
• Use constructors to insert data:
  insert into CUSTOMER values
  (1, PERSON_TY('Joe Smith', '01-JAN-90',
                ADDRESS_TY('10 Spring ST', 'BHM', 'AL', 35110)));
• Use path names, qualified by a table alias, to access the attributes and methods:
  SELECT C.Person.Address.Street FROM CUSTOMER C;
  SELECT C.Person.AGE_DAYS(C.Person.BirthDate) FROM CUSTOMER C;
  UPDATE CUSTOMER C SET C.Person.Address.City = 'Birmingham'
   WHERE C.Person.Address.City = 'BHM';

376 Object-Oriented Analysis and Design
• Normalization in the relational model relates each attribute to its primary key, e.g. the following is in 3NF:
  create table CUSTOMER
  (Customer_ID NUMBER,
   Name        VARCHAR2(25),
   BirthDate   DATE,
   Street      VARCHAR2(50),
   City        VARCHAR2(25),
   State       CHAR(2),
   Zip         NUMBER);
• For OO, further group related columns into abstract data types (ADTs) (e.g. ADDRESS_TY) for reuse.
• Then look for relationships among ADTs to determine whether nesting is appropriate (e.g. PERSON_TY).


Ads by Google