1 Databases II (Fall 2009) Professor: Iluju Kiringa SITE 5072.


1 Databases II (Fall 2009)
Professor: Iluju Kiringa
kiringa@site.uottawa.ca
http://www.site.uottawa.ca/~kiringa
SITE 5072

2 Review of Databases I (Chapters 2-5)

3 What is a DBMS?
• A centralized database is a large collection of integrated data.
• Organizations face large quantities of data that must be efficiently managed.
  - Many store GBs, even TBs, of data.
  - Some scientific applications store PBs of data!
• A Database Management System (DBMS) is a software package designed to store and manage databases.

4 Content of "Databases I"
• Foundations of DBMSs (Chap. 2-5, 19):
  - Conceptual design (Chap. 2). Input: requirements of an application. Output: ER diagrams (i.e., ER model).
  - Logical design (Chap. 3). Input: ER diagrams. Output: relational model.
  - Normalization (Chap. 19). Input: relational model. Output: relational model in normal forms.
  - Relational algebra and calculus (Chap. 4)
  - SQL (Chap. 5)
• Application development (Chap. 6-7):
  - Classic applications: Embedded SQL, JDBC, SQLJ, stored procedures
  - Internet applications: HTTP, HTML, XML, 3-tier architecture

5 Content of "Databases I" (Cont'd)
• Storage and Indexing (Chap. 8-11):
  - Disks and files (Chap. 9): memory hierarchy; disk and file management
  - Tree index (Chap. 10): ISAM tree; B+ tree
  - Hash index (Chap. 11): static; extendible; linear

6 Content of "Databases II"
• Query evaluation (Chap. 12-15):
  - External sort (Chap. 13)
  - Evaluation of relational operators (Chap. 14)
  - System R: a sample evaluation algorithm (Chap. 15)
• Transactions (Chap. 16-18):
  - Concurrency control (Chap. 17)
  - Recovery (Chap. 18)
• Advanced topics (Chap. 22, 25, 26, 23 or 27):
  - Distributed databases (Chap. 22)
  - Data Warehousing (Chap. 25)
  - Data Mining (Chap. 26)

7 Overview of Database Design
• Requirements analysis: Find out what users want to do with the database.
• Conceptual design: Use the output of requirements analysis to develop a high-level description of the data to be stored, along with its constraints. The output of conceptual design is usually an ER diagram.
• Logical design: Choose a DBMS and map the conceptual schema (ER diagram) into the data model of the DBMS. The output of this step is the logical schema of the data.
• Schema refinement: Analyze the logical schema, identify potential problems in it, and fix them by refining the logical schema using known normal forms.
• Physical design: Consider the expected workloads that the database will support, and further refine the design to meet the desired performance criteria. The output is the physical schema.

8 ER Model
• The model
  - Basic elements:
      Entity: real-world object
      Relationship: association between entities
      Attributes: description of an entity or a relationship
  - Advanced elements:
      Constraints: qualification of a relationship
      Weak entity: identifiable only via another entity
      Hierarchy: similar to OO languages
      Aggregation: sort of macros
• Conceptual design using the ER model
  - Should a given concept be modelled as an entity or an attribute?
  - Should a given concept be modelled as an entity or a relationship?
  - Should a given concept be modelled as a binary or as a ternary relationship?
  - etc.

9 Relational Database Concepts
• Relation: made up of 2 parts:
  - Instance: a table, with rows and columns. #Rows = cardinality, #fields = degree / arity.
  - Schema: specifies name of relation, plus name and domain (type) of each column (attribute).
• Can think of a relation as a set of rows or tuples (i.e., all rows are distinct), where each tuple has the same arity as the relation schema.
• Relational database: a set of relations, each with a distinct name.
• Relational DB schema: set of schemas of relations in the DB.
• Relational DB instance: set of relation instances in the DB.

10 Sample Relation
• Schema: Students(sid: string, name: string, login: string, age: integer, gpa: real).
• Instance: [table not preserved in this transcript]
• Cardinality = 3, arity = 5, all rows distinct.
• Commercial systems allow duplicates.
• Order of attributes may or may not matter!
• Do all columns in a relation instance have to be distinct? Depends on whether they are ordered or not.

11 Creation and Alteration of Relations
    CREATE TABLE Students
      (sid CHAR(20), name CHAR(20), login CHAR(10), age INTEGER, gpa REAL)

    CREATE TABLE Enrolled
      (sid CHAR(20), cid CHAR(20), grade CHAR(2))

    DROP TABLE Students

    ALTER TABLE Students ADD COLUMN firstYear INTEGER

    INSERT INTO Students (sid, name, login, age, gpa)
    VALUES ('53688', 'Smith', 'smith@ee', 18, 3.2)

    DELETE FROM Students S WHERE S.name = 'Smith'
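The statements on this slide can be tried end to end with Python's standard `sqlite3` module. This is a minimal sketch, not part of the original deck: the in-memory database and the variable names are assumptions, and the range variable `S` is dropped from the DELETE because not every SQL engine accepts it there.

```python
import sqlite3

# In-memory database (an assumption for this sketch) so nothing is left behind.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE TABLE, then ALTER TABLE to add the extra column.
cur.execute(
    "CREATE TABLE Students ("
    "sid CHAR(20), name CHAR(20), login CHAR(10), age INTEGER, gpa REAL)"
)
cur.execute("ALTER TABLE Students ADD COLUMN firstYear INTEGER")

# INSERT one row; the new column firstYear is not listed, so it stays NULL.
cur.execute(
    "INSERT INTO Students (sid, name, login, age, gpa) "
    "VALUES ('53688', 'Smith', 'smith@ee', 18, 3.2)"
)
after_insert = cur.execute(
    "SELECT sid, name, age, firstYear FROM Students"
).fetchall()

# DELETE every row whose name is 'Smith'.
cur.execute("DELETE FROM Students WHERE name = 'Smith'")
after_delete = cur.execute("SELECT COUNT(*) FROM Students").fetchone()[0]

# DROP TABLE removes the schema as well as the (now empty) instance.
cur.execute("DROP TABLE Students")
remaining_tables = cur.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()

print(after_insert, after_delete, remaining_tables)
```

Note the difference between the last two steps: DELETE empties the instance but keeps the schema, while DROP TABLE removes both.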

12 Primary Key and Foreign Key in SQL
    CREATE TABLE Enrolled
      (sid CHAR(20),
       cid CHAR(20),
       grade CHAR(2),
       PRIMARY KEY (sid, cid))

    CREATE TABLE Enrolled
      (sid CHAR(20),
       cid CHAR(20),
       grade CHAR(2),
       PRIMARY KEY (sid),
       UNIQUE (cid, grade))

    CREATE TABLE Enrolled
      (sid CHAR(20),
       cid CHAR(20),
       grade CHAR(2),
       PRIMARY KEY (sid, cid),
       FOREIGN KEY (sid) REFERENCES Students)
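To see what the third declaration enforces, the sketch below runs it in SQLite through Python's `sqlite3` module. Several things here are assumptions made only for illustration: `Students` is given `sid` as its primary key (the slides do not declare one), the course ids `CS101` and `CS102` are made up, and the helper `try_insert` is hypothetical. SQLite checks foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default

# Assumption: sid is the primary key of Students, so Enrolled can reference it.
conn.execute("CREATE TABLE Students (sid CHAR(20) PRIMARY KEY, name CHAR(20))")
conn.execute(
    "CREATE TABLE Enrolled ("
    "sid CHAR(20), cid CHAR(20), grade CHAR(2), "
    "PRIMARY KEY (sid, cid), "
    "FOREIGN KEY (sid) REFERENCES Students)"
)

def try_insert(sql):
    """Hypothetical helper: run an INSERT and report whether a constraint blocked it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

conn.execute("INSERT INTO Students VALUES ('53688', 'Smith')")
first = try_insert("INSERT INTO Enrolled VALUES ('53688', 'CS101', 'A')")
# Same (sid, cid) pair again: violates PRIMARY KEY (sid, cid).
duplicate = try_insert("INSERT INTO Enrolled VALUES ('53688', 'CS101', 'B')")
# sid 99999 does not exist in Students: violates the FOREIGN KEY.
orphan = try_insert("INSERT INTO Enrolled VALUES ('99999', 'CS101', 'A')")
# Same student, different course: allowed by the composite key.
second_course = try_insert("INSERT INTO Enrolled VALUES ('53688', 'CS102', 'B')")

print(first, duplicate, orphan, second_course)
```

With the second declaration instead (PRIMARY KEY (sid) plus UNIQUE (cid, grade)), the last insert would also be rejected, because each student could then appear in Enrolled only once.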

13 Logical DB Design: ER to Relational
• The ER model represents the initial, high-level database design.
• The task is to generate a relational database schema that is as close as possible to the ER model.
• The mapping is approximate, since it is hard to translate all the constraints of the ER model into an efficient logical (relational) model.

14 Formal Relational Query Languages
• Two mathematical query languages form the basis for "real" languages (e.g. SQL), and for implementation:
  - Relational Algebra: More operational, very useful for representing execution plans.
  - Relational Calculus: Lets users describe what they want, rather than how to compute it (non-operational, declarative).

15 Relational Algebra
• Basic operations:
  - Selection (σ): Selects a subset of rows from a relation.
  - Projection (π): Deletes unwanted columns from a relation.
  - Cross-product (×): Allows us to combine two relations.
  - Set-difference (−): Gives tuples in the 1st relation, but not in the 2nd.
  - Union (∪): Gives tuples that are in relation 1 or in relation 2.
• Additional operations:
  - Intersection, join, division, renaming: Not (theoretically) essential, but (practically) very useful.
• Since each operation returns a relation, operations can be composed! (Algebra is "closed".)
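The five basic operations map naturally onto Python sets of tuples. The sketch below is only an illustration under stated assumptions: the function names, the positional addressing of fields, and the sample relations `S1` and `S2` (with fields sid, sname, rating) are invented here and do not come from the slides.

```python
from itertools import product

def select(rel, pred):
    """Selection: keep the rows that satisfy the predicate."""
    return {t for t in rel if pred(t)}

def project(rel, positions):
    """Projection: keep only the listed field positions (duplicates collapse)."""
    return {tuple(t[i] for i in positions) for t in rel}

def cross(r, s):
    """Cross-product: pair every row of r with every row of s."""
    return {a + b for a, b in product(r, s)}

# Union and set-difference are the built-in set operators | and -.

# Hypothetical instances with fields (sid, sname, rating).
S1 = {(22, "dustin", 7), (31, "lubber", 8), (58, "rusty", 10)}
S2 = {(31, "lubber", 8), (44, "guppy", 5)}

# Closure: the output of one operation is a valid input to the next.
names_of_high_rated = project(select(S1, lambda t: t[2] > 7), [1])

# Intersection is not essential: it can be written with set-difference alone.
intersection = S1 - (S1 - S2)

print(names_of_high_rated, intersection, len(cross(S1, S2)), len(S1 | S2))
```

Because relations are modelled as sets, projection removes duplicate rows automatically, which matches the algebra but differs from SQL's default behaviour.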

16 Cross-Product
• Each row of S1 is paired with each row of R1.
• Result schema has one field per field of S1 and R1, with field names 'inherited' if possible.
  - Conflict: Both S1 and R1 have a field called sid.
• Renaming operator: ρ

17 Joins
• Condition join: R ⋈_c S = σ_c (R × S)
• Result schema is the same as that of the cross-product.
• Fewer tuples than the cross-product, so it might be possible to compute it more efficiently.
• Sometimes called a theta-join.

18 Joins (Cont'd)
• Equi-join: A special case of condition join where the condition c contains only equalities.
  - Result schema similar to cross-product, but only one copy of fields for which equality is specified.
• Natural join: Equi-join on all fields having the same name in both relations.
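The relationship between the join variants can be sketched in a few lines of Python. Everything named here is an assumption for illustration: rows are modelled as dictionaries, the functions `theta_join` and `natural_join` are hypothetical, and the instances `S1` and `R1` contain made-up data.

```python
def theta_join(r, s, cond):
    """Condition join: the rows of the cross-product that satisfy cond."""
    return [(a, b) for a in r for b in s if cond(a, b)]

def natural_join(r, s):
    """Natural join: equality on every field name the two relations share,
    keeping a single copy of each shared field."""
    out = []
    for a in r:
        for b in s:
            shared = a.keys() & b.keys()
            if all(a[k] == b[k] for k in shared):
                out.append({**a, **b})
    return out

# Hypothetical instances; both relations have a field called sid.
S1 = [{"sid": 22, "sname": "dustin"}, {"sid": 58, "sname": "rusty"}]
R1 = [{"sid": 22, "bid": 101}, {"sid": 58, "bid": 103}]

# Theta-join with a non-equality condition: S1.sid < R1.sid.
less_than = theta_join(S1, R1, lambda a, b: a["sid"] < b["sid"])

# Equi-join written as a theta-join whose condition is an equality.
equi = theta_join(S1, R1, lambda a, b: a["sid"] == b["sid"])

natural = natural_join(S1, R1)
print(len(less_than), len(equi), natural)
```

The nested loops make the definition explicit (select over a cross-product); a real system would normally avoid materialising all pairs, which is the efficiency point made on the slide.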

19 Relational Calculus
• Has two flavors:
  - Tuple relational calculus (TRC)
  - Domain relational calculus (DRC)
• Has variables, constants, comparison ops, logical connectives, and quantifiers.
  - TRC: Variables range over (i.e., get bound to) tuples.
  - DRC: Variables range over domain elements (= field values).
  - Both TRC and DRC are simple subsets of first-order logic.
• Expressions in the calculus are called formulas. An answer tuple is essentially an assignment of constants to variables that makes the formula evaluate to true.

20 Tuple Relational Calculus
• Query has the form: { t | p(t) }
• Answer includes all tuples t that make the formula p(t) true.
• Formula is recursively defined, starting with simple atomic formulas (getting tuples from relations or making comparisons of values), and building bigger formulas using the logical connectives.

21 TRC Formulas
• Atomic formula:
  - R ∈ Rel, or R.a op S.b, or R.a op constant
  - op is one of <, >, =, ≤, ≥, ≠
• Formula:
  - an atomic formula, or
  - ¬p, p ∧ q, p ∨ q, where p and q are formulas, or
  - ∃R (p(R)), where variable R is free in p(R), or
  - ∀R (p(R)), where variable R is free in p(R)
• The use of quantifiers ∃R and ∀R is said to bind R.
  - A variable that is not bound is free.
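A TRC query reads almost directly as a Python set comprehension, with `any` standing in for ∃ and `all` for ∀. This is an informal analogy rather than a calculus evaluator, and the relation names, field names, and rows below are assumptions made up for the sketch.

```python
from collections import namedtuple

# Hypothetical schemas and instances.
Sailor = namedtuple("Sailor", "sid sname rating age")
Reserve = namedtuple("Reserve", "sid bid")

Sailors = {
    Sailor(22, "dustin", 7, 45.0),
    Sailor(31, "lubber", 8, 55.5),
    Sailor(58, "rusty", 10, 35.0),
}
Reserves = {Reserve(22, 101), Reserve(58, 103)}

# { S | S ∈ Sailors ∧ S.rating > 7 }
high_rated = {s for s in Sailors if s.rating > 7}

# { S | S ∈ Sailors ∧ ∃R (R ∈ Reserves ∧ R.sid = S.sid ∧ R.bid = 103) }
# R is bound by the quantifier; S is the only free variable.
reserved_103 = {
    s for s in Sailors
    if any(r.sid == s.sid and r.bid == 103 for r in Reserves)
}

# { S | S ∈ Sailors ∧ ∀R (¬(R ∈ Reserves) ∨ R.sid ≠ S.sid) }
# i.e. sailors with no reservation at all.
no_reservation = {
    s for s in Sailors
    if all(r.sid != s.sid for r in Reserves)
}

print(sorted(s.sname for s in high_rated))
print(sorted(s.sname for s in reserved_103))
print(sorted(s.sname for s in no_reservation))
```

Restricting each quantified variable to a finite relation is what keeps these comprehensions computable; the calculus itself places no such restriction on variables.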

22 Overview: Features of SQL
• Data definition language: used to create, destroy, and modify tables and views.
• Data manipulation language: used to pose queries, and to insert, delete, and modify rows.
• Triggers and advanced integrity constraints: used to specify actions that the DBMS will execute automatically.
• Embedded SQL: allows SQL to be called from a host language.
• Dynamic SQL: allows run-time creation and execution of queries.

23 Syntax of Basic SQL Query
    SELECT [DISTINCT] target-list
    FROM relation-list
    WHERE qualification
• relation-list: A list of relation names (possibly with a range-variable after each name).
• target-list: A list of attributes of relations in relation-list.
• qualification: Comparisons (Attr op const or Attr1 op Attr2, where op is one of <, >, =, ≤, ≥, ≠) combined using AND, OR and NOT.
• DISTINCT is an optional keyword indicating that the answer should not contain duplicates.

24 Sample Query and Conceptual Evaluation
    SELECT S.sname
    FROM Sailors S, Reserves R
    WHERE S.sid=R.sid AND R.bid=103
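Conceptually, a basic query can be evaluated by computing the cross-product of the relations in the FROM clause, discarding the rows that fail the WHERE qualification, and then keeping only the columns in the target-list. The sketch below performs those three steps by hand and compares the result with what SQLite returns for the same query. The column layouts Sailors(sid, sname, rating, age) and Reserves(sid, bid, day), and all the rows, are assumptions chosen for the example.

```python
import sqlite3
from itertools import product

# Hypothetical instances: Sailors(sid, sname, rating, age), Reserves(sid, bid, day).
sailors = [(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)]
reserves = [(22, 101, "1996-10-10"), (58, 103, "1996-11-12")]

# Step 1: cross-product of the relation-list.
pairs = list(product(sailors, reserves))
# Step 2: keep the rows that satisfy the qualification.
qualified = [(s, r) for s, r in pairs if s[0] == r[0] and r[1] == 103]
# Step 3: keep only the target-list column (S.sname).
manual = [(s[1],) for s, r in qualified]

# The same query, evaluated by a real engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER, sname TEXT, rating INTEGER, age REAL)")
conn.execute("CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT)")
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", sailors)
conn.executemany("INSERT INTO Reserves VALUES (?, ?, ?)", reserves)
from_sql = conn.execute(
    "SELECT S.sname FROM Sailors S, Reserves R "
    "WHERE S.sid = R.sid AND R.bid = 103"
).fetchall()

print(len(pairs), manual, from_sql)
```

This strategy defines what the answer is, not how a DBMS computes it; an optimizer is free to pick any plan that yields the same rows.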

25 Aggregate Operators
• Significant extension of relational algebra.
• Operators (A is a single column):
    COUNT (*)
    COUNT ( [DISTINCT] A)
    SUM ( [DISTINCT] A)
    AVG ( [DISTINCT] A)
    MAX (A)
    MIN (A)
• Examples:
    SELECT COUNT (*)
    FROM Sailors S

    SELECT AVG (S.age)
    FROM Sailors S
    WHERE S.rating=10

    SELECT COUNT (DISTINCT S.rating)
    FROM Sailors S
    WHERE S.sname='Bob'

    SELECT AVG (DISTINCT S.age)
    FROM Sailors S
    WHERE S.rating=10

    SELECT S.sname
    FROM Sailors S
    WHERE S.rating = (SELECT MAX (S2.rating)
                      FROM Sailors S2)
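The example queries can be run against a small instance to see how DISTINCT changes an aggregate. The sketch below uses Python's `sqlite3` module; the Sailors(sid, sname, rating, age) layout and every row in it are assumptions invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER, sname TEXT, rating INTEGER, age REAL)")
# Hypothetical rows: two rating-10 sailors share the same age, and two sailors
# are both named Bob but hold different ratings.
conn.executemany(
    "INSERT INTO Sailors VALUES (?, ?, ?, ?)",
    [
        (22, "dustin", 7, 45.0),
        (58, "rusty", 10, 35.0),
        (71, "zorba", 10, 35.0),
        (74, "horatio", 10, 20.0),
        (95, "Bob", 3, 63.5),
        (96, "Bob", 7, 25.5),
    ],
)

def scalar(sql):
    """Hypothetical helper: return the single value produced by an aggregate query."""
    return conn.execute(sql).fetchone()[0]

total = scalar("SELECT COUNT(*) FROM Sailors S")
avg_age = scalar("SELECT AVG(S.age) FROM Sailors S WHERE S.rating = 10")
avg_distinct_age = scalar("SELECT AVG(DISTINCT S.age) FROM Sailors S WHERE S.rating = 10")
bob_ratings = scalar("SELECT COUNT(DISTINCT S.rating) FROM Sailors S WHERE S.sname = 'Bob'")

# The aggregate in the subquery yields one value, so it can be compared with =.
top_rated = sorted(
    row[0]
    for row in conn.execute(
        "SELECT S.sname FROM Sailors S "
        "WHERE S.rating = (SELECT MAX(S2.rating) FROM Sailors S2)"
    )
)

print(total, avg_age, avg_distinct_age, bob_ratings, top_rated)
```

With these rows, AVG averages the three ages 35.0, 35.0 and 20.0, while AVG (DISTINCT ...) averages only 35.0 and 20.0, so the two results differ.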

26 Queries With GROUP BY and HAVING
    SELECT [DISTINCT] target-list
    FROM relation-list
    WHERE qualification
    GROUP BY grouping-list
    HAVING group-qualification
• The target-list contains (i) attribute names and (ii) terms with aggregate operations (e.g., MIN (S.age)).
  - The attribute list (i) must be a subset of grouping-list. Intuitively, each answer tuple corresponds to a group, and these attributes must have a single value per group. (A group is a set of tuples that have the same value for all attributes in grouping-list.)

27 Find the age of the youngest sailor with age ≥ 18, for each rating with at least 2 such sailors
    SELECT S.rating, MIN (S.age)
    FROM Sailors S
    WHERE S.age >= 18
    GROUP BY S.rating
    HAVING COUNT (*) > 1
• Only S.rating and S.age are mentioned in the SELECT, GROUP BY or HAVING clauses; other attributes are 'unnecessary'.
• 2nd column of result is unnamed. (Use AS to name it.)
• Answer relation: [table not preserved in this transcript]
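A run of this query on a small instance shows the order in which the clauses act: WHERE filters rows, GROUP BY forms the groups, and HAVING filters whole groups. The sketch uses Python's `sqlite3` module; the rows are assumptions invented for the example and are not the instance from the original slide. The second column is named with AS, as the slide suggests.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER, sname TEXT, rating INTEGER, age REAL)")
# Hypothetical rows.
conn.executemany(
    "INSERT INTO Sailors VALUES (?, ?, ?, ?)",
    [
        (22, "dustin", 7, 45.0),
        (64, "horatio", 7, 35.0),
        (31, "lubber", 8, 55.5),
        (32, "andy", 8, 25.5),
        (58, "rusty", 10, 35.0),
        (71, "zorba", 10, 16.0),   # removed by WHERE, so rating 10 keeps one row
        (29, "brutus", 1, 33.0),   # rating 1 has only one sailor
    ],
)

query = """
    SELECT S.rating, MIN(S.age) AS min_age
    FROM Sailors S
    WHERE S.age >= 18
    GROUP BY S.rating
    HAVING COUNT(*) > 1
"""
# The query has no ORDER BY, so sort in Python for a stable display order.
answer = sorted(conn.execute(query).fetchall())
print(answer)
```

Rating 10 drops out even though two sailors hold it, because the under-18 sailor is removed by WHERE before the groups are counted.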

28 Summary
SQL:
• is more declarative than earlier, procedural query languages.
• is relationally complete; in fact, it has significantly more expressive power than relational algebra.
• In practice, users need to be aware of how queries are optimized and evaluated for best results.
• has many alternative ways to write a query; the optimizer should look for the most efficient evaluation plan.
• allows specification of rich integrity constraints.
• provides triggers to respond to changes in the database.

