Presentation is loading. Please wait.

Presentation is loading. Please wait.

Lecture 1: Relational Data Models

Similar presentations


Presentation on theme: "Lecture 1: Relational Data Models"— Presentation transcript:

1 Lecture 1: Relational Data Models
CM036 Advanced Databases Lecture 1: Relational Data Models

2 Content 1 Relational Models, Languages and their Use in databases
2 The Relational Algebra 3 The Relational Calculus CM036 Advanced Databases Lecture 1: Relational Models

3 1.1 Relational and other data models
Relational models (relational algebra, relational calculus, predicate calculus) Hierarchical (object-relational) Object-oriented CM036 Advanced Databases Lecture 1: Relational Models

4 1.2 Relational Languages and their Use
Data manipulation language (DML) Use: Populates, updates, and queries relational databases Example: relational algebra, SQL DML Data definition language (DDL) Use: Specifies the data structures and defines the relational schema Example: relational calculus, SQL DDL Data control language (DCL) Use: Specifies operation permissions, data access discipline and administrative privileges, and defines the database users Example: SQL DCL, LDAP CM036 Advanced Databases Lecture 1: Relational Models

5 2 The Relational Algebra
Proposed by Codd in 1970 as a formal data model. Describes the operations to manipulate relations. Used to specify retrieval requests (queries). Query result is also in the form of a relation. Relational operators transform either a simple relation or a pair of relations into another relation Relational Operations: RESTRICT (s) and PROJECT () operations. Set operations: UNION (), INTERSECTION (), DIFFERENCE (—), CARTESIAN PRODUCT (). JOIN operations (⋈). Other relational operations: DIVISION, OUTER JOIN, AGGREGATE FUNCTIONS. CM036 Advanced Databases Lecture 1: Relational Models

6 CM036 Advanced Databases Lecture 1: Relational Models

7 2.1 RESTRICT s and PROJECT P
RESTRICT operation (called also SELECT- denoted bys ): Selects the tuples (rows) from a relation R that satisfy a certain selection condition c on the attributes of R : s c(R) Resulting relation includes each tuple in R whose attribute values satisfy the condition c, i.e. it has the same attributes as R Examples: s DNO=4(EMPLOYEE) s SALARY>30000(EMPLOYEE) s(DNO=4 AND SALARY>25000) OR DNO=5(EMPLOYEE) CM036 Advanced Databases Lecture 1: Relational Models

8 PROJECT operation (denoted by  ):
Keeps only certain attributes (columns) from a relation R specified in an attribute list L:  L(R) Resulting relation has only those attributes of R specified in L Example:  FNAME,LNAME,SALARY(EMPLOYEE) The PROJECT operation eliminates duplicate tuples in the resulting relation so that it remains a true set (no duplicate elements) Example:  SEX,SALARY(EMPLOYEE) If several male employees have salary 30000, only a single tuple <M, 30000> is kept in the resulting relation. CM036 Advanced Databases Lecture 1: Relational Models

9 CM036 Advanced Databases Lecture 1: Relational Models

10 Sequence of operations
Several operations can be combined to form a relational algebra expression (query) Example: Retrieve the names and salaries of employees who work in department 4  FNAME,LNAME,SALARY (s DNO=4(EMPLOYEE) ) Alternatively, we specify explicit intermediate relations for each step: DEPT4_EMPS  s DNO=4(EMPLOYEE) R   FNAME,LNAME,SALARY(DEPT4_EMPS) Attributes can optionally be renamed in the resulting left-hand-side relation (this may be required for some operations that will be presented later): R(FNAME,LNAME,SALARY)   FNAME,LNAME,SALARY(DEPT4_EMPS) CM036 Advanced Databases Lecture 1: Relational Models

11 CM036 Advanced Databases Lecture 1: Relational Models

12 2.2 Set Operations Binary operations from set theory: UNION: R1  R2,INTERSECTION: R1  R2, DIFFERENCE: R1 — R2, For , , —, the operand relations R1(A1, A2, ..., An) and R2(B1, B2, ..., Bn) must have the same number of attributes, and the domains of corresponding attributes must be compatible; that is, dom(Ai)=dom(Bi) for i=1, 2, ..., n. This condition is called union compatibility. The resulting relation for , , or — has the same attribute names as the first operand relation R1 (by convention). CM036 Advanced Databases Lecture 1: Relational Models

13 CM036 Advanced Databases Lecture 1: Relational Models

14 t [A1, A2, ..., Am] = t1 and t [B1, B2, ..., Bn] = t2
CARTESIAN PRODUCT R(A1, A2, ..., Am, B1, ..., Bn)  R1(A1, A2, ..., Am)  R2 (B1, ..., Bn) A tuple t exists in R for each combination of tuples t1 from R1 and t2 from R2 such that: t [A1, A2, ..., Am] = t1 and t [B1, B2, ..., Bn] = t2 The resulting relation R has n1 + n2 columns If R1 has n1 tuples and R2 has n2 tuples, then R will have n1*n2 tuples. CM036 Advanced Databases Lecture 1: Relational Models

15 FEMALE_EMPS  s SEX=‘F’(EMPLOYEE)
Obviously, CARTESIAN PRODUCT is useless if alone, since it generates all possible combinations. It can combine related tuples from two relations in a more informative way if followed by the appropriate RESTRICT operation Example: Retrieve a list of the names of dependents for each female employee FEMALE_EMPS  s SEX=‘F’(EMPLOYEE) EMPNAMES   FNAME,LNAME,SSN(FEMALE_EMPS) EMP_DEPENDENTS  EMPNAMES  DEPENDENT ACTUAL_DEPENDENTS  s SSN=ESSN(EMP_DEPENDENTS) RESULT   FNAME,LNAME,DEPENDENT_NAME(ACTUAL_DEPENDENTS) CM036 Advanced Databases Lecture 1: Relational Models

16 CM036 Advanced Databases Lecture 1: Relational Models

17 2.3 JOIN Operations THETA JOIN: Similar to a CARTESIAN PRODUCT followed by a RESTRICT. The condition c is called a join condition. R(A1, A2, ..., Am, B1, B2, ..., Bn)  R1(A1, A2, ..., Am) ⋈ c R2 (B1, B2, ..., Bn) EQUIJOIN: The join condition c includes equality comparisons involving attributes from R1 and R2. That is, c is of the form: (Ai=Bj) AND ... AND (Ah=Bk); 1<i,h<m, 1<j,k<n In the above EQUIJOIN operation: Ai, ..., Ah are called the join attributes of R1 Bj, ..., Bk are called the join attributes of R2 Example: Retrieve each department's name and its manager's name: T  DEPARTMENT ⋈MGRSSN=SSN EMPLOYEE RESULT   DNAME,FNAME,LNAME(T) CM036 Advanced Databases Lecture 1: Relational Models

18 DEPT_MGR   DNAME,…,MGRSSN,…LNAME,…,SSN… (DEPARTMENT ⋈ MGRSSN=SSN EMPLOYEE)
CM036 Advanced Databases Lecture 1: Relational Models

19 NATURAL JOIN In an EQUIJOIN R  R1 ⋈c R2, the join attribute of R2 appear redundantly in the result relation R. In a NATURAL JOIN, the join attributes of R2 are eliminated from R. The equality is implied and there is no need to specify it. The form of the operator is R  R1 ⋈(join attributes of R1),(join attributes of R2) R2 Example: Retrieve each employee's name and the name of the department he/she works for: T  EMPLOYEE ⋈(DNO),(DNUMBER) DEPARTMENT RESULT   FNAME,LNAME,DNAME(T) If the join attributes have the same names in both relations, they need not be specified and we can write R  R1 * R2. Example: Retrieve each employee's name and the name of his/her supervisor: SUPERVISOR(SUPERSSN,SFN,SLN) SN,FNAME,LNAME(EMPLOYEE) T  EMPLOYEE * SUPERVISOR RESULT   FNAME,LNAME,SFN,SLN(T) CM036 Advanced Databases Lecture 1: Relational Models

20 CM036 Advanced Databases Lecture 1: Relational Models

21 JOIN ATTRIBUTES RELATIONSHIP EMPLOYEE.SSN= EMPLOYEE manages
Note: In the original definition of NATURAL JOIN, the join attributes were required to have the same names in both relations. There can be a more than one set of join attributes with a different meaning between the same two relations. For example: JOIN ATTRIBUTES RELATIONSHIP EMPLOYEE.SSN= EMPLOYEE manages DEPARTMENT.MGRSSN the DEPARTMENT EMPLOYEE.DNO= EMPLOYEE works for DEPARTMENT.DNUMBER the DEPARTMENT Example: Retrieve each employee’s name and the name of the department he/she works for: T  EMPLOYEE ⋈ DNO=DNUMBERDEPARTMENT RESULT   FNAME,LNAME,DNAME(T) CM036 Advanced Databases Lecture 1: Relational Models

22 A relation can have a set of join attributes to join it with itself :
JOIN ATTRIBUTES RELATIONSHIP EMPLOYEE(1).SUPERSSN= EMPLOYEE(2) supervises EMPLOYEE(2).SSN EMPLOYEE(1) One can think of this as joining two distinct copies of the relation, although only one relation actually exists In this case, renaming can be useful Example: Retrieve each employee’s name and the name of his/her supervisor: SUPERVISOR(SSSN,SFN,SLN)   SSN,FNAME,LNAME(EMPLOYEE) T  EMPLOYEE ⋈SUPERSSN = SSSNSUPERVISOR RESULT   FNAME,LNAME,SFN,SLN(T) CM036 Advanced Databases Lecture 1: Relational Models

23 2.4 Complete Set of Operations
All the operations discussed so far can be described as a sequence of only the operations RESTRICT, PROJECT, UNION, SET DIFFERENCE, and CARTESIAN PRODUCT. Hence, the set {s , , ,—, } is called a complete set of relational algebra operations. Any query language equivalent to these operations is called relationally complete. For database applications, additional operations are needed that were not part of the original relational algebra. These include: 1. Aggregate functions and grouping. 2. OUTER JOIN and OUTER UNION. CM036 Advanced Databases Lecture 1: Relational Models

24 2.4 Additional Relational Operations
AGGREGATE FUNCTIONS Functions such as SUM, COUNT, AVERAGE, MIN, MAX are often applied to sets of values or sets of tuples in database applications <grouping attributes> <function list> (R) The grouping attributes are optional Example: Retrieve the average salary of all employees (no grouping needed): R(AVGSAL)   AVERAGE SALARY (EMPLOYEE) Example: For each department, retrieve the department number, the number of employees, and the average salary: R(DNO,NUMEMPS,AVGSAL)  DNO  COUNT SSN, AVERAGE SALARY (EMPLOYEE) DNO is called the grouping attribute in the above example CM036 Advanced Databases Lecture 1: Relational Models

25 CM036 Advanced Databases Lecture 1: Relational Models

26 OUTER JOIN In a regular EQUIJOIN or NATURAL JOIN operation, tuples in R1 or R2 that do not have matching tuples in the other relation do not appear in the result. Some queries require all tuples in R1 (or R2 or both) to appear in the result When no matching tuples are found, nulls are placed for the missing attributes LEFT OUTER JOIN: R R2 lets every tuple in R1 appear in the result RIGHT OUTER JOIN: R R2 lets every tuple in R2 appear in the result FULL OUTER JOIN: R R2 lets every tuple in both R1 and R2 appear in the result CM036 Advanced Databases Lecture 1: Relational Models

27 TEMP EMPLOYEE SSN=MGRSSNDEPARTMENT
RESULT   FNAME,MINIT,LNAME,DNAME(TEMP) CM036 Advanced Databases Lecture 1: Relational Models

28 3 Relational Calculus Declarative language based on predicate logic - checks what is true in the database rather then looking to get it The main difference in comparison with the relational algebra is the introduction of variables which range over attributes or tuples Two different variations of the relational calculus: domain calculus - specification of formal properties for different data types, used for describing data tuple calculus - querying and checking formal properties of stored relational data CM036 Advanced Databases Lecture 1: Relational Models

29 3.1 Predicate Calculus Vocabulary of constant names, functional attributes and predicate properties, used for describing the information Phrases for specification of constant, functionally dependant or predicated properties build using proper vocabulary terms Statements about the world, formulated as meaningful phrases, connected through logical connectives in logical sentences conjunction () disjunction () negation () implication () equivalence () Possible quantification of the sentences using existential () or universal () quantifiers, ranging over variables Example: All units registered in the database have unit leaders among the lecturers (x).(Unit (x)  (y).(Lecturer(y)  Leader(y,x))) CM036 Advanced Databases Lecture 1: Relational Models

30 3.2 Relational DB as a Predicate Calculus model
Semantic interpretation of the calculus is given in a set, so either all type domains of the relational schema (domain calculus), or the set of all relations in the database (tuple calculus) can serve a model for it All meaningful sentences in the model are true or false, so the relation tuples can be interpreted as truthful facts describing the world The constraints describe logical regularities among the attribute values, so they can be expressed as logical sentences Example: All records of unit leaders have primary keys Relation names are predicates with a place for each attribute Leader(Lno:INTEGER,Lname:CHAR,Uname:CHAR) - Lno, Lname and Uname are attributes of Leader Data values in the relation tuples are constants from the domains Leader(2,‘Johnson’,’CS234’) - 2,Johnson and CS234 are constants forming one Leader tuple Constraints can be stated using quantified variables over domains (x)(y,z) .(Leader(x,y,z)) - x stands for the primary key of Leader CM036 Advanced Databases Lecture 1: Relational Models

31 1. In predicate calculus sentences contain quantified variables only
Notes: 1. In predicate calculus sentences contain quantified variables only 2. The sub-expressions following the quantified variables form the scope of quantification; it is usually closed in round parentheses Example: The table for unit leaders has foreign keys to both the lecturers and units tables Relation names are predicates with place for each attribute Unit(Uno:INTEGER, Uname:CHAR) - Uno and Uname are attributes of relation Unit Lecturer(Lno:INTEGER, Lname:CHAR) - Lno and Lname are attributes of relation Lecturer Leader(LUno:INTEGER,Lno:INTEGER,Uno:INTEGER) - LUno, Lno and Uno are attributes of relation Leader All foreign key constraints can be stated logically using quantified sentences, in which variables range over the respective values ( LUno,Uno,Lno).(Leader(LUno,Uno,Lno)  ( Lname).(Lecturer(Lno,Lname))  ( Uname).(Unit(Uno,Uname)) ) CM036 Advanced Databases Lecture 1: Relational Models

32 3.3 Relational Calculus as database query language
Queries are logical expressions, which contain non-quantified (free) variables for tuples The free variables are placeholders for the information, we are looking for from the database relations The logical expressions, used in the query, are its conditions which need to be met during answering the query Example: Who are the employees with salary greater than ? {e.FNAME,e.LNAME | Employee(e)  e.SALARY > } An answer is a replacement of the free variables in the query with tuple attributes from the relations used to formulate the query conditions; they should make the statement of the query true in database (pattern matching) Answer: James Borg and Jennifer Wallace Free variables Matching values e.FNAME James Jennifer e.LNAME Borg Wallace CM036 Advanced Databases Lecture 1: Relational Models

33 1. The question variables are always listed in front of the
Notes: 1. The question variables are always listed in front of the expression, separated by | from the question condition; 2. Question condition can contain free variables only from the list in the beginning; all other variables should be quantified Querying related tables requires check of the corresponding keys for equality Example: Retrieve the name and address of all employees who work for the ‘Research’ department {e.FNAME,e.LNAME,e.ADDRESS | Employee(e)  ( d).(Department(d)  d.DNAME = ‘Research’  d.DNUMBER = e.DNO ) } CM036 Advanced Databases Lecture 1: Relational Models

34 3.4 Equvalence between relational algebra and relational calculus
Relational algebra and relational calculus are equivalent with respect to their ability to formulate queries against relational databases (Codd 1972) Relational algebra concentrates on the procedural (“how-to”) aspects and because of this it is used as an intermediate language for optimization of database queries Relational calculus is more appropriate to specify declaratively the model properties (“what is true”), without worrying about the way it is achieved and as such it can be used as a specification or querying language The relational language Query-By-Example (QBE), which is developed by IBM during 70ties and is implemented in Paradox and Access desktop database systems is based on the relational calculus CM036 Advanced Databases Lecture 1: Relational Models

35 Summary The variety of relational languages currently used to communicate with databases has different usage in database systems lifecycle Formal languages have unambiguous syntax and clear semantics, which makes them good for specifications and conceptualisation Formal languages hide the details of implementation and can’t give very deep insight into the real database processes Formal languages formalize adequately only part of the database operations, which have semantics inherited from the relational model; they do not cover DCL at al. SQL is not a programming language, but language for communication with DBMS SQL as a mixture of DDL, DML and DCL is the ultimate choice for practical relational databases development and administration SQL syntax is ambiguous when it comes to the order of operations, which requires deep knowledge about the way it is interpreted CM036 Advanced Databases Lecture 1: Relational Models


Download ppt "Lecture 1: Relational Data Models"

Similar presentations


Ads by Google