Relational Algebra and SQL Prof. Sin-Min Lee Department of Computer Science.

Slides:



Advertisements
Similar presentations
Relational Algebra and Relational Calculus
Advertisements

CSCI 4333 Database Design and Implementation Review for Midterm Exam I
IS698: Database Management Min Song IS NJIT. The Relational Data Model.
Relational Algebra, Join and QBE Yong Choi School of Business CSUB, Bakersfield.
Relational Algebra Tim Kaddoura CS157A. Introduction  Relational query languages are languages for describing queries on a relational database  Three.
Relational Algebra Ch. 7.4 – 7.6 John Ortiz. Lecture 4Relational Algebra2 Relational Query Languages  Query languages: allow manipulation and retrieval.
Relational Normalization Theory. Limitations of E-R Designs Provides a set of guidelines, does not result in a unique database schema Does not provide.
CSCI 4333 Database Design and Implementation Review for Midterm Exam II Xiang Lian The University of Texas – Pan American Edinburg, TX 78539
CREATE VIEW SYNTAX CREATE VIEW name [(view_col [, view_col …])] AS [WITH CHECK OPTION];
1 Relational Algebra and SQL Chapter 6. 2 Relational Query Languages Languages for describing queries on a relational database Structured Query LanguageStructured.
Oct 28, 2003Murali Mani Relational Algebra B term 2004: lecture 10, 11.
Revision for Midterm 3 Part 3 Prof. Sin-Min Lee Department of Computer Science.
Database Systems Chapter 6 ITM Relational Algebra The basic set of operations for the relational model is the relational algebra. –enable the specification.
Revision for Midterm 3 revision 3 Prof. Sin-Min Lee Department of Computer Science.
Lesson II The Relational Model © Pearson Education Limited 1995, 2005.
1 Minggu 3, Pertemuan 5 Relational Algebra (Cont.) Matakuliah: T0206-Sistem Basisdata Tahun: 2005 Versi: 1.0/0.0.
1 Chapter 5 Relational Algebra and SQL. 2 Father of Relational Model Edgar F. Codd ( ) PhD from U. of Michigan, Ann Arbor Received Turing Award.
Database Systems: A Practical Approach to Design, Implementation and Management International Computer Science S. Carolyn Begg, Thomas Connolly Lecture.
1-1 Thomas Connolly and Carolyn Begg’s Database Systems: A Practical Approach to Design, Implementation, and Management Chapter 4 Part One: Relational.
Relational Algebra & Relational Calculus Dale-Marie Wilson, Ph.D.
RELATIONAL ALGEBRA Objectives
Relational Model & Relational Algebra. 2 Relational Model u Terminology of relational model. u How tables are used to represent data. u Connection between.
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 3: Introduction.
Chapter 3 Section 3.4 Relational Database Operators
CSE314 Database Systems More SQL: Complex Queries, Triggers, Views, and Schema Modification Doç. Dr. Mehmet Göktürk src: Elmasri & Navanthe 6E Pearson.
1 CS 430 Database Theory Winter 2005 Lecture 12: SQL DML - SELECT.
1 CSE 480: Database Systems Lecture 11: SQL. 2 SQL Query SELECT FROM WHERE –In MySQL, FROM and WHERE clauses are optional –Example:
Relational Algebra - Chapter (7th ed )
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Chapter 6 The Relational Algebra.
Relational Algebra Chapter 4 CIS 458 Sungchul Hong.
Lecture 15: Relational Algebra
Revision for Final Exam Prof. Sin-Min Lee Department of Computer Science.
Oracle DML Dr. Bernard Chen Ph.D. University of Central Arkansas.
SQL Data Manipulation II Chapter 5 CIS 458 Sungchul Hong.
Bayu Adhi Tama, ST., MTI. Introduction Relational algebra and relational calculus are formal languages associated with the relational.
Advanced Database Systems
1 Chapter 5 Relational Algebra and SQL. 2 Father of Relational Model Edgar F. Codd ( ) PhD from U. of Michigan, Ann Arbor Received Turing Award.
1 Pertemuan > > Matakuliah: >/ > Tahun: > Versi: >
From Relational Algebra to SQL CS 157B Enrique Tang.
1 Chapter 5 Relational Algebra and SQL. 2 Relational Query Languages Languages for describing queries on a relational database Structured Query LanguageStructured.
IS 230Lecture 6Slide 1 Lecture 7 Advanced SQL Introduction to Database Systems IS 230 This is the instructor’s notes and student has to read the textbook.
Chapter 5 Relational Algebra and Relational Calculus Pearson Education © 2009.
1 CSE 480: Database Systems Lecture 12: SQL (Nested queries and Aggregate functions)
Chapter 5 Relational Algebra Pearson Education © 2014.
Advanced Relational Algebra & SQL (Part1 )
CSC271 Database Systems Lecture # 8. Summary: Previous Lecture  Relation algebra and operations  Selection (Restriction), projection  Union, set difference,
1 CS 430 Database Theory Winter 2005 Lecture 5: Relational Algebra.
Chapter 5 Relational Algebra and SQL. Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 5-2 Relational Query Languages Languages for describing.
1 Chapter 5 Relational Algebra and SQL. 2 Father of Relational Model Edgar F. Codd ( ) PhD from U. of Michigan, Ann Arbor Received Turing Award.
Query Processing – Implementing Set Operations and Joins Chap. 19.
1 CSE 480: Database Systems Lecture 16: Relational Algebra.
1 Schema for Student Registration System Student Student (Id, Name, Addr, Status) Professor Professor (Id, Name, DeptId) Course Course (DeptId, CrsCode,
Chapter 5 Relational Algebra and Relational Calculus Pearson Education © 2014.
LECTURE THREE RELATIONAL ALGEBRA 11. Objectives  Meaning of the term relational completeness.  How to form queries in relational algebra. 22Relational.
1 Relational Algebra and SQL. 2 Relational Query Languages Languages for describing queries on a relational database Relational AlgebraRelational Algebra.
Relational Algebra COMP3211 Advanced Databases Nicholas Gibbins
Relational Calculus. Relational calculus query specifies what is to be retrieved rather than how to retrieve it. – No description of how to evaluate a.
CSCI 6315 Applied Database Systems Review for Midterm Exam II Xiang Lian The University of Texas Rio Grande Valley Edinburg, TX 78539
Ritu CHaturvedi Some figures are adapted from T. COnnolly
Database Systems Chapter 6
Chapter (6) The Relational Algebra and Relational Calculus Objectives
More SQL: Complex Queries,
COMP3017 Advanced Databases
Relational Algebra and SQL
Relational Algebra and Relational Calculus
SQL: Structured Query Language DML- Queries Lecturer: Dr Pavle Mogin
DATABASE SYSTEM.
The Relational Algebra
Presentation transcript:

Relational Algebra and SQL Prof. Sin-Min Lee Department of Computer Science

The Role of Relational Algebra in a DBMS

Relational Algebra Operations

More Relational Algebra

Select Extracts specified tuples (rows) from a specified relation (table).

Select Operator Produce table containing subset of rows of argument table satisfying condition  condition (relation) Example: Person Person Person  Hobby=‘stamps’ ( Person ) 1123 John 123 Main stamps 1123 John 123 Main coins 5556 Mary 7 Lake Dr hiking 9876 Bart 5 Pine St stamps 1123 John 123 Main stamps 9876 Bart 5 Pine St stamps Id Name Address Hobby

Selection Condition Operators:, =,  Simple selection condition: – operator AND OR NOT

Selection Condition - Examples Person  Id>3000 OR Hobby=‘hiking’ (Person) Person  Id>3000 AND Id <3999 (Person) Person  NOT (Hobby=‘hiking’) (Person) Person  Hobby  ‘hiking’ (Person)

Project Extracts specified attributes(columns) from a specified relation.

Project Operator Produces table containing subset of columns of argument table  attribute list (relation) Example: PersonPerson Person  Name,Hobby (Person) 1123 John 123 Main stamps 1123 John 123 Main coins 5556 Mary 7 Lake Dr hiking 9876 Bart 5 Pine St stamps John stamps John coins Mary hiking Bart stamps Id Name Address Hobby Name Hobby

Project Operator 1123 John 123 Main stamps 1123 John 123 Main coins 5556 Mary 7 Lake Dr hiking 9876 Bart 5 Pine St stamps John 123 Main Mary 7 Lake Dr Bart 5 Pine St Result is a table (no duplicates); can have fewer tuples than the original Id Name Address Hobby Name Address Example: PersonPerson Person  Name,Address (Person)

Expressions 1123 John 123 Main stamps 1123 John 123 Main coins 5556 Mary 7 Lake Dr hiking 9876 Bart 5 Pine St stamps 1123 John 9876 Bart Id Name Address Hobby Id Name Person Result Person  Id, Name (  Hobby=’stamps’ OR Hobby=’coins’ (Person) )

Set Operators Relation is a set of tuples, so set operations should apply: , ,  (set difference) Result of combining two relations with a set operator is a relation => all its elements must be tuples having same structure union compatible relationsHence, scope of set operations limited to union compatible relations

Union Compatible Relations union compatibleTwo relations are union compatible if –Both have same number of columns –Names of attributes are the same in both –Attributes with the same name in both relations have the same domain unionintersectionset differenceUnion compatible relations can be combined using union, intersection, and set difference

Example Tables: Person Person (SSN, Name, Address, Hobby) Professor Professor (Id, Name, Office, Phone) are not union compatible. But PersonProfessor  Name (Person) and  Name (Professor) are union compatible so PersonProfessor  Name (Person) -  Name (Professor) makes sense.

Cartesian Product RSRS R SIf R and S are two relations, R  S is the set of all concatenated tuples, where x is a tuple in R and y is a tuple in S –RS –R and S need not be union compatible RSR  S is expensive to compute: –Factor of two in the size of each row –Quadratic in the number of rows A B C D A B C D x1 x2 y1 y2 x1 x2 y1 y2 x3 x4 y3 y4 x1 x2 y3 y4 x3 x4 y1 y2 RS R S x3 x4 y3 y4 RS R  S

Renaming Result of expression evaluation is a relation Attributes of relation must have distinct names. This is not guaranteed with Cartesian product –e.g., suppose in previous example a and c have the same name Renaming operator tidies this up. To assign the names A 1, A 2,… A n to the attributes of the n column relation produced by expression expr use expr [A 1, A 2, … A n ]

Example This is a relation with 4 attributes: StudId, CrsCode1, ProfId, CrsCode2 Transcript Transcript (StudId, CrsCode, Semester, Grade) Teaching Teaching (ProfId, CrsCode, Semester) Transcript  StudId, CrsCode (Transcript)[StudId, CrsCode1] Teaching   ProfId, CrsCode (Teaching) [ProfId, CrsCode2]

Join Builds a relation from two specified relations consisting of all possible concatenated pairs, one from each of the two relations, such that in each pair the two tuples satisfy some condition. (E.g., equal values in a given col.) A1 B1 A2 B1 A3 B2 B1 C1 B2 C2 B3 C3 A1 B1 C1 A2 B1 C1 A3 B2 C2 (Natural or Inner) Join

Outer Join Outer Joins are similar to PRODUCT -- but will leave NULLs for any row in the first table with no corresponding rows in the second. A1 B1 A2 B1 A3 B2 A4 B7 B1 C1 B2 C2 B3 C3 A1 B1 C1 A2 B1 C1 A3 B2 C2 A4 * * Outer Join

Join Items

Derived Operation: Join generalthetajoin A (general or theta) join of R and S is the expression R join-condition S where join-condition is a conjunction of terms: A i oper B i in which A i is an attribute of R; B i is an attribute of S; and oper is one of =,,  , . The meaning is:  join-condition ´ (R  S) where join-condition and join-condition ´ are the same, except for possible renamings of attributes (next)

Join Join is a derivative of Cartesian product. Equivalent to performing a Selection, using join predicate as selection formula, over Cartesian product of the two operand relations. One of the most difficult operations to implement efficiently in an RDBMS and one reason why RDBMSs have intrinsic performance problems. Various forms of join operation –Natural join (defined by Codd) –Outer join –Theta join –Equijoin (a particular type of Theta join) –Semijoin

Natural Join List the names and comments of all clients who have viewed a property for rent. (  clientNo, fName, lName (Client)) Join (  clientNo, propertyNo, comment (Viewing)) Or Client [ clientNo, fName, lName ] Join Viewing [ clientNo, propertyNo, comment ]

Outer Join To display rows in the result that do not have matching values in the join column, use Outer join. R Left Outer Join S –(Left) outer join is join in which tuples from R that do not have matching values in common columns of S are also included in result relation. Example: Produce a status report on property viewings.  propertyNo, street, city (PropertyForRent) Left Outer Join Viewing

Join and Renaming Problem: R and S might have attributes with the same name – in which case the Cartesian product is not defined Solutions: 1.Rename attributes prior to forming the product and use new names in join-condition´. Transcript.CrsCodeTeaching.CrsCode 2.Qualify common attribute names with relation names (thereby disambiguating the names). For instance: Transcript.CrsCode or Teaching.CrsCode –This solution is nice, but doesn’t always work: consider RR R join_condition R R In R.A, how do we know which R is meant?

Theta Join – Example Employee(Name,Id,MngrId,Salary Employee(Name,Id,MngrId,Salary) Manager(Name,Id,Salary Manager(Name,Id,Salary) Output the names of all employees that earn more than their managers. Employee EmployeeManager  Employee.Name ( Employee MngrId=Id AND Salary>Salary Manager ) The join yields a table with attributes: EmployeeEmployeeEmployee Employee.Name, Employee.Id, Employee.Salary, MngrId ManagerManagerManager Manager.Name, Manager.Id, Manager.Salary

Equijoin Join - Example Student  Name,CrsCode ( Student Transcript Id=StudId  Grade=‘A’ (Transcript)) Id Name Addr Status 111 John ….. … Mary ….. … Bill ….. … Joe ….. ….. StudId CrsCode Sem Grade 111 CSE305 S00 B 222 CSE306 S99 A 333 CSE304 F99 A Mary CSE306 Bill CSE304 The equijoin is used very frequently since it combines related data in different relations. Student Transcript Equijoin Equijoin: Join condition is a conjunction of equalities.

Natural Join Special case of equijoin: –join condition equates all and only those attributes with the same name (condition doesn’t have to be explicitly stated) –duplicate columns eliminated from the result Transcript Transcript (StudId, CrsCode, Sem, Grade) Teaching ( Teaching (ProfId, CrsCode, Sem) Transcript Teaching Teaching =  StudId, Transcript.CrsCode, Transcript.Sem, Grade, ProfId Transcript ( Transcript Sem Teaching CrsCode=CrsCode AND Sem=Sem Teaching ) [ StudId, CrsCode, Sem, Grade, ProfId ]

Natural Join (cont’d) More generally: R SRS S =  attr-list (  join-cond (R × S) ) where RS attr-list = attributes (R)  attributes (S) (duplicates are eliminated) and join-cond has the form: A 1 = A 1 AND … AND A n = A n where RS {A 1 … A n } = attributes(R)  attributes(S)

Natural Join Example List all Ids of students who took at least two different courses:  StudId (  CrsCode  CrsCode2 ( Transcript Transcript Transcript [ StudId, CrsCode2, Sem2, Grade2 ] )) We don’t want to join on CrsCode, Sem, and Grade attributes, hence renaming!

Aggregates Functions that operate on sets: –COUNT, SUM, AVG, MAX, MIN Produce numbers (not tables) Not part of relational algebra SELECT COUNT(*) Professor FROM Professor P SELECT MAX (Salary) Employee FROM Employee E

Aggregates SELECT COUNT ( T.CrsCode ) Teaching FROM Teaching T WHERE T.Semester = ‘S2000’ SELECT COUNT (DISTINCT T.CrsCode ) Teaching FROM Teaching T WHERE T.Semester = ‘S2000’ Count the number of courses taught in S2000 But if multiple sections of same course are taught, use:

Aggregates: Proper and Improper Usage SELECT COUNT (T.CrsCode), T. ProfId – makes no sense (in the absence of GROUP BY clause) SELECT COUNT (*), AVG (T.Grade) – but this is OK WHERE T.Grade > COUNT (SELECT ….) – aggregate cannot be applied to result of SELECT statement

Grouping But how do we compute the number of courses taught in S2000 per professor? –Strategy 1: Fire off a separate query for each professor: SELECT COUNT( T.CrsCode ) Teaching FROM Teaching T WHERE T.Semester = ‘S2000’ AND T.ProfId = Cumbersome What if the number of professors changes? Add another query? grouping operator –Strategy 2: define a special grouping operator: SELECT T.ProfId, COUNT( T.CrsCode ) Teaching FROM Teaching T WHERE T.Semester = ‘S2000’ GROUP BY T.ProfId

GROUP BY

GROUP BY - Example SELECT T.StudId, AVG( T.Grade ), COUNT (*) Transcript FROM Transcript T GROUP BY T.StudId Transcript Attributes: -student’s Id -avg grade -number of courses

HAVING Clause Eliminates unwanted groups (analogous to WHERE clause ) HAVING condition constructed from attributes of GROUP BY list and aggregates of attributes not in list SELECT T.StudId, AVG (T.Grade) AS CumGpa, COUNT (*) AS NumCrs Transcript FROM Transcript T WHERE T.CrsCode LIKE ‘CS%’ GROUP BY T.StudId HAVING AVG (T.Grade) > 3.5

Evaluation of GroupBy with Having

Example Output the name and address of all seniors on the Dean’s List SELECT S.Id, S.Name StudentTranscript FROM Student S, Transcript T WHERE S.Id = T.StudId AND S.Status = ‘senior’ GROUP BY HAVING AVG (T.Grade) > 3.5 AND SUM (T.Credit) > 90 S.Id -- wrong S.Id, S.Name -- right Every attribute that occurs in SELECT clause must also occur in GROUP BY or it must be an aggregate. S.Name does not.

ORDER BY Clause Causes rows to be output in a specified order SELECT T.StudId, COUNT (*) AS NumCrs, AVG (T.Grade) AS CumGpa Transcript FROM Transcript T WHERE T.CrsCode LIKE ‘CS%’ GROUP BY T.StudId HAVING AVG (T.Grade) > 3.5 ORDER BY DESC CumGpa, ASC StudId

Query Evaluation Strategy 1Evaluate FROM : produces Cartesian product, A, of tables in FROM list 2Evaluate WHERE : produces table, B, consisting of rows of A that satisfy WHERE condition 3Evaluate GROUP BY : partitions B into groups that agree on attribute values in GROUP BY list 4Evaluate HAVING : eliminates groups in B that do not satisfy HAVING condition 5Evaluate SELECT : produces table C containing a row for each group. Attributes in SELECT list limited to those in GROUP BY list and aggregates over group 6Evaluate ORDER BY : orders rows of C

Nested Queries List all courses that were not taught in S2000 SELECT C.CrsName FROM Course C WHERE C.CrsCode NOT IN (SELECT T.CrsCode --subquery FROM Teaching T WHERE T.Sem = ‘S2000’) Evaluation strategy: subquery evaluated once to produces set of courses taught in S2000. Each row (as C) tested against this set.

Correlated Nested Queries Output a row if prof has taught a course in dept. SELECT T.ProfId --subquery FROM Teaching T, Course C WHERE T.CrsCode=C.CrsCode AND C.DeptId=D.DeptId --correlation SELECT P.Name, D.Name --outer query FROM Professor P, Department D WHERE P.Id IN (set of Id’s of all profs who have taught a course in D.DeptId)

Correlated Nested Queries (con’t) Tuple variables T and C are local to subquery Tuple variables P and D are global to subquery Correlation: subquery uses a global variable, D The value of D.DeptId parameterizes an evaluation of the subquery Subquery must (at least) be re-evaluated for each distinct value of D.DeptId Correlated queries can be expensive to evaluate

Division Goal: Produce the tuples in one relation, r, that match all tuples in another relation, s –r –r (A 1, …A n, B 1, …B m ) –s –s (B 1 …B m ) –rs s r –r/s, with attributes A 1, …A n, is the set of all tuples such that for every tuple in s, is in r Can be expressed in terms of projection, set difference, and cross-product

Division (con’t)

Division Identify all clients who have viewed all properties with three rooms. (  clientNo, propertyNo (Viewing))  (  propertyNo (  rooms = 3 (PropertyForRent)))

Division - Example List the Ids of students who have passed all courses that were taught in spring 2000 Numerator: –StudId and CrsCode for every course passed by every student: Transcript  StudId, CrsCode (  Grade  ‘F’ (Transcript) ) Denominator: – CrsCode of all courses taught in spring 2000 Teaching  CrsCode (  Semester=‘S2000’ (Teaching) ) Result is numerator/denominator

Division Query type: Find the subset of items in one set that are related to all items in another set Example: Find professors who have taught courses in all departments –Why does this involve division? ProfId DeptIdDeptId All department Ids Contains row if professor p has taught a course in department d  ProfId,DeptId (Professor) /  DeptId (Department)

Division Strategy for implementing division in SQL: –Find set, A, of all departments in which a particular professor, p, has taught a course –Find set, B, of all departments –Output p if A  B, or, equivalently, if B–A is empty

Division – SQL Solution SELECT P.Id Professor FROM Professor P WHERE NOT EXISTS (SELECT D.DeptId -- set B of all dept Ids Department FROM Department D EXCEPT SELECT C.DeptId -- set A of dept Ids of depts in -- which P has taught a course TeachingCourse FROM Teaching T, Course C WHERE T.ProfId=P.Id -- global variable AND T.CrsCode=C.CrsCode)

Division Query type: Find the subset of items in one set that are related to all items in another set Example: Find professors who have taught courses in all departments –Why does this involve division? ProfId DeptIdDeptId All department Ids Contains row if professor p has taught a course in department d

Division Strategy for implementing division in SQL: –Find set of all departments in which a particular professor, p, has taught a course - A –Find set of all departments - B –Output p if A  B, or equivalently if B-A is empty

Division – SQL Solution SELECT P.Id FROM Professor P WHERE NOT EXISTS (SELECT D.DeptId -- B: set of all dept Ids FROM Department D EXCEPT SELECT C.DeptId -- A: set of dept Ids of depts in -- which P has taught a course FROM Teaching T, Course C WHERE T.ProfId=P.Id --global variable AND T.CrsCode=C.CrsCode)

GROUP BY Table output by WHERE clause: - Divide rows into groups based on subset of attributes; - All members of a group agree on those attributes Each group can be described by a single row in a table with attributes limited to: -Attributes all group members share (listed in GROUP BY clause) -Aggregates over group group GROUP BY attributes aabbcccccdddaabbcccccddd

GROUP BY - Example SELECT T.StudId, AVG(T.Grade), COUNT (*) FROM Transcript T GROUP BY T.StudId Transcript Attributes: -student’s Id -avg grade -number of courses

HAVING Clause Eliminates unwanted groups (analogous to WHERE clause ) HAVING condition constructed from attributes of GROUP BY list and aggregates of attributes not in list SELECT T.StudId, AVG( T.Grade ) AS CumGpa, COUNT (*) AS NumCrs FROM Transcript T WHERE T.CrsCode LIKE ‘CS%’ GROUP BY T.StudId HAVING AVG (T.Grade) > 3.5

Example Output the name and address of all seniors on the Dean’s List SELECT S.Name, S.Address FROM Student S, Transcript T WHERE S.StudId = T.StudId AND S.Status = ‘senior’ GROUP BY HAVING AVG (T.Grade) > 3.5 AND SUM (T.Credit) > 90 S.StudId -- wrong S.Name, S.Address -- right

SQL and Relational Algebra RELATIONAL ALGEBRASQL PSEUDO-CODE 1) R = customers where city = 'Dallas' SELECT * from customers where city = 'Dallas'; For c1 = first row of customers until c1 = last row of customers If c1.city = ‘Dallas’ Display c1.* End-If End-For 2) R = customers [cid, cname] SELECT cid, cname from customers; For c1 = first row of customers until c1 = last row of customers Display c1.cid, c1.cname End-For

Constructing SQL

nele_Algebra_SQL2.pdf