
1 Relational Model & Algebra
Zachary G. Ives
University of Pennsylvania
CIS 550 – Database & Information Systems
September 13, 2005
Some slide content courtesy of Susan Davidson & Raghu Ramakrishnan

2 Administrivia
- Homework assignments will normally be given out on Thursdays, due the following Thursday unless otherwise directed
- Start thinking about which project you want to do, and who you might work with
  - Will need to form groups and pick a project by the end of next week (I’ll announce more)
- We will soon be holding an extra session on the project, as well as on essential tools and skills (stay tuned)

3 Thinking Back to Last Time…
- There are a variety of ways of representing data, each with trade-offs
  - Free text
  - Classes and subclasses
  - Shapes/points in space
  - “Objects” with “properties”
- In general, our emphasis will be on the last item
  - … though there are spatial databases, OO databases, text databases, and the like…

4 The Relational Data Model (1970)
Lessons from the Codd paper
- Let’s separate physical implementation from logical
- Model the data independently from how it will be used (accessed, printed, etc.)
- Describe the data minimally and mathematically
  - A relation describes an association between data items – tuples with attributes
  - We generally think of tables and rows, but that’s somewhat imprecise
- Use standard mathematical (logical) operations over the data – these are the relational algebra or relational calculus
How does this model relate to objects, properties? What are its abilities and limitations?

5 Why Did It Take So Many Years to Implement Relational Databases?
- Codd’s original work: 1969-70
- Earliest relational database research: ~1976
- Oracle “2.0”: 1979
- Why the gap?
  1. “You could do the same thing in other ways”
  2. “Nobody wants to write math formulas”
  3. “Why would I turn my data into tables?”
  4. “It won’t perform well”
- What do you think?

6 Getting More Concrete: Building a Database and Application
1. Start with a conceptual model
   - “On paper” using certain techniques we’ll discuss next week
   - We ignore low-level details and focus on the logical representation
2. Design & implement schema
   - Design and codify (in SQL) the relations/tables
   - Do physical layout – indexes, etc.
3. Import the data
4. Write applications using DBMS and other tools
Many of the hard problems are taken care of by other people (DBMS, API writers, library authors, web server, etc.)

7 Conceptual Design for CIS Student Course Survey
[Figure: conceptual diagram with entities STUDENT, COURSE, and PROFESSOR, relationships Takes and Teaches, and attributes sid, name, cid, fid, semester, exp-grade]
“Who’s taking what, and what grade do they expect?”
This design is independent of the final form of the report!

8 Example Schema
- Our focus now: relational schema – set of tables
- Can have other kinds of schemas – XML, object, …

STUDENT
sid  name
1    Jill
2    Qun
3    Nitin

PROFESSOR
fid  name
1    Ives
2    Saul
8    Martin

Takes
sid  exp-grade  cid
1    A          550-0105
1    A          700-1005
3    C          500-0105

COURSE
cid       subj  sem
550-0105  DB    F05
700-1005  AI    S05
501-0105  Arch  F05

Teaches
fid  cid
1    550-0105
2    700-1005
8    501-0105

9 Some Terminology
- Columns of a relation are called attributes or fields
  - The number of these columns is the arity of the relation
- The rows of a relation are called tuples
- Each attribute has values taken from a domain, e.g., subj has domain string
Theoretically: a relation is a set of tuples; no tuple can occur more than once
- Real systems may allow duplicates for efficiency or other reasons – we’ll ignore this for now
- Objects and XML may also have the same content with different “identity”
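To make the “set of tuples” view concrete, here is a minimal Python sketch, an illustration rather than how any DBMS stores data. It assumes a relation can be modeled as a tuple of attribute names plus a Python set of tuples; the helper names make_relation and arity are hypothetical. The set type gives the theoretical no-duplicates behavior for free.

```python
# Sketch: a relation = (schema, set of tuples). Helper names are illustrative only.

def make_relation(schema, rows):
    """Build (schema, rows); a Python set silently drops duplicate tuples."""
    for row in rows:
        # every tuple needs exactly one value per attribute
        if len(row) != len(schema):
            raise ValueError(f"tuple {row!r} does not have arity {len(schema)}")
    return tuple(schema), set(rows)


def arity(relation):
    """The arity is the number of attributes (columns)."""
    schema, _ = relation
    return len(schema)


# STUDENT from the example schema, with (2, "Qun") deliberately listed twice
student = make_relation(("sid", "name"),
                        [(1, "Jill"), (2, "Qun"), (3, "Nitin"), (2, "Qun")])

print(arity(student))     # 2
print(len(student[1]))    # 3, because the repeated tuple was collapsed
```

A system that allows duplicates (bag semantics) would instead keep both copies of (2, "Qun").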

10 Describing Relations
- A schema can be represented many ways
  - To the DBMS, use data definition language (DDL) – like programming language type definitions
  - In relational DBs, we use relation(attribute:domain):
STUDENT(sid:int, name:string)
Takes(sid:int, exp-grade:char[2], cid:string)
COURSE(cid:string, subj:string, sem:char[3])
Teaches(fid:int, cid:string)
PROFESSOR(fid:int, name:string)

11 More on Attribute Domains
- Relational DBMSs have very limited “built-in” domains: either tables or scalar attributes – int, string, byte sequence, date, etc.
- But more generally:
  - We can have “nested relations”
  - Object-oriented, object-relational systems allow complex, user-defined domains – lists, classes, etc.
  - XML systems allow for XML trees (or lists of trees) that follow certain structural constraints
Database people, when they are discussing design, often assume domains are evident to the reader: STUDENT(sid, name)

12 Integrity Constraints
- Domains and schemas are one form of constraint on a valid data instance
- Other important constraints include:
Key constraints:
  - Subset of fields that uniquely identifies a tuple, and for which no proper subset of the key has this property
  - May have several candidate keys; one is chosen as the primary key
  - A superkey is a subset of fields that includes a key
Inclusion dependencies (referential integrity constraints):
  - A field in one relation may refer to a tuple in another relation by including its key
  - The referenced tuple must exist in the other relation for the database instance to be valid
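Both kinds of constraint can be checked against a single instance with a short Python sketch. The helper names (values, is_superkey, is_key, satisfies_inclusion) are hypothetical, and relations are assumed to be plain sets of tuples plus a tuple of attribute names. One caveat: a key is declared on the schema and must hold for every valid instance, so an instance check like this can refute a proposed key but never prove one.

```python
from itertools import combinations


def values(row, schema, attrs):
    """Pull out the values of attrs from one tuple."""
    return tuple(row[schema.index(a)] for a in attrs)


def is_superkey(rows, schema, attrs):
    """No two distinct tuples agree on all of attrs."""
    return len({values(r, schema, attrs) for r in rows}) == len(rows)


def is_key(rows, schema, attrs):
    """A superkey none of whose proper subsets is a superkey (in this instance)."""
    if not is_superkey(rows, schema, attrs):
        return False
    return not any(is_superkey(rows, schema, sub)
                   for size in range(len(attrs))
                   for sub in combinations(attrs, size))


def satisfies_inclusion(ref_rows, ref_schema, fk, target_rows, target_schema, key):
    """Every referencing fk value must appear as a key value of the target."""
    targets = {values(r, target_schema, key) for r in target_rows}
    return all(values(r, ref_schema, fk) in targets for r in ref_rows)


STUDENT_S = ("sid", "name")
STUDENT = {(1, "Jill"), (2, "Qun"), (3, "Nitin")}
TAKES_S = ("sid", "exp-grade", "cid")
TAKES = {(1, "A", "550-0105"), (1, "A", "700-1005"), (3, "C", "500-0105")}

print(is_key(STUDENT, STUDENT_S, ("sid",)))               # True
print(is_superkey(STUDENT, STUDENT_S, ("sid", "name")))   # True
print(is_key(STUDENT, STUDENT_S, ("sid", "name")))        # False: sid alone suffices
print(satisfies_inclusion(TAKES, TAKES_S, ("sid",), STUDENT, STUDENT_S, ("sid",)))  # True
# a hypothetical Takes tuple for a nonexistent student 9 breaks referential integrity
print(satisfies_inclusion(TAKES | {(9, "B", "550-0105")}, TAKES_S, ("sid",),
                          STUDENT, STUDENT_S, ("sid",)))  # False
```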

13 SQL: Structured Query Language
The standard language for relational data
- Invented by folks at IBM, esp. Don Chamberlin
- Actually not a great language…
- Beat a more elegant competing standard, QUEL, from Berkeley
Separated into a DML & DDL
DML based on relational algebra & (mostly) calculus, which we discuss this week

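14 Table Definition: SQL-92 DDL and Constraints
CREATE TABLE STUDENT (
    sid INTEGER,
    name CHAR(20),
    PRIMARY KEY (sid)
);

-- assumes a COURSE table whose primary key is cid has already been created
CREATE TABLE Takes (
    sid INTEGER,
    "exp-grade" CHAR(2),   -- a hyphenated name must be a quoted (delimited) identifier
    cid CHAR(8),
    PRIMARY KEY (sid, cid),
    FOREIGN KEY (sid) REFERENCES STUDENT,
    FOREIGN KEY (cid) REFERENCES COURSE
);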
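One way to watch these constraints being enforced is Python’s built-in sqlite3 module. This is a sketch under assumptions: SQLite is not a full SQL-92 implementation, it only enforces foreign keys after PRAGMA foreign_keys = ON, and the COURSE definition below is a minimal hypothetical one added so that the reference from Takes resolves. The helper try_insert is likewise hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default

conn.executescript("""
CREATE TABLE STUDENT (sid INTEGER, name CHAR(20), PRIMARY KEY (sid));
-- minimal hypothetical COURSE table so that Takes.cid has something to reference
CREATE TABLE COURSE (cid CHAR(8), subj CHAR(10), sem CHAR(3), PRIMARY KEY (cid));
CREATE TABLE Takes (
    sid INTEGER,
    "exp-grade" CHAR(2),
    cid CHAR(8),
    PRIMARY KEY (sid, cid),
    FOREIGN KEY (sid) REFERENCES STUDENT,
    FOREIGN KEY (cid) REFERENCES COURSE
);
INSERT INTO STUDENT VALUES (1, 'Jill');
INSERT INTO COURSE VALUES ('550-0105', 'DB', 'F05');
INSERT INTO Takes VALUES (1, 'A', '550-0105');
""")


def try_insert(sql):
    """Run one INSERT and report whether the DBMS accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as err:
        return f"rejected ({err})"


print(try_insert("INSERT INTO Takes VALUES (9, 'B', '550-0105')"))  # no student 9
print(try_insert("INSERT INTO Takes VALUES (1, 'B', '550-0105')"))  # duplicate (sid, cid)
print(try_insert("INSERT INTO Takes VALUES (1, 'B', '999-9999')"))  # no such course
```

Because FOREIGN KEY ... REFERENCES STUDENT names no column, SQLite (like SQL-92) uses the referenced table’s primary key, which is why STUDENT declares one.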

15 Example Data Instance
(The same STUDENT, PROFESSOR, Takes, COURSE, and Teaches data as in the example schema on slide 8.)

16 From Tables -> SQL -> Application
<!-- hypotheticalEmbeddedSQL: SELECT * FROM STUDENT, Takes, COURSE WHERE STUDENT.sid = Takes.sid AND Takes.cid = COURSE.cid -->
C -> machine code sequence -> microprocessor
Java -> bytecode sequence -> JVM
SQL -> relational algebra expression -> query execution engine

17 Codd’s Relational Algebra
- A set of mathematical operators that compose, modify, and combine tuples within different relations
- Relational algebra operations operate on relations and produce relations (“closure”):
  $f: \text{Relation} \to \text{Relation}$
  $f: \text{Relation} \times \text{Relation} \to \text{Relation}$

18 Codd’s Logical Operations: The Relational Algebra
- Six basic operations:
  - Projection $\pi_{A_1,\ldots,A_n}(R)$
  - Selection $\sigma_{\theta}(R)$
  - Union $R_1 \cup R_2$
  - Difference $R_1 - R_2$
  - Product $R_1 \times R_2$
  - (Rename) $\rho_{B_1,\ldots,B_n}(R)$
- And some other useful ones:
  - Join $R_1 \bowtie_{\theta} R_2$
  - Semijoin $R_1 \ltimes_{\theta} R_2$
  - Intersection $R_1 \cap R_2$
  - Division $R_1 \div R_2$

19 Data Instance for Operator Examples

STUDENT
sid  name
1    Jill
2    Qun
3    Nitin
4    Marty

PROFESSOR
fid  name
1    Ives
2    Saul
8    Martin

Takes
sid  exp-grade  cid
1    A          550-0105
1    A          700-1005
3    A          …
3    C          500-0105
4    C          …

COURSE
cid       subj  sem
550-0105  DB    F05
700-1005  AI    S05
501-0105  Arch  F05

Teaches
fid  cid
1    550-0105
2    700-1005
8    501-0105

20 Projection, $\pi$
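The slide’s worked example did not survive extraction, so here is a hedged Python sketch of projection over a set-of-tuples representation. The project helper and the (schema, rows) relation format are assumptions made for illustration. Because the result is a set, projecting Takes onto (sid, exp-grade) merges Jill’s two (1, A) rows into one.

```python
def project(relation, attrs):
    """pi_attrs(R): keep only the listed attributes, in the listed order."""
    schema, rows = relation
    idx = [schema.index(a) for a in attrs]
    # building a set removes any duplicates created by dropping columns
    return tuple(attrs), {tuple(row[i] for i in idx) for row in rows}


# Takes rows from the example schema slide: (sid, exp-grade, cid)
TAKES = (("sid", "exp-grade", "cid"),
         {(1, "A", "550-0105"), (1, "A", "700-1005"), (3, "C", "500-0105")})

schema, rows = project(TAKES, ["sid", "exp-grade"])
print(schema)        # ('sid', 'exp-grade')
print(sorted(rows))  # [(1, 'A'), (3, 'C')]
```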

21 Selection, $\sigma$
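A minimal sketch of selection, assuming the same hypothetical (schema, rows) format. In the algebra the selection condition is a predicate over attributes and constants; here it is modeled, as an assumption, by a Python function that receives one tuple as a dict from attribute name to value. Selection keeps the schema unchanged and only filters tuples.

```python
def select(relation, predicate):
    """sigma_predicate(R): keep the tuples for which predicate holds.

    The predicate receives a dict mapping attribute name to value.
    """
    schema, rows = relation
    return schema, {row for row in rows if predicate(dict(zip(schema, row)))}


# STUDENT from the data instance for operator examples
STUDENT = (("sid", "name"), {(1, "Jill"), (2, "Qun"), (3, "Nitin"), (4, "Marty")})

schema, rows = select(STUDENT, lambda t: t["sid"] > 2)
print(schema)        # ('sid', 'name')
print(sorted(rows))  # [(3, 'Nitin'), (4, 'Marty')]
```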

22 Product, $\times$
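A hedged sketch of the Cartesian product in the same hypothetical (schema, rows) format. STUDENT and PROFESSOR both have a name attribute, so this sketch assumes one simple convention for keeping the combined schema unambiguous: prefix each attribute with its relation name (the qualify helper is illustrative, essentially a rename). The result has |R1| × |R2| tuples.

```python
def qualify(name, relation):
    """Prefix each attribute with the relation name, e.g. sid -> STUDENT.sid."""
    schema, rows = relation
    return tuple(f"{name}.{a}" for a in schema), rows


def product(r1, r2):
    """R1 x R2: every tuple of R1 concatenated with every tuple of R2."""
    (s1, rows1), (s2, rows2) = r1, r2
    if set(s1) & set(s2):
        raise ValueError("attribute names clash; qualify or rename first")
    return s1 + s2, {a + b for a in rows1 for b in rows2}


STUDENT = (("sid", "name"), {(1, "Jill"), (2, "Qun"), (3, "Nitin"), (4, "Marty")})
PROFESSOR = (("fid", "name"), {(1, "Ives"), (2, "Saul"), (8, "Martin")})

schema, rows = product(qualify("STUDENT", STUDENT), qualify("PROFESSOR", PROFESSOR))
print(schema)     # ('STUDENT.sid', 'STUDENT.name', 'PROFESSOR.fid', 'PROFESSOR.name')
print(len(rows))  # 12 = 4 * 3
```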

23 Join, $\bowtie_{\theta}$: A Combination of Product and Selection
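The slide title says a join is a selection over a product. The Python sketch below takes that literally, $R_1 \bowtie_{\theta} R_2 = \sigma_{\theta}(R_1 \times R_2)$, using the hypothetical (schema, rows) format with relation-qualified attribute names (an assumption, so that PROFESSOR.fid and Teaches.fid stay distinct). It joins PROFESSOR with Teaches on fid.

```python
def qualify(name, relation):
    """Prefix each attribute with the relation name."""
    schema, rows = relation
    return tuple(f"{name}.{a}" for a in schema), rows


def product(r1, r2):
    """R1 x R2 (attribute names assumed already distinct)."""
    (s1, rows1), (s2, rows2) = r1, r2
    return s1 + s2, {a + b for a in rows1 for b in rows2}


def select(relation, predicate):
    """sigma: predicate sees each tuple as a dict of attribute -> value."""
    schema, rows = relation
    return schema, {row for row in rows if predicate(dict(zip(schema, row)))}


def theta_join(r1, r2, predicate):
    """R1 join_theta R2, defined literally as sigma_theta(R1 x R2)."""
    return select(product(r1, r2), predicate)


PROFESSOR = (("fid", "name"), {(1, "Ives"), (2, "Saul"), (8, "Martin")})
TEACHES = (("fid", "cid"), {(1, "550-0105"), (2, "700-1005"), (8, "501-0105")})

schema, rows = theta_join(qualify("PROFESSOR", PROFESSOR), qualify("Teaches", TEACHES),
                          lambda t: t["PROFESSOR.fid"] == t["Teaches.fid"])
print(sorted(rows))
# [(1, 'Ives', 1, '550-0105'), (2, 'Saul', 2, '700-1005'), (8, 'Martin', 8, '501-0105')]
```

Materializing the whole product, as this sketch does, is only for clarity; a query engine would normally avoid building it.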

24 Union, $\cup$
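Union only makes sense between union-compatible relations. In this hedged Python sketch (same hypothetical (schema, rows) format), compatibility is simplified to “identical attribute lists”, which is stricter than matching arity and domains. The inputs are $\pi_{name}(\text{STUDENT})$ and $\pi_{name}(\text{PROFESSOR})$, written out directly from the operator-examples data.

```python
def union(r1, r2):
    """R1 union R2; this sketch treats union-compatible as identical attribute lists."""
    (s1, rows1), (s2, rows2) = r1, r2
    if s1 != s2:
        raise ValueError("union needs union-compatible schemas")
    return s1, rows1 | rows2


# pi_name(STUDENT) and pi_name(PROFESSOR), written out from the data instance
STUDENT_NAMES = (("name",), {("Jill",), ("Qun",), ("Nitin",), ("Marty",)})
PROFESSOR_NAMES = (("name",), {("Ives",), ("Saul",), ("Martin",)})

schema, rows = union(STUDENT_NAMES, PROFESSOR_NAMES)
print(schema)     # ('name',)
print(len(rows))  # 7
```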

25 Difference, $-$
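A hedged Python sketch of difference in the same hypothetical (schema, rows) format, computing the sids of students who do not expect an “A” in any course (including students who take nothing). The two inputs are $\pi_{sid}(\text{STUDENT})$ and $\pi_{sid}(\sigma_{exp\text{-}grade = 'A'}(\text{Takes}))$, written out by hand from the operator-examples data rather than computed.

```python
def difference(r1, r2):
    """R1 - R2: tuples in R1 that are not in R2 (schemas must be compatible)."""
    (s1, rows1), (s2, rows2) = r1, r2
    if s1 != s2:
        raise ValueError("difference needs union-compatible schemas")
    return s1, rows1 - rows2


# pi_sid(STUDENT) from the data instance for operator examples
ALL_SIDS = (("sid",), {(1,), (2,), (3,), (4,)})
# pi_sid(sigma_{exp-grade = 'A'}(Takes)) on the same instance: sids 1 and 3
A_SIDS = (("sid",), {(1,), (3,)})

schema, rows = difference(ALL_SIDS, A_SIDS)
print(sorted(rows))  # [(2,), (4,)]
```

Unlike union, difference is not commutative: difference(A_SIDS, ALL_SIDS) is empty here.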

26 Rename, $\rho$
- The rename operator can be expressed several ways:
  - The book has a very odd definition that’s not algebraic
  - An alternate definition: $\rho_{\text{attribute list}}(x)$
    - Takes the relation x, with its schema
    - Returns a relation with the given attribute list
- Rename isn’t all that useful, except if you join a relation with itself
Why would it be useful here?
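Here is one reason rename matters, as a hedged Python sketch in the hypothetical (schema, rows) format: a self-join of COURSE with itself. Without renaming one copy, every attribute name would clash in the product. The query finds pairs of distinct courses offered in the same semester; the new names cid2, subj2, and sem2 are illustrative choices, and the cid < cid2 condition drops self-pairs and mirrored duplicates.

```python
def rename(relation, new_attrs):
    """rho: same tuples, new attribute list (one new name per attribute)."""
    schema, rows = relation
    if len(new_attrs) != len(schema):
        raise ValueError("rename needs exactly one new name per attribute")
    return tuple(new_attrs), set(rows)


def product(r1, r2):
    """R1 x R2; refuses to build a schema with repeated attribute names."""
    (s1, rows1), (s2, rows2) = r1, r2
    if set(s1) & set(s2):
        raise ValueError("attribute names clash; rename first")
    return s1 + s2, {a + b for a in rows1 for b in rows2}


def select(relation, predicate):
    """sigma: predicate sees each tuple as a dict of attribute -> value."""
    schema, rows = relation
    return schema, {row for row in rows if predicate(dict(zip(schema, row)))}


COURSE = (("cid", "subj", "sem"),
          {("550-0105", "DB", "F05"), ("700-1005", "AI", "S05"), ("501-0105", "Arch", "F05")})

# Without rename, product(COURSE, COURSE) would raise: every attribute name clashes.
C2 = rename(COURSE, ["cid2", "subj2", "sem2"])
same_sem = select(product(COURSE, C2),
                  lambda t: t["sem"] == t["sem2"] and t["cid"] < t["cid2"])
pairs = {(t[0], t[3]) for t in same_sem[1]}   # positions of cid and cid2
print(sorted(pairs))  # [('501-0105', '550-0105')]
```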

27 Mini-Quiz
- This completes the basic operations of the relational algebra. We shall soon find out in what sense this is an adequate set of operations.
Try writing queries for these:
- The names of students named “Bob”
- The names of students expecting an “A”
- The names of students in Milo Martin’s 501 class
- The sids and names of students not enrolled

28 Deriving Intersection
Intersection: as with set operations, derivable from difference
[Figure: Venn diagram of sets A and B, with the regions A - B and B - A marked]
$A \cap B \equiv (A \cup B) - (A - B) - (B - A) \equiv A - (A - B)$
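A quick Python sketch of intersection built only from union and difference, using built-in sets. It implements the standard identity $A \cap B = A - (A - B)$ and the form $(A \cup B) - (A - B) - (B - A)$. The example sets are the sids expecting an A and the sids expecting a C in the operator-examples Takes data; the exhaustive loop over every pair of subsets of a small universe is an extra sanity check, not a proof for arbitrary sets.

```python
from itertools import chain, combinations


def intersect_via_difference(a, b):
    """A intersect B computed only with difference: A - (A - B)."""
    return a - (a - b)


def intersect_via_union_and_difference(a, b):
    """(A union B) - (A - B) - (B - A)."""
    return (a | b) - (a - b) - (b - a)


# sids expecting an A and sids expecting a C in the operator-examples Takes data
expects_a = {1, 3}
expects_c = {3, 4}
print(intersect_via_difference(expects_a, expects_c))            # {3}
print(intersect_via_union_and_difference(expects_a, expects_c))  # {3}

# check both forms against Python's built-in & on every pair of subsets of {1, 2, 3}
universe = [1, 2, 3]
subsets = [set(c) for c in chain.from_iterable(combinations(universe, r) for r in range(4))]
assert all(intersect_via_difference(a, b) == a & b == intersect_via_union_and_difference(a, b)
           for a in subsets for b in subsets)
```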

29 Division
- A somewhat messy operation that can be expressed in terms of the operations we have already defined
- Used to express queries such as “The fid’s of faculty who have taught all subjects”
- Paraphrased: “The fid’s of professors for which there does not exist a subject that they haven’t taught”

30 Division Using Our Existing Operators
- All possible teaching assignments, Allpairs:
  $\pi_{fid,subj}(\text{PROFESSOR} \times \pi_{subj}(\text{COURSE}))$
- NotTaught, all (fid, subj) pairs for which professor fid has not taught subj:
  $\text{Allpairs} - \pi_{fid,subj}(\text{Teaches} \bowtie \text{COURSE})$
- Answer is all faculty not in NotTaught:
  $\pi_{fid}(\text{PROFESSOR}) - \pi_{fid}(\text{NotTaught})$
  $\equiv \pi_{fid}(\text{PROFESSOR}) - \pi_{fid}(\pi_{fid,subj}(\text{PROFESSOR} \times \pi_{subj}(\text{COURSE})) - \pi_{fid,subj}(\text{Teaches} \bowtie \text{COURSE}))$
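The derivation above can be traced step by step in a short Python sketch over built-in sets of tuples, with one comment per algebra step. The function name taught_all_subjects is hypothetical, and the natural join Teaches ⋈ COURSE is done here with a nested comprehension matching on cid. With the slide data no professor has taught DB, AI, and Arch, so the answer is empty; the second call adds two clearly hypothetical teaching assignments to show a non-empty result.

```python
# (fid, name), (cid, subj, sem), (fid, cid) from the data instance
PROFESSOR = {(1, "Ives"), (2, "Saul"), (8, "Martin")}
COURSE = {("550-0105", "DB", "F05"), ("700-1005", "AI", "S05"), ("501-0105", "Arch", "F05")}
TEACHES = {(1, "550-0105"), (2, "700-1005"), (8, "501-0105")}


def taught_all_subjects(professor, course, teaches):
    fids = {fid for fid, _ in professor}                   # pi_fid(PROFESSOR)
    subjects = {subj for _, subj, _ in course}             # pi_subj(COURSE)
    allpairs = {(f, s) for f in fids for s in subjects}    # pi_fid,subj(PROFESSOR x pi_subj(COURSE))
    taught = {(fid, subj)                                  # pi_fid,subj(Teaches join COURSE)
              for fid, cid in teaches
              for ccid, subj, _ in course if cid == ccid}
    not_taught = allpairs - taught                         # NotTaught
    return fids - {f for f, _ in not_taught}               # pi_fid(PROFESSOR) - pi_fid(NotTaught)


print(taught_all_subjects(PROFESSOR, COURSE, TEACHES))  # set()

# Hypothetical extra assignments (not in the slides): Ives also teaches the AI and Arch courses
extra = TEACHES | {(1, "700-1005"), (1, "501-0105")}
print(taught_all_subjects(PROFESSOR, COURSE, extra))    # {1}
```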

31 Division: $R_1 \div R_2$
- Requirement: $\text{schema}(R_1) \supset \text{schema}(R_2)$
- Result schema: $\text{schema}(R_1) - \text{schema}(R_2)$
- “Professors who have taught all subjects”:
  $\pi_{fid}(\pi_{fid,subj}(\text{Teaches} \bowtie \text{COURSE}) \div \pi_{subj}(\text{COURSE}))$
- What about “Courses that have been taught by all faculty”?
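As a hedged illustration of division as an operator in its own right, here is a generic Python sketch over the hypothetical (schema, rows) format. It enforces the schema requirement and returns the tuples t over schema(R1) − schema(R2) such that t, paired with every tuple of R2, appears in R1. The first call uses the operator-examples data; the second uses a small hypothetical relation to show a non-empty quotient.

```python
def divide(r1, r2):
    """R1 / R2 over (schema, rows) pairs.

    Returns the tuples t over schema(R1) - schema(R2) such that t, combined
    with every tuple of R2, appears in R1.
    """
    (s1, rows1), (s2, rows2) = r1, r2
    if not set(s2) < set(s1):
        raise ValueError("schema(R2) must be a proper subset of schema(R1)")
    keep = tuple(a for a in s1 if a not in s2)
    keep_idx = [s1.index(a) for a in keep]
    r2_idx = [s1.index(a) for a in s2]  # R2's attributes, located inside R1
    pairs = {(tuple(row[i] for i in keep_idx), tuple(row[i] for i in r2_idx))
             for row in rows1}
    candidates = {t for t, _ in pairs}
    return keep, {t for t in candidates if all((t, u) in pairs for u in rows2)}


# pi_fid,subj(Teaches join COURSE) and pi_subj(COURSE) for the data instance
FID_SUBJ = (("fid", "subj"), {(1, "DB"), (2, "AI"), (8, "Arch")})
SUBJ = (("subj",), {("DB",), ("AI",), ("Arch",)})
print(divide(FID_SUBJ, SUBJ))  # (('fid',), set())

# Hypothetical relation (not from the slides): fid 1 covers both DB and AI, fid 2 only DB
R = (("fid", "subj"), {(1, "DB"), (1, "AI"), (2, "DB")})
S = (("subj",), {("DB",), ("AI",)})
print(divide(R, S))            # (('fid',), {(1,)})
```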

32 The Big Picture: SQL to Algebra to Query Plan to Web Page
SELECT * FROM STUDENT, Takes, COURSE WHERE STUDENT.sid = Takes.sid AND Takes.cid = COURSE.cid
[Figure: query plan, an operator tree over STUDENT, Takes, and COURSE with operators labeled Merge and Hash by cid; system layers labeled Optimizer, Execution Engine, Storage Subsystem, and Web Server / UI / etc.]

33 Hint of Future Things: Optimization Is Based on Algebraic Equivalences
- Relational algebra has laws of commutativity, associativity, etc. that imply certain expressions are equivalent in semantics
- They may be different in cost of evaluation!
  $\sigma_{c \lor d}(R) \equiv \sigma_{c}(R) \cup \sigma_{d}(R)$
  $\sigma_{c}(R_1 \times R_2) \equiv R_1 \bowtie_{c} R_2$
  $\sigma_{c \land d}(R) \equiv \sigma_{c}(\sigma_{d}(R))$
- Query optimization finds the most efficient representation to evaluate (or one that’s not bad)
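The two selection laws $\sigma_{c \lor d}(R) \equiv \sigma_{c}(R) \cup \sigma_{d}(R)$ and $\sigma_{c \land d}(R) \equiv \sigma_{c}(\sigma_{d}(R))$ can be spot-checked with a small Python sketch. It builds random (sid, name) relations from the student names in the operator-examples data; the predicates c and d and the random seed are arbitrary choices. Passing on random instances illustrates the laws but does not prove them.

```python
import random


def select(rows, predicate):
    """sigma over a plain set of tuples."""
    return {r for r in rows if predicate(r)}


def c(t):
    return t[0] >= 3               # e.g. sid >= 3


def d(t):
    return t[1].startswith("M")    # e.g. name begins with M


names = ["Jill", "Qun", "Nitin", "Marty"]
rng = random.Random(550)
for _ in range(200):
    R = {(rng.randint(1, 6), rng.choice(names)) for _ in range(rng.randint(0, 8))}
    # sigma_{c or d}(R) == sigma_c(R) union sigma_d(R)
    assert select(R, lambda t: c(t) or d(t)) == select(R, c) | select(R, d)
    # sigma_{c and d}(R) == sigma_c(sigma_d(R))
    assert select(R, lambda t: c(t) and d(t)) == select(select(R, d), c)
print("both equivalences held on 200 random instances")
```

The second law also hints at the cost question: in $\sigma_{c}(\sigma_{d}(R))$, c is only evaluated on tuples that survive d.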

34 Next Time: An Equivalent, But Very Different, Formalism
- Codd invented a relational calculus that he proved was equivalent in expressiveness
- Based on a subset of first-order logic – declarative, without an implicit order of evaluation
- More convenient for describing certain things, and for certain kinds of manipulations
- … And, in fact, the basis of SQL!

