1.1 CAS CS 460/660 Introduction to Database Systems Relational Model and more…


1 1.1 CAS CS 460/660 Introduction to Database Systems Relational Model and more…

2 1.2 The Structure Spectrum
Structured (schema-first): Relational Database, Formatted Messages
Semi-Structured (schema-later): Documents, XML, Tagged Text/Media
Unstructured (schema-never): Plain Text, Media

3 1.3 The Relational Model The Relational Model is Ubiquitous  MySQL, PostgreSQL, Oracle, DB2, SQL Server, …  Foundational work done at  IBM San Jose Research Laboratory (now IBM Almaden) – “System R”  UC Berkeley CS – the “Ingres” System  Note: some legacy systems use older models  e.g., IBM’s IMS Object-oriented concepts have been merged in  Early work: POSTGRES research project at Berkeley  Informix, IBM DB2, Oracle 8i As has support for XML (semi-structured data)

4 1.4 Relational Model The relational model for database management is a database model based on first-order predicate logic, first formulated and proposed in 1969 by Edgar F. Codd. Codd, E.F. (1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM 13 (6): 377–387.

5 1.5 Relational Database: Definitions Relational database: a set of relations Relation: made up of 2 parts: Schema : specifies name of relation, plus name and type of each column Students(sid: string, name: string, login: string, age: integer, gpa: real) Instance : the actual data at a given time  #rows = cardinality  #fields = degree / arity

6 1.6 Some Synonyms
Formal      Not-so-formal 1   Not-so-formal 2
Relation    Table
Tuple       Row               Record
Attribute   Column            Field
Domain      Type

7 1.7 Ex: Instance of Students Relation
sid     name    login        age   gpa
53666   Jones   jones@cs     18    3.4
53688   Smith   smith@eecs   18    3.2
53650   Smith   smith@math   19    3.8
Cardinality = 3, arity = 5, all rows distinct. Do all values in each column of a relation instance have to be unique?

8 1.8 SQL - A language for Relational DBs Say: “ess-cue-ell” or “sequel”  But spelled “SQL” Data Definition Language (DDL)  create, modify, delete relations  specify constraints  administer users, security, etc. Data Manipulation Language (DML)  Specify queries to find tuples that satisfy criteria  add, modify, remove tuples The DBMS is responsible for efficient evaluation.

9 1.9 The SQL Query Language The most widely used relational query language. Originally IBM, then ANSI in 1986 Current standard is SQL-2011  2008 added XQuery features, new triggers, …  2003 was last major update: XML, window functions, sequences, auto-generated IDs. Not fully supported yet SQL-1999 introduced “Object-Relational” concepts.  Also not fully supported yet. SQL-92 is a basic subset  Most systems support at least this PostgreSQL has some “unique” aspects (as do most systems). SQL is not synonymous with Microsoft’s “SQL Server”

10 1.10 Creating Relations in SQL Creates the Students relation.  Note: the type (domain) of each field is specified, and enforced by the DBMS whenever tuples are added or modified. CREATE TABLE Students (sid CHAR(20), name CHAR(20), login CHAR(10), age INTEGER, gpa FLOAT)

11 1.11 Table Creation (continued) Another example: the Enrolled table holds information about courses students take. CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2))

12 1.12 Adding and Deleting Tuples Can insert a single tuple using: INSERT INTO Students (sid, name, login, age, gpa) VALUES ('53688', 'Smith', 'smith@ee', 18, 3.2) Can delete all tuples satisfying some condition (e.g., name = Smith): DELETE FROM Students S WHERE S.name = 'Smith' Powerful variants of these commands are available; more later!
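The insert and delete statements above can be tried out with a minimal sketch like the following. It assumes Python's standard-library sqlite3 module as a stand-in DBMS (the slides do not name one), and it drops the range variable S from the DELETE purely to keep the statement portable. The two extra rows come from the Students instance on slide 7.

```python
import sqlite3

# In-memory SQLite database (an assumption; any SQL DBMS would do).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (sid CHAR(20), name CHAR(20), "
    "login CHAR(10), age INTEGER, gpa FLOAT)"
)

# The single-tuple insert from the slide.
conn.execute(
    "INSERT INTO Students (sid, name, login, age, gpa) "
    "VALUES ('53688', 'Smith', 'smith@ee', 18, 3.2)"
)

# Two more rows taken from the slide 7 instance.
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
    [
        ("53666", "Jones", "jones@cs", 18, 3.4),
        ("53650", "Smith", "smith@math", 19, 3.8),
    ],
)

# Delete all tuples satisfying the condition name = 'Smith'.
deleted = conn.execute("DELETE FROM Students WHERE name = 'Smith'").rowcount
remaining = conn.execute("SELECT sid, name FROM Students").fetchall()
print(deleted, remaining)
```

Both Smith tuples match the condition, so a single DELETE removes more than one row.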

13 1.13 Keys Keys are a way to associate tuples in different relations Keys are one form of integrity constraint (IC)
Enrolled (sid is a FOREIGN key)
sid     cid           grade
53666   Carnatic101   C
53666   Reggae203     B
53650   Topology112   A
53666   History105    B
Students (sid is the PRIMARY key)
sid     name    login        age   gpa
53666   Jones   jones@cs     18    3.4
53688   Smith   smith@eecs   18    3.2
53650   Smith   smith@math   19    3.8

14 1.14 Primary Keys A set of fields is a superkey if:  No two distinct tuples can have the same values in all key fields A set of fields is a key for a relation if:  It is a superkey  No proper subset of the fields is a superkey What if there is more than one key for a relation?  One of the keys is chosen (by the DBA) to be the primary key. The other keys are called candidate keys. E.g.  sid is a key for Students.  What about name?  The set {sid, gpa} is a superkey.

15 1.15 Primary and Candidate Keys in SQL Possibly many candidate keys (specified using UNIQUE), one of which is chosen as the primary key. Keys must be used carefully!
“For a given student and course, there is a single grade.”
CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid,cid))
vs.
“Students can take only one course, and no two students in a course receive the same grade.”
CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid), UNIQUE (cid, grade))
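To see why keys must be chosen carefully, here is a small sketch that creates both variants side by side and tries to enroll one student in a second course. It assumes Python's sqlite3 module as the DBMS; the table names Enrolled1 and Enrolled2 and the helper try_insert are hypothetical, introduced only so both declarations can coexist in one database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Variant 1: a single grade per (student, course) pair.
conn.execute(
    "CREATE TABLE Enrolled1 (sid CHAR(20), cid CHAR(20), grade CHAR(2), "
    "PRIMARY KEY (sid, cid))"
)
# Variant 2: one course per student, no repeated grade within a course.
conn.execute(
    "CREATE TABLE Enrolled2 (sid CHAR(20), cid CHAR(20), grade CHAR(2), "
    "PRIMARY KEY (sid), UNIQUE (cid, grade))"
)


def try_insert(table, row):
    """Return True if the row is accepted, False if a key constraint rejects it."""
    try:
        conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


# Enroll student 53666 in a first course, then attempt a second course.
second_course_ok = {}
for table in ("Enrolled1", "Enrolled2"):
    try_insert(table, ("53666", "Carnatic101", "C"))
    second_course_ok[table] = try_insert(table, ("53666", "Reggae203", "B"))

# A second grade for the same (student, course) pair violates variant 1's key.
duplicate_pair_ok = try_insert("Enrolled1", ("53666", "Carnatic101", "A"))
print(second_course_ok, duplicate_pair_ok)
```

Variant 1 accepts the second course and rejects only a repeated (sid, cid) pair, while variant 2 rejects the second course outright because sid alone is its primary key.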

16 1.16 Foreign Keys, Referential Integrity Foreign key: a “logical pointer”  Set of fields in a tuple in one relation that “refer” to a tuple in another relation.  Reference to primary key of the other relation. All foreign key constraints enforced? Then the database has referential integrity!  i.e., no dangling references.

17 1.17 Foreign Keys in SQL E.g. Only students listed in the Students relation should be allowed to enroll for courses.  sid is a foreign key referring to Students:
CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid,cid), FOREIGN KEY (sid) REFERENCES Students);
Enrolled
sid     cid           grade
53666   Carnatic101   C
53666   Reggae203     B
53650   Topology112   A
53666   History105    B
Students
sid     name    login        age   gpa
53666   Jones   jones@cs     18    3.4
53688   Smith   smith@eecs   18    3.2
53650   Smith   smith@math   19    3.8
Candidate tuple for Enrolled: 11111 English102 A (sid 11111 does not appear in Students, so inserting it would leave a dangling reference)
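A minimal sketch of this constraint in action, assuming Python's sqlite3 module as the DBMS. Two details are specific to that assumption rather than to the slides: SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, and Students is declared here with sid as its PRIMARY KEY so that REFERENCES Students has a key to point at.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite-specific: foreign key enforcement is off unless enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE Students (sid CHAR(20) PRIMARY KEY, name CHAR(20), "
    "login CHAR(10), age INTEGER, gpa FLOAT)"
)
conn.execute(
    "CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2), "
    "PRIMARY KEY (sid, cid), FOREIGN KEY (sid) REFERENCES Students)"
)

conn.execute("INSERT INTO Students VALUES ('53666', 'Jones', 'jones@cs', 18, 3.4)")

# This enrollment refers to an existing student, so it is accepted.
conn.execute("INSERT INTO Enrolled VALUES ('53666', 'Carnatic101', 'C')")

# sid 11111 is not in Students, so this insert should be rejected.
try:
    conn.execute("INSERT INTO Enrolled VALUES ('11111', 'English102', 'A')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

enrolled_count = conn.execute("SELECT COUNT(*) FROM Enrolled").fetchone()[0]
print(rejected, enrolled_count)
```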

18 1.18 Next Up We’ll talk a bit about the SQL DML Then we’ll start describing the DBMS from storage on up

19 1.19 The SQL DML Single-table queries are straightforward. To find records for all 18-year-old students with GPAs above 2.0, we can write: SELECT * FROM Students S WHERE S.age=18 AND S.gpa > 2.0 To get just names and logins, replace the first line: SELECT S.name, S.login
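As a quick check, the two single-table queries can be run against the Students instance from slide 7. This is a sketch that assumes Python's sqlite3 module as the DBMS; the variable names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (sid CHAR(20), name CHAR(20), "
    "login CHAR(10), age INTEGER, gpa FLOAT)"
)
# Students instance from slide 7.
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
    [
        ("53666", "Jones", "jones@cs", 18, 3.4),
        ("53688", "Smith", "smith@eecs", 18, 3.2),
        ("53650", "Smith", "smith@math", 19, 3.8),
    ],
)

# All fields of the 18-year-old students with gpa above 2.0.
full_rows = conn.execute(
    "SELECT * FROM Students S WHERE S.age=18 AND S.gpa > 2.0"
).fetchall()

# Same condition, but only names and logins.
names_logins = conn.execute(
    "SELECT S.name, S.login FROM Students S WHERE S.age=18 AND S.gpa > 2.0"
).fetchall()

print(full_rows)
print(names_logins)
```

Only the SELECT line changes between the two queries; the FROM and WHERE clauses pick the same two tuples.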

20 1.20 Basic SQL Queries SELECT [DISTINCT] target-list FROM relation-list WHERE qualification
relation-list: a list of relation names, possibly with a range-variable after each name
target-list: a list of attributes of tables in relation-list
qualification: comparisons combined using AND, OR and NOT. Comparisons are Attr op const or Attr1 op Attr2, where op is one of =, ≠, <, >, ≤, ≥
DISTINCT (optional): indicates that the answer should have no duplicates. In SQL SELECT, the default is that duplicates are not eliminated! (Result is called a “multiset”)

21 1.21 Querying Multiple Relations Can specify a join over two tables as follows: SELECT S.name, E.cid FROM Students S, Enrolled E WHERE S.sid=E.sid AND E.grade='B'
result =
S.name   E.cid
Jones    History105
Note: obviously no referential integrity constraints have been used here.

22 1.22 Basic Query Semantics The Semantics of a SQL query are defined in terms of the following conceptual evaluation strategy: 1. do FROM clause: compute cross-product of tables (e.g., Students and Enrolled). 2. do WHERE clause: Check conditions, discard tuples that fail. 3. do SELECT clause: Delete unwanted fields. 4. If DISTINCT specified, eliminate duplicate rows. Probably the least efficient way to compute a query! A query optimizer will find more efficient strategies to get the same answer.

23 1.23 Step 1) Cross Product SELECT S.name, E.cid FROM Students S, Enrolled E WHERE S.sid=E.sid AND E.grade='B'

24 1.24 Step 2) Discard tuples that fail predicate SELECT S.name, E.cid FROM Students S, Enrolled E WHERE S.sid=E.sid AND E.grade='B'

25 1.25 Step 3) Discard Unwanted Columns SELECT S.name, E.cid FROM Students S, Enrolled E WHERE S.sid=E.sid AND E.grade='B'
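The conceptual evaluation strategy can be traced by hand in a few lines of plain Python. This is only a sketch of the semantics, not how a DBMS executes the query. The Enrolled rows used here are an assumption: two of them carry sid 53831, which has no match in Students, chosen so that the outcome agrees with the result shown on slide 21 and its note about missing referential integrity.

```python
from itertools import product

# Students instance from slide 7: (sid, name, login, age, gpa).
students = [
    ("53666", "Jones", "jones@cs", 18, 3.4),
    ("53688", "Smith", "smith@eecs", 18, 3.2),
    ("53650", "Smith", "smith@math", 19, 3.8),
]
# Assumed Enrolled instance: (sid, cid, grade). sid 53831 is not in Students.
enrolled = [
    ("53831", "Carnatic101", "C"),
    ("53831", "Reggae203", "B"),
    ("53650", "Topology112", "A"),
    ("53666", "History105", "B"),
]

# Step 1 (FROM): cross-product of the two tables.
cross = list(product(students, enrolled))

# Step 2 (WHERE): keep pairs with S.sid = E.sid AND E.grade = 'B'.
kept = [(s, e) for s, e in cross if s[0] == e[0] and e[2] == "B"]

# Step 3 (SELECT): project onto S.name and E.cid.
projected = [(s[1], e[1]) for s, e in kept]

# Step 4 (only if DISTINCT is specified): drop duplicate rows, keeping order.
distinct = list(dict.fromkeys(projected))

print(len(cross), projected)
```

The cross-product has 3 × 4 = 12 pairs, of which a single pair survives the WHERE clause; a query optimizer would avoid materializing the other eleven.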

26 1.26 Aggregate Operators For calculation and analytics: COUNT (*), COUNT ([DISTINCT] A), SUM ([DISTINCT] A), AVG (A), MAX (A), MIN (A)
SELECT AVG (S.gpa) FROM Students S WHERE S.age=18;
SELECT COUNT (*) FROM Students;
SELECT COUNT (DISTINCT S.age) FROM Students S WHERE S.name='Bob';
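The aggregate queries can be tried on the slide 7 Students instance with a sketch like the one below, assuming Python's sqlite3 module as the DBMS. Since that instance has no student named Bob, the last query filters on 'Smith' instead; that substitution is mine, not the slide's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (sid CHAR(20), name CHAR(20), "
    "login CHAR(10), age INTEGER, gpa FLOAT)"
)
# Students instance from slide 7.
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
    [
        ("53666", "Jones", "jones@cs", 18, 3.4),
        ("53688", "Smith", "smith@eecs", 18, 3.2),
        ("53650", "Smith", "smith@math", 19, 3.8),
    ],
)

# Average gpa over the 18-year-old students.
avg_gpa = conn.execute(
    "SELECT AVG(S.gpa) FROM Students S WHERE S.age=18"
).fetchone()[0]

# Number of tuples in the relation.
total = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]

# Number of distinct ages among students named Smith (the slide uses 'Bob').
distinct_ages = conn.execute(
    "SELECT COUNT(DISTINCT S.age) FROM Students S WHERE S.name='Smith'"
).fetchone()[0]

print(avg_gpa, total, distinct_ages)
```

Each aggregate collapses the qualifying tuples into a single value, so every query returns exactly one row.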

27 1.27 GROUP BY and HAVING Sometimes, we want to apply aggregates to each of several groups of tuples.
This query computes the average gpa per major (assume students have a “major” attribute):
SELECT S.major, AVG (S.gpa) as AvgGPA FROM Students S GROUP BY S.major ;
If you want to exclude “small” majors, use HAVING:
SELECT S.major, AVG (S.gpa) as AvgGPA FROM Students S GROUP BY S.major HAVING COUNT (*) > 10 ;
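A small runnable sketch of both queries, assuming Python's sqlite3 module as the DBMS. The three-row table and its major values are hypothetical, and the HAVING threshold is lowered from 10 to 1 so that such a tiny table still shows a group being filtered out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical Students table that includes the assumed "major" attribute.
conn.execute("CREATE TABLE Students (sid CHAR(20), major CHAR(20), gpa FLOAT)")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?)",
    [("1", "CS", 3.0), ("2", "CS", 4.0), ("3", "Math", 3.5)],
)

# Average gpa per major: one answer tuple per group.
per_major = dict(
    conn.execute(
        "SELECT S.major, AVG(S.gpa) AS AvgGPA FROM Students S GROUP BY S.major"
    ).fetchall()
)

# Same query, keeping only majors with more than one student
# (the slide uses a threshold of 10).
large_majors = conn.execute(
    "SELECT S.major, AVG(S.gpa) AS AvgGPA FROM Students S "
    "GROUP BY S.major HAVING COUNT(*) > 1"
).fetchall()

print(per_major, large_majors)
```

WHERE filters individual tuples before grouping, whereas HAVING filters whole groups after the aggregates are available.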

28 1.28 (Slightly) Less Basic SQL Queries SELECT [DISTINCT] target-list FROM relation-list WHERE qualification GROUP BY grouping-list HAVING group-qualification

29 1.29 Conceptual Evaluation The cross-product of relation-list is computed, tuples that fail qualification are discarded, `unnecessary’ fields are deleted, and the remaining tuples are partitioned into groups by the value of attributes in grouping-list. One answer tuple is generated per qualifying group.

30 1.30 Conceptual Evaluation (cont.) The group-qualification is then applied to eliminate some groups.  Expressions in group-qualification must have a single value per group!  That is, attributes in group-qualification must be arguments of an aggregate op or must also appear in the grouping-list. One answer tuple is generated per qualifying group.
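The grouping steps of this conceptual evaluation can be sketched in plain Python. The (major, gpa) tuples are hypothetical and stand for what is left after the cross-product, WHERE, and field-deletion steps; the group-qualification used is COUNT(*) > 1, an assumed threshold chosen to suit such a small input.

```python
from collections import defaultdict

# Hypothetical tuples remaining after FROM, WHERE and projection: (major, gpa).
tuples = [("CS", 3.0), ("CS", 4.0), ("Math", 3.5)]

# Partition the tuples into groups by the grouping-list attribute (major).
groups = defaultdict(list)
for major, gpa in tuples:
    groups[major].append(gpa)

# Apply the group-qualification (COUNT(*) > 1), then generate one answer
# tuple per qualifying group: (major, AVG(gpa)).
answer = [
    (major, sum(gpas) / len(gpas))
    for major, gpas in groups.items()
    if len(gpas) > 1
]

print(answer)
```

The condition len(gpas) > 1 has a single value per group, which is exactly the restriction placed on expressions in the group-qualification.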

31 1.31 Okay: Let’s start from the bottom up…
Database app
Layers (top to bottom): Query Optimization and Execution, Relational Operators, Access Methods, Buffer Management, Disk Space Management
Student Records stored on disk
These layers must consider concurrency control and recovery

