Database Systems
- Database: an integrated collection of related data
  - Related data, e.g., information stored in a university:
    - Students, Courses, Faculty, Students taking courses, Faculty teaching courses, ...
  - Integrated: all data is stored in a uniform way on secondary storage
- Database Management System (DBMS): a collection of programs used to create, maintain, and manipulate the data in the database
- Database System (DBS): DB + DBMS + Application Programs

Database System Organization: A Simplified View
[Figure: block diagram of a database system. Labels: Users, Application Programs, DBMS (Query and Transaction Processing; Management of Stored Data), Meta-data, Database, DBS.]

Databases vs File Systems
- What is wrong with a file system?
  - Data integration and data sharing
- Features of a DBMS that a file system cannot provide
  - Data consistency
  - Controlled redundancy
  - Program-data independence
  - Integrity enforcement
  - Concurrency control
  - Backup and recovery
  - Security and privacy
  - Multiple views of data

Additional Advantages
- Performance
- Expandability/flexibility
- Reduced application development time
- Enforcement of standards
- Economies of scale

The Price You Pay!
- High initial cost
- High overhead
- Not special-purpose

When Is a DBMS Inappropriate?
- The database is small and has a simple structure
- Applications are simple and special-purpose
- Applications have real-time requirements
- Concurrent, multi-user access to the data is not needed

The Three Levels of Abstraction
- Internal level
  - describes the physical storage structure of the DB
- Conceptual level
  - describes the structure of the whole DB
  - hides storage and implementation details
- External level
  - the point of view of the users
- These levels provide logical and physical data independence

Data Modeling / Database Design
- Database design
  - the activity of specifying the schema of a database in a given data model
- Database schema
  - the structure of a database, which
    - captures data types, relationships, and constraints on the data
    - is independent of any application program
    - changes infrequently
- Database instance (or state)
  - the actual data in the database at a given time
- Data model
  - a set of primitives for defining the structure of a DB
  - a set of operations for specifying retrievals and updates on a DB
  - examples: relational, hierarchical, network, object-oriented, ...

Relational Model (Codd 1970)
- The most popular implementation model
  - simplest, has the most uniform data structures, has a formal mathematical model, powerful query languages (relational algebra), existence of 4th generation languages
  - but not suitable for some applications
- Everything is represented by relations
  - Formally: given sets D_1, D_2, ..., D_n (not necessarily distinct), a relation R is a subset of their Cartesian product: R ⊆ D_1 × D_2 × ... × D_n
  - the D_i are the domains, and n is the arity (degree) of R
  - the elements of R are called tuples
  - the number of tuples in R is the cardinality of R

Relational Model (continued)
- The relational data model lets us view a relation as a table
  - each row represents a tuple (record)
  - each column represents an attribute (field)
- Observe the following properties:
  - no two rows are identical
  - the ordering of tuples is unimportant
  - the ordering of columns is important

  PART
  Part#  PName  Color  Weight
  P1     Nut    Red    12
  P2     Bolt   Blue   17
  P3     Screw  Green  16

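The table properties above can be checked with a small sketch that models the PART relation as a Python set of tuples. This is only an illustration of the definitions, not how a DBMS stores data; the variable names are made up for the example.

```python
# A relation instance modeled as a set of tuples (illustrative sketch only).
PART_SCHEMA = ("Part#", "PName", "Color", "Weight")  # attribute order is fixed

part = {
    ("P1", "Nut", "Red", 12),
    ("P2", "Bolt", "Blue", 17),
    ("P3", "Screw", "Green", 16),
}

# Adding an identical row changes nothing: a set never holds duplicate tuples.
part.add(("P1", "Nut", "Red", 12))

arity = len(PART_SCHEMA)   # number of attributes (degree)
cardinality = len(part)    # number of tuples

# Tuple order is irrelevant: the same rows in another order form the same relation.
same_relation = part == {
    ("P3", "Screw", "Green", 16),
    ("P1", "Nut", "Red", 12),
    ("P2", "Bolt", "Blue", 17),
}

# Column order matters: swapping two values yields a different tuple.
swapped = ("Nut", "P1", "Red", 12)

print(arity, cardinality, same_relation, swapped in part)
```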
Relation Schema
- A relation schema R specifies
  - the attribute names A_i of R
  - the domain D_i (datatype + format) for each A_i
    - a datatype is a set of atomic data values: no attribute is set-valued and no attribute is composite (1st Normal Form, or 1NF)
    - a format is the specification of the representation of a data value
- The collection of relation schemas used to represent the information in the database is the database schema
- A relation instance r of R (denoted r(R)) is the set of tuples that compose the relation at a given instant, i.e., the current values
- Example: the cardinality is |r(PART)| = 3 and the arity is |PART| = 4
- In general, |R| > 0 and |r(R)| ≥ 0

Keys
- Let R be a relation schema and K ⊆ R
- K is a superkey of R if it uniquely identifies every tuple in any r(R): there are no two distinct tuples t and t' such that t[K] = t'[K]
- K is a candidate key if K is a minimal superkey: there is no K' ⊂ K such that K' is also a superkey of R
- A primary key is one of the candidate keys; the remaining candidate keys are alternate keys
- Exercise: for CLASS(Course#, Prof, Sched, Room), identify the superkeys and candidate keys
- A key is a property of the relation schema, not of a particular relation instance

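A minimal sketch of the superkey and candidate-key definitions, applied to one instance of CLASS. The helper names (is_superkey, candidate_keys) and the sample rows are hypothetical. Because a key is a property of the schema, a check on a single instance can only rule a key out; it cannot prove that the key holds for every legal instance.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows of this instance agree on all attributes in attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

def candidate_keys(rows, schema):
    """Minimal attribute sets that act as superkeys of this particular instance."""
    found = []
    for size in range(1, len(schema) + 1):
        for attrs in combinations(schema, size):
            # A superset of an already-found key is a superkey but not minimal.
            if any(set(k) <= set(attrs) for k in found):
                continue
            if is_superkey(rows, attrs):
                found.append(attrs)
    return found

CLASS_SCHEMA = ("Course#", "Prof", "Sched", "Room")

# Hypothetical sample data, invented for this illustration.
class_rows = [
    {"Course#": "CS101", "Prof": "Lee", "Sched": "MWF9", "Room": "A1"},
    {"Course#": "CS102", "Prof": "Lee", "Sched": "MWF10", "Room": "A1"},
    {"Course#": "CS201", "Prof": "Kim", "Sched": "MWF9", "Room": "B2"},
]

keys = candidate_keys(class_rows, CLASS_SCHEMA)
print(keys)
```

On this sample, Prof alone is ruled out (two rows share a professor), while Course# survives as a candidate. Whether it really is a key depends on the intended meaning of the schema.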
Relational Database Schema
- A database schema is a set of relation schemas together with a set of integrity constraints
- Integrity constraints
  - structural
    - key constraints: uniqueness of keys
    - entity integrity constraint: no primary key value can be null
    - referential integrity constraint
  - semantic

Referential Integrity Constraints
- In the relational model, the only way an entity can reference another entity is through the value of the primary key of the second entity
- A foreign key (FK) is a set of one or more attributes of a relation R_1 that corresponds to the primary key (PK) of another relation R_2
- This means
  - the attributes in FK have the same domains as the primary key attributes of R_2
  - the value of FK in any tuple t_1 of r(R_1) is either null or matches the value of PK for some tuple t_2 in r(R_2), i.e., t_1[FK] = t_2[PK]
- Example: each employee must belong to some department

  EMP(SSN, EName, DNO)
  DEPT(DNO, DName, Mgr)
  EMP.DNO references DEPT.DNO

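The "null or matching" rule can be tried out with Python's standard sqlite3 module. This is a sketch under assumptions: SQLite stands in for a generic RDBMS, the column types and the sample rows are invented, and SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled

conn.execute("CREATE TABLE DEPT (DNO INTEGER PRIMARY KEY, DName TEXT, Mgr TEXT)")
conn.execute(
    "CREATE TABLE EMP ("
    " SSN TEXT PRIMARY KEY,"
    " EName TEXT,"
    " DNO INTEGER REFERENCES DEPT (DNO))"
)

# Hypothetical sample rows.
conn.execute("INSERT INTO DEPT VALUES (1, 'Research', NULL)")
conn.execute("INSERT INTO EMP VALUES ('111', 'Ann', 1)")     # matches DEPT.DNO = 1
conn.execute("INSERT INTO EMP VALUES ('222', 'Bob', NULL)")  # a null FK is allowed

try:
    conn.execute("INSERT INTO EMP VALUES ('333', 'Cy', 99)")  # no DEPT with DNO = 99
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

emp_count = conn.execute("SELECT COUNT(*) FROM EMP").fetchone()[0]
print(rejected, emp_count)
```

Declaring DNO as NOT NULL in EMP would additionally enforce the rule that each employee must belong to some department.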
Referential Integrity Constraints (continued)
- We say that the attributes FK of R_1 reference (or refer to) the relation R_2
- Referential integrity constraints can also be defined within a single relation, i.e., a tuple may refer to another tuple in the same relation

  EMP(SSN, EName, DNO, SUPERSSN)
  SUPERSSN refers to the SSN of another EMP tuple

Relational Query Languages
- Query languages allow manipulation and retrieval of data from a database
- The relational model supports simple, powerful query languages
  - strong formal foundation based on logic
  - allows for optimization
- Two mathematical languages form the basis for real languages (e.g., SQL) and for implementation
  - Relational algebra: more operational, useful for representing execution plans
  - Relational calculus: lets users describe what they want rather than how to compute it (non-operational, declarative)
- Basic operations:
  - selection, projection, cross-product, set-difference, union, intersection, join, division

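To make the operational flavor of relational algebra concrete, here is a minimal sketch of selection, projection, and cross-product over relations represented as lists of dictionaries. The function names, the representation, and the BIN relation are assumptions made for illustration; a real query processor works very differently.

```python
def select(rel, pred):
    """Selection: keep the tuples that satisfy the predicate."""
    return [t for t in rel if pred(t)]

def project(rel, attrs):
    """Projection: keep only the named attributes and drop duplicate tuples."""
    out = []
    for t in rel:
        row = {a: t[a] for a in attrs}
        if row not in out:
            out.append(row)
    return out

def product(r, s):
    """Cross-product: pair every tuple of r with every tuple of s.
    Assumes the attribute names of r and s are disjoint."""
    return [{**a, **b} for a in r for b in s]

PART = [
    {"Part#": "P1", "PName": "Nut", "Color": "Red", "Weight": 12},
    {"Part#": "P2", "PName": "Bolt", "Color": "Blue", "Weight": 17},
    {"Part#": "P3", "PName": "Screw", "Color": "Green", "Weight": 16},
]
BIN = [{"Bin": "B1"}, {"Bin": "B2"}]  # hypothetical second relation

# Names of parts heavier than 15: a projection applied to a selection.
heavy_names = project(select(PART, lambda t: t["Weight"] > 15), ["PName"])
pairs = product(PART, BIN)
print(heavy_names, len(pairs))
```

A join can then be expressed as a selection applied to a cross-product, which is one reason the algebra is convenient for describing execution plans.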
SQL
- SQL (Structured Query Language) is the query language of System R, developed at IBM San Jose [Astrahan, Gray, Lindsay, Selinger, ...]
- SQL is now the query language of IBM's DB2 and the de facto standard for most commercial RDBMSs
- SQL is a comprehensive language that provides statements for data definition, query, and update; hence it is both a DDL and a DML
- SQL supports the creation of views, and it can be embedded in a general-purpose programming language (C or Pascal)
- SQL has one basic statement for retrieving data from the database, the SELECT statement:
    SELECT <attribute list> FROM <table list> WHERE <condition>
- Standards:
  - SQL or SQL1 (ANSI 1986)
  - SQL2 or SQL-92 (ANSI 1992)
  - SQL3, underway: extends SQL with OO and other concepts

SQL Data Types
- Numeric
  - Integers of various ranges: INTEGER (or INT), SMALLINT
  - Real numbers of various precision: FLOAT, REAL, DOUBLE PRECISION
  - Formatted numbers: DECIMAL(i,j) or DEC(i,j) or NUMERIC(i,j)
- Character strings
  - Fixed length n: CHAR(n) or CHARACTER(n)
  - Variable length of maximum n: VARCHAR(n) or CHAR VARYING(n) (default n = 1)
- Bit strings
  - Fixed length n: BIT(n)
  - Varying length of maximum n: VARBIT(n) or BIT VARYING(n)

SQL Data Types (continued)
- Date and time [SQL2]
  - DATE (10 positions): YYYY-MM-DD
  - TIME (8 positions): HH:MM:SS
  - TIME(i) adds i decimal fractions of a second (8+1+i positions): HH:MM:SS.ddd...d
  - TIME WITH TIME ZONE includes the displacement from the standard universal time zone [+13:00 to -12:59] (6 additional positions): HH:MM:SS+/-HH:MM
  - TIMESTAMP: date and time, with 6 fractional digits of seconds and an optional time zone
  - INTERVAL: Year/Month or Day/Time

DDL
- DDL is used to define the schema of the database
  - to create a database schema
  - to create a domain
  - to create, drop, or alter a table
  - to create or remove an index [defunct in SQL2]
  - to create or drop a view
  - to define integrity constraints
  - to define access privileges for users (Oracle: CONNECT, RESOURCE, DBA)
  - to GRANT or REVOKE privileges ON/TO object/user
- SQL2 supports multiple schemas
    CREATE SCHEMA name AUTHORIZATION user;
    CREATE SCHEMA EMPLOYEE AUTHORIZATION atluri;

Create Domain
  CREATE DOMAIN name_dom AS VARCHAR(30);
  CREATE DOMAIN project_dom AS CHAR(20);
  CREATE DOMAIN dept_dom AS VARCHAR(20) DEFAULT 'none';
  CREATE DOMAIN city_dom CHAR(20) DEFAULT NULL;
  CREATE DOMAIN hour_dom FLOAT DEFAULT 0;
  CREATE DOMAIN gender_dom CHAR(1) CHECK (VALUE IN ('F', 'f', 'M', 'm'));

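The effect of a domain's DEFAULT and CHECK clauses can be approximated in SQLite, which has no CREATE DOMAIN statement. In the sketch below the rules of dept_dom and gender_dom are written directly as column constraints; the PERSON table and its rows are hypothetical, and this is an emulation rather than the SQL2 domain mechanism itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite has no CREATE DOMAIN, so the domain rules become column constraints.
conn.execute(
    "CREATE TABLE PERSON ("
    " Name VARCHAR(30),"
    " Dept VARCHAR(20) DEFAULT 'none',"                       # like dept_dom
    " Gender CHAR(1) CHECK (Gender IN ('F', 'f', 'M', 'm')))"  # like gender_dom
)

# Dept is omitted, so the DEFAULT value is stored.
conn.execute("INSERT INTO PERSON (Name, Gender) VALUES ('Ann', 'F')")
default_dept = conn.execute("SELECT Dept FROM PERSON").fetchone()[0]

try:
    conn.execute("INSERT INTO PERSON (Name, Gender) VALUES ('Bob', 'X')")
    check_enforced = False
except sqlite3.IntegrityError:  # raised when the CHECK constraint fails
    check_enforced = True

print(default_dept, check_enforced)
```

The advantage of a real domain is that the rule is declared once and reused by every column of that domain, instead of being repeated per column as here.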
SQL Schema
  EMP(EName, SSN, DNO, BirthPlace)
  DEPT(DName, DNO, MGRSSN)
  PROJECT(PName, PNO, PLocation, DNum)
  WORKSON(ESSN, PNO, Hours)

  CREATE SCHEMA COMPANY;
  CREATE TABLE EMP (
    EName      name_dom  NOT NULL,
    SSN        CHAR(9)   NOT NULL,
    DNO        INTEGER   NOT NULL,
    BirthPlace city_dom,
    PRIMARY KEY (SSN),
    FOREIGN KEY (DNO) REFERENCES DEPT (DNO)
  );

Constraints
- Constraints on attributes
  - NOT NULL constraint
  - DEFAULT value: specifies a default value (without a DEFAULT clause, the default value is NULL)
  - PRIMARY KEY (attribute-list)
  - UNIQUE (attribute-list): specifies an alternate key
  - FOREIGN KEY (key) REFERENCES table (key)
- Time of constraint enforcement
  - Immediate
  - Deferrable (until commit time)
- Actions taken if a referential integrity constraint is violated (referential triggered actions)
  - SET NULL
  - CASCADE (propagate the action)
  - SET DEFAULT
- Actions are qualified by the triggering condition, ON DELETE or ON UPDATE:
    FOREIGN KEY (DNO) REFERENCES DEPT (DNO)
      ON DELETE SET DEFAULT ON UPDATE CASCADE

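The referential triggered actions can be observed with a short sqlite3 sketch. SQLite is used here only as a convenient stand-in; the default department 0 and the sample rows are assumptions added so that SET DEFAULT has a valid target.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce FKs

conn.execute("CREATE TABLE DEPT (DNO INTEGER PRIMARY KEY, DName TEXT)")
conn.execute(
    "CREATE TABLE EMP ("
    " SSN CHAR(9) PRIMARY KEY,"
    " DNO INTEGER NOT NULL DEFAULT 0,"
    " FOREIGN KEY (DNO) REFERENCES DEPT (DNO)"
    "   ON DELETE SET DEFAULT ON UPDATE CASCADE)"
)

# Hypothetical data; department 0 exists so the default value is a valid reference.
conn.executemany(
    "INSERT INTO DEPT VALUES (?, ?)",
    [(0, "Unassigned"), (1, "Research"), (2, "Sales")],
)
conn.executemany("INSERT INTO EMP VALUES (?, ?)", [("111", 1), ("222", 2)])

conn.execute("UPDATE DEPT SET DNO = 10 WHERE DNO = 1")  # cascades to employee 111
conn.execute("DELETE FROM DEPT WHERE DNO = 2")          # employee 222 gets default 0

emp = dict(conn.execute("SELECT SSN, DNO FROM EMP"))
print(emp)
```

Without the ON DELETE and ON UPDATE clauses, both statements on DEPT would be rejected instead, because they would leave EMP tuples pointing at a nonexistent department.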
Naming of Constraints
- The keyword CONSTRAINT may be used to name a constraint
- A name is helpful when modifying or dropping the constraint later

  CREATE TABLE EMP (
    EName      name_dom  NOT NULL,
    SSN        CHAR(9)   NOT NULL,
    DNO        INTEGER   NOT NULL,
    BirthPlace city_dom,
    CONSTRAINT Emp_PK PRIMARY KEY (SSN),
    CONSTRAINT Emp_FK FOREIGN KEY (DNO) REFERENCES DEPT (DNO)
  );

System Catalog (Dictionary)
- The dictionary stores a set of tables that describe the database:
  - Base relations (tables)
    - possible attributes: table-name, creator, #of-tuples, tuple-length, #of-attributes, ...
  - Attributes of relations (columns)
    - possible attributes: table-name, attribute-name, format, order, key, ...
  - Indexes
    - possible attributes: table-name, index-name, key-attribute, ...
  - Authorization
  - Integrity
- In Oracle, the dictionary is stored in tablespaces (each consisting of one or more physical files): SYSTEM, USERS, TEMP, APPLICATIONS

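As a concrete, SQLite-specific illustration of a catalog, the sketch below reads the sqlite_master table and the table_info pragma. The layout of these structures is particular to SQLite and differs from the Oracle dictionary mentioned above; the index name Emp_DNO_idx is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMP ("
    " EName VARCHAR(30) NOT NULL,"
    " SSN CHAR(9) PRIMARY KEY,"
    " DNO INTEGER NOT NULL)"
)
conn.execute("CREATE INDEX Emp_DNO_idx ON EMP (DNO)")

# Base relations and indexes recorded in SQLite's catalog table.
# Internal entries (names starting with "sqlite_") are filtered out.
catalog = conn.execute(
    "SELECT type, name, tbl_name FROM sqlite_master"
    " WHERE name NOT LIKE 'sqlite_%' ORDER BY type DESC"
).fetchall()

# Attribute-level metadata: each row is (cid, name, type, notnull, dflt_value, pk).
columns = [
    (name, ctype, bool(pk))
    for _cid, name, ctype, _notnull, _dflt, pk in conn.execute("PRAGMA table_info(EMP)")
]

print(catalog)
print(columns)
```

The point carried over from the slide is that the catalog is itself queried like ordinary data.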
DROP Command
The DROP command can be used to remove
- a schema:
    DROP SCHEMA Company CASCADE;
    DROP SCHEMA Company RESTRICT;
  - the CASCADE option removes everything: tuples, tables, domains, ...
  - the RESTRICT option removes the schema only if it has no elements in it
- a table:
    DROP TABLE EMP CASCADE;
    DROP TABLE EMP RESTRICT;
  - the CASCADE option removes the table and all references to it
  - the RESTRICT option removes the table only if it is not referenced

ALTER Command
The ALTER command can be used to
- alter the domain of an attribute
    ALTER TABLE Student ALTER GPA NUMBER(4,2);
- set or drop the default value of an attribute
    ALTER TABLE Student ALTER GPA DROP DEFAULT;
    ALTER TABLE Student ALTER GPA SET DEFAULT 0.00;
- add a new attribute to a relation
    ALTER TABLE Student ADD Admission DATE;
- drop an attribute (not in SQL1)
    ALTER TABLE Student DROP GPA [CASCADE/RESTRICT];

The SELECT Statement
- The general form of a SELECT statement:
    SELECT   <attribute list>
    FROM     <table list>
    WHERE    <condition>
    GROUP BY <grouping attributes>
    HAVING   <group condition>
    ORDER BY <attribute list>

Relational Operators in SQL
- Projection:
    SELECT A, B FROM R
- Selection (F is the selection condition):
    SELECT * FROM R WHERE F
- Product of two tables, R × S:
    SELECT R.*, S.* FROM R, S

More Queries
- Query: list the names of all employees that work in CS
    SELECT Name FROM EMP WHERE Dept = 'CS'
- Renaming of attributes:
    SELECT Name AS CSName FROM EMP WHERE Dept = 'CS'
- Eliminating duplicates:
    SELECT DISTINCT BirthPlace FROM EMP
  (UNIQUE is no longer valid in SQL2)

Some More...
- Give the number of employees in the CS department
    SELECT COUNT(*) FROM EMP WHERE Dept = 'CS'
- Give the number of employees in each department
    SELECT Dept, COUNT(*) FROM EMP GROUP BY Dept
- Give the names of the departments that have more than 50 employees, and list the number of employees in those departments
    SELECT Dept, COUNT(*) FROM EMP GROUP BY Dept HAVING COUNT(*) > 50
- More SQL built-in functions: SUM, AVG, MAX, MIN
  (Exercise: list the names of the employees who make more than the average salary of all employees)

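The aggregate queries above can be run against a toy table with sqlite3. Everything about the data is an assumption: the Salary column, the five sample rows, and the HAVING threshold of 2 (instead of 50) so that the tiny table returns a result. The last query is one possible answer to the average-salary exercise, using a subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (Name TEXT, Dept TEXT, Salary INTEGER)")

# Hypothetical sample data.
conn.executemany(
    "INSERT INTO EMP VALUES (?, ?, ?)",
    [("Ann", "CS", 90), ("Bob", "CS", 60), ("Eve", "CS", 50),
     ("Cy", "EE", 70), ("Dee", "EE", 40)],
)

# Number of employees in the CS department.
cs_count = conn.execute(
    "SELECT COUNT(*) FROM EMP WHERE Dept = 'CS'"
).fetchone()[0]

# Number of employees in each department.
per_dept = conn.execute(
    "SELECT Dept, COUNT(*) FROM EMP GROUP BY Dept ORDER BY Dept"
).fetchall()

# Departments with more than 2 employees (threshold lowered for the toy data).
large_depts = conn.execute(
    "SELECT Dept, COUNT(*) FROM EMP GROUP BY Dept HAVING COUNT(*) > 2"
).fetchall()

# Employees who earn more than the average salary of all employees.
above_avg = conn.execute(
    "SELECT Name FROM EMP"
    " WHERE Salary > (SELECT AVG(Salary) FROM EMP)"
    " ORDER BY Name"
).fetchall()

print(cs_count, per_dept, large_depts, above_avg)
```

WHERE filters individual tuples before grouping, while HAVING filters whole groups after the aggregates are computed, which is why the count condition has to go in HAVING.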