© Ron Rogerson 1998-2010 Slide 1 Relational Databases Ron Rogerson


1 © Ron Rogerson Slide 1 Relational Databases Ron Rogerson

2 © Ron Rogerson Slide 2 Topics on this course What is a database? and what is it used for? What problems arise in managing one? file based systems data dependence What kind of system might resolve those problems? the three level architecture functions of a dbms firm development principles - development cycle and conceptual modelling and.. one which uses.. The relational approach - a theoretical architecture the relational model manipulation with relational algebra SQL - a practical implementation of relational theory Bringing it together - developing a relational d/b with SQL Other topics - introduced in Block 1 & returned to throughout course background to data management other kinds of information system developments in databases

3 © Ron Rogerson Slide 3 What is a database? Database: a collection of data stored in a computer system Data: a representation of information (or, information as interpreted data): Information can have >1 representation Data has no meaning in isolation (domain of discourse) Computers process data, not information Semantic properties of data User: a person whose information requirements are being supported Sharing data - differing requirements User Process application process: single purpose database tool: general purpose

4 © Ron Rogerson Slide 4 What problems arise in managing a database? File-based approach file organisation determines access ops. close association of program with file programs have to do it all (consistency, relationships, access control etc) data likely to be duplicated resistant to change data dependent Database approach DBMS, not programs, does access control, manages storage/retrieval explicit single database definition - the schema NOTE that, although we are still talking in general about a structured database approach, some modern systems such as OO and XML databases do exhibit characteristics of the file-based approach

5 © Ron Rogerson Slide 5 Data Retrieval in a File-Based System [Figure: fixed-width record template with fields Start, Sedol, Issuer, Issue Date, No of Coupons, Coupon 1, Coupon 2, Coupon 3, No of Guarantors, Guarantor 1, Guarantor 2, End] Different template for each file, 'hard coded' in programs All 'links' between files done by programs

6 © Ron Rogerson Slide 6 Actual systems architectures Client-Server two-tier: interface & application on client, d/b on server three-tier: interface, application and d/b separate Client-Multiserver multiple d/bs on separate servers each server providing specific data connection management required Distributed dbms multiple d/bs on separate servers ddbms provides location independence, security and integrity horizontal and/or vertical fragmentation two-phase commit controls updates replication reduces network traffic improves availability always reduces consistency somewhat updates can be pushed or pulled Mobile systems d/b fragments copied to intermittently disconnected devices updated when possible

7 © Ron Rogerson Slide 7 What kind of system might resolve these problems? - a system which provides the functions of a dbms Data Definition Constraint Definition and Enforcement Access Control Data Manipulation Restructuring and Reorganisation restructure: a change to the design, e.g. adding a column or a table (logical schema) reorganisation: a change within the design, e.g. to assimilate recently added records and optimise indices (storage schema). Transaction Support Concurrency Support Recovery

8 © Ron Rogerson Slide 8 What kind of system might resolve these problems? (2) a system which gives data independence Logical data independence change to logical schema has no impact on user processes Physical data independence change to storage schema has no impact on user processes ANSI/SPARC three-schema architecture external logical storage a system which provides interaction facilities Data manipulation language /query language (eg SQL) Host language; embedded statements Data definition language (eg SQL)

9 © Ron Rogerson Slide 9 What kind of system might resolve these problems? (3) The 3-schema architecture [Figure: layers from top to bottom: User Processes, External Schemas, Logical Schema, Storage Schema, Stored d/b]

10 © Ron Rogerson Slide 10 What kind of system might resolve these problems? (4) One developed using the database development lifecycle: Establishing data requirements Data analysis produces conceptual data model Database design produces logical schema (NB specific to db type) Implementation produces storage schema et al (NB specific to platform) Testing may lead to iteration back to any of the earlier stages One developed using a conceptual data model: a formal representation of a data requirement, independent of how it may be realised

11 © Ron Rogerson Slide 11 What kind of system might resolve these problems? Conceptual data models Entity-Relationship model entity types and entity occurrences attributes (identifier - special unique attribute) Relationships Degrees of relationships [Diagram: Bus (1) to Driver (n)] A Bus can have many Drivers A Driver drives not more than one bus Participation conditions [Diagram: Bus (optional) to Driver (mandatory)] A bus may have no driver A driver must be allocated to a bus

12 © Ron Rogerson Slide 12 What kind of system might resolve these problems? Conceptual data models (2) Entity types Bus(RegNo,NoOfSeats,Date1stReg) Driver(DriverNo,Surname,DateOfBirth) Weak and Strong entity types Constraint a statement of a necessary restriction that cannot be expressed elsewhere in the model (e.g. only drivers over 21 may drive buses with more than 8 seats) Assumption a statement of something that had to be assumed in order to complete the model and needs pointing out (e.g. only a driver’s current bus allocation is shown)

13 © Ron Rogerson Slide 13 What kind of system might resolve these problems? Conceptual data models (3) m:n relationships [Diagram: Bus m:n Driver] is this relationship possible? what additional information needs to be recorded about occurrences of the relationship? how might we record it? how does this change the Assumptions? Recursive relationships [Diagram: Driver related to itself] a driver supervises another driver

14 © Ron Rogerson Slide 14 The relational approach - a theoretical architecture Relation: can be pictured as a form of table, each row representing an occurrence. Terminology column = attribute row = tuple no. of attributes = degree no. of tuples = cardinality identifier = primary key Properties of Relations: all attributes have a value in every tuple all values are atomic all values of any attribute are same kind each attribute has name, unique within relation each tuple is unique ordering of attributes and tuples not significant. Domain: a named set of values, with a common meaning, from which 1 or more attributes draw their actual values NB values on different domains not comparable.

15 © Ron Rogerson Slide 15 The relational approach - a theoretical architecture (2) Candidate key: an attribute with the properties of uniqueness and minimality implies a semantic constraint is either a primary or alternate key Primary key: a candidate key chosen as the identifier entity integrity rule declared in the schema definition Alternate key: any other candidate key declared in the schema definition Qualified attribute names - dot notation; may be needed where same name occurs in >1 relation

16 © Ron Rogerson Slide 16 The relational approach - a theoretical architecture (3) Foreign key: an attribute (or combination of attributes) in a relation, whose values are the same as values of a candidate key (normally the primary key) of some (not necessarily distinct) relation. The only method of representing relationships; by default, represents an n:1 relationship; always at the n “end” of a 1:n (remember the “crow’s foot” pointer); relationship can be represented by 'posting' primary key of the '1' end into the other relation – 'posted attribute' method; must always have value same as that of some value of the key it references (referential integrity rule); semantic constraint

17 © Ron Rogerson Slide 17 The relational approach - a theoretical architecture (4) [Diagram: Bus 1:n Driver]
Why can't the foreign key go at the “1” end?
Bus (Driver is the foreign key)
RegNo   Make    Driver
ABC123  Scania  1,2
DEF456  Volvo   3
GHJ789  DAF
Driver
Id  Name
1   Brown
2   Smith
3   Bloggs
Representing a 1:n using posted key
Bus
RegNo   Make
ABC123  Scania
DEF456  Volvo
GHJ789  DAF
Driver (BusNo is the foreign key)
Id  Name    BusNo
1   Brown   ABC123
2   Smith   ABC123
3   Bloggs  DEF456
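
The posted-key representation above can be tried out with Python's built-in sqlite3 module. This is only a sketch under stated assumptions: the lower-case table and column names are my own, the extra driver 'Jones' and bus 'XYZ000' are hypothetical, and SQLite enforces foreign keys only when the pragma shown is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.executescript("""
CREATE TABLE bus (
    reg_no TEXT PRIMARY KEY,
    make   TEXT
);
CREATE TABLE driver (
    id     INTEGER PRIMARY KEY,
    name   TEXT,
    bus_no TEXT NOT NULL REFERENCES bus (reg_no)  -- posted key at the "n" end
);
INSERT INTO bus VALUES ('ABC123', 'Scania'), ('DEF456', 'Volvo'), ('GHJ789', 'DAF');
INSERT INTO driver VALUES (1, 'Brown', 'ABC123'), (2, 'Smith', 'ABC123'), (3, 'Bloggs', 'DEF456');
""")

# Hypothetical driver pointing at a bus that does not exist:
# the referential integrity rule should reject it.
try:
    conn.execute("INSERT INTO driver VALUES (4, 'Jones', 'XYZ000')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# One bus value per driver row, so a bus with many drivers simply appears many times.
drivers_per_bus = dict(conn.execute(
    "SELECT bus_no, COUNT(*) FROM driver GROUP BY bus_no"))
```

Each driver row holds exactly one atomic bus value, which is why the key is posted at the "n" end rather than storing a list such as "1,2" in the bus row.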

18 © Ron Rogerson Slide 18 The relational approach - a theoretical architecture (5) [Diagram: Bus 1:1 Driver]
Bus
RegNo   Make
ABC123  Scania
DEF456  Volvo
GHJ789  DAF
Driver
Id  Name    BusNo
1   Brown   ABC123
2   Smith   ABC123
3   Bloggs  DEF456
We can't have this duplication (ABC123 appears twice in BusNo)
so... Representing a 1:1 using posted key
Bus
RegNo   Make
ABC123  Scania
DEF456  Volvo
GHJ789  DAF
Driver
Id  Name    BusNo
1   Brown   ABC123
3   Bloggs  DEF456
BusNo: this becomes an alternate key as well as foreign key

19 © Ron Rogerson Slide 19 The relational approach - a theoretical architecture (6) Recursive relationships m:n relationships cannot be represented by 'posted attribute' why not? relationships which are optional at the 'n' end cannot be represented, either why not? Deletions from a referenced relation: Restricted effect cascade delete effect default effect

20 © Ron Rogerson Slide 20 The relational approach - a theoretical architecture (7) Representing m:n relationships [Diagram: Bus m:n Driver, relationship Allocated] Bus(RegNo,NoOfSeats,Date1stReg) Driver(DriverNo,Surname,DateOfBirth) [Diagram: Bus 1:n Allocation n:1 Driver] Allocation(RegNo,DriverNo,Date) m:n decomposed into new intersection relation + two 1:n relationships what new semantic constraint is imposed by the above choice of primary key? how could we relax it? NB: conceptual - decompose to add info. relational - decompose to represent at all
Rules for new intersection relations (when used to decompose an m:n)
              Participation conditions  Degrees
New entity    Both mandatory            Both "n"
Old entities  Same as before            Both "1"
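
A minimal sqlite3 sketch of the intersection relation follows. The transcript has lost the underlining that marked the primary key of Allocation, so the key (reg_no, driver_no) used here is an assumption, as are the lower-case names and all sample values other than the three driver/bus pairings, which reuse the data from the DIVIDE slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE bus    (reg_no TEXT PRIMARY KEY, no_of_seats INTEGER, date_1st_reg TEXT);
CREATE TABLE driver (driver_no INTEGER PRIMARY KEY, surname TEXT, date_of_birth TEXT);
CREATE TABLE allocation (
    reg_no     TEXT    NOT NULL REFERENCES bus (reg_no),        -- n:1 to bus
    driver_no  INTEGER NOT NULL REFERENCES driver (driver_no),  -- n:1 to driver
    alloc_date TEXT,
    PRIMARY KEY (reg_no, driver_no)  -- assumed choice of key
);
INSERT INTO bus VALUES ('N456CDE', 52, '1996-01-10'), ('R123ABC', 16, '1998-03-02');
INSERT INTO driver VALUES (100, 'Brown', '1960-05-01'), (101, 'Smith', '1975-09-12');
INSERT INTO allocation VALUES
    ('N456CDE', 100, '2009-01-05'),
    ('N456CDE', 101, '2009-01-06'),
    ('R123ABC', 101, '2009-01-07');
""")

# With this key a driver can be allocated to a given bus only once, ever.
try:
    conn.execute("INSERT INTO allocation VALUES ('N456CDE', 101, '2009-02-01')")
    second_allocation_rejected = False
except sqlite3.IntegrityError:
    second_allocation_rejected = True

allocation_count = conn.execute("SELECT COUNT(*) FROM allocation").fetchone()[0]
```

Under this assumed key, one way to relax the constraint would be to add the date to the primary key, so that the same pairing could recur on different dates.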

21 © Ron Rogerson Slide 21 The relational approach - a theoretical architecture (8) Representing optional relationships (relation for relationship method) mandatory relationships can use this method too - much more complex than posted attribute, but treats all relationships same way
[Diagram: A related to B by relationship AB] A(a) B(b, a) becomes.. [Diagram: A, AB, B] A(a) B(b) AB(b, a)
[Diagram: A related to B by relationship AB] A(a) B(b, a) (where B.a is an alternate key) becomes.. [Diagram: A, AB, B] A(a) B(b) AB(b, a) or AB(a, b) (note that the non-p.k. of AB must be declared alternate key)
NOTE that degrees of original relations “cross over” as they move to the new relation, and as with an m:n, the original relations become “1”

22 © Ron Rogerson Slide 22 The relational approach - a theoretical architecture Relational Algebra Operators act on whole relations (set operators) Have closure property (produce new relation) Relationally complete theoretical basis for manipulation Operators reflect structure of relations (and v.v.) SELECT select Allocation where RegNo = ‘R123ABC’ produces horizontal slicing of relation; i.e. picks tuples according to some value(s) of attribute(s) in this case, lists numbers of all drivers ever allocated to the bus with RegNo R123ABC

23 © Ron Rogerson Slide 23 The relational approach - a theoretical architecture Relational Algebra (2) PROJECT project Bus over NoOfSeats produces a “vertical slicing”; i.e., selects all the unique (combinations of) value(s) of chosen attribute(s) in this case, lists (once only in each case) the seating capacities of buses in the fleet Combining expressions alias: DriversUnder21 alias (select Driver where DateOfBirth > ) project DriversUnder21 over Surname nested: project (select Driver where DateOfBirth > ) over Surname

24 © Ron Rogerson Slide 24 The relational approach - a theoretical architecture Relational Algebra (3) JOIN (natural) join Driver and Allocation “pastes together” the tuples of the given relations which match on some attribute(s) with same name & domain – in this case, DriverNo in this case, produces a complete record of every driver allocation including, for each tuple, all the attributes from each table – but the joining column appears once only NOTE this means the new table will contain all the details of every driver once for every time s/he has been allocated to a bus A join is over a shared attribute and that normally means a foreign key A relation can be joined to itself (e.g. where there is a recursive relationship) (but must use aliases)
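
The SELECT, PROJECT and natural JOIN operators described above can be sketched in a few lines of Python. This is an illustration only, not the course's notation: relations are modelled as lists of dictionaries, the helper names are my own, and the sample rows (surnames and seat counts) are hypothetical.

```python
def relation(rows):
    """Drop duplicate tuples, since every tuple in a relation is unique."""
    seen, result = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            result.append(dict(row))
    return result

def select(rel, predicate):
    """Horizontal slice: keep the tuples that satisfy the predicate."""
    return relation(row for row in rel if predicate(row))

def project(rel, attrs):
    """Vertical slice: keep the chosen attributes, each combination once only."""
    return relation({a: row[a] for a in attrs} for row in rel)

def natural_join(left, right):
    """Paste together tuples that agree on every attribute name they share."""
    joined = []
    for left_row in left:
        for right_row in right:
            shared = left_row.keys() & right_row.keys()
            if all(left_row[a] == right_row[a] for a in shared):
                joined.append({**left_row, **right_row})
    return relation(joined)

bus = [{"RegNo": "N456CDE", "NoOfSeats": 52},
       {"RegNo": "R123ABC", "NoOfSeats": 16},
       {"RegNo": "T789XYZ", "NoOfSeats": 52}]
driver = [{"DriverNo": 100, "Surname": "Brown"},
          {"DriverNo": 101, "Surname": "Smith"}]
allocation = [{"RegNo": "N456CDE", "DriverNo": 100},
              {"RegNo": "N456CDE", "DriverNo": 101},
              {"RegNo": "R123ABC", "DriverNo": 101}]

r123_allocations = select(allocation, lambda row: row["RegNo"] == "R123ABC")
seat_counts = project(bus, ["NoOfSeats"])   # 52 is listed once only
history = natural_join(driver, allocation)  # Smith's details appear twice
```

Every helper returns a new relation, which is the closure property: the output of one operator can be fed straight into another, as in the nested form of the alias example.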

25 © Ron Rogerson Slide 25 The relational approach - a theoretical architecture Relational Algebra (4) DIVIDE
Allocations alias (project Allocation over DriverNo, RegNo)
DriverNo  RegNo
100       N456CDE
101       N456CDE
101       R123ABC
Buses alias (project Bus over RegNo)
RegNo
N456CDE
R123ABC
divide Allocations by Buses over RegNo
DriverNo
101
(produces list of nos. of drivers who’ve been allocated to all buses)
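
The DIVIDE example can be checked with a short Python sketch over sets. The function name and the pair representation are my own choices for illustration; the data is the slide's.

```python
# Allocations as (DriverNo, RegNo) pairs and Buses as a set of RegNo values.
allocations = {(100, "N456CDE"), (101, "N456CDE"), (101, "R123ABC")}
buses = {"N456CDE", "R123ABC"}

def divide(pairs, divisor):
    """Keep each left-hand value that is paired with every value in the divisor."""
    candidates = {left for left, _ in pairs}
    return {left for left in candidates
            if all((left, value) in pairs for value in divisor)}

drivers_on_all_buses = divide(allocations, buses)
```

Driver 100 is dropped because there is no (100, R123ABC) pair, leaving only driver 101.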

26 © Ron Rogerson Slide 26 The relational approach - a theoretical architecture Relational Algebra (5) UNION, INTERSECTION and DIFFERENCE require union-compatibility each relation involved is such (or can be changed such) that the ith attribute of each is on the same domain and has the same name UNION “adds” relations together YoungDrivers alias (select Driver where DateOfBirth > ) OldDrivers alias (select Driver where DateOfBirth < ) YoungDrivers union OldDrivers

27 © Ron Rogerson Slide 27 The relational approach - a theoretical architecture Relational Algebra (6) INTERSECTION picks tuples of 2 relations which occur in both NotUnder21 alias (select Driver where DateOfBirth < ) NotOver60 alias (select Driver where DateOfBirth > ) NotUnder21 intersection NotOver60 NOTE that, in this case, the whole operation is logically equivalent to an “and” OTHER OPERATORS theta-join Cartesian product outer join CONSTRAINTS USING R. A. constraint (project Bus over DriverNo) difference (project Driver over DriverNo) is empty

28 © Ron Rogerson Slide 28 Relational Algebra (7) [Diagram: Bus 1:n Driver] Using relational algebra to represent mandatory participation at the referenced end of a relationship
constraint ((project Bus over RegNo) difference (project Driver over BusNo)) is empty
Constraint violated:
Bus
RegNo   Make
ABC123  Scania
DEF456  Volvo
GHJ789  DAF     (This bus has no driver)
Driver (BusNo is the foreign key)
Id  Name    BusNo
1   Brown   ABC123
2   Smith   ABC123
3   Bloggs  DEF456
Constraint satisfied:
Bus
RegNo   Make
ABC123  Scania
DEF456  Volvo
Driver
Id  Name    BusNo
1   Brown   ABC123
2   Smith   ABC123
3   Bloggs  DEF456
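
The "difference is empty" constraint can be evaluated directly with Python sets. This is a sketch using the slide's data; the function name and the tuple layout are assumptions made for the illustration.

```python
buses = [("ABC123", "Scania"), ("DEF456", "Volvo"), ("GHJ789", "DAF")]
drivers = [(1, "Brown", "ABC123"), (2, "Smith", "ABC123"), (3, "Bloggs", "DEF456")]

def buses_without_driver(bus_rows, driver_rows):
    """(project Bus over RegNo) difference (project Driver over BusNo)."""
    reg_nos = {reg_no for reg_no, _make in bus_rows}
    bus_nos = {bus_no for _id, _name, bus_no in driver_rows}
    return reg_nos - bus_nos

# The constraint holds only when the difference is empty.
violations = buses_without_driver(buses, drivers)
holds_after_removal = not buses_without_driver(buses[:2], drivers)
```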

29 © Ron Rogerson Slide 29 The relational approach - a theoretical architecture Relational Algebra (7) Updating Insertion: Driver:= Driver union Deletion: Driver:= Driver difference Amendment: Driver:= Driver difference Driver:= Driver union

30 © Ron Rogerson Slide 30 The relational approach - a theoretical architecture Normalisation (1) Normalisation aims to remove redundancy avoids possible inconsistency avoids deletion/insertion anomalies reduces storage In a normalised relation (i.e., one which is in BCNF) every non-p.k. attribute is a fact about the p.k., the whole p.k. and nothing but the p.k. Single Valued Facts for every Girl there is (exactly) one Boy Functional Dependencies Girl -> Boy Girl “determines” Boy note the E-R equivalent: note that Girl -> Boy does not mean Boy -> Girl (we say an FD is "not reversible") but what would the E-R diagram look like if we did additionally know that Boy -> Girl ? Girl Boy

31 © Ron Rogerson Slide 31 The relational approach - a theoretical architecture Normalisation (2) Derived FDs by transitivity: Girl -> Boy Boy -> Boy’s_Mother hence, Girl -> Boy’s_Mother quick check: are there any FDs given, whose "left hand" is the same as the "right hand" of any other? by augmentation and transitivity: Programme,StartTimeDate -> Announcer TVChannel, StartTimeDate -> Programme we can augment the second FD to: TVChannel, StartTimeDate -> Programme, StartTimeDate therefore TVChannel,StartTimeDate -> Announcer quick check: are there >1 FDs with combined attributes on the LH? - if not, no augmentation can be done, but if so, do any of those have a RH which is part of the LH of another? if not, no augmentation can be done, but if so, can the RH of that first FD be augmented to make it the same as the LH of the other?
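
Derived FDs like those above can be found mechanically by computing an attribute closure. The slide does not name this algorithm, so treat it as a swapped-in technique: starting from a set of attributes, keep adding the right-hand side of any FD whose left-hand side is already covered. The function name and the FD encoding are assumptions.

```python
def closure(attributes, fds):
    """Return every attribute functionally determined by `attributes`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

tv_fds = [
    (("Programme", "StartTimeDate"), ("Announcer",)),
    (("TVChannel", "StartTimeDate"), ("Programme",)),
]
family_fds = [
    (("Girl",), ("Boy",)),
    (("Boy",), ("Boy's_Mother",)),
]

tv_closure = closure({"TVChannel", "StartTimeDate"}, tv_fds)
girl_closure = closure({"Girl"}, family_fds)
boy_closure = closure({"Boy"}, family_fds)
```

Announcer appears in the first closure, which matches the derived FD TVChannel,StartTimeDate -> Announcer, while Girl is absent from the closure of Boy, reflecting that an FD is not reversible.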

32 © Ron Rogerson Slide 32 The relational approach - a theoretical architecture Normalisation (3) First Normal Form (1NF) A relation is in 1NF iff every non-p.k. attribute is functionally dependent on (i.e., is a fact about) the p.k. Note that, anything which is a relation must at least be in 1NF Second Normal Form (2NF) A relation is in 2NF iff it is in 1NF and every non-p.k. attribute is fully functionally dependent on the p.k. (i.e., not on any subset of the p.k.) Note that we are only interested in the dependency (or lack of it) between a non-p.k. attribute and the p.k., not in any other dependencies among the non-p.k. attributes. Moving from 1NF to 2NF “Project out” any “offending” FDs into new relation(s) Determinant (l.h. side) of these FDs becomes p.k. of new relation “R.h. side(s)” of these FDs becomes non-p.k. attribute(s) of the new relation(s) and is/are removed from the “old” one But the determinant remains in the “old” relation so that we have non-loss decomposition “Projected-out” FDs which share the same determinant will go into a shared new relation Process must be “non-loss”, i.e. the original relation could be recreated by “joining” the new ones.

33 © Ron Rogerson Slide 33 The relational approach - a theoretical architecture Normalisation (4) Third Normal Form (3NF) A relation is in 3NF iff it is in 2NF and every non-p.k. attribute is non-transitively dependent on the p.k. (take great care with definitions in the course material) Note that a transitively-derived F.D. does not necessarily make the attribute transitively dependent i.e., where A -> B and B -> C, then A -> C is a transitively derived dependency; but C is not transitively dependent on A if either B -> A or C -> B Moving from 2NF to 3NF Process is just the same as 1NF to 2NF, except that we “project out” the FD which is the “right hand” part of the complete transitive FD, i.e. the “B -> C” part Boyce-Codd Normal Form (BCNF) A relation is in BCNF iff it is in 3NF and every determinant is a candidate key Note that, unlike 2NF and 3NF, we are interested in all FDs in the relation, not just those involving the p.k.

34 © Ron Rogerson Slide 34 The relational approach - a theoretical architecture Normalisation (5) Moving from 3NF to BCNF Process is just the same as previous stages If the determinant of a F.D. in the relation is the p.k., then that’s fine If it’s not the p.k., then unless it’s an alternate key, the F.D. is an “offending” one It can only be an alternate key if it has a 1:1 relationship with the p.k.; we will only know this if it has a “reversible” F.D. with the p.k., i.e. A -> B and B -> A

35 © Ron Rogerson Slide 35 Relational model, SQL and relational implementation compared
Role. Relational model: Theoretical specification of what is to be done. SQL: Specifies how relational model is to be implemented - could be many but in practice only SQL. Relational implementation: Many implementations of SQL exist; often they cover only a subset of the standard, and may cover a superset.
Architecture. Relational model: 3 level architecture. SQL: SQL schema. Relational implementation: D/b schema.
Languages. Relational model: Manipulation languages: the relational algebra and relational calculus. SQL: SQL (does not include a storage DDL). Relational implementation: Implementation of some version of SQL plus further command set.

36 © Ron Rogerson Slide 36 Architectures - Theoretical and Actual [Figure, theoretical: User Processes, External Schemas, Logical Schema, Storage Schema, Stored d/b] [Figure, actual: User Processes, View, Base table, Database Schema, Stored d/b]

37 © Ron Rogerson Slide 37 SQL - a practical implementation of relational theory 1970s IBM developments - Ted Codd - SEQUEL (Structured English Query Language) Many variants developed 1987 First ANSI standard - SQL:1987 ("SQL1" - also known as ISO9075, BS6964:1988) includes a DDL and DML lacked many features of the model 1989 SQL:1989 added p.k. and f.k. constraints SQL:1992 ("SQL2") defines many features beyond SQL:1989 most implementations support it many also offer "superset" functions which may add features but reduce portability 1999 SQL:1999 ("SQL3") includes aspects of OO technology not covered in this course NOTES: SQL does not define a storage DDL, nor some other management functions; an implementation may do these any way it chooses SQL databases consist of tables and columns

38 © Ron Rogerson Slide 38 SQL - a practical implementation of relational theory (2) SELECT the query statement operates on tables all query statements produce a table logical processing model helps to explain how it works SELECT * FROM country “FROM" clause produces an intermediate table, which is a full copy of the country table "SELECT *" produces a final table giving the result - which in this case is also the full table SELECT births, population FROM country "FROM" clause produces the intermediate table "SELECT" 'slices' this vertically into just the 2 columns required, which form the final table. SELECT DISTINCT gdp FROM EUROBOND as above, but 'slices' vertically to produce only one occurrence of each value compare with 'PROJECT' in the Relational Algebra

39 © Ron Rogerson Slide 39 SQL - a practical implementation of relational theory (3) VALUE EXPRESSIONS (manipulating stored values) COLUMN: SELECT name, (births/population)*1000 AS birth_rate_per_thousand FROM country (NB can use + - * / (number), || (string). Columns must be suitable format) SET: SELECT AVG(cars) FROM country (NB can use AVG, DISTINCT, COUNT(*), SUM, MAX, MIN. Columns must be suitable format) STRING FUNCTIONS: SELECT name, SUBSTR(name,1,3) FROM country (NB also LENGTH, CAST, SUBSTRING, etc)

40 © Ron Rogerson Slide 40 SQL - a practical implementation of relational theory (4) The WHERE clause (or, specifying a row search condition) SELECT staff_no FROM staff WHERE name = 'Jennings' Logical processing model for this query: FROM clause copies whole of STAFF into intermediate table WHERE clause slices just the rows which meet it into a 2nd intermediate table SELECT takes STAFF_NO into final table SELECT name, ((births-deaths)/population)*100 AS growth_rate FROM country WHERE ((births-deaths)/population)*100 > 0.5 (NB can use =, <, >, <=, >=, <>) can use AND OR NOT (care needed with brackets)
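
The growth-rate query can be run against a throwaway SQLite database. This is a sketch with hypothetical data: the two countries and their figures are invented, and the columns are declared REAL because SQLite would otherwise perform integer division on whole-number values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country (name TEXT PRIMARY KEY, births REAL, deaths REAL, population REAL);
INSERT INTO country VALUES
    ('Ruritania', 30.0, 10.0, 2000.0),  -- growth rate 1.0
    ('Borduria',  12.0, 11.0, 1000.0);  -- growth rate 0.1
""")

# Logical processing model: FROM copies the table, WHERE keeps qualifying rows,
# and only then does SELECT compute the final columns. That is why the value
# expression is repeated in WHERE instead of referring to the alias.
rows = conn.execute("""
    SELECT name, ((births - deaths) / population) * 100 AS growth_rate
    FROM country
    WHERE ((births - deaths) / population) * 100 > 0.5
""").fetchall()
growing = [name for name, _rate in rows]
```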

41 © Ron Rogerson Slide 41 SQL - a practical implementation of relational theory (5) Other Operators SELECT... WHERE quantity BETWEEN 5000 AND 6000 (inclusive) WHERE name IN ('Berlin', 'Bonn',...,) WHERE classification LIKE '_h%s' WHERE cars IS NULL Joins using FROM SELECT s_country.name, capital, population FROM s_country, s_city WHERE capital = s_city.name (cf. the relational algebra JOIN) (NB what kind of table does this FROM produce, in the logical processing model?) Aliases SELECT p.staff_no, p.name, q.staff_no FROM staff p, staff q WHERE p.name = q.name AND p.staff_no < q.staff_no (NB why ?)
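
The alias query joins staff to itself to find people who share a name. A small sqlite3 sketch, with hypothetical staff numbers and names, shows what it returns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staff (staff_no TEXT PRIMARY KEY, name TEXT);
INSERT INTO staff VALUES ('3158', 'Jennings'), ('5212', 'Jennings'), ('8431', 'Heathcote');
""")

# p and q are two aliases for the same table, so each row can be compared
# with every other row.
same_name_pairs = conn.execute("""
    SELECT p.staff_no, p.name, q.staff_no
    FROM staff p, staff q
    WHERE p.name = q.name AND p.staff_no < q.staff_no
""").fetchall()

# For contrast: without the staff_no comparison every row also matches itself
# and each genuine pair is reported in both orders.
unfiltered_count = conn.execute("""
    SELECT COUNT(*) FROM staff p, staff q WHERE p.name = q.name
""").fetchone()[0]
```

With this data the filtered query yields a single pair, whereas the unfiltered join yields five rows: three self-matches plus the Jennings pair in both orders.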

42 © Ron Rogerson Slide 42 SQL - a practical implementation of relational theory (6) Outer joins SELECT student.student_id, name, phone_no FROM student LEFT OUTER JOIN telephone ON student.student_id = telephone.student_id (NB can use RIGHT OUTER, FULL OUTER) Natural joins SELECT student.student_id, name, phone_no FROM student NATURAL JOIN telephone GROUP BY SELECT product, COUNT(country), SUM(quantity) FROM production GROUP BY product HAVING (is to groups what WHERE is to rows) SELECT product, COUNT(country), SUM(quantity) FROM production GROUP BY product HAVING SUM(quantity) > 15000
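
Both the outer join and the GROUP BY ... HAVING query can be exercised with sqlite3. All rows below are hypothetical, and the ORDER BY clause is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student   (student_id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE telephone (student_id TEXT, phone_no TEXT);
INSERT INTO student   VALUES ('s01', 'Ann'), ('s02', 'Bob');
INSERT INTO telephone VALUES ('s01', '555-0100');

CREATE TABLE production (product TEXT, country TEXT, quantity INTEGER);
INSERT INTO production VALUES
    ('Oats', 'Spain', 9000), ('Oats', 'Ireland', 8000), ('Rye', 'Spain', 4000);
""")

# LEFT OUTER JOIN keeps students with no telephone row, padding with NULL.
phone_list = conn.execute("""
    SELECT student.student_id, name, phone_no
    FROM student LEFT OUTER JOIN telephone
         ON student.student_id = telephone.student_id
    ORDER BY student.student_id
""").fetchall()

# HAVING filters whole groups after GROUP BY has formed them.
big_products = conn.execute("""
    SELECT product, COUNT(country), SUM(quantity)
    FROM production
    GROUP BY product
    HAVING SUM(quantity) > 15000
""").fetchall()
```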

43 © Ron Rogerson Slide 43 SQL - a practical implementation of relational theory (7) ORDER BY SELECT PRICEDATE, PRICE FROM LUXPRICE WHERE LUXCODE = AND CURRENCY = 'US' ORDER BY PRICEDATE DESC (NB ASC is the default) QUERY SEQUENCE Statement Order: SELECT... FROM... (WHERE...) (GROUP BY...) (HAVING...) (ORDER BY...) Logical Processing Model: FROM... (WHERE...) (GROUP BY...) (HAVING...) SELECT... (ORDER BY...)

44 © Ron Rogerson Slide 44 SQL - a practical implementation of relational theory (8) COMPOSITE QUERIES Note that SQL has no equivalent of joining queries using 'alias' in the Relational Algebra. Most complex queries can be handled by the following methods. UNION SELECT country, yr, population FROM population WHERE country IN ('Spain', 'Ireland') UNION SELECT name, 1990, population FROM country WHERE name IN ('Spain', 'Ireland') SUBQUERIES ('nested' queries) Generally, a subquery is a query the result of which will be a single column. It is enclosed in brackets so that it can become part of the predicate of another query.

45 © Ron Rogerson Slide 45 SQL - a practical implementation of relational theory (9) Subqueries (cont’d) Where its output will be more than 1 row, it must be used with the quantifiers ALL or ANY, (or the comparison operator IN): SELECT name FROM student WHERE registered <= ALL (SELECT DISTINCT registered FROM student) Where its output will be exactly one row, it can be used thus: SELECT country FROM production WHERE product = 'Oats' AND quantity < (SELECT AVG (quantity) FROM production WHERE product = 'Oats') Joins v. subqueries JOIN if output needs data from both tables SUBQUERY if comparison with aggregate function on 2nd table else can use either

46 © Ron Rogerson Slide 46 SQL - a practical implementation of relational theory (10) SUBQUERIES (cont'd) In the logical processing model, a normal subquery is processed first. Correlated Subqueries a subquery that refers to the value of a column in the “current row” of the outer query (an “outer reference”). SELECT country, yr FROM population p WHERE population > (SELECT 0.2*SUM(q.population) FROM population q WHERE q.yr = p.yr) a Correlated Subquery is processed once completely for every row of the “outer” query
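
The correlated subquery above can be traced on a small hypothetical data set with sqlite3. The country names 'A', 'B', 'C' and the figures are invented, and the ORDER BY clause is an addition to make the result order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE population (country TEXT, yr INTEGER, population REAL);
INSERT INTO population VALUES
    ('A', 1990, 60), ('B', 1990, 30), ('C', 1990, 10),  -- 20% of the 1990 total is 20
    ('A', 2000, 70), ('B', 2000, 5),  ('C', 2000, 5);   -- 20% of the 2000 total is 16
""")

# The inner query refers to p.yr, the year of the "current row" of the outer
# query, so logically it is re-evaluated for every outer row.
large_shares = conn.execute("""
    SELECT country, yr
    FROM population p
    WHERE population > (SELECT 0.2 * SUM(q.population)
                        FROM population q
                        WHERE q.yr = p.yr)
    ORDER BY yr, country
""").fetchall()
```

Each row is compared with 20% of the total for its own year, so B qualifies in 1990 but not in 2000.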

47 © Ron Rogerson Slide 47 SQL - a practical implementation of relational theory (11) DATA DEFINITION CREATE TABLE small_country (name CHAR(16), gdp DECIMAL(4,1), cars INTEGER, population DECIMAL(6,1), PRIMARY KEY (name)) cars INTEGER NOT NULL DEFAULT 0 ALTER TABLE small_country ADD area INTEGER ALTER TABLE small_country DELETE population ALTER TABLE small_country MODIFY cars DEFAULT 0 ALTER TABLE small_country ALTER cars DROP DEFAULT DROP TABLE small_country

48 © Ron Rogerson Slide 48 SQL - a practical implementation of relational theory (12) Constraints PRIMARY KEY NOT NULL UNIQUE: population DECIMAL(6,1) UNIQUE, or ALTER TABLE small_country ADD UNIQUE population REFERENTIAL: counsellor_no CHAR(4) NOT NULL REFERENCES staff {staff_no} {ON DELETE (RESTRICT or SET DEFAULT or CASCADE)}, or FOREIGN KEY (counsellor_no) REFERENCES staff {staff_no} {ON DELETE.. } CHECK: registered SMALLINT CHECK (registered BETWEEN 1988 AND 2010) CHECK (region = (SELECT region FROM staff WHERE counsellor_no = staff_no)) DOMAIN: CREATE DOMAIN credit_points AS SMALLINT NOT NULL DEFAULT 60 CHECK (VALUE IN (30, 60))
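
A column CHECK constraint of the kind shown above can be tried with sqlite3. This is a sketch only: the student table layout and the sample rows are assumptions. SQLite itself has no CREATE DOMAIN and does not accept a subquery inside CHECK, so only the simple range check is illustrated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        student_id TEXT PRIMARY KEY,
        registered SMALLINT CHECK (registered BETWEEN 1988 AND 2010)
    )
""")

conn.execute("INSERT INTO student VALUES ('s01', 1995)")  # inside the range

# A value outside the range is refused by the DBMS, not by application code.
try:
    conn.execute("INSERT INTO student VALUES ('s02', 1987)")
    out_of_range_rejected = False
except sqlite3.IntegrityError:
    out_of_range_rejected = True

stored_years = [year for (year,) in conn.execute("SELECT registered FROM student")]
```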

49 © Ron Rogerson Slide 49 SQL - a practical implementation of relational theory (13) VIEWS CREATE VIEW counselling2(s_name, s_no, region, c_name, c_no) AS SELECT s.name, student_id, s.region, c.name, counsellor_no FROM student s, staff c WHERE counsellor_no=staff_no DROP VIEW counselling2 UPDATING DELETE DELETE FROM small_country WHERE name = 'Yugoslavia' INSERT INSERT INTO small_country {column_names} VALUES ('Slovenia', NULL, 157, ) (can specify columns to be filled as an alternative to putting NULLs in the VALUES clause)

50 © Ron Rogerson Slide 50 SQL - a practical implementation of relational theory (14) UPDATING (cont’d) UPDATE UPDATE small_country SET gdp = 17.3 WHERE name = 'Slovenia' INSERT INTO dba.staff VALUES ('8086', 'Pratchett', 1) UPDATING VIEWS Can be updated, if, in the definition: SELECT includes only column names (no value expressions) and no DISTINCT operator; FROM only references one table; WHERE does not include a subquery; no GROUP BY and no HAVING;

51 © Ron Rogerson Slide 51 SQL - a practical implementation of relational theory (15) Access Control GRANT SELECT {DELETE, INSERT, UPDATE, REFERENCES} ON staff TO admin, faculty GRANT ALL PRIVILEGES ON mod_staff TO admin {with grant option} GRANT UPDATE (name) ON mod_staff TO faculty Restructuring: planning Main principle: ensure data is not lost e.g., CREATE temp. table with old structure INSERT.. SELECT old data into it DROP old table CREATE table, new structure, old name INSERT.. SELECT old data into it DROP temporary table or ALTER table to add replacement column UPDATE table to copy data from old column to new DROP old column
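
The first restructuring plan (copy out, drop, recreate, copy back) can be sketched with sqlite3. The temporary table name, the added area column and the Malta row are hypothetical; the point is only that every old row survives the change of structure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE small_country (name CHAR(16) PRIMARY KEY, gdp DECIMAL(4,1), cars INTEGER);
INSERT INTO small_country VALUES ('Slovenia', 17.3, 157), ('Malta', 9.1, 98);

-- 1. CREATE temp. table with old structure and copy the old data into it
CREATE TABLE temp_country (name CHAR(16) PRIMARY KEY, gdp DECIMAL(4,1), cars INTEGER);
INSERT INTO temp_country SELECT name, gdp, cars FROM small_country;

-- 2. DROP old table, then CREATE table with new structure and old name
DROP TABLE small_country;
CREATE TABLE small_country (
    name CHAR(16) PRIMARY KEY, gdp DECIMAL(4,1), cars INTEGER, area INTEGER
);

-- 3. Copy the old data back (area is left NULL), then DROP temporary table
INSERT INTO small_country (name, gdp, cars) SELECT name, gdp, cars FROM temp_country;
DROP TABLE temp_country;
""")

columns = [col[0] for col in conn.execute("SELECT * FROM small_country").description]
rows = conn.execute(
    "SELECT name, cars, area FROM small_country ORDER BY name").fetchall()
```

In a real system these steps would normally be wrapped in a transaction or preceded by a backup, since the data exists only in the temporary table between the DROP and the final INSERT.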

52 © Ron Rogerson Slide 52 Bringing it together - developing a relational d/b with SQL Steps: establishing requirements data analysis database design implementation Desirable properties of a model completeness integrity flexibility efficiency usability Modelling constructs entity types relationships attributes identifiers complex values (separate entities/multiple attributes) entities or attributes ? derived data entity subtypes

53 © Ron Rogerson Slide 53 Bringing it together - developing a relational d/b with SQL (2) Constraints inclusive exclusive Developing a conceptual data model establishing requirements possible ambiguities the model: “formal representation of what a d/b should contain, independent of how it should be realised” should: represent all users’ requirements; have no duplication; include all constraints; be general; be understandable. Data analysis to produce it establish scope of model text analysis: list nouns as potential entity types discard those which: are outside scope occur only once are synonyms are attributes relate to implementation details list verbs as potential relationships re-scan to find constraints

54 © Ron Rogerson Slide 54 Bringing it together - developing a relational d/b with SQL (3) Data analysis (cont'd) Document analysis assuming a document represents an entity: what does each occurrence represent? what are properties / facts about entity type? for each property, is it single- or multi-valued? optional or mandatory? derivable? temporal? Produce initial E-R model add participation, constraints & assumptions eliminate redundancy, resolve m:n, examine complex data, remove derived data, consider subtypes check: 'read' model to try and reconstitute the requirement check requirement to see if data & relationships are correctly represented

55 © Ron Rogerson Slide 55 Bringing it together - developing a relational d/b with SQL (4) Database design The final system may not be a direct implementation of the model! Choices in directly representing model: posted key or relation-for-relationship alternative constraint methods representing complex data representing entity sub-types in implementation: defining columns Numeric data types - range/precision Character data types - length Operations required - restricted by number and type of columns chosen Date/time data types Not null constraints Default values - essential with a “not null” constraint - meaningful and distinguishable defining keys surrogate primary keys foreign keys - “on delete” action omit or relax constraints? de-normalise?

56 © Ron Rogerson Slide 56 Distributed Data Client / multi-server designed to store data where it is mostly used, but permit remote access multiple independent databases user process must explicitly navigate them (connection management) data may be divided by function, or users, etc. transaction management done in “two-phase” commit Distributed databases meant to store data where it is mostly used and permit remote access or to provide resilience appears to user processes as a single d/b (location independent) distribution schema identifies physical locations of data in the logical schema data may be fragmented (horizontally or vertically) according to usage, or replicated for resilience optimisation of queries requires knowledge of where the data items (or nearest copy of them) are available transaction support (and consistency of multiple copies) an added problem but handled by dbms

57 © Ron Rogerson Slide 57 Distributed Data (2) Replication systems Designed to avoid remote accesses by storing multiple copies locally Improves response and availability but produces consistency issues May not aim for real-time consistency Consolidation approach - primary sources for different items are in different places, collect these fragments to produce global view Dissemination approach - start with single primary copy and distribute copies, but may allow real-time update of central and local copies together

58 © Ron Rogerson Slide 58 Data Warehouses (NB: do Data Warehouses involve any new kind of technology, in the same way as distributed d/bs, or row clustering?) Decision support systems cf. “management information” systems typically ask non-“right now” questions may require data from diverse systems, or discarded in normal operations (or not otherwise captured?) Characteristics of data warehouse Subject-oriented Non-volatile integrated time variant Dimensional analysis aims to determine subject area of interest, and important dimensions of analysis. “star schema” element of guesswork in fixing dimensions (is it a data-centred or application-centred system?)
[Diagram: star schema; labels: sales, member, time, area, wine]

59 © Ron Rogerson Slide 59 Data warehouses (2) Building a data warehouse extraction component produces warehouse data from existing systems first define how to identify the “fact” in question may convert from stored data, or extract as it is added (e.g. by a trigger) integration component format integration semantic integration the database fact table is centre of star has n:1 relationships with dimension tables Aggregates fact table may become enormous so queries need huge processing power in practice, queries tend to want summary or aggregate data create aggregate tables at various levels can query one level and drill down or drill up as necessary to follow up trends discovered aggregate navigator may help to take advantage of the various levels
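A fact table with n:1 links to its dimension tables, plus one precomputed aggregate level, might look like the following minimal sketch. It again uses Python's `sqlite3` module as a stand-in; the table names borrow the `sales`, `time` and `wine` labels from the star-schema diagram, while the column names, the `_dim` and `_fact` suffixes and all of the data are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (column names are illustrative assumptions)
CREATE TABLE time_dim (time_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE wine_dim (wine_id INTEGER PRIMARY KEY, name TEXT);

-- Fact table at the centre of the star: n:1 to each dimension
CREATE TABLE sales_fact (
    time_id INTEGER REFERENCES time_dim(time_id),
    wine_id INTEGER REFERENCES wine_dim(wine_id),
    bottles INTEGER
);

INSERT INTO time_dim VALUES (1, 2009, 1), (2, 2009, 2), (3, 2010, 1);
INSERT INTO wine_dim VALUES (1, 'Rioja'), (2, 'Chablis');
INSERT INTO sales_fact VALUES (1, 1, 10), (1, 2, 5), (2, 1, 7), (3, 2, 4);

-- Aggregate table: one row per year, computed once from the fact table
CREATE TABLE sales_by_year AS
SELECT t.year AS year, SUM(f.bottles) AS bottles
FROM sales_fact f JOIN time_dim t ON t.time_id = f.time_id
GROUP BY t.year;
""")

# Query the coarse level first ...
yearly = conn.execute(
    "SELECT year, bottles FROM sales_by_year ORDER BY year"
).fetchall()

# ... then drill down into one year at the finer, monthly level.
monthly_2009 = conn.execute("""
    SELECT t.month, SUM(f.bottles)
    FROM sales_fact f JOIN time_dim t ON t.time_id = f.time_id
    WHERE t.year = 2009
    GROUP BY t.month
    ORDER BY t.month
""").fetchall()
```

In this toy case the drill-down goes back to the fact table; a warehouse with several aggregate levels would instead pick the smallest table that can answer the question, which is the job the slide gives to the aggregate navigator.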

60 © Ron Rogerson Slide 60 XML and databases XML – a formal markup language for documents principally for layout & presentation specific XML definitions can be made for specific applications XML and relational data
Relational | XML
Atomic values in table structure with unique names | Nested elements in tree structure with named root element
Columns have unique names, ordering not significant, values all same type | Elements have unique names, can contain data or other elements, schema can determine type
Rows are distinct, ordering not significant | Elements distinguished by location, specified as a path
Access to data by table operations, no concept of location | Access to data by location in tree
Relations are logical structures, no direct storage implications | XML is logical structure with specified storage representation

61 © Ron Rogerson Slide 61 XML and databases (2) Transforming relational data into XML export data by representing table/row structure by XML tags, or use SQL/XML query to create XML document for specific application Storing XML data in relational d/b “shred” document by reducing elements to simple values for table structure – XMLTABLE function, or store entire XML as CHAR data value Querying XML values in r/d/b use XMLTABLE function in SQL query to return table-like values, or use XMLQUERY function to return XML values
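The idea of "shredding" a document into simple column values, and of exporting rows back out as tagged elements, can be sketched outside the DBMS. The example below uses Python's `xml.etree.ElementTree` and `sqlite3` modules; it is a stand-in for the approach and does not use the SQL/XML functions XMLTABLE or XMLQUERY named on the slide. The document layout and element names are assumptions made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML document; the element and attribute names are illustrative.
document = """
<students>
  <student id="1"><name>Ann</name><region>North</region></student>
  <student id="2"><name>Bob</name><region>South</region></student>
</students>
""".strip()

# Shred: reduce each <student> element to a tuple of simple values.
root = ET.fromstring(document)
shredded = [
    (int(student.get("id")), student.findtext("name"), student.findtext("region"))
    for student in root.findall("student")
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, region TEXT)"
)
conn.executemany("INSERT INTO student VALUES (?, ?, ?)", shredded)

# Export: represent the table/row structure with XML tags again.
export = ET.Element("students")
query = "SELECT student_id, name, region FROM student ORDER BY student_id"
for student_id, name, region in conn.execute(query):
    element = ET.SubElement(export, "student", id=str(student_id))
    ET.SubElement(element, "name").text = name
    ET.SubElement(element, "region").text = region
exported = ET.tostring(export, encoding="unicode")
```

The alternative on the slide, storing the entire document as one character value, would amount to inserting `document` into a single text column and leaving the parsing to whoever reads it back.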

62 © Ron Rogerson Slide 62 Application Development Embedded SQL Direct (non-cursor) statements only where at most one row will be transferred
EXEC SQL SELECT name, registered, region INTO :StudentName, :YearRegistered, :Region FROM student WHERE student_id = :SelectId;
Use also with INSERT, DELETE, etc. Cursor statements
EXEC SQL DECLARE regional_student CURSOR FOR {SQL query specification};
EXEC SQL OPEN regional_student;
EXEC SQL FETCH regional_student INTO :StudentId, :StudentName, :YearRegistered, :Region;
EXEC SQL CLOSE regional_student;
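The same two patterns, a single-row fetch and an open/fetch/close loop, can be tried with a scripting-language interface. The sketch below uses Python's DB-API through the standard `sqlite3` module rather than embedded SQL, so host variables become query parameters and Python variables. The sample rows and the `WHERE region = ?` query standing in for `{SQL query specification}` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    "student_id INTEGER PRIMARY KEY, name TEXT, registered INTEGER, region TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [(1, "Ann", 2008, "North"), (2, "Bob", 2009, "South"), (3, "Cy", 2010, "North")],
)

# Direct (non-cursor) style: the primary key guarantees at most one row.
select_id = 2
single = conn.execute(
    "SELECT name, registered, region FROM student WHERE student_id = ?",
    (select_id,),
).fetchone()

# Cursor style: open (execute), fetch one row at a time, then close.
cursor = conn.cursor()
cursor.execute(
    "SELECT student_id, name, registered, region "
    "FROM student WHERE region = ? ORDER BY student_id",
    ("North",),
)
regional_students = []
while True:
    row = cursor.fetchone()
    if row is None:  # no more rows, the end-of-data condition
        break
    regional_students.append(row)
cursor.close()
```

The explicit `while` loop is written out to mirror the FETCH statement; in everyday Python the cursor would more often be iterated directly with a `for` loop.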

63 © Ron Rogerson Slide 63 Application Development (2) ODBC Provides interface between applications and rDBMS Applications can devise own methods of handling returned data Applications can be DBMS-independent Can handle dynamic SQL, enabling separate front-end tools to access other vendors' DBMS Connection to DBMS provided by DBMS-specific ODBC driver JDBC Provides ODBC-like interface between Java applications and rDBMS Provides its own automatic cursor-like method of handling multiple rows Connection to DBMS provided by DBMS-specific JDBC driver SQLJ Embedded SQL for Java programs Iterator provides cursor-like functionality

64 © Ron Rogerson Slide 64 Application Development (3) D/b routines using Java dbms implementation includes Java virtual machine internal 'SQL' routine can use Java directly Object-relational mapping mapping tools require definition to map each d/b to an application d/b can then be accessed from Java program without knowledge of SQL or d/b structure Scripting languages e.g. Python, Perl interpreted, easily changed often used to facilitate browser access to a d/b requires DBMS-specific DB-API language has to provide own cursor-like functionality

