Objectives of the Lecture :

1 Objectives of the Lecture :
Data Independence
To consider the problems solved by a DBMS;
To consider how a DBMS uses Data Independence to solve the problems;
To consider the nature of Logical & Physical Data Independence;
To consider View relations.

2 What Problem Does a DBMS Solve ?
[Figure: a caricature of 3 applications, each with its own programs and data files. Left: programs P1, P2, P3, with one file holding A, B, C and another holding D, E, F, G, H, J. Centre: programs Q1, Q2, Q3, Q4, with a file holding D’, E’, F’, M, R’, S’, T’, N. Right: programs R1, R2, R3, with files holding R, S, T, U, V, W, X, Y, Z. Caption: Problem of Shared Data !]
The caricature above depicts 3 applications for different departments of an organisation, each application having its own programs and data files. Assume each application is “perfectly designed” to make the most of the situation existing at the time of its creation.
The ‘left-hand’ and ‘right-hand’ applications are independent applications; i.e. they do not interact with each other in any way. The ‘left-hand’ application has 2 data files of its own, one containing data values A, B and C, the other data values D, E, F, G, H and J. (Assume these values are like attributes in a relation). Program P1 inputs data into a file, while programs P2 and P3 input, output and process data. The ‘right-hand’ application is analogous.
The ‘centre’ application is assumed to be built after the other two. Most of the data it needs (all except data values M and N) is already stored by the other two applications, but not in a way that is suitable for it. Therefore its programs Q1 and Q4 read in the required data, process it, and store it as required (together with M and N) in its own file. Programs Q2 and Q3 extract data from the file and process it.

3 Initial Problems
Duplicate Data
Data common to 2 or more applications is duplicated in each application’s files. Each version of the data may be physically stored differently : different data types (e.g. integer vs. floating point) and/or organisations (e.g. differently structured records) and/or access methods (e.g. one reached by hashing, the other by an index). (We do not need to understand the different ways of physically storing data - they are beyond the scope of this course - merely know that they exist and are important).
Applications Constrained by Existing Data/Applications
Where data already exists, newer applications are constrained to use it, to minimise data input and storage. Newer applications are handicapped by data entry timings and methods, storage structures and organisations that are unsuitable for the new application.
The 2 problems are at opposite ends of a spectrum. Data duplication is at its worst if we try to avoid each application being handicapped by others. Constraints on applications are at their worst if we try to minimise duplicate data. In practice some compromise between the two extremes is necessary. Nevertheless in practice we often get the worst of both worlds !
In the example from the previous slide, the ‘centre’ application suffers from both problems, in that it has to generate new copies of much of its data from the files of pre-existing applications in order to suit its own application program requirements. Therefore not only is data replicated, but the newer ‘centre’ application is constrained by the pre-existing ‘left’ and ‘right’ applications.

4 Even More Problems !
Duplicate data leads to Inconsistent Data, or Updating Overheads.
If the updating of all copies is not synchronised, they will become inconsistent. Applications using inconsistent data will cause chaos. On-line transactions must update multiple copies simultaneously; batch update runs must be highly integrated. Updating multiple files for a transaction would lead to serious performance overheads. Integrating the file handling of independent applications is a contradiction in terms.
Constraining applications leads to Reduced Performance and/or Excessive Maintenance.
Although each application may be well planned, the overall data storage situation will become complex and ill thought-out because it is unplanned. To simplify it and make it efficient will require extra maintenance to re-configure existing application data storage every time a new application is added.
Because each application will be added to meet business needs over a long time period, it is not possible to foresee exactly how data file storage will grow over the longer period and plan for it in an integrated fashion. Re-designing the file storage of multiple applications is not only technically difficult; there is also the problem of testing the new file storage system and bringing it into operation without a serious risk to the organisation. This is especially so if the applications are running most of the time, particularly if they are the administrative life blood of the organisation.
It is the norm rather than the exception for data to be shared by different business applications. For example, data that records the storage of goods in a warehouse will be used not only by warehouse management, but may also be used by the logistics, buying, sales and accounting departments.

5 Problems Solved !
[Figure: the earlier caricature, amended. Programs P1, P2, P3, Q2, Q3, R1, R2, R3 all access a single Database Management System holding A, B, C, D, E, F, G, H, J, M, N, R, S, T, U, V, W, X, Y, Z.]
The caricature above is an amended version of the earlier one: the same individual applications have their individual sets of data files replaced by one DB that serves all 3 applications.
Assume the ‘left-hand’ and ‘right-hand’ independent applications still use their same programs to input data into the DB (instead of their own data files), and process it to meet their own individual application requirements.
Now the ‘centre’ application no longer needs its programs Q1 and Q4 to obtain its data in a way that is suitable for it. Neither are there any longer duplicate copies of data.
All the application programs are simpler because they no longer have to deal with file handling. They simply request the data from the DBMS, which provides them with it in the form that they require.

6 Benefits of a DBMS
Data duplication is eliminated.
Maintenance of the data storage is significantly reduced. Physical storage can be optimised for overall performance, and easily altered to maintain performance without altering any applications.
Applications are simpler because they no longer deal with data storage, which is handed over to the DBMS.
Each application can obtain its data in a form that is optimised to meet its requirements.
Maintenance of the applications is reduced in that it is simple to meet their revised data needs if they are altered.
Data storage maintenance is reduced because it no longer has to be contorted to cope with the structure that has historically evolved, or undergo major design changes whenever new applications are added.
The queries that we have already seen can be used to provide an application with any form of data it requires. Since queries are easy to write, if an application changes and needs different data, we just quickly change the queries used to provide it with that data.

7 How are the Benefits Provided ?
The DBMS provides :
Data Independence. The DBMS acts as a layer of insulation between application programs and data. An application requests whatever data it needs, and the DBMS provides it. The DBMS can store data with a variety of different storage organisations & methods, and use the best one(s) for its applications.
One integrated & coherent pool of data, i.e. a DB. Since all the data is separated from applications, it can be viewed together and designed according to its inherent meaning & structure.
A bonus is the ability to ask ad hoc queries of the DB. This is possible as all the data can now be made visible in a coherent structure. Queries and updates can be done directly on the DB without an intervening application. (The user interface is actually an application program).
Data Independence is a technical facility that is, or should be, provided by the DBMS. We get whatever a given DBMS can provide. SQL generally provides good data independence for retrievals. Queries are written without any regard as to how the DBMS actually gets the data from its files. In fact SQL is designed so that a query never needs knowledge of the data’s physical storage. The same holds for updates, although SQL DBMSs are not always as good as they could be.
The DBMS will provide a portfolio of physical storage options, each with different performance characteristics. The DB designer chooses the most suitable one(s) for the usage of the DB in question, but can comparatively easily change storage options if the DB usage alters and requires different performance characteristics; and all without altering the SQL queries and updates used to support applications.
By comparison, the DB is an opportunity not a technical facility, in that the DBMS will treat it as a collection of data regardless, but it is up to the DB designer to design it as an integrated pool of meaningful data. If the DB design is poor, not all the benefits of the DB will be realised.
Although ad hoc querying was not the original purpose of DBs, nowadays it is often the prime, or even the sole, purpose of a DB. The structure of the DB is easily investigated via the user interface, and queries and updates are easily written to exploit the DB.

8 Physical Data Independence
Definition : the ability to change the way data is physically stored in the computer, while leaving the logical structure of the DB - i.e. all the relations in it - unchanged.
Physical storage consists of :
physical record formats, which determine how a few data values (usually corresponding to a tuple) are stored;
the arrangement of the physical records into physical files;
the methods by which the physical data is accessed; e.g. by reading sequentially through records, using an index, etc.
Physical independence always allows the user to see data as relations, regardless of how relations are physically stored underneath. Physical storage can be altered, while the user still sees the same relation.
Data Independence actually consists of Physical Independence and Logical Independence.
Some physical storage arrangements have performance characteristics most suited for queries, others for updates, some for one type of query, others for a different type of query. The size of the relation stored also affects performance. DB administrators will choose the storage arrangements to give the DB the performance required for the way in which it is used. If the pattern of usage changes over time, the storage arrangements may have to be changed to meet the different usage needs. This is called Performance Tuning.
SQL has a variety of statements available to tell the DBMS which physical storage arrangement should be used to store a relation. We have only seen the SQL commands to tell the DBMS what kind of logical relation should be created. This is because physical storage is beyond the scope of this course.
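As a small illustration of physical data independence, the sketch below uses Python's built-in sqlite3 module (an assumption; the lecture names no particular DBMS). It runs the same query before and after an index is added to a CAR relation. The sample rows and the index name CAR_TYPE_IDX are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CAR (RegNo TEXT, Type TEXT, Owner INTEGER);
    INSERT INTO CAR VALUES ('A1', 'Ford Focus', 1),
                           ('B2', 'Volvo V60', 2),
                           ('C3', 'Ford Fiesta', 3);
""")

# The application's query: it says nothing about files, indexes or access methods.
QUERY = "SELECT RegNo FROM CAR WHERE Type = 'Ford Focus'"

def plan(sql):
    # EXPLAIN QUERY PLAN is SQLite-specific; the last column describes each step.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

rows_before = sorted(conn.execute(QUERY).fetchall())
plan_before = plan(QUERY)

# Change only the physical storage: add an index on Type.
conn.execute("CREATE INDEX CAR_TYPE_IDX ON CAR (Type)")

rows_after = sorted(conn.execute(QUERY).fetchall())
plan_after = plan(QUERY)

print(rows_before, rows_after)  # same rows from the same query text
print(plan_before)              # how the DBMS chose to reach the data before
print(plan_after)               # and after the storage change
```

The query text and its result are unchanged; only the access plan chosen by the DBMS may differ.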

9 Logical Data Independence (1)
Definition : the ability to change the logical structure required by an application - i.e. all the relations it uses - without the computer having to change the logical structure of the DB - i.e. all the relations in the DB.
It includes the converse of this, viz. the ability to change the logical structure of the DB - i.e. all the relations in it - while still providing any application with the same logical structure that it requires - i.e. all the relations it requires.
To achieve this requires Views.
Definition : a View is a new relation derived from pre-existing relations; it is the specification of the view that is stored, not the data that appears in a view.
A view is a named query; e.g. an SQL query or an algebra expression to be evaluated & retrieved. So in SQL :
    Create View VIEW_NAME As ( some valid SQL query ) ;
Three example views :
A view showing details of all the Ford cars :
    Create View FORD As ( Select * From CAR Where Type Like 'Ford%' ) ;
A view showing all employees’ details apart from the salary :
    Create View EMPLOYEE_LESS_SAL As ( Select EName, ENo, M_S From EMPLOYEE ) ;
A view that shows, for all employees owning a car, the employee name and ID number and their car’s registration number :
    Create View CAR_OWNER As ( Select EName, ENo, RegNo From CAR Join EMPLOYEE On ( Owner = ENo ) ) ;
Or
    Create View CAR_OWNER As ( Select EName, ENo, RegNo From CAR, EMPLOYEE Where Owner = ENo ) ;
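The three example views can be tried out with a minimal sketch using Python's built-in sqlite3 module. Several things here are assumptions rather than part of the lecture: the choice of SQLite, the column types, the sample rows, and the spelling M_S for the marital-status column (an unquoted hyphen would be read as subtraction). SQLite also expects the defining query without the enclosing parentheses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (ENo INTEGER PRIMARY KEY, EName TEXT, M_S TEXT, Sal INTEGER);
    CREATE TABLE CAR (RegNo TEXT PRIMARY KEY, Type TEXT, Owner INTEGER);
    INSERT INTO EMPLOYEE VALUES (1, 'Smith', 'S', 20000),
                                (2, 'Jones', 'M', 25000),
                                (3, 'Brown', 'M', 30000);
    INSERT INTO CAR VALUES ('A1', 'Ford Focus', 1), ('B2', 'Volvo V60', 2);

    -- The three views from the slide (parentheses around the query dropped for SQLite).
    CREATE VIEW FORD AS SELECT * FROM CAR WHERE Type LIKE 'Ford%';
    CREATE VIEW EMPLOYEE_LESS_SAL AS SELECT EName, ENo, M_S FROM EMPLOYEE;
    CREATE VIEW CAR_OWNER AS
        SELECT EName, ENo, RegNo FROM CAR JOIN EMPLOYEE ON (Owner = ENo);
""")

ford_before = sorted(conn.execute("SELECT RegNo FROM FORD").fetchall())
owners = sorted(conn.execute("SELECT EName, ENo, RegNo FROM CAR_OWNER").fetchall())

# The salary column is simply absent from the view's heading.
cursor = conn.execute("SELECT * FROM EMPLOYEE_LESS_SAL")
less_sal_columns = [d[0] for d in cursor.description]

# Only the view's specification is stored, so new base data shows up automatically.
conn.execute("INSERT INTO CAR VALUES ('C3', 'Ford Fiesta', 3)")
ford_after = sorted(conn.execute("SELECT RegNo FROM FORD").fetchall())

print(ford_before, ford_after)
print(owners)
print(less_sal_columns)
```

The second FORD retrieval picks up the newly inserted car without the view being redefined, which is the practical consequence of storing the definition rather than the data.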

10 Logical Data Independence (2)
[Figure: base relations PROJECT, EMPLOYEE and CAR, with views derived from them. View EMPLOYEE_LESS_SAL is a Project of EMPLOYEE; view PROJECT_EMPLOYEE is a Join of PROJECT and EMPLOYEE; view CAR_OWNER is a Join of EMPLOYEE and CAR.]
Given a set of relations that actually hold all the data stored in the DB - these are called Base Relations - then we can derive a set of any view relations (called simply Views for short) that we like from them, as long as they are logically derivable; i.e. in SQL, as long as we can define the views in SQL.
We then let an application use any relations that it needs, regardless of whether those relations are base or views. Indeed there is no reason in principle why an application should know which of the relations it is using are base and which are views. Choose a set of base and/or view relations that give an application the data it wants. We will create whatever views are needed by applications.
Base relations comprise the complete set of data in the DB. A view can provide a subset of this data in a different structure or grouping. The view mechanism therefore provides a flexible means of presenting DB data to applications.
In the example above, from the 3 base relations about projects, employees, and cars, we have created views about projects and the employees working on them, employee information less salary data (to preserve confidentiality), and car owners. One application might use the relations PROJECT, EMPLOYEE_LESS_SAL, and CAR, another the relations PROJECT_EMPLOYEE, EMPLOYEE_LESS_SAL, and CAR_OWNER, and so on.
If the DB currently lacks a relation that an application needs, then we create it with a view. If the base relation EMPLOYEE needs to be changed, but continues to be needed by an application, we can still change it, but we create a new view equivalent to the old EMPLOYEE relation so that the application can continue unchanged.
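The last point, changing a base relation while keeping an application unchanged, can be sketched as follows with Python's built-in sqlite3 module. The restructuring chosen here (splitting EMPLOYEE into two base relations) and the names EMP_PERSON and EMP_PAY are hypothetical; they only serve to show the old EMPLOYEE relation being re-provided as a view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (ENo INTEGER PRIMARY KEY, EName TEXT, Sal INTEGER);
    INSERT INTO EMPLOYEE VALUES (1, 'Smith', 20000), (2, 'Jones', 25000);
""")

# The application's query; it must keep working unchanged.
APP_QUERY = "SELECT ENo, EName, Sal FROM EMPLOYEE ORDER BY ENo"
before = conn.execute(APP_QUERY).fetchall()

# Change the logical structure of the DB: split EMPLOYEE into two base
# relations, then define a view named EMPLOYEE equivalent to the old relation.
conn.executescript("""
    CREATE TABLE EMP_PERSON (ENo INTEGER PRIMARY KEY, EName TEXT);
    CREATE TABLE EMP_PAY (ENo INTEGER PRIMARY KEY, Sal INTEGER);
    INSERT INTO EMP_PERSON SELECT ENo, EName FROM EMPLOYEE;
    INSERT INTO EMP_PAY SELECT ENo, Sal FROM EMPLOYEE;
    DROP TABLE EMPLOYEE;
    CREATE VIEW EMPLOYEE AS
        SELECT p.ENo AS ENo, p.EName AS EName, y.Sal AS Sal
        FROM EMP_PERSON AS p JOIN EMP_PAY AS y ON p.ENo = y.ENo;
""")

after = conn.execute(APP_QUERY).fetchall()
print(before)
print(after)
```

The application cannot tell from its retrievals that EMPLOYEE has become a view; updates through such a join view are a separate matter.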

11 The ANSI/SPARC 3-Layer Architecture
[Figure: the 3-layer architecture. Application programs (P1, P2, Q2, Q3, R1) use Sub Schemas; inside the DBMS the Sub Schemas map to the Logical Schema (describes relations), which maps to the Physical Schema (describes files); beneath the DBMS are the operating system files F1, F2, F3, F4.]
ANSI/SPARC was a DB committee of the American National Standards Institute.
A Schema is a specification of a DB or part of a DB. In order to provide data independence, a DBMS implements the 3-layer architecture depicted above.
The Logical Schema (sometimes called the Conceptual Schema) is a specification of all the base relations in a DB. In SQL it will consist of all the Create Table & Alter Table statements in a DB. It defines at the logical level the entire contents of the DB.
The Physical Schema (sometimes called the Storage Schema or Internal Schema) is a specification of how all the base relations in a DB are physically stored. In SQL it will consist of all the statements used to specify the physical storage of the base tables. (Reminder : these statements are not covered in this course). Note that this schema is a definition of how the data is stored; it is not itself the storage of the data. The data is physically stored in operating system files.
Each Sub Schema (sometimes called an External Schema) is the set of all the relations, base or view, that are visible to a particular application or set of applications. It is a way of bringing together all the data required by one or more applications and excluding all that is irrelevant to them. Applications are not allowed access to any part of the DB outside their Sub Schema.

12 The Provision of Data Independence
Let an application program Q3 use a view, say EMPLOYEE_LESS_SAL, which is a member of some Sub Schema.
The view is mapped to its base relation, EMPLOYEE in this case, which is a member of the Logical Schema.
The base relation is mapped to its storage specification, which is a member of the Physical Schema.
When program Q3 wants to do something with EMPLOYEE_LESS_SAL, it sends an instruction - a query or update written in SQL - to the DBMS.
The DBMS follows the mappings through to the storage specification and determines what it must do with the actual stored data to accomplish this instruction.
The DBMS carries out the action on which it has decided.
From the result of the action, the DBMS uses the mappings in the reverse direction to generate what Q3 requires, and passes the result to Q3.
Thus it is the mappings between schema layers that provide Data Independence. The mappings in the example above correspond to the mapping arrows in the previous slide. Mappings between a Sub Schema and the Logical Schema provide Logical Independence; mappings between the Logical Schema and the Physical Schema provide Physical Independence.
The mapping of a view in a Sub Schema to the base relation(s) in the Logical Schema consists of the query that was used to define the view in the Create View statement. (When a base relation appears in a Sub Schema, it maps to itself in the Logical Schema, as it were, so the mapping is trivial).
The mapping of a base relation in the Logical Schema to its storage specification in the Physical Schema consists of that same storage specification.
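The view-to-base mapping can be inspected directly in some DBMSs. The sketch below assumes SQLite via Python's sqlite3 module, where the catalogue table sqlite_master keeps the text of each Create View statement; other DBMSs expose this differently. The column types, sample rows and the spelling M_S are assumptions. The code then follows the mapping by hand, substituting the view's defining query for the view name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (ENo INTEGER PRIMARY KEY, EName TEXT, M_S TEXT, Sal INTEGER);
    INSERT INTO EMPLOYEE VALUES (1, 'Smith', 'S', 20000), (2, 'Jones', 'M', 25000);
    CREATE VIEW EMPLOYEE_LESS_SAL AS SELECT EName, ENo, M_S FROM EMPLOYEE;
""")

# The stored mapping: SQLite keeps the view's defining statement, not its rows.
mapping = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'view' AND name = 'EMPLOYEE_LESS_SAL'"
).fetchone()[0]

# What program Q3 would send to the DBMS.
via_view = conn.execute(
    "SELECT EName FROM EMPLOYEE_LESS_SAL WHERE ENo = 2"
).fetchall()

# Following the mapping by hand: the view name replaced by its defining query.
via_definition = conn.execute(
    "SELECT EName FROM (SELECT EName, ENo, M_S FROM EMPLOYEE) WHERE ENo = 2"
).fetchall()

print(mapping)
print(via_view, via_definition)
```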

13 More about Schemas
Logical Schema
There is only one in a DB; so all the base relations automatically form it.
Physical Schema
There is only one in a DB; so all the storage specifications attached to base relations automatically form it.
Sub Schemas
SQL has no means of providing them directly. Instead it provides them indirectly by the use of :
a Grant statement to give to certain DB users the privilege(s) of being able to carry out certain statements (e.g. Selects, Inserts) on certain view and/or base relations;
a Revoke statement to remove from certain DB users the privilege(s) of being able to carry out certain statements (e.g. Selects, Inserts) on certain view and/or base relations.
SQL does not directly use ANSI/SPARC schemas, but instead has other SQL-specific schemas. However schema still means “a specification of a (portion of a) DB”. SQL schemas tend to be tied up with implementation-dependent parts of SQL DBMSs, and so are beyond the scope of this course.
By Granting some users the ability to see certain relations and Revoking that ability from all other users, we have simulated a Sub Schema containing those particular relations for those particular users. The only thing lacking is the ability to give names to Sub Schemas.
Sub Schemas also provide security. If users are only allowed to see and use the contents of a Sub Schema as opposed to the whole DB, then they are not only unable to do anything with the rest of the DB, they are unaware of it; and “What the eye doesn’t see, the heart doesn’t grieve over”.
Note that Sub Schemas are occasionally called Views. To avoid confusion, it is recommended that you do not use this name for Sub Schemas.

14 Using Views
If a user is to be able to use a view like a base relation, then they must be able to retrieve the data in a view and update a view.
Retrieval
Replace the view by its (query) definition, and evaluate that definition. If the view is a component of a query, then use this value for the component.
For retrievals, SQL will often use a more efficient way to do the query than literally meeting the logical requirement, but it will be logically equivalent.
Update
When a view is updated, since only its definition and not its value is stored, the underlying relations, i.e. those appearing in the view definition, must be updated instead to create the effect.
Unfortunately, SQL only implements a few of the logical possibilities. This means that often a view cannot be used as if it were a base relation.
For updates, the only updates that cannot be done in principle are those that are logically impossible. For example, consider :
    Create View TOTAL ( SUM_VALUE ) As ( Select Sum( Sal ) From EMPLOYEE ) ;
The view will consist of a single row containing a single column called SUM_VALUE that contains the sum total of every employee’s salary. (The view query treats the whole relation as one group). This view is not updateable in principle, because if we were to try to amend it, say by adding £1,000 to the sum total, we could not amend the underlying table EMPLOYEE because there is no way of knowing how the £1,000 is to be allocated across all the employees.
In general, views containing calculations and aggregations like this are not updateable, but all others are. However in SQL, basically only views defined with the SQL equivalent of a Project and/or Restrict operation are updateable, although there are a few other limitations as well, e.g. the Select phrase may not include the keyword Distinct, and the Order By phrase may not appear in the Select statement.
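The difference between retrieving from a view and updating it can be sketched with Python's built-in sqlite3 module. Two caveats, both assumptions specific to this sketch: SQLite treats every view as read-only, so a Project view such as EMPLOYEE_LESS_SAL only becomes updateable there through an INSTEAD OF trigger that spells out how the underlying relation is to change, and the trigger name EMPLOYEE_LESS_SAL_UPD is hypothetical. The TOTAL view uses a plain Sum so that every employee's salary is counted, including equal ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (ENo INTEGER PRIMARY KEY, EName TEXT, M_S TEXT, Sal INTEGER);
    INSERT INTO EMPLOYEE VALUES (1, 'Smith', 'S', 20000), (2, 'Jones', 'M', 20000);

    CREATE VIEW TOTAL (SUM_VALUE) AS SELECT SUM(Sal) FROM EMPLOYEE;
    CREATE VIEW EMPLOYEE_LESS_SAL AS SELECT EName, ENo, M_S FROM EMPLOYEE;

    -- Hypothetical trigger: states how an update of the view maps to the base relation.
    CREATE TRIGGER EMPLOYEE_LESS_SAL_UPD
    INSTEAD OF UPDATE ON EMPLOYEE_LESS_SAL
    BEGIN
        UPDATE EMPLOYEE SET EName = NEW.EName, M_S = NEW.M_S WHERE ENo = OLD.ENo;
    END;
""")

# Retrieval: the view is evaluated from its definition.
total = conn.execute("SELECT SUM_VALUE FROM TOTAL").fetchone()[0]

# Updating the aggregate view is rejected: there is no rule for sharing out the 1000.
try:
    conn.execute("UPDATE TOTAL SET SUM_VALUE = SUM_VALUE + 1000")
    total_update_rejected = False
except sqlite3.OperationalError:
    total_update_rejected = True

# Updating the Project view changes the underlying base relation instead.
conn.execute("UPDATE EMPLOYEE_LESS_SAL SET M_S = 'M' WHERE ENo = 1")
base_value = conn.execute("SELECT M_S FROM EMPLOYEE WHERE ENo = 1").fetchone()[0]

print(total, total_update_rejected, base_value)
```

In a DBMS that follows the SQL rules described above, the Project view would be updateable without any trigger; the trigger merely makes the view-to-base translation explicit.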

15 Syntax of SQL Views
The full syntax of an SQL view is :
    Create View VIEW_NAME ( list of column names )
        As Select statement
        With Check Option ;
The ‘( list of column names )’ and ‘With Check Option’ parts are both optional.
The ‘list of column names’ is only required if the default names arising as a result of the Select statement are not appropriate.
The Select statement can be any legitimate query, although it must conform to the limitations if view updates are required.
For updateable views, the ‘With Check Option’ option means that updates through the view are rejected if the resulting rows would no longer satisfy the view’s defining condition; so use the option. (If it is not used, the underlying tables will be updated, but the results will not appear in the view !)

16 Views as Shorthands
The normal method of using views, assumed so far, is to provide Logical Data Independence, where a view is used indistinguishably from a base relation if possible.
Another use is to create views, additional to relations in the Sub Schema, in order to make it easier to write commonly occurring queries, or make complex queries easier to write.
Example :
    Create View CAR_OWNER As ( Select EName, ENo, RegNo From CAR Join EMPLOYEE On ( Owner = ENo ) ) ;
is created because there are many queries on car owners, and it saves work to have this view as a starting point for them.
Here users know that these are views; they do not need updating.
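A brief sketch of the shorthand idea, again assuming SQLite through Python's sqlite3 module with made-up sample rows: the same question is asked once against the base relations and once against CAR_OWNER.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (ENo INTEGER PRIMARY KEY, EName TEXT, Sal INTEGER);
    CREATE TABLE CAR (RegNo TEXT PRIMARY KEY, Type TEXT, Owner INTEGER);
    INSERT INTO EMPLOYEE VALUES (1, 'Smith', 20000), (2, 'Jones', 25000);
    INSERT INTO CAR VALUES ('A1', 'Ford Focus', 1), ('B2', 'Volvo V60', 2);
    CREATE VIEW CAR_OWNER AS
        SELECT EName, ENo, RegNo FROM CAR JOIN EMPLOYEE ON (Owner = ENo);
""")

# Without the view, every car-owner query has to repeat the join.
long_form = conn.execute(
    "SELECT EName FROM CAR JOIN EMPLOYEE ON (Owner = ENo) WHERE RegNo = 'B2'"
).fetchall()

# With the view as a starting point, the join is written once, in the view.
short_form = conn.execute(
    "SELECT EName FROM CAR_OWNER WHERE RegNo = 'B2'"
).fetchall()

print(long_form, short_form)
```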

