Distributed Database Design & Semantic Data Control

Slides:



Advertisements
Similar presentations
Outline  Introduction  Background  Distributed DBMS Architecture  Distributed Database Design  Semantic Data Control ➠ View Management ➠ Data Security.
Advertisements

Distributed DBMSPage 6. 1© 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture Distributed Database Design.
Distributed Query Processing –An Overview
Distributed DBMS© M. T. Özsu & P. Valduriez Ch.6/1 Outline Introduction Background Distributed Database Design Database Integration Semantic Data Control.
Distributed Database Systems Dr. Mohamed Osman Hegazi.
Advanced Database Systems September 2013 Dr. Fatemeh Ahmadi-Abkenari 1.
Introduction to Database Management  Department of Computer Science Northern Illinois University January 2001.
The Relational Model System Development Life Cycle Normalisation
Chapter 3 The Relational Model Transparencies © Pearson Education Limited 1995, 2005.
Chapter 3. 2 Chapter 3 - Objectives Terminology of relational model. Terminology of relational model. How tables are used to represent data. How tables.
Institut für Scientific Computing – Universität WienP.Brezany Fragmentation Univ.-Prof. Dr. Peter Brezany Institut für Scientific Computing Universität.
Distributed DBMSPage 5. 1 © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture  Distributed Database.
1 Minggu 2, Pertemuan 3 The Relational Model Matakuliah: T0206-Sistem Basisdata Tahun: 2005 Versi: 1.0/0.0.
Reference Book Principles of Distributed Database System Chapters 4. Distributed DBMS Architecture 5. Distributed Database Design 7.5 Layers of Query Processing.
Distributed Databases
Dr. Kalpakis CMSC 461, Database Management Systems Introduction.
DISTRIBUTED DBMS ARCHITECTURE
CSC271 Database Systems Lecture # 6. Summary: Previous Lecture  Relational model terminology  Mathematical relations  Database relations  Properties.
Lecture 2 The Relational Model. Objectives Terminology of relational model. How tables are used to represent data. Connection between mathematical relations.
Information storage: Introduction of database 10/7/2004 Xiangming Mu.
Chapter 4 The Relational Model Pearson Education © 2014.
Chapter 4 The Relational Model.
Chapter 3 The Relational Model Transparencies Last Updated: Pebruari 2011 By M. Arief
Database Design – Lecture 16
RAJIKA TANDON DATABASES CSE 781 – Database Management Systems Instructor: Dr. A. Goel.
DISTRIBUTED DATABASE DESIGN
1 Introduction to Database Systems. 2 Database and Database System / A database is a shared collection of logically related data designed to meet the.
CODD’s 12 RULES OF RELATIONAL DATABASE
Session-9 Data Management for Decision Support
Chapter 3 The Relational Model. 2 Chapter 3 - Objectives u Terminology of relational model. u How tables are used to represent data. u Connection between.
PMIT-6102 Advanced Database Systems By- Jesmin Akhter Assistant Professor, IIT, Jahangirnagar University.
Session-8 Data Management for Decision Support
Database Systems: Design, Implementation, and Management Ninth Edition Chapter 12 Distributed Database Management Systems.
1 The Relational Database Model. 2 Learning Objectives Terminology of relational model. How tables are used to represent data. Connection between mathematical.
9/7/2012ISC329 Isabelle Bichindaritz1 The Relational Database Model.
Distributed Database Systems Overview
PMIT-6102 Advanced Database Systems By- Jesmin Akhter Assistant Professor, IIT, Jahangirnagar University.
PMIT-6102 Advanced Database Systems By- Jesmin Akhter Assistant Professor, IIT, Jahangirnagar University.
Relational Database. Database Management System (DBMS)
Prepared By Prepared By : VINAY ALEXANDER ( विनय अलेक्सजेंड़र ) PGT(CS),KV JHAGRAKHAND.
DDBMS Distributed Database Management Systems Fragmentation
Distributed Database. Introduction A major motivation behind the development of database systems is the desire to integrate the operational data of an.
PMIT-6101 Advanced Database Systems By- Jesmin Akhter Assistant Professor, IIT, Jahangirnagar University.
Database Environment Chapter 2. Data Independence Sometimes the way data are physically organized depends on the requirements of the application. Result:
CIS/SUSL1 Fundamentals of DBMS S.V. Priyan Head/Department of Computing & Information Systems.
Design Process - Where are we?
CSE314 Database Systems Lecture 3 The Relational Data Model and Relational Database Constraints Doç. Dr. Mehmet Göktürk src: Elmasri & Navanthe 6E Pearson.
The Relational Model. 2 Relational Model Terminology u A relation is a table with columns and rows. –Only applies to logical structure of the database,
Introduction to Distributed Databases Yiwei Wu. Introduction A distributed database is a database in which portions of the database are stored on multiple.
Distributed Database Management Systems. Reading Textbook: Ch. 1, Ch. 3 Textbook: Ch. 1, Ch. 3 For next class: Ch. 4 For next class: Ch. 4 FarkasCSCE.
 Distributed Database Concepts  Parallel Vs Distributed Technology  Advantages  Additional Functions  Distribution Database Design  Data Fragmentation.
1 Chapter 22 Distributed DBMS Concepts and Design CS 157B Edward Chen.
The Relational Model © Pearson Education Limited 1995, 2005 Bayu Adhi Tama, M.T.I.
Distributed Database Design Bayu Adhi Tama, MTI Fasilkom-Unsri Adapted from Connolly, et al., Database Systems 4 th Edition, Pearson Education Limited,
Chapter 4 The Relational Model Pearson Education © 2009.
Distributed DBMS© 2001 M. Tamer Özsu & Patrick Valduriez Page 1.1 Outline n Introduction Background Distributed DBMS Architecture Distributed Database.
1 Chapter 22 Distributed DBMSs - Concepts and Design Simplified Transparencies © Pearson Education Limited 1995, 2005.
Database Management.
DATA ACCESS CONTROL, MANAGEMENT DATA AND SECURITY (CIB125) PERTEMUAN 6
Chapter 2 Database Environment.
Chapter 4 The Relational Model Pearson Education © 2009.
Chapter 4 The Relational Model Pearson Education © 2009.
Chapter 4 The Relational Model Pearson Education © 2009.
Outline Introduction Background Distributed DBMS Architecture
The Relational Model Transparencies
Chapter 4 The Relational Model Pearson Education © 2009.
Chapter 4 The Relational Model Pearson Education © 2009.
Chapter 4 The Relational Model Pearson Education © 2009.
Distributed Database Management Systems
Outline Introduction Background Distributed DBMS Architecture
Presentation transcript:

Distributed Database Design & Semantic Data Control Univ.-Prof. Dr. Peter Brezany Institut für Scientific Computing Universität Wien Tel. 4277 39425 Sprechstunde: Di, 13.00-14.00 LV-Portal: www.par.univie.ac.at/~brezany/teach/gckfk/06ss/300658.html

Distributed Database Design

A General Framework Considered in the Distributed Database Design Problem The organization of distributed systems can be investigated along 3 dimensions: Level of Sharing (a) no sharing: each application and its data execute at one site, and there is no communication with any other program or access to any data file at other sites (not very common today) (b) data sharing: all the programs are replicated at all the sites, but data files are not (c) data-plus-program sharing: a program at a given site can request a service from another program at a se- cond site, which, in turn, may have to access a data file located at a 3. site.

A General Framework Considered in the Distributed Database Design Problem (2) 2. Access patterns (along this dimension, the relationship between distributed database design and query processing is established) (a) static (they do not change over time) – the design is easier for static environments; unfortunately, it is difficult to find many real-life applications that would be classified as static. (b) dynamic 3. Level of knowledge about the access pattern (AP) behavior (a) no knowledge (a theoretical possibility) it is very difficult to design a distr. database that can effectively cope with this situation (b) complete knowledge – AP can be re- sonably (almost precicely) predicted (c) partial information – there are deviations from the predictions

Alternative Design Strategies Top-Down approach Bottom-Up approach In most database designs, the two approaches may need to be applied to complement one another. It is a joint function of the database, enterprise, and application system administrators (or of the administrator performing all three roles)

Top-Down Design Process

Bottom-Up Design Process Top-down design is a suitable approach when a database system is being designed from scratch. Commonly, a number of databases already exist, and the design task involves integrating them into one database.  the bottom-up approach is suited for this type of environment. The starting point is individual local conceptual schemas. The process consists of integrating local schemas into the global conceptual schemas. This type of environment exists primarily in the context of heterogeneous databases.

Distribution Design Issues - Fragmentation Why fragment at all? How should we fragment? How much should we fragment? Is there any way to test the correctness of decomposition? How should we allocate? What is necessary information for fragmentation and allocation?

Reasons for Fragmentation The important issue is the appropriate unit of distribution: a relation is not a suitable unit – why? Application views are usually subsets of relations, therefore the locality of accesses of applications is defined not on entire relations but on their subsets  it is only natural to consider subsets of relations as distribution units. If the applications that have views defined on a given relation reside at different sites, two alternatives can be followed, with the entire relation being the unit of distribution: The relation is not replicated  this results into in an unnecessarily high volume of remote data accesses. The relation is replicated at all or some of the sites where the applications reside  problems in executing updates and may not be desirable if storage is limited. The decomposition of a relation into fragments, each being treated as a unit, permits a large number of transactions to execute concurrently and intraquery concurrency is enabled. Disadvantages of fragmentation Applications whose views are defined on more than one fragment may suffer performance degradation. Integrity checking (semantic data control): Attributes participating in a dependency may be decomposed into different fragments which might be allocated to different sites.

Fragmentation Alternatives: Horizontal and Vertical Fragmentation Example Database

Example of Horizontal Fragmentation

Example of Vertical Fragmentation Remark: The fragmentation may, of course, be nested. If the nestings are of different types, one gets hybrid fragmentation.

Hybrid (Mixed) Fragmentation Relation A Fragment H1 Fragment H2V1H1 Fragment H2V2 Fragment H2V1H2 The abstract relation A in this example has been fragmented three times (not five!) – twice horizontally and once vertically. The first horizontal fragmentation yielded two fragments, H1 and H2. The second fragmentation was a vertical partitioning of horizontal fragment H2, again yielding two fragments (although, of course, vertical ones this time) H2V1 and H2V2. The third and last fragmentation was another horizontal one, yielding three fragments of the first vertical fragment, H2V1. Note that this diagram is a logical one. It is not intended to suggest that all of the tuples stored at the “top” of the relation (whatever that might be taken to mean) belong in fragment H1, and so on! Fragment H2V1H3

Correctness Rules of Fragmentation Completness. If a relation R is decomposed into fragments R1, R2, ..., Rn, each data item that can be found in R can also be found in one or more of Ri´s. Reconstruction. If a relation R is decomposed into a set fragments FR = {R1, R2, ..., Rn}, it should be possible to define a relational operator  such that R = Ri, Ri  FR The operator  will be different for for the different forms of fragmentation. The reconstructability of the relation from its fragments ensures that constraints defined on the data in the form of dependencies are preserved. Disjointness. If a relation R is horizontally decomposed into fragments R1, R2, ..., Rn and data item di is in Rj, it is not in any other fragment Rk (k  j).If relation R is vertically decomposed, its primary key attributes are typically repeated in all its fragments  disjointness is defined only on the nonprimary key attributes of a relation.

Was ist die „optimale“ Zuordnung von F zu S bzgl. Q ? Allokation Sei F = {F1, ..., Fn} eine Menge von Fragmenten, S = {S1, ..., Sm} ein Netzwerk gegeben durch die Menge seiner Sites, und Q = {Q1, ..., Qp} die Menge der relevanten Anwendungen. Allokationsproblem: Was ist die „optimale“ Zuordnung von F zu S bzgl. Q ? Optimaltätskriterium: Minimalität der Kosten gegeben durch die Speicherkosten der Fi an den Sites Sj, der Anfragekosten für Fi an Site Sj, der Änderungskosten der Fi an allen Sites an den sie gespeichert sind, und die Kosten der Datenkommunikation. Performanz im Sinne von Antwortzeiten oder Systemdurchsatz.

Fragmente und ihre Allokation Sites R1 R1,1 S1 R2 R R2,1 R1,2 R3 R4 Globale Relation S2 R2,2 Fragmente Fragmente und ihre Allokation Allokation an den Sites R3,3 S3 R4,3

Zusätzliche Beispiele horizontale Fragmentierung

Beispiel 2: abgeleitete horizontale Fragmentierung

Beispiel 3: vertikale Fragmentierung

Semantic Data Control

Main Objectives View management Security control Semantic integrity control. The above items can be defined by rules that the system automatically enforces. The violation of some such rule by a user program (a set of database operations) generally implies the rejection of the effects of that program. The rules are defined by the DB administration. The cost of enforcing semantic data control, which is high in terms of resource utilization in a centralized DBMS, can be prohibitive in a distributed environment. The rules must be stored in a catalog (directory)  efficient management in a distributed environment is important

View Management In a rel. DB system a view is a virtual relation, defined as the result of a query on base relations (or real relations), but not materialized like a base relation, which is stored in the DB. An external schema can be defined as a set of views and/or base relations. Besides their use in external schemas, views are useful for ensuring data security in a simple way  if users may only access the database through views, they cannot see or manipulate the hidden data, which are therefore secure. In a distributed DBMS, a view can be derived from distributed relations, and the access to a view requires the execution of the distributed query corresponding to the view definition. An important issue in a distributed DBMS is to make view materialization efficient.

Views in Centralized DBMSs Example 1 The view of system analysts (SYSAN) derived from relation EMP (ENO,ENAME,TITLE), can be defined by the following SQL query: CREATE VIEW SYSAN(ENO, ENAME) AS SELECT ENO, ENAME FROM EMP WHERE TITLE = ``Syst. Anal.´´ The single effect is the storage of the view defintion in the catalog. No other information needs not be recorded. Therefore, the result to the query defining the view (i.e., a relation having the attributes ENO and ENAME for the system analysts as shown in the figure below) is not produced. However, the view SYSAN can be manipulated as a base relation. ENO ENAME E2 M.Smith E5 B. Casey E8 J.Jones

Views in Centralized DBMSs (cont.) Example 2 The query „Find the names of all the system analysts with their project number and responsibility(ies)“ involving the view SYSAN and relation ASG(ENO,PNO,RESP,DUR) can be expressed as SELECT ENAME, PNO, RESP FROM SYSAN, ASG WHERE SYSAN.ENO = ASG.ENO Mapping a query expresed on views into a query expressed on base relations can be done by query modification at compile time. The preceding can be modified to FROM EMP, ASG WHERE EMP.ENO = ASG.ENO AND TITLE = ``Syst. Anal.´´ ENAME PNO RESP M.Smith P1 Analyst M.Smith P2 Analyst B.Casey P3 Manager J.Jones P4 Manager

Views in Distributed DBMSs The definition of a view is similar as in centralized systems – distinction, a view may be derived from fragmented relations stored at different sites. View definitions can be centralized at one site, partially duplicate, or fully duplicated. Processing a query expressed on views: distributed query processor. Views derived from distributed relations may be costly to evaluate. Since in a given organization it is likely that many users access the same views, some proposals have been made to optimize view derivation. An alternative solution is to avoid view derivation by maintaining actual versions of the views – snapshots. A snapshot represents a particular state of the database and is therefore static, meaning that it does not reflect updates to base relations.

Data Security Data security includes 2 aspects: Data protection. It is required to prevent unauthorized users from understanding the physical content of data. Authorization control. It must guarantee that only authorized users perform operations they are allowed to perform on DB. Checking whether a given triple (user, operation, object) can be allowed to proceed. The introduction of a user: (user name, password). The user name uniquely identifies the users of that name in the system, while the password, known only to the users of that name, authenticates the users. Distributed authorization control Additional complexity: objects and subjects are distributed remote user authentication management of distributed authorization rules handling of views and of user groups (DB administration is simplified)

Semantic Integrity Control How to guarantee database consistency? A DB state is said to be consistent if the DB satisfies a set of constraints, called semantic integrity constraints. Mantaining a consistent DB requires various mechanisms: concurrency control, reliability, protection, and semantic integrity control (SIC). SIC ensures DB consistency by rejecting update programs which lead to inconsistent DB states, or by activating specific actions on the DB state, which compensate for the effects of the update programs. SI constraints are rules that represent the knowledge about the properties of an application. Structural constraints (e.g. unique key constraints in the rel. model) Behavioral constraints regulate the application behavior.

Centralized Semantic Integrity Control The language for expression and manipulating integrity assertions. Examples: Employee number in relation EMP cannot be null: ENO NOT NULL in EMP The pair (ENO,PNO) is the unique key in relation ASG: (ENO,PNO) UNIQUE IN ASG Enforcement mechanism that performs specific actions to enforce DB integrity and updates. Examples: The query for increasing the budget of the CAD/CAM project by 10%, which would be specified as UPDATE PROJ SET BUDGET = BUDGET * 1.1 WHERE PNAME = ``CAD/CAM´´ will be transformed into the following query in order to enforce the domain constraint: CHECK ON PROJ (BUDGET >= 500000 AND BUDGET <= 1000000) WHERE PNAME = ``CAD/CAM´´ AND NEW.BUDGET >= 500000 AND NEW.BUDGET <= 1000000)

Distributed Semantic Integrity Control Individual assertions. They refer only to tuples to be updated independently of the rest of the database. Example: CHECK ON PROJ (BUDGET >= 500000 AND BUDGET <= 1000000). The assertion definition is sent to all other sites that contain fragments of the relation involved in the assertion. The assertion must be compatible with the relation data at each site. Compatibility can be checked at two levels: predicate and data. First, predicate compatibility is verified by comparing the assertion predicate with the fragment predicate. An asertion C is not compatible with a fragment predicate p if „C is true“ implies that „p is false,“ and is compatible with p otherwise. If noncompatibility is found at one of the sites, the assertion definition is globally rejected because tuples of that fragment do not satisfy the integration constraints. Second, if predicate compatibility has been found, the assertion is tested against the instance of the fragment. If it is not satisfied by that instance, the assertion is also globally rejected. If compatibility is found, the assertion is stored at each site.

Distributed Individual Assertions: Example Consider relation EMP, horizontally fragmented across three sites using the predicates p1 : 0 <= ENO < „E3“ p2: „E3“ <= ENO <= „E6“ p3: ENO > „E6“ and the domain assertion C: ENO < „E4“. Assertion is compatible with p1 (if C is true, p1 is true) and p2 (if C is true, p2 is not necessarily false), but is not with p3 (if C is true, then p3 is false). Therefore, assertion C should be globally rejected because the tuples at site 3 cannot satisfy C, and thus relation EMP does not satisfy C.

Set-Oriented Assertions They include single-relation multivariable constraints such as functional dependency (Example 1) and multirelation multi-variable constraints such as foreign key constraints (Example 2) Example 1: The employee number functionally determines the employee name. ENO IN EMP DETERMINES ENAME Example 2: The project number PNO in relation ASG is a foreign key matching the primary key PNO of relation PROJ. In other words, a project referred to in relation ASG must exist in relation PROJ. PNO IN ASG REFERENCES PNO IN PROJ

Assertions Involving Aggregates Example: The total duration for all employees in the CAD/CAM project is less than 100. CHECK ON g:ASG, j:PROJ (SUM(g.DUR WHERE g.PNO=j.PNO) < 100 IF j.PNAME=``CAD/CAM´´) These assertions are among the most costly to test because they require the calculations of the aggregate functions.