1 Lecture 10: Distributed Databases – Replication and Fragmentation Advanced Databases CG096 Nick Rossiter.

Presentation transcript:

2 Overview
Last week:
  - Saw difficulty in handling logical relationships between distributed information
  - Potential solutions such as federated DDBMS
This week:
  - Look at an area where distributed databases are extensively used: replication
    - For backup
    - For improving reliability of service, such as for a mirror site

3 Strategies for Data Allocation 1
Centralised
  - Single database, users distributed across network
  - High communication costs
    - All data access by users is over the network
    - No local references
  - Low reliability and low availability
    - Failure of central site leads to no access to entire database system
  - Storage costs
    - No duplication, so minimal
  - Performance
    - Likely to be unsatisfactory

4 Strategies for Data Allocation 2
Fragmented
  - Database distributed by fragments (disjoint views)
  - Low communication costs
    - Fragments located near their main users (if good design)
  - Reliability and availability vary depending on failed site
    - Failure of one part loses fragments situated there
    - Other fragments continue to be available
  - Storage costs
    - No duplication, so minimal
  - Performance
    - Likely to be satisfactory: better than centralised, as there is less network traffic

5 Strategies for Data Allocation 3
Complete Replication
  - Database completely copied to each site
  - Communication costs: high for update, low for read
    - Need to propagate updates through system
  - High reliability and high availability
    - Can switch from failed site to another
  - High storage costs
    - Complete duplication
  - Performance
    - High for reads
    - Potentially poor for updates, with propagation of updates

6 Strategies for Data Allocation 4
Selective Replication
  - Fragments are selectively replicated
  - Communication costs: low (if good design)
  - Reliability and availability vary depending on failed site
    - Failure of one part loses fragments situated there
    - Other fragments continue to be available
  - Storage costs
    - Duplication of some fragments means that storage is not minimal, but it is less than with complete replication
  - Performance
    - Likely to be satisfactory: better than centralised, as there is less network traffic

7 Fragmentation -- Further Details
A fragment is a view on a table. Two main types:
  - Horizontal (classification by value)
    - subset of tuples
    - obtained by restrict operation (algebra) or WHERE clause (SQL)
  - Vertical (classification by property)
    - subset of columns
    - obtained by project operation (algebra) or SELECT clause (SQL)
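
The two fragment types on this slide can be sketched as views over one table. This is a minimal illustration using Python's built-in `sqlite3` module on a single in-memory database; the `staff` table, its columns and its rows are hypothetical, and no actual distribution across sites is modelled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staff (staff_no INTEGER PRIMARY KEY, name TEXT, branch TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?, ?)",
    [
        (1, "Ann", "Newcastle", 30000),
        (2, "Bob", "London", 35000),
        (3, "Cho", "Newcastle", 32000),
    ],
)

# Horizontal fragment: a subset of tuples, chosen by value (restrict / WHERE).
conn.execute(
    "CREATE VIEW staff_newcastle AS SELECT * FROM staff WHERE branch = 'Newcastle'"
)

# Vertical fragment: a subset of columns (project / SELECT list).
# The primary key is kept so the table can be rebuilt later.
conn.execute("CREATE VIEW staff_pay AS SELECT staff_no, salary FROM staff")

horizontal = conn.execute("SELECT * FROM staff_newcastle ORDER BY staff_no").fetchall()
vertical = conn.execute("SELECT * FROM staff_pay ORDER BY staff_no").fetchall()
```

In a real DDBMS each view would be materialised at the site whose users need it most; here both simply live in one database.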

8 Other Forms of Fragmentation
  - Mixed (classification by both value and property)
    - both horizontal and vertical fragmentation are used to obtain a single fragment
  - Derived (association)
    - an expression such as a join connects the fragments
  - None
    - The whole of a table appears without change in a view

9 Why fragment?
  - Most applications use only part of the data in a table
  - To minimise network traffic, do not send more data than is strictly necessary to any site
  - Data not required by an application is not visible to it, enhancing security

10 Factors against fragmentation
  - Performance may be affected adversely by the need for some applications to reconstruct fragments into larger units
  - Integrity is more difficult to control, with dependencies possibly scattered across fragments

11 Three rules for fragmentation R1
R1) Completeness
  - If a table T is decomposed into fragments, every value found in T must be found in at least one of the fragments
  - Otherwise data is lost
  - So there is no loss of data as a whole in fragmentation

12 Three rules for fragmentation R2
R2) Reconstruction
  - It must be possible to reconstruct T from the fragments using a relational operation (typically a natural join for vertical fragments, or a union for horizontal fragments)
  - Otherwise decomposition into fragments is lossy
  - Functional dependencies are preserved

13 Three rules for fragmentation R3
R3) Disjointness
  - A data item may not appear in more than one fragment unless it is a component of a primary key
  - Avoids duplication and potential inconsistency, although transactions should avoid the latter
  - Primary key duplication allows reconstructions to be made
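
The three rules can be checked mechanically for a small vertical fragmentation. The sketch below is only an illustration: the table `T`, its columns and the helper names (`project`, `complete`, `reconstruct`, `disjoint`) are hypothetical, and the checks cover just the simple case of two vertical fragments sharing a primary key.

```python
# Table T as a list of rows, with primary key "id" (hypothetical data).
T = [
    {"id": 1, "name": "Ann", "city": "Leeds"},
    {"id": 2, "name": "Bob", "city": "York"},
]

def project(rows, cols):
    """Vertical fragment: keep only the named columns of every row."""
    return [{c: r[c] for c in cols} for r in rows]

# Both fragments carry the primary key so that T can be rebuilt.
f1 = project(T, ["id", "name"])
f2 = project(T, ["id", "city"])

def complete(table, fragments):
    """R1: every (column, value) pair of the table appears in some fragment."""
    wanted = {(c, v) for r in table for c, v in r.items()}
    found = {(c, v) for f in fragments for r in f for c, v in r.items()}
    return wanted <= found

def reconstruct(a, b, key):
    """R2: natural join of two vertical fragments on the primary key."""
    by_key = {r[key]: r for r in b}
    return [{**r, **by_key[r[key]]} for r in a if r[key] in by_key]

def disjoint(a, b, key):
    """R3: the fragments share no column other than the primary key."""
    cols_a = set(a[0]) if a else set()
    cols_b = set(b[0]) if b else set()
    return (cols_a & cols_b) <= {key}

rules_hold = (
    complete(T, [f1, f2])
    and reconstruct(f1, f2, "id") == T
    and disjoint(f1, f2, "id")
)
```

Dropping `"id"` from `f2` would leave R1 and R3 satisfied but make the join in `reconstruct` impossible, which is why R3 exempts primary key components.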

14 Strategy for Designing a Partially Replicated Distributed Database 1
  - Design global database using standard methodology
  - Examine regional distribution of business
  - What data should be held by each part of business?
    - Some data is only used locally (not exported, as in Federated DDBMS)
    - Some data is mostly used locally

15 Strategy for Designing a Partially Replicated Distributed Database 2
Transactions give many clues as to ideal placement of fragments:
  - a transaction will perform slowly if it requires data from different sites, unless the network connecting them is very fast
  - a transaction performing much replication of updates will perform slowly if there is frequent contention for resources (locking)
  - frequently used transactions should be optimised; infrequently used ones can be ignored

16 Strategy for Designing a Partially Replicated Distributed Database 3
  - Decide which relations are not to be fragmented
    - These will normally be replicated everywhere, as they are easy to update and their integrity is easy to maintain
  - Fragment remaining relations to suit:
    - locality
    - transactions

17 Transparencies in DDBMS
Transparency hides details at lower levels (often implementation ones) from the user
Four main types:
  - Distribution
  - Transaction
  - Performance
  - DBMS

18 Distribution Transparency
The DDB is perceived by the user as a single, logical unit even though the data is:
  - distributed over several sites
  - fragmented in various ways

19 Significance of Full Distribution Transparency
  - User does not need to know anything about the distribution techniques
  - User addresses global schema in queries
  - User will, however, not understand why some queries take longer than others
  - Highest form of distribution transparency is termed fragmentation transparency

20 Reduced forms of distribution transparency
  - Location transparency
    - user needs to know about fragmentation but not about placements at sites
    - user does not need to know which replications exist
  - Local mapping transparency
    - the most limited transparency
    - user needs to know about fragmentation and sites

21 Transaction Transparency
  - Ensures that all transactions maintain the DDB’s integrity and consistency
  - Each transaction is divided into subtransactions
    - one subtransaction for each site
    - usually execute subtransactions in parallel
    - gains in efficiency
  - More complicated than in centralised system
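
The idea of one subtransaction per site, run in parallel, can be sketched with Python's `concurrent.futures`. Everything here is an assumption made for illustration: the `sites` dictionary stands in for remote databases, the account names are invented, and the all-or-nothing step is a toy substitute for a real distributed commit protocol.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical placement: each site holds one account balance.
sites = {
    "site_a": {"acc1": 100},
    "site_b": {"acc2": 50},
}

def run_subtransaction(site, updates):
    """Apply one site's share of the work to a staged copy, not yet installed."""
    staged = dict(sites[site])
    for key, delta in updates.items():
        staged[key] += delta
        if staged[key] < 0:
            raise ValueError(f"insufficient funds at {site}")
    return site, staged

def run_transaction(plan):
    """Run one subtransaction per site in parallel; install all results or none."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(run_subtransaction, s, u) for s, u in plan.items()]
        try:
            results = [f.result() for f in futures]
        except ValueError:
            return False  # one failed subtransaction aborts the whole transaction
    for site, staged in results:
        sites[site] = staged
    return True

# Move 30 from acc1 (site_a) to acc2 (site_b).
ok = run_transaction({"site_a": {"acc1": -30}, "site_b": {"acc2": +30}})
```

The staged copies are what keep the transaction atomic in this sketch: nothing is written back to `sites` until every subtransaction has succeeded.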

22 Forms of Transaction Transparency
Concurrency Transparency
  - all concurrent transactions (centralised and distributed) execute independently
  - DDBMS must ensure that:
    - each subtransaction is executed in the normal spirit of transactions (ACID)
    - the subtransactions as a whole, forming one transaction, are executed ACID-style
    - the mixture of subtransactions and whole transactions is executed ACID-style

23 Transactions -- problems with replication
Failure Transparency
  - Users are unaware of problems, such as the one below, encountered during transaction execution
  - If, say, 6 copies of a data item (at 6 sites) need to be updated:
    - there are problems if only 5 are currently reachable
    - need to delay COMMIT until all sites are processed
    - otherwise the data is inconsistent, unless delayed asynchronous update is allowed
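
The 6-copies example can be made concrete. The sketch below contrasts the two options named on the slide, under the assumption that each replica is a plain dictionary and that reachability is known in advance; the function names and the `pending` queue are hypothetical.

```python
def synchronous_update(replicas, reachable, key, value):
    """Commit only if every replica site is reachable; otherwise change nothing."""
    if set(replicas) - set(reachable):
        return False  # COMMIT has to be delayed
    for copy in replicas.values():
        copy[key] = value
    return True

def asynchronous_update(replicas, reachable, key, value, pending):
    """Update the reachable copies now and queue the change for the others."""
    for site, copy in replicas.items():
        if site in reachable:
            copy[key] = value
        else:
            pending.append((site, key, value))

# Six copies of data item x, one per site; site6 is currently unreachable.
replicas = {f"site{i}": {"x": 1} for i in range(1, 7)}
reachable = {f"site{i}" for i in range(1, 6)}

committed = synchronous_update(replicas, reachable, "x", 2)

pending = []
asynchronous_update(replicas, reachable, "x", 2, pending)
```

After the asynchronous call, five copies hold the new value and `site6` still holds the old one until the queued change is applied, which is exactly the temporary inconsistency the slide warns about.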

24 Performance transparency
Requires:
  - the DDBMS to determine the most cost-effective way to handle a request
    - which fragment to use
    - (if replicated) which copy of a fragment to use
    - which site to use
  - avoidance of any performance degradation compared with a centralised system
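
Choosing which copy of a replicated fragment to use can be pictured as a cost lookup. This is a deliberately simplified sketch: the site names and the `link_cost` table are invented, and a real optimiser would also weigh site load, fragment size and the rest of the query plan.

```python
def choose_copy(copies, user_site, link_cost):
    """Pick the replica with the lowest estimated access cost from the user's site."""
    return min(copies, key=lambda site: link_cost[(user_site, site)])

# Hypothetical network costs from the user's site to each site holding a copy.
link_cost = {
    ("york", "leeds"): 2,
    ("york", "london"): 9,
}
copies = ["leeds", "london"]

best = choose_copy(copies, "york", link_cost)
```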

25 DBMS transparency
  - Hides knowledge of which DBMS is being used
  - The most difficult transparency of all, particularly with heterogeneous models
  - See problems highlighted in lecture 9:
    - Global Schema Integration
    - Federated Databases
    - Multidatabase Languages

26 Replication Servers
  - Copying and maintenance of data on multiple servers
  - Replication -- the process of generating and reproducing multiple copies of data at one or more sites
  - Servers -- provide the file resources: the distributed database

27 Benefits of Replication
  - Increased reliability
  - Better data availability
  - Potential for better performance (with good design)
  - Warm stand-by
    - As in a mirror site, shadowing the actions of the main site and cutting in if the main site crashes

28 Timing of Replication
  - Synchronous
    - Immediate, according to some common signal such as time
    - Ideal, as it ensures immediate consistency
    - Assumes availability of all sites
  - Asynchronous
    - Independently, with delays ranging from a few seconds to several days
    - Immediate consistency is not achieved
    - More flexible, as at any one time not all sites need to be available

29 Types of data replicated
  - Across heterogeneous data models
    - Mapping required (hard)
  - Object replication
    - More varied than just base data
    - Also auxiliary structures such as indexes
    - Stored procedures and functions
  - Scalability
    - No volume restrictions

30 Replication administration
  - Subscription mechanism
    - Allows a permitted user to subscribe to replicated data/objects
  - Initialisation mechanism
    - Allows for the initialisation of a target replication

31 Ownership of Replicated Data 1
Master/Slave
  - Master site
    - Primary owner of replicated data
    - Sole right to change data
    - Publish and subscribe procedure
    - Asynchronous replication as slave sites receive copies of the data
  - Slave site
    - Receives read-only data from master site
    - Slaves can be used as mobile clients
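
The master/slave arrangement with publish and subscribe can be sketched in a few lines. The `Master` and `Slave` classes below are hypothetical; they only illustrate that the master alone changes data and that slaves hold read-only copies which lag until the next publish.

```python
from types import MappingProxyType

class Master:
    """Primary owner: the only site allowed to change the data."""

    def __init__(self):
        self._data = {}
        self._subscribers = []

    def subscribe(self, slave):
        self._subscribers.append(slave)
        slave.receive(self._data)  # initial copy for the new subscriber

    def update(self, key, value):
        self._data[key] = value

    def publish(self):
        """Asynchronous step: push the current state to every subscriber."""
        for slave in self._subscribers:
            slave.receive(self._data)

class Slave:
    """Holds a read-only copy received from the master."""

    def __init__(self):
        self.data = MappingProxyType({})

    def receive(self, snapshot):
        self.data = MappingProxyType(dict(snapshot))

master = Master()
slave = Slave()
master.subscribe(slave)

master.update("price", 10)
stale = dict(slave.data)   # the slave has not yet seen the update
master.publish()
fresh = dict(slave.data)   # now it has
```

Attempting `slave.data["price"] = 11` raises `TypeError`, since `MappingProxyType` exposes the copy as read-only.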

32 Ownership of Replicated Data 2
Workflow Ownership
  - Flexible master designation
  - Dynamic ownership model
  - Right to update data moves along the chain of command (replicating sites)
  - For example, as an order is processed, the master right moves to each department in turn
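
The order-processing example can be modelled as an update right that is handed along a chain. The `Order` class and the department names below are hypothetical, and replication of the order to the other sites is left out.

```python
class Order:
    """An order whose update right moves along a chain of departments."""

    def __init__(self, chain):
        self.chain = chain        # departments in processing order
        self.owner_index = 0
        self.fields = {}

    @property
    def owner(self):
        return self.chain[self.owner_index]

    def update(self, site, key, value):
        """Only the current owner may change the order."""
        if site != self.owner:
            raise PermissionError(f"{site} does not currently own the order")
        self.fields[key] = value

    def hand_over(self):
        """Pass the master right to the next department, if there is one."""
        if self.owner_index < len(self.chain) - 1:
            self.owner_index += 1

order = Order(["sales", "warehouse", "billing"])
order.update("sales", "item", "widget")
order.hand_over()
order.update("warehouse", "picked", True)
```

Once the right has moved on, a call such as `order.update("sales", ...)` raises `PermissionError`.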

33 Ownership of Replicated Data 3
Update-anywhere
  - Peer-to-peer model
  - Multiple sites can update data
  - Conflict resolution required
  - More complex implementation
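
The slide does not say how conflicts are resolved. One common policy, used here purely as an assumed example, is "last writer wins": every write carries a timestamp, and when two sites disagree the later write is kept, with the site name breaking ties. The `merge` function and the data are hypothetical.

```python
def merge(local, remote):
    """Merge two peers' stores, each mapping key -> (timestamp, site, value).

    The version with the larger (timestamp, site) pair wins.
    """
    merged = dict(local)
    for key, version in remote.items():
        if key not in merged or version[:2] > merged[key][:2]:
            merged[key] = version
    return merged

# Two peers updated "stock" independently; site B wrote later.
site_a = {"stock": (5, "A", 10)}
site_b = {"stock": (7, "B", 8), "price": (1, "B", 3)}

from_a = merge(site_a, site_b)
from_b = merge(site_b, site_a)
```

Both peers reach the same state whichever direction the merge runs, which is the property a conflict resolution rule needs. The cost is that site A's update to `stock` is silently discarded.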

34 Distribution and Replication in Oracle 9i
  - Materialised views
    - Formerly known as snapshots
  - Views are updated by a Refresh mechanism
    - Variable frequency to suit application
    - Fast – based on identified changes
    - Complete – replaces existing data
    - Force – tries Fast; if that is not possible, does Complete
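
The relationship between the three refresh modes can be sketched as follows. This is not Oracle's implementation or API: the view and base table are plain dictionaries, and the `change_log` of `(operation, key, value)` entries is a hypothetical stand-in for whatever record of identified changes is available.

```python
def complete_refresh(view, base):
    """Complete: throw away the view's contents and copy the base data again."""
    view.clear()
    view.update(base)

def fast_refresh(view, change_log):
    """Fast: apply only the identified changes; fails if none were recorded."""
    if change_log is None:
        raise RuntimeError("no change log available")
    for op, key, value in change_log:
        if op == "delete":
            view.pop(key, None)
        else:
            view[key] = value

def force_refresh(view, base, change_log):
    """Force: try Fast first and fall back to Complete if Fast is not possible."""
    try:
        fast_refresh(view, change_log)
        return "fast"
    except RuntimeError:
        complete_refresh(view, base)
        return "complete"

base = {1: "a", 2: "b2"}

view = {1: "a", 2: "b"}
mode_with_log = force_refresh(view, base, [("upsert", 2, "b2")])

other_view = {}
mode_without_log = force_refresh(other_view, base, None)
```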

35 Oracle 9i transparency
  - Does not support fragmentation transparency
  - Supports site (location) transparency

36 Summary of Distributed DBMS
  - An area under keen development, as it improves:
    - Availability of data
    - Overall reliability of system
    - Performance (with good design)
  - However, disadvantages remain:
    - Implementation can be complex (expensive)
    - Heterogeneity in models is poorly handled
  - Use for replicating data is the main application today