Distribution of Data and Processes The cost of acquiring a distributed system is now often within the budget of an individual department or plant, so there is no need for massive corporate mainframe budgets. However, the need for close cooperation with other departments adopting distributed options is probably all the more necessary as departments gain autonomy: a degree of consistency and uniformity across processes and departments is still desirable. The modest cost and scale of distributed systems make it easy for a department to ignore its role as a member of a wider community.

This lack of integration can negate any benefits of moving to a distributed option. 'Corporate' distributed applications are claimed to achieve the largest 'leverage' and payback. The major issues in distributed database systems therefore centre not on the physical configuration of nodes and communications links, but on how processing is placed within the architecture, where data is stored, and how information flows.

The major organisational decisions are:
1. Organisation of corporate and private data
2. Separation of operational and decision support functions
3. Organisation of the data warehouse

Operational and Decision Support Processing Operational processing operates on current-value data. It is requirement-driven: it operates to a pre-defined requirement. It manipulates data on a detailed, record-by-record basis. It has access to archival data that has a high probability of access and is typically low in volume. Ownership of data is very important. It mainly serves the community making up-to-the-second decisions.

Decision Support Systems Processing Supports managerial roles. Operates on archival data not subject to updates (snapshots). Operates on integrated data (drawn from a variety of operational systems). Processing is essentially data-driven, i.e. queries can be ad hoc and driven by the results of previous queries; there is no specific requirement statement. Operates on data in a collective way, e.g. summaries and statistics. Used in making medium- to long-term decisions.

Private and Corporate Data There is a need for uniformity and consistency of data within a distributed environment. Within the network there is data of interest to individuals and other data of interest to the corporate whole. For example, one node may belong to the vehicle insurance actuary. No one is concerned with what the actuary does when analysing vehicle data, but when the actuary modifies insurance rates, coverages, lengths of policies, renewal terms and so on, the changes have to be made in a 'corporate' fashion so that everyone is aware of them.

At each node there is likely to be both corporate data and private data. Freely mixing corporate and private data in an unplanned manner will cause problems, and so will mixing operational data and decision support data. Mixing all of them together in an unplanned manner is a recipe for disaster.

Node Residency The bulk of the data that a node needs to process should be held locally. This aids the definition of the system of record. It also reduces the complexity of the client-server network. Performance is improved.

Process Based (Functional) Each node has the data pertinent to the processing it carries out, e.g. the accounting node, the marketing node, the actuary's node. This leads to much redundant data between nodes: unique processes, redundant data. It also leads to a great deal of integrity checking between nodes. However, there is a strong desire to align node residency with organisational arrangements.

Data Based Each node holds one type of data, e.g. customer information, accounts information, premium information. Data redundancy is minimised, but there is much overlap of processing between nodes, as different organisational functions want data integrated from the separate nodes: unique data, redundant processes. It is easier to produce the same software that will execute on many nodes than software that runs on one node and 'shuffles' transactions across the network to maintain data integrity (see process-based). So data-based node residency leads to simplicity in terms of processing. Unfortunately it does not correspond to the organisational structure of enterprises, so there is some loss of local autonomy and control, which is a driver for client-server architectures.

Geography Based For example, one node dealing with business in the South-West, one for the North-East, and so on. Although both data and processing are redundant, traffic between nodes is minimal. Data structures and programs are replicated across every node.

‘Pure client/server’ Data is not distributed: all data (and possibly the software) is held on a server node, and clients read data when required. A system of record needs to be defined if clients are allowed to update data.

System of Record The notion that, for all data that can be updated, at any moment in time there is one and only one node responsible for the update. If, at any instant, two or more parties have 'control' of the same data at the same time, there is a flaw in the system. There are many different ways in which the system of record can be implemented, hence the general term. The system of record needs to be clearly and accurately defined.

e.g. Insurance policy updates. A single node could be responsible for updating all policies; one node could update surnames A-M and another N-Z; one node could update all policies except those formally passed to a different node for a specific period and then returned; and so on. In all cases there is one 'owner' of the data at any moment in time.
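A minimal sketch, in Python, of how the second option might be expressed as a routing rule. The node names and the `owner_node` function are hypothetical, purely for illustration.

```python
# Hypothetical illustration: route every policy update to its single owner node.
# Ownership here is decided by the first letter of the policyholder's surname.

OWNERSHIP_RANGES = {
    ("A", "M"): "node_a_to_m",   # hypothetical node names
    ("N", "Z"): "node_n_to_z",
}

def owner_node(surname: str) -> str:
    """Return the one node entitled to update this policyholder's data."""
    initial = surname[:1].upper()
    for (low, high), node in OWNERSHIP_RANGES.items():
        if low <= initial <= high:
            return node
    raise ValueError(f"no owner defined for surname {surname!r}")

# Every update goes to exactly one owner, so at any moment in time only one
# node has 'control' of a given policy record.
print(owner_node("Smith"))   # -> node_n_to_z
```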

The most 'natural' form of node residency is process-based, mainly for reasons of control and node autonomy. Hence there is much redundant data, and hence a high risk of data inconsistency and loss of database integrity: a change to a particular set of data must be replicated at all nodes holding the same data.

How is the system of record to be established, and which node will trigger the replication? A node can be given 'ownership' of the data, and the copy at that node is always regarded as correct. The owner node triggers changes on the nodes holding copies. No other node is allowed to update that data, and local changes made by non-owner nodes are ignored. Non-owner nodes must request the data from the owner node if they are unsure of the correctness of their local copy, or whenever they need to be sure of using correct data.
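A sketch of this ownership rule, assuming a hypothetical in-memory `Node` class; in a real distributed DBMS the propagation would be carried by logged, committed transactions rather than direct method calls.

```python
# Hypothetical sketch: only the owner node may update a data item; the owner
# then pushes the change to every node holding a copy. Non-owner updates fail.

class Node:
    def __init__(self, name: str, is_owner: bool = False):
        self.name = name
        self.is_owner = is_owner
        self.data: dict = {}
        self.replicas: list = []          # nodes holding copies (owner only)

    def update(self, key: str, value) -> None:
        if not self.is_owner:
            # Local changes by non-owner nodes are rejected/ignored.
            raise PermissionError(f"{self.name} does not own this data")
        self.data[key] = value
        for replica in self.replicas:     # owner triggers changes on the copies
            replica.data[key] = value

    def read(self, key: str, owner: "Node"):
        # If unsure of the local copy, request the value from the owner node.
        return owner.data.get(key, self.data.get(key))

owner = Node("actuary_node", is_owner=True)
branch = Node("branch_node")
owner.replicas.append(branch)
owner.update("policy:123:rate", 0.042)
print(branch.read("policy:123:rate", owner))   # -> 0.042
```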

Ensuring the integrity of transactions in a distributed system can be very complex. This is normally dealt with using the two-phase commit protocol.

Two Phase Commit Required when a transaction changes data in more than one database server. A simple commit will not work, because some servers may succeed while others fail, breaching the principle of atomicity (all or nothing). First phase: all locations involved are sent a request asking them to prepare to commit. Second phase: can only start when all locations have responded positively to the first phase; a request is then sent to all locations to commit. The commit fails if any location responds that it cannot commit or fails the first phase.

One location takes control of the whole transaction to provide coordination. Bear in mind that a transaction may involve mixed operations, e.g. an insert on one system, a delete on another and an update on another. If the coordinator process receives a fail message from any location it must tell the other locations to roll back, or write log information so that failed nodes can eventually bring themselves up to date. So the actual protocol for a two-phase commit can be complex.
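As an illustration only, a minimal sketch of that coordinator logic in Python. The `Participant` class and its `prepare`/`commit`/`rollback` methods are hypothetical stand-ins for the real database servers; a production two-phase commit also needs persistent logging and timeout handling.

```python
# Hypothetical sketch of a two-phase commit coordinator.
# Each participant stands in for one database server involved in the transaction.

class Participant:
    def __init__(self, name: str, can_commit: bool = True):
        self.name = name
        self.can_commit = can_commit

    def prepare(self) -> bool:
        # Phase 1: do the work, hold locks, write a prepare record,
        # then vote yes/no. Simulated here by a flag.
        return self.can_commit

    def commit(self) -> None:
        print(f"{self.name}: committed")

    def rollback(self) -> None:
        print(f"{self.name}: rolled back")


def two_phase_commit(participants: list) -> bool:
    # Phase 1 (voting): ask every location to prepare to commit.
    votes = [p.prepare() for p in participants]

    if all(votes):
        # Phase 2: every location voted yes, so tell them all to commit.
        for p in participants:
            p.commit()
        return True

    # Any 'no' vote (or failure) aborts the whole transaction.
    for p in participants:
        p.rollback()
    return False


# One insert, one delete and one update on different servers, all or nothing:
ok = two_phase_commit([Participant("orders_db"),
                       Participant("stock_db"),
                       Participant("billing_db", can_commit=False)])
print("transaction committed" if ok else "transaction rolled back")
```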

3 Phase Commit Three-phase commit adds a 'pre-commit' phase. First phase (vote): all locations involved are sent a request asking them to prepare to commit. Second phase (pre-commit): all locations are told whether the nodes are going to commit or not. Third phase (global decision to commit): can only start when all locations have acknowledged the pre-commit phase.
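A similarly hedged sketch of the three phases in Python, again with a hypothetical `Participant` class standing in for the real servers; the timeouts and recovery rules that give three-phase commit its non-blocking behaviour are omitted.

```python
# Hypothetical sketch of a three-phase commit coordinator.

class Participant:
    def __init__(self, name: str, can_commit: bool = True):
        self.name = name
        self.can_commit = can_commit

    def prepare(self) -> bool:        # phase 1: vote yes/no on the commit
        return self.can_commit

    def pre_commit(self) -> bool:     # phase 2: acknowledge the pending commit
        return True

    def commit(self) -> None:         # phase 3: make the change permanent
        print(f"{self.name}: committed")

    def rollback(self) -> None:
        print(f"{self.name}: rolled back")


def three_phase_commit(participants: list) -> bool:
    # Phase 1 (vote): ask every location to prepare to commit.
    if not all(p.prepare() for p in participants):
        for p in participants:
            p.rollback()
        return False
    # Phase 2 (pre-commit): tell every location the decision will be commit
    # and collect their acknowledgements.
    if not all(p.pre_commit() for p in participants):
        for p in participants:
            p.rollback()
        return False
    # Phase 3 (global commit): starts only once every pre-commit is acknowledged.
    for p in participants:
        p.commit()
    return True

print(three_phase_commit([Participant("orders_db"), Participant("stock_db")]))
```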

DSS System of Record DSS processing is based on archival data, so it is important that archival data is recorded consistently; otherwise the criteria used to select data for archiving are likely to differ between nodes. Archive data is a 'snapshot' at a particular time, so all archive data is time-stamped. Data at all nodes must therefore be archived using the same selection criteria and within the same time-slot.

For instance, crime records: every crime, Monday to Friday only, night-time, day-time, major/minor, or what? The system of record in this context is the node responsible for archiving: the data warehouse. The warehouse may be a separate node from the clients doing operational or DSS processing, typically a high-performance, massive-storage, dedicated machine.

Fragmentation/Partitioning Horizontal: distribution of different rows of the same table to different sites. Suitable when different sites are carrying out the same functions on the data. Vertical: distribution of data based on columns. This may involve redundantly including the primary key at each site so that rows remain uniquely identifiable and the fragments can be recombined. Suitable when different sites are carrying out different functions.
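A small Python sketch contrasting the two, using an invented policies table; the column names and the split rules are illustrative only.

```python
# Hypothetical policies table, fragmented in two different ways for illustration.
policies = [
    {"policy_id": 1, "surname": "Adams", "region": "SW", "premium": 320.0},
    {"policy_id": 2, "surname": "Singh", "region": "NE", "premium": 410.0},
    {"policy_id": 3, "surname": "Baker", "region": "SW", "premium": 275.0},
]

# Horizontal fragmentation: whole rows go to different sites (here, by region).
horizontal = {
    "sw_site": [row for row in policies if row["region"] == "SW"],
    "ne_site": [row for row in policies if row["region"] == "NE"],
}

# Vertical fragmentation: different columns go to different sites, with the
# primary key repeated in every fragment so the rows can be recombined.
vertical = {
    "marketing_site": [{"policy_id": r["policy_id"], "surname": r["surname"]}
                       for r in policies],
    "accounts_site": [{"policy_id": r["policy_id"], "premium": r["premium"]}
                      for r in policies],
}

print(len(horizontal["sw_site"]), len(vertical["accounts_site"]))   # -> 2 3
```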

Integrity Checking The amount and complexity of the checking required depends on the degree and type of fragmentation, the replication, and the distribution policy. Where data is replicated, transactions must also be replicated successfully. Horizontal: constraints are more complex, e.g. key constraints (including referential integrity) must be checked against every fragment on each insert, unless the distribution is based on a partition of the key values. Vertical: fragments must contain the primary key attribute(s).
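A sketch of why horizontal fragmentation makes key checking more expensive: assuming some hypothetical in-memory fragments, a primary-key check on insert has to consult every fragment unless the fragmentation itself partitions the key values.

```python
# Hypothetical horizontal fragments of a policies table, one per site.
fragments = {
    "sw_site": [{"policy_id": 1, "surname": "Adams"}],
    "ne_site": [{"policy_id": 2, "surname": "Singh"}],
}

def insert_policy(site: str, row: dict) -> None:
    # The key constraint has to be checked against every fragment, because
    # the same policy_id could otherwise be inserted at two different sites.
    for frag_rows in fragments.values():
        if any(r["policy_id"] == row["policy_id"] for r in frag_rows):
            raise ValueError(f"duplicate key {row['policy_id']}")
    fragments[site].append(row)

# If the fragmentation were itself a partition of the key values (e.g. odd
# keys to one site, even keys to another), checking only the local fragment
# would be enough.
insert_policy("ne_site", {"policy_id": 3, "surname": "Chen"})
print(len(fragments["ne_site"]))   # -> 2
```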