Basis for Distributed Database Technology


Basis for Distributed Database Technology
Database System Technology (DST): provides controlled access to structured data; aims towards centralized (single-site) computing.
Computer Networking Technology (CNT): facilitates distributed computing; goes against centralized computing.
Distributed Database Technology = DST + CNT: aims to achieve integration without centralization.

What is distributed?
Processing logic
Function
Data
Control
All of the above modes of distribution are necessary and important for distributed database technology.

Distributed database system
A distributed database is a collection of multiple, logically interrelated databases distributed over a computer network.
A distributed database management system (DDBMS) is a software system that permits the management of the distributed database and makes the distribution transparent to the users.

What is not a DDBMS?
A DDBMS is not a “collection of files” that can be stored at each node of a computer network.
A multiprocessor-based DBMS (parallel database system) is not a DDBMS.
A DDBMS is not a system wherein data resides at only one node.

Aims of Distributed DBMS - Transparent Management of Distributed & Replicated Data
Transparency refers to the separation of the higher-level semantics of a system from lower-level implementation details.
It extends from data independence in a centralized DBMS to fragmentation transparency in a DDBMS.
Who should provide transparency? The DDBMS!

Aims of Distributed DBMS - Reliability through Distributed Transactions
A distributed DBMS can use replicated components to eliminate single points of failure.
Users can still access part of the distributed database, with “proper care”, even though some of the data is unreachable.
Distributed transactions help maintain a consistent database state even when failures occur.

Aims of Distributed DBMS - Improved Performance
Since each site handles only a portion of the database, contention for CPU and I/O resources is less severe.
Data localization reduces communication overhead.
The inherent parallelism of distributed systems may be exploited for inter-query and intra-query parallelism.
Performance models are not sufficiently developed.

Aims of Distributed DBMS - Easier System Expansion
Ability to add new sites, data, and users over time without major restructuring.
Huge centralized database systems (mainframes) are history (almost!).
The PC revolution (Compaq buying Digital, 1998) will create natural distributed processing environments.
New applications (such as supply chain) are naturally distributed; centralized systems will just not work.

Complicating Factors
Data may be replicated in a distributed environment. Therefore, the DDBMS is responsible for (i) choosing one of the stored copies of the requested data, and (ii) making sure that the effect of an update is reflected on each and every copy of that data item.
Maintaining consistency of distributed/replicated data.
Since each site cannot have instantaneous information on the actions currently carried out at other sites, synchronizing transactions at multiple sites is harder than in a centralized system.
Also: complexity, cost, distribution of control, security, ...
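The two replication duties named above, (i) choosing one stored copy for a read and (ii) reflecting an update on every copy, can be illustrated with a minimal sketch. The class name `ReplicatedItem`, the site names, and the "read one, write all" policy are assumptions made for illustration; the slides do not prescribe a particular replica-control protocol, and a real DDBMS would also have to handle failures and concurrent updates.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicatedItem:
    """One logical data item stored as copies at several sites (hypothetical model)."""
    copies: dict = field(default_factory=dict)  # site name -> stored value

    def read(self, preferred_site=None):
        # (i) Choose one of the stored copies: the preferred site if it
        # holds a copy, otherwise the first site in name order.
        if preferred_site in self.copies:
            return self.copies[preferred_site]
        return self.copies[sorted(self.copies)[0]]

    def write(self, value):
        # (ii) Reflect the update on each and every copy (write-all).
        for site in self.copies:
            self.copies[site] = value


item = ReplicatedItem({"site1": 10, "site2": 10, "site3": 10})
item.write(42)
print(item.read("site2"), item.copies)
```

Write-all keeps every copy identical at the cost of touching every site on each update, which is one reason replication is listed here as a complicating factor.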

Problem Areas
Distributed Database Design
Distributed Query Processing
Distributed Directory Management
Distributed Concurrency Control
Distributed Deadlock Management
Reliability of Distributed Databases
Operating Systems Support
Heterogeneous Databases

Relationship among Problems
(Diagram relating Directory Management, Query Processing, Distributed DB Design, Reliability, Concurrency Control, and Deadlock Management.)

Transparency and Architecture Issues in DDBMSs

Top-Down DDBMS Architecture - Classical
(Diagram.) Site-independent schemas, from top to bottom: Global Schema, Fragmentation Schema, Allocation Schema.
Below them, at each site (Site 1, Site 2, and other sites): a Local Mapping Schema, the site's DBMS, and its Local Database.

Top-Down DDBMS Architecture - Classical
Global Schema: a set of global relations, defined as if the database were not distributed at all.
Fragmentation Schema: each global relation is split into “non-overlapping” (logical) fragments; a 1:n mapping from relation R to fragments Ri.
Allocation Schema: a 1:1 or 1:n (redundant) mapping from fragments to sites. All fragments of the same relation R located at site j constitute the physical image Rj. A copy of fragment Ri at site j is denoted by Rji.
Local Mapping Schema: a mapping from physical images to physical objects, which are manipulated by the local DBMSs.
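As a rough illustration of how the fragmentation and allocation schemas fit together, the sketch below models each as a plain mapping and derives the physical image of a relation at a site. The relation `EMP`, the fragment names, and the site names are invented for this example and do not come from the slides.

```python
# Fragmentation schema: global relation -> its fragments (a 1:n mapping).
fragmentation_schema = {"EMP": ["EMP_1", "EMP_2", "EMP_3"]}

# Allocation schema: fragment -> sites holding a copy
# (1:1 when stored once, 1:n when stored redundantly).
allocation_schema = {
    "EMP_1": ["site1"],
    "EMP_2": ["site2"],
    "EMP_3": ["site2", "site3"],
}


def physical_image(relation, site):
    """Fragments of `relation` that are stored at `site`."""
    return [
        fragment
        for fragment in fragmentation_schema[relation]
        if site in allocation_schema[fragment]
    ]


def is_redundant(fragment):
    """True when the fragment is allocated to more than one site."""
    return len(allocation_schema[fragment]) > 1


print(physical_image("EMP", "site2"))
print(is_redundant("EMP_3"))
```

Keeping the two mappings separate is what lets redundancy be controlled explicitly: replicating a fragment only changes the allocation schema, not the fragmentation schema.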

Global Relations, Fragments and Physical Images
(Diagram: a global relation split into fragments, which are stored as physical images at Site 1, Site 2, and Site 3.)
Separates the concepts of fragmentation and allocation.
Explicit control of redundancy.
Independence from local databases.
Allows for: Fragmentation Transparency, Location Transparency, Local Mapping Transparency.

Rules for Data Fragmentation
Completeness: All the data of the global relation must be mapped into fragments.
Reconstruction: It must always be possible to reconstruct each global relation from its fragments.
Disjointness: It is convenient if the fragments are disjoint, so that the replication of data can be controlled explicitly.
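The three rules can be checked mechanically for horizontal fragments. The sketch below assumes a relation is modelled as a set of hashable tuples and that reconstruction is by union; the function name `check_horizontal_fragmentation` and the sample data are illustrative, not taken from the slides.

```python
def check_horizontal_fragmentation(relation, fragments):
    """Check the three fragmentation rules for horizontal fragments.

    `relation` is an iterable of hashable tuples and `fragments` is a list
    of such iterables. Reconstruction is assumed to be by union.
    """
    relation = set(relation)
    fragment_sets = [set(fragment) for fragment in fragments]
    union = set().union(*fragment_sets)
    return {
        # Every tuple of the global relation appears in some fragment.
        "complete": relation <= union,
        # The union of the fragments gives back exactly the relation.
        "reconstructible": union == relation,
        # No tuple appears in more than one fragment.
        "disjoint": sum(len(f) for f in fragment_sets) == len(union),
    }


emp = {(1, "sales"), (2, "hr"), (3, "sales")}
good = [{(1, "sales"), (3, "sales")}, {(2, "hr")}]
overlapping = [{(1, "sales"), (3, "sales")}, {(2, "hr"), (3, "sales")}]

print(check_horizontal_fragmentation(emp, good))
print(check_horizontal_fragmentation(emp, overlapping))
```

For vertical fragments the same idea applies with attributes in place of tuples, except that the key attribute is normally repeated in every fragment, so strict disjointness is relaxed for the key.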

Types of Data Fragmentation
Vertical Fragmentation: projection on the relation (subset of attributes); reconstruction by join; updates require no tuple migration.
Horizontal Fragmentation: selection on the relation (subset of tuples); reconstruction by union; updates may require tuple migration.
Mixed Fragmentation: a fragment is a Select-Project query on the relation.
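A small worked example may make the two basic fragmentation types concrete. The relation `EMP`, its attributes, and the helper functions below are assumptions made for illustration; rows are modelled as dictionaries, and each vertical fragment is assumed to keep the key attribute `id` so that the join can reconstruct the relation.

```python
EMP = [
    {"id": 1, "name": "Ann", "dept": "sales", "salary": 50},
    {"id": 2, "name": "Bob", "dept": "hr", "salary": 40},
    {"id": 3, "name": "Cy", "dept": "sales", "salary": 60},
]


def select(rows, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def project(rows, attrs):
    """Projection: keep only the named attributes of each row."""
    return [{attr: row[attr] for attr in attrs} for row in rows]


def join_on_key(left, right, key):
    """Join two row lists on a shared key attribute."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


# Horizontal fragmentation: selection, reconstructed by union.
sales = select(EMP, lambda row: row["dept"] == "sales")
others = select(EMP, lambda row: row["dept"] != "sales")
rebuilt_horizontal = sorted(sales + others, key=lambda row: row["id"])

# Vertical fragmentation: projection, reconstructed by join on the key.
public = project(EMP, ["id", "name", "dept"])
payroll = project(EMP, ["id", "salary"])
rebuilt_vertical = join_on_key(public, payroll, "id")

print(rebuilt_horizontal == EMP, rebuilt_vertical == EMP)
```

In this setup, changing an employee's `dept` from `sales` to `hr` would move the row from one horizontal fragment to the other, which is the tuple migration mentioned above; the vertical fragments are unaffected because every row stays in both.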

Levels of Distribution Transparency
Fragmentation Transparency: just like using global relations.
Location Transparency: need to know the fragmentation schema, but need not know where fragments are located. Applications access fragments (no need to specify the sites where fragments are located).
Local Mapping Transparency: need to know both the fragmentation and allocation schemas; no need to know what the underlying local DBMSs are. Applications access fragments, explicitly specifying where the fragments are located.
No Transparency: need to know the local DBMS query languages, and write applications using functionality provided by the local DBMS.
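One way to read these levels is by what the application has to name in a request. The sketch below is a simplified illustration under invented data: the relation `EMP`, the fragment and site names, and the three access functions are hypothetical, each fragment is stored at exactly one site, and the "no transparency" level (local query languages) is not modelled.

```python
# Hypothetical schemas and stored data for a relation split into two fragments.
fragmentation = {"EMP": ["EMP_1", "EMP_2"]}
allocation = {"EMP_1": "site1", "EMP_2": "site2"}
stored = {
    ("site1", "EMP_1"): [(1, "Ann"), (3, "Cy")],
    ("site2", "EMP_2"): [(2, "Bob")],
}


def read_fragment_at(site, fragment):
    """Local mapping transparency: the caller names the fragment AND the site."""
    return stored[(site, fragment)]


def read_fragment(fragment):
    """Location transparency: the caller names the fragment; the site is looked up."""
    return read_fragment_at(allocation[fragment], fragment)


def read_relation(relation):
    """Fragmentation transparency: the caller names only the global relation."""
    rows = []
    for fragment in fragmentation[relation]:
        rows.extend(read_fragment(fragment))
    return sorted(rows)


print(read_fragment_at("site2", "EMP_2"))
print(read_fragment("EMP_1"))
print(read_relation("EMP"))
```

Each function hides one more schema from the caller than the one before it, which mirrors how higher transparency shifts work from the application developer to the DDBMS.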

Why is support for transparency difficult?
There are tough problems in query optimization and transaction management that need to be tackled (in terms of system support and implementation) before fragmentation transparency can be supported.
The less distribution transparency is provided, the more the end-application developer needs to know about fragmentation and allocation schemes, and about how to maintain database consistency.
Higher levels of distribution transparency require appropriate DDBMS support, but make the end-application developer's work easier.

Some Aspects of Top-Down Architecture
Distributed database technology is an “add-on” technology: most users already have populated centralized DBMSs, whereas top-down design assumes implementation of a new DDBMS from scratch.
In the case of OODBMSs, top-down architecture makes sense because most OODBMSs are going to be built from scratch.
In many application environments, such as semi-structured databases and continuous multimedia data, the notion of a fragment is difficult to define.
Current relational DBMS products provide some form of location transparency (for example, by using nicknames).

Bottom-Up Architecture - Present & Future
Possible ways in which multiple databases may be put together for sharing by multiple DBMSs. The DBMSs are characterized according to:
Autonomy: the degree to which individual DBMSs can operate independently. Tightly coupled, integrated (A0); semiautonomous, federated (A1); total isolation, multidatabase systems (A2).
Distribution: no distribution, single site (D0); client-server, distribution of DBMS functionality (D1); full distribution, peer-to-peer distributed architecture (D2).
Heterogeneity: homogeneous (H0) or heterogeneous (H1).

Distributed DBMS Implementation Alternatives
(Diagram: three axes, Distribution, Autonomy, and Heterogeneity, with example points (A0,D2,H0) and (A2,D2,H1).)

Architectural Alternatives
(A0,D0,H0): multiple DBMSs that are logically integrated at a single site; composite systems.
(A0,D0,H1): multiple database managers that are heterogeneous but provide an integrated view to the user.
(A0,D1,H0): client-server based DBMS.
(A0,D2,H0): classical distributed database system architecture.
(A1,D0,H0): single-site, homogeneous, federated database systems; not realistic.
(A1,D0,H1): heterogeneous federated DBMS, having a common interface over disparate, cooperating, specialized database systems.

Architectural Alternatives
(A1,D1,H1): heterogeneous federated database systems with components of the systems placed at different sites.
(A2,D0,H0): homogeneous multidatabase systems at a single site.
(A2,D0,H1): heterogeneous multidatabase systems at a single site.
(A2,D1,H1) & (A2,D2,H1): distributed heterogeneous multidatabase systems. In client-server environments this creates a three-layer architecture. Interoperability is the major issue.
Autonomy, distribution, and heterogeneity are orthogonal issues.
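Because the three dimensions are orthogonal, the design space is simply the cross product of their values. The sketch below enumerates it; the dictionary names and the `describe` helper are illustrative, and the labels paraphrase the definitions given on the slides.

```python
from itertools import product

AUTONOMY = {
    "A0": "tightly coupled (integrated)",
    "A1": "semiautonomous (federated)",
    "A2": "total isolation (multidatabase)",
}
DISTRIBUTION = {
    "D0": "single site",
    "D1": "client-server",
    "D2": "peer-to-peer",
}
HETEROGENEITY = {
    "H0": "homogeneous",
    "H1": "heterogeneous",
}


def describe(autonomy, distribution, heterogeneity):
    """Spell out one point of the (autonomy, distribution, heterogeneity) space."""
    return "; ".join(
        [
            AUTONOMY[autonomy],
            DISTRIBUTION[distribution],
            HETEROGENEITY[heterogeneity],
        ]
    )


# Orthogonal dimensions: every combination is a distinct point (3 * 3 * 2 = 18).
ALL_POINTS = list(product(AUTONOMY, DISTRIBUTION, HETEROGENEITY))

print(len(ALL_POINTS))
print(describe("A0", "D2", "H0"))  # the classical distributed DBMS
print(describe("A2", "D2", "H1"))  # distributed heterogeneous multidatabase
```

Not every one of the 18 points is equally practical; the slides single out (A1,D0,H0), for instance, as not realistic.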

Client/Server Database Systems
Distinguish and divide the functionality to be provided into two classes: server functions and client functions. That is, a two-level architecture.
Made popular by relational DBMS implementations.
DBMS client: user interface, application, consistency checking of queries, and caching and managing locks on cached data.
DBMS server: handles query optimization, data access, and transaction management.
Typical scenarios: multiple clients/single server; multiple clients/multiple servers (dedicated home server or any server).

Client/Server Reference Architecture
(Diagram.) Client, on its operating system: User Interface, Application Program, Client DBMS, Communication software.
Between client and server: SQL queries go to the server; the result relation comes back.
Server, on its operating system: Communication software, Semantic Data Controller, Query Optimizer, Transaction Manager, Recovery Manager, Runtime Support Processor, Database.

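Distributed Database Reference Architecture
(Diagram.) External schemas ES1, ES2, ..., ESn are defined over the global conceptual schema (GCS). The GCS maps to the local conceptual schemas LCS1, LCS2, ..., LCSn, each of which maps to a local internal schema LIS1, LIS2, ..., LISn.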

Components of Distributed DBMS
(Diagram.) User requests enter and system responses leave through the User Processor.
User Processor: User Interface Handler (uses the External Schema), Semantic Data Controller (uses the Global Conceptual Schema), Global Query Optimizer and Global Execution Monitor (use the GD/D).
Data Processor: Local Query Processor (uses the Local Conceptual Schema), Local Recovery Manager (uses the System Log), Runtime Support Processor (uses the Local Internal Schema), over the Database.

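MDBS Architecture with Global Schema
(Diagram.) Global external schemas GES1, GES2, GES3 are defined over the global conceptual schema (GCS). Each local database i has its own local external schemas (LES11, LES12, LES13, ..., LESn1, LESn2, LESn3), a local conceptual schema (LCS1, ..., LCSn), and a local internal schema (LIS1, ..., LISn).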

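MDBS Architecture without Global Schema
(Diagram.) Multidatabase layer: external schemas ES1, ES2, ..., ESn.
Local database system layer: local conceptual schemas LCS1, LCS2, ..., LCSn over local internal schemas LIS1, LIS2, ..., LISn.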

Components of MDBS
(Diagram.) User requests enter and system responses leave through the Multi-DBMS Layer.
Below it, each component DBMS has its own Query Processor, Transaction Manager, Scheduler, Recovery Manager, Runtime Support Processor, and Database.

Global Directory Issues
The directory is itself a database that contains metadata about the actual data stored in the database.
It includes the support for fragmentation transparency in the classical DDBMS architecture.
The directory can be local or distributed.
The directory can be replicated and/or partitioned.
Directory issues are very important for large multidatabase applications, such as digital libraries.

Impact of New Technologies
Internet and WWW: semi-structured data, multimedia data; keyword-based search (browsing versus querying). What does integration mean?
Applied technologies: workflow systems; data warehousing and data mining. What is the role of distributed database technology?

Research Issues - DDBMS Technology
Evaluation of state-of-the-art data replication strategies.
On-line distributed relational database redesign.
Distributed object-oriented database systems: design (fragmentation, allocation), query processing (method execution, transformation), transaction processing.
WWW and Internet: transparency issues, implementation strategies (architecture, scalability), on-line transaction processing, on-line analytical processing (data warehousing, data mining), query processing (STRUDEL, WebSQL), commit protocols.

Research Issues - Applications
Workflow systems: high-throughput workflows (supply chain, Amazon, ...) that are short, sweet, and robust, versus ad hoc problem solving (office automation).
Electronic commerce: reliable, high-throughput distributed transactions.
Distributed multimedia: QoS, real-time delivery, design and data allocation, MPEG-4 aspects.