Distributed Database Systems Overview

Presented by Satrio Agung Wicaksono

Distributed Database Systems
- Built from two technologies: database systems and computer networks
- Database management moves control of data away from individual applications and into centralized systems with controlled access (DBMSs)

Distributed Database Systems, Cont’d…
- Computer network technology emphasizes distributed (non-central) control
- Database systems seem to emphasize centralization
- Network systems seem to emphasize distribution
- Databases, however, are not really about centralizing the management of data
- Database management systems really integrate data and supply a common access methodology to data

What is Distributed Processing?
- It means many things to many people:
  - Distributed function (a single program distributed over multiple processors)
  - Distributed computing (autonomous functions distributed over a network)
  - Networks (independent of function)
  - Multiprocessors (multiple CPUs in the same computer)
  - etc.
- In any computer, there is always some aspect of distributed processing (e.g., CPU and I/O functions)
- We need a better definition of distributed computing to better understand distributed database computing

What is a Distributed Database System?
- A collection of multiple, logically interrelated databases distributed over a computer network
- A Distributed Database Management System (DDBMS) is the software system that manages the distributed database and makes the distribution transparent to the user

Central Database on a Network
A DDBS is not a system where, despite the existence of a network, the database resides at only one node of the network. In this case, the problems of database management are no different from those encountered in a centralized database environment (client/server DBMSs, discussed shortly, relax this requirement to a certain extent). The database is centrally managed by one computer system (site 2), and all requests are routed to that site. The only additional consideration is transmission delay. Clearly, the existence of a computer network or a collection of “files” is not sufficient to form a distributed database system.

DDBS Environment

What a Distributed Database Architecture IS!
- A distributed database system consists of autonomous databases at distributed nodes

Advantages of Distributed Database Systems
Transparent Management of Distributed and Replicated Data
- Transparency hides many of the lower-level implementation issues
- Users and applications do not have to understand and manage the distribution
- The DDBMS appears as a single DBMS
- A single query to the DDBMS is correctly translated into potentially many queries on multiple DBMSs
- The effects of a single query on multiple databases are managed consistently and automatically:
  - The queries are semantically correct
  - The queries are executed in the right order
- Data replication is handled properly and automatically
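A minimal Python sketch of the transparency idea: one logical relation is horizontally fragmented across two sites, and the caller issues a single selection without knowing where any row lives. The sites are simulated as in-memory lists, and `SITES`, `local_select`, and `global_select` are hypothetical names, not part of any real DDBMS API; a real system would also translate the query for each site and handle replicas.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

# Hypothetical fragments of one logical EMPLOYEE relation, stored at two sites.
SITES: Dict[str, List[Row]] = {
    "site_jakarta": [
        {"id": 1, "name": "Ana", "city": "Jakarta"},
        {"id": 2, "name": "Budi", "city": "Jakarta"},
    ],
    "site_bandung": [
        {"id": 3, "name": "Citra", "city": "Bandung"},
    ],
}


def local_select(rows: List[Row], pred: Callable[[Row], bool]) -> List[Row]:
    """Subquery executed at a single site: a plain filter over its fragment."""
    return [r for r in rows if pred(r)]


def global_select(pred: Callable[[Row], bool]) -> List[Row]:
    """Send the same selection to every site and union the partial results.
    The caller never learns which site held which row."""
    result: List[Row] = []
    for site, rows in SITES.items():
        result.extend(local_select(rows, pred))
    return sorted(result, key=lambda r: r["id"])  # deterministic output order


print([r["name"] for r in global_select(lambda r: r["id"] >= 2)])
# ['Budi', 'Citra']
```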

Advantages of Distributed Database Systems (cont’d...)
Reliability Through Distributed Transactions
- Maintains database consistency across multiple transactions
- Multiple applications and users may execute sets of all-or-nothing queries
- Applications and users do not “step” on each other
- Each application appears to have the “complete attention” of the database
- Effects of other users’ transactions are not noticed
- This would be nearly impossible without a DDBMS

Advantages of Distributed Database Systems (cont’d...)
Improved Performance
- Algorithms are tuned for distribution
- Database design is tuned for distribution and usage patterns
- Efficient query plans are calculated from internal DDBMS statistics
- Efficient query algorithms and optimal database design are beyond the capability of most users
- Even users who have these capabilities do not have access to the internal DDBMS statistics needed to make effective design and query choices

Advantages of Distributed Database Systems (cont’d...)
Easier System Expansion
- The next database node fits into a pre-existing architecture
- Much of the database integration software is already in place
- The new database node is managed consistently within the context of the other database nodes
- Without a DDBMS, every application needing data at the new node would have to be modified and tuned to the specifics of that database:
  - Access patterns
  - Query semantics
  - Query integrity
  - Transaction management

Complications Introduced by DDBMS
- Essentially, this is what the course is about
- It is assumed that we know how various features are implemented in a single DBMS
- We need more tools to handle the distribution:
  - Data replication
    - For reliability and efficiency
    - Choose the site to retrieve from
    - An update modifies all copies
    - Failed sites need to be updated when they come back online
  - Synchronization of values at distributed sites
  - Software is inherently more complex
  - Distribution of control
  - Security
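The data-replication bullets (read from one chosen site, update all copies, bring failed sites up to date when they return) can be illustrated with a read-one/write-all-available scheme. This is a sketch under strong assumptions: failures are simulated with an `up` flag, each copy carries a version number, and a recovering site simply copies the newest version from a live replica. `Replica`, `read_one`, `write_all_available`, and `recover` are hypothetical names. The scheme is not safe under network partitioning, where two groups of sites could each accept writes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    site: str
    value: Optional[int] = None
    version: int = 0
    up: bool = True  # simulated failure flag


def read_one(replicas: List[Replica]) -> Optional[int]:
    """Read from one available copy (here simply the first live site)."""
    for r in replicas:
        if r.up:
            return r.value
    raise RuntimeError("no replica available")


def write_all_available(replicas: List[Replica], value: int) -> None:
    """Apply an update to every copy that is currently reachable."""
    live = [r for r in replicas if r.up]
    if not live:
        raise RuntimeError("no replica available")
    new_version = max(r.version for r in live) + 1
    for r in live:
        r.value, r.version = value, new_version


def recover(replicas: List[Replica], site: str) -> None:
    """Bring a failed site back and copy the newest value from a live replica."""
    target = next(r for r in replicas if r.site == site)
    live = [r for r in replicas if r.up and r is not target]
    if live:
        newest = max(live, key=lambda r: r.version)
        target.value, target.version = newest.value, newest.version
    target.up = True


copies = [Replica("A"), Replica("B"), Replica("C")]
write_all_available(copies, 10)
copies[2].up = False             # site C fails
write_all_available(copies, 20)  # C misses this update
recover(copies, "C")             # C catches up when it comes back
print([(r.site, r.value, r.version) for r in copies])
# [('A', 20, 2), ('B', 20, 2), ('C', 20, 2)]
```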

Design Issues
- Distributed Database Design
- Distributed Directory Management
- Distributed Query Processing
- Distributed Concurrency Control
- Distributed Deadlock Management
- Reliability of Distributed DBMS
- Replication
- Heterogeneous databases

Distributed Database Design
- The question being addressed is how the database and the applications that run against it should be placed across the sites
- There are two basic alternatives for placing data:
  - Partitioned (or non-replicated): the database is divided into a number of disjoint partitions, each of which is placed at a different site
  - Replicated: either fully replicated (also called fully duplicated), where the entire database is stored at each site, or partially replicated (partially duplicated), where each partition of the database is stored at more than one site, but not at all sites
- The two fundamental design issues are:
  - Fragmentation: the separation of the database into partitions called fragments
  - Distribution: the optimum distribution of fragments
- Research in this area mostly involves mathematical programming to minimize the combined cost of storing the database, processing transactions against it, and message communication among sites. The general problem is NP-hard, so the proposed solutions are based on heuristics.
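The two design steps can be sketched in Python: horizontal fragmentation by predicate, followed by a deliberately simple greedy allocation that places each fragment at the site that accesses it most often. The greedy rule is an illustrative heuristic chosen for this example, not the method from these slides; it ignores storage and update costs, which the full (NP-hard) formulation includes. All names and the access log are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


def fragment_horizontally(rows: List[Row],
                          predicates: Dict[str, Callable[[Row], bool]]) -> Dict[str, List[Row]]:
    """Fragmentation: split one relation into disjoint fragments, one per predicate.
    Rejects rows that match zero or several predicates (the fragments must be
    complete and disjoint)."""
    fragments: Dict[str, List[Row]] = {name: [] for name in predicates}
    for row in rows:
        matches = [name for name, pred in predicates.items() if pred(row)]
        if len(matches) != 1:
            raise ValueError(f"row {row} matches {len(matches)} fragments")
        fragments[matches[0]].append(row)
    return fragments


def allocate_greedy(access_log: List[Tuple[str, str]]) -> Dict[str, str]:
    """Distribution: place each fragment at the site that accesses it most often.
    access_log holds (site, fragment) pairs; ties go to the alphabetically first site."""
    counts: Dict[str, Counter] = {}
    for site, frag in access_log:
        counts.setdefault(frag, Counter())[site] += 1
    return {frag: min(c, key=lambda s: (-c[s], s)) for frag, c in counts.items()}


accounts = [{"id": 1, "branch": "JKT"}, {"id": 2, "branch": "BDG"}, {"id": 3, "branch": "JKT"}]
frags = fragment_horizontally(accounts, {
    "acc_jkt": lambda r: r["branch"] == "JKT",
    "acc_bdg": lambda r: r["branch"] == "BDG",
})
log = [("jakarta", "acc_jkt"), ("jakarta", "acc_jkt"),
       ("bandung", "acc_jkt"), ("bandung", "acc_bdg")]
print({name: len(rows) for name, rows in frags.items()})  # {'acc_jkt': 2, 'acc_bdg': 1}
print(allocate_greedy(log))  # {'acc_jkt': 'jakarta', 'acc_bdg': 'bandung'}
```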

Distributed Directory Management
- A directory contains information (such as descriptions and locations) about the data items in the database
- There are three levels of directories: conceptual, logical, and physical
- Directories are consulted for most database operations
- Problems related to directory management are similar in nature to the database placement problem described above
- There are many issues concerning whether to distribute or centralize the directories: a directory may be global to the entire DDBS or local to each site, it can be centralized at one site or distributed over several sites, and there can be a single copy or multiple copies
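A small Python sketch of two of the directory choices listed above: a single global directory centralized at one site, plus an optional local copy at each site that is filled on demand. The classes are hypothetical; the example only shows the trade-off (fewer remote lookups versus copies that must be kept consistent), not how any particular DDBMS stores its catalog.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class DirectoryEntry:
    schema: List[str]                              # description of the data item
    sites: Set[str] = field(default_factory=set)   # where copies are stored


class GlobalDirectory:
    """A single directory, centralized at one site, covering the whole DDBS."""

    def __init__(self) -> None:
        self._entries: Dict[str, DirectoryEntry] = {}

    def register(self, item: str, schema: List[str], site: str) -> None:
        self._entries.setdefault(item, DirectoryEntry(schema)).sites.add(site)

    def locate(self, item: str) -> Set[str]:
        return set(self._entries[item].sites)


class LocalDirectoryCache:
    """A per-site copy of directory entries, filled on first use.
    This sketch never invalidates entries, so a cached copy goes stale if data
    later moves; keeping copies consistent is the cost of this choice."""

    def __init__(self, global_dir: GlobalDirectory) -> None:
        self._global = global_dir
        self._cache: Dict[str, Set[str]] = {}
        self.remote_lookups = 0

    def locate(self, item: str) -> Set[str]:
        if item not in self._cache:
            self.remote_lookups += 1  # only a cache miss consults the central directory
            self._cache[item] = self._global.locate(item)
        return self._cache[item]


central = GlobalDirectory()
central.register("EMP_JKT", ["id", "name"], "site_jakarta")
central.register("EMP_JKT", ["id", "name"], "site_bandung")  # a second copy
cache = LocalDirectoryCache(central)
print(sorted(cache.locate("EMP_JKT")))  # ['site_bandung', 'site_jakarta']
cache.locate("EMP_JKT")                 # served locally this time
print(cache.remote_lookups)             # 1
```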

Distributed Query Processing
- Query processing deals with designing algorithms that analyze queries and convert them into a series of data manipulation operations
- The problem is how to decide on a strategy for executing each query over the network in the most cost-effective way, however cost is defined
- The factors to be considered are:
  - The distribution of data
  - Communication costs
  - Lack of sufficient locally available information
- The objective is to optimize where the inherent parallelism is used to improve the performance of executing the transaction, subject to the constraints above
- The problem is NP-hard in nature, and the approaches are usually heuristic
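To make “cost” concrete, the sketch below compares two strategies for joining R (at site 1) with S (at site 2) when the result is needed at site 2: ship all of R, or use a semijoin that first sends S’s join values to site 1. It assumes communication cost dominates and measures it as bytes sent, with hypothetical fixed tuple widths; a real optimizer would estimate these sizes from statistics instead of scanning the data, and would also account for local processing.

```python
from typing import Dict, List

Row = Dict[str, object]


def ship_whole(r: List[Row], width_r: int) -> int:
    """Strategy 1: send all of R to the site holding S and join there."""
    return len(r) * width_r


def semijoin(r: List[Row], s: List[Row], key: str, width_r: int, width_key: int) -> int:
    """Strategy 2: send S's distinct join values to R's site, keep only the
    matching R tuples, and send those back to S's site."""
    s_keys = {row[key] for row in s}
    reduced_r = [row for row in r if row[key] in s_keys]
    return len(s_keys) * width_key + len(reduced_r) * width_r


def choose_strategy(r: List[Row], s: List[Row], key: str, width_r: int, width_key: int) -> str:
    """Pick the strategy with the fewest bytes sent over the network."""
    costs = {
        "ship_whole": ship_whole(r, width_r),
        "semijoin": semijoin(r, s, key, width_r, width_key),
    }
    return min(costs, key=lambda name: costs[name])


emp = [{"emp_id": i, "dept_id": i % 100} for i in range(1000)]  # R, at site 1
few_depts = [{"dept_id": 1}, {"dept_id": 2}]                     # S, at site 2
all_depts = [{"dept_id": d} for d in range(100)]                 # a larger S

print(semijoin(emp, few_depts, "dept_id", 100, 4))         # 2008
print(choose_strategy(emp, few_depts, "dept_id", 100, 4))  # semijoin
print(choose_strategy(emp, all_depts, "dept_id", 100, 4))  # ship_whole
```

The second call shows why the choice depends on data distribution: when every R tuple has a match, the semijoin only adds the cost of shipping the join values.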

Distributed Concurrency Control
- Synchronization of access to distributed databases
- Maintain the integrity of the system:
  - Single distributed database
  - Multiple copies of the database
- Approaches: pessimistic concurrency control, optimistic concurrency control
- Example approaches: locking, timestamps

Concurrency control involves the synchronization of accesses to the distributed database, such that the integrity of the database is maintained. It is, without any doubt, one of the most extensively studied problems in the DDBS field. The concurrency control problem in a distributed context is somewhat different from that in a centralized framework. One not only has to worry about the integrity of a single database, but also about the consistency of multiple copies of the database. The condition that requires all the values of multiple copies of every data item to converge to the same value is called mutual consistency. The alternative solutions are too numerous to discuss here, so we examine them in detail in Chapter 11. Let us only mention that the two general classes are pessimistic, synchronizing the execution of user requests before the execution starts, and optimistic, executing the requests and then checking whether the execution has compromised the consistency of the database.
Two fundamental primitives that can be used with both approaches are locking, which is based on the mutual exclusion of accesses to data items, and timestamping, where transaction executions are ordered based on timestamps. There are variations of these schemes as well as hybrid algorithms that attempt to combine the two basic mechanisms.
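Of the two primitives, timestamping is the easier one to show in a few lines. The following is a minimal sketch of basic timestamp ordering for a single scheduler: a read is rejected if a younger transaction (larger timestamp) already wrote the item, and a write is rejected if a younger transaction already read or wrote it. The class names are hypothetical; restarting aborted transactions with new timestamps and generating globally unique timestamps across sites (for example, a local counter paired with a site identifier) are left out.

```python
from dataclasses import dataclass, field
from typing import Dict


class Abort(Exception):
    """Raised when an operation arrives too late for its transaction's timestamp."""


@dataclass
class Item:
    value: int = 0
    read_ts: int = 0   # largest timestamp that has read the item
    write_ts: int = 0  # timestamp of the last accepted write


@dataclass
class TimestampScheduler:
    items: Dict[str, Item] = field(default_factory=dict)

    def _item(self, name: str) -> Item:
        return self.items.setdefault(name, Item())

    def read(self, ts: int, name: str) -> int:
        item = self._item(name)
        if ts < item.write_ts:
            raise Abort(f"T{ts} reads {name} already overwritten by T{item.write_ts}")
        item.read_ts = max(item.read_ts, ts)
        return item.value

    def write(self, ts: int, name: str, value: int) -> None:
        item = self._item(name)
        if ts < item.read_ts or ts < item.write_ts:
            raise Abort(f"T{ts} writes {name} too late")
        item.value, item.write_ts = value, ts


sched = TimestampScheduler()
sched.write(1, "x", 10)    # T1 writes x
print(sched.read(3, "x"))  # 10; read_ts(x) becomes 3
rejected = False
try:
    sched.write(2, "x", 99)  # T2 is older than a reader of x, so it must abort
except Abort as e:
    rejected = True
    print("abort:", e)       # abort: T2 writes x too late
```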

Distributed Deadlock Management
- Similar to operating-system deadlock management
- Well-known solutions: prevention, avoidance, detection/recovery

The deadlock problem in DDBSs is similar in nature to that encountered in operating systems. The competition among users for access to a set of resources (data, in this case) can result in a deadlock if the synchronization mechanism is based on locking. The well-known alternatives of prevention, avoidance, and detection/recovery also apply to DDBSs.
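Of the three alternatives, detection/recovery is the easiest to sketch. Assuming a centralized detector that collects each site’s local wait-for graph, a deadlock that spans sites shows up only as a cycle in the merged global graph. The functions below are hypothetical names; choosing a victim to abort and dealing with false (“phantom”) deadlocks caused by message delays are not shown.

```python
from typing import Dict, List, Optional, Set

WaitFor = Dict[str, Set[str]]  # transaction -> transactions it waits for


def merge(local_graphs: List[WaitFor]) -> WaitFor:
    """Combine each site's local wait-for graph into one global graph."""
    global_graph: WaitFor = {}
    for g in local_graphs:
        for t, waits in g.items():
            global_graph.setdefault(t, set()).update(waits)
    return global_graph


def find_cycle(graph: WaitFor) -> Optional[List[str]]:
    """Depth-first search; returns one cycle of transactions, or None."""
    WHITE, GREY, BLACK = 0, 1, 2
    color: Dict[str, int] = {}
    stack: List[str] = []

    def visit(t: str) -> Optional[List[str]]:
        color[t] = GREY
        stack.append(t)
        for u in sorted(graph.get(t, ())):
            if color.get(u, WHITE) == GREY:  # back edge: u is on the current path
                return stack[stack.index(u):]
            if color.get(u, WHITE) == WHITE:
                cycle = visit(u)
                if cycle:
                    return cycle
        stack.pop()
        color[t] = BLACK
        return None

    for t in sorted(graph):
        if color.get(t, WHITE) == WHITE:
            cycle = visit(t)
            if cycle:
                return cycle
    return None


site1 = {"T1": {"T2"}}  # at site 1, T1 waits for a lock held by T2
site2 = {"T2": {"T1"}}  # at site 2, T2 waits for a lock held by T1
print(find_cycle(site1), find_cycle(site2))  # None None
print(find_cycle(merge([site1, site2])))     # ['T1', 'T2']
```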

Reliability of Distributed DBMS
- Failure recovery among multiple sites
- Make sure the other systems remain reliable and consistent
- We will explore the ARIES algorithm
- We will explore the Two-Phase Commit (2PC) algorithm

We mentioned earlier that one of the potential advantages of distributed systems is improved reliability and availability. This, however, is not a feature that comes automatically. It is important that mechanisms be provided to ensure the consistency of the database as well as to detect failures and recover from them. The implication for DDBSs is that when a failure occurs and various sites become either inoperable or inaccessible, the databases at the operational sites remain consistent and up to date. Furthermore, when the computer system or network recovers from the failure, the DDBS should be able to recover and bring the databases at the failed sites up to date. This may be especially difficult in the case of network partitioning, where the sites are divided into two or more groups with no communication among them.
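The basic two-phase commit decision rule can be sketched as follows, with message passing simulated by method calls. Phase 1 asks every participant to prepare and vote; phase 2 sends the global decision, which is commit only if all votes are yes, to the participants that voted yes (a participant that voted no has already aborted on its own). `Participant` and `two_phase_commit` are hypothetical names; coordinator logging, timeouts, and crash recovery, which are where most of the real protocol’s complexity lies, are omitted, and ARIES-style local recovery is not shown.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Participant:
    name: str
    will_vote_yes: bool = True  # simulated outcome of the local work
    log: List[str] = field(default_factory=list)

    def prepare(self) -> bool:
        # Phase 1: record the local outcome in the log, then vote.
        self.log.append("prepared" if self.will_vote_yes else "abort")
        return self.will_vote_yes

    def finish(self, decision: str) -> None:
        # Phase 2: record and apply the coordinator's global decision.
        self.log.append(decision)


def two_phase_commit(participants: List[Participant]) -> str:
    """Coordinator: commit only if every participant votes yes."""
    votes = [p.prepare() for p in participants]  # phase 1: voting
    decision = "commit" if all(votes) else "abort"
    for p, vote in zip(participants, votes):     # phase 2: decision
        if vote:  # a "no" voter has already aborted unilaterally
            p.finish(decision)
    return decision


ok_sites = [Participant("A"), Participant("B")]
ok_decision = two_phase_commit(ok_sites)
print(ok_decision, [p.log for p in ok_sites])
# commit [['prepared', 'commit'], ['prepared', 'commit']]

bad_sites = [Participant("A"), Participant("B", will_vote_yes=False)]
bad_decision = two_phase_commit(bad_sites)
print(bad_decision, [p.log for p in bad_sites])
# abort [['prepared', 'abort'], ['abort']]
```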

Replication
- If the distributed database is (partially or fully) replicated, it is necessary to implement protocols that ensure the consistency of the replicas, i.e., copies of the same data item have the same value

These protocols can be eager, in that they force the updates to be applied to all the replicas before the transaction completes, or they may be lazy, so that the transaction updates one copy (called the master) from which updates are propagated to the others after the transaction completes.
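The difference between eager and lazy propagation can be shown with a toy, single-process model of one master copy and two replicas. `Copy`, `ReplicaGroup`, and the `propagate` step are hypothetical; there are no failures or concurrent transactions here, so the only visible effect is the window in which lazy replicas are stale.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Tuple


@dataclass
class Copy:
    data: Dict[str, int] = field(default_factory=dict)


@dataclass
class ReplicaGroup:
    master: Copy
    replicas: List[Copy]
    pending: Deque[Tuple[str, int]] = field(default_factory=deque)

    def eager_update(self, key: str, value: int) -> None:
        """Every copy is updated before the transaction is reported complete."""
        for c in [self.master, *self.replicas]:
            c.data[key] = value

    def lazy_update(self, key: str, value: int) -> None:
        """Only the master is updated now; the change is queued for later."""
        self.master.data[key] = value
        self.pending.append((key, value))

    def propagate(self) -> None:
        """Runs after commit: push queued master updates to the replicas in order."""
        while self.pending:
            key, value = self.pending.popleft()
            for c in self.replicas:
                c.data[key] = value


group = ReplicaGroup(master=Copy(), replicas=[Copy(), Copy()])
group.eager_update("x", 1)
print([c.data.get("x") for c in group.replicas])  # [1, 1]
group.lazy_update("x", 2)
stale = [c.data.get("x") for c in group.replicas]
print(stale)                                      # [1, 1] (stale until propagation)
group.propagate()
print([c.data.get("x") for c in group.replicas])  # [2, 2]
```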

Heterogeneous databases
- Sometimes called multi-databases
- The component databases are fully autonomous
- Usually the databases already exist and the distributed database system integrates them
- Requires translation among database systems, with a canonical description of the overall environment
- Complementary to distributed database systems
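A minimal sketch of the translation requirement: each pre-existing local database keeps its own schema, and a wrapper per source maps its records into one canonical schema so that a global query sees a uniform view. The two source schemas, the canonical schema `{"name", "balance"}`, and the `WRAPPERS` table are invented for illustration; a real multidatabase system must also translate queries and data types, not just result rows.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]

# Hypothetical local databases that existed before integration,
# each with its own field names and units. They are never modified.
crm_rows: List[Record] = [{"cust_name": "Ana", "balance_cents": 125000}]
legacy_rows: List[Record] = [{"NAME": "BUDI", "BAL": 980.5}]

# One wrapper per source translates its records into the canonical schema:
# {"name": str, "balance": float}
WRAPPERS: Dict[str, Callable[[Record], Record]] = {
    "crm": lambda r: {"name": r["cust_name"], "balance": r["balance_cents"] / 100},
    "legacy": lambda r: {"name": str(r["NAME"]).title(), "balance": float(r["BAL"])},
}

SOURCES: Dict[str, List[Record]] = {"crm": crm_rows, "legacy": legacy_rows}


def canonical_view() -> List[Record]:
    """Answer a global query over all sources, expressed in the canonical schema."""
    out: List[Record] = []
    for source, rows in SOURCES.items():
        out.extend(WRAPPERS[source](r) for r in rows)
    return out


print(canonical_view())
# [{'name': 'Ana', 'balance': 1250.0}, {'name': 'Budi', 'balance': 980.5}]
```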