Distributed Databases DBMS Textbook, Chapter 22, Part II.

Slides:



Advertisements
Similar presentations
Distributed databases 1. 2 Outline introduction principles / objectives problems.
Advertisements

1 Term 2, 2004, Lecture 9, Distributed DatabasesMarian Ursu, Department of Computing, Goldsmiths College Distributed databases 3.
Database Systems: Design, Implementation, and Management
V. Megalooikonomou Distributed Databases (based on notes by Silberchatz,Korth, and Sudarshan and notes by C. Faloutsos at CMU) Temple University – CIS.
ISOM Distributed Databases Arijit Sengupta. ISOM Learning Objectives Understand the concept and necessity of distributed databases Understand the types.
Distributed Databases Chapter 22 By Allyson Moran.
Distributed Databases John Ortiz. Lecture 24Distributed Databases2  Distributed Database (DDB) is a collection of interrelated databases interconnected.
Distributed databases
Transaction.
Chapter 13 (Web): Distributed Databases
Distributed Databases
ICS 421 Spring 2010 Distributed Transactions Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 3/16/20101Lipyeow.
Chapter 25 Distributed Databases and Client-Server Architectures Copyright © 2004 Pearson Education, Inc.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
Distributed Database Management Systems
Overview Distributed vs. decentralized Why distributed databases
6/25/20151 PSU’s CS Parallel, Distributed Access  Parallel vs Distributed DBMSs  Parallel DBMSs  Measuring success: scalability  Hardware Architectures.
1 © Prentice Hall, 2002 Chapter 13: Distributed Databases Modern Database Management 6 th Edition Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden.
Manajemen Basis Data Pertemuan 11 Matakuliah: M0264/Manajemen Basis Data Tahun: 2008.
©Silberschatz, Korth and Sudarshan19.1Database System Concepts Lecture-10 Distributed Database System A distributed database system consists of loosely.
Chapter 12 Distributed Database Management Systems
McGraw-Hill/Irwin Copyright © 2007 by The McGraw-Hill Companies, Inc. All rights reserved. Chapter 17 Client-Server Processing, Parallel Database Processing,
Definition of terms Definition of terms Explain business conditions driving distributed databases Explain business conditions driving distributed databases.
Distributed databases
TDD: Topics in Distributed Databases
WPI, MOHAMED ELTABAKH PARALLEL & DISTRIBUTED DATABASES 1.
Distributed Databases
1 Distributed and Parallel Databases. 2 Distributed Databases Distributed Systems goal: –to offer local DB autonomy at geographically distributed locations.
04/18/2005Yan Huang - CSCI5330 Database Implementation – Distributed Database Systems Distributed Database Systems.
DISTRIBUTED DATABASE SYSTEM.  A distributed database system consists of loosely coupled sites that share no physical component  Database systems that.
10 1 Chapter 10 Distributed Database Management Systems Database Systems: Design, Implementation, and Management, Sixth Edition, Rob and Coronel.
Database Systems: Design, Implementation, and Management Tenth Edition Chapter 12 Distributed Database Management Systems.
Database Systems: Design, Implementation, and Management Ninth Edition Chapter 12 Distributed Database Management Systems.
Week 5 Lecture Distributed Database Management Systems Samuel ConnSamuel Conn, Asst Professor Suggestions for using the Lecture Slides.
Implementation of Database Systems, Jarek Gryz1 Distributed Databases Chapter 21, Part B.
DISTRIBUTED COMPUTING
DISTRIBUTED DATABASES 1. RECAP: PARALLEL DATABASES Three possible architectures Shared-memory Shared-disk Shared-nothing (the most common one) Parallel.
Instructor: Marina Gavrilova. Outline Introduction Types of distributed databases Distributed DBMS Architectures and Storage Replication Synchronous replication.
ASMA AHMAD 28 TH APRIL, 2011 Database Systems Distributed Databases I.
1 Distributed Databases BUAD/American University Distributed Databases.
Databases Illuminated
 Distributed file systems having transaction facility need to support distributed transaction service.  A distributed transaction service is an extension.
Distributed database system
1 Distributed Databases Chapter 21, Part B. 2 Introduction v Data is stored at several sites, each managed by a DBMS that can run independently. v Distributed.
 Parallel databases  Architecture  Query evaluation  Query optimization  Distributed databases  Architectures  Data storage  Catalog management.
Topic Distributed DBMS Database Management Systems Fall 2012 Presented by: Osama Ben Omran.
MBA 664 Database Management Systems Dave Salisbury ( )
R*: An overview of the Architecture By R. Williams et al. Presented by D. Kontos Instructor : Dr. Megalooikonomou.
Introduction to Distributed Databases Yiwei Wu. Introduction A distributed database is a database in which portions of the database are stored on multiple.
Distributed Database Management Systems. Reading Textbook: Ch. 1, Ch. 3 Textbook: Ch. 1, Ch. 3 For next class: Ch. 4 For next class: Ch. 4 FarkasCSCE.
 Distributed Database Concepts  Parallel Vs Distributed Technology  Advantages  Additional Functions  Distribution Database Design  Data Fragmentation.
Distributed DBMS, Query Processing and Optimization
1 Information Retrieval and Use De-normalisation and Distributed database systems Geoff Leese September 2008, revised October 2009.
© 2009 Pearson Education, Inc. Publishing as Prentice Hall 1 Lecture 11 Distributed Databases Modern Database Management 9 th Edition Jeffrey A. Hoffer,
CMS Advanced Database and Client-Server Applications Distributed Databases slides by Martin Beer and Paul Crowther Connolly and Begg Chapter 22.
DISTRIBUTED DATABASES AND DDBMS. Learning Objectives  Describe various DDBMS implementations  Explain how database design affects the DDBMS environment.
Distributed Databases
1 Chapter 22 Distributed DBMSs - Concepts and Design Simplified Transparencies © Pearson Education Limited 1995, 2005.
1 Advanced Database Systems: DBS CB, 2 nd Edition Parallel and Distributed Databases Ch. 20.
CS 540 Database Management Systems NoSQL & NewSQL Some slides due to Magda Balazinska 1.
CENG 553 Database Management Systems1 Distributed Databases.
Distributed Databases “Fundamentals”
CHAPTER 25 - Distributed Databases and Client–Server Architectures
Distributed Database Concepts
6/25/2018.
R*: An Overview of the Architecture
Distributed Databases
Introduction of Week 14 Return assignment 12-1
distributed databases
Distributed Databases
Presentation transcript:

Distributed Databases DBMS Textbook, Chapter 22, Part II

Introduction Data is stored at several sites, each managed by an independent DBMS. Distributed Data Independence: Users should not have to know where data is located (extends Physical and Logical Data Independence principles). Distributed Transaction Atomicity: Users should be able to write Xacts accessing multiple sites just like local Xacts.

Types of Distributed Databases Homogeneous: Every site runs same type of DBMS. Heterogeneous: Different sites run different DBMSs (different RDBMSs or even non-relational DBMSs). DBMS1DBMS2DBMS3 Gateway

Distributed DBMS Architectures Client-Server v Collaborating-Server CLIENT SERVER QUERY SERVER QUERY Client ships query to single site. All query processing at server. - Thin vs. fat clients. - Set-oriented communication, client side caching. Query can span multiple sites.

Storing Data Fragmentation  Horizontal: Usually disjoint.  Vertical: Lossless-join; tids. Replication  Gives increased availability.  Faster query evaluation.  Synchronous vs. Asynchronous. Vary in how current copies are. TID t1 t2 t3 t4 R1 R2 R3 SITE A SITE B

Distributed Catalog Management Must keep track of how data is distributed across sites. Must be able to name each replica of each fragment. To preserve local autonomy:  Site Catalog: Describes all objects (fragments, replicas) at a site + Keeps track of replicas of relations created at this site.  To find a relation, look up its birth-site catalog.  Birth-site never changes, even if relation is moved.

Distributed Queries Horizontally Fragmented: Tuples with rating = 5 at Tokyo.  Compute SUM(age), COUNT(age) at both sites.  If WHERE contained just S.rating>6, just one site. Vertically Fragmented: Sid and rating at Shanghai, sname and age at Tokyo, tid at both.  Must reconstruct relation by join on tid, then evaluate query. Replicated: Sailors copies at both sites.  Choice of site based on local costs, shipping costs. SELECT AVG(S.age) FROM Sailors S WHERE S.rating > 3 AND S.rating < 7

Distributed Joins Fetch as Needed, Page NL, Sailors as outer:  Cost: 500 D * 1000 (D+S)  D is cost to read/write page; S is cost to ship page.  If query was not submitted at London, must add cost of shipping result to query site.  Can also do INL at London, fetching matching Reserves tuples to London as needed. SailorsReserves LONDONPARIS 500 pages1000 pages

Distributed Joins Ship to One Site: Ship Reserves to London.  Cost: 1000 S D (SM Join; cost = 3*( ))  If result size is very large, may be better to ship both relations to result site and then join them! SailorsReserves LONDONPARIS 500 pages1000 pages

Semi-join Idea: Tradeoff cost of computing and shipping projection for cost of shipping full relation. Note: Especially useful if there is selection on full relation (that can be exploited via index); and answer desired back at initial site.

Semi-join At London, project Sailors onto join columns and ship this to Paris. At Paris, join Sailors projection with Reserves.  Result is called reduction of Reserves wrt Sailors. Ship reduction of Reserves to London. At London, join Sailors with reduction of Reserves. Idea: Useful if there is a selection on Sailors (reduce size), and answer desired at London. SailorsReserves LONDONPARIS 500 pages1000 pages

Bloom-join At London, compute bit-vector of some size k:  Hash join column values into range 0 to k-1.  If some tuple hashes to I, set bit I to 1 (I from 0 to k-1).  Ship bit-vector to Paris. At Paris, hash each tuple of Reserves similarly, and discard tuples that hash to 0 in Sailors bit-vector.  Result is called reduction of Reserves wrt Sailors. Ship bit-vector reduced Reserves to London. At London, join Sailors with reduced Reserves. Note : Bit-vector cheaper to ship, almost as effective.

Distributed Query Optimization Cost-based approach; consider all plans, pick cheapest; similar to centralized opt. Difference 1: Consider communication costs Difference 2: Respect local site autonomy Difference 3: New distributed join methods. Query site constructs global plan, with suggested local plans describing processing at each site.  If a site can improve suggested local plan, free to do so.

Issues of Updating Distributed Data, Replication, Locking, Recovery, and Distributed Transactions

Updating Distributed Data Synchronous Replication: All copies of modified relation (fragment) must be updated before modifying Xact commits.  Data distribution is made transparent to users. Asynchronous Replication: Copies of modified relation only periodically updated; different copies may get out of synch in meantime.  Users must be aware of data distribution. Current products tend to follow later approach.

Distributed Locking How manage locks across many sites?  Centralized: One site does all locking. Vulnerable to single site failure.  Primary Copy: All locking for object done at primary copy site for this object. Reading requires access to locking site as well as site where the object is stored.  Fully Distributed: Locking for a copy done at site where copy is stored. Locks at all sites while writing an object.

Distributed Deadlock Detection Each site maintains local waits-for graph. A global deadlock might exist even if local graphs contain no cycles: T1 T2 SITE ASITE BGLOBAL Three solutions: v Centralized (send all local graphs to one site); v Hierarchical (organize sites into a hierarchy and send local graphs to parent in the hierarchy); v Timeout (abort Xact if it waits too long).

Distributed Recovery Two new issues:  New kinds of failure : links and remote sites.  If “sub-transactions” of Xact executes at different sites, all or none must commit. Need a commit protocol to achieve this. A log is maintained at each site, as in a centralized DBMS, and commit protocol actions are additionally logged.

Two-Phase Commit (2PC) Two rounds of communication:  first, voting;  then, termination.  Both initiated by coordinator.

Summary Parallel DBMSs designed for scalable performance. Relational operators very well-suited for parallel execution.  Pipeline and partitioned parallelism. Distributed DBMSs offer site autonomy and distributed administration. Distributed DBMSs must revisit storage and catalog techniques, concurrency control, and recovery issues.