PSU's CS 587: 22. Parallel, Distributed Access


22. Parallel, Distributed Access
- Parallel vs Distributed DBMSs
- Parallel DBMSs
  - Measuring success: scalability
  - Hardware architectures
  - Types of parallelism, partitioning
  - Parallelizing operators: scan, select, sort, aggregate, join
- Distributed DBMSs
  - Classifying distributed DBMSs: hetero- & homogeneous, client- & collaborating-server, horizontal & vertical fragmentation
  - Distributed catalog management
  - Distributed query processing: general queries, joins, optimization
  - Replication
    - Synchronous: voting, read-any write-all
    - Asynchronous: peer-to-peer, primary site (Capture and Apply)
  - Distributed locking, deadlock detection, transactions

Parallel vs. Distributed DBMSs
- A DBMS may be spread among the nodes of a network.
- Parallel DBMS
  - Nodes are under central control.
  - Nodes are typically in one location.
  - Motivated by performance.
- Distributed DBMS
  - Nodes are autonomous and may run different DBMSs.
  - Nodes are physically distributed.
  - Motivated by the need for remote access to data in spite of network failures.

Motivation for Parallel DBMSs: What if one CPU is not enough?
- For a large web application, one CPU is never enough.
  - A large web application may need to scan a 1TB file or serve 100s of customers at one time.
  - At 50 MB/second, scanning a 1TB file takes roughly 20,000 seconds, about 5.5 hours.
  - Using 1000 CPUs scanning 1000 disks, the scan takes about 20 seconds.
  - Similarly, 100s of customers accessing a single database can see response times in the minutes.
- Solution: parallel DBMSs

DBMS: The || Success Story
- DBMSs, and information retrieval, are the most (only?) successful applications of parallelism.
  - Every major DBMS vendor has some || server.
  - Large DBMS back ends for web applications are all parallelized.
  - Large IR applications are parallelized.
- Reasons for the DBMS success:
  - Relational algebra has few operators: just parallelize each one and compose them.
  - Bulk processing (= partition ||-ism).
  - Natural pipelining in operator trees.

How do we measure the success of parallel algorithms and architectures?
- Scalability. There are two kinds:
  - Speedup: when the number of nodes grows by a factor of n, so does the efficiency.
  - Scaleup: when the number of nodes and the size of the database both grow by a factor of n, the efficiency does not change.
  - (The formulas below make these two notions precise.)
- Efficiency can be measured in terms of:
  - Elapsed time, or average time per transaction
  - Number of transactions per second (throughput)
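As a sketch of how these terms are usually quantified (the notation below is ours, not the slide's; T denotes elapsed time and W a fixed workload size):

```latex
% Speedup: fixed problem, growing hardware. Linear (ideal) speedup means speedup(n) = n.
\text{speedup}(n) = \frac{T(1\ \text{node},\ \text{workload } W)}{T(n\ \text{nodes},\ \text{workload } W)}

% Scaleup: workload grows with the hardware. Linear (ideal) scaleup means scaleup(n) = 1.
\text{scaleup}(n) = \frac{T(1\ \text{node},\ \text{workload } W)}{T(n\ \text{nodes},\ \text{workload } nW)}
```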

Architecture Issue: Shared What?
(The original slide is a diagram contrasting the three hardware architectures; its annotations are summarized here.)
- Shared memory (SUN, many others): easy to program; expensive to build; difficult to scale beyond ~30 processors.
- Shared disk (clusters).
- Shared nothing (NCR Teradata, IBM SP): hard to program; cheap to build; easy to scale to 1000s of processors.

How would you parallelize this plan?
- Assume Sailors and Reservations are already sorted on sid.
- (The original slide shows a query plan tree: scan Sailors and Reservations, combine them with a sort-merge join on sid, and apply the selection σ rank<5 on the fly.)

Different Types of DBMS ||-ism
- Pipeline parallelism (inter-operator): each operator may run concurrently on a different node (may exploit pipelining or bushy plans).
- Partition parallelism (intra-operator): multiple nodes work to compute a given operation (scan, sort, join).
- We'll focus on intra-operator ||-ism.
  - Challenge: how to partition the data!

Three Types of Data Partitioning
(The original slide illustrates each scheme with tuples spread across nodes labeled A...E, F...J, K...N, O...S, T...Z. A small sketch of the three schemes follows this list.)
- Range partitioning: good for equijoins, range queries, and group-by.
- Hash partitioning: good for equijoins and group-by.
- Round-robin partitioning: good for spreading load.
- Shared-disk and shared-memory systems are less sensitive to partitioning; shared-nothing systems benefit more from "good" partitioning.
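A minimal sketch of the three schemes, assuming tuples are routed to n nodes by a key (the function names, node count, and sample keys are illustrative, not from the slides):

```python
from itertools import count

def hash_partition(key, n):
    # Hash partitioning: equal keys land on the same node,
    # which is what equijoins and group-by need.
    return hash(key) % n

def range_partition(key, splitters):
    # Range partitioning: splitters is a sorted vector of n-1 boundary keys;
    # all keys in one range land on the same node, so range queries touch
    # only the nodes whose ranges overlap the query.
    for node, boundary in enumerate(splitters):
        if key <= boundary:
            return node
    return len(splitters)          # last node gets the tail range

_rr = count()
def round_robin_partition(_key, n):
    # Round robin ignores the key and just spreads tuples evenly.
    return next(_rr) % n

# Example: route sailor names to 5 nodes, as in the slide's A..E / F..J / ... picture.
splitters = ["E", "J", "N", "S"]
print(range_partition("Horatio", splitters))   # -> 1 (the F...J node)
print(hash_partition("Horatio", 5))            # some node in 0..4
```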

How to Parallelize the Scan Operator
- Given a partitioning, what is the parallel version of the scan operator?
  - Scan each partition in parallel, then merge the outputs.
- What kind of partitioning will make the scan scalable?
  - Equal-sized partitions.

Parallelizing Selection
- What is the parallel version of the selection operator? Consider these combinations:
  - Range partition, range selection
  - Hash partition, range selection
  - Hash partition, equality selection
  - Round-robin partition
- Which of the above is scalable for speedup? (A sketch of the node-pruning logic follows this list.)
  - Range partition, range selection
  - Hash partition, range selection
  - Hash partition, equality selection
  - Round-robin partition
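A small sketch of why the partitioning scheme matters for selections: it determines how many nodes a selection can be pruned to (the node count, splitter values, and function names are illustrative assumptions, not from the slides):

```python
def nodes_for_selection(partitioning, predicate, n, splitters=None):
    """Return the set of nodes a selection must visit.
    partitioning: 'range', 'hash', or 'round_robin'
    predicate:    ('eq', value) or ('range', lo, hi)
    """
    if partitioning == "round_robin":
        return set(range(n))                      # no pruning possible
    if partitioning == "hash":
        if predicate[0] == "eq":
            return {hash(predicate[1]) % n}       # exactly one node
        return set(range(n))                      # hashing says nothing about ranges
    # Range partitioning: node i holds keys in (bounds[i], bounds[i+1]].
    lo, hi = (predicate[1], predicate[1]) if predicate[0] == "eq" else (predicate[1], predicate[2])
    bounds = [float("-inf")] + list(splitters) + [float("inf")]
    return {i for i in range(n) if lo <= bounds[i + 1] and hi > bounds[i]}

# With 4 nodes range-partitioned on the selection attribute at boundaries 10, 20, 30:
print(nodes_for_selection("range", ("range", 15, 25), 4, [10, 20, 30]))  # {1, 2}
print(nodes_for_selection("hash", ("eq", 15), 4))                        # one node
print(nodes_for_selection("hash", ("range", 15, 25), 4))                 # all nodes
```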

Parallel Sorting: Two Algorithms
- Distributive sort: range partition the data, then sort each range locally.
  - Is it scalable? Only if the partitions are of equal size.
  - How do we choose the partitioning vector?
- Parallel external sort-merge: sort the data at each node, then merge.
  - Problem: how to merge in parallel?
  - One solution: range partition the data.
  - Is it scalable?

Parallel Sorting, ctd.
- How to choose the partitioning vector? Sampling.
- What if sampling is not possible?
  - For example, when the input is produced by another process, such as another operator in a query plan.
  - In this case we can use external sort-merge and set up a tree of merging steps.
  - See US Patent #5,852,826.
- (A sketch of distributive sort with a sampled partitioning vector follows.)
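A toy sketch of distributive sort under the assumption that the input can be sampled first (the node count, sample size, and function names are illustrative):

```python
import random

def choose_splitters(sample, n_nodes):
    # Pick n_nodes - 1 boundary keys from a sorted sample so each range
    # gets roughly the same number of tuples (the "partitioning vector").
    sample = sorted(sample)
    step = len(sample) // n_nodes
    return [sample[(i + 1) * step] for i in range(n_nodes - 1)]

def distributive_sort(data, n_nodes, sample_size=100):
    # Step 1: sample the data and derive splitters.
    splitters = choose_splitters(random.sample(data, min(sample_size, len(data))), n_nodes)
    # Step 2: range partition every tuple to a node.
    partitions = [[] for _ in range(n_nodes)]
    for x in data:
        node = sum(x > s for s in splitters)     # index of the range containing x
        partitions[node].append(x)
    # Step 3: each node sorts its own range locally (these sorts would run in parallel).
    for p in partitions:
        p.sort()
    # Concatenating the ranges in splitter order yields the globally sorted result.
    return [x for p in partitions for x in p]

data = [random.randint(0, 10_000) for _ in range(1_000)]
assert distributive_sort(data, n_nodes=4) == sorted(data)
```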

Parallel Aggregates
- Problem: compute an aggregate function in parallel.
- For each aggregate function we need a decomposition over the partitions s(1), ..., s(n) of S:
  - sum(S) = sum( sum(s(i)) )
  - count(S) = sum( count(s(i)) )
  - avg(S) = sum( sum(s(i)) ) / sum( count(s(i)) )
- Is it scalable?
- Sometimes it's not so simple: median(S).
- Similar decompositions work for grouped aggregates. (A small sketch follows.)
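A minimal sketch of the decomposition: each node computes a partial (sum, count) over its local partition, and one node combines them (the data and function names are illustrative). Note that median has no such simple decomposition, which is the slide's point.

```python
def partial_agg(partition):
    # Runs on each node over its local partition.
    return sum(partition), len(partition)

def combine(partials):
    # Runs once on the coordinating node.
    total = sum(s for s, _ in partials)
    cnt = sum(c for _, c in partials)
    return {"sum": total, "count": cnt, "avg": total / cnt}

partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]        # data as it lives on 3 nodes
print(combine([partial_agg(p) for p in partitions]))  # {'sum': 45, 'count': 9, 'avg': 5.0}
```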

Parallel Hash Join
- Recall Phase 1 of hash join. (The original slide reproduces the textbook figure: the input relation is read through B main-memory buffers and hashed by h into B-1 partitions on disk.)
- Do Phase 1 in parallel, for R and then S:
  - Perform the partitioning at each node.
  - Send hash bucket i's output to node i, bucket j's to node j, etc.
- Phase 2 is simpler:
  - Perform the join of Ri and Si locally at each node i.
- (A toy sketch follows.)
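A toy sketch of the repartition-then-join idea, assuming tuples are (key, payload) pairs and the network "send" is just appending to a per-node list (all names and data are illustrative):

```python
def repartition(relation, n_nodes):
    # Phase 1 at every node: hash each tuple on the join key and
    # "send" it to the node that owns that hash bucket.
    buckets = [[] for _ in range(n_nodes)]
    for key, payload in relation:
        buckets[hash(key) % n_nodes].append((key, payload))
    return buckets

def local_hash_join(r_part, s_part):
    # Phase 2 at node i: ordinary hash join of Ri with Si.
    table = {}
    for key, payload in r_part:
        table.setdefault(key, []).append(payload)
    return [(key, rp, sp) for key, sp in s_part for rp in table.get(key, [])]

def parallel_hash_join(r, s, n_nodes=4):
    r_parts, s_parts = repartition(r, n_nodes), repartition(s, n_nodes)
    out = []
    for i in range(n_nodes):                      # each iteration would run on node i
        out.extend(local_hash_join(r_parts[i], s_parts[i]))
    return out

sailors  = [(22, "Dustin"), (31, "Lubber")]
reserves = [(22, "boat 101"), (22, "boat 102"), (58, "boat 103")]
print(parallel_hash_join(sailors, reserves))
# [(22, 'Dustin', 'boat 101'), (22, 'Dustin', 'boat 102')]
```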

Dataflow Network for || Join: A ⋈ B
- Good use of split and merge operators makes it easier to build parallel versions of sequential join code.
- (The original slide is a dataflow diagram showing split/merge operators routing A and B tuples between Node I and Node J.)

Complex Parallel Query Plans
- Complex queries: inter-operator parallelism
  - Pipelining between operators: note that sort and phase 1 of hash join block the pipeline!
  - Bushy trees
- (The original slide shows a bushy plan: R ⋈ S computed on sites 1-4, A ⋈ B on sites 5-8, and the top join on sites 1-8.)

Distributed Databases
- Data is stored at several sites, each managed by a DBMS that can run independently.
- Distributed data independence: users should not have to know where data is located (extends the physical and logical data independence principles).
- Distributed transaction ACIDity: users should be able to write atomic and durable Xacts that access multiple sites.

Types of Distributed Databases
- Homogeneous: every site runs the same type of DBMS.
- Heterogeneous: different sites run different DBMSs (different RDBMSs or even non-relational DBMSs).
- (The original slide shows DBMS1, DBMS2, and DBMS3 connected through a gateway.)

Network Architectures
- Definitions
  - Client: requests a service
  - Server: provides a service
- Client-server architecture: one of each.
  - Fat client: applications run on the client.
  - Thin client: applications run on the server.
  - Business applications use thin clients (fat clients are too hard to manage).
  - Personal applications use fat clients. (Does anyone use Google Docs?)
- 3-tier architecture: the middle tier runs middleware.
  - Middleware includes business logic and DBMS coordination.
  - The DBMS back end may involve multiple collaborating servers.

Availability
- Definition: the percentage of time the system can answer queries.
  - Sometimes only local queries can be answered.
- What detracts from availability?
  - Node crashes
  - Network crashes
- During a (perhaps partial) network crash, the DBMS at each node should continue to operate.

Storing Data: Fragmentation
- Horizontal fragmentation: each fragment holds a subset of the rows.
  - Fragments are usually disjoint.
  - (The original slide gives an example here.)
  - Motivation: it is more efficient to place data where the queries are located, if possible.
- Vertical fragmentation: each fragment holds a subset of the columns.
  - Remember normalization?
  - Not as common in DDBMSs as horizontal fragmentation.

Storing Data: Replication
- Disadvantages (redundancy!)
  - Wasted space
  - Possibly inconsistent data values
- Advantages
  - Increased availability
  - Faster query evaluation
- How do we keep track of all replicas of the data?
- (The original slide shows fragments R1, R2, R3 replicated across Site A and Site B.)

Distributed Catalog Management
- The catalog contains the schema, authorization, statistics, and the location of each relation/replica/fragment.
- Problem: where do we store the catalog?
  - Example: I want to query sales. Where do I find it? The location info is in the catalog, but where is the catalog?
- Solutions
  - Store the entire catalog at every site.
    - Catalog updates are too expensive.
  - Store the entire catalog at a single master site.
    - Single point of failure, performance bottleneck.
  - Store the catalog info for all replicas and fragments of a relation at the birth site of that relation.
    - How do I find the birth site of a relation?

Distributed Catalog Management (ctd.)
- The intergalactic standard solution:
  - Name each relation globally, e.g., by combining its birth site with its local name.
  - Keep the catalog entries for a relation at its birth site.
- How many network I/Os are required to update a catalog entry?
- If a data item's location changes, does the birth site change? How many network I/Os are required?

Distributed Queries

  SELECT AVG(S.age)
  FROM Sailors S
  WHERE S.rating > 3 AND S.rating < 7

- Horizontally fragmented: tuples with rating < 5 at Shanghai, tuples with rating >= 5 at Tokyo.
  - Must compute SUM(age) and COUNT(age) at both sites.
  - If the WHERE clause contained just S.rating > 6, only one site would be involved.
- Vertically fragmented: sid and rating at Shanghai; sname and age at Tokyo; tid at both.
  - How do we evaluate the query?
- Replicated: copies of Sailors at both sites.
  - Choice of site is based on local costs and network costs.

Distributed Joins

  SELECT *
  FROM Sailors S, Reserves R
  WHERE S.sid = R.sid AND rank = 5

- Setup (from the slide's figure): Sailors (500 pages) is at London; Reserves (1000 pages) is at Paris.
  - D is the cost to read or write a page; S is the cost to ship a page.
- Fetch as needed, page NL, Sailors as outer:
  - Cost: 500D + 500 * 1000 * (D + S) / 10, where the /10 comes from the rank = 5 condition.
  - S is very large, so the cost is essentially 50,000S.
- Ship to one site: ship Reserves to London.
  - Cost: 1000S + 4500D (SM join; local join cost = 3 * (500 + 1000) page I/Os).
  - Essentially 1000S.

Semijoin Technique
- At London, project the rank = 5 Sailors onto the join column sid and ship this projection to Paris. Cost: 5S.
  - 10% for the rank = 5 selection, 10% for the projection onto the join column.
- At Paris, join the Sailors projection with Reserves.
  - The result is called the reduction of Reserves with respect to Sailors.
- Ship the reduction of Reserves to London. Cost: 100S.
- At London, join Sailors with the reduction of Reserves.
- Total cost: 105S.
- Idea: trade the cost of computing and shipping the projection for the cost of shipping the full Reserves relation. (A sketch follows.)
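A toy sketch of the semijoin reduction, with in-memory lists standing in for the two sites (the relation layouts, values, and names are illustrative, not from the slides):

```python
# London holds Sailors(sid, sname, rank); Paris holds Reserves(sid, bid, day).
sailors  = [(22, "Dustin", 5), (31, "Lubber", 8), (58, "Rusty", 5)]
reserves = [(22, 101, "10/10"), (31, 102, "10/11"), (95, 103, "10/12")]

# Step 1 (London): select rank = 5, project onto sid, ship the small projection (5S).
shipped_sids = {sid for sid, _, rank in sailors if rank == 5}

# Step 2 (Paris): reduction of Reserves w.r.t. Sailors = matching Reserves tuples only (100S).
reduction = [r for r in reserves if r[0] in shipped_sids]

# Step 3 (London): join the rank = 5 Sailors with the reduction.
result = [(s, r) for s in sailors if s[2] == 5
                 for r in reduction if r[0] == s[0]]
print(result)   # [((22, 'Dustin', 5), (22, 101, '10/10'))]
```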

Bloomjoin
- At London:
  - Create a 64K-bit vector V (one 8KB page).
  - Select h, a hash function from sids to [0, 64K-1].
  - If there is a sailor with rank = 5, set V(h(her sid)) = true.
  - Ship V to Paris. Cost: 1S.
- At Paris:
  - If a reservation has V(h(its sid)) = true, ship it to London.
  - Cost: 100S, perhaps a bit more if the hash function is not effective, but 64K is a pretty big bit vector.
- At London:
  - Join Sailors with the reduced Reservations.
- Total cost: 101S. (A sketch follows.)
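A minimal sketch of the bit-vector step on the same toy data as the semijoin sketch (the data and helper names are illustrative; false positives in the bit vector just mean a few extra tuples get shipped):

```python
BITS = 64 * 1024                       # 64K bits = one 8KB page

def h(sid):
    return hash(sid) % BITS            # a hash function from sids to [0, 64K-1]

sailors  = [(22, "Dustin", 5), (31, "Lubber", 8), (58, "Rusty", 5)]
reserves = [(22, 101, "10/10"), (31, 102, "10/11"), (95, 103, "10/12")]

# London: build and "ship" the bit vector (cost 1S).
bitvec = bytearray(BITS // 8)
for sid, _, rank in sailors:
    if rank == 5:
        bit = h(sid)
        bitvec[bit // 8] |= 1 << (bit % 8)

def bit_set(bit):
    return bitvec[bit // 8] & (1 << (bit % 8))

# Paris: ship only the reservations whose sid hits a set bit (cost ~100S).
reduced = [r for r in reserves if bit_set(h(r[0]))]

# London: join Sailors with the reduced Reservations.
result = [(s, r) for s in sailors if s[2] == 5
                 for r in reduced if r[0] == s[0]]
print(result)   # [((22, 'Dustin', 5), (22, 101, '10/10'))]
```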

Distributed Query Optimization
- Cost-based approach: consider plans, pick the cheapest; similar to centralized optimization.
  - Difference 1: communication costs must be considered.
  - Difference 2: local site autonomy must be respected.
  - Difference 3: new distributed join methods.
- The query site constructs a global plan, with suggested local plans describing the processing at each site.
- If a site can improve its suggested local plan, it is free to do so.

Updating Distributed Data
- Synchronous replication: all copies of a modified relation (fragment) must be updated before the modifying Xact commits.
  - Data distribution is made transparent to users.
- Asynchronous replication: copies of a modified relation are only periodically updated; different copies may get out of sync in the meantime.
  - Users must be aware of data distribution.

Synchronous Replication
- Read-any write-all: writes are slower and reads are faster, relative to voting.
  - The most common approach to synchronous replication.
- Voting: an Xact must write a majority of the copies to modify an object, and must read enough copies to be sure of seeing at least one most recent copy.
  - E.g., with 10 copies, 7 are written for an update, so 4 copies must be read.
  - Each copy has a version number.
  - Usually not attractive because reads are common.
- The choice of technique determines which locks to set.

Cost of Synchronous Replication
- Before an update Xact can commit, it must obtain locks on all modified copies.
  - It sends lock requests to remote sites, and while waiting for the responses it holds on to its other locks!
  - If sites or links fail, the Xact cannot commit until they are back up.
  - Even if there is no failure, committing requires an expensive commit protocol with many messages.
- So the alternative, asynchronous replication, is becoming widely used.

Asynchronous Replication
- Allows a modifying Xact to commit before all copies have been changed (and readers nonetheless look at just one copy).
  - Users must be aware of which copy they are reading, and that copies may be out of sync for short periods of time.
- Two approaches: Primary Site and Peer-to-Peer replication.
  - The difference lies in how many copies are "updatable" or "master copies".

Peer-to-Peer Replication
- More than one copy of an object can be a master in this approach.
- Changes to a master copy must be propagated to the other copies somehow.
- If two master copies are changed in a conflicting manner, the conflict must be resolved (e.g., Site 1 changes Joe's age to 35, Site 2 changes it to 36).
- Best used when conflicts do not arise:
  - E.g., each master site owns a disjoint fragment.

Primary Site Replication
- Exactly one copy of a relation is designated the primary or master copy. Replicas at other sites cannot be directly updated.
  - The primary copy is published.
  - Other sites subscribe to (fragments of) this relation; these are secondary copies.
- Main issue: how are changes to the primary copy propagated to the secondary copies?
  - Done in two steps: first, capture the changes made by committed Xacts; then apply these changes.

Implementing the Capture Step
- Log-based capture: the log (kept for recovery) is used to generate a Change Data Table (CDT).
  - If this is done when the log tail is written to disk, we must somehow remove changes due to subsequently aborted Xacts.
- Procedural capture: a procedure that is automatically invoked does the capture; typically it just takes a snapshot.
- Log-based capture is better (cheaper, faster) but relies on proprietary log details.

Implementing the Apply Step
- The Apply process at the secondary site periodically obtains (a snapshot or) the changes to the CDT table from the primary site and updates the copy.
  - The period can be timer-based or user/application defined.
- The replica can be a view over the modified relation!
  - If so, replication consists of incrementally updating the materialized view as the relation changes.
- Log-based capture plus continuous Apply minimizes the delay in propagating changes.
- Procedural capture plus application-driven Apply is the most flexible way to process changes.

Distributed Locking
- How do we manage locks for objects across many sites?
  - Centralized: one site does all the locking.
    - Vulnerable to single-site failure.
  - Primary copy: all locking for an object is done at the primary copy site for that object.
    - Reading requires access to the locking site as well as the site where the object (copy/fragment) is stored.
  - Fully distributed: locking for a copy is done at the site where the copy/fragment is stored.
    - Locks are taken at all or many sites when writing an object.

Distributed Deadlock Detection
- Each site maintains a local waits-for graph.
- A global deadlock might exist even if the local graphs contain no cycles.
  - (The original slide illustrates this: at Site A, T1 waits for T2; at Site B, T2 waits for T1; neither local graph has a cycle, but the global graph does.)
- Three solutions (a sketch of the centralized one follows):
  - Centralized: send all local graphs to one site.
  - Hierarchical: organize sites into a hierarchy and send local graphs to the parent in the hierarchy.
  - Timeout: abort an Xact if it waits too long.
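A small sketch of the centralized option: union the local waits-for graphs at one site and look for a cycle (the graph representation and cycle check are illustrative, not from the slides):

```python
def find_cycle(edges):
    # edges: set of (waiter, holder) pairs forming a waits-for graph.
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)

    def reaches(start, target, seen):
        for nxt in graph.get(start, ()):
            if nxt == target or (nxt not in seen and reaches(nxt, target, seen | {nxt})):
                return True
        return False

    return any(reaches(t, t, {t}) for t in graph)

# Local graphs, as in the slide's example: no local cycle, but a global one.
site_a = {("T1", "T2")}        # at Site A, T1 waits for T2
site_b = {("T2", "T1")}        # at Site B, T2 waits for T1
print(find_cycle(site_a))              # False
print(find_cycle(site_b))              # False
print(find_cycle(site_a | site_b))     # True: global deadlock
```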

Distributed Transactions
- Problem: atomicity and durability in DDBMSs.
- Example: A += 100 at Paris, B -= 100 at London.
  - Each update is called a subtransaction.
  - The query originates at a third site, called the coordinator.
- Who keeps the log? Each site.
- How does each subtransaction know when to commit?

2-Phase Commit (2PC) Protocol
- The coordinator sends a prepare message to each subtransaction.
- Each subtransaction responds with yes or no.
  - Yes means it has done everything but write the end record.
- The coordinator sends commit or abort to all subtransactions.
- Subtransactions send ack messages to the coordinator.
- (A sketch of the coordinator's side follows.)
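A minimal sketch of the coordinator's side of 2PC, with message passing reduced to method calls on participant objects (the Participant interface and names are assumptions for illustration, not part of the slides):

```python
def two_phase_commit(participants):
    # Phase 1: the coordinator asks every participant to prepare.
    votes = [p.prepare() for p in participants]        # each returns "yes" or "no"

    # The coordinator decides: commit only if every vote is yes.
    decision = "commit" if all(v == "yes" for v in votes) else "abort"
    # (A real coordinator force-writes the decision to its log before sending it.)

    # Phase 2: broadcast the decision and collect acks.
    for p in participants:
        p.commit() if decision == "commit" else p.abort()
        p.ack()
    return decision

class Participant:
    """Toy subtransaction site: votes yes unless told to fail during prepare."""
    def __init__(self, name, can_prepare=True):
        self.name, self.can_prepare = name, can_prepare
    def prepare(self):
        return "yes" if self.can_prepare else "no"
    def commit(self): print(f"{self.name}: commit")
    def abort(self):  print(f"{self.name}: abort")
    def ack(self):    print(f"{self.name}: ack")

print(two_phase_commit([Participant("Paris"), Participant("London")]))          # commit
print(two_phase_commit([Participant("Paris"), Participant("London", False)]))   # abort
```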

2PC Issues
- What if:
  - Only one subtransaction says no?
  - All subtransactions say yes, and then one cannot commit?
  - All subtransactions say yes, and then one crashes?

Learning Objectives
- Definitions: parallel and distributed DBMSs, scalability, the 3 parallel architectures, pipeline and partition parallelism, the 3 types of data partitioning and their uses
- How to parallelize operators and determine their scalability: scan, select, sort, aggregate, hash join
- Definitions of distributed data independence and ACIDity, network architectures, availability, fragmentation, replication
- How is replication managed?
- How are distributed queries processed?
- How is distributed data updated?
- How are distributed locks and deadlocks managed?
- How are distributed transactions managed?