Distributed Database Systems


Distributed Database Systems
R* Optimizer Validation and Performance Evaluation for Local Queries. Lothar F. Mackert, Guy M. Lohman
R* Optimizer Validation and Performance Evaluation for Distributed Queries
R*: An Overview of the Architecture. R. Williams, D. Daniels, L. Haas, G. Lapis, B. Lindsay, P. Ng, R. Obermarck, P. Selinger, A. Walker, P. Wilms, and R. Yost

What is System R? R*? System R is a database system built as a research project at IBM San Jose Research (now IBM Almaden Research Center) in the 1970s. System R introduced the SQL language and also demonstrated that a relational system could provide good transaction processing performance. R* is the distributed database system that the same laboratory later built on System R's technology.

R* basic facts Each site is as autonomous as possible: no central scheduler, no central deadlock detection, no central catalog, etc. R* uses snapshot data, a “copy of a relation(s) in which the data is consistent but not necessarily up to date,” to provide a static copy of the database.

Object naming in R* For autonomy, no global naming system is required. To keep object names unique, the site name is incorporated into each name. These names are called System Wide Names (SWNs). Example: USER@USER_SITE.OBJECT_NAME@BIRTH_SITE

Global Catalogs in R* All global table names are stored at all sites. Creating a global table involves broadcasting the global relation name to all sites in the network. The catalog at each site keeps and maintains information about the objects in the database, including replicas or fragments, that are stored at that site.

Transaction Numbering A transaction is given a number that is composed of the unique site name and a unique sequence number from that site that incorporates time of day at that site so no synchronization between sites is needed. The transaction number is both unique and ordered in the R* framework
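The numbering scheme above can be sketched as follows. This is a minimal illustration, not R*'s actual layout: the `TxnNumber` and `Site` names, the microsecond clock resolution, and the choice to compare the time-based sequence before the site name are all assumptions made for the example.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class TxnNumber:
    # Field order matters: dataclass ordering compares seq first, then site.
    seq: int   # site-local sequence number that incorporates time of day
    site: str  # unique site name, which also breaks ties between sites


class Site:
    """Hypothetical site that issues transaction numbers without talking to other sites."""

    def __init__(self, name, clock=time.time):
        self.name = name
        self._clock = clock
        self._last = 0

    def next_txn(self):
        # Use the local clock, but never reuse or go below an earlier value,
        # so numbers from one site are strictly increasing.
        now = int(self._clock() * 1_000_000)
        self._last = max(self._last + 1, now)
        return TxnNumber(self._last, self.name)
```

Because the site name is part of the number, two sites can never issue the same value even if their clocks agree, and the tuple comparison gives a total order across sites.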

Transaction Numbering (cont) Numbers used in deadlock detection. Uniqueness is used for identification purposes to determine which transactions control which locks to avoid case where a transaction is waiting for itself. In case of a deadlock, R* aborts the youngest, largest numbered transaction.

Transaction commit Termination must be uniform: all sites commit or all sites abort. R* uses a two-phase commit protocol. One site acts as coordinator and makes the commit or abort decision after all sites involved in the transaction are known to be recoverably prepared to commit or abort, and all sites are awaiting the coordinator's decision.

Transaction commit (cont) While non-coordinator sites await coordinator decision, all locks held – transaction resources are sequestered. Before entering the prepare state, any site can abort the transaction – other sites will abort after a transaction timeout. After entering the prepare state, a site may not abandon the transaction. 3(N-1) messages needed to successfully commit, 4(N-1) messages if a transaction must abort.
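The commit rule described on these two slides can be sketched as a toy, single-process simulation. The `Participant` class, the state strings, and `two_phase_commit` are hypothetical names for illustration; real two-phase commit also involves logging, timeouts, and network messages, none of which are modeled here.

```python
from enum import Enum


class Vote(Enum):
    PREPARED = "prepared"
    ABORT = "abort"


class Participant:
    """Hypothetical non-coordinator site."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare
        self.state = "active"

    def prepare(self):
        # Phase 1: a site may still refuse here; once prepared it keeps
        # its locks and must wait for the coordinator's decision.
        if self.can_prepare:
            self.state = "prepared"
            return Vote.PREPARED
        self.state = "aborted"
        return Vote.ABORT

    def finish(self, decision):
        # Phase 2: every site applies the same decision (uniform termination).
        self.state = decision


def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]
    all_prepared = all(v is Vote.PREPARED for v in votes)
    decision = "committed" if all_prepared else "aborted"
    for p in participants:
        p.finish(decision)
    return decision
```

A single refusing site is enough to abort the whole transaction, which is what makes the outcome uniform.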

Authorization All sites cooperate in R* voluntarily and no site wishes to trust others with authorization. Each individual site must check remote access requests and all controls regarding accessing data are stored at that same site.

Compilation, Plan Generation R* compiles rather than interprets the query language. Recompilation may be needed if objects in the database change during compilation, e.g., a table is deleted. Recompilation is done at a local level with a commit process similar to that of a transaction.

Binding in compilation When and where should binding occur? Can all binding for every request be done at one chosen site? No, that creates a bottleneck site. Can all binding be done at the site where the request began? No, the compiling site should not need to remember physical details about access paths at remote sites. Can binding be done in a distributed way? Yes, the requesting site can decide the high-level details and leave the minor, low-level details to the other sites.

Deadlock detection No centralized deadlock detection. Each site does periodic deadlock detection using transaction wait-for info gathered locally or received from others. Wait-for strings are sent from one site to the next. If a site finds a cycle, youngest transaction is aborted.
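A minimal sketch of the cycle check and victim choice, assuming a site has already merged its local wait-for information with what it received from other sites into one dictionary. The function names and the dictionary representation are assumptions for illustration; transaction numbers are taken to be comparable values where a larger number means a younger transaction, as stated on the numbering slides.

```python
def find_cycle(wait_for):
    """Return the transactions on one wait-for cycle, or None.

    wait_for maps a transaction number to the set of transaction
    numbers it is waiting for.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}
    stack = []

    def visit(txn):
        color[txn] = GREY
        stack.append(txn)
        for other in wait_for.get(txn, ()):
            state = color.get(other, WHITE)
            if state == GREY:
                # 'other' is already on the current path: we closed a cycle.
                return stack[stack.index(other):]
            if state == WHITE:
                found = visit(other)
                if found:
                    return found
        stack.pop()
        color[txn] = BLACK
        return None

    for txn in list(wait_for):
        if color.get(txn, WHITE) == WHITE:
            found = visit(txn)
            if found:
                return found
    return None


def choose_victim(cycle):
    # The youngest transaction carries the largest transaction number.
    return max(cycle)
```

For example, with transactions 1, 2, and 3 waiting on each other in a ring, `find_cycle` returns those three and `choose_victim` picks 3.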

Deadlock Example

Changes Made to R* Explain: writes out optimizer details, such as estimated cost, to temporary tables. Collect Counters: dumps internal system variables to temporary tables. Force Optimizer: orders the optimizer to choose a particular (perhaps suboptimal) plan.

New SQL instructions EXPLAIN PLAN FOR <any valid Delete, Insert, Select, Select Into, Update, Values, or Values Into SQL Statement>

R* Cost Structure
Cost = Wcpu * (#_instructions) + Wi/o * (#_I/Os) + Wmsg * (#_MSGs) + Wbyt * (#_bytes)
Wmsg is a constant penalty for sending a message over the network. Wbyt is a penalty for message length, charged per byte sent.
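The cost formula is a weighted sum and can be written directly as a function. The `CostWeights` container and parameter names are hypothetical, and any weight values passed in are placeholders rather than the values R* used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostWeights:
    w_cpu: float  # cost per instruction
    w_io: float   # cost per I/O
    w_msg: float  # fixed cost per message sent
    w_byt: float  # cost per byte sent


def total_cost(w, instructions, ios, messages, bytes_sent):
    """Weighted sum of the four resource counts."""
    return (w.w_cpu * instructions
            + w.w_io * ios
            + w.w_msg * messages
            + w.w_byt * bytes_sent)
```

A purely local plan is the special case with `messages` and `bytes_sent` equal to zero, which leaves only the CPU and I/O terms.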

Is CPU cost significant? CPU cost was found to be important for both local and distributed queries. CPU costs are high in sorts: 1. allocating temporary disk space for partially sorted strings; 2. encoding and decoding sorted columns; 3. quicksorting individual pages in memory.

CPU costs continued CPU costs are also high in scans Although “CPU cost is significant . . . [it] is not enough to affect the choice of the optimizer”

CPU Cost Equation CPUsort = ACQ_TEMP + #SORT * CODINGINST + #PAGES * QUICKSORTINST + #PASS * (ACQ_TEMP + #PAGES * IO_INST + #SORT * NWAY * MERGINST)

Improve Local Joins Communicate the type of join. Are there likely to be runs of pages to prefetch? Can the choice of LRU, MRU, or DBMin improve performance if the join type is known?

Optimizer performance (Local) Optimizer has trouble modeling unclustered indexes on smaller tables. In such cases, Optimizer actually picks worst plan, thinking it is the best and thinks the best is actually the worst. Why? Tuple order unimportant to nested loop join and index on outer table may clutter buffer. A highly selective predicate may eliminate the need for a scan in which case the index is important.

Optimizer (Local) cont. Adding an index can increase the cost: the optimizer models each table independently, ignoring competition for buffer space between the two tables being joined. Nested loop join estimates are often artificially inflated: the optimizer pessimistically models worst-case buffer behavior by assuming each outer tuple starts a new inner scan, when many times the inner pages are already in the buffer.

Distributed Joins Simplest kind: single table access at a remote site. A process at the remote site accesses the table and ships back the result. When doing joins, we can try to ship the smaller of the two tables, or we can ship the outer table to take advantage of indexes on the inner.

Tuple Blocking Can get faster response time with Tuple Blocking. Tuples “stream in” from a query. Pay more message overhead, one message per tuple instead of one message for entire result.

Transfer Trade-offs Option W: transfer the whole inner table. Negatives: no indexes can be shipped with it; it may ship inner tuples that have no matching outer tuples. Positives: predicates applied before shipping may reduce the size; only one transfer is needed for the join, which may result in lower overall network costs.

Transfer Trade-offs Cont. Option F: fetch matching tuples only. Idea: send the outer tuples' join column values, match them with inner tuples, then send these inner tuples over the network. Negatives: multiple rounds of sending congest the network; we may have to send the whole table anyway, in which case W is better. Positives: the tables may have few actual join values in common, which can eliminate the need to send many inner tuples.

Use W or F Option? In W, network costs are only 2.9% of the total. Strategy F is handy when: 1. The cardinality of the outer table is less than half the number of messages required to ship the inner table as a whole. Idea behind the rule: beat plan W in theory by sending few join values from the outer table that will weed out most inner tuples. 2. The join cardinality is less than the inner cardinality. Idea behind the 2nd rule: since most inner tuples will not be needed, we can beat plan W by sending only outer join values and eliminating most inner tuples.
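The two rules of thumb can be written as a small predicate. This sketch assumes both conditions must hold together for F to be preferred, which the slide does not state explicitly; the function and parameter names are hypothetical.

```python
def prefer_fetch_matches(outer_card, inner_ship_msgs, join_card, inner_card):
    """Return True when strategy F looks better than W under the two rules.

    outer_card      -- number of tuples in the outer table
    inner_ship_msgs -- messages needed to ship the whole inner table
    join_card       -- number of tuples in the join result
    inner_card      -- number of tuples in the inner table
    """
    few_outer_values = outer_card < 0.5 * inner_ship_msgs
    most_inner_unneeded = join_card < inner_card
    return few_outer_values and most_inner_unneeded
```

If either test fails, the sketch falls back to W, which ships the whole inner table in a single transfer.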

What might be better Ship the outer relation to the inner relation's site and return the results to the outer relation's site. This allows the use of indexes on the inner relation in a nested loops join. If the outer is small, this works well.

Outer relation shipping Cont. Shipping the outer “enjoys more simultaneity”, i.e., in a nested loop join:
For all outer tuples do
    For all inner tuples do
        If outer == inner, add result to answer.
The outer loop can start, and the inner loop can iterate, when only a fraction of the outer tuples have arrived. When shipping the inner relation, the join must wait for the whole relation to arrive before doing any loop iterations.
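The simultaneity argument can be illustrated with a generator-based nested loop join that consumes outer tuples one at a time as they arrive. The function and parameter names are hypothetical, and the "network" is just a Python iterator in this sketch.

```python
def nested_loop_join(outer_stream, inner, outer_key, inner_key):
    """Yield (outer, inner) pairs whose join keys match.

    outer_stream may be any iterator, so results are produced as soon
    as each outer tuple arrives; inner must be fully available because
    it is rescanned for every outer tuple.
    """
    for o in outer_stream:
        for i in inner:
            if outer_key(o) == inner_key(i):
                yield (o, i)
```

Note the asymmetry: `outer_stream` is read once and lazily, while `inner` is iterated repeatedly, so it has to be a complete, re-iterable collection before the first loop iteration can run.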

Distributed vs. Local Joins Total resources consumed in distributed joins are higher than in local joins. Response time in distributed joins is less than in local joins. What does this mean? We have more machines doing work in a distributed join, so they can work in parallel: more work is done, but since more machines are doing it, the result takes less time.

Distrib vs. Local Joins Cont. The response time improvement for distributed queries has 2 causes: 1. Parallelism. 2. Reduced contention: accesses to tables using unclustered indexes benefit greatly from larger buffers, and n machines provide n times the buffer space. Negatives of distributed: slow network speeds reverse the result, and then local joins are faster.

Alternative Join methods Dynamically create a temporary index on the inner table: since we cannot send an index, we can try to make one. The cost may be high: 1. Scan the entire table and send it to site 1. 2. Store the table and create a temporary index on it at site 1. 3. Execute the best local join plan.

Semijoin Sort S and T on the join column, producing S' and T'. Send the join column values of S' to T's site, match them against T', and send the matching T' tuples to S's site. Merge-join the T' tuples with the S' tuples to get the answer.
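The semijoin steps can be sketched in a single process, with comments marking which site would do each step. All names are hypothetical, rows are plain tuples, and "sending" is modeled as passing a list.

```python
def merge_join(left, right, left_key, right_key):
    """Merge-join two lists that are already sorted on their join keys."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left_key(left[i]), right_key(right[j])
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows sharing this key, then pair it
            # with every left row that has the same key.
            j_end = j
            while j_end < len(right) and right_key(right[j_end]) == lk:
                j_end += 1
            while i < len(left) and left_key(left[i]) == lk:
                out.extend((left[i], r) for r in right[j:j_end])
                i += 1
            j = j_end
    return out


def semijoin(s_rows, t_rows, s_key, t_key):
    # Each site sorts its own table on the join column.
    s_sorted = sorted(s_rows, key=s_key)
    t_sorted = sorted(t_rows, key=t_key)
    # Site S -> site T: only the distinct join column values are shipped.
    shipped_keys = {s_key(r) for r in s_sorted}
    # Site T: keep the T tuples that match some shipped value.
    t_matching = [r for r in t_sorted if t_key(r) in shipped_keys]
    # Site T -> site S: ship the matching tuples, then merge-join at S.
    return merge_join(s_sorted, t_matching, s_key, t_key)
```

The match at T's site and the merge-join at S's site are the two local join steps that give the method its extra processing cost.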

Bloom join Use a Bloom filter: a bit string, somewhat like a hash table, in which each bit represents a bucket. All bits start at 0. If a value hashes to bit x, set bit x to 1. Generate a Bloom filter for table S and send it to T's site. Hash T's join column values using the same hash function and ship any tuples that hash to a 1 in S's Bloom filter. At site S, join T's tuples with table S.
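A single-process sketch of the Bloom join with one hash function, as described on the slide. The filter size `m`, the SHA-256 based hash, and all function names are assumptions chosen for a deterministic example.

```python
import hashlib


def bit_for(value, m):
    """Map a join value to one of m bit positions (deterministic across sites)."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % m


def build_filter(rows, key, m):
    bits = [False] * m  # all bits start at 0
    for r in rows:
        bits[bit_for(key(r), m)] = True
    return bits


def bloom_join(s_rows, t_rows, s_key, t_key, m=64):
    # Site S: build the filter on S's join column and ship it to T's site.
    s_filter = build_filter(s_rows, s_key, m)
    # Site T: ship only tuples whose join value hashes to a set bit.
    # Some shipped tuples may be false positives (hash collisions).
    shipped = [t for t in t_rows if s_filter[bit_for(t_key(t), m)]]
    # Site S: the real join discards any false positives.
    s_by_key = {}
    for s in s_rows:
        s_by_key.setdefault(s_key(s), []).append(s)
    result = [(s, t) for t in shipped for s in s_by_key.get(t_key(t), [])]
    return result, shipped
```

The filter never causes a matching tuple to be missed, because every value present in S sets its bit; collisions only cause extra tuples to be shipped.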

Comparing join methods Bloom joins generally outperform the other methods. Semijoins are advantageous when both the data pages and the (unclustered) index pages of the inner table fit into the buffer, so that efficient use of these tables keeps semijoin processing costs low. If not, constant paging of the unclustered index results in poor performance.

Why are Bloom Joins better? The message costs of semijoin and Bloom join are comparable. Semijoin incurs higher local processing costs because it performs a “second join”: it sends the join column of S' to T's site and joins it with T' there, then sends that result to S's site and joins those T' values with S'.

Commercial Products