Parallel Algorithms for Relational Operations Class ID: 21 Name: Shujia Zhang.

Index
– Models of Parallelism
– Tuple-at-a-Time Operations in Parallel
– Parallel Algorithms for Full-Relation Operations
– Performance of Parallel Algorithms

Models of Parallelism
The three most important classes of parallel machines:
– Shared Memory
– Shared Disk
– Shared Nothing

Shared-memory

Shared-disk

Shared-nothing

Tuple-at-a-Time Operations in Parallel
Suppose there are P processors, and the tuples of any relation R are divided evenly among the P processors' disks. To compute σ_c(R), each processor examines the tuples of R on its own disk. To avoid communication among processors, each processor stores its part of the result on its own disk. Thus, the result is also divided among the P processors.
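A minimal sketch of this parallel selection, simulating the P disks as Python lists. The (id, a) schema, the round-robin split, and the function names are assumptions made for illustration; they do not come from the slides.

```python
from typing import Callable, List, Tuple

Row = Tuple[int, int]  # hypothetical schema: (id, a)


def distribute_evenly(relation: List[Row], p: int) -> List[List[Row]]:
    """Split a relation round-robin over p simulated processor disks."""
    disks: List[List[Row]] = [[] for _ in range(p)]
    for i, row in enumerate(relation):
        disks[i % p].append(row)
    return disks


def parallel_select(disks: List[List[Row]],
                    condition: Callable[[Row], bool]) -> List[List[Row]]:
    """Each processor scans only its own disk and keeps its result locally."""
    return [[row for row in disk if condition(row)] for disk in disks]


# Assumed sample data: 12 tuples, attribute a cycles through 0, 1, 2.
R = [(i, i % 3) for i in range(12)]
disks = distribute_evenly(R, p=4)
result = parallel_select(disks, lambda row: row[1] == 0)
```

Here result holds one list per processor, so the answer to the selection stays divided among the processors and no tuple is shipped between them.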

Problem: suppose the selection is σ_{a=10}(R) and R was divided according to the value of attribute a. Then all the tuples of R with a=10 are at one processor, so the entire relation σ_{a=10}(R) is at one processor.
Solution: distribute R with a hash function h that involves all the components of a tuple, in such a way that changing one component of a tuple t can change h(t) to be any possible bucket number.
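The difference between the two distribution schemes can be sketched as follows. The toy hash (sum of the integer components modulo P) and the sample data are assumptions chosen so the outcome can be checked by hand; a real system would use a stronger hash function.

```python
from typing import Callable, List, Tuple

Row = Tuple[int, int]  # hypothetical schema: (id, a)


def partition(relation: List[Row], p: int,
              bucket_of: Callable[[Row], int]) -> List[List[Row]]:
    """Send each tuple to processor bucket_of(tuple) mod p."""
    parts: List[List[Row]] = [[] for _ in range(p)]
    for row in relation:
        parts[bucket_of(row) % p].append(row)
    return parts


def matches_per_processor(parts: List[List[Row]], a_value: int) -> List[int]:
    """Count how many tuples with a = a_value each processor holds."""
    return [sum(1 for row in part if row[1] == a_value) for part in parts]


P = 4
R = [(i, 10) for i in range(8)] + [(i, 20) for i in range(8, 12)]

# Scheme 1: divide R by the value of attribute a alone.
by_a = partition(R, P, lambda row: row[1])
# Scheme 2: toy hash over all components of the tuple (an assumption).
by_tuple = partition(R, P, lambda row: sum(row))

skewed = matches_per_processor(by_a, 10)        # [0, 0, 8, 0]
balanced = matches_per_processor(by_tuple, 10)  # [2, 2, 2, 2]
```

Under the first scheme a single processor does all the work for σ_{a=10}(R); under the second, changing the id component moves a tuple to a different bucket, so the matching tuples are spread over all four processors.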

Parallel Algorithms for Full-Relation Operations
A full-relation operation: δ(R)
Use a hash function to distribute the tuples of R; then all duplicate tuples of R are at the same processor.
Suppose we want to take the union of R and S. There are two cases:
– R and S are distributed using the same hash function
– R and S are not distributed using the same hash function

Once the tuples of R and S are hashed with a common function, each processor receives all the tuples of R and S that belong in the same bucket and performs the union on them locally. As a result, the union of R and S is distributed over all the processors.
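A rough sketch of δ(R) and of the union, with processors simulated as lists. The toy hash h, the helper names, and the sample fragments are illustrative assumptions, and the union shown is a set union.

```python
from typing import List, Tuple

Row = Tuple[int, int]


def h(row: Row, p: int) -> int:
    """Toy hash over all components of a tuple (an assumption)."""
    return sum(row) % p


def redistribute(fragments: List[List[Row]], p: int) -> List[List[Row]]:
    """Ship every tuple from its current fragment to processor h(tuple)."""
    buckets: List[List[Row]] = [[] for _ in range(p)]
    for fragment in fragments:
        for row in fragment:
            buckets[h(row, p)].append(row)
    return buckets


def parallel_distinct(fragments: List[List[Row]], p: int) -> List[List[Row]]:
    """Duplicates hash alike, so each processor can deduplicate locally."""
    return [sorted(set(bucket)) for bucket in redistribute(fragments, p)]


def parallel_union(r_fragments: List[List[Row]],
                   s_fragments: List[List[Row]], p: int) -> List[List[Row]]:
    """Hash R and S with the same function, then union bucket by bucket."""
    r_buckets = redistribute(r_fragments, p)
    s_buckets = redistribute(s_fragments, p)
    return [sorted(set(r) | set(s)) for r, s in zip(r_buckets, s_buckets)]


P = 2
R_fragments = [[(1, 1), (2, 1)], [(1, 1), (3, 3)]]  # (1, 1) is duplicated
S_fragments = [[(2, 1)], [(4, 0)]]

distinct_R = parallel_distinct(R_fragments, P)
union_RS = parallel_union(R_fragments, S_fragments, P)
```

The two copies of (1, 1) start on different processors but hash to the same bucket, which is why a purely local deduplication is enough after redistribution.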

Performance of Parallel Algorithms
Compare the running time of a parallel algorithm on a p-processor machine with the time to execute an algorithm for the same operation on the same data on a uniprocessor.
Result: the multiprocessor machine is faster than the uniprocessor machine.
– The multiprocessor has more memory in total, and the extra memory allows us to use a more efficient algorithm.
– The speedup is in proportion to the number of processors.
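As a back-of-the-envelope illustration of speedup in proportion to p, the sketch below compares elapsed times under two idealizing assumptions that are not stated on the slides: the data is split evenly, and communication costs nothing. The block count and the cost per block are made-up numbers.

```python
import math


def uniprocessor_time(blocks: int, time_per_block: float = 1.0) -> float:
    """One processor reads every block itself."""
    return blocks * time_per_block


def parallel_elapsed_time(blocks: int, p: int,
                          time_per_block: float = 1.0) -> float:
    """Elapsed time when the blocks are split evenly over p processors.

    Assumes no communication cost and a perfectly even distribution.
    """
    return math.ceil(blocks / p) * time_per_block


B = 1000  # hypothetical number of disk blocks in the relation
p = 10    # hypothetical number of processors

t1 = uniprocessor_time(B)
tp = parallel_elapsed_time(B, p)
speedup = t1 / tp
```

The total work is unchanged; only the elapsed time shrinks, and skewed data or shipping tuples between processors would reduce the speedup below p.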