Data Warehousing 1: Lecture-24, Need for Speed: Parallelism. Virtual University of Pakistan. Ahsan Abdullah, Assoc. Prof. & Head, Center for Agro-Informatics Research, National University of Computers & Emerging Sciences, Islamabad.

Data Warehousing 2: Background

Data Warehousing 3: When to parallelize?

Parallelism is useful for operations that:
 Access significant amounts of data, and
 Can be implemented independently of each other ("Divide & Conquer").

Parallel execution improves processing for:
 Large table scans and joins
 Creation of large indexes
 Partitioned index scans
 Bulk inserts, updates, and deletes
 Aggregations and copying
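The divide-and-conquer pattern described above can be sketched in Python: a large aggregation is split into partitions that worker processes scan independently, and the partial results are then combined. The function names are illustrative, not from the slides.

```python
from multiprocessing import Pool

def partial_sum(chunk):
    # Conquer: each worker aggregates its own partition independently.
    return sum(chunk)

def parallel_sum(data, workers=4):
    # Divide: split the data into one contiguous chunk per worker.
    size = max(1, len(data) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # Combine: merge the workers' partial results into the final aggregate.
    with Pool(workers) as pool:
        return sum(pool.map(partial_sum, chunks))

if __name__ == "__main__":
    data = list(range(100_000))
    assert parallel_sum(data) == sum(data)
```

The same shape (partition, scan in parallel, merge) is what a parallel query executor applies to table scans and aggregations.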

Data Warehousing 4: Are you ready to parallelize?

Parallelism can be exploited if there is:
 A symmetric multi-processing (SMP), cluster, or massively parallel processing (MPP) system, AND
 Sufficient I/O bandwidth, AND
 Underutilized or intermittently used CPUs (for example, systems where CPU usage is typically less than 30%), AND
 Sufficient memory to support additional memory-intensive processes such as sorts, hashing, and I/O buffers.

Word of caution: parallelism can reduce system performance on over-utilized systems or on systems with small I/O bandwidth.

Data Warehousing 5: Scalability – Size is NOT everything

[Chart: scalability depends on more than the amount of detailed data. Its other dimensions are the number of concurrent users, the complexity of the technique (simple table retrieval, moderately complex joins, propensity analysis, clustering), the indexing used (hash based, B-Tree, multiple bitmapped indexes), and the complexity of the data model.]

Data Warehousing 6: Scalability – Speed-Up & Scale-Up

Speed-Up: with more resources, proportionally less time is needed for a given amount of data.
Scale-Up: if resources are increased in proportion to the increase in data size, the time stays constant.

[Charts: transactions/sec versus degree of parallelism (speed-up) and secs/transaction versus degree of parallelism (scale-up), each with an ideal and a real curve.]

Data Warehousing 7: Quantifying Speed-up

Speedup = T_s / T_m, where T_s is the time on a serial processor and T_m is the time on multiple processors.

Example: three tasks that take 18 time units when executed sequentially take 6 time units under ideal parallel execution (ignoring control work, i.e. "overhead"), so Speedup = 18 / 6 = 3, i.e. 300%.
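The slide's arithmetic is a one-liner; a minimal sketch reproducing it:

```python
def speedup(t_serial, t_parallel):
    # Speedup = T_s / T_m: serial time over parallel time.
    return t_serial / t_parallel

# The slide's example: 18 time units sequentially, 6 in parallel.
print(f"{speedup(18, 6):.0%}")  # prints "300%"
```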

Data Warehousing 8: Speed-Up & Amdahl's Law

Amdahl's Law reveals the maximum expected speedup of a parallel algorithm, given the proportion of the task that must be computed sequentially. If f is the fraction of the problem that must be computed sequentially and N is the number of processors, the speedup S is:

S = 1 / (f + (1 - f) / N)

As f approaches 0, S approaches N. Note that the speedup is not a 1:1 ratio with the number of processors:

Example-1: f = 5% and N = 100 gives S ≈ 16.8.
Example-2: f = 10% and N = 200 gives S ≈ 9.57.
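Amdahl's Law is easy to verify numerically; a minimal sketch (the function name is my own) that reproduces both examples:

```python
def amdahl_speedup(f, n):
    # S = 1 / (f + (1 - f) / n): maximum speedup when a fraction f of
    # the work is sequential and n processors share the rest.
    return 1.0 / (f + (1.0 - f) / n)

print(round(amdahl_speedup(0.05, 100), 1))  # 16.8  (Example-1)
print(round(amdahl_speedup(0.10, 200), 2))  # 9.57  (Example-2)
print(amdahl_speedup(0.0, 64))              # 64.0: with f = 0, S = N
```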

Data Warehousing 9: Amdahl's Law – Limits of parallelization

When less than 80% of the work can be parallelized, the speedup drops drastically. Even at 90% parallelism, 128 processors deliver a speedup of less than 10.

Data Warehousing 10: Parallelization – OLTP vs. DSS

There is a big difference between the two workloads:
 DSS: parallelization of a SINGLE query.
 OLTP: parallelization of MULTIPLE queries, or batch updates run in parallel.

Data Warehousing 11: Brief Intro to Parallel Processing

 Parallel hardware architectures:
 Symmetric Multi Processing (SMP) – also called "shared everything"
 Distributed Memory or Massively Parallel Processing (MPP)
 Non-uniform Memory Access (NUMA)
 Parallel software architectures:
 Shared Memory
 Shared Disk
 Shared Nothing
 Types of parallelism:
 Data Parallelism
 Spatial Parallelism

Data Warehousing 12: Symmetrical Multi Processing (SMP)

 A number of independent I/O channels and a number of processors, all sharing access to a single large memory space.
 Typically each CPU executes its job independently.
 Supports both multi-tasking and parallel processing.
 Has to deal with issues such as cache coherence, processor affinity, and hot spots.

[Diagram: processors P1–P4 and I/O connected to one main memory.]
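The defining feature of SMP, a single address space shared by all processors, can be mimicked with threads: every worker below reads and writes the same variable, and a lock stands in for the serialization that coherence hardware and hot-spot handling provide. This is a sketch of the memory model, not of true CPU parallelism (Python threads interleave rather than run simultaneously).

```python
import threading

counter = 0              # one variable in the single shared memory space
lock = threading.Lock()  # serializes updates, like a hot-spot latch

def worker(increments):
    global counter
    for _ in range(increments):
        with lock:       # without this, concurrent += would lose updates
            counter += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000: every increment from every worker is visible to all
```

Dropping the lock turns this into a classic lost-update race, which is exactly the kind of problem cache-coherence protocols and locking exist to prevent.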

Data Warehousing 13: Distributed Memory Machines

 Composed of a number of self-contained, self-controlled nodes connected through a network interface.
 Each node contains its own processor, memory, and I/O.
 This architecture is better known as Massively Parallel Processing (MPP) or cluster computing.
 Memory is distributed across all nodes.
 The network has a tendency to become the bottleneck.
 The issues are fundamentally different from those in SMP.

[Diagram: nodes, each with its own processor, memory, and I/O, connected by a bus, switch, or network.]

Data Warehousing 14: Distributed Shared Memory Machines – a little bit of both worlds!

[Diagram: several SMP nodes, each with processors P1–P4, main memory, and I/O, joined by an interconnection network.]

Data Warehousing 15: Shared Disk RDBMS Architecture

Advantages:
 High level of fault tolerance
Disadvantages:
 Serialization due to locking
 The interconnect can become a bottleneck

[Diagram: clients/users connected through an interconnect to nodes that all access a shared disk.]

Data Warehousing 16: Shared Nothing RDBMS Architecture

Advantages:
 Data ownership changes infrequently
 There is no locking
Disadvantages:
 Data availability is low on failure
 Data distribution must be done very carefully
 Redistribution is expensive

[Diagram: clients/users connected to nodes, each owning its own disk.]
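The "very careful data distribution" point can be made concrete with hash partitioning, a common way for a shared nothing RDBMS to assign each row to exactly one owning node. This is a sketch with made-up column and function names, not any particular product's scheme.

```python
from collections import defaultdict
import zlib

def owning_node(key, num_nodes):
    # A deterministic hash of the partitioning key decides which node
    # owns the row, so local operations never need cross-node locking.
    return zlib.crc32(str(key).encode()) % num_nodes

def distribute(rows, num_nodes=4):
    nodes = defaultdict(list)
    for row in rows:
        nodes[owning_node(row["cust_id"], num_nodes)].append(row)
    return nodes

rows = [{"cust_id": i, "amount": i * 10} for i in range(100)]
nodes = distribute(rows)
# Every row lands on exactly one node; a skewed key would unbalance them.
assert sum(len(part) for part in nodes.values()) == len(rows)
```

Because ownership is a pure function of the key, redistribution after adding a node means rehashing (and physically moving) most rows, which is why the slides call redistribution expensive.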

Data Warehousing 17: Shared Disk vs. Shared Nothing RDBMS

 Important note: do not confuse the RDBMS architecture with the hardware architecture.
 Shared nothing databases can run on shared everything (SMP or NUMA) hardware.
 Shared disk databases can run on shared nothing (MPP) hardware.