




1 Data Warehousing Lecture-24: Need for Speed: Parallelism
Virtual University of Pakistan
Ahsan Abdullah, Assoc. Prof. & Head, Center for Agro-Informatics Research (www.nu.edu.pk/cairindex.asp)
National University of Computers & Emerging Sciences, Islamabad
Email: ahsan101@yahoo.com

2 Background

3 When to parallelize?
Useful for operations that access significant amounts of data.
Useful for operations that can be implemented independently of each other ("Divide & Conquer").
Parallel execution improves processing for:
 Large table scans and joins
 Creation of large indexes
 Partitioned index scans
 Bulk inserts, updates, and deletes
 Aggregations and copying
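The "Divide & Conquer" pattern above can be sketched in a few lines of Python. This is a minimal illustration (the function names are hypothetical, not from any particular RDBMS): a large table scan is split into independent partitions, each partition is aggregated in parallel, and the partial results are merged. Threads are used here only for brevity; a CPU-bound scan would use processes instead.

```python
from concurrent.futures import ThreadPoolExecutor

def scan_partition(rows):
    """Aggregate one partition independently (here: a simple sum)."""
    return sum(rows)

def parallel_aggregate(table, n_workers=4):
    """Divide the table into chunks, scan each in parallel (conquer),
    then merge the partial results."""
    chunk = (len(table) + n_workers - 1) // n_workers
    parts = [table[i:i + chunk] for i in range(0, len(table), chunk)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(scan_partition, parts))
    return sum(partials)  # merge step

# Same answer as a serial scan, but each partition can run on its own CPU.
print(parallel_aggregate(list(range(1_000_000))))  # 499999500000
```

Each partition touches disjoint data, which is exactly the independence the slide requires before parallelizing.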

4 Are you ready to parallelize?
Parallelism can be exploited if there are:
 Symmetric multi-processor (SMP), cluster, or massively parallel (MPP) systems, AND
 Sufficient I/O bandwidth, AND
 Underutilized or intermittently used CPUs (for example, systems where CPU usage is typically less than 30%), AND
 Sufficient memory to support additional memory-intensive processes such as sorts, hashing, and I/O buffers.
Word of caution: parallelism can reduce system performance on over-utilized systems or on systems with small I/O bandwidth.

5 Scalability: Size is NOT everything
Scalability has several dimensions beyond the amount of detailed data:
 Number of concurrent users
 Complexity of technique (from simple table retrieval, through moderate-complexity joins, to propensity analysis and clustering)
 Index usage (hash-based, B-Tree, multiple bitmapped indexes)
 Complexity of the data model

6 Scalability: Speed-Up & Scale-Up
Speed-Up: more resources means proportionally less time for a given amount of data.
Scale-Up: if resources are increased in proportion to the increase in data size, time stays constant.
[Figure: transactions/sec and secs/transaction plotted against degree of parallelism, ideal vs. real curves]

7 Quantifying Speed-Up
Speedup = T_s / T_m
where T_s is the time on a serial processor and T_m is the time on multiple processors.
Example: three tasks take 18 time units when executed sequentially. Executed in parallel, with some control work ("overhead") added, the same tasks finish in 6 time units.
Speedup = 18 / 6 = 3 (300%)
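The slide's arithmetic can be checked directly. A small sketch of the speed-up ratio as defined above:

```python
def speedup(t_serial, t_parallel):
    """Speed-up = T_s / T_m: time on one processor over time on many."""
    return t_serial / t_parallel

s = speedup(18, 6)
print(f"{s:.0f}x ({s:.0%})")  # 3x (300%)
```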

8 Speed-Up & Amdahl's Law
Reveals the maximum expected speedup of a parallel algorithm, given the proportion of the task that must be computed sequentially.
It gives the speedup S as:
S = 1 / (f + (1 - f) / N)
where f is the fraction of the problem that must be computed sequentially and N is the number of processors.
As f approaches 0, S approaches N, but the ratio is not 1:1.
Example-1: f = 5% and N = 100, then S = 16.8
Example-2: f = 10% and N = 200, then S = 9.57
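Amdahl's law is easy to verify numerically. The sketch below reproduces both examples (note that Example-2 rounds to 9.57; the original slide's 9.56 appears to be a truncation) and also the 128-processor case discussed on the next slide:

```python
def amdahl_speedup(f, n):
    """Amdahl's law: S = 1 / (f + (1 - f) / n), where f is the serial
    fraction of the work and n is the number of processors."""
    return 1.0 / (f + (1.0 - f) / n)

print(round(amdahl_speedup(0.05, 100), 1))  # 16.8  (Example-1)
print(round(amdahl_speedup(0.10, 200), 2))  # 9.57  (Example-2)
print(round(amdahl_speedup(0.10, 128), 2))  # 9.34  (90% parallel, 128 CPUs)
```

The last line shows why the next slide says 128 processors at 90% parallelism perform worse than a speedup of 10: the 10% serial fraction dominates no matter how many processors are added.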

9 Amdahl's Law: Limits of parallelization
Below about 80% parallelism, the speedup drops drastically.
Even at 90% parallelism, 128 processors deliver a speedup of less than 10.

10 Parallelization: OLTP vs. DSS
There is a big difference:
 DSS: parallelization of a SINGLE query.
 OLTP: parallelization of MULTIPLE queries, or batch updates run in parallel.

11 Brief Intro to Parallel Processing
 Parallel hardware architectures:
 Symmetric Multi-Processing (SMP)
 Distributed Memory or Massively Parallel Processing (MPP)
 Non-Uniform Memory Access (NUMA)
 Parallel software architectures:
 Shared Memory (i.e., shared everything)
 Shared Disk
 Shared Nothing
 Types of parallelism:
 Data parallelism
 Spatial parallelism

12 Symmetric Multi-Processing (SMP)
 A number of independent I/O channels and processors, all sharing access to a single large memory space.
[Figure: processors P1..P4 and I/O connected to one main memory]
 Typically each CPU executes its job independently.
 Supports both multi-tasking and parallel processing.
 Has to deal with issues such as cache coherence, processor affinity, and hot spots.

13 Distributed Memory Machines
 Composed of a number of self-contained, self-controlled nodes connected through a network interface.
 Each node contains its own processor, memory, and I/O.
 This architecture is better known as Massively Parallel Processing (MPP) or cluster computing.
 Memory is distributed across all nodes.
[Figure: nodes, each with its own processor, memory, and I/O, connected by a bus, switch, or network]
 The network has a tendency to become the bottleneck.
 The issues are fundamentally different from those in SMP.

14 Distributed Shared Memory Machines: a little bit of both worlds!
[Figure: several SMP nodes, each with processors P1..P4, I/O, and its own main memory, joined by an interconnection network]

15 Shared Disk RDBMS Architecture
[Figure: clients/users connected through an interconnect to nodes that all access a shared disk]
Advantages:
 High level of fault tolerance
Disadvantages:
 Serialization due to locking
 The interconnect can become a bottleneck

16 Shared Nothing RDBMS Architecture
[Figure: clients/users connected to independent nodes, each owning its own disk]
Advantages:
 Data ownership changes infrequently
 There is no locking
Disadvantages:
 Data availability is low on failure
 Data distribution must be planned very carefully
 Redistribution is expensive
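Why is redistribution expensive in a shared-nothing system? A minimal sketch, assuming a naive hash-partitioning scheme (`hash(key) % n_nodes`, a simplification; real systems use more sophisticated partitioning): every row is owned by exactly one node, and when the number of nodes changes, most rows hash to a different owner and must physically move across the interconnect.

```python
def owner(key, n_nodes):
    """Shared nothing: each row is owned by exactly one node,
    chosen here by naive hash partitioning (illustrative only)."""
    return hash(key) % n_nodes

def redistribution_cost(keys, old_n, new_n):
    """Count the rows whose owning node changes when the cluster
    is resized from old_n to new_n nodes."""
    return sum(1 for k in keys if owner(k, old_n) != owner(k, new_n))

keys = list(range(10_000))
moved = redistribution_cost(keys, 4, 5)
print(f"{moved / len(keys):.0%} of rows must move")  # 80% of rows must move
```

Adding a single node forces 80% of the data to move in this scheme, which is why shared-nothing designs plan data distribution carefully up front.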

17 Shared Disk vs. Shared Nothing RDBMS
 Important note: do not confuse RDBMS architecture with hardware architecture.
 Shared-nothing databases can run on shared-everything (SMP or NUMA) hardware.
 Shared-disk databases can run on shared-nothing (MPP) hardware.

