Advanced Topics in Algorithms and Data Structures
Lecture 2: Models of Parallel Computation


An overview of Lecture 2: models of parallel computation; characteristics of SIMD models; design issues for network SIMD models; the mesh and the hypercube architectures; classification of the PRAM model; matrix multiplication on the EREW PRAM.

Models of parallel computation
Parallel computational models can be broadly classified into two categories: Single Instruction Multiple Data (SIMD) and Multiple Instruction Multiple Data (MIMD).

Models of parallel computation
SIMD models are used for solving problems that have regular structure; we will mainly study SIMD models in this course. MIMD models are more general and are used for solving problems that lack regular structure.

SIMD models
An N-processor SIMD computer has the following characteristics: each processor can store both program and data in its local memory, and each processor stores an identical copy of the same program.

SIMD models
At each clock cycle, every processor executes the same instruction from this program; the data, however, differ from processor to processor. The processors communicate among themselves either through an interconnection network or through a shared memory.
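This lockstep behaviour can be illustrated with a minimal sketch (hypothetical code, not from the lecture; `simd_step` is an invented helper): every processor applies the same instruction to its own local data.

```python
# Hypothetical sketch of SIMD lockstep execution: every processor runs
# the same instruction stream, but on its own local data.
def simd_step(instruction, local_data):
    """Apply one instruction to every processor's data simultaneously."""
    return [instruction(x) for x in local_data]

# 4 processors, each holding different data.
data = [1, 2, 3, 4]
data = simd_step(lambda x: x * x, data)   # same instruction, different data
data = simd_step(lambda x: x + 1, data)
print(data)  # [2, 5, 10, 17]
```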

Design issues for network SIMD models
A network SIMD model is a graph: the nodes of the graph are the processors, and the edges are the links between them. Since each processor solves only a small part of the overall problem, the processors must communicate with one another while solving it.

Design issues for network SIMD models
The main design issues for network SIMD models are communication diameter, bisection width, and scalability. In this lecture we will discuss the two most popular network models, the mesh and the hypercube.

Communication diameter
The communication diameter is the diameter of the graph that represents the network model. The diameter of a graph is the maximum, over all pairs of nodes, of the shortest-path distance between them. If the diameter of a model is d, then any computation that may require the two furthest processors to exchange data takes Ω(d) time on that model.
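As a sketch (hypothetical code, not part of the lecture), the diameter of a small network graph can be computed by running a breadth-first search from every node and taking the largest distance found:

```python
from collections import deque

def diameter(adj):
    """Diameter = maximum over all node pairs of the shortest-path distance."""
    def bfs(src):
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        return max(dist.values())
    return max(bfs(u) for u in adj)

# A 4-node linear array 0-1-2-3: the two end nodes are 3 hops apart.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(diameter(line))  # 3
```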

Communication diameter
The data can be distributed in such a way that the two furthest nodes may need to communicate.

Communication diameter
Communication between the two furthest nodes takes Ω(d) time steps.

Bisection width
The bisection width of a network model is the minimum number of links that must be removed to decompose the graph into two equal parts. If the bisection width is large, more information can be exchanged between the two halves of the graph, and hence problems can be solved faster.
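For very small graphs, the bisection width can be found by brute force, trying every split into two equal halves and counting the crossing links (a hypothetical sketch, not from the lecture; for standard topologies one uses the known closed-form values instead):

```python
from itertools import combinations

def bisection_width(nodes, edges):
    """Minimum number of edges crossing any split into two equal halves.
    Brute force over all halves; only feasible for very small graphs."""
    n = len(nodes)
    best = len(edges)
    for half in combinations(nodes, n // 2):
        half = set(half)
        crossing = sum(1 for u, v in edges if (u in half) != (v in half))
        best = min(best, crossing)
    return best

# A 2x2 mesh: removing 2 links splits it into two equal halves.
nodes = [0, 1, 2, 3]
edges = [(0, 1), (2, 3), (0, 2), (1, 3)]
print(bisection_width(nodes, edges))  # 2
```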

Bisection width
(Figure: dividing the graph into two equal parts.)

Scalability
A network model must be scalable, so that more processors can easily be added when new resources become available. The model should be regular, so that each processor has only a small number of links incident on it.

Scalability
If each processor has a large number of links, it is difficult to add new processors, since too many new links have to be added. There is a tension here: keeping the diameter small requires more links per processor, while scalability calls for fewer links per processor.

Diameter and Scalability
The best model in terms of diameter is the complete graph, whose diameter is 1. However, to add a new node to an n-processor machine, we need n − 1 new links.

Diameter and Scalability
The best model in terms of scalability is the linear array: we need to add only one link for a new processor. However, the diameter is n − 1 for a machine with n processors.

The mesh architecture
Each internal processor of a 2-dimensional mesh is connected to 4 neighbors. When two meshes are combined, only the processors on the boundary need extra links; hence the mesh is highly scalable.

The mesh architecture
Both the diameter and the bisection width of an n-processor, 2-dimensional mesh are Θ(√n): the diameter is 2(√n − 1), and the bisection width is √n.
(Figure: a 4 × 4 mesh.)
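These values can be sketched in code (hypothetical helpers, not from the lecture): the diameter counts the hops between opposite corners, and the bisection width counts the links cut by a vertical split down the middle.

```python
import math

def mesh_diameter(n):
    """Diameter of a sqrt(n) x sqrt(n) mesh: opposite corners are
    sqrt(n)-1 hops apart in each of the two dimensions."""
    side = math.isqrt(n)
    assert side * side == n, "n must be a perfect square"
    return 2 * (side - 1)

def mesh_bisection_width(n):
    """Cutting the mesh down the middle severs one link per row."""
    return math.isqrt(n)

print(mesh_diameter(16), mesh_bisection_width(16))  # 6 4
```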

The hypercube architecture
(Figure: hypercubes of 0, 1, 2 and 3 dimensions.)

The hypercube architecture
Each node of a d-dimensional hypercube is numbered using d bits; hence there are 2^d processors in a d-dimensional hypercube. Two nodes are connected by a direct link if and only if their numbers differ in exactly one bit.
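The numbering scheme makes computing a node's neighbors a one-line bit operation, as in this hypothetical sketch (not from the lecture):

```python
def hypercube_neighbors(node, d):
    """Neighbors of a node in a d-dimensional hypercube: flip each of
    the d bits of its number in turn."""
    return [node ^ (1 << i) for i in range(d)]

# 3-dimensional hypercube, node 0b000: neighbors 001, 010, 100.
print(hypercube_neighbors(0, 3))  # [1, 2, 4]
# Every node has exactly d neighbors, and there are 2**d nodes in total.
```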

The hypercube architecture
The diameter of a d-dimensional hypercube is d, since we need to flip at most d bits (traverse at most d links) to reach one processor from another. The bisection width of a d-dimensional hypercube is 2^(d−1).
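The diameter bound can be illustrated by the standard bit-fixing route (a hypothetical sketch, not from the lecture): correcting the differing bits one at a time reaches the destination in at most d hops.

```python
def hypercube_route(src, dst):
    """Route from src to dst by fixing differing bits one at a time.
    The number of hops equals the Hamming distance, which is at most d."""
    path = [src]
    diff = src ^ dst
    bit = 0
    while diff:
        if diff & 1:
            src ^= (1 << bit)   # flip one differing bit -> traverse one link
            path.append(src)
        diff >>= 1
        bit += 1
    return path

# From 000 to 111 in a 3-cube: 3 hops, the maximum (the diameter).
print(hypercube_route(0b000, 0b111))  # [0, 1, 3, 7]
```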

The hypercube architecture
The hypercube is a highly scalable architecture: two d-dimensional hypercubes can easily be combined to form a (d+1)-dimensional hypercube. The hypercube has several variants, such as the butterfly, the shuffle-exchange network, and cube-connected cycles.

Adding n numbers on the mesh
Adding n numbers takes Θ(√n) steps on a √n × √n mesh.
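A hypothetical step-by-step simulation of this mesh summation (not from the lecture): values are first accumulated along each row toward column 0 in √n − 1 steps, and the row sums are then accumulated up column 0 in another √n − 1 steps, for Θ(√n) steps overall.

```python
def mesh_sum(grid):
    """Simulate adding the numbers on a side x side mesh of processors."""
    side = len(grid)
    grid = [row[:] for row in grid]
    # Phase 1: each step moves every value one hop left; sums gather in column 0.
    for step in range(side - 1):
        for r in range(side):
            for c in range(side - 1):
                grid[r][c] += grid[r][c + 1]
                grid[r][c + 1] = 0
    # Phase 2: each step moves the row sums one hop up along column 0.
    for step in range(side - 1):
        for r in range(side - 1):
            grid[r][0] += grid[r + 1][0]
            grid[r + 1][0] = 0
    return grid[0][0]   # total ends up at the top-left corner

print(mesh_sum([[1, 2], [3, 4]]))  # 10
```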

Adding n numbers on the hypercube
Adding n numbers takes O(log n) steps on an n-processor hypercube.
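A hypothetical simulation of the hypercube summation (not from the lecture): in step i, every node whose i-th bit is 0 adds in the value held by its neighbor across dimension i, so after d = log2(n) steps the total sits at node 0.

```python
def hypercube_sum(values):
    """Simulate adding n = 2**d numbers on a d-dimensional hypercube
    in d = log2(n) steps."""
    n = len(values)
    d = n.bit_length() - 1
    assert 1 << d == n, "n must be a power of two"
    vals = values[:]
    for i in range(d):                       # one step per dimension
        for node in range(n):
            if node & (1 << i) == 0:         # lower node of the dimension-i link
                vals[node] += vals[node | (1 << i)]
    return vals[0]   # the total accumulates at node 0

print(hypercube_sum([1, 2, 3, 4, 5, 6, 7, 8]))  # 36
```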