

Models of Parallel Processing (Part I, 2016/1/5)

Parallel processors come in many different varieties. Thus, we often deal with abstract models of real machines.

Development of Early Models (1)
Associative processing (AP) was perhaps the earliest form of parallel processing.
–Associative or content-addressable memories (AMs, CAMs) allow memory cells to be accessed based on their contents rather than their physical locations within the memory array.
–AM/AP architectures are essentially based on incorporating simple processing logic into the memory array, so as to remove the need for transferring large volumes of data through the limited-bandwidth interface between the memory and the processor (the von Neumann bottleneck).
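The content-based access just described can be sketched in a few lines. This is a minimal, purely illustrative simulation (the function and variable names are invented here): real CAM hardware performs all per-cell comparisons simultaneously, whereas this loop only models that behavior sequentially.

```python
# Sketch of content-addressable (associative) lookup. In hardware,
# every cell compares against the search key at once; here the
# per-cell comparisons are merely simulated one after another.

def cam_search(cells, key, mask):
    """Return indices of all cells matching `key` on the bit
    positions selected by the 1-bits of `mask`."""
    return [i for i, word in enumerate(cells)
            if (word & mask) == (key & mask)]

cells = [0b1010, 0b1100, 0b1011, 0b0100]
# Match on the two high bits only: any word of the form 10xx
hits = cam_search(cells, key=0b1000, mask=0b1100)   # [0, 2]
```

Note that the search key addresses the memory by *what* is stored, not *where*; a masked match like this is the basic primitive on which associative processors build.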

Development of Early Models (2)
The AM/AP model has evolved through the incorporation of additional capabilities, so that it is in essence converging with SIMD-type array processors.

Development of Early Models (3)
–Neural networks
–Cellular automata

SIMD vs. MIMD (1)
Most early parallel machines had SIMD designs. Within the SIMD category, two fundamental design choices exist:
–Synchronous versus asynchronous SIMD: strict lockstep execution forces processors to idle through broadcast instructions that do not apply to their local data; a possible cure is to use the asynchronous version of SIMD, known as SPMD (single-program, multiple-data)
–Custom- versus commodity-chip SIMD
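The synchronous/asynchronous distinction can be illustrated with a small sketch (all names here are invented for illustration). In SIMD, one control unit broadcasts each instruction and all processing elements execute it in lockstep; in SPMD, every processor runs the same program but its control flow may diverge based on its id or local data.

```python
# Illustrative contrast between SIMD and SPMD execution.

def simd_step(op, data):
    """One lockstep step: the same operation applied to every element."""
    return [op(x) for x in data]

def spmd_program(pid, x):
    """The same program runs on every processor, but branches may
    diverge depending on the processor id or local data."""
    if pid % 2 == 0:
        return x + 1     # even-numbered processors take one branch
    return x * 2         # odd-numbered processors take another

data = [1, 2, 3, 4]
simd_result = simd_step(lambda x: x + 1, data)                  # [2, 3, 4, 5]
spmd_result = [spmd_program(p, x) for p, x in enumerate(data)]  # [2, 4, 4, 8]
```

The SIMD step has a single instruction stream; the SPMD run has one program but, in effect, per-processor instruction streams, which is why SPMD is viewed as the asynchronous relative of SIMD.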

SIMD vs. MIMD (2)
In the 1990s, the MIMD paradigm became more popular. MIMD machines are most effective for medium- to coarse-grain parallel applications, where the computation is divided into relatively large subcomputations or tasks whose executions are assigned to the various processors.

SIMD vs. MIMD (3)
Within the MIMD class, three fundamental issues or design choices are subjects of ongoing debate in the research community:
–Massively versus moderately parallel processors (MPP): is it more cost-effective to build a parallel processor out of a relatively small number of powerful processors or a massive number of very simple processors?
–Tightly versus loosely coupled MIMD: networks of workstations (NOWs), cluster computing, grid computing
–Explicit message passing versus virtual shared memory
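The third design choice can be made concrete with Python threads standing in for processors (a sketch with invented names, not how real MIMD machines are programmed): message-passing processors share nothing and exchange values through channels, while shared-memory processors read and write a common variable and must synchronize their accesses.

```python
# Sketch: the two MIMD communication styles, with threads as "processors".
import threading
import queue

# Explicit message passing: communicate by sending values over a channel.
def mp_worker(inbox, outbox):
    x = inbox.get()          # receive a message
    outbox.put(x * x)        # send the result back

inbox, outbox = queue.Queue(), queue.Queue()
t = threading.Thread(target=mp_worker, args=(inbox, outbox))
t.start()
inbox.put(7)
t.join()
msg_result = outbox.get()    # 49

# Shared memory: communicate through a common location, guarded by a lock.
counter = {"value": 0}
lock = threading.Lock()

def sm_worker():
    for _ in range(1000):
        with lock:           # without the lock, updates could be lost
            counter["value"] += 1

workers = [threading.Thread(target=sm_worker) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
shared_result = counter["value"]   # 4000
```

"Virtual shared memory" systems present the second programming model to the user even when the hardware underneath actually passes messages.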

Global vs. Distributed Memory (1)
Within the MIMD class of parallel processors, memory can be global or distributed. Global memory may be visualized as being in a central location where all processors can access it with equal ease. Because every access must traverse the processor-to-memory network, memory-latency-hiding techniques must be employed; an example of such methods is the use of multithreading.

Global vs. Distributed Memory (2)
Various networks can serve as the processor-to-memory or processor-to-processor interconnect. An abstract model of global-memory computers that ignores the interconnect altogether is known as PRAM. One approach to reducing the amount of data that must pass through the processor-to-memory interconnection network is to give each processor a private cache memory (exploiting locality of data access, but raising the cache coherence problem).
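The coherence problem just mentioned can be illustrated with a toy write-invalidate scheme. This is a hypothetical sketch with invented names, not an implementation of any real protocol (such as MESI): each processor caches lines from a shared memory, and a write invalidates every other cached copy of that line so no processor keeps reading a stale value.

```python
# Toy write-invalidate coherence sketch.

class ToyCache:
    """A private cache in front of a shared memory (a plain dict)."""
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}              # addr -> privately cached value

    def read(self, addr):
        if addr not in self.lines:   # miss: fetch from shared memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

def coherent_write(caches, writer, addr, value):
    """Write through to memory and invalidate all other copies."""
    writer.memory[addr] = value
    writer.lines[addr] = value
    for c in caches:
        if c is not writer:
            c.lines.pop(addr, None)  # drop the now-stale copy

memory = {0: 10}
c0, c1 = ToyCache(memory), ToyCache(memory)
c0.read(0); c1.read(0)               # both processors cache the line
coherent_write([c0, c1], c0, 0, 99)  # c1's copy is invalidated
fresh = c1.read(0)                   # miss again, refetches 99
```

Without the invalidation step, `c1` would keep returning 10 from its private copy; that inconsistency is exactly the cache coherence problem.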

Global vs. Distributed Memory (3)
Distributed-memory architectures can be conceptually viewed as in the figure. In addition to the types of interconnection networks enumerated for shared-memory parallel processors, distributed-memory MIMD architectures can also be interconnected by a variety of direct networks (yielding nonuniform memory access, or NUMA, architectures).

PRAM Shared-Memory Model (1)
The theoretical model used for conventional or sequential computers (the SISD class) is known as the random-access machine (RAM). The parallel version of RAM (PRAM) constitutes an abstract model of the class of global-memory parallel processors. The abstraction consists of ignoring the details of the processor-to-memory interconnection network and taking the view that each processor can access any memory location in each machine cycle, independent of what other processors are doing.
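A classic example of how algorithms are stated in this model is parallel summation by recursive doubling. The sketch below is a sequential simulation with invented names: "in parallel" every active processor reads a shared cell, adds it to its own, and the whole array is summed in ceil(log2 p) PRAM steps, each of which the model charges as one cycle.

```python
# Sequential simulation of a PRAM recursive-doubling sum.

def pram_sum(shared):
    """Sum `shared` in-place in ceil(log2 p) simulated PRAM steps."""
    p = len(shared)
    step = 1
    while step < p:
        # One PRAM step: conceptually, all these additions happen
        # at once, each performed by a different processor.
        for i in range(0, p - step, 2 * step):
            shared[i] += shared[i + step]
        step *= 2
    return shared[0]

total = pram_sum(list(range(8)))   # 0+1+...+7 = 28, in 3 steps
```

Eight values are reduced in 3 steps instead of 7 sequential additions; with p processors the model promises ceil(log2 p) cycles, precisely because memory conflicts and network delays are abstracted away.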

PRAM Shared-Memory Model (2)
In the formal PRAM model, a single processor is assumed to be active initially. In each computation step, each active processor can read from and write into the shared memory and can also activate another processor. Even though the global-memory architecture was introduced as a subclass of the MIMD class, the abstract PRAM model depicted in Fig. 4.6 can be SIMD or MIMD.

PRAM Shared-Memory Model (3)
Because any physical realization of the processor-to-memory interconnect for p processors incurs Ω(log p) latency, each PRAM instruction cycle would have to consume Ω(log p) real time. The above point is important when we try to compare PRAM algorithms with those for distributed-memory models: an O(log p)-step PRAM algorithm may not be faster than an O(log² p)-step algorithm for a hypercube architecture. For example, with p = 1024, the PRAM algorithm's log p = 10 steps, each costing Ω(log p) real time, amount to roughly the same work as the hypercube algorithm's log² p = 100 constant-time steps.

Distributed-Memory or Graph Models (1)
Given the internal processor and memory structure of each node, a distributed-memory architecture is characterized primarily by the network used to interconnect the nodes. This network is usually represented as a graph. Important parameters of an interconnection network include:
–Network diameter: the longest of the shortest paths between various pairs of nodes
–Bisection (band)width: the smallest number (total capacity) of links that need to be cut in order to divide the network into two subnetworks of half the size
–Vertex or node degree: the number of communication ports required of each node
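These parameters can be computed directly from the graph. The sketch below (with invented function names) builds a q-dimensional hypercube, whose well-known values are diameter q, node degree q, and bisection width 2^(q-1), and checks the diameter by breadth-first search over the actual graph.

```python
# Network parameters of a q-dimensional hypercube.
from collections import deque

def hypercube_neighbors(node, q):
    """Neighbors of `node` in a q-cube: flip each of the q address bits."""
    return [node ^ (1 << b) for b in range(q)]

def diameter(q):
    """Longest shortest path, via BFS from node 0 (the hypercube is
    vertex-transitive, so one BFS source suffices)."""
    dist = {0: 0}
    frontier = deque([0])
    while frontier:
        u = frontier.popleft()
        for v in hypercube_neighbors(u, q):
            if v not in dist:
                dist[v] = dist[u] + 1
                frontier.append(v)
    return max(dist.values())

q = 4                              # 16-node hypercube
diam = diameter(q)                 # equals q
node_degree = q                    # q communication ports per node
bisection_width = 2 ** (q - 1)     # 8 links cross any balanced cut
```

The hypercube illustrates the usual trade-off among the three parameters: its logarithmic diameter comes at the price of a node degree that grows with the network size, unlike a ring (degree 2, but diameter p/2).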

Distributed-Memory or Graph Models (2)
Even though the distributed-memory architecture was introduced as a subclass of the MIMD class, machines based on networks of the type shown in Fig. 4.8 can be SIMD- or MIMD-type. Networks of the type shown in Fig. 4.9 are available for reducing bus traffic by taking advantage of the locality of communication within small clusters of processors.
