Computer Architecture Introduction to MIMD architectures Ola Flygt Växjö University +46 470 70 86.


Outline  {Multi-processor}  {Multi-computer}  15.1 Architectural concepts  15.2 Problems of scalable computers  15.3 Main design issues of scalable MIMD computers CH01

Multi-computer: Structure of Distributed Memory MIMD Architectures

Multi-computer (distributed memory system): Advantages and Disadvantages
+ Highly scalable
+ Message passing avoids the synchronization problems of concurrent access to shared memory
- Load balancing problem
- Deadlock can occur in message passing
- Data must be physically copied between processes
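The Python sketch below is a toy model, not a real multicomputer runtime: each hypothetical `Node` has a private memory (a dict) and an inbox (a `queue.Queue` standing in for the interconnection network). It shows the last point above: sending a value copies it, so the receiver's copy is independent of the sender's.

```python
import copy
import queue
import threading


class Node:
    """Toy distributed-memory node: private memory plus an inbox."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.memory = {}              # private local memory, never shared
        self.inbox = queue.Queue()    # stands in for the interconnection network

    def send(self, dest, key):
        # Message passing copies the data: the receiver gets its own copy.
        dest.inbox.put((key, copy.deepcopy(self.memory[key])))

    def receive(self):
        key, value = self.inbox.get()  # blocks until a message arrives
        self.memory[key] = value


def run_demo():
    a, b = Node(0), Node(1)
    a.memory["vec"] = [1, 2, 3]

    receiver = threading.Thread(target=b.receive)
    receiver.start()
    a.send(b, "vec")
    receiver.join()

    a.memory["vec"].append(4)  # later changes on the sender stay local
    return a.memory["vec"], b.memory["vec"]


if __name__ == "__main__":
    print(run_demo())  # ([1, 2, 3, 4], [1, 2, 3])
```

The blocking `receive()` also hints at the deadlock risk: if two nodes each waited to receive before sending, both would block forever.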

Multi-processor: Structure of Shared Memory MIMD Architectures

Multi-processor (shared memory system): Advantages and Disadvantages
+ No need to partition data or programs; uniprocessor programming techniques can be adapted
+ Communication between processors is efficient
- Access to shared data in memory must be synchronized. Synchronization constructs (semaphores, conditional critical regions, monitors) lead to nondeterministic behaviour, which can cause programming errors that are difficult to discover
- Lack of scalability due to the (memory) contention problem
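As a minimal illustration of why synchronized access is needed, the sketch below lets several threads update one shared counter, with a lock (a mutual-exclusion construct in the spirit of the semaphores listed above) protecting the read-modify-write. Python threads stand in for processors here; because of CPython's global interpreter lock they do not run truly in parallel, but the locking discipline is the same. The names `worker` and `run` are illustrative.

```python
import threading

counter = 0                       # shared data, visible to every thread
counter_lock = threading.Lock()   # guards every update of `counter`


def worker(increments):
    global counter
    for _ in range(increments):
        # Without the lock, this read-modify-write could interleave with
        # another thread's and lose updates (a race condition).
        with counter_lock:
            counter += 1


def run(num_threads=4, increments=10_000):
    global counter
    counter = 0
    threads = [threading.Thread(target=worker, args=(increments,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(run())  # 40000
```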

Best of Both Worlds: Multicomputer using virtual shared memory
- Also called distributed shared memory architecture
- The local memories of the multi-computer are components of a global address space: any processor can access the local memory of any other processor
- Three approaches:
  - Non-uniform memory access (NUMA) machines
  - Cache-only memory access (COMA) machines
  - Cache-coherent non-uniform memory access (CC-NUMA) machines
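One way to picture a global address space built from local memories is a mapping from a global address to a (node, local offset) pair. The sketch below assumes a simple block distribution, where each node contributes an equal contiguous range; real machines may interleave or map pages differently, and `locate`/`is_local` are hypothetical helpers.

```python
from typing import NamedTuple


class Location(NamedTuple):
    node: int    # which processor's local memory holds the word
    offset: int  # address within that local memory


def locate(global_addr: int, words_per_node: int, num_nodes: int) -> Location:
    """Map a global address to (node, offset), assuming the global space is
    split into equal contiguous blocks, one block per node."""
    if not 0 <= global_addr < words_per_node * num_nodes:
        raise ValueError("address outside the global address space")
    return Location(global_addr // words_per_node, global_addr % words_per_node)


def is_local(global_addr, my_node, words_per_node, num_nodes):
    """True if the address lives in `my_node`'s own memory."""
    return locate(global_addr, words_per_node, num_nodes).node == my_node


if __name__ == "__main__":
    print(locate(5000, 4096, 4))  # Location(node=1, offset=904)
```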

Structure of NUMA Architectures

NUMA  Logically shared memory is physically distributed  Different access of local and remote memory blocks. Remote access takes much more time – latency  Sensitive to data and program distribution  Close to distributed memory systems, yet the programming paradigm is different  Example: Cray T3D

NUMA: remote load

Structure of COMA Architectures

COMA  Each block of the shared memory works as local cache of a processor  Continuous, dynamic migration of data  Hit-rate decreases the traffic on the Interconnection Network  Solutions for data-consistency increase the same traffic (see cache coherency problem later)  Examples: KSR-1, DDM

Structure of CC-NUMA Architectures

CC-NUMA  A combination of NUMA and COMA  Initially static data distribution, then dynamic data migration  Cache coherency problem is to be solved  COMA and CC-NUMA are used in newer generation of parallel computers  Examples: Convex SPP1000, Stanford DASH, MIT Alewife

Classification of MIMD computers

Problems and solutions  Problems of scalable computers 1. Tolerate and hide latency of remote loads 2. Tolerate and hide idling due to synchronization  Solutions 1. Cache memory  problem of cache coherence 2. Prefetching 3. Threads and fast context switching