CSE P548, Spring 2003: Issues in Multiprocessors

Issues in Multiprocessors
Which programming model for interprocessor communication?
- shared memory: regular loads & stores
- message passing: explicit sends & receives
Which execution model?
- control parallel: identify & synchronize different asynchronous threads
- data parallel: same operation on different parts of the shared data space
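
To make the shared-memory half concrete, here is a minimal C/pthreads sketch (not from the original slides; all names are illustrative): the "communication" is nothing more than one thread's ordinary store and another thread's ordinary load, with synchronization to order the two. The message-passing counterpart appears under the MIMD Programming Models slide below.

```c
#include <pthread.h>
#include <stdio.h>

static int shared_value;                     /* lives in the one shared address space */
static int produced = 0;
static pthread_mutex_t lock  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  ready = PTHREAD_COND_INITIALIZER;

static void *producer(void *arg) {
    (void)arg;
    pthread_mutex_lock(&lock);
    shared_value = 42;                       /* "communication" is a regular store */
    produced = 1;
    pthread_cond_signal(&ready);
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, producer, NULL);
    pthread_mutex_lock(&lock);
    while (!produced)
        pthread_cond_wait(&ready, &lock);    /* synchronize to order the accesses */
    printf("consumer loaded %d\n", shared_value);  /* ...and a regular load */
    pthread_mutex_unlock(&lock);
    pthread_join(t, NULL);
    return 0;
}
```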

Issues in Multiprocessors
How to express parallelism
- automatic (compiler) detection: implicitly parallel C & Fortran programs, e.g., the SUIF compiler
- language support: HPF, ZPL
- runtime library constructs: coarse-grain, explicitly parallel C programs
Algorithm development
- embarrassingly parallel programs could be easily parallelized
- development of different algorithms for the same problem
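
OpenMP is not named on the slide, but it is a familiar present-day example in the same spirit as the language/runtime approaches listed: a single directive tells the compiler and runtime to distribute a loop, here a reduction in C (a minimal sketch, compiled with an OpenMP-enabled compiler such as gcc -fopenmp).

```c
#include <omp.h>
#include <stdio.h>

int main(void) {
    double a[1000], sum = 0.0;
    for (int i = 0; i < 1000; i++)
        a[i] = i * 0.5;

    /* one directive expresses the parallelism; the runtime splits the
       iterations across threads and combines the per-thread partial sums */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < 1000; i++)
        sum += a[i];

    printf("sum = %f\n", sum);
    return 0;
}
```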

Issues in Multiprocessors
How to get good parallel performance
- recognize parallelism
- transform programs to increase parallelism without decreasing processor locality
- decrease sharing costs
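
The locality point can be made concrete with a sketch (assumed names; not from the slides). Both functions below split the same loop across nthreads workers, but the block version gives each thread one contiguous region of the array, while the cyclic version strides every thread through all of memory, so neighboring elements, and hence cache lines, end up shared across processors.

```c
/* Block partitioning: contiguous chunk per thread (good locality). */
void scale_block(double *a, long n, int tid, int nthreads) {
    long chunk = n / nthreads;
    long lo = tid * chunk;
    long hi = (tid == nthreads - 1) ? n : lo + chunk;
    for (long i = lo; i < hi; i++)
        a[i] *= 2.0;
}

/* Cyclic partitioning: same parallelism, but threads interleave through
   memory, so adjacent elements (one cache line) belong to different threads. */
void scale_cyclic(double *a, long n, int tid, int nthreads) {
    for (long i = tid; i < n; i += nthreads)
        a[i] *= 2.0;
}
```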

Flynn Classification
- SISD: single instruction stream, single data stream
  - single-context uniprocessors
- SIMD: single instruction stream, multiple data streams
  - exploits data parallelism
  - example: Thinking Machines CM-1
- MISD: multiple instruction streams, single data stream
  - systolic arrays
  - example: Intel iWarp
- MIMD: multiple instruction streams, multiple data streams
  - multiprocessors, multithreaded processors
  - relies on control parallelism: execute & synchronize different asynchronous threads of control
  - parallel programming & multiprogramming
  - example: most processor companies have MP configurations
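
A minimal illustration of the SIMD idea, using x86 SSE intrinsics purely as a stand-in (the CM-1 was a very different machine): one instruction applies the same operation to multiple data elements at once.

```c
#include <xmmintrin.h>   /* SSE intrinsics */

/* Single instruction, multiple data: one add operates on 4 floats. */
void add4(float *c, const float *a, const float *b) {
    __m128 va = _mm_loadu_ps(a);            /* load 4 floats */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(c, _mm_add_ps(va, vb));   /* one instruction, 4 data elements */
}
```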

CM-1 [figure-only slide]

Systolic Array [figure-only slide]

MIMD: Low-end
- bus-based: simple, but the bus is a bottleneck
- simple cache coherency protocol
- physically centralized memory
- uniform memory access (UMA machine)
- examples: Sequent Symmetry, SPARCCenter, Alpha-, PowerPC-, or SPARC-based servers

Low-end MP [figure-only slide]

MIMD: High-end
- higher-bandwidth, multiple-path interconnect
  - more scalable
  - more complex cache coherency protocol (if shared memory)
  - longer latencies
- physically distributed memory
- non-uniform memory access (NUMA machine)
- could have processor clusters
- examples: SGI Challenge, Convex Exemplar, Cray T3D, IBM SP-2, Intel Paragon

High-end MP [figure-only slide]

MIMD Programming Models
Address space organization for physically distributed memory
- distributed shared memory: one (logical) global address space
- multicomputers: private address space per processor
Inter-processor communication
- shared memory: accessed via load/store instructions
  - SPARCCenter, SGI Challenge, Cray T3D, Convex Exemplar, KSR-1 & 2
- message passing: explicit communication by sending/receiving messages
  - TMC CM-5, Intel Paragon, IBM SP-2
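
For contrast with the pthreads sketch under the first slide, here is the same one-word exchange in the message-passing style, as a minimal MPI sketch (illustrative; the machines listed used MPI or similar vendor message-passing libraries). Each rank is a separate process with a private address space, so data moves only through explicit sends and receives.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* explicit send */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                          /* explicit receive */
        printf("rank 1 received %d\n", value);
    }
    MPI_Finalize();
    return 0;
}
```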

Shared Memory vs. Message Passing
Shared memory
+ simple parallel programming model
  - global shared address space: no need to worry about data locality
    - but you get better performance when you program for data placement: lower latency when data is local, less communication when you avoid false sharing
    - so you can do data placement when it is crucial, but you don't have to
  - hardware maintains data coherence
  - synchronize to order a processor's accesses to shared data
  - like uniprocessor code, so parallelizing by programmer or compiler is easier: you can focus on program semantics, not interprocessor communication
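
The false-sharing point can be shown with a small C sketch (the 64-byte cache-line size is an assumption): two per-thread counters that happen to land on the same cache line cause coherence traffic even though no data is logically shared, and padding each counter to its own line avoids it.

```c
/* False sharing: thread 0 updates a, thread 1 updates b. The counters are
   logically private, but if they sit on one cache line, every update by one
   thread invalidates the line in the other thread's cache. */
struct bad_counters {
    long a;                          /* same cache line... */
    long b;                          /* ...as this one */
};

/* Fix: pad each counter out to a full cache line (64 bytes assumed). */
struct padded_counter {
    long value;
    char pad[64 - sizeof(long)];
};

struct good_counters {
    struct padded_counter a;         /* each counter on its own line */
    struct padded_counter b;
};
```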

Shared Memory vs. Message Passing
Shared memory (continued)
+ low latency (no message-passing software)
  - but latency-hiding techniques that overlap communication & computation can be applied to message-passing machines
+ higher bandwidth for small transfers
  - but usually the only choice
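
A minimal sketch of such a latency-hiding technique on a message-passing machine (the halo/interior split and all names are illustrative): start a nonblocking receive, compute on data you already have while the message is in flight, and only then wait for it.

```c
#include <mpi.h>

void compute_with_overlap(double *halo, int n, int neighbor,
                          double *local, long m) {
    MPI_Request req;

    /* start the communication, but don't wait for it yet */
    MPI_Irecv(halo, n, MPI_DOUBLE, neighbor, 0, MPI_COMM_WORLD, &req);

    /* overlap: work on the interior data that needs no remote values */
    for (long i = 0; i < m; i++)
        local[i] *= 0.5;

    /* the message arrived (mostly) behind the computation */
    MPI_Wait(&req, MPI_STATUS_IGNORE);
    /* ...now it is safe to use halo[] */
}
```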

Shared Memory vs. Message Passing
Message passing
+ abstraction in the programming model encapsulates the communication costs
  - but a more complex programming model: additional language constructs, and the need to program for nearest-neighbor communication
+ no coherency hardware
+ good throughput on large transfers
  - but what about small transfers?
+ more scalable (memory latency doesn't scale with the number of processors)
  - but large-scale SM has distributed memory also
  - hah! so you're going to adopt the message-passing model?
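
What "programming for nearest-neighbor communication" looks like in practice, as a hedged MPI sketch (the 1-D ring topology and names are illustrative): each rank exchanges one boundary value with its left and right neighbors, pairing each send with a receive so the ranks don't deadlock.

```c
#include <mpi.h>

void ring_exchange(double my_edge, double *from_left, double *from_right) {
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int left  = (rank - 1 + size) % size;
    int right = (rank + 1) % size;

    /* send my edge right, receive my left neighbor's edge */
    MPI_Sendrecv(&my_edge, 1, MPI_DOUBLE, right, 0,
                 from_left, 1, MPI_DOUBLE, left,  0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* send my edge left, receive my right neighbor's edge */
    MPI_Sendrecv(&my_edge, 1, MPI_DOUBLE, left,  1,
                 from_right, 1, MPI_DOUBLE, right, 1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}
```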

Shared Memory vs. Message Passing
Why there was a debate
- little experimental data
- failure to separate implementation from programming model
- each paradigm can emulate the other
  - MP on an SM machine: put message buffers in private (per-processor) memory; copy messages by ld/st between buffers
  - SM on an MP machine: every ld/st becomes a message copy... sloooooooooow
Who won?
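
A sketch of the first emulation, MP on an SM machine (a single-slot mailbox; the volatile flag is a deliberate simplification, and production code would need atomics or memory fences): a "send" is just stores into a shared buffer and a "receive" is loads back out, which is exactly the slide's "copy messages by ld/st between buffers".

```c
#include <string.h>

#define MSG_BYTES 256

/* One mailbox per receiver, allocated in shared memory. */
struct mailbox {
    volatile int full;            /* simplified synchronization flag */
    char data[MSG_BYTES];
};

void send_msg(struct mailbox *mb, const void *src, size_t n) {
    while (mb->full) ;            /* wait until the last message is consumed */
    memcpy(mb->data, src, n);     /* "send" = ordinary stores into the buffer */
    mb->full = 1;
}

void recv_msg(struct mailbox *mb, void *dst, size_t n) {
    while (!mb->full) ;           /* wait for a message */
    memcpy(dst, mb->data, n);     /* "receive" = ordinary loads back out */
    mb->full = 0;
}
```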