Parallel Processing
“I’ve gotta spend at least 10 hours studying for the IT 344 final! I’m going to study with 9 friends… we’ll be done in an hour.”

Next up: TIPS
Mega- = 10^6, Giga- = 10^9, Tera- = 10^12, Peta- = 10^15. BOPS, anyone?
Light travels about 1 ft per 10^-9 seconds (one nanosecond) in free space, so a terahertz uniprocessor could have no clock-to-clock path longer than about 300 microns…
We already know of problems that require more than a TIP (simulations of weather, weapons, brains).
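A quick sanity check on that 300-micron figure: one cycle of a 1 THz clock lasts 10^-12 s, and in that time light travels

    distance = c × t ≈ (3 × 10^8 m/s) × (10^-12 s) = 3 × 10^-4 m = 300 microns

so no signal could reach a clocked element farther away than that within a single cycle.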

Solution: Parallelism
Pipelining – reasonable for a small number of stages (5-10); beyond that, bypassing and stalls become unmanageable.
Superscalar – replicate data paths and design control logic to discover parallelism in traditional programs.
Explicit parallelism – we must learn how to write programs that run on multiple CPUs (see the threading sketch below).
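As a rough illustration of that third option, here is a minimal sketch of explicit parallelism using POSIX threads. The array contents, the four-way split, and the helper sum_range() are illustrative assumptions, not anything prescribed by the lecture:

    /* Minimal sketch: explicitly parallel sum across NTHREADS CPUs. */
    #include <pthread.h>
    #include <stdio.h>

    #define N 1000000
    #define NTHREADS 4

    static double data[N];
    static double partial[NTHREADS];

    static void *sum_range(void *arg) {
        long id = (long)arg;
        long lo = id * (N / NTHREADS), hi = lo + N / NTHREADS;
        double s = 0.0;
        for (long i = lo; i < hi; i++) s += data[i];
        partial[id] = s;   /* each thread writes only its own slot: no lock needed */
        return NULL;
    }

    int main(void) {
        pthread_t t[NTHREADS];
        for (long i = 0; i < N; i++) data[i] = 1.0;
        for (long i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, sum_range, (void *)i);
        double total = 0.0;
        for (long i = 0; i < NTHREADS; i++) {
            pthread_join(t[i], NULL);   /* wait, then combine partial results */
            total += partial[i];
        }
        printf("total = %f\n", total);
        return 0;
    }

The point is the division of labor: the programmer, not the hardware, decides how the serial loop is partitioned across CPUs.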

Pipelining

Superscalar – how far can it go?
Multiple functional units (ALUs, address units, floating point, etc.)
Instruction dispatch
Dynamic scheduling
Pipelines
Speculative execution

Explicit Parallelism
Distributed:
  Transaction-oriented
  Geographically dispersed locations
  E.g. SETI@home
Parallel:
  Single-goal computing
  Compute-intense and/or data-intense
  High-speed data exchange
  Often on custom hardware
  E.g. geochemical surveys

Challenges
For distributed processing, the parallelism is given and usually cannot easily be changed; programming is relatively easy.
For parallel processing, the programmer defines the parallelism by partitioning the serial program(s); parallel programming is in general harder than writing transaction-oriented applications.

Other vocabulary
Decomposition – the way a program is broken up for parallel processing.
Coarse-grain – breaks the work into big chunks (fewer processors); typical of SMP and (often) distributed systems.
Fine-grain – breaks the work into small chunks (more processors); e.g. image processing.

Inter-processor communications
Loosely coupled – distributed processors, Beowulf clusters
Tightly coupled – custom supercomputers

More terminology
SIMD (Single Instruction, Multiple Data)
MIMD (Multiple Instruction, Multiple Data)
MISD (Multiple Instruction, Single Data – a pipeline)

SIMD
The same instruction (I) executes in multiple units, each on different data (D1…D4).
Examples: vector processors, AltiVec.
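One way to see the SIMD idea in ordinary C: the loop below applies one operation uniformly across many data elements, and a vectorizing compiler (e.g. gcc -O3) can map it onto vector instructions such as AltiVec’s. This is a sketch, not code from the lecture:

    /* SIMD style: the same multiply-add is applied to every element,
       so the compiler can process several elements per vector instruction. */
    void saxpy(int n, float a, const float *x, float *y) {
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];   /* one instruction, many data */
    }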

MIMD
Each unit executes its own instruction stream (I1…I4) on its own data (D1…D4).
Examples: Mercury, Beowulf, etc.

MISD (pipeline)
Data items (D1…D4) flow through a series of stages, each applying its own instruction (I1…I4).

Distributed programming tools
C/C++ with TCP/IP (see the socket sketch below)
Perl with TCP/IP
Java
CORBA
ASP.NET
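For the first entry, a minimal client-side sketch of distributed communication over TCP/IP in C. The host 127.0.0.1, port 5000, and message are illustrative assumptions, and error handling is trimmed for brevity:

    /* Minimal TCP client sketch: connect to a peer and send one message. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr = {0};
        addr.sin_family = AF_INET;
        addr.sin_port = htons(5000);                 /* illustrative port */
        inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
        if (connect(fd, (struct sockaddr *)&addr, sizeof addr) == 0) {
            const char *msg = "work unit request";
            write(fd, msg, strlen(msg));   /* full-duplex: could read() a reply too */
        }
        close(fd);
        return 0;
    }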

Parallel programming tools
PVM
MPI (see the minimal example below)
Synergy
Others (proprietary hardware)
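As one concrete, hedged example of what these tools look like, a minimal MPI program. Every process runs the same code and learns its rank; compile with mpicc and launch with, e.g., mpirun -np 4:

    /* Minimal MPI program: SPMD style, one process per CPU/node. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which process am I? */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many processes in total? */
        printf("process %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }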

Parallel programming difficulties
Program partitioning and allocation
Data partitioning and allocation
Program (process) synchronization
Data-access mutual exclusion (see the mutex sketch below)
Dependencies
Process(or) failures
Scalability…
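To make the mutual-exclusion item concrete, a sketch using a POSIX mutex; the shared counter and the loop bound are illustrative assumptions:

    /* Without the lock, concurrent increments of `counter` can be lost,
       because ++ is a read-modify-write, not an atomic operation. */
    #include <pthread.h>

    static long counter = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    void *worker(void *arg) {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);     /* enter critical section */
            counter++;
            pthread_mutex_unlock(&lock);   /* leave critical section */
        }
        return NULL;
    }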

Software techniques
Shared memory buffers — areas of memory that any node can read or write (see the sketch below).
Sockets — provide full-duplex message passing between processes.
Semaphores and spinlocks — provide locking and synchronization functions.
Mailbox interrupts — provide an interrupt-driven communication mechanism.
Direct memory access — provides asynchronous shared-memory buffer I/O.
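A sketch of the shared-memory-buffer technique using POSIX shared memory; the region name "/demo_buf" and the 4 KB size are assumptions made for illustration (on some systems you must link with -lrt):

    /* Create a shared-memory buffer that any cooperating process can map. */
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void) {
        int fd = shm_open("/demo_buf", O_CREAT | O_RDWR, 0600);
        ftruncate(fd, 4096);                  /* size the shared region */
        char *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);  /* other processes mapping
                                                 "/demo_buf" see the same bytes */
        buf[0] = 'x';                         /* immediately visible to other mappers */
        munmap(buf, 4096);
        close(fd);
        return 0;
    }

In practice such a buffer is paired with the semaphores or spinlocks from the same slide, since nothing here prevents two writers from colliding.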

Hardware configurations – Interconnects and Memory

Interconnects

Crossbar

Mesh

Interconnects

What it really looks like Note: this computer would rank well on www.top500.org

Summary
Prospects for future CPU architectures:
Pipelining - well understood, but mined out
Superscalar - nearing its practical limits
SIMD - limited use, for special applications
VLIW - returns control to software. The future?
Prospects for future computer system architectures:
SMP - limited scalability; harder than it appears
MIMD/message passing - it’s been the future for over 20 years now. How to program it?