CMSC 611: Advanced Computer Architecture
Parallel Computation
Most slides adapted from David Patterson. Some from Mohomed Younis.

Parallel Computers
Definition: "A parallel computer is a collection of processing elements that cooperate and communicate to solve large problems fast." – Almasi and Gottlieb, Highly Parallel Computing, 1989
Parallel machines have become increasingly important because:
– Power & heat problems grow for big/fast processors
– Even with twice the transistors, how do you use them to make a single processor faster?
– CPUs are being developed with an increasing number of cores

Questions about parallel computers:
– How large a collection?
– How powerful are the processing elements?
– How do they cooperate and communicate?
– How are data transmitted?
– What type of interconnection?
– What are the HW and SW primitives for programmers?
– Does it translate into performance?

Levels of Parallelism
Bit-level parallelism
– ALU parallelism: 1-bit, 4-bit, 8-bit, ...
Instruction-level parallelism (ILP)
– Pipelining, superscalar, VLIW, out-of-order execution
Process/thread-level parallelism
– Divide a job into parallel tasks (see the sketch after this slide)
Job-level parallelism
– Independent jobs on one computer system
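
A minimal sketch of process/thread-level parallelism (illustrative, not from the original slides; the array size and thread count are arbitrary choices): one job, summing an array, is divided into parallel tasks, one per POSIX thread.

    /* Thread-level parallelism: one job (summing an array) divided
       into parallel tasks, one per thread. Compile with -pthread. */
    #include <pthread.h>
    #include <stdio.h>

    #define N 1000000
    #define NTHREADS 4

    static double data[N];
    static double partial[NTHREADS];

    static void *sum_chunk(void *arg) {
        long id = (long)arg;
        long lo = id * (N / NTHREADS);
        long hi = (id == NTHREADS - 1) ? N : lo + N / NTHREADS;
        double s = 0.0;
        for (long i = lo; i < hi; i++)
            s += data[i];
        partial[id] = s;  /* each thread writes only its own slot */
        return NULL;
    }

    int main(void) {
        pthread_t tid[NTHREADS];
        for (long i = 0; i < N; i++) data[i] = 1.0;
        for (long t = 0; t < NTHREADS; t++)
            pthread_create(&tid[t], NULL, sum_chunk, (void *)t);
        double total = 0.0;
        for (long t = 0; t < NTHREADS; t++) {
            pthread_join(tid[t], NULL);
            total += partial[t];  /* combine the per-task results */
        }
        printf("sum = %.0f\n", total);
        return 0;
    }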

Applications
Scientific Computing
– Nearly unlimited demand (Grand Challenge problems)
[Table: performance (GFLOPS) and memory (GB) requirements for 48 hour weather, 72 hour weather, pharmaceutical design, and global change/genome applications]
– Successes in some real industries:
  Petroleum: reservoir modeling
  Automotive: crash simulation, drag analysis, engine
  Aeronautics: airflow analysis, engine, structural mechanics
  Pharmaceuticals: molecular modeling

Commercial Applications
Transaction processing
File servers
Electronic CAD simulation
Large WWW servers
WWW search engines
Graphics
– Graphics hardware
– Render farms

Framework
Extend traditional computer architecture with a communication architecture:
– abstractions (HW/SW interface)
– organizational structure to realize the abstraction efficiently
Programming models:
– Multiprogramming: lots of jobs, no communication
– Shared address space: communicate via memory
– Message passing: send and receive messages
– Data parallel: operate on several data sets simultaneously and then exchange information globally and simultaneously (shared or message)

Communication Abstraction
Shared address space:
– e.g., load, store, atomic swap
Message passing:
– e.g., send and receive library calls (sketched below)
Debate over which is better (ease of programming vs. scaling):
– many hardware designs tie 1:1 to a programming model
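
A minimal sketch contrasting the two abstractions (illustrative, not from the slides): under message passing, communication is an explicit send/receive pair, shown here with a POSIX pipe between a parent and a child process. Under a shared address space, the same exchange would be an ordinary store by one thread and a load by another.

    /* Message passing: an explicit send (write) and receive (read),
       here between parent and child processes over a pipe. */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void) {
        int fd[2];
        pipe(fd);                  /* fd[0]: receive end, fd[1]: send end */
        if (fork() == 0) {         /* child acts as the sender */
            int msg = 42;
            write(fd[1], &msg, sizeof msg);   /* send */
            _exit(0);
        }
        int msg;
        read(fd[0], &msg, sizeof msg);        /* receive */
        printf("received %d\n", msg);
        wait(NULL);
        return 0;
    }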

Taxonomy of Parallel Architecture
Flynn categories:
– SISD (Single Instruction, Single Data)
– MIMD (Multiple Instruction, Multiple Data)
– MISD (Multiple Instruction, Single Data)
– SIMD (Single Instruction, Multiple Data)

SISD
Uniprocessor

MIMD
Multiple communicating processes

MISD
No commercial examples
Different operations applied to a single data set:
– Find primes
– Crack passwords

SIMD
Vector/array computers

SIMD Arrays
Performance keys:
– Utilization
– Communication

Data Parallel Model
Operations performed in parallel on each element of a large regular data structure, such as an array
– One control processor broadcasts to many processing elements (PEs), with a condition flag per PE so that individual PEs can skip an operation
– In a distributed-memory architecture, the data is distributed among the memories
– The data parallel model requires fast global synchronization
– Data parallel programming languages lay out data over the processors
– Vector processors have similar ISAs but no data placement restriction
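
For flavor, here is one way a data-parallel operation looks to the programmer on a shared-memory machine, sketched with OpenMP (the arrays and the operation are placeholder choices): the same operation is applied to every element of a regular array, and the runtime maps iterations onto processors.

    /* Data parallel: the same operation applied to every element of a
       large regular array. Compile with -fopenmp to run in parallel. */
    #include <stdio.h>

    #define N 1024

    int main(void) {
        double a[N], b[N];
        for (int i = 0; i < N; i++) b[i] = i;
        #pragma omp parallel for   /* iterations are independent */
        for (int i = 0; i < N; i++)
            a[i] = 2.0 * b[i] + 1.0;
        printf("a[%d] = %.1f\n", N - 1, a[N - 1]);
        return 0;
    }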

SIMD Utilization
Conditional execution:
– PE enable flag: if (f < .5) {...}
– Global enable check: while (t > 0) {...}
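
A sketch of how per-PE conditional execution behaves, simulated in plain C (all names and values are illustrative): every PE evaluates the condition to set its enable flag, the body is issued to all PEs, and only the enabled PEs commit results.

    /* SIMD conditional execution via per-PE enable flags: disabled
       lanes see the same instructions but discard the results. */
    #include <stdio.h>

    #define NPE 8   /* number of processing elements */

    int main(void) {
        float f[NPE] = {0.1f, 0.9f, 0.3f, 0.7f, 0.2f, 0.6f, 0.4f, 0.8f};
        float r[NPE] = {0};
        int enable[NPE];

        for (int pe = 0; pe < NPE; pe++)   /* set enable: if (f < .5) */
            enable[pe] = (f[pe] < 0.5f);

        for (int pe = 0; pe < NPE; pe++)   /* body issued to every PE... */
            if (enable[pe])                /* ...but only enabled PEs commit */
                r[pe] = f[pe] * 2.0f;

        for (int pe = 0; pe < NPE; pe++)
            printf("PE %d: enable=%d r=%.2f\n", pe, enable[pe], r[pe]);
        return 0;
    }

Note the utilization cost: the instruction stream spends the same cycles either way, but only the enabled subset of PEs does useful work.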

Communication: MasPar MP1
– Fast local X-net
– Slow global routing

Communication: CM2
– Hypercube local routing
– Wormhole global routing

Communication: PixelFlow
Dense connections within a block
– A single swizzle operation collects one word from each PE in the block
– Designed for antialiasing
NO inter-block connection
NO global routing

Data Parallel Languages
SIMD programming
– PE point of view
– Data: shared or per-PE
  What data is distributed?
  What is shared over a PE subset?
  What data is broadcast with the instruction stream?
– Data layout: shape [256][256]d;
– Communication primitives
– Higher-level operations
  Scan/prefix sum: r[i] = ∑_{j≤i} d[j]
  e.g., 1,1,2,3,4 → 1, 1+1=2, 2+2=4, 4+3=7, 7+4=11
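
The scan/prefix sum from this slide written out sequentially, so the recurrence r[i] = r[i-1] + d[i] is explicit (a data-parallel implementation would instead compute it in O(log n) parallel steps):

    /* Inclusive scan (prefix sum): r[i] = sum of d[0..i].
       Reproduces the slide's example: 1,1,2,3,4 -> 1,2,4,7,11 */
    #include <stdio.h>

    int main(void) {
        int d[] = {1, 1, 2, 3, 4};
        int n = sizeof d / sizeof d[0];
        int r[5];
        r[0] = d[0];
        for (int i = 1; i < n; i++)
            r[i] = r[i - 1] + d[i];  /* running sum */
        for (int i = 0; i < n; i++)
            printf("%d ", r[i]);     /* prints: 1 2 4 7 11 */
        printf("\n");
        return 0;
    }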

Single Program Multiple Data
Many problems do not map well to SIMD
– Better utilization comes from MIMD or ILP
Data parallel model ⇒ Single Program Multiple Data (SPMD) model
– All processors execute an identical program
– The same program can serve SIMD, SISD, or MIMD
– The compiler handles mapping to the architecture
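
A minimal SPMD sketch using MPI (assumes an MPI installation; compile with mpicc, run with mpirun): every process executes this identical program, and behavior diverges only through the rank each process obtains at runtime.

    /* SPMD: all processes run the same program; the rank decides
       each process's share of the work. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int local = rank + 1;      /* each process computes its own piece */
        int total = 0;
        MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)             /* only rank 0 reports */
            printf("sum over %d processes = %d\n", size, total);
        MPI_Finalize();
        return 0;
    }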