1 Programming Multicore Processors Aamir Shafi High Performance Computing Lab

2 Serial Computation Traditionally, software has been written for serial computation: To be run on a single computer having a single Central Processing Unit (CPU) A problem is broken into a discrete series of instructions that are executed one after another

Parallel Computation Parallel computing is the simultaneous use of multiple compute resources to solve a computational problem: Also known as High Performance Computing (HPC) The prime focus of HPC is performance—the ability to solve the biggest possible problems in the least possible time 3

Traditional Usage of Parallel Computing: Scientific Computing Traditionally, parallel computing has been used to solve challenging scientific problems through simulation: For this reason, it is also called "Scientific Computing" or computational science 4

Emergence of Multi-core Processors In the last decade, processor performance has no longer been improved by increasing clock speed: Increasing clock speed directly increases power consumption Power is dissipated as heat, and it is not practical to cool such processors Intel canceled a project to produce a 4 GHz processor! This led to the emergence of multi-core processors: Performance is increased by adding processing cores that run at lower clock speeds: Implies better power usage 5 Disruptive Technology!

6 Moore’s Law is Alive and Well

7 Power Wall

Why Multi-core Processors Consume Less Power Dynamic power is proportional to V²fC (supply voltage squared × clock frequency × switched capacitance) Increasing frequency (f) also forces an increase in supply voltage (V): a more-than-linear effect on power Adding cores increases capacitance (C), which has only a linear effect 8
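
As a rough worked example (not from the original slides, and assuming the idealization that supply voltage must scale roughly linearly with clock frequency), compare one core clocked at 2f against two cores clocked at f:

P_dyn \propto V^2 f C
P_{one core at 2f} \propto (2V)^2 (2f) C = 8 V^2 f C
P_{two cores at f} \propto V^2 f (2C) = 2 V^2 f C

Under these assumptions the two slower cores deliver comparable peak throughput at roughly a quarter of the dynamic power, which is the essence of the multi-core argument.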

9 Software in the Multi-core Era The challenge has been thrown to the software industry: Parallelism is perhaps the answer The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software: An excerpt: "The biggest sea change in software development since the OO revolution is knocking at the door, and its name is concurrency" This essentially means every software programmer will be a parallel programmer: The main motivation behind conducting this "Programming Multicore Processors" workshop

10 About the “Programming Multicore Processors” Workshop

11 Instructors This workshop will be taught by: Akbar Mehdi: Masters from Stanford University, USA; NVIDIA CUDA API, POSIX Threads, Operating Systems, Algorithms Mohsan Jameel: Masters from KTH, Sweden; Scientific Computing, Parallel Computing Languages, OpenMP

12 Course Contents … A little background on Parallel Computing Approaches

Parallel Hardware Three main classifications: Shared Memory Multi-processors: Symmetric Multi-Processors (SMP) Multi-core Processors Distributed Memory Multi-processors Massively Parallel Processors (MPP) Clusters: Commodity and custom clusters Hybrid Multi-processors: Mixture of shared and distributed memory technologies 13

14 First Type: Shared Memory Multi-processors All processors have access to shared memory: Notion of “Global Address Space”

15 Symmetric Multi-Processors (SMP) An SMP is a parallel processing system with a shared-everything approach: The term signifies that each processor shares the main memory and possibly the cache Typically an SMP can have 2 to 256 processors Also called Uniform Memory Access (UMA) Examples include AMD Athlon, AMD Opteron 200 and 2000 series, Intel Xeon, etc.

16 Multi-core Processors

17 Second Type: Distributed Memory Each processor has its own local memory Processors communicate with each other by message passing over an interconnect

18 Cluster Computers A group of PCs, workstations, or Macs (called nodes) connected to each other via a fast (and private) interconnect: Each node is an independent computer Each cluster has one head-node and multiple compute-nodes: Users log on to the head-node and start parallel jobs on the compute-nodes Two popular cluster classifications: Beowulf Clusters and Rocks Clusters

19 Cluster Computer [diagram: eight nodes, Proc 0 through Proc 7, connected via an interconnect]

20 Third Type: Hybrid Modern clusters have a hybrid architecture: Distributed memory for inter-node (between nodes) communication Shared memory for intra-node (within a node) communication

21 SMP and Multi-core Clusters Most modern commodity clusters have SMP and/or multi-core nodes: Processors not only communicate via the interconnect; shared memory programming is also required This trend is likely to continue: A new name, "constellations", has even been proposed
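
As an illustrative sketch only (not part of the original slides), hybrid programs typically combine MPI for inter-node communication with OpenMP for intra-node threading; both are introduced in the slides that follow. A minimal hybrid hello world in C:

#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char *argv[]) {
    int rank;

    /* Real hybrid codes should request thread support via MPI_Init_thread;
       plain MPI_Init suffices for this print-only sketch. */
    MPI_Init(&argc, &argv);                /* inter-node: message passing */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel                   /* intra-node: shared memory   */
    printf("MPI rank %d, OpenMP thread %d of %d\n",
           rank, omp_get_thread_num(), omp_get_num_threads());

    MPI_Finalize();
    return 0;
}

Such a program would be compiled with an MPI wrapper compiler, e.g. mpicc -fopenmp.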

22 Classification of Parallel Computers [diagram: Parallel Hardware splits into Shared Memory Hardware (SMPs, Multicore Processors) and Distributed Memory Hardware (Clusters, MPPs)] In this workshop, we will learn how to program shared memory parallel hardware

Writing Parallel Software There are mainly two approaches to writing parallel software The first approach is to use libraries (packages) written in already existing languages: Economical The second and more radical approach is to provide new languages: Parallel computing has a history of novel parallel languages These languages provide high-level parallelism constructs 23

24 Shared Memory Languages and Libraries Designed to support parallel programming on shared memory platforms: OpenMP: Consists of a set of compiler directives, library routines, and environment variables The runtime uses a fork-join model of parallel execution Cilk++: A design goal was to support asynchronous parallelism A set of keywords: cilk_for, cilk_spawn, cilk_sync … POSIX Threads (PThreads) Threads Building Blocks (TBB)
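
To make the fork-join model concrete, here is a minimal OpenMP example in C (an illustrative sketch, not part of the original slides): the parallel for directive forks a team of threads across the loop iterations, and the reduction clause gives each thread a private partial sum that is combined at the join.

#include <stdio.h>
#include <omp.h>

int main(void) {
    const int N = 1000000;
    double sum = 0.0;

    /* Fork a team of threads over the loop; reduction(+:sum) combines
       the per-thread partial sums when the threads join. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += 1.0 / (i + 1);

    printf("Harmonic sum: %f (up to %d threads)\n", sum, omp_get_max_threads());
    return 0;
}

Compile with an OpenMP-capable compiler, for example gcc -fopenmp.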

Distributed Memory Languages and Libraries Libraries: Message Passing Interface (MPI), the de facto standard PVM Languages: High Performance Fortran (HPF) Fortran M HPJava 25
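
For contrast with the shared memory examples above, here is a minimal MPI hello world in C (an illustrative sketch, not part of the original slides). Each process, or rank, has its own private address space; data is exchanged only through explicit messages.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, size;

    MPI_Init(&argc, &argv);                /* start the MPI runtime     */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's rank (id)  */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */

    printf("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize();                        /* shut the runtime down     */
    return 0;
}

Such a program is typically compiled with mpicc and launched with something like mpirun -np 4 ./hello (the executable name here is arbitrary).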

26 Our Focus Shared Memory and Multi-core Processor Machines: Using POSIX Threads Using OpenMP Using Cilk++ (covered briefly) Disruptive Technology: Using Graphics Processing Units (GPUs) by NVIDIA for general-purpose computing
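
The Day One practical session below runs a hello world PThreads program; a minimal sketch of such a program in C (illustrative only, the actual workshop code may differ) is:

#include <stdio.h>
#include <pthread.h>

#define NUM_THREADS 4

/* Each thread runs this function; the argument carries its logical id. */
static void *hello(void *arg) {
    long id = (long)arg;
    printf("Hello from thread %ld\n", id);
    return NULL;
}

int main(void) {
    pthread_t threads[NUM_THREADS];

    /* Create the worker threads. */
    for (long t = 0; t < NUM_THREADS; t++)
        pthread_create(&threads[t], NULL, hello, (void *)t);

    /* Wait for all of them to finish before exiting. */
    for (int t = 0; t < NUM_THREADS; t++)
        pthread_join(threads[t], NULL);

    return 0;
}

Compile with gcc -pthread.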

Day One 27
Timings | Topic | Presenter
10:00 to 10:30 | Introduction to multicore computing | Aamir Shafi
10:30 to 11:30 | Background discussion: review of processes, threads, and architecture; speedup analysis | Akbar Mehdi
11:30 to 11:45 | Break
11:45 to 12:55 PM | Introduction to POSIX Threads | Akbar Mehdi
12:55 to 1:25 PM | Prayer break
1:25 to 2:30 PM | Practical session: run a hello world PThreads program; introduce Linux, top, and Solaris; also introduce the first coding assignment | Akbar Mehdi

Day Two 28
Timings | Topic | Presenter
10:00 to 11:00 | POSIX Threads continued… | Akbar Mehdi
11:00 to 12:55 PM | Introduction to OpenMP | Mohsan Jameel
12:55 to 1:25 PM | Prayer break
1:25 to 2:30 PM | OpenMP continued… + lab session | Mohsan Jameel

Day Three 29
Timings | Topic | Presenter
10:00 to 12:00 | Parallelizing the image processing application using PThreads and OpenMP: practical session | Akbar Mehdi and Mohsan Jameel
12:00 to 12:55 PM | Introduction to Intel Cilk++ | Aamir Shafi
12:55 to 1:25 PM | Prayer break
1:25 to 2:30 PM | Introduction to NVIDIA CUDA | Akbar Mehdi
2:30 to 2:35 PM | Concluding remarks | Aamir Shafi

Learning Objectives To become aware of the multicore revolution and its impact on the computer software industry To program multicore processors using POSIX Threads To program multicore processors using OpenMP and Cilk++ To program Graphics Processing Units (GPUs) for general-purpose computation (using the NVIDIA CUDA API) 30 You may download the tentative agenda from

Next Session Review of important and relevant Operating Systems and Computer Architecture concepts by Akbar Mehdi … 31