Parallel Computing
Department of Computer Engineering, Ferdowsi University
Hossain Deldari

Lecture organization
- Parallel processing
- Supercomputers and parallel computers
- Amdahl's Law, speedup, efficiency
- Parallel machine architecture
- Computational models
- Concurrency approaches
- Parallel programming
- Cluster computing

What is Parallel Processing?
Parallel processing is the use of multiple processors to execute different parts of the same program simultaneously: the work is divided into smaller tasks, and many smaller tasks are assigned to multiple workers to run at the same time.
Difficulties: coordinating, controlling, and monitoring the workers.
The main goals of parallel processing:
- Solve much bigger problems much faster!
- Reduce the wall-clock execution time of computer programs.
- Increase the size of computational problems that can be solved.

Supercomputer & parallel computer
What is a Supercomputer? A supercomputer is a computer that is much faster than the computers that normal people use. Note: this is a time-dependent definition.
TOP500 list, June 1993 (top entry):
- Manufacturer: TMC
- Computer/Procs: CM-5/1024
- Installation Site: Los Alamos National Laboratory
- Country: USA

TOP500 list, June 2003 (top entry):
- Manufacturer: NEC
- Computer: Earth-Simulator
- Installation Site: Earth Simulator Center
- Country: Japan
Rmax: maximal LINPACK performance achieved.
Rpeak: theoretical peak performance.
LINPACK is a benchmark.

Amdahl's Law, Speedup, Efficiency
Amdahl's Law
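The formula itself is not preserved in the transcript; the standard statement of Amdahl's Law, with f the serial (non-parallelizable) fraction of the work and p the number of processors, is a reasonable reconstruction:

\[
S(p) \;=\; \frac{T(1)}{T(p)} \;=\; \frac{1}{f + \frac{1-f}{p}} \;\le\; \frac{1}{f}
\]

For example, if 10% of a program is inherently serial (f = 0.1), the speedup can never exceed 10, no matter how many processors are used.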

Efficiency
Efficiency is a measure of the fraction of time that a processor spends performing useful work.
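A standard definition consistent with the speedup above (not spelled out in the transcript): with S(p) the speedup on p processors,

\[
E(p) \;=\; \frac{S(p)}{p} \;=\; \frac{T(1)}{p\,T(p)}
\]

Perfect linear speedup gives E = 1; for instance, a speedup of 6 on 8 processors gives an efficiency of 0.75.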

Shunt Operation

Parallel and Distributed Computers
- SIMD
- MIMD
- MISD
- Clusters

SIMD (Single Instruction Multiple Data)

MISD (Multiple Instruction Single Data)

MIMD (Multiple Instruction Multiple Data)

MIMD (cont.)

Parallel machine architecture
Shared memory model
- Bus-based
- Switch-based
- NUMA
Distributed memory model
Distributed shared memory model
- Page-based
- Object-based
- Hardware

Shared memory model

Shared memory model (cont.)
- Shared memory machines are also called multiprocessors.
- OpenMP is a standard API for this model (C/C++/Fortran).
Advantage: easy programming.
Disadvantages: design complexity, not scalable.
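To make the shared-memory model concrete, here is a minimal OpenMP sketch in C (my example; the array names and sizes are illustrative, not from the lecture):

#include <stdio.h>
#include <omp.h>

#define N 1000000

static double a[N], b[N], c[N];

int main(void) {
    /* Initialize the inputs (serial). */
    for (int i = 0; i < N; i++) {
        a[i] = i;
        b[i] = 2.0 * i;
    }

    /* All arrays live in one shared address space; OpenMP splits the
       loop iterations among threads, each writing a disjoint part of c. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("c[N-1] = %f (up to %d threads)\n", c[N - 1], omp_get_max_threads());
    return 0;
}

Compiled with, e.g., gcc -fopenmp vecadd.c. The same source also compiles and runs serially when OpenMP is disabled, which is part of the "easy programming" advantage mentioned above.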

Bus-based shared memory model
- The bus is a bottleneck.
- Not scalable.

Switch-based shared memory model
- Maintenance is difficult.
- Expensive.
- Scalable.

NUMA model
NUMA stands for Non-Uniform Memory Access.
- Simulated shared memory.
- Better scalability.

Distributed memory model
- Multicomputer
- MPI (Message Passing Interface)
- Easy design, low cost, high scalability
- Difficult programming

Examples of Network Topology
Linear array, ring, mesh, fully connected

Examples of Network Topology (cont.)
Hypercubes (the slide's figure shows a hypercube of dimension d = 4)
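A d-dimensional hypercube connects 2^d nodes, and two nodes are neighbors exactly when their binary labels differ in a single bit. The small C sketch below (my illustration, not from the lecture) prints the neighbors of one node using that property:

#include <stdio.h>

/* Print the neighbors of node `id` in a d-dimensional hypercube.
   Labels run from 0 to 2^d - 1; flipping bit k of a label gives the
   neighbor along dimension k. */
static void hypercube_neighbors(unsigned id, unsigned d) {
    for (unsigned k = 0; k < d; k++)
        printf("dimension %u: node %u\n", k, id ^ (1u << k));
}

int main(void) {
    /* Example for the d = 4 hypercube from the slide: node 5 (binary 0101)
       has neighbors 4, 7, 1, and 13. */
    hypercube_neighbors(5, 4);
    return 0;
}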

Distributed shared memory model
- Simpler abstraction; sharing data is easier.
- Portability.
- Easy design with easy programming.
- Low performance (when communication is heavy).

Parallel and Distributed Architecture (Leopold, 2001)
The slide's figure arranges SIMD, SMP, NUMA, and cluster systems along several axes:
- Memory organization: shared memory versus distributed memory (SIMD/MIMD).
- Degree of coupling: from tight to loose.
- Supported grain sizes: from fine to coarse.
- Communication speed: from fast to slow.

Computational Models
RAM, PRAM, BSP, LogP, MPI

RAM Model

PRAM (Parallel Random Access Machine) Model
- The slide's diagram shows p processors P1 ... Pp, each with its own private memory, all connected to a single global memory under one control unit.
- Execution is synchronized as a read / compute / write cycle.
- Memory-access variants: EREW, ERCW, CREW, CRCW (Exclusive or Concurrent Read, Exclusive or Concurrent Write).

Bulk Synchronous Parallel (BSP) Model
- A generalization of the PRAM model.
- Components: processor-memory pairs, a communication network, and barrier synchronization.
- Computation proceeds in supersteps: processes execute local computation, perform communications, then reach a barrier synchronization.

BSP Cost Model
Cost of a superstep = w + max(h_s, h_r) * g + l, where:
- w: maximum number of local operations performed by any processor
- h_s: maximum number of packets sent by any processor
- h_r: maximum number of packets received by any processor
- g: communication cost per packet (the gap; its reciprocal is the per-processor communication throughput)
- l: synchronization latency
- p: number of processors
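As a hypothetical worked example (the numbers are mine, not from the lecture): with w = 10^6 local operations, h_s = h_r = 200 packets, g = 5, and l = 500, the cost of the superstep is

\[
10^{6} + \max(200, 200)\cdot 5 + 500 = 1{,}001{,}500 \text{ time units.}
\]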

LogP Model
- Closely related to BSP, but it models asynchronous execution.
- Its parameters:
  L: the message latency.
  o: the overhead, defined as the length of time that a processor is engaged in the transmission or reception of each message; during this time the processor cannot perform other operations.
  g: the gap, defined as the minimum time interval between consecutive message transmissions or receptions; the reciprocal of g corresponds to the available per-processor bandwidth.
  P: the number of processor/memory modules.
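A standard consequence of these definitions (not stated explicitly on the slide): sending a single small message from one processor to another takes

\[
T_{\text{msg}} = o_{\text{send}} + L + o_{\text{recv}} = L + 2o,
\]

and a request/reply round trip therefore takes 2(L + 2o), assuming the send and receive overheads are both o.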

LogP (cont.)

MPI (Message Passing Interface)
What is MPI?
- A message-passing library specification:
  - a message-passing model
  - not a compiler specification
  - not a specific product
- For parallel computers, clusters, and heterogeneous networks.
- Full-featured.
- Designed to permit (unleash?) the development of parallel software libraries.
- Designed to provide access to advanced parallel hardware for end users, library writers, and tool developers.
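A minimal MPI sketch in C (my example; the value and tag are illustrative): rank 0 sends an integer to rank 1, showing the explicit send/receive style of the message-passing model.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, size, value = 42, tag = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size >= 2) {
        if (rank == 0) {
            /* Data becomes visible to other processes only by
               explicitly sending it. */
            MPI_Send(&value, 1, MPI_INT, 1, tag, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, tag, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received %d from rank 0\n", value);
        }
    }

    MPI_Finalize();
    return 0;
}

Typically built with mpicc and launched with, e.g., mpirun -np 2 ./a.out.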

MPI Layer
The slide's diagram shows two nodes: Task 1 (an application) runs on Node 1 and Task 2 on Node 2. Each application talks to its local MPI communication layer; communication between the two applications is virtual, while the real communication takes place between the MPI layers across the network.

Matrix Multiplication Example

PRAM Matrix Multiplication
Cost of the PRAM algorithm
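The algorithm and its cost appear as figures in the original slides; the standard CREW PRAM analysis for multiplying two n x n matrices is a plausible reconstruction. With p = n^3 processors, each processor computes one product a_{ik} b_{kj} in O(1) time, and the n partial products contributing to each entry c_{ij} are summed by a parallel reduction in O(log n) time, so

\[
T(n) = O(\log n), \qquad \text{cost} = p \cdot T(n) = O(n^{3}\log n),
\]

which is not cost-optimal relative to the O(n^3) sequential algorithm.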

BSP Matrix Multiplication
Cost of the algorithm
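Again the derivation is only shown as a figure; one common BSP analysis (an assumption about which variant the lecture used) distributes the n x n matrices over p processors in square blocks, so each processor performs O(n^3 / p) local operations and exchanges O(n^2 / sqrt(p)) matrix elements per superstep, giving a cost of roughly

\[
\frac{n^{3}}{p} \;+\; \frac{n^{2}}{\sqrt{p}}\,g \;+\; l
\]

in the w + max(h_s, h_r) g + l notation introduced earlier.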

Concurrency Approach
- Control parallel
- Data parallel

Control Parallel

Data Parallel
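Control parallelism assigns different operations (different code) to different processors, while data parallelism applies the same operation to different pieces of the data. The C/OpenMP sketch below (my illustration, not from the slides) shows both styles side by side:

#include <stdio.h>
#include <omp.h>

void control_parallel(double *a, double *b, int n) {
    /* Control parallelism: two different tasks run concurrently. */
    #pragma omp parallel sections
    {
        #pragma omp section
        { for (int i = 0; i < n; i++) a[i] = a[i] * 2.0; }  /* task 1 */
        #pragma omp section
        { for (int i = 0; i < n; i++) b[i] = b[i] + 1.0; }  /* task 2 */
    }
}

void data_parallel(double *a, int n) {
    /* Data parallelism: one operation, spread over the data. */
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        a[i] = a[i] * 2.0;
}

int main(void) {
    double a[8] = {1, 2, 3, 4, 5, 6, 7, 8}, b[8] = {0};
    control_parallel(a, b, 8);   /* a[i] doubled, b[i] incremented */
    data_parallel(a, 8);         /* a[i] doubled again */
    printf("a[0] = %g, b[0] = %g\n", a[0], b[0]);  /* prints 4 and 1 */
    return 0;
}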

The Best granularity for programming

Parallel Programming
Explicit parallel programming: Occam, MPI, PVM
Implicit parallel programming:
- Parallel functional programming: ML, …
- Concurrent object-oriented programming: COOL, …
- Data parallel programming: Fortran 90, HPF, …

Cluster Computing
A cluster system is:
- a parallel multicomputer built from high-end PCs and a conventional high-speed network
- a system that supports parallel programming

Cluster Computing (cont.): Applications
Scientific computing:
- simulation, CFD, CAD/CAM, weather prediction, processing large volumes of data
Super-server systems:
- scalable Internet/web servers
- database servers
- multimedia (video, audio) servers

Cluster Computing (cont.): Cluster System Building Blocks
Layered, from bottom to top: high-speed network, hardware, OS, single system image layer, system tool layer, application layer.

Cluster Computing (cont.): Why cluster computing?
- Scalability: build a small system first, grow it later.
- Low cost: hardware based on the COTS (commercial off-the-shelf) model; software based on freeware from the research community.
- Easier to maintain.
- Vendor independent.

The End