
Summary
Background
– Why do we need parallel processing? Moore's law. Applications.
Introduction to algorithms and applications
– Methodology to develop efficient parallel (distributed-memory) algorithms
– Understand various forms of overhead (communication, load imbalance, search overhead, synchronization)
– Understand various distributions (blockwise, cyclic); see the sketch below
– Understand various load-balancing strategies (static, dynamic master/worker model)
– Understand correctness problems (e.g. message ordering)
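As a reminder of the difference between the two distributions, here is a minimal sketch, assuming N elements divided over P processes; the names N, P, rank, and work() are illustrative, not taken from the course material.

    #include <stdio.h>

    #define N 16   /* problem size (illustrative) */
    #define P 4    /* number of processes (illustrative) */

    /* stand-in for the real per-element computation */
    static void work(int rank, int i) {
        printf("process %d handles element %d\n", rank, i);
    }

    int main(void) {
        /* Blockwise: each process owns one contiguous chunk. */
        for (int rank = 0; rank < P; rank++) {
            int chunk = (N + P - 1) / P;   /* ceiling division */
            int hi = (rank + 1) * chunk < N ? (rank + 1) * chunk : N;
            for (int i = rank * chunk; i < hi; i++)
                work(rank, i);
        }
        /* Cyclic: elements are dealt out round-robin, which balances the
           load better when the cost per element varies with i. */
        for (int rank = 0; rank < P; rank++)
            for (int i = rank; i < N; i += P)
                work(rank, i);
        return 0;
    }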

Summary
Parallel machines and architectures
– Processor organizations, topologies, criteria
– Types of parallel machines: arrays/vectors, shared memory, distributed memory
– Routing
– Flynn's taxonomy
– What are cluster computers?
– What networks do real machines (like the Blue Gene) use?
– Speedup, efficiency (+ their implications), Amdahl's law; see the formulas below
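For reference, the standard definitions behind the last item (textbook formulations, not copied from the slides), with T_1 the sequential time and T_p the time on p processors:

    S(p) = \frac{T_1}{T_p}, \qquad E(p) = \frac{S(p)}{p}

    \text{Amdahl's law: } S(p) = \frac{1}{f + (1 - f)/p} \le \frac{1}{f}

where f is the fraction of the execution that is inherently sequential. For example, f = 0.05 caps the speedup at 20 no matter how many processors are added.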

Summary
Programming methods, languages, and environments
– Different forms of message passing: naming, explicit/implicit receive, synchronous/asynchronous sending
– Select statement
– SR primitives (not syntax)
– MPI: message-passing primitives, collective communication; see the sketch below
– Java parallel programming model and primitives
– HPF: problems with automatic parallelization; division of work between programmer and HPF compiler; alignment/distribution primitives; performance implications
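A minimal MPI sketch of the two kinds of primitives named above: a blocking point-to-point send/receive pair and a collective reduction. This is generic illustrative code, not an excerpt from the course; run it with at least two processes (e.g. mpirun -np 2).

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Point-to-point: explicit receive, addressed by rank and tag. */
        if (size >= 2) {
            if (rank == 0) {
                int msg = 42;
                MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            } else if (rank == 1) {
                int msg;
                MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                printf("rank 1 received %d\n", msg);
            }
        }

        /* Collective communication: every rank contributes, rank 0
           receives the sum. */
        int local = rank, sum = 0;
        MPI_Reduce(&local, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("sum of ranks = %d\n", sum);

        MPI_Finalize();
        return 0;
    }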

Summary
Applications
– N-body problems: load balancing and communication (locality) optimizations, costzones, performance comparison; see the sketch below
– Search algorithm (TDS): use asynchronous communication + clever (transposition-driven) scheduling
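To recall why N-body codes stress locality and load balancing: the naive kernel compares every pair of bodies, so each process needs data for (almost) all bodies, and tree-based methods make the cost per body irregular, which is what costzones-style balancing addresses. A minimal sequential sketch of the naive kernel, with illustrative names and a standard softening term:

    #include <math.h>

    typedef struct {
        double x, y, z, mass;
        double fx, fy, fz;
    } Body;

    /* Naive all-pairs force computation: O(n^2) interactions. */
    void compute_forces(Body *b, int n) {
        const double G = 6.674e-11;
        for (int i = 0; i < n; i++) {
            b[i].fx = b[i].fy = b[i].fz = 0.0;
            for (int j = 0; j < n; j++) {
                if (i == j) continue;
                double dx = b[j].x - b[i].x;
                double dy = b[j].y - b[i].y;
                double dz = b[j].z - b[i].z;
                double r2 = dx*dx + dy*dy + dz*dz + 1e-9; /* softening */
                double r = sqrt(r2);
                double f = G * b[i].mass * b[j].mass / r2;
                b[i].fx += f * dx / r;
                b[i].fy += f * dy / r;
                b[i].fz += f * dz / r;
            }
        }
    }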

Summary
Many different types of many-core hardware
– Understand how to analyze it
  Hardware performance metrics: theoretical peak performance, memory bandwidth, power, flops/W
  Performance analysis: operational intensity, arithmetic intensity, roofline; see the formula below
– Understand the basics of GPU architectures
  Hierarchical organization
  – Computation: PCI board -> chips -> SMs -> cores -> threads
  – Memories: host -> device -> shared -> registers
  Hardware multi-threading, SIMT model
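The roofline model referred to above combines the peak-performance and bandwidth metrics into a single bound (standard formulation, not copied from the slides):

    P_\text{attainable} = \min\left(P_\text{peak},\; I \cdot B_\text{peak}\right)

where I is the operational intensity of the kernel (useful flops per byte of DRAM traffic) and B_peak the peak memory bandwidth. For example, a kernel with I = 0.5 flop/byte on a 200 GB/s device is memory-bound at 100 GFLOP/s, regardless of the compute peak.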

Summary
Many-core programming techniques
– Vectorization
– DMA and overlapping communication and computation
– Coalescing
– How to exploit fast local memories: LS on the Cell, shared memory on GPUs
– Atomic instructions
– Software telescopes: correlator
– Tiling; see the sketch below
– How to compare implementations on different hardware
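Tiling is the most compact of these techniques to illustrate on a CPU; staging data into GPU shared memory buys the same reuse. A minimal sketch, with illustrative sizes (N and T are assumptions, not course parameters; T is chosen so the working tiles fit in the fast memory):

    #define N 512   /* matrix dimension (illustrative) */
    #define T 32    /* tile size (illustrative) */

    /* Tiled (blocked) matrix multiply: each T x T tile of A and B is
       reused many times while resident in fast memory, instead of being
       re-fetched from slow memory for every element of C.
       C must be zero-initialized by the caller. */
    void matmul_tiled(const double A[N][N], const double B[N][N],
                      double C[N][N]) {
        for (int ii = 0; ii < N; ii += T)
            for (int jj = 0; jj < N; jj += T)
                for (int kk = 0; kk < N; kk += T)
                    for (int i = ii; i < ii + T; i++)
                        for (int j = jj; j < jj + T; j++) {
                            double sum = C[i][j];
                            for (int k = kk; k < kk + T; k++)
                                sum += A[i][k] * B[k][j];
                            C[i][j] = sum;
                        }
    }

On a GPU, the current tiles of A and B would be copied into shared memory by the thread block before the inner loops run, which also makes the global-memory loads coalesced.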