Introductory Courses in High Performance Computing at Illinois David Padua.



Our oldest course: 420 Parallel Programming for Scientists and Engineers. Intended for non-CS majors (but many CS students take it). Taught once a year for the last 20 years.

CS 420 Parallel Progrmg: Sci & Engrg. Credit: 3 or 4 hours. Fundamental issues in the design and development of parallel programs for various types of parallel computers. Various programming models according to both machine type and application area. Cost models, debugging, and performance evaluation of parallel programs with actual application examples. Same as CSE 402 and ECE 492. 3 or 4 undergraduate hours; 3 or 4 graduate hours. Prerequisite: CS 400 or CS 225.

420 Parallel Programming for Scientists and Engineers

Machines and programming models
– Shared memory, distributed memory, data parallel (OpenMP / MPI / Fortran 90)
– Clusters, shared-memory machines, vector supercomputers (in the past)
Data-parallel numerical algorithms (in Fortran 90/MATLAB)
Sorting / N-body

Other courses
– 4xx Parallel Programming. For majors.
– 4xx Performance Programming. For all issues related to performance.
– 5xx Theory of Parallel Computing. For advanced students.
– 554 Parallel Numerical Algorithms.

4xx Parallel Programming. For majors.

Overview of architectures. Architectural characterization of the most important parallel systems today. Issues in the effective programming of parallel architectures: exploitation of parallelism, locality (cache, registers), load balancing, communication overhead, consistency, coherency, latency avoidance. Transactional memories.

Programming paradigms. Shared-memory, message-passing, data-parallel (regular), and functional programming paradigms. Message-passing programming. PGAS programming. Survey of programming languages: OpenMP, MPI, TBB, Charm++, UPC, Co-array Fortran, High Performance Fortran, NESL.

Concepts. Basic concepts in parallel programming: speedup, efficiency, redundancy, isoefficiency, Amdahl's law.

Programming principles. Reactive parallel programming. Memory consistency. Synchronization strategies: critical regions, atomic updates, races, deadlock avoidance and prevention, livelock, starvation, scheduling fairness. Lock-free algorithms. Asynchronous algorithms. Speculation. Load balancing. Locality enhancement.

Algorithms. Basic algorithms: element-by-element array operations, reductions, parallel prefix, linear recurrences, Boolean recurrences. Systolic arrays, matrix multiplication, LU decomposition, Jacobi relaxation, fixed-point iterations. Sorting and searching. Graph algorithms, data-mining algorithms. N-body/particle simulations.

4xx Performance Programming.

Sequential performance bottlenecks: CPU (pipelining, multiple-issue processors (in-order and out-of-order), support for speculation, branch prediction, execution units, vectorization, registers, register renaming); caches (temporal and spatial locality; compulsory, conflict, capacity, and coherence misses); memory (latency, row/column, read/write); I/O.

Parallel performance bottlenecks: Amdahl's law, load imbalance, communication, false sharing, granularity of communication (distributed memory).

Optimization strategies: algorithm and program optimizations. Static and dynamic optimizations. Data-dependent optimizations. Machine-dependent and machine-independent optimizations.

Sequential program optimizations: redundancy elimination. Peephole optimizations. Loop optimizations. Branch optimizations. Locality optimizations. Tiling. Cache-oblivious and cache-conscious algorithms. Padding. Hardware and software prefetch.

Parallel programming optimizations: brief introduction to parallel programming of shared-memory machines. Dependence graphs and program optimizations. Privatization, expansion, induction variables, wrap-around variables, loop fusion and loop fission. Frequently occurring kernels (reductions, scan, linear recurrences) and their parallel versions. Program vectorization. Multimedia extensions and their programming. Speculative parallel programming. Load balancing. Bottlenecks. Overdecomposition. Communication optimizations. Aggregation for communication. Redundant computation to avoid communication. False sharing. Optimization for power.

Tools for program tuning: performance monitors. Profiling. Sampling. Compiler switches, directives, and compiler feedback. Autotuning. Empirical search. Machine-learning strategies for program optimization. Library generators: ATLAS, FFTW, SPIRAL. Algorithm choice and tuning. Hybrid algorithms. Self-optimizing algorithms. Sorting. Data mining. Numerical error and algorithm choice.