Introduction CS 524 – High-Performance Computing

Computing Performance

Program execution time
- Elapsed time from program initiation and input to program output
- Depends on the hardware, machine instruction set, and compiler
- Measured with benchmarks: SPEC (Standard Performance Evaluation Corporation) suites, LINPACK, LAPACK, Winstone, gaming suites, etc.

Processor time, T
- Time the processor is busy executing instructions
- Depends on the processing system (CPU and memory) and the compiler
- Metrics include xFLOPS (x floating-point operations per second) and xIPS (x instructions per second), e.g. GFLOPS, MIPS
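To make the two quantities concrete, here is a minimal C sketch that measures both elapsed (wall-clock) time and processor (CPU) time for a dummy workload; the workload and its size are illustrative, and clock_gettime assumes a POSIX system:

#include <stdio.h>
#include <time.h>

/* Dummy workload: a long arithmetic loop. */
static double work(long n) {
    double s = 0.0;
    for (long i = 1; i <= n; i++)
        s += (double)i * (double)i;
    return s;
}

int main(void) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);   /* wall-clock start */
    clock_t c0 = clock();                  /* CPU-time start */

    double s = work(100000000L);

    clock_t c1 = clock();
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double wall = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    double cpu  = (double)(c1 - c0) / CLOCKS_PER_SEC;
    printf("result %g: wall %.3f s, cpu %.3f s\n", s, wall, cpu);
    return 0;
}

On an unloaded machine the two numbers are close; under multiprogramming, elapsed time exceeds processor time.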

High Performance Computing

What performance is considered high?
- There is no standard definition. Performance increases with time: yesterday's supercomputing performance is now considered normal desktop performance.

[Figure: cost vs. performance curves, today and 20 years ago]

Cost-performance
- High performance is achieved at significantly higher cost
- Performance gains at the high end are expensive

Why High Performance Computing?

Need
- Computationally intensive problems: simulation of complex phenomena, computational fluid mechanics, finite element analysis
- Large-scale problems: weather forecasting, environmental impact simulation, DNA sequencing, data mining

High performance computing (HPC) is driven by need. Historically, applied scientists and engineers have fueled research and development in HPC systems (hardware, software, compilers, languages, networking, etc.).

Goal
- Reduce execution time and processor time, T; reducing T is crucial for HPC

Basic Performance Equation

Processor time, T:
T = (N × S) / R
- N = number of instructions executed by the processor
- S = average number of basic steps per instruction
- R = clock rate

How can performance be increased?
- Buy a higher-performance computer? Wrong!
- Decrease N and S, increase R. How?
- Buy a higher-performance computer? Yes, but…
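A quick worked example with made-up numbers (not from the slides): a program that executes N = 10^9 instructions at S = 2 basic steps per instruction on an R = 2 GHz clock takes

T = (N × S) / R = (10^9 × 2) / (2 × 10^9 per second) = 1 second.

Halving S (e.g. through pipelining) or doubling R halves T, and so does a compiler that halves N; the next slide lists techniques that attack each factor.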

Enhancing Performance

Increase concurrency and parallelism
- Operation-level or implicit parallelism: pipelining and superscalar operation
- Task-level or explicit parallelism: loops, threads, multi-processor execution

Improve utilization of hardware features
- Caches
- Pipelines
- Communication links (e.g. bus)

Develop algorithms and software that take advantage of hardware and compiler features
- Programming languages and compilers
- Software libraries
- Algorithm design

Programming Models/Techniques

Single-processor performance issues
- Data locality (cache and register tuning)
- Data dependency (pipelining and superscalar operation)
- Fine-grained parallelism (pipelining, threads, etc.)

Parallel programming models (a minimal shared-memory example follows this list)
- Distributed memory or message passing programming: MPI (Message Passing Interface)
- Shared memory or global address space programming: POSIX threads (Pthreads), OpenMP
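As a flavor of the shared-memory model, a minimal OpenMP sketch in C; the array size and the reduction loop are illustrative, not from the course:

#include <stdio.h>
#include <omp.h>

#define N 1000000

static double a[N];

int main(void) {
    double sum = 0.0;

    /* OpenMP splits the iterations among threads and combines the
       per-thread partial sums via the reduction clause. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++) {
        a[i] = (double)i;
        sum += a[i];
    }

    printf("max threads: %d, sum = %g\n", omp_get_max_threads(), sum);
    return 0;
}

Compile with an OpenMP-capable compiler, e.g. gcc -fopenmp. The equivalent message-passing program would distribute the array across MPI processes and combine the partial sums with MPI_Reduce.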

Operation Level Parallelism

/* O2: one multiply-add chain, 2 flops per iteration */
for (i = 0; i < nrepeats; i++)
    w = c*w + d;

/* O4: two independent chains, 4 flops per iteration */
for (i = 0; i < nrepeats; i++) {
    w = c*w + d;
    x = c*x + d;
}

/* O8: four independent chains, 8 flops per iteration */
for (i = 0; i < nrepeats; i++) {
    w = c*w + d;
    x = c*x + d;
    y = c*y + d;
    z = c*z + d;
}

Performance in MFLOPS:

Machine                      O2    O4    O8
P4 1.6 GHz
P3 800 MHz
suraj (UltraSparc II-250)    35    50    64

Each update of w depends on its own previous value, but the chains are independent of one another, so a loop body with more chains lets the pipelined floating-point units overlap more operations and the MFLOPS rate rises from O2 to O8.

Cache Effect on Performance (1)

Simple matrix transpose:

void transpose(int n, float a[n][n], float b[n][n]) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            a[i][j] = b[j][i];   /* b is read with stride n */
}

Performance in million moves per second:

Machine       32 x 32    1000 x 1000
P4 1.6 GHz    40         22
P3 800 MHz    36         23
suraj         9          5

At 32 x 32 both matrices fit in cache; at 1000 x 1000 the stride-n reads of b miss in cache and performance drops.
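A standard remedy, not shown on the slide, is to transpose in cache-sized blocks so both matrices stay resident; a sketch, with the block size B as a tunable assumption:

#define B 32   /* block size; tune to the cache (illustrative value) */

void transpose_blocked(int n, float a[n][n], float b[n][n]) {
    for (int ii = 0; ii < n; ii += B)
        for (int jj = 0; jj < n; jj += B)
            /* Transpose one B x B tile; the tiles of a and b both fit in cache. */
            for (int i = ii; i < ii + B && i < n; i++)
                for (int j = jj; j < jj + B && j < n; j++)
                    a[i][j] = b[j][i];
}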

Cache Effect on Performance (2)

Matrix-vector multiplication:

/* Dot-product form of matrix-vector multiply */
for (i = 0; i < n; i++) {
    for (j = 0; j < n; j++)
        y[i] = y[i] + a[i][j] * x[j];
}

/* SAXPY form of matrix-vector multiply */
for (j = 0; j < n; j++) {
    for (i = 0; i < n; i++)
        y[i] = y[i] + a[i][j] * x[j];
}

Performance in million moves per second:

Machine       Dot-product    SAXPY
P4 1.6 GHz
P3 800 MHz
suraj

In C (row-major storage), the dot-product form sweeps a[i][j] along a row with stride 1, while the SAXPY form sweeps down a column with stride n, so their cache behavior differs sharply.
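A minimal harness to reproduce the comparison; the matrix size is an arbitrary assumption, and in practice each loop nest should be repeated until the timings are well above the clock resolution:

#include <stdio.h>
#include <time.h>

#define N 1000

static double a[N][N], x[N], y[N];

int main(void) {
    for (int i = 0; i < N; i++) {
        x[i] = 1.0;
        y[i] = 0.0;
        for (int j = 0; j < N; j++)
            a[i][j] = 1.0;
    }

    clock_t t0 = clock();
    for (int i = 0; i < N; i++)          /* dot-product form */
        for (int j = 0; j < N; j++)
            y[i] += a[i][j] * x[j];
    double dot = (double)(clock() - t0) / CLOCKS_PER_SEC;

    t0 = clock();
    for (int j = 0; j < N; j++)          /* SAXPY form */
        for (int i = 0; i < N; i++)
            y[i] += a[i][j] * x[j];
    double saxpy = (double)(clock() - t0) / CLOCKS_PER_SEC;

    printf("dot-product: %.3f s, SAXPY: %.3f s\n", dot, saxpy);
    return 0;
}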

Lessons
- The performance of even a simple program can be a complex function of the hardware and compiler.
- Slight changes in the hardware or the program can change performance significantly.
- Since we want to write high-performing programs, we must take the hardware and compiler into account, even on single-processor machines.
- Since actual performance is so complicated, we need simple models to help us design efficient algorithms.

Summary
- In theory, compilers can optimize our code for the architecture; in reality, compilers are still not that smart.
- Typical HPC applications are numerically intensive, with most of the time spent in loops.
- To achieve higher performance, programmers need to understand and utilize hardware and software features.