High Performance Computing Course Notes
Course Administration

Computer Science, University of Warwick

Course Administration
- Course organiser: Dr. Ligang He
- Contact details
- Office hours:
  - Monday: 2pm-3pm
  - Wednesday: 2pm-3pm

Course Administration
- Course Format
  - 15 CATs
  - 30 hours
  - Assessment: 70% examined, 30% coursework
  - Coursework details announced in week 5
  - Coursework deadline in week 10

Learning Objectives
- By the end of the course, you should understand:
  - The role of HPC in science and engineering
  - Commonly used HPC platforms and parallel programming models
  - The means by which to measure, analyse and assess the performance of HPC applications and their supporting hardware
  - Mechanisms for evaluating the suitability of different HPC solutions to common problems in computational science
  - The role of administration, scheduling, code portability and data management in an HPC environment, with particular reference to Grid computing
  - The potential benefits and pitfalls of Grid computing

Materials
- The slides will be made available on-line after each lecture.
- Relevant papers and on-line resources will be made available on-line throughout the course.
- Download and read the suggested papers.
  - Questions in the exam will be based on the content of the papers as well as details from the notes.

Coursework
- Coursework will involve the development of a parallel and distributed application using the Message Passing Interface (MPI).
- It will involve performance analysis and modelling.
- Your program will be assessed by a written report and a demo session.
- Please attend the C and MPI tutorials!
  - Weeks 1-4, 9am-10am every Tuesday: introduction to the C language, in CS101
  - Weeks 5-6, 9am-10am every Tuesday: introduction to writing simple MPI programs, in the MSc lab

HPC Fundamentals

Introduction
- What is High Performance Computing (HPC)?
  - Difficult to define: it's a moving target.
    - Late 1980s: a supercomputer performed 100 MFLOPS
    - Today: a 2 GHz desktop or laptop performs a few GFLOPS
    - Today: a supercomputer performs tens of TFLOPS (Top500)
  - High performance: O(1000) times more powerful than the latest desktops
- Driven by the demands of computation-intensive applications from various areas
  - Medicine and biology (e.g. simulation of brains)
  - Finance (e.g. modelling the world economy)
  - Military and defence (e.g. modelling the explosion of nuclear weapons)
  - Engineering (e.g. simulating a car crash or a new airplane design)

An Example of Demands in Computing Capability
- Project: Blue Brain
  - Aim: construct a simulated brain
  - The building blocks of a brain are neocortical columns
    - A column consists of about 60,000 neurons
    - The human brain contains millions of such columns
  - First stage: simulate a single column (each processor acting as one or two neurons)
  - Then: simulate a small network of columns
  - Ultimate goal: simulate the whole human brain
  - IBM contributes the Blue Gene supercomputer

Related Technologies
- HPC is an all-encompassing term for related technologies that continually push computing boundaries:
  - Computer architecture
    - CPU, memory, VLSI
  - Compilers
    - Identify inefficient implementations
    - Make use of the characteristics of the computer architecture
    - Choose a suitable compiler for a given architecture
  - Algorithms (for parallel and distributed systems)
    - How to program on parallel and distributed systems
  - Middleware
    - From Grid computing technology
    - Application -> middleware -> operating system
    - Resource discovery and sharing

History of High Performance Computing
- 1960s: Scalar processor
  - Processes one data item at a time
- 1970s: Vector processor
  - Can process an array of data items in one go
  - Architecture
  - Overhead
- Late 1980s: Massively Parallel Processing (MPP)
  - Up to thousands of processors, each with its own memory and OS
  - Break down a problem
- Late 1990s: Cluster
  - Not a new term in itself, but attracting renewed interest
  - Connects stand-alone computers with a high-speed network
- Late 1990s: Grid
  - Tackles collaboration
  - Draws an analogy with the power grid

Two Types of HPC
- Parallel Computing
  - Breaking the problem to be computed into parts that can be run simultaneously on different processors
- Distributed Computing
  - Parts of the work are computed in different places (note: this does not necessarily imply simultaneous processing)
  - An example: the client/server (C/S) model
  - Solves loosely-coupled problems (not much communication)
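
As a minimal sketch of the parallel-computing idea above (breaking a problem into parts that can run simultaneously), the Python below splits a sum of squares into contiguous chunks, hands each chunk to a worker, and combines the partial results. The names `split_range`, `partial_sum` and `parallel_sum`, and the choice of a thread pool, are assumptions made for illustration and are not part of the course material, which uses C and MPI. Note that CPython threads do not speed up CPU-bound work, so this shows only the structure of the decomposition, not a performance gain.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) chunks of near-equal size."""
    base, extra = divmod(n, parts)
    chunks = []
    start = 0
    for i in range(parts):
        # The first `extra` chunks take one additional item each.
        stop = start + base + (1 if i < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def partial_sum(chunk):
    """Sum of squares over one chunk; it needs no data from any other chunk."""
    start, stop = chunk
    return sum(i * i for i in range(start, stop))


def parallel_sum(n, workers=4):
    """Decompose the problem, compute the parts concurrently, then combine them."""
    chunks = split_range(n, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    return sum(partials)
```

Calling `parallel_sum(1000)` gives the same result as the serial `sum(i * i for i in range(1000))`. The only communication is the final combination of partial sums, which is what makes this kind of problem easy to parallelise.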

Parallel Computing
- Architectures of Parallel Computing
  - SMP (Symmetric Multi-Processing)
    - Multiple CPUs, single memory, shared I/O
    - All resources in an SMP machine are equally available to each CPU
    - Does not scale well to a large number of processors (fewer than 8)
  - NUMA (Non-Uniform Memory Access)
    - Multiple CPUs
    - Each CPU has fast access to its local area of memory, but slower access to other areas
    - Scales well to a large number of processors
    - Complicated memory access pattern
  - MPP (Massively Parallel Processing)
  - Cluster

Goals of HPC
- Minimise the turn-around time to complete specific application problems (strong scaling)
- Maximise the problem size that can be solved in a set amount of time (weak scaling)
- Identify a compromise between performance and cost
  - Most supercomputers are obsolete in terms of performance before the end of their physical life
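
The two scaling goals above are commonly quantified with speedup and efficiency. The sketch below uses the standard definitions: for strong scaling (fixed problem size), speedup is T(1)/T(p) and efficiency is speedup divided by p; for weak scaling (problem size grown in proportion to p), efficiency is T(1)/T(p), since the ideal run time stays constant. These formulas are not stated in the notes themselves, and the function names and timings are hypothetical, chosen only for illustration.

```python
def strong_scaling(t1, tp, p):
    """Fixed total problem size: return (speedup, efficiency) on p processors."""
    speedup = t1 / tp
    return speedup, speedup / p


def weak_scaling_efficiency(t1, tp):
    """Problem size grows in proportion to p, so the ideal run time equals t1."""
    return t1 / tp


# Hypothetical timings in seconds, invented purely for illustration.
speedup, efficiency = strong_scaling(t1=100.0, tp=16.0, p=8)
weak_eff = weak_scaling_efficiency(t1=100.0, tp=125.0)

print(f"strong scaling: speedup {speedup:.2f}, efficiency {efficiency:.2%}")
print(f"weak scaling: efficiency {weak_eff:.2%}")
```

An efficiency close to 1 means the extra processors are being used well; values well below 1 usually point to communication overhead or load imbalance.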

Maximising Performance
- How is performance maximised?
  1. Reduce the time per instruction (cycle time): clock rate.
  2. Increase the number of instructions executed per cycle: pipelining.
  3. Allow multiple processors to work on different parts of the same program at the same time: parallel execution.
- When performance is gained from 1 and 2:
  - There is a limit to how quickly processors can operate
    - Speed of light and electricity
    - Heat dissipation
    - Power consumption
  - An instruction's processing cannot be divided into infinitely many stages
- When performance improvements come from 3:
  - Overhead of communications
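
The three approaches above can be put into a rough numerical model. Single-processor run time follows the usual relation: instruction count times cycles per instruction (CPI) times cycle time. For parallel execution, the sketch divides that time by the number of processors and adds a communication cost. The linear communication term, the function names and all the numbers are assumptions made for illustration; real communication costs depend on the application and the network.

```python
def serial_time(instructions, cycles_per_instruction, clock_hz):
    """Run time in seconds: instruction count x CPI x cycle time (1 / clock rate)."""
    return instructions * cycles_per_instruction / clock_hz


def parallel_time(t_serial, processors, comm_cost_per_processor):
    """Ideal division of the work plus an ASSUMED communication cost linear in p."""
    return t_serial / processors + comm_cost_per_processor * (processors - 1)


# Hypothetical workload: 8e9 instructions, CPI of 1, 2 GHz clock.
t1 = serial_time(8e9, 1.0, 2e9)

# Approach 1: double the clock rate. Approach 2: halve the CPI (deeper pipelining).
t_faster_clock = serial_time(8e9, 1.0, 4e9)
t_lower_cpi = serial_time(8e9, 0.5, 2e9)

# Approach 3: parallel execution with 0.1 s of overhead per extra processor.
times = {p: parallel_time(t1, p, 0.1) for p in (1, 2, 4, 8, 16, 32)}
best_p = min(times, key=times.get)

for p, t in times.items():
    print(f"p = {p:2d}: {t:.3f} s")
print("fastest processor count in this model:", best_p)
```

In this model the run time falls until the communication term outweighs the shrinking compute term, after which adding processors makes the program slower. That is the sense in which communication overhead limits the gains from parallel execution.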