
1 COMP7330/7336 Advanced Parallel and Distributed Computing Dr. Xiao Qin Auburn University http://www.eng.auburn.edu/~xqin xqin@auburn.edu

2 Your Background: Not-A-Quiz
– Have you taken the parallel and distributed computing (PDC) class?
– What lab assignments have you completed in the PDC class?
– What is your ongoing dissertation or thesis research project?
– Is your current research project related to parallel and distributed computing?

3 Today's Goals
– Course objectives
– Course content & grading
– Answer your questions about COMP 7330/7336
– Provide you a sense of the trends that shape the parallel and distributed computing field

4 COMP 7330/7336: Semester Calendar http://www.eng.auburn.edu/~xqin/courses/comp7330 See the class webpage for the most up-to-date version!

8 Will it be worthwhile?

13 Textbook Ananth Grama, Anshul Gupta, George Karypis, Vipin Kumar, “Introduction to Parallel Computing”, Second Edition, Pearson, ISBN 0-201-64865-2

14 Topic Coverage
Handouts, book chapters, and papers will be used as supplementary course material. The course material will be posted online.
Covers (these topics may change):
– Advances in technologies and system architectures for parallel and distributed computing
– Parallel computing algorithms
– Parallel programming models
– Message Passing Interface
– Convergence of parallel, distributed, and cloud computing
– Analytical modeling and system evaluation
– Cloud computing and big data
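Since the Message Passing Interface is on the topic list, a minimal sketch of the message-passing style may help set expectations. This assumes the mpi4py Python binding and a working MPI installation; it is purely illustrative, not course material:

```python
# Minimal message-passing sketch using mpi4py (assumed installed).
# Run with e.g.: mpiexec -n 4 python hello_mpi.py
from mpi4py import MPI

comm = MPI.COMM_WORLD      # communicator spanning all started processes
rank = comm.Get_rank()     # this process's id within the communicator
size = comm.Get_size()     # total number of processes

if rank == 0:
    # Rank 0 sends a greeting to every other rank.
    for dest in range(1, size):
        comm.send(f"hello, rank {dest}", dest=dest, tag=0)
else:
    # Every other rank receives its message from rank 0.
    msg = comm.recv(source=0, tag=0)
    print(f"rank {rank} of {size} received: {msg!r}")
```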

15 Course Syllabus
Prerequisites:
– COMP 4300, Computer Architecture
– COMP 4320, Introduction to Computer Networks
Midterm exam and final exam
Grading:
– Class Participation 10%
– Midterm 20%
– Final 20%
– Homework Assignments 20%
– Research Projects 30%

16 Course Syllabus (cont.)
Scale:
– Letter grades will be awarded based on the following scale. This scale may be adjusted upward if necessary based on the final grades.
– A ≥ 90, B ≥ 80, C ≥ 70, D ≥ 60, F < 60
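To make the weights and scale concrete, here is a minimal sketch of how a weighted final grade maps to this letter scale; the weights follow the syllabus, but the component scores are invented for illustration:

```python
# Hypothetical illustration of the grading weights and letter scale.
# The WEIGHTS follow the syllabus; the example scores are invented.
WEIGHTS = {
    "participation": 0.10,
    "midterm": 0.20,
    "final": 0.20,
    "homework": 0.20,
    "projects": 0.30,
}

def letter_grade(scores):
    """Map component scores (0-100 each) to a weighted total and letter."""
    total = round(sum(WEIGHTS[k] * scores[k] for k in WEIGHTS), 2)
    for cutoff, letter in [(90, "A"), (80, "B"), (70, "C"), (60, "D")]:
        if total >= cutoff:
            return total, letter
    return total, "F"

# Invented scores, for illustration only:
print(letter_grade({"participation": 95, "midterm": 82, "final": 78,
                    "homework": 90, "projects": 88}))  # -> (85.9, 'B')
```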

17 Office Hours and Exams
Midterm exam: Wednesday 10/9/2013
Office hours: MWF 10:55am-11:55am

18 Am I going to read papers to you? NO!
Papers provide a framework and complete background, so lectures can be more interactive.
– You do the reading
– We'll discuss it
Projects will go “beyond”

19 Questions Please ask at any time!

20 The History
1969: CDC 6600, 1st system for scientific computing
1975: CDC 7600, 1st supercomputer
1985: Cray X-MP / 4 8, 1st vector supercomputer
1989: Cray Y-MP / 4 64
1993: Cray C-90 / 2 128
1994: Cray T3D 64, 1st parallel supercomputer
1995: Cray T3D 128
1998: Cray T3E 256, 1st MPP supercomputer
2002: IBM SP4 512, 1 Teraflops
2005: IBM SP5 512
2006: IBM BCX, 10 Teraflops
2009: IBM SP6, 100 Teraflops
2012: IBM BG/Q, > 1 Petaflops
Source: scc.acad.bg/ncsa/documentation/Presentations/GiErbacci_slides.ppt

21 HPC Infrastructure for Scientific Computing

Logical Name     | SP6 (Sep 2009)           | BGP (Jan 2010)        | PLX (2011)
Model            | IBM P575                 | IBM BG/P              | IBM iDataPlex
Architecture     | SMP                      | MPP                   | Linux Cluster
Processor        | IBM Power6 4.7 GHz       | IBM PowerPC 0.85 GHz  | Intel Westmere EC 2.4 GHz
# of cores       | 5376                     | 4096                  | 3288 + 548 GPGPU (Nvidia Fermi M2070)
# of nodes       | 168                      | 32                    | 274
# of racks       | 12                       | 1                     | 10
Total RAM        | 20 TB                    | 2 TB                  | ~13 TB
Interconnection  | Qlogic Infiniband DDR 4x | IBM 3D Torus          | Qlogic QDR 4x
Operating System | AIX                      | Suse                  | RedHat
Total Power      | ~800 kW                  | ~80 kW                | ~200 kW
Peak Performance | > 101 TFlops             | ~14 TFlops            | ~300 TFlops

Which machine is the most energy efficient?
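One way to answer the closing question is peak performance per watt. A back-of-the-envelope sketch using the approximate figures from the table above (TFlops divided by kW gives GFlops per watt):

```python
# Back-of-the-envelope energy efficiency from the table above.
# Peak performance in TFlops, power in kW (approximate, as listed);
# TFlops / kW equals GFlops per watt.
machines = {
    "SP6": {"peak_tflops": 101, "power_kw": 800},
    "BGP": {"peak_tflops": 14,  "power_kw": 80},
    "PLX": {"peak_tflops": 300, "power_kw": 200},
}

for name, m in machines.items():
    gflops_per_watt = m["peak_tflops"] / m["power_kw"]
    print(f"{name}: {gflops_per_watt:.2f} GFlops/W")

# Prints roughly SP6: 0.13, BGP: 0.17, PLX: 1.50 -- the GPU-accelerated
# PLX delivers about ten times the peak performance per watt.
```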

22 SP Power6 @ CINECA
– 168 compute nodes IBM p575 Power6 (4.7 GHz)
– 5376 compute cores (32 cores/node)
– 128 GB RAM/node (21 TB RAM total)
– IB x 4 DDR (double data rate) interconnect
Peak performance: 101 TFlops
Rmax: 76.41 TFlops
Efficiency (workload): 75.83%
No. 116 in the Top500 (June 2011)
– 2 login nodes IBM p560
– 21 I/O + service nodes IBM p520
– 1.2 PB raw storage: 500 TB high-performance working area, 700 TB data repository

23 BGP @ CINECA
Model: IBM BlueGene/P
Architecture: MPP
Processor Type: IBM PowerPC 0.85 GHz
Compute Nodes: 1024 (quad core, 4096 cores total)
RAM: 4 GB/compute node (4096 GB total)
Internal Network: IBM 3D Torus
OS: Linux (login nodes), CNK (compute nodes)
Peak Performance: 14.0 TFlops

24 PLX @ CINECA
IBM dx360M3 server – compute node:
– 2 x Intel Westmere 6-core X5645 2.40 GHz processors, 12 MB cache, DDR3 1333 MHz, 80 W
– 48 GB RAM on 12 x 4 GB DDR3 1333 MHz DIMMs
– 1 x 250 GB SATA HDD
– 1 x QDR Infiniband card, 40 Gb/s
– 2 x NVIDIA M2070 (M2070Q on 10 nodes)
Peak performance: 32 TFlops (3288 cores at 2.40 GHz)
Peak performance: 565 TFlops single precision or 283 TFlops double precision (548 Nvidia M2070)
No. 54 in the Top500 (June 2011)

25 Visualisation System
Visualisation and computer graphics:
– Virtual Theater
– 6 BARCO SIM5 video projectors
– Audio surround system
– Cylindrical screen 9.4 x 2.7 m, 120° angle
– Workstations + Nvidia cards
– RVN nodes on the PLX system

26 Storage Infrastructure

System       | Available bandwidth (GB/s) | Space (TB) | Connection Technology | Disk Technology
2 x S2A9500  | 3.2                        | 140        | FCP 4 Gb/s            | FC
4 x S2A9500  | 3.2                        | 140        | FCP 4 Gb/s            | FC
6 x DCS9900  | 5.0                        | 540        | FCP 8 Gb/s            | SATA
4 x DCS9900  | 5.0                        | 720        | FCP 4 Gb/s            | SATA
3 x DCS9900  | 5.0                        | 1500       | FCP 4 Gb/s            | SATA
Hitachi DS   | 3.2                        | 360        | FCP 4 Gb/s            | SATA
3 x SFA1000  | 10.0                       | 2200       | QDR                   | SATA
1 x IBM5100  | 3.2                        | 66         | FCP 8 Gb/s            | FC

Total space: > 5.6 PB


28 HPC Evolution
Moore's law is holding in the number of transistors:
– Transistors on an ASIC are still doubling every 18 months at constant cost
– 15 years of exponential clock-rate growth have ended
Moore's Law reinterpreted:
– Performance improvements now come from the increase in the number of cores on a processor (ASIC)
– #cores per chip doubles every 18 months instead of clock speed
– 64-512 threads per node will become visible soon
From Herb Sutter
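To see what an 18-month doubling implies, here is a small projection; the starting point (8 cores per chip in 2012) is an illustrative assumption, not a figure from the slide:

```python
# Illustrative projection of cores per chip under an 18-month doubling.
# The 2012 starting point of 8 cores is an assumption for illustration.
start_year, start_cores = 2012, 8

for year in range(2012, 2025, 3):
    doublings = (year - start_year) * 12 / 18   # months elapsed / 18
    cores = start_cores * 2 ** doublings
    print(f"{year}: ~{cores:.0f} cores per chip")
# 2012: ~8, 2015: ~32, 2018: ~128, 2021: ~512, 2024: ~2048
```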

29 Real HPC Crisis is with ____?
Supercomputer applications and software are usually much longer-lived than the hardware.
– Hardware life is typically four to five years at most.
– Fortran and C are still the main programming models.
Programming is stuck.
– Arguably, it hasn't changed much since the 70's.
Software is a major cost component of modern technologies.
– The tradition in HPC system procurement is to assume that the software is free.

30 It's time for a change
Complexity is rising dramatically. Challenges for applications on Petaflop systems:
– The use of O(100K) cores implies a dramatic optimization effort.
– A new paradigm: supporting a hundred threads in one node implies new parallelization strategies.

31 It's time for a change (cont.)
– Improving existing codes will become complex and partly impossible.
– Implementing new parallel programming methods in existing large applications does not always have a promising perspective.
– There is a need for new community codes.

32 Roadmap to Exascale (architectural trends)

