
1 COMP7330/7336 Advanced Parallel and Distributed Computing Dr. Xiao Qin Auburn University http://www.eng.auburn.edu/~xqin xqin@auburn.edu

2 Your Background: Not-A-Quiz
– Have you taken the parallel and distributed computing (PDC) class?
– What lab assignments have you completed in the PDC class?
– What is your ongoing dissertation or thesis research project?
– Is your current research project related to parallel and distributed computing?

3 Today's Goals
– Course objectives
– Course content & grading
– Answer your questions about COMP 7330/7336
– Provide you a sense of the trends that shape the parallel and distributed computing field

4 COMP 7330/7336: Semester Calendar http://www.eng.auburn.edu/~xqin/courses/comp7330 See the class webpage for the most up-to-date version!

8 Will it be worthwhile?

13 Textbook Ananth Grama, Anshul Gupta, George Karypis, Vipin Kumar, “Introduction to Parallel Computing”, Second Edition, Pearson, ISBN 0-201-64865-2

14 Topic Coverage
Handouts, book chapters, and papers will be used as supplementary course material. The course material will be posted online.
Covers (these topics may change):
– Advances in technologies and system architectures for parallel and distributed computing
– Parallel computing algorithms
– Parallel programming models
– Message Passing Interface
– Convergence of parallel, distributed, and cloud computing
– Analytical modeling and system evaluation
– Cloud computing and big data
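Since the Message Passing Interface is on the topic list, a minimal sketch of the message-passing style may help set expectations. This assumes the mpi4py Python binding and a working MPI installation; it is purely illustrative, not course material:

```python
# Minimal message-passing sketch using mpi4py (assumed installed).
# Run with e.g.: mpiexec -n 4 python hello_mpi.py
from mpi4py import MPI

comm = MPI.COMM_WORLD      # communicator spanning all started processes
rank = comm.Get_rank()     # this process's id within the communicator
size = comm.Get_size()     # total number of processes

if rank == 0:
    # Rank 0 sends a greeting to every other rank.
    for dest in range(1, size):
        comm.send(f"hello, rank {dest}", dest=dest, tag=0)
else:
    # Every other rank receives its message from rank 0.
    msg = comm.recv(source=0, tag=0)
    print(f"rank {rank} of {size} received: {msg!r}")
```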

15 Course Syllabus
Prerequisites:
– COMP 4300, Computer Architecture
– COMP 4320, Introduction to Computer Networks
Midterm exam and final exam
Grading:
– Class Participation 10%
– Midterm 20%
– Final 20%
– Homework Assignments 20%
– Research Projects 30%

16 Course Syllabus (cont.)
Scale:
– Letter grades will be awarded based on the following scale. This scale may be adjusted upward if necessary based on the final grades.
– A ≥ 90, B ≥ 80, C ≥ 70, D ≥ 60, F < 60
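To make the weights and scale concrete, here is a minimal sketch of how a weighted final grade maps to this letter scale; the weights follow the syllabus, but the component scores are invented for illustration:

```python
# Hypothetical illustration of the grading weights and letter scale.
# The WEIGHTS follow the syllabus; the example scores are invented.
WEIGHTS = {
    "participation": 0.10,
    "midterm": 0.20,
    "final": 0.20,
    "homework": 0.20,
    "projects": 0.30,
}

def letter_grade(scores):
    """Map component scores (0-100 each) to a weighted total and letter."""
    total = round(sum(WEIGHTS[k] * scores[k] for k in WEIGHTS), 2)
    for cutoff, letter in [(90, "A"), (80, "B"), (70, "C"), (60, "D")]:
        if total >= cutoff:
            return total, letter
    return total, "F"

# Invented scores, for illustration only:
print(letter_grade({"participation": 95, "midterm": 82, "final": 78,
                    "homework": 90, "projects": 88}))  # -> (85.9, 'B')
```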

17 Office Hours and Exams
Midterm exam: Wednesday 10/9/2013
Office hours: MWF 10:55am-11:55am

18 Am I going to read papers to you? NO!
Papers provide a framework and complete background, so lectures can be more interactive.
– You do the reading
– We'll discuss it
Projects will go “beyond”

19 Questions Please ask at any time!

20 The History
1969: CDC 6600, 1st system for scientific computing
1975: CDC 7600, 1st supercomputer
1985: Cray X-MP / 4 8, 1st vector supercomputer
1989: Cray Y-MP / 4 64
1993: Cray C-90 / 2 128
1994: Cray T3D 64, 1st parallel supercomputer
1995: Cray T3D 128
1998: Cray T3E 256, 1st MPP supercomputer
2002: IBM SP4 512, 1 Teraflops
2005: IBM SP5 512
2006: IBM BCX, 10 Teraflops
2009: IBM SP6, 100 Teraflops
2012: IBM BG/Q, > 1 Petaflops
Source: scc.acad.bg/ncsa/documentation/Presentations/GiErbacci_slides.ppt

21 HPC Infrastructure for Scientific Computing

Logical Name     | SP6 (Sep 2009)           | BGP (Jan 2010)        | PLX (2011)
Model            | IBM P575                 | IBM BG/P              | IBM iDataPlex
Architecture     | SMP                      | MPP                   | Linux Cluster
Processor        | IBM Power6 4.7 GHz       | IBM PowerPC 0.85 GHz  | Intel Westmere EC 2.4 GHz
# of cores       | 5376                     | 4096                  | 3288 + 548 GPGPU (Nvidia Fermi M2070)
# of nodes       | 168                      | 32                    | 274
# of racks       | 12                       | 1                     | 10
Total RAM        | 20 TB                    | 2 TB                  | ~13 TB
Interconnection  | Qlogic Infiniband DDR 4x | IBM 3D Torus          | Qlogic QDR 4x
Operating System | AIX                      | Suse                  | RedHat
Total Power      | ~800 kW                  | ~80 kW                | ~200 kW
Peak Performance | > 101 TFlops             | ~14 TFlops            | ~300 TFlops

Which machine is the most energy efficient?
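One way to answer the closing question is peak performance per watt. A back-of-the-envelope sketch using the approximate figures from the table above (TFlops divided by kW gives GFlops per watt):

```python
# Back-of-the-envelope energy efficiency from the table above.
# Peak performance in TFlops, power in kW (approximate, as listed);
# TFlops / kW equals GFlops per watt.
machines = {
    "SP6": {"peak_tflops": 101, "power_kw": 800},
    "BGP": {"peak_tflops": 14,  "power_kw": 80},
    "PLX": {"peak_tflops": 300, "power_kw": 200},
}

for name, m in machines.items():
    gflops_per_watt = m["peak_tflops"] / m["power_kw"]
    print(f"{name}: {gflops_per_watt:.2f} GFlops/W")

# Prints roughly SP6: 0.13, BGP: 0.17, PLX: 1.50 -- the GPU-accelerated
# PLX delivers about ten times the peak performance per watt.
```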

22 SP Power6 @ CINECA
– 168 compute nodes IBM p575 Power6 (4.7 GHz)
– 5376 compute cores (32 cores/node)
– 128 GB RAM/node (21 TB RAM total)
– IB x 4 DDR (double data rate) interconnect
Peak performance: 101 TFlops
Rmax: 76.41 TFlops
Efficiency (workload): 75.83%
No. 116 in the Top500 (June 2011)
– 2 login nodes IBM p560
– 21 I/O + service nodes IBM p520
– 1.2 PB raw storage: 500 TB high-performance working area, 700 TB data repository

23 BGP @ CINECA
Model: IBM BlueGene/P
Architecture: MPP
Processor Type: IBM PowerPC 0.85 GHz
Compute Nodes: 1024 (quad core, 4096 cores total)
RAM: 4 GB/compute node (4096 GB total)
Internal Network: IBM 3D Torus
OS: Linux (login nodes), CNK (compute nodes)
Peak Performance: 14.0 TFlops

24 PLX @ CINECA
IBM dx360M3 server – compute node:
– 2 x Intel Westmere 6-core X5645 2.40 GHz processors, 12 MB cache, DDR3 1333 MHz, 80 W
– 48 GB RAM on 12 x 4 GB DDR3 1333 MHz DIMMs
– 1 x 250 GB SATA HDD
– 1 x QDR Infiniband card, 40 Gb/s
– 2 x NVIDIA M2070 (M2070Q on 10 nodes)
Peak performance: 32 TFlops (3288 cores at 2.40 GHz)
Peak performance: 565 TFlops single precision or 283 TFlops double precision (548 Nvidia M2070)
No. 54 in the Top500 (June 2011)

25 Visualisation System
Visualisation and computer graphics:
– Virtual Theater
– 6 BARCO SIM5 video projectors
– Audio surround system
– Cylindrical screen 9.4 x 2.7 m, 120° angle
– Workstations + Nvidia cards
– RVN nodes on the PLX system

26 Storage Infrastructure

System       | Available bandwidth (GB/s) | Space (TB) | Connection Technology | Disk Technology
2 x S2A9500  | 3.2                        | 140        | FCP 4 Gb/s            | FC
4 x S2A9500  | 3.2                        | 140        | FCP 4 Gb/s            | FC
6 x DCS9900  | 5.0                        | 540        | FCP 8 Gb/s            | SATA
4 x DCS9900  | 5.0                        | 720        | FCP 4 Gb/s            | SATA
3 x DCS9900  | 5.0                        | 1500       | FCP 4 Gb/s            | SATA
Hitachi DS   | 3.2                        | 360        | FCP 4 Gb/s            | SATA
3 x SFA1000  | 10.0                       | 2200       | QDR                   | SATA
1 x IBM5100  | 3.2                        | 66         | FCP 8 Gb/s            | FC

Total space: > 5.6 PB


28 HPC Evolution
Moore's law is holding in the number of transistors:
– Transistors on an ASIC are still doubling every 18 months at constant cost
– 15 years of exponential clock-rate growth have ended
Moore's Law reinterpreted:
– Performance improvements now come from the increase in the number of cores on a processor (ASIC)
– #cores per chip doubles every 18 months instead of clock speed
– 64-512 threads per node will become visible soon
From Herb Sutter
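To see what an 18-month doubling implies, here is a small projection; the starting point (8 cores per chip in 2012) is an illustrative assumption, not a figure from the slide:

```python
# Illustrative projection of cores per chip under an 18-month doubling.
# The 2012 starting point of 8 cores is an assumption for illustration.
start_year, start_cores = 2012, 8

for year in range(2012, 2025, 3):
    doublings = (year - start_year) * 12 / 18   # months elapsed / 18
    cores = start_cores * 2 ** doublings
    print(f"{year}: ~{cores:.0f} cores per chip")
# 2012: ~8, 2015: ~32, 2018: ~128, 2021: ~512, 2024: ~2048
```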

29 Real HPC Crisis is with ____?
Supercomputer applications and software are usually much longer-lived than the hardware.
– Hardware life is typically four to five years at most.
– Fortran and C are still the main programming models.
Programming is stuck.
– Arguably, it hasn't changed much since the 70's.
Software is a major cost component of modern technologies.
– The tradition in HPC system procurement is to assume that the software is free.

30 It's time for a change
Complexity is rising dramatically. Challenges for applications on Petaflop systems:
– The use of O(100K) cores implies a dramatic optimization effort.
– A new paradigm: supporting a hundred threads in one node implies new parallelization strategies.

31 It's time for a change (cont.)
– Improving existing codes will become complex and partly impossible.
– Implementing new parallel programming methods in existing large applications does not always have a promising perspective.
– There is a need for new community codes.

32 Roadmap to Exascale (architectural trends)

