
1 Alistair Rendell and Josh Milthorpe, Research School of Computer Science, Australian National University

2 The idea
– Split your program into bits that can be executed simultaneously
Motivation
– Speed, Speed, Speed… at a cost-effective price
– If we didn't want it to go faster we would not be bothered with the hassles of parallel programming!
Reduce the time to solution to acceptable levels
– No point waiting 1 week for tomorrow's weather forecast
– Simulations that take months to run are not useful in a design environment

3 Fluid flow problems
– Weather forecasting / climate modeling
– Aerodynamic modeling of cars, planes, rockets, etc.
Structural mechanics
– Building, bridge, car, etc. strength analysis
– Car crash simulation
Speech and character recognition, image processing
Visualization, virtual reality
Semiconductor design, simulation of new chips
Structural biology, molecular-level design of drugs
Human genome mapping
Financial market analysis and simulation
Data mining, machine learning
Games programming!

4 Atmosphere divided into 3D regions or cells
Complex mathematical equations describe conditions in each cell, e.g. pressure, temperature, velocity
– Conditions change according to neighbouring cells
– Updates are repeated frequently as time passes
– Cells are affected by more distant cells the longer range the forecast
Assume
– Cells are 1x1x1 mile to a height of 10 miles, 5x10^8 cells
– 200 flops to update each cell per timestep
– 10-minute timesteps for a total of 10 days
100 days on a 100 Mflop machine
10 minutes on a Tflop machine
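As a rough check on the arithmetic behind these figures (the cell count, per-cell cost and timestep come straight from the slide; the little C program below is only an illustrative sketch, and its two machine speeds are the ones quoted above):

```c
#include <stdio.h>

int main(void) {
    /* Figures from the slide (order-of-magnitude estimates only) */
    double cells          = 5e8;                   /* 1x1x1 mile cells to a height of 10 miles */
    double flops_per_cell = 200.0;                 /* flops to update one cell per timestep    */
    double timesteps      = 10.0 * 24 * 60 / 10;   /* 10 days of 10-minute steps = 1440        */

    double total_flops = cells * flops_per_cell * timesteps;

    /* Time to solution on machines of different speeds */
    double mflop_machine = 100e6;   /* 100 Mflop/s */
    double tflop_machine = 1e12;    /*   1 Tflop/s */

    printf("Total work:   %.3g flops\n", total_flops);
    printf("100 Mflop/s:  %.1f days\n",    total_flops / mflop_machine / 86400.0);
    printf("1 Tflop/s:    %.1f minutes\n", total_flops / tflop_machine / 60.0);
    return 0;
}
```

Multiplying 5x10^8 cells by 200 flops by 1,440 timesteps gives on the order of 10^14 floating-point operations for the 10-day forecast, which is the scale of work the slide contrasts against the two machine speeds.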

5 NCI: National Computational Infrastructure
– http://nci.org.au and http://nf.nci.org.au
History
– Establishment of APAC in 1998 with a $19.5M grant from the federal government, renewed in 2004 with a grant of about $29M
– Changed to NCI in 2007 with funding through the NCRIS and Super Science programs
– 2010 machine is a Sun X6275 Constellation Cluster, 1492 nodes (2 x 2.93GHz Nehalem) or 11936 cores, QDR InfiniBand interconnect
– Installing a new Fujitsu Primergy system with Sandy Bridge nodes and 57,000 cores, 160TB RAM, 10PB of disk

6 Bunyip: tsg.anu.edu.au/Projects/Bunyip
– 192-processor PC cluster
– Winner of the 2000 Gordon Bell prize for best price/performance
High Performance Computing Group
– Jabberwocky cluster
– Sunnyvale cluster
– Single Chip Cloud Computer

7

8

9 Year | Hardware | Languages
1950 | Early designs | Fortran I (Backus, 57)
1960 | Integrated circuits | Fortran 66
1970 | Large scale integration | C (72)
1980 | RISC and PC | C++ (83), Python 1.0 (89)
1990 | Shared and distributed parallel | MPI, OpenMP, Java (95)
2000 | Faster, better, hotter | Python 2.0 (00)
2010 | Throughput oriented | CUDA, OpenCL
Parallelism became an issue for programmers from the late 80s.
People began compiling lists of big parallel systems.

10 All have multiple processors (many have GPUs)
No. | Computer | Site | Cores | Rmax (Gflop/s) | Rpeak (Gflop/s) | Power (kW)
1 | Cray XK7, Opteron 6274 16C 2.200GHz, Cray Gemini interconnect, NVIDIA K20x | 2012, DoE, USA | 560,640 | 17,590,000 | 27,112,550 | 8,209
2 | BlueGene/Q, Power BQC 16C 1.60 GHz, Custom | 2011, DoE, USA | 1,572,864 | 16,324,751 | 20,132,659 | 7,890
3 | K computer, SPARC64 VIIIfx 2.0GHz, Tofu interconnect | 2011, RIKEN, Japan | 705,024 | 10,510,000 | 11,280,384 | 12,660
4 | BlueGene/Q, Power BQC 16C 1.60GHz, Custom | 2012, DoE, USA | 786,432 | 8,162,376 | 10,066,330 | 3,945
5 | BlueGene/Q, Power BQC 16C 1.600GHz, Custom Interconnect | 2012, MaxPlanck, Germany | 393,216 | 4,141,180 | 5,033,165 | 1,970
24 | Fujitsu PRIMERGY CX250 S1, Xeon E5-2670 8C 2.600GHz, Infiniband FDR | 2012, NCI, Australia | 53,504 | 978,600 | 1,112,883 | –
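As a small illustration (not part of the slides) of how derived figures can be read from this table, the sketch below computes the Linpack efficiency (Rmax/Rpeak) and the energy efficiency (Rmax per watt) for the number 1 system, using the values from its row and the usual TOP500 units of Gflop/s and kW:

```c
#include <stdio.h>

int main(void) {
    /* Row 1 of the table (Cray XK7): values in Gflop/s and kW */
    double rmax  = 17590000.0;   /* sustained Linpack performance */
    double rpeak = 27112550.0;   /* theoretical peak performance  */
    double power = 8209.0;       /* total power draw in kW        */

    printf("Linpack efficiency: %.1f %%\n", 100.0 * rmax / rpeak);
    printf("Energy efficiency:  %.2f Gflop/s per watt\n",
           rmax / (power * 1000.0));
    return 0;
}
```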

11

12 Moore's Law: 'Transistor density will double approximately every two years.'
Dennard scaling: 'As MOSFET features shrink, switching time and power consumption will fall proportionately' – which led to higher clock rates and faster flops.
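For reference, Dennard's constant-field scaling can be written as the standard set of relations below (a textbook summary with scaling factor $\kappa > 1$, not reproduced from the slides):

```latex
\begin{aligned}
\text{device dimensions } (L,\, W,\, t_{ox}) &\;\to\; 1/\kappa \\
\text{supply voltage } V &\;\to\; V/\kappa \\
\text{gate delay} &\;\to\; 1/\kappa
  \quad\Rightarrow\quad \text{clock frequency } f \;\to\; \kappa f \\
\text{power per transistor } P \propto C V^2 f &\;\to\; P/\kappa^2 \\
\text{transistor density} &\;\to\; \kappa^2 \text{ higher} \\
\Rightarrow\quad \text{power density} &\;\approx\; \text{constant}
\end{aligned}
```

Once supply voltages could no longer be scaled down, the power-per-transistor term stopped shrinking and power density rose: that is the breakdown the following slides describe.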

13 Agarwal, Hrishikesh, Keckler and Burger, 'Clock Rate Versus IPC', ISCA 2000
– 250nm, 400mm², 100%
– 180nm, 450mm², 100%
– 130nm, 566mm², 82%
– 100nm, 622mm², 40%
– 70nm, 713mm², 19%
– 50nm, 817mm², 6.5%
– 35nm, 937mm², 1.9%
Until the chips became too big…

14 …so multiple cores appeared on chip…
– In 2004 Sun released the SPARC IV with dual cores, heralding the start of multicore
…until we hit a bigger problem…

15 …the end of Dennard scaling…
Dennard, Gaensslen, Yu, Rideout, Bassous and LeBlanc, IEEE JSSC, 1974
Dennard scaling: 'As MOSFET features shrink, switching time and power consumption will fall proportionately.' ✗ (no longer holds)
Moore's Law: 'Transistor density will double approximately every two years.' ✓ (still holds)
…ushering in…

16 1960-2010 | 2010-?
Few transistors | No shortage of transistors
No shortage of power | Limited power
Maximize transistor utility | Minimize energy
Generalize | Customize
…and a fundamentally new set of building blocks for our petascale systems

17 Level | Characteristic | Challenge/Opportunity
As a whole | Sheer number of nodes: the Fujitsu K machine has 548,352 cores | Programming language/environment; fault tolerance
Within a domain | Heterogeneity: the Titan system uses CPUs and GPUs | What to use when; co-location of data with the unit processing it
On the chip | Energy minimization: processors already have frequency and voltage scaling | Minimize data size and movement, including use of just enough precision; specialized cores
In RSCS we are working in all these areas

18 Multiple instruction units:
– Typical processors issue ~4 instructions per cycle
Instruction pipelining:
– Complicated operations are broken into simple operations that can be overlapped
Graphics engines:
– Use multiple rendering pipes and processing elements to render millions of polygons a second
Interleaved memory:
– Multiple paths to memory that can be used at the same time
Input/Output:
– Disks are striped, with different blocks of data written to different disks at the same time
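As a rough illustration (a hand-written sketch, not course code) of why multiple instruction units and pipelining matter even in ordinary code: a sum with several independent accumulators gives a superscalar, pipelined core independent additions to overlap, whereas a single accumulator forms a chain in which each add must wait for the previous result.

```c
#include <stdio.h>
#include <stdlib.h>

/* One accumulator: every addition depends on the previous result,
 * so the floating-point pipeline spends most of its time waiting. */
double sum_serial(const double *x, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Four independent accumulators: the additions within an iteration do
 * not depend on each other, so a superscalar, pipelined core can keep
 * several of them in flight at once.  (The answer can differ in the
 * last bits because floating-point addition is not associative.) */
double sum_unrolled(const double *x, size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }
    for (; i < n; i++)          /* any leftover elements */
        s0 += x[i];
    return (s0 + s1) + (s2 + s3);
}

int main(void) {
    size_t n = 1u << 24;        /* 16M elements, chosen arbitrarily */
    double *x = malloc(n * sizeof *x);
    if (!x) return 1;
    for (size_t i = 0; i < n; i++)
        x[i] = 1.0 / (double)(i + 1);

    printf("serial   sum = %.15f\n", sum_serial(x, n));
    printf("unrolled sum = %.15f\n", sum_unrolled(x, n));
    free(x);
    return 0;
}
```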

19 Split the program up and run parts simultaneously on different processors
– On N computers the time to solution should (ideally!) be 1/N
– Parallel Programming: the art of writing the parallel code!
– Parallel Computer: the hardware on which we run our parallel code!
– COMP4300 will discuss both
Beyond raw compute power, other motivations include
– Enabling more accurate simulations in the same time (finer grids)
– Providing access to huge aggregate memories
– Providing more and/or better input/output capacity
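To make the "split it up and (ideally) run N times faster" idea concrete, here is a minimal sketch using OpenMP, one of the shared-memory models covered later in the course; the file name and iteration count are made up for illustration:

```c
/* pi_omp.c -- estimate pi by numerical integration, splitting the
 * loop iterations across however many threads are available.
 * Build (with GCC):  gcc -fopenmp pi_omp.c -o pi_omp              */
#include <stdio.h>
#include <omp.h>

int main(void) {
    const long n = 100000000;          /* number of rectangles    */
    const double h = 1.0 / (double)n;  /* width of each rectangle */
    double sum = 0.0;

    double t0 = omp_get_wtime();
    /* Each thread gets a share of the iterations; the partial sums
     * are combined by the reduction clause. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++) {
        double x = (i + 0.5) * h;
        sum += 4.0 / (1.0 + x * x);
    }
    double pi = h * sum;
    double t1 = omp_get_wtime();

    printf("pi ~= %.12f  (%d threads, %.3f s)\n",
           pi, omp_get_max_threads(), t1 - t0);
    return 0;
}
```

With the loop iterations shared across N threads the elapsed time should approach 1/N of the single-thread time, although in practice memory traffic and the cost of combining the partial sums keep the speedup below the ideal.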

20 Course is run every other year
– Drop out this year and it won't be repeated until 2015
It's a 4000/6000 level course; it's supposed to:
– Be more challenging than a 3000 level course!
– Be less well structured
– Have greater expectations of you
– Have more student participation
– Be fun!

21 Parallel Architecture:
– Basic issues concerning design and likely performance of parallel systems
Specific Systems:
– Will make extensive use of NCI facilities
Programming Paradigms:
– Distributed and shared memory, things in between, data-intensive computing
Parallel Algorithms:
– Numeric and non-numeric
The Future

22 The pieces
– 2 lectures per week (~30 core lecture hours)
– 6 labs (not marked, solutions provided)
– 2 assignments (40%)
– 1 mid-semester exam (~2 hours, 20%)
– 1 final exam (3 hours, 40%)
Final mark is the sum of the assignment, mid-semester and final exam marks

23 Two slots
– Tue 14:00-16:00, Chem T2
– Thu 15:00-16:00, Chem T2
Exact schedule on the web site
Partial notes will be posted on the web site
– Bring a copy to the lecture
Attendance at lectures and labs is strongly recommended
– Attendance at labs will be recorded

24 http://cs.anu.edu.au/student/comp4300
We will use Wattle only for lecture recordings

25 Start in week 3 (March 4th)
– See web page for detailed schedule
2 sessions available
– Tue 12:00-14:00, N113
– Thu 10:00-12:00, N112
Register via Streams now
Not assessed, but will be examined

26 Course Convener: Alistair Rendell, N226 CSIT Building, Alistair.Rendell@anu.edu.au, phone 6125 4386
Lecturer: Josh Milthorpe, N216 CSIT Building, Josh.Milthorpe@anu.edu.au, phone 6125 4478

27 Course web page: cs.anu.edu.au/student/comp4300
Bulletin board (forum, available from Streams): cs.anu.edu.au/streams
At lectures and in labs
Email: comp4300@cs.anu.edu.au
In person
– Office hours (to be set, see web page)
– Email for an appointment if you want a specific time

28 Texts
– Principles of Parallel Programming, Calvin Lin and Lawrence Snyder, Pearson International Edition, ISBN 978-0-321-54942-6
– Introduction to Parallel Computing, 2nd Ed., Grama, Gupta, Karypis, Kumar, Addison-Wesley, ISBN 0201648652 (electronic version accessible online from the ANU library; search for the title)
– Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers, Barry Wilkinson and Michael Allen, Prentice Hall, 2nd edition, ISBN 0131405632
– and others on the web page

29

