Slide 1: CS4960: Parallel Programming. Guest Lecture: Parallel Programming for Scientific Computing. Mary Hall, September 22, 2008.

Slide 2: Outline
- Introduction
- The fastest computer in the world today
- Large-scale scientific simulations
- Why writing fast parallel programs is hard
- New parallel programming languages
Material for this lecture provided by Kathy Yelick and Jim Demmel (UC Berkeley) and Brad Chamberlain (Cray).

Slide 3: Parallel and Distributed Computing
- Limited to supercomputers? No! It is everywhere!
- Scientific applications? These are still important, but many new commercial and consumer applications are also going to emerge.
- Programming tools adequate and established? No! Many new research challenges (my research area).

Slide 4: Why Is This Course Important? Why Now?
- We are seeing a convergence of high-end, conventional, and embedded computing:
  - On-chip architectures look like parallel computers
  - Languages, software development, and compilation strategies originally developed for the high end (supercomputers) are now becoming important for many other domains
- Why? Technology trends.
- Looking to the future:
  - Parallel computing for the masses demands better parallel programming paradigms
  - And more people who are trained in writing parallel programs (you!)
  - How to put all these vast machine resources to the best use?

Slide 5: The Fastest Computer in the World Today
- What is its name? Roadrunner
- Where is it located? Los Alamos National Laboratory
- How many processors does it have? 18,802 processor chips (~123,284 “processors”)
- What kind of processors? AMD Opterons and IBM Cell/BE (as in PlayStations)
- How fast is it? 1.026 petaflop/second, i.e., about one quadrillion (10^15) floating-point operations per second
- See http://www.top500.org

Slide 6: Scientific Simulation: The Third Pillar of Science
- Traditional scientific and engineering paradigm:
  1) Do theory or paper design.
  2) Perform experiments or build the system.
- Limitations:
  - Too difficult -- build large wind tunnels.
  - Too expensive -- build a throw-away passenger jet.
  - Too slow -- wait for climate or galactic evolution.
  - Too dangerous -- weapons, drug design, climate experimentation.
- Computational science paradigm:
  3) Use high performance computer systems to simulate the phenomenon, based on known physical laws and efficient numerical methods.

Slide 7: Some Particularly Challenging Computations
- Science:
  - Global climate modeling
  - Biology: genomics, protein folding, drug design
  - Astrophysical modeling
  - Computational chemistry
  - Computational materials science and nanoscience
- Engineering:
  - Semiconductor design
  - Earthquake and structural modeling
  - Computational fluid dynamics (airplane design)
  - Combustion (engine design)
  - Crash simulation
- Business:
  - Financial and economic modeling
  - Transaction processing, web services, and search engines
- Defense:
  - Nuclear weapons -- test by simulation
  - Cryptography

Slide 8: Example: Global Climate Modeling Problem
- The problem is to compute: f(latitude, longitude, elevation, time) → temperature, pressure, humidity, wind velocity
- Approach:
  - Discretize the domain, e.g., a measurement point every 10 km
  - Devise an algorithm to predict the weather at time t+Δt given the state at time t (a sketch of this time-stepping structure follows below)
- Uses:
  - Predict major events, e.g., El Niño
  - Use in setting air emissions standards
- Source: http://www.epm.ornl.gov/chammp/chammp.html
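To make the time-stepping idea concrete, here is a minimal serial sketch in C. It is not a real climate model: the grid sizes, the single temperature field, and the toy update rule (averaging neighbors, a crude diffusion stand-in for the real physics) are all illustrative assumptions.

```c
#include <string.h>

#define NLAT 64   /* illustrative grid sizes, not the real 10 km mesh */
#define NLON 128

/* One toy time step: the new value at each interior point is the average
   of its four neighbors (a stand-in for the real physical update rule).
   Boundary points are left fixed for simplicity. */
void step(double cur[NLAT][NLON], double next[NLAT][NLON]) {
    for (int i = 1; i < NLAT - 1; i++)
        for (int j = 1; j < NLON - 1; j++)
            next[i][j] = 0.25 * (cur[i-1][j] + cur[i+1][j] +
                                 cur[i][j-1] + cur[i][j+1]);
}

/* Advance the simulated state by nsteps: compute t + dt from t, then
   make the new field the current one. */
void simulate(double a[NLAT][NLON], double b[NLAT][NLON], int nsteps) {
    for (int t = 0; t < nsteps; t++) {
        step(a, b);
        memcpy(a, b, sizeof(double) * NLAT * NLON);
    }
}
```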

Slide 9: [Figure] High Resolution Climate Modeling on NERSC-3, P. Duffy et al., LLNL

Slide 10: Some Characteristics of Scientific Simulation
- Discretize physical or conceptual space into a grid
  - Simpler if regular; may be more representative if adaptive
- Perform local computations on the grid
  - Given yesterday’s temperature and weather pattern, what is today’s expected temperature?
- Communicate partial results between grids (see the halo-exchange sketch below)
  - Contribute the local weather result to understanding the global weather pattern.
- Repeat for a set of time steps
- Possibly perform other calculations with the results
  - Given the weather model, what area should evacuate for a hurricane?
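The “communicate partial results” step typically shows up as a halo (ghost-cell) exchange between neighboring subgrids. Below is a minimal MPI sketch in C for a 1D row decomposition of the toy grid from the earlier example; the decomposition, buffer layout, and single-row halo are illustrative assumptions, not the layout of any particular climate code.

```c
#include <mpi.h>

#define NLON 128   /* illustrative width; each rank owns a band of rows */

/* Exchange one halo row with each neighbor in a 1D row decomposition.
   'local' holds nrows owned rows plus one ghost row above (index 0)
   and one below (index nrows+1). */
void halo_exchange(double local[][NLON], int nrows, int rank, int nprocs) {
    int up   = (rank > 0)          ? rank - 1 : MPI_PROC_NULL;
    int down = (rank < nprocs - 1) ? rank + 1 : MPI_PROC_NULL;

    /* Send my top owned row up; receive my bottom ghost row from below. */
    MPI_Sendrecv(local[1],         NLON, MPI_DOUBLE, up,   0,
                 local[nrows + 1], NLON, MPI_DOUBLE, down, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* Send my bottom owned row down; receive my top ghost row from above. */
    MPI_Sendrecv(local[nrows],     NLON, MPI_DOUBLE, down, 1,
                 local[0],         NLON, MPI_DOUBLE, up,   1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}
```

After the exchange, each rank can apply the local step() from the previous sketch to its own rows; MPI_PROC_NULL turns the transfers at the physical boundaries into no-ops.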

Slide 11: More Examples: Parallel Computing in Data Analysis
- Finding information amidst large quantities of data
- General themes of sifting through large, unstructured data sets:
  - Has there been an outbreak of some medical condition in a community?
  - Which doctors are most likely involved in fraudulent charging to Medicare?
  - When should white socks go on sale?
  - What advertisements should be sent to you?
- Data is collected and stored at enormous speeds (Gbytes/hour):
  - remote sensors on a satellite
  - telescopes scanning the skies
  - microarrays generating gene expression data
  - scientific simulations generating terabytes of data
  - NSA analysis of telecommunications

Slide 12: Why Writing (Fast) Parallel Programs Is Hard

Slide 13: Parallel Programming Complexity: An Analogy to Preparing Thanksgiving Dinner
- Enough parallelism? (Amdahl’s Law)
  - Suppose you want to just serve turkey.
- Granularity
  - How frequently must each assistant report to the chef? After each stroke of a knife? Each step of a recipe? Each dish completed?
- Locality
  - Grab the spices one at a time? Or collect the ones that are needed before starting a dish?
- Load balance
  - Does each assistant get one dish? Preparing stuffing vs. cooking green beans?
- Coordination and synchronization
  - The person chopping onions for the stuffing can also supply the green beans.
  - Start the pie after the turkey is out of the oven.
All of these things make parallel programming even harder than sequential programming.

Slide 14: Finding Enough Parallelism
- Suppose only part of an application seems parallel.
- Amdahl’s law:
  - Let s be the fraction of work done sequentially, so (1-s) is the fraction that is parallelizable.
  - Let P = number of processors.
  - Speedup(P) = Time(1)/Time(P) <= 1/(s + (1-s)/P) <= 1/s
- Even if the parallel part speeds up perfectly, performance is limited by the sequential part. (A small worked example follows below.)
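A quick numeric sketch of the bound in C; the values of s and P here are made up for illustration:

```c
#include <stdio.h>

/* Amdahl's law: upper bound on speedup with sequential fraction s
   and P processors. */
double amdahl(double s, int P) {
    return 1.0 / (s + (1.0 - s) / P);
}

int main(void) {
    /* Even with only 5% sequential work, 100 processors give < 17x,
       and no number of processors can beat 1/s = 20x. */
    printf("s=0.05, P=100:  speedup <= %.1f\n", amdahl(0.05, 100)); /* 16.8 */
    printf("s=0.05, P->inf: speedup <= %.1f\n", 1.0 / 0.05);        /* 20.0 */
    return 0;
}
```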

Slide 15: Overhead of Parallelism
- Given enough parallel work, this is the biggest barrier to getting the desired speedup.
- Parallelism overheads include:
  - cost of starting a thread or process
  - cost of communicating shared data
  - cost of synchronizing
  - extra (redundant) computation
- Each of these can be in the range of milliseconds (= millions of flops) on some systems.
- Tradeoff: the algorithm needs sufficiently large units of work to run fast in parallel (i.e., large granularity), but not so large that there is not enough parallel work. (A sketch of this tradeoff follows below.)
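One place the granularity tradeoff is directly visible is the chunk size of a parallel loop. A minimal OpenMP sketch in C; the loop body, array size, and chunk sizes are illustrative assumptions:

```c
#include <omp.h>

#define N 1000000

/* Sum an array in parallel. The chunk size controls granularity:
   tiny chunks mean frequent trips to the scheduler (high overhead),
   huge chunks mean few units of work (threads may idle near the end). */
double sum_chunked(const double *a, int chunk) {
    double total = 0.0;
    #pragma omp parallel for schedule(dynamic, chunk) reduction(+:total)
    for (int i = 0; i < N; i++)
        total += a[i];
    return total;
}
/* e.g., compare sum_chunked(a, 1) with sum_chunked(a, 10000):
   same answer, very different scheduling overhead. */
```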

Slide 16: Locality and Parallelism
- Large memories are slow; fast memories are small.
- Storage hierarchies are large and fast on average.
- Parallel processors, collectively, have large, fast caches.
  - The slow accesses to “remote” data we call “communication.”
- The algorithm should do most work on local data.
[Diagram: a conventional storage hierarchy (proc, cache, L2 cache, L3 cache, memory), replicated per processor, with potential interconnects between them.]
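As a small illustration of “do most work on local data,” here is a classic cache-blocking (tiling) sketch in C; the matrix and tile sizes are illustrative assumptions that would be tuned to the actual cache:

```c
#define N    2048
#define TILE   32   /* illustrative tile size; tune to the cache */

/* Transpose B = A^T one TILE x TILE block at a time, so each block of
   A and B stays in cache while it is being used, instead of streaming
   the whole matrix through the cache for every row. */
void transpose_tiled(const double *A, double *B) {
    for (int ii = 0; ii < N; ii += TILE)
        for (int jj = 0; jj < N; jj += TILE)
            for (int i = ii; i < ii + TILE; i++)
                for (int j = jj; j < jj + TILE; j++)
                    B[j * N + i] = A[i * N + j];
}
```

The same idea scales up across processors: partition the grid so that each processor’s working set lives in its own memory, and treat everything else as communication.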

Slide 17: Load Imbalance
- Load imbalance is the time that some processors in the system are idle, due to:
  - insufficient parallelism (during that phase)
  - unequal-size tasks
- Examples of the latter:
  - adapting to “interesting parts of a domain”
  - tree-structured computations
  - fundamentally unstructured problems
- The algorithm needs to balance the load (one common remedy is sketched below).
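When task costs are unequal and unpredictable, a common remedy is to let idle workers grab the next pending task rather than fixing the assignment up front. A minimal OpenMP sketch in C; the task array and cost model are illustrative assumptions:

```c
#include <omp.h>

/* Run tasks with wildly unequal costs (e.g., tree nodes or regions of
   an adaptive mesh). A static split would leave some threads idle;
   dynamic scheduling hands each finished thread the next pending task. */
void run_tasks(int ntasks, void (*do_task)(int)) {
    #pragma omp parallel for schedule(dynamic)
    for (int t = 0; t < ntasks; t++)
        do_task(t);   /* cost of do_task(t) varies per task */
}
```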

Slide 18: New Parallel Programming Languages for Scientific Computing

Slide 19: A Brief Look at Chapel (Cray)
- History:
  - Starting in 2002, the Defense Advanced Research Projects Agency (DARPA) funded 5 industry players to investigate a revolutionary, commercially viable high-end computer system for 2010.
  - Now, two teams (IBM and Cray) are still building systems to be deployed next year.
  - Both have introduced new languages, as has Sun:
    - Chapel (Cray)
    - X10 (IBM)
    - Fortress (Sun)
- We will look at Chapel for the next few slides.

Slide 20: Summary of Lecture
- Scientific simulation discretizes some space into a grid, then:
  - Perform local computations on the grid
  - Communicate partial results between grids
  - Repeat for a set of time steps
  - Possibly perform other calculations with the results
- Writing fast parallel programs is difficult:
  - Amdahl’s Law: must parallelize most of the computation
  - Data locality
  - Communication and synchronization
  - Load imbalance
- Challenge for new productive parallel programming languages:
  - Express data partitioning and parallelism at a high level
  - Still obtain high performance!

