

Presentation on theme: "HOW PETASCALE VISUALIZATION WILL CHANGE THE RULES Hank Childs Lawrence Berkeley Lab & UC Davis 10/12/09."— Presentation transcript:

1 HOW PETASCALE VISUALIZATION WILL CHANGE THE RULES Hank Childs Lawrence Berkeley Lab & UC Davis 10/12/09

2 Supercomputing 101  Why simulation?  Simulations are sometimes more cost effective than experiments.  The new model for science has three legs: theory, experiment, and simulation.  What is the “petascale”?  1 FLOPS = 1 FLoating point OPeration per Second  1 GigaFLOPS = 1 billion FLOPS, 1 TeraFLOPS = 1000 GigaFLOPS  1 PetaFLOPS = 1,000,000 GigaFLOPS  PetaFLOPS + petabytes on disk + petabytes of memory → petascale  Why petascale?  More compute cycles, more memory, etc. lead to faster and/or more accurate simulations.

3 Petascale computing is here.  4 existing petascale machines: LANL RoadRunner, ORNL Jaguar, Jülich JUGene, UTK Kraken

4 Supercomputing is not slowing down.  Two ~20 PetaFLOP machines will be online in 2011: LLNL Sequoia, NCSA BlueWaters  Q: When does it stop?  A: Exascale is being actively discussed right now  http://www.exascale.org

5 How does the petascale affect visualization? Large # of time steps Large ensembles Large scale Large # of variables

6 Why is petascale visualization going to change the rules?  Michael Strayer (U.S. DoE Office of Science): “petascale is not business as usual”  Especially true for visualization and analysis!  Large scale data creates two incredible challenges: scale and complexity  Scale is not “business as usual”  The supercomputing landscape is changing  Solution: we will need “smart” techniques in production environments  More resolution leads to more and more complexity  Will the “business as usual” techniques still suffice?  What are the software engineering ramifications?

7 Production visualization tools use “pure parallelism” to process data. [Diagram: a parallel simulation code writes pieces of data P0–P9 to disk; each processor in the parallelized visualization data flow network runs Read → Process → Render on its assigned pieces.]

8 Pure parallelism: pros and cons  Pros:  Easy to implement  Cons:  Requires a large amount of primary memory  Requires large I/O capabilities  → requires big machines

9 Pure parallelism performance is based on # bytes to process and I/O rates.  Vis is almost always >50% I/O and sometimes 98% I/O  Amount of data to visualize is typically O(total memory) → relative I/O (the ratio of total memory to I/O bandwidth) is key [Chart: FLOPs, memory, and I/O for a terascale machine vs. a “petascale machine”.]

10 Anecdotal evidence: relative I/O is getting slower.

Machine name        Main memory   I/O rate   Time to write memory to disk
ASC Purple          49.0 TB       140 GB/s   5.8 min
BGL-init            32.0 TB       24 GB/s    22.2 min
BGL-cur             69.0 TB       30 GB/s    38.3 min
Petascale machine   ?             ?          >40 min
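The table's last column follows directly from the first two: time to write memory to disk is memory size divided by I/O rate. A quick sketch (taking 1 TB as 1000 GB, as the table's figures imply):

```python
# Time to write main memory to disk = memory size / I/O rate.
# 1 TB is taken as 1000 GB to match the table's figures.

def minutes_to_write(memory_tb, io_gb_per_s):
    return memory_tb * 1000.0 / io_gb_per_s / 60.0

for name, mem, rate in [("ASC Purple", 49.0, 140.0),
                        ("BGL-init", 32.0, 24.0),
                        ("BGL-cur", 69.0, 30.0)]:
    print(f"{name}: {minutes_to_write(mem, rate):.1f} min")
```

This reproduces the 5.8, 22.2, and 38.3 minute figures above.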

11 Why is relative I/O getting slower?  “I/O doesn’t pay the bills”  Simulation codes aren’t affected.

12 Recent runs of trillion cell data sets provide further evidence that I/O dominates.  Weak scaling study: ~62.5M cells/core

#cores     Problem size   Machine type   Machine
8K         0.5TZ          AIX            Purple
16K        1TZ            Sun Linux      Ranger
16K        1TZ            Linux          Juno
32K        2TZ            Cray XT5       JaguarPF
64K        4TZ            BG/P           Dawn
16K, 32K   1TZ, 2TZ       Cray XT4       Franklin

2T cells on 32K procs (Jaguar and Franklin): approx. I/O time 2–5 minutes; approx. processing time 10 seconds.

13 Assumptions stated  I/O is a dominant term in visualization performance  Supercomputing centers are procuring “imbalanced” petascale machines  The trend is towards massively multi-core, with lots of shared memory within a node  I/O goes to a node: more cores → less I/O bandwidth per core  And: overall I/O bandwidth is also deficient

14 Pure parallelism is not well suited for the petascale.  Emerging problem:  Pure parallelism emphasizes I/O and memory  And: pure parallelism is the dominant processing paradigm for production visualization software.  Solution? … there are “smart techniques” that de-emphasize memory and I/O:  Data subsetting  Multi-resolution  Out of core  In situ

15 Data subsetting eliminates pieces that don’t contribute to the final picture. [Diagram: the same parallelized visualization data flow network, but each processor reads and processes only the pieces of data from disk that contribute to the final picture.]

16 Data Subsetting: pros and cons  Pros:  Less data to process (less I/O, less memory)  Cons:  Extent of optimization is data dependent  Only applicable to some algorithms
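A minimal sketch of the idea behind data subsetting, with hypothetical names: per-piece metadata (here, the min/max of a field) lets the executive skip pieces that cannot contribute to, say, a contour at a given level, so they are never read from disk at all.

```python
# Data subsetting sketch (illustrative, not a production API): consult
# per-piece metadata to decide which pieces can contribute to a contour
# at a given level, and read only those.

pieces_metadata = {           # piece id -> (min, max) of the field in that piece
    "P0": (0.0, 2.0),
    "P1": (1.5, 4.0),
    "P2": (5.0, 9.0),
}

def pieces_to_read(metadata, level):
    """A piece can contain the contour only if level lies in its [min, max]."""
    return [p for p, (lo, hi) in metadata.items() if lo <= level <= hi]

print(pieces_to_read(pieces_metadata, level=3.0))
```

This also shows the stated con: how much is saved depends entirely on the data and the requested level.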

17 Multi-resolution techniques use coarse representations then refine. [Diagram: the same parallelized visualization data flow network, but the processors read and process coarse representations of the pieces (e.g. P2, P4) instead of all of P0–P9 at full resolution.]

18 Multi-resolution: pros and cons  Pros  Avoids I/O & memory requirements  Cons  Is it meaningful to process a simplified version of the data?
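A toy sketch of the multi-resolution trade-off (the coarsening strategy here, simple striding, is illustrative only): processing a coarse representation cuts I/O and memory by the stride factor, at the cost of fidelity.

```python
# Multi-resolution sketch: operate on a strided, coarse version of the data
# first; refinement would re-read selected regions at full resolution.

def coarsen(values, stride):
    """Keep every stride-th sample: 1/stride the I/O and memory."""
    return values[::stride]

full = list(range(16))        # stand-in for a full-resolution field
coarse = coarsen(full, 4)     # 1/4-resolution representation
print(coarse)
```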

19 Out-of-core iterates pieces of data through the pipeline one at a time. [Diagram: the same parallelized visualization data flow network, but each processor streams its pieces of data from disk through Read → Process → Render one piece at a time.]

20 Out-of-core: pros and cons  Pros:  Lower requirement for primary memory  Doesn’t require big machines  Cons:  Still paying large I/O costs (Slow!)
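The out-of-core loop can be sketched in a few lines (all names here are illustrative): pieces are loaded one at a time, so peak memory is one piece rather than the whole dataset, but every piece still pays the I/O cost.

```python
# Out-of-core sketch: iterate pieces through read -> process, one at a time.

def read_pieces(num_pieces):
    """Stand-in for reading pieces of data from disk, one per call."""
    for i in range(num_pieces):
        yield list(range(i * 4, i * 4 + 4))   # pretend each piece is 4 cells

def process(piece):
    """Stand-in for a visualization algorithm operating on one piece."""
    return [cell for cell in piece if cell % 2 == 0]

def render(results):
    """Stand-in for rendering: just count the output cells."""
    return sum(len(r) for r in results)

results = []
for piece in read_pieces(12):        # only one piece resident at a time
    results.append(process(piece))
print(render(results))
```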

21 In situ processing does visualization as part of the simulation. [Diagram: the pure-parallelism setup shown for contrast — the parallel simulation code, pieces of data P0–P9 on disk, and the Read → Process → Render network that in situ processing replaces.]

22 In situ processing does visualization as part of the simulation. [Diagram: each processor (0 through 9) of the parallel simulation code runs GetAccessToData → Process → Render directly in the parallelized visualization data flow network, with no pieces of data written to disk.]

23 In situ: pros and cons  Pros:  No I/O!  Lots of compute power available  Cons:  Very memory constrained  Many operations not possible: once the simulation has advanced, you cannot go back and analyze it  User must know what to look for a priori  Expensive resource to hold hostage!
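A minimal in situ sketch (hypothetical names, not any particular in situ library's API): the simulation hands its in-memory field to a visualization callback each step, so nothing touches disk, but each step can only be analyzed at the moment it is produced.

```python
# In situ sketch: the simulation calls a visualization callback on its
# in-memory data every step; no data is written to disk.

def simulate(steps, viz_callback):
    field = [0.0] * 8                        # the simulation's in-memory data
    for step in range(steps):
        field = [x + 1.0 for x in field]     # advance the simulation
        viz_callback(step, field)            # visualize in place, zero I/O

maxima = []
simulate(3, lambda step, field: maxima.append(max(field)))
print(maxima)
```

Note the con from the slide: once `simulate` has advanced past a step, that step's field is gone; only what the callback computed survives.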

24 Summary of Techniques and Strategies  Pure parallelism can be used for anything, but it takes a lot of resources  Smart techniques can only be used situationally  Petascale strategy 1:  Stick with pure parallelism and live with high machine costs & I/O wait times  Other petascale strategies?  Assumption: we can’t afford massive dedicated clusters for visualization  We can fall back on the supercomputer, but only rarely

25 Now we know the tools … what problem are we trying to solve?  Three primary use cases:  Exploration (examples: scientific discovery, debugging)  Confirmation (examples: data analysis, images / movies, comparison)  Communication (examples: data analysis, images / movies)

26 Notional decision process  Need all data at full resolution? No → Multi-resolution (debugging & scientific discovery)  Yes → Do operations require all the data? No → Data subsetting (comparison & data analysis)  Yes → Do you know what you want to do a priori? Yes → In situ (data analysis & images / movies)  No → Do algorithms require all data in memory? Yes → Pure parallelism (anything & esp. comparison)  No → Interactivity required? Yes → Pure parallelism; No → Out-of-core (data analysis & images / movies)  (The branches correspond to the exploration, confirmation, and communication use cases.)
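The decision process above can be sketched as a single function; the technique names are the slide's own, and the parameters are yes/no answers to its questions.

```python
# Notional decision process, sketched as code: answer the slide's questions
# in order and fall through to pure parallelism.

def choose_technique(full_resolution_needed, all_data_required,
                     know_goal_a_priori, all_in_memory, interactive):
    if not full_resolution_needed:
        return "multi-resolution"
    if not all_data_required:
        return "data subsetting"
    if know_goal_a_priori:
        return "in situ"
    if not all_in_memory and not interactive:
        return "out-of-core"
    return "pure parallelism"

# Example: full resolution, all data, goal unknown, doesn't fit, batch job.
print(choose_technique(True, True, False, False, False))
```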

27 Alternate strategy: smart techniques [Chart: all visualization and analysis work is handled by multi-resolution, in situ, out-of-core, and data subsetting; do the remaining ~5% on the supercomputer.]

28 How Petascale Changes the Rules  We can’t use pure parallelism alone any more  We will need algorithms to work in multiple processing paradigms  Incredible research problem…  … but also an incredible software engineering problem.

29 Data flow networks… a love story  Work is performed by a pipeline  A pipeline consists of data objects and components (sources, filters, and sinks)  Pipeline execution begins with a “pull”, which starts the Update phase  Data flows from component to component during the Execute phase [Diagram: File Reader (Source) → Slice Filter → Contour Filter → Renderer (Sink); Update propagates upstream, Execute flows downstream.]
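A minimal sketch of this pull-based model (illustrative, not VisIt's or VTK's actual API): the sink's pull walks upstream to the source (Update), then each component operates on its input on the way back down (Execute).

```python
# Minimal pull-based data flow network: Update propagates upstream,
# Execute flows downstream.

class Component:
    def __init__(self, upstream=None):
        self.upstream = upstream
    def update(self):
        # Update phase: ask the upstream component for its output first...
        data = self.upstream.update() if self.upstream else None
        # ...then Execute phase: operate on that input.
        return self.execute(data)
    def execute(self, data):      # a plain Component acts as a pass-through sink
        return data

class Source(Component):
    def execute(self, data):
        return [3, 1, 4, 1, 5, 9]     # stand-in for a file reader

class Filter(Component):
    def __init__(self, upstream, fn):
        super().__init__(upstream)
        self.fn = fn
    def execute(self, data):
        return self.fn(data)

# Build Source -> Filter -> Filter -> Sink, then pull from the sink.
src = Source()
evens = Filter(src, lambda d: [x for x in d if x % 2 == 0])
doubled = Filter(evens, lambda d: [2 * x for x in d])
sink = Component(doubled)
print(sink.update())
```

Adding a new filter type is just another subclass, which is the extensibility point the next slide describes.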

30 Data flow networks: strengths  Flexible usage  Networks can be multi-input / multi-output  Interoperability of modules  Embarrassingly parallel algorithms handled by base infrastructure  Easy to extend  New derived types of filters [Diagrams: Slice filter, Contour filter, and new “????” filters inherit from an abstract filter; data flows Source → Filter A → Filter B → Filter C → Sink.]

31 Data flow networks: weaknesses  Execution of modules happens in stages  Algorithms are executed one at a time → cache inefficient  Memory footprint concerns  Some implementations fix the data model

32 Data flow networks: observations  The majority of code investment is in algorithms (derived types of filters), not in base classes (which manage data flow).  Source code for managing the flow of data is small and in one place.  Algorithms don’t care about the data processing paradigm … they only care about operating on inputs and outputs.

33 Example filter: contouring [Diagram: the contour filter wraps the contour algorithm, turning mesh input into surface/line output, inside the pipeline Data Reader → Contour Filter → Rendering.]

34 Example filter: contouring with data subsetting [Diagram: the same Data Reader → Contour Filter → Rendering pipeline; the contour filter additionally communicates with the executive to discard pieces.]

35 Example filter: contouring with out-of-core [Diagram: the same pipeline, with the data split into pieces 1–12; the contour algorithm is called 12 times, once per piece.]

36 Example filter: contouring with multi-resolution techniques [Diagram: the same pipeline operating on a coarse, multi-resolution version of the mesh input.]

37 Example filter: contouring with in situ [Diagram: the Data Reader is crossed out; the contour filter gets its mesh input directly from the simulation code.] For each example, the contour algorithm didn’t change, just its context.
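That point can be made concrete with a toy: below, one (purely illustrative) "contour" routine is reused unchanged in an all-in-memory context and an out-of-core, piece-at-a-time context, and produces the same result in both.

```python
# The algorithm is paradigm-indifferent: only its execution context changes.

def contour(cells, level):
    """Toy stand-in for a contour algorithm: keep cells at or above level."""
    return [c for c in cells if c >= level]

dataset = [[1, 5, 3], [7, 2, 8], [4, 9, 6]]    # three pieces of a mesh

# Pure-parallelism style: every piece resident in memory at once.
all_at_once = contour([c for piece in dataset for c in piece], level=5)

# Out-of-core style: the same algorithm, called once per piece.
one_at_a_time = []
for piece in dataset:
    one_at_a_time.extend(contour(piece, level=5))

print(sorted(all_at_once) == sorted(one_at_a_time))
```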

38 How big is this job?  Many algorithms are basically “processing paradigm” indifferent  What percentage of a vis code is algorithms?  What percentage is devoted to the “processing paradigm”?  Other? We can gain insight by looking at the breakdown in a real world example (VisIt).

39 VisIt is a richly featured, turnkey application for large data.  VisIt is an open source, end user visualization and analysis tool for simulated and experimental data  The tool has two focal points: big data & providing a product for end users  >100K downloads on the web  R&D 100 award in 2005  Used “heavily to exclusively” on 8 of the world’s top 12 supercomputers  Pure parallelism + out-of-core + data subsetting + in situ [Image: 27B element Rayleigh-Taylor instability (MIRANDA, BG/L).]

40 VisIt architecture & lines of code [Diagram: client side — gui, cli, viewer; server side — mdserver and engine (parallel and serial); plus custom interfaces, documentation, regression testing, user knowledge, a Wiki, and mailing list archives. Labeled line counts: Plots 43K, Operators 55K, Databases 192K, libsim 10K, support libraries & tools 178K; handling large data and parallel algorithms is 32K of 559K total. Other per-component counts in the figure: 154K, 70K, 103K, 14K, 29K, 34K, 33K.] Pure parallelism is the simplest paradigm. “Replacement” code may be significantly larger.

41 Summary  Petascale machines are not well suited for pure parallelism, because of its high I/O and memory costs.  This will force production visualization software to utilize more processing paradigms.  The majority of existing investments can be preserved.  This is thanks in large part to the elegant design of data flow networks.  Hank Childs, hchilds@lbl.gov  … and questions???

