The Challenges Ahead for Visualizing and Analyzing Massive Data Sets. Hank Childs, Lawrence Berkeley National Laboratory, February 26, 2010.


1 The Challenges Ahead for Visualizing and Analyzing Massive Data Sets. Hank Childs, Lawrence Berkeley National Laboratory, February 26, 2010. [Images: 27B element Rayleigh-Taylor Instability (MIRANDA, BG/L); 2 trillion element mesh; 2 billion element thermal hydraulics (Nek5000, BG/P).]

2 Overview of This Mini-Symposium. Peterka: we can visualize the results on the supercomputer itself. Bremer: we can understand and gain insight from these massive data sets. Childs: visualization and analysis will be a crucial problem on the next generation of supercomputers. Pugmire: we can make our algorithms work at massive scale.

3 How does the {peta-, exa-} scale affect visualization? Large # of time steps; large ensembles; high-res meshes; large # of variables.

4 The soon-to-be “good ole days” … how visualization is done right now*. [Figure: a parallel simulation code writes pieces of data (P0 to P9) to disk; a parallelized visualization data flow network then runs Read, Process, Render on each processor (Processors 0, 1, 2 shown).] * = Your mileage may vary: Are you running full machine? How much data do you output?

5 Pure parallelism performance is based on # bytes to process and I/O rates. Vis is almost always >50% I/O and sometimes 98% I/O. Amount of data to visualize is typically O(total mem), so relative I/O (ratio of total memory and I/O) is key. [Figure: FLOPs, memory, and I/O compared for a terascale machine and a “petascale machine”.]

6 Anecdotal evidence: relative I/O is getting slower.
Machine name        Main memory   I/O rate   Time to write memory to disk
ASC Purple          49.0 TB       140 GB/s   5.8 min
BGL-init            32.0 TB       24 GB/s    22.2 min
BGL-cur             69.0 TB       30 GB/s    38.3 min
Petascale machine   ?             ?          >40 min
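The last column of the table in slide 6 appears to be main memory divided by I/O rate. A minimal sketch that reproduces the three known rows, assuming decimal units (1 TB = 1000 GB); the function name is hypothetical:

```python
def minutes_to_write_memory(memory_tb, io_rate_gb_per_s):
    """Minutes needed to write all of main memory to disk once."""
    # Assumption: decimal units, 1 TB = 1000 GB.
    seconds = memory_tb * 1000.0 / io_rate_gb_per_s
    return seconds / 60.0

# (machine name, main memory in TB, I/O rate in GB/s), taken from slide 6
machines = [
    ("ASC Purple", 49.0, 140.0),
    ("BGL-init", 32.0, 24.0),
    ("BGL-cur", 69.0, 30.0),
]

for name, memory_tb, rate in machines:
    print(f"{name}: {minutes_to_write_memory(memory_tb, rate):.1f} min")
# ASC Purple: 5.8 min
# BGL-init: 22.2 min
# BGL-cur: 38.3 min
```

Memory grows faster than I/O bandwidth across the rows, which is why the ratio worsens.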

7 Why is relative I/O getting slower? “I/O doesn’t pay the bills” —And I/O is becoming a dominant cost in the overall supercomputer procurement. Simulation codes aren’t as exposed.

8 Recent runs of trillion cell data sets provide further evidence that I/O dominates.
● Weak scaling study: ~62.5M cells/core
#cores     Problem size   Type        Machine
8K         0.5TZ          AIX         Purple
16K        1TZ            Sun Linux   Ranger
16K        1TZ            Linux       Juno
32K        2TZ            Cray XT5    JaguarPF
64K        4TZ            BG/P        Dawn
16K, 32K   1TZ, 2TZ       Cray XT4    Franklin
[Images: 2T cells, 32K procs on Jaguar; 2T cells, 32K procs on Franklin.]
-Approx I/O time: 2-5 minutes
-Approx processing time: 10 seconds
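The numbers in slide 8 can be checked with a little arithmetic. This sketch assumes "K" means 1000 cores and that a run consists only of I/O plus processing (rendering and other costs are ignored); both function names are hypothetical:

```python
CELLS_PER_CORE = 62.5e6  # "~62.5M cells/core" from the weak scaling study

def total_cells(cores):
    """Problem size under weak scaling: fixed work per core."""
    return cores * CELLS_PER_CORE

def io_fraction(io_seconds, processing_seconds):
    """Share of run time spent in I/O, assuming only these two costs."""
    return io_seconds / (io_seconds + processing_seconds)

for cores in (8_000, 16_000, 32_000, 64_000):
    print(f"{cores} cores -> {total_cells(cores) / 1e12:.1f}T cells")

# I/O of 2 to 5 minutes against roughly 10 seconds of processing
for io_minutes in (2, 5):
    share = io_fraction(io_minutes * 60, 10)
    print(f"{io_minutes} min I/O -> {share:.0%} of run time")
```

Under these assumptions the I/O share lands above 90%, in line with the earlier claim that vis is "almost always >50% I/O".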

9 Visualization works because it uses the brain’s highly effective visual processing system. Trillions of data points, millions of pixels. But is this still a good idea at the peta-/exascale? (Note that visualization is often reducing the data … so we are frequently *not* trying to render all of the data points.)

10 Visualization works because it uses the brain’s highly effective visual processing system. Trillions of data points. One idea: add more pixels! [Image: 35M pixel powerwall.] Bonus: big displays act as collaboration centers.

11 Visualization works because it uses the brain’s highly effective visual processing system. Trillions of data points. One idea: add more pixels! [Image: 35M pixel powerwall.] Visual acuity of the human eye is <30M pixels!! (Source: Sawant & Healey, NC State.)

12 Summary: what are the challenges? Scale —We can’t read all of the data at full resolution any more. What can we do? Insight —There is a lot more data than pixels. How are we going to understand it?

13 How can we deal with so many cells per pixel? [Figure: a single pixel.] What should the color of this pixel be? —“Random” between the 9 colors? —An average value of the 9 colors? (brown) —The color of the minimum value? —The color of the maximum value? We need infrastructure to allow users to have confidence in the pictures we deliver. Data insight often goes far beyond pictures (see Bremer talk).
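The four options in slide 13 amount to different reductions over the cells that fall on one pixel. A minimal sketch, assuming each cell carries one scalar value that a color map would later turn into a color; the cell values and the function name are hypothetical:

```python
import random

# Hypothetical scalar values for the 9 cells that land on one pixel.
cells = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0]

def reduce_to_pixel(values, strategy, rng=None):
    """Collapse many cell values into the single value a pixel will show."""
    if strategy == "random":
        return (rng or random.Random()).choice(values)
    if strategy == "average":
        return sum(values) / len(values)
    if strategy == "minimum":
        return min(values)
    if strategy == "maximum":
        return max(values)
    raise ValueError(f"unknown strategy: {strategy}")

rng = random.Random(0)  # seeded so the "random" pick is repeatable
for strategy in ("random", "average", "minimum", "maximum"):
    print(strategy, reduce_to_pixel(cells, strategy, rng))
```

The same nine cells give 4.0, 1.0, or 9.0 depending on the rule, and the average is a value that few of the cells actually hold. A viewer who does not know which rule produced the picture has little basis for trusting it.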

14 Multi-resolution techniques use coarse representations then refine. [Figure: the parallel simulation code, pieces of data (P0 to P9) on disk, and the parallelized visualization data flow network (Read, Process, Render on Processors 0, 1, 2), with pieces P2 and P4 called out.]

15 Multi-resolution: pros and cons. Summary: —“Dive” into data. Enough diving results in original data. Pros: —Avoid I/O & memory requirements —Confidence in pictures; multi-res hierarchy addresses “many cells to one pixel issue”. Cons: —Is it meaningful to process simplified version of the data? —How do we generate hierarchical representations? What costs do they incur?
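One simple way to build such a hierarchy is to average 2x2 blocks repeatedly. This is only an illustrative sketch on a small 2D grid, not the scheme any particular tool uses; the function names are hypothetical:

```python
def coarsen(grid):
    """Halve the resolution by averaging each 2x2 block (sides must be even)."""
    rows, cols = len(grid), len(grid[0])
    return [
        [(grid[r][c] + grid[r][c + 1] + grid[r + 1][c] + grid[r + 1][c + 1]) / 4.0
         for c in range(0, cols, 2)]
        for r in range(0, rows, 2)
    ]

def build_hierarchy(grid):
    """Return levels ordered coarsest first; the last level is the original."""
    levels = [grid]
    while (len(levels[-1]) > 1
           and len(levels[-1]) % 2 == 0
           and len(levels[-1][0]) % 2 == 0):
        levels.append(coarsen(levels[-1]))
    return levels[::-1]

def storage_overhead(levels):
    """Cells in all coarse levels, relative to the original data."""
    count = lambda g: len(g) * len(g[0])
    return sum(count(g) for g in levels[:-1]) / count(levels[-1])

original = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
levels = build_hierarchy(original)
for level in levels:  # "diving" from coarse to fine ends at the original data
    print(len(level), "x", len(level[0]))
print("extra storage:", storage_overhead(levels))
```

A viewer would read the coarsest level first and refine only where needed. For 2D data with factor-2 coarsening, the extra storage approaches one third of the original as the grid grows, which is one concrete instance of the cost question raised in the slide.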

16 In situ processing does visualization as part of the simulation. [Figure: the pipeline from slide 4: parallel simulation code, pieces of data (P0 to P9) on disk, and Read, Process, Render on Processors 0, 1, 2.]

17 In situ processing does visualization as part of the simulation. [Figure: the parallel simulation code (pieces P0 to P9) and the parallelized visualization data flow network share processors; each of Processors 0, 1, 2, …, 9 runs GetAccessToData, Process, Render.]

18 In situ: pros and cons. Pros: —No I/O! —Lots of compute power available. Cons: —Very memory constrained —Many operations not possible. Once the simulation has advanced, you cannot go back and analyze it —User must know what to look for a priori. Expensive resource to hold hostage!
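The trade-offs in slide 18 show up even in a toy model. In this sketch the "simulation" is a hypothetical one-line smoothing step, and the analyses are plain functions registered before the run starts; none of the names come from a real in situ library:

```python
def advance(state):
    """Toy stand-in for one simulation time step (periodic neighbor average)."""
    n = len(state)
    return [0.5 * (state[i - 1] + state[(i + 1) % n]) for i in range(n)]

def run_in_situ(state, steps, analyses):
    """Run the simulation and apply each analysis to the in-memory state."""
    results = {name: [] for name in analyses}
    for _ in range(steps):
        state = advance(state)  # the previous state is gone after this line
        for name, analyze in analyses.items():
            # Works directly on simulation memory: nothing is written to disk.
            results[name].append(analyze(state))
    return results

# The analyses must be chosen a priori, before the run starts.
analyses = {"max": max, "mean": lambda s: sum(s) / len(s)}
results = run_in_situ([0.0, 0.0, 8.0, 0.0], steps=2, analyses=analyses)
print(results)
```

Only the small per-step results survive. A question that was not registered in `analyses` cannot be answered afterwards without re-running the simulation.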

19 Now we know the tools … what problem are we trying to solve? Three primary use cases: —Exploration. Examples: scientific discovery, debugging. —Confirmation. Examples: data analysis, images / movies, comparison. —Communication. Examples: data analysis, images / movies.

20 Notional decision process
Need all data at full resolution?
  No: Multi-resolution (debugging & scientific discovery)
  Yes: Do you know what you want to do a priori?
    Yes: In situ (data analysis & images / movies)
    No: Pure parallelism (anything & esp. comparison)
Use-case labels on the diagram: Exploration, Confirmation, Communication.
Also roles for more minor techniques that weren’t discussed such as streaming and data subsetting.
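The two questions in slide 20 can be written as a small function. This is a sketch of the notional process only; the function and argument names are hypothetical:

```python
def choose_technique(need_full_resolution, know_goal_a_priori):
    """Pick a processing technique from the two questions on the slide."""
    if not need_full_resolution:
        return "multi-resolution"   # debugging & scientific discovery
    if know_goal_a_priori:
        return "in situ"            # data analysis & images / movies
    return "pure parallelism"       # anything, especially comparison

print(choose_technique(need_full_resolution=False, know_goal_a_priori=False))
print(choose_technique(need_full_resolution=True, know_goal_a_priori=True))
print(choose_technique(need_full_resolution=True, know_goal_a_priori=False))
```

The second argument is never consulted when full resolution is not needed, matching the order of the questions in the diagram.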

21 Prepare for difficult conversations in the future. Multi-resolution: —Do you understand what a multi-resolution hierarchy should look like for your data? —Who do you trust to generate it? —Are you comfortable with your I/O routines generating these hierarchies while they write? —How much overhead are you willing to tolerate on your dumps? 33+%? —Willing to accept that your visualizations are not the “real” data?

22 Prepare for difficult conversations in the future. In situ: —How much memory are you willing to give up for visualization? —Will you be angry if the vis algorithms crash? —Do you know what you want to generate a priori? Can you re-run simulations if necessary?

23 Summary. Is there a problem with massive data? —Yes, I/O is a major problem —Yes, obtaining insight is a major problem. Why is there a problem? Whose fault is it? —As we scale up, some things get cheap, other things (like I/O) stay expensive. What can we do about it? —Multi-res / in-situ. Will it hurt? —Yes. Can we do it? —Yes, see next three talks.

24 Questions??? Hank Childs, LBL & UC Davis. Contact info: —hchilds@lbl.gov / childs@cs.ucdavis.edu —http://vis.lbl.gov/~hrchilds

