Petascale I/O Impacts on Visualization. Hank Childs, Lawrence Berkeley National Laboratory & UC Davis. March 24, 2010.


Petascale I/O Impacts on Visualization. Hank Childs, Lawrence Berkeley National Laboratory & UC Davis. March 24, 2010. [Title images: 27B element Rayleigh-Taylor Instability (MIRANDA, BG/L); 2 trillion element mesh.]

How does the {peta-, exa-} scale affect visualization? Large # of time steps, large ensembles, high-res meshes, large # of variables. Your mileage may vary: Are you running the full machine? How much data do you output?

The soon-to-be “good ole days” … how visualization is done right now. [Diagram: a parallel simulation code writes pieces of data (P0–P9) to disk; each visualization processor then runs a Read → Process → Render pipeline over its assigned pieces, forming a parallelized visualization data flow network.] This technique is called “pure parallelism”.
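
A minimal sketch of this pure-parallelism pattern, assuming mpi4py and NumPy are available: each rank reads its own pieces from disk, processes them, renders a partial result, and the partials are composited. The piece file naming and the process/render bodies are hypothetical stand-ins for what a real parallel visualization engine does.

```python
# Pure parallelism sketch: every MPI rank reads, processes, and renders
# its own pieces of the data; partial results are composited at the end.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, nranks = comm.Get_rank(), comm.Get_size()
npieces = 10                                  # pieces of data on disk (P0..P9 in the diagram)

def read(piece_id):
    return np.load(f"piece_{piece_id}.npy")   # I/O: usually the dominant cost

def process(block):
    return block > block.mean()               # stand-in for contouring, slicing, ...

def render(surface):
    return surface.sum(axis=0)                # stand-in for rasterizing a partial image

# Round-robin assignment of pieces to ranks, then the Read -> Process -> Render pipeline.
partials = [render(process(read(p))) for p in range(npieces) if p % nranks == rank]

# Composite the partial results on rank 0 (a real tool would z-buffer composite images).
image = comm.reduce(sum(partials), op=MPI.SUM, root=0)
```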

Pure parallelism performance is based on # bytes to process and I/O rates. Vis is almost always >50% I/O and sometimes 98% I/O. The amount of data to visualize is typically O(total memory). [Chart: FLOPs, memory, and I/O for a terascale machine vs. a “petascale machine”.] Two big factors: (1) how much data you have to read, and (2) how fast you can read it. → Relative I/O (the ratio of total memory to I/O rate) is key.

Anecdotal evidence: relative I/O really is getting slower. Time to write memory to disk:

Machine name        Main memory   I/O rate   Time to write memory to disk
ASC Purple          49.0 TB       140 GB/s   5.8 min
BGL-init            32.0 TB       24 GB/s    22.2 min
BGL-cur             69.0 TB       30 GB/s    38.3 min
Petascale machine   ?             ?          >40 min
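
The last column follows directly from the first two (time = main memory / I/O rate); a quick check reproduces the table’s numbers:

```python
# Time to flush all of main memory to disk = memory size / I/O rate.
# Values are the ones quoted in the table above.
machines = {
    "ASC Purple": (49.0e12, 140e9),   # (main memory in bytes, I/O rate in bytes/s)
    "BGL-init":   (32.0e12,  24e9),
    "BGL-cur":    (69.0e12,  30e9),
}
for name, (mem_bytes, io_rate) in machines.items():
    minutes = mem_bytes / io_rate / 60
    print(f"{name:10s}: {minutes:5.1f} min")   # 5.8, 22.2, 38.3
```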

Why is relative I/O getting slower? “I/O doesn’t pay the bills,” and yet I/O is becoming a dominant cost in the overall supercomputer procurement. Simulation codes aren’t as exposed to slow I/O as visualization is, and they will be even less exposed with proposed future architectures.

Recent runs of trillion cell data sets provide further evidence that I/O dominates. Weak scaling study: ~62.5M cells/core.

#cores     Problem size   Type        Machine
8K         0.5TZ          AIX         Purple
16K        1TZ            Sun Linux   Ranger
16K        1TZ            Linux       Juno
32K        2TZ            Cray XT5    JaguarPF
64K        4TZ            BG/P        Dawn
16K, 32K   1TZ, 2TZ       Cray XT4    Franklin

For the 2T cell, 32K proc runs on Jaguar and on Franklin: approx. I/O time 2-5 minutes, approx. processing time 10 seconds.
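
A back-of-the-envelope check of those numbers (a sketch; the 2-5 minute and 10 second figures are the approximate ones quoted above) shows how lopsided the split is, consistent with the earlier “sometimes 98% I/O” claim:

```python
# Fraction of visualization runtime spent in I/O for the 2TZ / 32K proc runs above.
process_time_s = 10                        # approx. processing time: 10 seconds
for io_s in (2 * 60, 5 * 60):              # approx. I/O time: 2-5 minutes
    frac = io_s / (io_s + process_time_s)
    print(f"I/O = {io_s // 60} min -> {frac:.0%} of runtime")   # ~92% to ~97%

print(f"{2e12 / 32_000 / 1e6:.1f}M cells/core")                 # weak scaling: 62.5M cells/core
```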

Summary: what are the challenges? Scale: we can’t read all of the data at full resolution any more… Q: What can we do? A: We need algorithmic change. Insight: how are we going to understand it? (There is a lot more data than pixels!)

Multi-resolution techniques use coarse representations, then refine. [Diagram: the same parallelized Read → Process → Render data flow network, but only a coarsened subset of the pieces (e.g. P2, P4) is read from disk.]

Multi-resolution: pros and cons. Pros: drastically reduce I/O & memory requirements. Cons: is it meaningful to process a simplified version of the data? How do we generate hierarchical representations, and what costs do they incur?
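
As a toy illustration of “generating a hierarchical representation,” the sketch below builds a resolution pyramid for one block of cell data by repeated 2×2×2 averaging. The block size and the averaging scheme are illustrative assumptions, not how any particular I/O library builds its hierarchy.

```python
# Build a multi-resolution pyramid: level 0 is the full-resolution block;
# each subsequent level averages 2x2x2 cells into one (8x reduction per level).
import numpy as np

def coarsen(block):
    """Average non-overlapping 2x2x2 groups of cells into single cells."""
    nx, ny, nz = (s // 2 for s in block.shape)
    return block[:2*nx, :2*ny, :2*nz].reshape(nx, 2, ny, 2, nz, 2).mean(axis=(1, 3, 5))

def build_pyramid(block, levels=3):
    pyramid = [block]
    for _ in range(levels):
        pyramid.append(coarsen(pyramid[-1]))
    return pyramid

full = np.random.rand(256, 256, 256)        # stand-in for one piece of simulation data
for level, b in enumerate(build_pyramid(full)):
    print(level, b.shape, f"{b.nbytes / 2**20:.2f} MiB")   # 128, 16, 2, 0.25 MiB
```

The cons on the slide show up immediately: the coarse levels cost extra I/O and compute to generate at write time, and anything measured on them is measured on averaged, not “real,” data.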

In situ processing does visualization as part of the simulation. [Diagram: the pure-parallelism pipeline from the earlier slide, shown as the starting point before the disk and the Read stage are removed.]

In situ processing does visualization as part of the simulation. [Diagram: each processor of the parallel simulation code runs a GetAccessToData → Process → Render pipeline directly on its in-memory data; the disk and the Read stage are gone.]

In situ: pros and cons. Pros: no I/O! Lots of compute power available. Cons: very memory constrained; many operations not possible; once the simulation has advanced, you cannot go back and analyze it; the user must know what to look at a priori (a supercomputer is an expensive resource to hold hostage!).

Now we know the tools … what problem are we trying to solve? Three primary use cases: exploration (examples: scientific discovery, debugging), confirmation (examples: data analysis, images/movies, comparison), and communication (examples: data analysis, images/movies). Multi-res and in situ map onto these use cases, but we will still need pure parallelism and possibly other techniques such as data subsetting and streaming.

Prepare for difficult conversations in the future. Multi-resolution: —Do you understand what a multi-resolution hierarchy should look like for your data? —Who do you trust to generate it? —Are you comfortable with your I/O routines generating these hierarchies while they write? —How much overhead are you willing to tolerate on your dumps? 33+%? —Willing to accept that your visualizations are not the “real” data?

Prepare for difficult conversations in the future. In situ: —How much memory are you willing to give up for visualization? —Will you be angry if the vis algorithms crash? —Do you know what you want to generate a priori? Can you re-run simulations if necessary?

Visualization on Blue Waters: two scenarios. 1) Pure parallelism continues (no SW change): visualization and analysis will be done using large portions of BW, with users charging against their science allocations; lots of time is spent doing I/O, which increases overall I/O contention on BW; vis & analysis is slow → people do less (→ insights will be lost?). 2) Smart techniques deployed: allocations are used for simulation, not vis & analysis; less artificial I/O contention is introduced; users retain the ability to explore / interact with data.

VisIt is a richly featured, turnkey application. VisIt is an open source, end user visualization and analysis tool for simulated and experimental data. Used by physicists, engineers, code developers, and vis experts; >100K downloads on the web; R&D 100 award in 2005; used “heavily to exclusively” on 8 of the world’s top 12 supercomputers. [Image: 217 pin reactor cooling simulation, run on ¼ of Argonne BG/P, 1 billion grid points.]

Terribly named!! VisIt is intended for more than just visualization: data exploration, visual debugging, quantitative analysis, comparative analysis, and presentations.

VisIt has a rich feature set that can impact many science areas. Meshes: rectilinear, curvilinear, unstructured, point, AMR. Data: scalar, vector, tensor, material, species. Dimension: 1D, 2D, 3D, time varying. Rendering (~15): pseudocolor, volume rendering, hedgehogs, glyphs, mesh lines, etc. Data manipulation (~40): slicing, contouring, clipping, thresholding, restrict to box, reflect, project, revolve, … File formats (~85). Derived quantities: >100 interoperable building blocks (+, -, *, /, gradient, mesh quality, if-then-else, and, or, not). Many general features: position lights, make movies, etc. Queries (~50): ways to pull out quantitative information, for debugging and comparative analysis.

VisIt employs a parallelized client-server architecture. Client-server observations: good for remote visualization; leverages available resources; scales well; no need to move data. Additional design considerations: plugins; multiple UIs: GUI (Qt), CLI (Python), more… [Diagram: a local client on the user’s machine (Linux, Windows, Mac, with graphics hardware) connects to parallel vis resources on the remote machine, next to the user’s data.]
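
Because the CLI is Python, a short session exercising a few of the capabilities from the previous slide looks roughly like the sketch below. The database path and variable names are made up, and the exact calls should be checked against the VisIt CLI manual.

```python
# Rough sketch of a VisIt Python CLI session (run inside "visit -cli").
# "run.silo", "pressure", and "velocity" are hypothetical names.
OpenDatabase("localhost:/path/to/run.silo")

DefineScalarExpression("speed", "magnitude(velocity)")   # a derived quantity
AddPlot("Pseudocolor", "pressure")
DrawPlots()

Query("MinMax")      # pull quantitative information out of the current plot
SaveWindow()         # write the rendered image to disk
```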

The VisIt team focuses on making a robust, usable product for end users. Manuals: a 300 page user manual, a 200 page command line interface manual, and a “Getting your data into VisIt” manual. A wiki for users (and developers). Revision control, nightly regression testing, etc. Executables for all major platforms. A day-long class, complete with exercises. [Image: slides from the VisIt class.]

VisIt is a vibrant project with many participants. Over 50 person-years of effort; over one million lines of code. A partnership between the Department of Energy’s Office of Nuclear Energy, Office of Science, and National Nuclear Security Administration, among others; both NSF XD centers are expected to make large contributions. The user community keeps growing, including AWE and the ASC Alliance schools. [Timeline: 2000, project started; 2003, LLNL user community transitioned to VisIt; Fall ’06, VACET is funded; 2007, SciDAC Outreach Center enables a public SW repo; 2007, Saudi Aramco funds LLNL to support VisIt; Spring ’07, GNEP funds LLNL to support GNEP codes at Argonne; Summer ’07, developers from LLNL, LBL & ORNL start development in the repo; ’07-’08, UC Davis & U. Utah research done in the VisIt repo; ’07-’08, partnership with CEA is developed; Spring ’08, AWE enters the repo; 2008, institutional support leverages effort from many labs; Spring ’09, more developers entering the repo all the time.]

VisIt: what’s the big deal? Everything works at scale. It is a robust, usable tool. It supports everything from vis to code development to scientific insight.

VisIt and the smart data techniques: a full pure parallelism implementation; data subsetting is well integrated (if you set up your data properly); in situ: yes, but … it’s a memory hog; multi-res: an emerging effort in this space.

Three ways to get data into VisIt: (1) write to a known output format; (2) write a plugin file format reader; (3) integrate VisIt “in situ”: “lib-VisIt” is linked into the simulation code (note: there are memory footprint issues with the implementation!). Use model: the simulation code advances for some time interval (e.g. to the end of a cycle) and hands control to lib-VisIt; lib-VisIt performs vis & analysis tasks, then hands control back to the simulation code; repeat. (A sketch of this loop follows below.)
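
That control handoff boils down to the loop sketched here. It is a schematic only, under the assumption of a callback-style coupling: advance_one_cycle, expose_arrays, and libvisit_process are hypothetical placeholders for the simulation’s step function, its data-access callbacks, and whatever lib-VisIt does with them; this is not the real libsim API.

```python
# Schematic of the in situ use model described above: after each cycle the
# simulation hands control to the visualization library, which works directly
# on the in-memory arrays, then hands control back. All names are placeholders.
import numpy as np

def advance_one_cycle(state):
    state["cycle"] += 1                     # stand-in for the real physics update
    return state

def expose_arrays(state):
    return {"density": state["density"]}    # in-memory handles, not a file dump

def libvisit_process(arrays, cycle):
    print(f"cycle {cycle}: mean density = {arrays['density'].mean():.3f}")

state = {"cycle": 0, "density": np.random.rand(64, 64, 64)}
while state["cycle"] < 10:                  # main simulation loop
    state = advance_one_cycle(state)                              # simulation advances
    libvisit_process(expose_arrays(state), state["cycle"])        # hand control to vis, then back
```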

Summary: Massive data will force algorithmic change in visualization tools. The VisIt project is making progress on bringing these algorithmic changes to the user community. Contact info: Hank Childs, LBL & UC Davis. Questions?