HOW PETASCALE VISUALIZATION WILL CHANGE THE RULES
Hank Childs, Lawrence Berkeley Lab & UC Davis
10/12/09

Supercomputing 101

Why simulation?
- Simulations are sometimes more cost effective than experiments.
- The new model for science has three legs: theory, experiment, and simulation.

What is the "petascale"?
- 1 FLOPS = 1 FLoating point OPeration per Second
- 1 GigaFLOPS = 1 billion FLOPS; 1 TeraFLOPS = 1,000 GigaFLOPS
- 1 PetaFLOPS = 1,000,000 GigaFLOPS
- PetaFLOPS of compute + petabytes on disk + petabytes of memory → petascale

Why petascale?
- More compute cycles, more memory, etc., lead to faster and/or more accurate simulations.

Petascale computing is here.

Four existing petascale machines:
- LANL RoadRunner
- ORNL Jaguar
- Jülich JUGene
- UTK Kraken

Supercomputing is not slowing down.

- Two ~20 PetaFLOP machines will be online in 2011: LLNL Sequoia and NCSA Blue Waters.
- Q: When does it stop?
- A: Exascale is being actively discussed right now.

How does the petascale affect visualization?

- Large # of time steps
- Large ensembles
- Large scale
- Large # of variables

Why is petascale visualization going to change the rules?

- Michael Strayer (U.S. DoE Office of Science): "petascale is not business as usual"
- Especially true for visualization and analysis!
- Large scale data creates two incredible challenges: scale and complexity.

Outline:
- Scale is not "business as usual": the supercomputing landscape is changing, so we will need "smart" techniques in production environments.
- More resolution leads to more and more complexity: will the "business as usual" techniques still suffice?
- What are the software engineering ramifications?

Production visualization tools use "pure parallelism" to process data.

[Diagram: a parallel simulation code writes pieces of data (P0 through P9) to disk; a parallelized visualization data flow network distributes the pieces across processors, and each processor reads, processes, and renders its pieces.]

Pure parallelism: pros and cons

Pros:
- Easy to implement

Cons:
- Requires large amount of primary memory
- Requires large I/O capabilities
- → requires big machines

Pure parallelism performance is based on # bytes to process and I/O rates.

- Vis is almost always >50% I/O and sometimes 98% I/O.
- Amount of data to visualize is typically O(total mem).
- → Relative I/O (the ratio of total memory to I/O rate) is key.

[Chart: FLOPs, memory, and I/O for a terascale machine vs. a "petascale machine".]

Anecdotal evidence: relative I/O is getting slower.

Machine name        Main memory   I/O rate   Time to write memory to disk
ASC Purple          49.0 TB       140 GB/s   5.8 min
BGL-init            32.0 TB       24 GB/s    22.2 min
BGL-cur             69.0 TB       30 GB/s    38.3 min
Petascale machine   ??            ??         >40 min
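The last column is just main memory divided by aggregate I/O rate. A minimal sketch that reproduces it (Python; the figures are copied from the table above, and the petascale row is left out because its values are unknown):

```python
# Time to write all of main memory to disk = memory size / aggregate I/O rate.
machines = {
    "ASC Purple": (49.0e3, 140.0),   # (main memory in GB, I/O rate in GB/s)
    "BGL-init":   (32.0e3, 24.0),
    "BGL-cur":    (69.0e3, 30.0),
}

for name, (mem_gb, io_gbps) in machines.items():
    minutes = mem_gb / io_gbps / 60.0
    print(f"{name:10s}: {minutes:4.1f} min to write memory to disk")
# ASC Purple:  5.8 min, BGL-init: 22.2 min, BGL-cur: 38.3 min
```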

Why is relative I/O getting slower?

- "I/O doesn't pay the bills."
- Simulation codes aren't affected.

Recent runs of trillion cell data sets provide further evidence that I/O dominates.

- Weak scaling study: ~62.5M cells/core

#cores     Problem size   Type        Machine
8K         0.5TZ          AIX         Purple
16K        1TZ            Sun Linux   Ranger
16K        1TZ            Linux       Juno
32K        2TZ            Cray XT5    JaguarPF
64K        4TZ            BG/P        Dawn
16K, 32K   1TZ, 2TZ       Cray XT4    Franklin

2T cells on 32K procs (Jaguar) and 2T cells on 32K procs (Franklin):
- Approx I/O time: 2-5 minutes
- Approx processing time: 10 seconds

Assumptions stated

- I/O is a dominant term in visualization performance.
- Supercomputing centers are procuring "imbalanced" petascale machines:
  - The trend is towards massively multi-core, with lots of shared memory within a node.
  - I/O bandwidth goes to a node, so more cores → less I/O bandwidth per core.
  - And: overall I/O bandwidth is also deficient.
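A tiny illustration of that per-core point (Python; the node bandwidth and core counts are assumed, made-up values, not figures from the talk):

```python
# Hypothetical node: a fixed I/O bandwidth shared by every core on the node.
node_io_bandwidth_gbps = 2.0  # assumed per-node I/O bandwidth (GB/s)

for cores_per_node in (4, 8, 16, 32):
    per_core = node_io_bandwidth_gbps / cores_per_node
    print(f"{cores_per_node:2d} cores/node -> {per_core:.3f} GB/s of I/O per core")
# The per-node bandwidth stays flat while core counts grow, so per-core I/O shrinks.
```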

Pure parallelism is not well suited for the petascale.

- Emerging problem:
  - Pure parallelism emphasizes I/O and memory.
  - And: pure parallelism is the dominant processing paradigm for production visualization software.
- Solution? ... there are "smart techniques" that de-emphasize memory and I/O:
  - Data subsetting
  - Multi-resolution
  - Out of core
  - In situ

Data subsetting eliminates pieces that don't contribute to the final picture.

[Diagram: the same parallelized visualization data flow network, but each processor reads and processes only the pieces of data that contribute to the final picture.]

Data subsetting: pros and cons

Pros:
- Less data to process (less I/O, less memory)

Cons:
- Extent of optimization is data dependent
- Only applicable to some algorithms

Multi-resolution techniques use coarse representations then refine.

[Diagram: the same parallelized visualization data flow network, but initially only a coarse subset of the pieces (e.g. P2 and P4) is read and processed.]

Multi-resolution: pros and cons

Pros:
- Avoids the I/O and memory requirements

Cons:
- Is it meaningful to process a simplified version of the data?

Out-of-core iterates pieces of data through the pipeline one at a time.

[Diagram: the same parallelized visualization data flow network; each processor streams its pieces through read, process, and render one at a time.]

Out-of-core: pros and cons

Pros:
- Lower requirement for primary memory
- Doesn't require big machines

Cons:
- Still paying large I/O costs (slow!)

In situ processing does visualization as part of the simulation.

[Diagram: the disk-based pipeline from the pure parallelism slide, shown again as the setup being replaced.]

In situ processing does visualization as part of the simulation.

[Diagram: each processor of the parallel simulation code runs its own visualization data flow network (get access to data, process, render) directly, with no intermediate disk I/O.]

In situ: pros and cons

Pros:
- No I/O!
- Lots of compute power available

Cons:
- Very memory constrained
- Many operations not possible
  - Once the simulation has advanced, you cannot go back and analyze it
- User must know what to look for a priori
  - Expensive resource to hold hostage!

Summary of Techniques and Strategies

- Pure parallelism can be used for anything, but it takes a lot of resources.
- Smart techniques can only be used situationally.
- Petascale strategy 1: stick with pure parallelism and live with high machine costs & I/O wait times.
- Other petascale strategies?
  - Assumption: we can't afford massive dedicated clusters for visualization.
  - We can fall back on the supercomputer, but only rarely.

Now we know the tools … what problem are we trying to solve?

Three primary use cases:
- Exploration (examples: scientific discovery, debugging)
- Confirmation (examples: data analysis, images / movies, comparison)
- Communication (examples: data analysis, images / movies)

Notional decision process (sketched in code below)

- Need all data at full resolution?
  - No: Multi-resolution (debugging & scientific discovery)
  - Yes: Do operations require all the data?
    - No: Data subsetting (comparison & data analysis)
    - Yes: Do you know what you want to do a priori?
      - Yes: In situ (data analysis & images / movies)
      - No: Do algorithms require all data in memory?
        - Yes: Pure parallelism (anything & esp. comparison)
        - No: Interactivity required?
          - Yes: Pure parallelism (anything & esp. comparison)
          - No: Out-of-core (data analysis & images / movies)

(The chart groups these outcomes under the three use cases from the previous slide: exploration, confirmation, and communication.)
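The flowchart reads as a short chain of questions. A minimal sketch of that logic (Python; the function name and boolean parameters are hypothetical, introduced only to mirror the chart above):

```python
def choose_processing_paradigm(need_full_resolution: bool,
                               operations_need_all_data: bool,
                               know_goal_a_priori: bool,
                               algorithms_need_all_data_in_memory: bool,
                               interactivity_required: bool) -> str:
    """Mirror of the notional decision process on the slide above."""
    if not need_full_resolution:
        return "multi-resolution"        # debugging & scientific discovery
    if not operations_need_all_data:
        return "data subsetting"         # comparison & data analysis
    if know_goal_a_priori:
        return "in situ"                 # data analysis & images / movies
    if algorithms_need_all_data_in_memory or interactivity_required:
        return "pure parallelism"        # anything, esp. comparison
    return "out-of-core"                 # data analysis & images / movies


# Example: an exploratory session that can work on a coarse preview of the data.
print(choose_processing_paradigm(False, True, False, True, True))  # multi-resolution
```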

Alternate strategy: smart techniques

[Diagram: all visualization and analysis work is divided among the smart techniques (multi-resolution, in situ, out-of-core, data subsetting), with the remaining ~5% done on the supercomputer.]

How Petascale Changes the Rules

- We can't use pure parallelism alone any more.
- We will need algorithms to work in multiple processing paradigms.
- Incredible research problem…
- … but also an incredible software engineering problem.

Data flow networks… a love story

- Work is performed by a pipeline.
- A pipeline consists of data objects and components (sources, filters, and sinks).
- Pipeline execution begins with a "pull", which starts the Update phase.
- Data flows from component to component during the Execute phase.

[Diagram: example pipeline: File Reader (Source) → Slice Filter → Contour Filter → Renderer (Sink), annotated with the Update and Execute phases.]
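A toy sketch of this pattern (Python; the class and method names are my own illustration, not the actual API of VisIt or any particular toolkit). The sink starts the Update phase by "pulling"; each component pulls from its upstream neighbor and then executes on the data flowing back down:

```python
class Component:
    """Base class for sources, filters, and sinks in a toy data flow network."""
    def __init__(self, upstream=None):
        self.upstream = upstream          # the component this one pulls from

    def update(self):
        """Update phase: pull from upstream, then execute on the result."""
        data = self.upstream.update() if self.upstream else None
        return self.execute(data)

    def execute(self, data):
        raise NotImplementedError


class FileReader(Component):              # source
    def execute(self, data):
        return list(range(10))            # stand-in for reading a mesh from disk


class SliceFilter(Component):             # filter
    def execute(self, data):
        return data[::2]                  # stand-in for slicing the mesh


class ContourFilter(Component):           # filter
    def execute(self, data):
        return [x * x for x in data]      # stand-in for contouring


class Renderer(Component):                # sink
    def execute(self, data):
        print("rendering", data)
        return data


# Wire up: File Reader -> Slice -> Contour -> Renderer; the sink starts the pull.
pipeline = Renderer(ContourFilter(SliceFilter(FileReader())))
pipeline.update()
```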

Data flow networks: strengths

- Flexible usage
  - Networks can be multi-input / multi-output
  - Interoperability of modules
- Embarrassingly parallel algorithms handled by base infrastructure
- Easy to extend
  - New derived types of filters

[Diagrams: an inheritance hierarchy (abstract filter with derived slice, contour, and other filters) and the flow of data through a source, filters A/B/C, and a sink.]

Data flow networks: weaknesses

- Execution of modules happens in stages
  - Each algorithm is executed over all of its data at one time, which is cache inefficient
- Memory footprint concerns
- Some implementations fix the data model

Data flow networks: observations

- Majority of code investment is in algorithms (derived types of filters), not in base classes (which manage data flow).
- Source code for managing flow of data is small and in one place.
- Algorithms don't care about the data processing paradigm … they only care about operating on inputs and outputs.

Example filter: contouring

[Diagram: a contour filter wraps the contour algorithm, taking a mesh as input and producing surface/line output; the filter sits in a pipeline of Data Reader, Contour Filter, Rendering.]

Example filter: contouring with data subsetting

[Diagram: the same pipeline; the contour filter additionally communicates with the executive to discard pieces that don't contribute.]

Example filter: contouring with out-of-core

[Diagram: the same pipeline; the contour algorithm is called 12 times, once per piece streamed through the pipeline.]

Example filter: contouring with multi-resolution techniques

[Diagram: the same pipeline operating on a coarse-resolution version of the mesh.]

Example filter: contouring with in situ

[Diagram: the same pipeline, but the Data Reader is crossed out and the simulation code hands its mesh directly to the contour filter.]

For each example, the contour algorithm didn't change, just its context.
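To make that concrete, here is a minimal sketch (Python; every name is hypothetical and not taken from VisIt) in which the same contour routine is reused unchanged and only the surrounding driver, i.e. the processing paradigm, changes:

```python
def contour(mesh_piece, isovalue):
    """Stand-in for the contour algorithm; it only sees an input piece and returns output."""
    return [cell for cell in mesh_piece if cell >= isovalue]


# Pure parallelism driver: every piece assigned to this process is read into memory, then contoured.
def run_pure_parallel(read_pieces, isovalue):
    pieces = list(read_pieces())                 # all pieces resident in memory
    return [contour(p, isovalue) for p in pieces]


# Out-of-core driver: pieces stream through one at a time; the algorithm is simply called once per piece.
def run_out_of_core(read_pieces, isovalue):
    for piece in read_pieces():                  # one piece in memory at a time
        yield contour(piece, isovalue)


# In situ driver: no reader at all; the simulation hands its current mesh to the same algorithm.
def run_in_situ(simulation_mesh, isovalue):
    return contour(simulation_mesh, isovalue)


# Tiny usage example with fake data standing in for mesh pieces.
fake_pieces = lambda: ([1, 5, 9], [2, 6, 10])
print(run_pure_parallel(fake_pieces, 5))
print(list(run_out_of_core(fake_pieces, 5)))
print(run_in_situ([3, 7, 11], 5))
```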

How big is this job?

- Many algorithms are basically "processing paradigm" indifferent.
- What percentage of a vis code is algorithms?
- What percentage is devoted to the "processing paradigm"?
- Other?
- We can gain insight by looking at the breakdown in a real-world example (VisIt).

VisIt is a richly featured, turnkey application for large data.

- The tool has two focal points: big data & providing a product for end users.
- VisIt is an open source, end user visualization and analysis tool for simulated and experimental data.
  - >100K downloads on the web
  - R&D 100 award in 2005
  - Used "heavily to exclusively" on 8 of the world's top 12 supercomputers
- Pure parallelism + out-of-core + data subsetting + in situ

[Image: 27B element Rayleigh-Taylor Instability (MIRANDA, BG/L)]

VisIt architecture & lines of code

[Diagram: breakdown of VisIt's source code across client-side components (viewer, gui, cli), server-side components (mdserver, serial and parallel engine, libsim), plus plots, operators, databases, and support libraries & tools, each annotated with lines-of-code counts; roughly 32K of 559K lines are devoted to handling large data and parallel algorithms. Also counted: custom interfaces, documentation, regression testing, user knowledge, the Wiki, and mailing list archives.]

- Pure parallelism is the simplest paradigm. "Replacement" code may be significantly larger.

Summary  Petascale machines are not well suited for pure parallelism, because of its high I/O and memory costs.  This will force production visualization software to utilize more processing paradigms.  The majority of existing investments can be preserved.  This is thanks in large part to the elegant design of data flow networks.  Hank Childs,  … and questions???