Efficient Visualization and Analysis of Very Large Climate Data
Hank Childs, Lawrence Berkeley National Laboratory
December 8, 2011, Lawrence Livermore National Laboratory



Problem Statement
Climate data is getting increasingly large, both spatially and temporally. Climate scientists want to process this data in many ways:
–Analyze a simulation recently run on their supercomputer
–Download remote data to a nearby supercomputer for analysis
–Download remote data to a desktop computer for analysis
–Send analysis routines to the remote machine; results are returned

VisIt is an open source, richly featured, turnkey application for large data.
Used by:
–Visualization experts
–Simulation code developers
–Simulation code consumers
Popular:
–R&D 100 award in 2005
–Used on many of the Top500
–>100K downloads
Developed by:
–NNSA, SciDAC, NEAMS, NSF, and more
[Figure: 217-pin reactor cooling simulation, run on ¼ of Argonne BG/P, 1 billion grid points per time slice. Image credit: Paul Fischer, ANL]

VisIt employs a parallelized client-server architecture.
Client-server observations:
–Good for remote visualization
–Leverages available resources
–Scales well
–No need to move data off the remote machine
[Diagram: user on localhost (Linux, Windows, or Mac, with graphics hardware) connects to parallel visualization resources located with the data]
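The client-server split can be illustrated with a minimal socket sketch (illustrative only; VisIt's actual components and protocol differ): a compute server stays near the data and ships back a small result rather than the data itself.

```python
import socket
import threading

def serve(sock):
    """Toy 'compute server': evaluate a request near the data, return a small result."""
    conn, _ = sock.accept()
    with conn:
        request = conn.recv(1024).decode()        # e.g. "sum 1 2 3"
        numbers = [float(x) for x in request.split()[1:]]
        conn.sendall(str(sum(numbers)).encode())  # ship the result, not the data

# Server side (would run on the remote parallel machine).
server = socket.socket()
server.bind(("127.0.0.1", 0))                     # ephemeral local port for the demo
server.listen(1)
threading.Thread(target=serve, args=(server,), daemon=True).start()

# Client side (lightweight, would run on the user's desktop).
client = socket.socket()
client.connect(("127.0.0.1", server.getsockname()[1]))
client.sendall(b"sum 1 2 3")
result = client.recv(1024).decode()
print(result)  # 6.0
client.close()
```

Only the few bytes of the answer cross the wire, which is the point of keeping the server next to the data.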

VisIt recently demonstrated good performance at unprecedented scale.
Weak scaling study: ~62.5M cells/core

  #cores     Problem size   Model      Machine
  8K         0.5T           IBM P5     Purple
  16K        1T             Sun        Ranger
  16K        1T             X86_64     Juno
  32K        2T             Cray XT5   JaguarPF
  64K        4T             BG/P       Dawn
  16K, 32K   1T, 2T         Cray XT4   Franklin

[Figure: two trillion cell data set, rendered in VisIt by David Pugmire on the ORNL Jaguar machine]
VisIt's data processing techniques involve more than scalability at massive concurrency; we are leveraging a suite of techniques developed over the last decade by VACET, the NNSA, and others.
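The ~62.5M cells/core figure is simply the ratio the weak scaling study holds fixed. A quick check (assuming "K" means 1,000 cores and "T" means 10^12 cells, which is how the slide's figure works out) reproduces it for the single-configuration rows of the table:

```python
# Sanity check of the weak-scaling ratio: each configuration scales the
# problem size with the core count so cells-per-core stays constant.
# Assumes K = 1,000 cores and T = 1e12 cells (the Franklin row, which
# lists two configurations, is omitted for simplicity).
configs = [
    (8_000, 0.5e12),   # Purple
    (16_000, 1.0e12),  # Ranger / Juno
    (32_000, 2.0e12),  # JaguarPF
    (64_000, 4.0e12),  # Dawn
]
for cores, cells in configs:
    per_core = cells / cores
    print(f"{cores:>6} cores, {cells:.1e} cells -> {per_core / 1e6:.1f}M cells/core")
# Every configuration works out to 62.5M cells/core.
```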

VisIt is used to look at simulated and experimental data from many areas.
[Images: fusion (Sanderson, U. Utah); particle accelerators (Rübel, LBNL); astrophysics (Childs); nuclear reactors (Childs)]

Problem Statement
Climate data is getting increasingly massive, both spatially and temporally. Climate scientists want to process this data in many ways:
–Analyze a simulation recently run on their supercomputer
–Download remote data to a nearby supercomputer for analysis
–Download remote data to a desktop computer for analysis
–Send analysis routines to the remote machine; results are returned
Mission accomplished?

General-purpose tools vs. application-specific tools
Application-specific tools are made specifically to solve your problem:
–Streamlined user interface
–Application-specific analysis
But they have smaller user and developer communities, so they offer:
–Less robustness
–Less efficient algorithms
–A smaller set of features
General-purpose tools are developed by large teams, leading to:
–Robustness
–Efficient algorithms
–A rich set of features
But:
–They aren't streamlined for a given application area (lots of button clicks)
–They don't have application-specific methods
So which do users want?

Amazing developments over the last decade…
Very useful packages are now available that make quick tool development possible:
–Python, R, VTK, Qt
But this doesn't solve the large data issue…
–…though tools are now available that do that as well: VisIt & ParaView
Great idea: put all these products together into one tool (via VisTrails). Users get:
–The robustness, richness, and efficiency of large development efforts
–A streamlined user interface & climate-specific analysis (via CDAT)
This tool is UV-CDAT.

UV-CDAT
UV-CDAT = Ultra-scale Visualization Climate Data Analysis Tools
Goal: a robust tool, capable of doing powerful analysis on large climate data sets
A collaboration between two DOE BER teams
One tool that incorporates many packages…
–…including VisIt
UV-CDAT developers: Dean Williams (PI), Jim Ahrens, Andy Bauer, Dave Bader, Berk Geveci, Timo Bremer, Claudio Silva, David Partyka, Charles Doutriaux, Robert Drach, Emanuele Marques, Feiyi Wang, Jerry Potter, Galen Shipman, John Patchett, Phil Jones, Thomas Maxwell, and many more

Integrated UV-CDAT GUI: Project, Plot, and Variables View

Big data visualization tools use data parallelism to process data.
[Diagram: a parallel simulation code writes pieces of data (P0–P9) to disk; each of Processors 0–2 runs its own read → process → render pipeline over its assigned pieces]
This is a good approach for high resolution meshes… but climate data is often different.
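The read → process → render decomposition can be sketched with Python's standard multiprocessing module. The functions below are illustrative stand-ins, not VisIt's actual pipeline API:

```python
from multiprocessing import Pool

def read_piece(piece_id):
    # Stand-in for reading one piece (domain) from disk: a small list of cell values.
    return [piece_id * 10 + i for i in range(10)]

def process(cells):
    # Stand-in for a filter stage, e.g. thresholding to keep only even values.
    return [c for c in cells if c % 2 == 0]

def render(cells):
    # Stand-in for rendering: reduce the piece to a partial result.
    return sum(cells)

def pipeline(piece_id):
    # Each processor runs the full read -> process -> render chain on its pieces.
    return render(process(read_piece(piece_id)))

if __name__ == "__main__":
    with Pool(3) as pool:                         # "Processor 0..2" in the diagram
        partials = pool.map(pipeline, range(10))  # pieces P0..P9, distributed
    print(sum(partials))                          # composite the partial results
```

Each worker touches only its own pieces; a final composite step combines the partial results, which is the essence of data parallelism for high resolution meshes.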

Parallelizing Processing Over Time Slices: Improving Performance for Climate Data
Objective: VisIt's parallel processing techniques were designed for single time slices of very high resolution meshes. We must adapt this approach for the lower spatial resolution and high temporal frequency characteristic of climate data.
Progress & Results: We have modified VisIt's underlying infrastructure to add a "parallelize over time" processing mode, and we implemented a simple algorithm ("maximum value over time") as a proof of concept.
Impact: Climate scientists with access to parallel resources for their data will be able to process it significantly faster through parallelization. This software investment will enable other algorithms developed by the project to be accelerated through parallelization as well.
[Diagrams: VisIt parallelizing over a high resolution spatial mesh (P0–P2 each own a spatial piece) vs. VisIt parallelizing temporally over a low resolution mesh (P0–P2 each own a range of the time slices T=0 through T=8)]
[Table: scaling on 2130 time slices of NetCDF climate data (source: Wehner); speedups grow from 2.1X to 114X as concurrency increases]
This is just one aspect of the uniqueness of climate data… there are many more.
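A minimal sketch of the "maximum value over time" proof of concept, under the assumed semantics that each processor owns a contiguous range of time slices of the same low resolution mesh, and partial results are combined with an elementwise max (illustrative data and names, not VisIt's implementation):

```python
from functools import reduce

def max_over_time(slices):
    """Per-cell maximum over a list of time slices (each a list of cell values)."""
    return [max(cells) for cells in zip(*slices)]

# Nine time slices (T=0..8) of a 4-cell mesh, assigned in thirds to P0..P2,
# mirroring the temporal decomposition in the slide's diagram.
time_slices = [[t, 2 * t, -t, t % 3] for t in range(9)]
chunks = [time_slices[0:3], time_slices[3:6], time_slices[6:9]]

partials = [max_over_time(chunk) for chunk in chunks]  # one pass per "processor"
result = reduce(lambda a, b: [max(x, y) for x, y in zip(a, b)], partials)
print(result)  # [8, 16, 0, 2]
```

Because the reduction is a small elementwise max over per-worker partials, communication stays tiny even as the number of time slices grows, which is why temporal parallelization pays off for long, low resolution climate series.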

Success story: spatial extreme value analysis
We were able to parallelize the spatial extreme value analysis:
–Parallelization was both spatial and temporal
–Parallelization was done with VisIt infrastructure, but the analysis was done with R (via the VisIt-R linkage)
Ran on CCSM3.0 100-year daily precipitation data
Near-perfect scaling on 36,500 time slices
Credit: Dave Pugmire (ORNL), Chris Paciorek (UC Berkeley)
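The block-maxima preprocessing that typically feeds such an extreme value analysis can be sketched in plain Python; this is an illustrative stand-in with synthetic data, since in the actual project the statistical fit itself was done in R through the VisIt-R linkage:

```python
# Reduce a daily precipitation series at each grid cell to annual (block)
# maxima, the standard input for an extreme value (e.g. GEV) fit.
# Cell names and data are synthetic, for illustration only.

DAYS_PER_YEAR = 365

def annual_maxima(daily, days_per_year=DAYS_PER_YEAR):
    """Block maxima: split a daily series into years and take each year's max."""
    n_years = len(daily) // days_per_year
    return [max(daily[y * days_per_year:(y + 1) * days_per_year])
            for y in range(n_years)]

# Two synthetic grid cells with 3 years of daily "precipitation" values.
cell_a = [(d * 7) % 50 for d in range(3 * DAYS_PER_YEAR)]
cell_b = [(d * 3) % 30 for d in range(3 * DAYS_PER_YEAR)]

maxima = {cell: annual_maxima(series)
          for cell, series in [("a", cell_a), ("b", cell_b)]}
print(maxima)  # per-cell lists of 3 annual maxima, ready to hand to an R fit
```

Each cell's reduction is independent, which is what makes the spatial decomposition trivial to parallelize; the temporal decomposition splits the year blocks the same way the "parallelize over time" mode splits time slices.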

Summary
Visualization and analysis of massive data is possible:
–Other communities have built tools for this purpose
–These tools are effective for:
  Local or remote data
  Modest to massive parallelization (& serial too!)
  Interactive or batch processing
Our effort focuses on deploying VisIt and R as part of UV-CDAT:
–VisIt is excellent with large data and has been adapted to work with climate data
–We are also collaborating with climate scientists on cutting-edge analysis techniques

Example climate visualizations with VisIt