The Scalable Data Management, Analysis, and Visualization Institute VTK-m: Accelerating the Visualization Toolkit for Multi-core.

Presentation transcript:

The Scalable Data Management, Analysis, and Visualization Institute

VTK-m: Accelerating the Visualization Toolkit for Multi-core and Many-core Architectures

Ken Moreland, Sandia National Laboratories
Robert Maynard and Berk Geveci, Kitware
Jeremy Meredith and Dave Pugmire, Oak Ridge National Laboratory
Christopher Sewell, Li-ta Lo, and James Ahrens, Los Alamos National Laboratory
Hank Childs and Matt Larsen, University of Oregon
Kwan-Liu Ma and Hendrik Schroots, University of California at Davis

VTK-m Goals
  A single place for the visualization community to collaborate, contribute, and leverage massively threaded algorithms
  Reduce the challenges of writing highly concurrent algorithms by using data parallel algorithms
  Make it easier for simulation codes to take advantage of these parallel visualization and analysis tasks on a wide range of current and next-generation hardware
  Unify efforts in this area from Sandia (Dax), Oak Ridge (EAVL), and Los Alamos (PISTON)

VTK-m Status
  Project infrastructure
    Code repository:
    Project webpage:
  Features
    Core Types
    Statically Typed Arrays
    Dynamically Typed Arrays
    Device Interface (Serial, CUDA, TBB; OpenMP in progress)
    Field and Topology
    Worklet and Dispatcher
  Data Model
    Allows clients to construct data sets from cell and point arrangements that exactly match their original data
    In effect, this allows for hybrid and novel mesh types
  Filters
    Isosurface for structured grids
    Statistical filters (histograms, moments, etc.)
    In development: streamlines, stream surfaces, tetrahedralization

Cosmology Applications
  Halo finding and halo center finding algorithms were written using PISTON, one of VTK-m’s constituent projects.
  On Titan, this enabled centers to be found on the GPU ~50x faster than with the pre-existing algorithms on the CPU (with one rank per node).
  This work allowed halo analysis to be completed on all time steps of a very large particle data set across 16,384 nodes on Titan, for which analysis using the existing CPU algorithms was not feasible.
  The portability of VTK-m allowed us to run the same code on an Intel Xeon Phi.
  This is the first time that the c-M relation has been measured from a single simulation volume over such an extended mass range.
  To appear in the Astrophysical Journal: “The Q Continuum Simulation: Harnessing the Power of GPU Accelerated Supercomputers”.

Figure: Concentration-mass relation over the full mass range covered by the Q Continuum simulation at redshift z = 0 (points with error bars) and the predictions from various groups. The yellow shaded region shows the intrinsic scatter. All predictions and the simulation results are well within that scatter.
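The poster does not say how a halo center is defined or how PISTON computes it. As a minimal sketch only, assuming the center is taken to be the particle with the lowest gravitational potential (one common definition), the search can be written as a data-parallel map followed by a reduction. The function names, the softening length, and the toy particle set below are all hypothetical and are not part of PISTON or VTK-m.

```python
import math
from functools import reduce

def potential(i, positions, masses, softening=1e-3):
    """Map step: potential at particle i from all other particles (units with G = 1)."""
    xi, yi, zi = positions[i]
    total = 0.0
    for j, (xj, yj, zj) in enumerate(positions):
        if j == i:
            continue
        r = math.sqrt((xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2)
        total -= masses[j] / (r + softening)  # softening avoids division by zero
    return total

def find_center(positions, masses):
    """Return (index of the lowest-potential particle, list of all potentials)."""
    n = len(positions)
    # Map: each particle's potential is independent of the others' results,
    # so this step could run one thread per particle on a GPU.
    potentials = list(map(lambda i: potential(i, positions, masses), range(n)))
    # Reduce: an arg-min over the potentials, written as a pairwise reduction.
    best = reduce(lambda a, b: a if potentials[a] <= potentials[b] else b, range(n))
    return best, potentials

# Hypothetical toy halo: five equal-mass particles on a line.
positions = [(-2.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
             (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
masses = [1.0] * len(positions)

center, potentials = find_center(positions, masses)
print(center)  # 2, the middle particle
```

The map step does all-pairs work, which is why moving it to many threads pays off; the reduction is cheap by comparison.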
Figures: VTK-m Data Model; Functional programming paradigm; VTK-m Isosurface Performance (preliminary results)

In-situ Applications
  Tightly coupled in-situ with EAVL, one of VTK-m’s constituent projects
    Efficient in-situ visualization and analysis
    Lightweight, zero-dependency library
    Zero-copy references to host simulation
    Heterogeneous memory support for accelerators
    Flexible data model supports non-physical data types
    Example: scientific and performance visualization, tightly coupled EAVL with SciDAC Xolotl plasma surface simulation
  Loosely coupled in-situ with EAVL
    Application decoupled from visualization using ADIOS and DataSpaces
    EAVL plug-in reads data from staging nodes
    System nodes running EAVL perform visualization operations and rendering
    Example: field and particle data, EAVL in-situ with XGC SciDAC simulation via ADIOS and DataSpaces

Figures: EAVL in-situ with Xolotl; EAVL in-situ with XGC

Hardware-Agnostic Ray Tracing
  VTK-m's hardware-agnostic approach gives comparable performance to hardware-specific approaches.
  Since VTK-m is implemented in a hardware-agnostic way, we wanted to understand the corresponding sacrifice in performance.
  We implemented a ray-traced renderer, which is computationally intensive and uses many unstructured memory accesses.
  We then compared VTK-m's performance to NVIDIA's OptiX and Intel's Embree, two "guaranteed not to exceed" ray-tracing standards that are developed by teams of professionals.
  Our study found that VTK-m performance was always within a factor of two of industry standards, and even outperformed them in some cases.
  We concluded that VTK-m's hardware-agnostic approach is viable: our single implementation performed comparably to multiple hardware-specific implementations.

Advanced Visualization Usability Study
  Implementation of both ray-casting and cell projection volume rendering algorithms using Dax, one of VTK-m’s constituent projects
  Compiled for CUDA, OpenMP, and Intel’s Threading Building Blocks
  Comparative performance study on NVIDIA Titan X GPU, Intel Xeon, and Intel Xeon Phi
  VTK-m implementation in progress

Figure: Ray-traced rendering of 6.2M triangles generated from SPECFEM3D. The data represents wave speed perturbations measured by seismograms and was provided by Oak Ridge National Laboratory.

Acknowledgement
This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Scientific Discovery through Advanced Computing (SciDAC) program under the Institute of Scalable Data Management, Analysis and Visualization (SDAV).

Figure: Volume rendering of type Ia supernova simulation data set using ray-casting. Cell projection implementation using data parallel primitives renders a comparable image in near sub-second times.

LA-UR

Figure: VTK-m infrastructure and use cases, with contributions from Dax, EAVL, and PISTON predecessor projects
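The Worklet and Dispatcher feature and the device interface listed under VTK-m Status suggest a split between a per-element function and the machinery that runs it on a chosen backend. The following is a rough Python sketch of that idea only; it does not use or reproduce the VTK-m C++ API, and every name in it (`magnitude_worklet`, `SerialDevice`, `ThreadedDevice`, `dispatch`) is hypothetical. The thread pool stands in for a parallel device to show the interface; in CPython it will not speed up CPU-bound work.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def magnitude_worklet(vector):
    """Worklet: a side-effect-free function applied to one element of a field."""
    x, y, z = vector
    return math.sqrt(x * x + y * y + z * z)

class SerialDevice:
    """Hypothetical backend that runs the worklet one element at a time."""
    def map(self, fn, values):
        return [fn(v) for v in values]

class ThreadedDevice:
    """Hypothetical backend that runs the worklet on a pool of threads."""
    def __init__(self, workers=4):
        self.workers = workers

    def map(self, fn, values):
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            # Executor.map returns results in input order.
            return list(pool.map(fn, values))

def dispatch(worklet, field, device):
    """Dispatcher: hands the worklet and the input field to the chosen device."""
    return device.map(worklet, field)

# Hypothetical vector field with three samples.
field = [(3.0, 4.0, 0.0), (0.0, 0.0, 2.0), (1.0, 2.0, 2.0)]

serial_result = dispatch(magnitude_worklet, field, SerialDevice())
threaded_result = dispatch(magnitude_worklet, field, ThreadedDevice())
print(serial_result)                     # [5.0, 2.0, 3.0]
print(serial_result == threaded_result)  # True
```

Because the worklet has no shared state, the same function body gives the same result on either backend, which is the property a single-source, hardware-agnostic implementation relies on.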