VTK-m Project Goals A single place for the visualization community to collaborate, contribute, and leverage massively threaded algorithms. Reduce the challenges.

Slides:

Advertisements

Similar presentations

GPU programming: CUDA Acknowledgement: the lecture materials are based on the materials in NVIDIA teaching center CUDA course materials, including materials.

Advertisements

GPU Virtualization Support in Cloud System Ching-Chi Lin Institute of Information Science, Academia Sinica Department of Computer Science and Information.

ECE 562 Computer Architecture and Design Project: Improving Feature Extraction Using SIFT on GPU Rodrigo Savage, Wo-Tak Wu.

Multi Agent Simulation and its optimization over parallel architecture using CUDA™ Abdur Rahman and Bilal Khan NEDUET(Department Of Computer and Information.

DCABES 2009 China University Of Geosciences 1 The Parallel Models of Coronal Polarization Brightness Calculation Jiang Wenqian.

Encapsulation by Subprograms and Type Definitions

CUDA Programming Lei Zhou, Yafeng Yin, Yanzhi Ren, Hong Man, Yingying Chen.

Synergistic Execution of Stream Programs on Multicores with Accelerators Abhishek Udupa et. al. Indian Institute of Science.

Introduction What is GPU? It is a processor optimized for 2D/3D graphics, video, visual computing, and display. It is highly parallel, highly multithreaded.

Debunking the 100X GPU vs. CPU Myth: An Evaluation of Throughput Computing on CPU and GPU Presented by: Ahmad Lashgar ECE Department, University of Tehran.

Accelerating SQL Database Operations on a GPU with CUDA Peter Bakkum & Kevin Skadron The University of Virginia GPGPU-3 Presentation March 14, 2010.

Roadmap for Many-core Visualization Software in DOE Jeremy Meredith Oak Ridge National Laboratory.

Training Program on GPU Programming with CUDA 31 st July, 7 th Aug, 14 th Aug 2011 CUDA Teaching UoM.

Dax: Rethinking Visualization Frameworks for Extreme-Scale Computing DOECGF 2011 April 28, 2011 Kenneth Moreland Sandia National Laboratories SAND P.

9/13/20151 Threads ICS 240: Operating Systems –William Albritton Information and Computer Sciences Department at Leeward Community College –Original slides.

CuMAPz: A Tool to Analyze Memory Access Patterns in CUDA

Challenges Bit-vector approach Conclusion & Future Work A subsequence of a string of symbols is derived from the original string by deleting some elements.

1 Hardware Support for Collective Memory Transfers in Stencil Computations George Michelogiannakis, John Shalf Computer Architecture Laboratory Lawrence.

Add Cool Visualizations Here Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary.

Extracted directly from:

COMPUTER SCIENCE &ENGINEERING Compiled code acceleration on FPGAs W. Najjar, B.Buyukkurt, Z.Guo, J. Villareal, J. Cortes, A. Mitra Computer Science & Engineering.

Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation,

Nov. 14, 2012 Hank Childs, Lawrence Berkeley Jeremy Meredith, Oak Ridge Pat McCormick, Los Alamos Chris Sewell, Los Alamos Ken Moreland, Sandia Panel at.

1 © 2012 The MathWorks, Inc. Parallel computing with MATLAB.

Applying GPU and POSIX Thread Technologies in Massive Remote Sensing Image Data Processing By: Group 17 King Mongkut's Institute of Technology Ladkrabang.

The Scalable Data Management, Analysis, and Visualization Institute VTK-m: Accelerating the Visualization Toolkit for Multi-core.

Automatic translation from CUDA to C++ Luca Atzori, Vincenzo Innocente, Felice Pantaleo, Danilo Piparo 31 August, 2015.

GPU Architecture and Programming

Introducing collaboration members – Korea University (KU) ALICE TPC online tracking algorithm on a GPU Computing Platforms – GPU Computing Platforms Joohyung.

Parallelization and Characterization of Pattern Matching using GPUs Author: Giorgos Vasiliadis 、 Michalis Polychronakis 、 Sotiris Ioannidis Publisher:

NIH Resource for Biomolecular Modeling and Bioinformatics Beckman Institute, UIUC NAMD Development Goals L.V. (Sanjay) Kale Professor.

Accelerating Error Correction in High-Throughput Short-Read DNA Sequencing Data with CUDA Haixiang Shi Bertil Schmidt Weiguo Liu Wolfgang Müller-Wittig.

Pipes & Filters Architecture Pattern Source: Pattern-Oriented Software Architecture, Vol. 1, Buschmann, et al.

OPERATING SYSTEM SUPPORT DISTRIBUTED SYSTEMS CHAPTER 6 Lawrence Heyman July 8, 2002.

Add Cool Visualizations Here Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary.

Introduction What is GPU? It is a processor optimized for 2D/3D graphics, video, visual computing, and display. It is highly parallel, highly multithreaded.

Jie Chen. 30 Multi-Processors each contains 8 cores at 1.4 GHz 4GB GDDR3 memory offers ~100GB/s memory bandwidth.

Automating and Optimizing Data Transfers for Many-core Coprocessors Student: Bin Ren, Advisor: Gagan Agrawal, NEC Intern Mentor: Nishkam Ravi, Yi Yang.

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | SCHOOL OF COMPUTER SCIENCE | GEORGIA INSTITUTE OF TECHNOLOGY MANIFOLD Manifold Execution Model and System.

1)Leverage raw computational power of GPU  Magnitude performance gains possible.

 Genetic Algorithms  A class of evolutionary algorithms  Efficiently solves optimization tasks  Potential Applications in many fields  Challenges.

Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation,

Introduction to CUDA CAP 4730 Spring 2012 Tushar Athawale.

Shangkar Mayanglambam, Allen D. Malony, Matthew J. Sottile Computer and Information Science Department Performance.

Geant4 on GPU prototype Nicholas Henderson (Stanford Univ. / ICME)

Heterogeneous Computing With GPGPUs Matthew Piehl Overview Introduction to CUDA Project Overview Issues faced nvcc Implementation Performance Metrics Conclusions.

Add Cool Visualizations Here Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary.

Some GPU activities at the CMS experiment Felice Pantaleo EP-CMG-CO EP-CMG-CO 1.

My Coordinates Office EM G.27 contact time:

Adapting the Visualization Toolkit for Many-Core Processors with the VTK-m Library Christopher Sewell (LANL) and Robert Maynard (Kitware) VTK-m Team: LANL:

S. Pardi Frascati, 2012 March GPGPU Evaluation – First experiences in Napoli Silvio Pardi.

Barthélémy von Haller CERN PH/AID For the ALICE Collaboration The ALICE data quality monitoring system.

GPU Acceleration of Particle-In-Cell Methods B. M. Cowan, J. R. Cary, S. W. Sides Tech-X Corporation.

GPGPU Programming with CUDA Leandro Avila - University of Northern Iowa Mentor: Dr. Paul Gray Computer Science Department University of Northern Iowa.

Accelerating OpenFOAM with GPU Daniel Jasiński (Graphics Processing Units) Atizar.

“SMT Capable CPU-GPU Systems for Big Data”

Chapter 4: Threads Modified by Dr. Neerja Mhaskar for CS 3SH3.

Computer Engg, IIT(BHU)

The Present and Future of Parallelism on GPUs

Chapter 4: Threads.

Chapter 15, Exploring the Digital Domain

Chapter 4: Threads.

Compiler Back End Panel

Compiler Back End Panel

Peng Jiang, Linchuan Chen, and Gagan Agrawal

Introduction to CUDA.

Chapter 4: Threads & Concurrency

Graphics Processing Unit

Multicore and GPU Programming

Run time performance for all benchmarked software.

Presentation transcript:

VTK-m Project Goals A single place for the visualization community to collaborate, contribute, and leverage massively threaded algorithms. Reduce the challenges of writing highly concurrent algorithms by using data parallel algorithms

VTK-m Project Goals Make it easier for simulation codes to take advantage these parallel visualization and analysis tasks on a wide range of current and next-generation hardware.

VTK-m Architecture In-Situ Execution Data Parallel Algorithms Arrays Post Processing Worklets DataModel Filters Combines strengths of multiple projects: –EAVL, Oak Ridge National Laboratory –DAX, Sandia National Laboratory –PISTON, Los Alamos National Laboratory

VTK-m Arbitrary Composition VTK-m allows clients to access different memory layouts through the Array Handle and Dynamic Array Handle. – Allows for efficient in-situ integration – Allows for reduced data transfer Control EnvironmentExecution Environment Transfer Control EnvironmentExecution Environment

VTK-m Arbitrary Composition Point Arrangement Cells CoordinatesExplicitLogicalImplicit Structured Strided Separated Unstructured Strided Separated VTK-m Data Set VTK-m allows clients to construct data sets from cell and point arrangements that exactly match their original data – In effect, this allows for hybrid and novel mesh types

functor() Functor Mapping Applied to Topologies [Baker, et al. 2010]

functor() Functor Mapping Applied to Topologies [Baker, et al. 2010]

What We Have So Far Features –Core Types –Statically Typed Arrays –Dynamically Typed Arrays –Device Interface (Serial, Cuda, TBB) –Data Model –Field and Topology Worklet and Dispatcher

What We Have So Far Compiles with –gcc (4.8+), clang, msvc (2010+), icc, and pgi User Guide work in progress Ready for larger collaboration

2 x Intel Xeon CPU E GHz + NVIDIA Tesla K40c Data: 1024^3 (floats)

2 x Intel Xeon CPU E GHz + NVIDIA Tesla K40c Data: 1024^3 (floats)

struct Sine: public vtkm::worklet::WorkletMapField { typedef void ControlSignature(FieldIn<>, FieldOut<>); typedef _2 ExecutionSignature(_1); template VTKM_EXEC_EXPORT T operator()(T x) const { return vtkm::math::Sin(x); } };

struct Sine: public vtkm::worklet::WorkletMapField { typedef void ControlSignature(FieldIn<>, FieldOut<>); typedef _2 ExecutionSignature(_1); template VTKM_EXEC_EXPORT T operator()(T x) const { return vtkm::math::Sin(x); } };

struct Sine: public vtkm::worklet::WorkletMapField { typedef void ControlSignature(FieldIn<>, FieldOut<>); typedef _2 ExecutionSignature(_1); template VTKM_EXEC_EXPORT T operator()(T x) const { return vtkm::math::Sin(x); } };

struct Sine: public vtkm::worklet::WorkletMapField { typedef void ControlSignature(FieldIn<>, FieldOut<>); typedef _2 ExecutionSignature(_1); template VTKM_EXEC_EXPORT T operator()(T x) const { return vtkm::math::Sin(x); } };

struct Sine: public vtkm::worklet::WorkletMapField { typedef void ControlSignature(FieldIn<>, FieldOut<>); typedef _2 ExecutionSignature(_1); template VTKM_EXEC_EXPORT T operator()(T x) const { return vtkm::math::Sin(x); } };

struct Sine: public vtkm::worklet::WorkletMapField { typedef void ControlSignature(FieldIn<>, FieldOut<>); typedef _2 ExecutionSignature(_1); template VTKM_EXEC_EXPORT T operator()(T x) const { return vtkm::math::Sin(x); } };

struct Sine: public vtkm::worklet::WorkletMapField { typedef void ControlSignature(FieldIn<>, FieldOut<>); typedef _2 ExecutionSignature(_1); template VTKM_EXEC_EXPORT T operator()(T x) const { return vtkm::math::Sin(x); } };

struct Sine: public vtkm::worklet::WorkletMapField { typedef void ControlSignature(FieldIn<>, FieldOut<>); typedef _2 ExecutionSignature(_1); template VTKM_EXEC_EXPORT T operator()(T x) const { return vtkm::math::Sin(x); } }; Execution Environment Control Environment vtkm::cont::ArrayHandle inputHandle = vtkm::cont::make_ArrayHandle(input); vtkm::cont::ArrayHandle sineResult; vtkm::worklet::DispatcherMapField dispatcher; dispatcher.Invoke(inputHandle, sineResult);

struct Sine: public vtkm::worklet::WorkletMapField { typedef void ControlSignature(FieldIn<>, FieldOut<>); typedef _2 ExecutionSignature(_1); template VTKM_EXEC_EXPORT T operator()(T x) const { return vtkm::math::Sin(x); } }; vtkm::cont::ArrayHandle inputHandle = vtkm::cont::make_ArrayHandle(input); vtkm::cont::ArrayHandle sineResult; vtkm::worklet::DispatcherMapField dispatcher; dispatcher.Invoke(inputHandle, sineResult); Execution Environment Control Environment

struct Sine: public vtkm::worklet::WorkletMapField { typedef void ControlSignature(FieldIn<>, FieldOut<>); typedef _2 ExecutionSignature(_1); template VTKM_EXEC_EXPORT T operator()(T x) const { return vtkm::math::Sin(x); } }; vtkm::cont::ArrayHandle inputHandle = vtkm::cont::make_ArrayHandle(input); vtkm::cont::ArrayHandle sineResult; vtkm::worklet::DispatcherMapField dispatcher; dispatcher.Invoke(inputHandle, sineResult); Execution Environment Control Environment

struct Sine: public vtkm::worklet::WorkletMapField { typedef void ControlSignature(FieldIn<>, FieldOut<>); typedef _2 ExecutionSignature(_1); template VTKM_EXEC_EXPORT T operator()(T x) const { return vtkm::math::Sin(x); } }; vtkm::cont::ArrayHandle inputHandle = vtkm::cont::make_ArrayHandle(input); vtkm::cont::ArrayHandle sineResult; vtkm::worklet::DispatcherMapField dispatcher; dispatcher.Invoke(inputHandle, sineResult); Execution Environment Control Environment

struct Sine: public vtkm::worklet::WorkletMapField { typedef void ControlSignature(FieldIn<>, FieldOut<>); typedef _2 ExecutionSignature(_1); template VTKM_EXEC_EXPORT T operator()(T x) const { return vtkm::math::Sin(x); } };

Dispatcher Invoke Operations Convert polymorphic types to static types Check types Dispatcher-specific operations –Find domain length –Build index arrays Transport data from control to execution Run worklet invoke kernel Fetch thread-specific data Invoke worklet Push thread-specific data DispatcherMapField dispatcher; dispatcher.Invoke(inputHandle, sineResult);

struct ImagToPolar: public vtkm::worklet::WorkletMapField { typedef void ControlSignature( FieldIn, FieldOut, FieldOut ); typedef void ExecutionSignature(_1, _2, _3, _4); template<typename RealType, typename ImaginaryType, typename MagnitudeType, typename PhaseType> VTKM_EXEC_EXPORT void operator()(RealType real, ImaginaryType imag, MagnitudeType &magnitude, PhaseType &phase) const { magnitude = vtkm::math::Sqrt(real*real + imag*imag); phase = vtkm::math::ATan2(imaginary, real); } };

Coprocessing/In-situ Use of VTK and VTK-m –Process data in place using VTK-m –Visualize and analyze using VTK Bringing highly parallelized visualization and analytics in science to all Create bridges between VTK and VTK-m 26