1 Titanium Review: Immersed Boundary
Biological Simulations Using the Immersed Boundary Method in Titanium
Ed Givelberg, Armando Solar-Lezama, Hormozd Gahvari, Meling Ngo, Omair Kamil and Kathy Yelick
Computer Science Division, UC Berkeley

2 Titanium Review, Sep. 9, 2004 Project Goals
Construction of large-scale computational models based on a much finer fluid grid than is currently possible
Development of distributed algorithms for immersed boundary computations on systems with hundreds of processors
Development of a general-purpose software suite to support immersed boundary computations

3 Titanium Review, Sep. 9, 2004 Achievements Since Last Review
Beating heart
Automatic tissue partitioning
2D fluid partitioning (in progress)
Second-order method (in progress)

4 Titanium Review, Sep. 9, 2004 Immersed Boundary Method
A general method for simulating systems containing elastic and possibly active tissue immersed in a viscous, incompressible fluid
Peskin and McQueen – blood flow in the heart
platelet aggregation during blood clotting (A. Fogelson)
swimming organisms (L. Fauci)
flow in collapsible tubes (M. Rozar)
valveless pumping (E. Jung)
two-dimensional cochlea model (R. Beyer)
flapping of a flexible filament in a flowing soap film (L. Zhu)

5 Titanium Review, Sep. 9, 2004 Immersed Boundary Method: Four-Stage Algorithm
Tissue activation & force calculation
Spread force
Navier-Stokes solver
Interpolate velocity
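The four stages make up one time step of the simulation loop. Below is a minimal Python sketch of that loop, not the Titanium implementation: the toy elastic force, the nearest-grid-point spreading, and the forcing-only fluid update are placeholders for the real fiber model, the 4-point delta function, and the FFT-based Navier-Stokes solver.

```python
import numpy as np

# Minimal stand-ins for the four stages (placeholders, not the real solver).
def compute_fiber_forces(X):            # stage 1: tissue activation & force calculation
    return -X                           # toy elastic force pulling points toward the origin

def spread_force(X, F, n, h):           # stage 2: spread fiber forces onto the fluid grid
    G = np.zeros((3, n, n, n))
    idx = np.clip((X / h).astype(int), 0, n - 1)
    for (i, j, k), f in zip(idx, F):    # nearest-grid-point kernel stands in for the 4-point delta
        G[:, i, j, k] += f
    return G

def solve_navier_stokes(u, G, dt):      # stage 3: fluid solve (stub: apply the forcing only)
    return u + dt * G

def interpolate_velocity(u, X, h):      # stage 4: interpolate grid velocity back to the fibers
    n = u.shape[1]
    idx = np.clip((X / h).astype(int), 0, n - 1)
    return np.array([u[:, i, j, k] for i, j, k in idx])

def ib_time_step(X, u, dt, h):
    F = compute_fiber_forces(X)                   # 1. fiber forces
    G = spread_force(X, F, u.shape[1], h)         # 2. spread to the fluid grid
    u = solve_navier_stokes(u, G, dt)             # 3. advance the fluid
    X = X + dt * interpolate_velocity(u, X, h)    # 4. move the tissue with the local fluid velocity
    return X, u

# Toy usage: an 8^3 grid and 5 fiber points.
X, u = np.random.rand(5, 3), np.zeros((3, 8, 8, 8))
X, u = ib_time_step(X, u, dt=1e-3, h=1.0 / 8)
```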

6 Titanium Review, Sep. 9, 2004 Heart Simulation
Developed by Peskin and McQueen at NYU
On a Cray C90: 1 heartbeat in 100 CPU hours on a 128 x 128 x 128 point fluid mesh
Heart represented by muscle fibers
Applications: understanding structural abnormalities, evaluating artificial heart valves

7 Titanium Review, Sep. 9, 2004 Heart Model Parameters
Simulation time:
Time step: 24 microseconds, progressively refined down to 6 microseconds
Approximately 100,000 time steps per experiment
Total simulated time: 0.9 seconds
Memory: < 1 gigabyte
Disk space for output files: gigabytes
Fluid grid: 128 x 128 x 128 points
Fluid grid mesh width: 1.3 millimeters
Immersed material: 4,100 fibers, ~650,000 points total
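As a rough consistency check on the memory figure, here is a back-of-the-envelope estimate; the number of double-precision fluid fields is an assumption, not a figure from the slides.

```python
# Back-of-the-envelope check of the "< 1 gigabyte" figure; the field count is assumed.
grid_points = 128 ** 3                        # 2,097,152 fluid grid points
fluid_fields = 3 + 3 + 1                      # velocity, force and pressure components (assumed)
fluid_bytes = grid_points * fluid_fields * 8  # double precision
fiber_bytes = 650_000 * (3 + 3) * 8           # positions and forces for ~650,000 fiber points
print((fluid_bytes + fiber_bytes) / 2**30)    # ~0.14 GiB, comfortably under 1 GB
```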

8 Titanium Review, Sep. 9, 2004 Beating Heart

9 Titanium Review, Sep. 9, 2004 Performance Challenges
Fiber partition and distribution: communication reduction vs. load balance
Fluid partition: want to use more processors; take advantage of shared memory

10 Titanium Review, Sep. 9, 2004 Fiber Distribution
A difficult problem: need a clever way to split fibers
What's good for load distribution may be bad for communication
Fibers move around the fluid
Expected motility of a tissue point depends on the application

11 Titanium Review, Sep. 9, 2004 Clever Fiber Splitting
Need a clever way to break fibers: force communication can kill performance
Take advantage of the linearity of the spread operation
All fiber points on the same processor: no communication problems
One point on a different processor: force must be communicated across processors
Duplicate the point across processors: no force to communicate, and linearity keeps the copies synchronized
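The linearity argument can be checked directly: if a duplicated point's force is computed in two pieces on two processors and each piece is spread separately, the summed grid forces equal spreading the total force once. A small numpy illustration, using a simple 1D hat kernel as a stand-in for the 4-point delta function:

```python
import numpy as np

def spread(x, f, n):
    """Spread a scalar force f from position x (in grid units) onto a 1D grid
    of n points with a linear hat kernel (stand-in for the IB delta function)."""
    g = np.zeros(n)
    i = int(np.floor(x))
    w = x - i
    g[i] += (1 - w) * f
    g[i + 1] += w * f
    return g

x, n = 10.3, 32
f_a, f_b = 2.0, -0.5                             # partial forces computed on two processors
split = spread(x, f_a, n) + spread(x, f_b, n)    # each copy spreads only its local contribution
whole = spread(x, f_a + f_b, n)                  # versus spreading the total force once
assert np.allclose(split, whole)                 # identical: no force communication is needed
```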

12 Titanium Review, Sep. 9, 2004 Formalizing the partitioning problem
Key principles:
Breaking fibers has a cost
We want to maximize spread-box overlap; this minimizes communication
For two fiber points (x_k1, y_k1, z_k1) and (x_k2, y_k2, z_k2), the amount of overlap between their spread boxes is
overlap = (4 - |x_k1 - x_k2|) x (4 - |y_k1 - y_k2|) x (4 - |z_k1 - z_k2|)
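A direct translation of the overlap formula into a helper function (an illustrative sketch; the clamp to zero for points more than four mesh widths apart is an added assumption, since the slide gives the formula only for overlapping boxes):

```python
def spread_box_overlap(p1, p2, width=4):
    """Overlap (in grid cells) between the spreading boxes of two fiber points,
    given as integer grid coordinates; 0 if the boxes do not overlap (assumption)."""
    overlap = 1
    for a, b in zip(p1, p2):
        d = width - abs(a - b)
        if d <= 0:
            return 0
        overlap *= d
    return overlap

# Two points one mesh width apart in x share a 3 x 4 x 4 block of grid cells.
print(spread_box_overlap((10, 10, 10), (11, 10, 10)))  # 48
```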

13 Titanium Review, Sep. 9, 2004 Formalizing the partitioning problem
We can encode this information in a graph
The cost of an edge is cost(v1, v2) = Overlap(v1, v2) + Penalty(v1, v2)
Penalty(v1, v2) is zero if v1 and v2 are not on the same fiber
Penalty(v1, v2) is a parameter that can be tuned based on how much the fibers are expected to move
The graph is too big, so it must be pruned
The graph can be partitioned with METIS
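One way the pieces might fit together (an illustrative sketch, not the Titanium code): compute edge costs from overlap and penalty, then hand the weighted graph to METIS, for example by writing it in the standard METIS .graph format and running gpmetis. The sketch reuses the spread_box_overlap helper from above; the penalty value and the pruning rule (only point pairs whose boxes overlap or that are fiber neighbors become edges) are assumptions.

```python
def edge_cost(p1, p2, same_fiber, penalty=100):
    """cost(v1, v2) = Overlap(v1, v2) + Penalty(v1, v2); `penalty` is the tunable
    parameter reflecting how much fibers are expected to move (value assumed)."""
    return spread_box_overlap(p1, p2) + (penalty if same_fiber else 0)

def write_metis_graph(num_vertices, edges, costs, path="fibers.graph"):
    """Write a weighted graph in the METIS .graph format (fmt flag 001 = edge
    weights); partition it afterwards with e.g. `gpmetis fibers.graph 64`.
    METIS requires edge weights to be positive integers."""
    adj = [[] for _ in range(num_vertices)]
    for (u, v), w in zip(edges, costs):
        adj[u].append((v, w))
        adj[v].append((u, w))
    with open(path, "w") as f:
        f.write(f"{num_vertices} {len(edges)} 001\n")
        for nbrs in adj:                      # one line per vertex, 1-based neighbor ids
            f.write(" ".join(f"{v + 1} {w}" for v, w in nbrs) + "\n")
```

Under that pruning assumption, point pairs with zero cost simply never become edges, which keeps the graph small enough to partition.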

14 Titanium Review, Sep. 9, 2004 Results of Fiber Splitting

15 Titanium Review, Sep. 9, 2004 Results of Fiber Splitting

16 Titanium Review, Sep. 9, 2004 Results of Fiber Splitting

17 Titanium Review, Sep. 9, 2004 Results of Fiber Splitting

18 Titanium Review, Sep. 9, 2004 Results of Fiber Splitting
15% performance increase over an already-tuned fiber distribution
Very general method: can be tuned for other applications with different tissue properties

19 Titanium Review, Sep. 9, 2004 Fluid Partitioning
The current heart code partitions the fluid in only one dimension
Partitioning in two dimensions increases the potential scalability: the problem grows as O(N^3), but with slab partitioning the processor count can only grow as O(N)
Partitioning in more than one dimension can greatly increase communication
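The scaling limit can be made concrete with a little arithmetic; the numbers below are illustrative, not measurements from the project.

```python
# Illustrative arithmetic: why 1D slab partitioning limits scaling on an N^3 grid.
N = 256                                  # the target fluid grid from the "Future Work" slide
slab_limit = N                           # at most one x-y plane per processor, so O(N) processors
pencil_limit = N * N                     # a 2D partition can assign one column per processor, O(N^2)
points_per_proc_at_slab_limit = N * N    # 65,536 grid points per processor even at the slab limit
print(slab_limit, pencil_limit, points_per_proc_at_slab_limit)
```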

20 Titanium Review, Sep. 9, 2004 SMP-aware partitioning
Give slabs to nodes rather than to individual processors
Processors in a node cooperate to compute on the slab
We are still tuning the performance of this approach
Need to address some false-sharing issues
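The idea can be pictured as a two-level decomposition: slabs go to nodes, and the processors within a node split their slab along a second dimension. The index arithmetic below is a hypothetical sketch of such a layout, not the Titanium code's actual data distribution.

```python
def my_block(N, nodes, procs_per_node, node_id, proc_id):
    """Two-level decomposition of an N^3 grid (illustrative sketch only):
    x-slabs are assigned to nodes; within a node, processors split the slab in y."""
    x0, x1 = node_id * N // nodes, (node_id + 1) * N // nodes
    y0, y1 = proc_id * N // procs_per_node, (proc_id + 1) * N // procs_per_node
    return (x0, x1), (y0, y1), (0, N)   # full extent in z

# Example: a 128^3 grid on 8 nodes with 4 processors each; node 3, processor 1
# works on x in [48, 64), y in [32, 64), z in [0, 128).
print(my_block(128, 8, 4, 3, 1))
```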

21 Titanium Review, Sep. 9, 2004 SMP-aware partitioning Scalability of FFT

22 Titanium Review, Sep. 9, 2004 SMP-aware partitioning Scalability of FFT

23 Titanium Review, Sep. 9, 2004 Second-Order Solver
Implementation of Peskin and McQueen's formally second-order accurate Immersed Boundary Method
First ever implementation of this method in a distributed-memory parallel environment
This effort is currently in progress

24 Titanium Review, Sep. 9, 2004 Why a Second-Order Solver?
Some problems require it when a finer grid is used to model the problem
Captures information that you can't see with the first-order method
Will allow a larger time step in the heart simulation
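To see what "formally second-order accurate" buys in practice, here is a generic convergence illustration (a standard finite-difference example, unrelated to the actual solver): halving the mesh width cuts a first-order error roughly in half but a second-order error by a factor of four, which is what makes coarser grids and larger time steps viable at a given accuracy.

```python
import numpy as np

# Generic first- vs second-order convergence demo (not the IB solver): approximate
# the derivative of sin at x = 1 with one-sided (O(h)) and centered (O(h^2)) differences.
x, exact = 1.0, np.cos(1.0)
for h in (0.1, 0.05, 0.025):
    e1 = abs((np.sin(x + h) - np.sin(x)) / h - exact)            # first-order error
    e2 = abs((np.sin(x + h) - np.sin(x - h)) / (2 * h) - exact)  # second-order error
    print(f"h={h:<6} first-order: {e1:.2e}   second-order: {e2:.2e}")
```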

25 Titanium Review, Sep. 9, 2004 Current Implementation Status
The version without sources and sinks has yet to be tested
The version with sources and sinks breaks down on the test case (a pipe with one source and one sink) after 10 time steps
Reason: some extra math for the flow from the source needs to be taken into account

26 Titanium Review, Sep. 9, 2004 Future Work
Continue work on the second-order method
Continue work on fluid partitioning
Show the feasibility of simulating a 256 x 256 x 256 heart