Outline: Introduction, Image Registration, High Performance Computing, Desired Testing Methodology, Reviewed Registration Methods, Preliminary Results, Future Work


Outline Introduction Image Registration High Performance Computing Desired Testing Methodology Reviewed Registration Methods Preliminary Results Future Work Cool App Demo

Introduction Primary Motivation: after some initial research, the scope of this project increased tenfold

Image Registration Image Registration is the process of determining a spatial transformation that establishes the correspondence between two images

Image Registration Applications of Image Registration – Cartography – Computer Vision – Image Guided Surgery – Brain Mapping – Detection of Disease state change over time – And many more…

Image Registration Software packages, libraries, and frameworks capable of Image Registration – Automated Image Registration Package (AIR) – Insight Segmentation and Registration Toolkit (ITK) – FMRIB's Linear Image Registration Tool (FLIRT) – MathWorks Image Processing Toolbox – Others… None currently supports registration by means of parallel computing!

Image Registration Depending on the application, registration can be highly resource-intensive – Data sets can be too large for physical memory, resulting in disk swapping – Search spaces can be enormous (deformable problems can reach sizes on the order of 9.8 * 10^6)
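To make the memory pressure concrete, a back-of-the-envelope sketch (the 512^3 volume size and 32-bit voxels are illustrative assumptions, not figures from this project):

```python
def volume_bytes(nx, ny, nz, bytes_per_voxel=4):
    """Memory for one 3-D volume stored as 32-bit floats."""
    return nx * ny * nz * bytes_per_voxel

# A single 512^3 float volume:
fixed = volume_bytes(512, 512, 512)                          # 0.5 GiB
# Deformable registration also needs a 3-component displacement field:
field = volume_bytes(512, 512, 512, bytes_per_voxel=3 * 4)   # 1.5 GiB
# Fixed image + moving image + displacement field:
total = 2 * fixed + field
print(total / 2**30, "GiB")  # 2.5 GiB -- already beyond a 2 GB node
```

Even this modest scenario exceeds the per-node RAM of the clusters described below, which is exactly when disk swapping begins.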

High Performance Computing Highly effective at alleviating performance and memory limitations Steadily decreasing prices and increasing availability of high performance machines have made parallel computing a reality for many Most image registration specialists are not familiar with parallel and distributed computing techniques Many researchers have successfully applied such methods, but none has created a generic software module

High Performance Computing My Role – Administer and maintain the two clusters Nick and Optimus – Head of the USC High Performance Computing Group – Assist users – Developed and (try to) maintain the HPCG Webpage

High Performance Computing Systems: Nick HARDWARE: 76 Compute Nodes: dual 3.4 GHz Xeon, 2MB L2, 4GB RAM, one 40GB disk; 1 Master Node: dual 3.2 GHz Xeon, 2MB L2, 4GB RAM, three 73GB disks in RAID 5 INTERCONNECT: Topspin Infiniband SOFTWARE: Platform Rocks 4 (RHEL 4), Platform LSF, OpenMPI (compiled with Infiniband libraries), 64-bit GCC compilers, Intel compilers, Star-CD, ITK, others… Will support starting Summer: GAMESS, NWCHEM, …

High Performance Computing Systems: Optimus HARDWARE: 64 Compute Nodes: dual, dual-core 2.2 GHz Opteron, 2MB L2, 8GB RAM, one 250GB disk; 1 Master Node: dual, dual-core 2.2 GHz Xeon, 2MB L2, 8GB RAM, two 500GB disks INTERCONNECT: GigE SOFTWARE: Fedora Core 4, ABC management software, OpenPBS scheduling software, OpenMPI (compiled with Infiniband libraries), 64-bit GCC compilers, Intel compilers, ITK, others… Will support starting Summer: GAMESS, NWCHEM, …

High Performance Computing Message Passing – In distributed memory systems, the most prevalent means of communication is message passing – Message Passing Interface (MPI) takes care of low-level details such as buffering, error handling, and data-type conversion It is a middleware component used in conjunction with standard programming languages like C, C++, and Fortran
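MPI itself needs a cluster and a launcher, but the point-to-point pattern it standardizes can be sketched with plain queues and threads. The send/recv helpers below only imitate the semantics of MPI's send and blocking receive; they are not the real API:

```python
import queue
import threading

# One mailbox per "rank"; rank 0 is the master, ranks 1-2 are workers.
mailboxes = [queue.Queue() for _ in range(3)]

def send(dest, data):
    mailboxes[dest].put(data)

def recv(rank):
    return mailboxes[rank].get()   # blocks until a message arrives

def worker(rank):
    chunk = recv(rank)             # wait for work from the master
    send(0, sum(chunk))            # send the partial result back

threads = [threading.Thread(target=worker, args=(r,)) for r in (1, 2)]
for t in threads:
    t.start()

data = list(range(100))
send(1, data[:50])                 # master "scatters" the data
send(2, data[50:])
total = recv(0) + recv(0)          # and "gathers" the partial sums
for t in threads:
    t.join()
print(total)                       # 4950
```

The master never touches the workers' memory directly; all coordination flows through explicit messages, which is the defining property of the distributed-memory model.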

High Performance Computing Issues with multi-core [6] – Memory contention – Interconnect contention – Program locality Binding processes to cores helps; with OpenMPI: "--mca mpi_paffinity_alone 1"

Desired Testing Methodology Research and analyze existing registration frameworks to determine if their workload can be distributed in a parallel environment Thoroughly test all methods sequentially and in parallel to determine Speedup Testing in 2-D and 3-D, intermodal and intramodal, and rigid and non-rigid image registration Focus on Intensity based methods Address known multi-core issues

Desired Testing Methodology Two strategies – Parallelizing the optimization method – Parallelizing the metric function

Desired Testing Methodology The measure of quality will be defined using parallel speedup and parallel efficiency Parallel speedup is defined as S_N = T_S / T_N, where T_S is the execution time of the best sequential algorithm and T_N is the execution time on N processors Parallel efficiency is defined as E_N = S_N / N, where N is the number of processors
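The two definitions translate directly into code; the timing numbers below are hypothetical, not measured results from Nick or Optimus:

```python
def speedup(t_seq, t_par):
    """S_N = T_S / T_N, with T_S the best sequential time."""
    return t_seq / t_par

def efficiency(t_seq, t_par, n):
    """E_N = S_N / N, the fraction of ideal linear scaling achieved."""
    return speedup(t_seq, t_par) / n

# Hypothetical run: 400 s sequentially, 125 s on 4 processors.
s = speedup(400.0, 125.0)        # 3.2
e = efficiency(400.0, 125.0, 4)  # 0.8
```

An efficiency near 1 means the processors are well utilized; values well below 1 point at communication overhead or load imbalance.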

Reviewed Registration Methods Warfield's approach [3] Cachier's demons algorithm [5] as used in [7] – Claimed to be precise, robust, and relatively cheap computationally – Its structure makes it a good candidate for parallelization – Can be divided into three main "bricks": the oversampling needed by the pyramidal approach, the search for matches, and parallel Gaussian filtering

Reviewed Registration Methods Cachier's demons algorithm [5] as used in [6]

Reviewed Registration Methods Acceleration of Genetic Algorithm with Parallel Processing with Application in Medical Image Registration (B. Laksanapanai, W. Withayachumnankul, C. Pintavirooj, P. Tosranon) Very intriguing, but the paper is short and doesn't really dive into how the method was implemented

Reviewed Registration Methods Distributed Registration Framework as proposed by Michael Kuhn [1] The metric calculation is organized in a master/slave design The master process is responsible for data distribution as well as communication with the existing framework Each slave is assigned a region of the fixed image and calculates an intermediate metric value The master node coordinates all steps required to collect and process the partial results and passes the final result to the registration framework

Reviewed Registration Methods

Implemented these concepts through: – DistributedImageToImageMetric – RegistrationCommunicator The DistributedImageToImageMetric class is divided into master and slave parts and is derived from the itk::ImageToImageMetric class RegistrationCommunicator provides an interface for all communication tasks and uses MPI

Reviewed Registration Methods The whole registration process consists of two stages: initialization and optimization – Initialization: distribute the data to the nodes – Optimization: optimizers in ITK are iteration-based During each iteration, metric values and derivatives are requested from the metric function When new values are required, the optimizer requests a metric value from the master; the master then asks the slaves to compute the partial values associated with their fixed-image regions and transmit them back The master combines the partial results and the process repeats until complete
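The per-iteration scatter/gather above can be sketched without MPI by standing a thread pool in for the slaves. The function names below are illustrative stand-ins, not the framework's actual classes, and the metric shown is a simple mean-squares measure:

```python
from concurrent.futures import ThreadPoolExecutor

def partial_mean_squares(fixed, moving, rows):
    """Slave-side work: sum of squared differences over one region."""
    ssd, count = 0.0, 0
    for y in rows:
        for f, m in zip(fixed[y], moving[y]):
            ssd += (f - m) ** 2
            count += 1
    return ssd, count

def distributed_metric(fixed, moving, n_slaves=4):
    """Master-side work: scatter row regions, gather and combine partials."""
    rows = range(len(fixed))
    regions = [rows[i::n_slaves] for i in range(n_slaves)]
    with ThreadPoolExecutor(max_workers=n_slaves) as pool:
        partials = list(pool.map(
            lambda r: partial_mean_squares(fixed, moving, r), regions))
    ssd = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return ssd / count   # final metric value handed back to the optimizer
```

The optimizer only ever sees the combined value, so the distributed metric is a drop-in replacement for its sequential counterpart, which is the key idea behind deriving it from itk::ImageToImageMetric.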

Preliminary Results Sequential Runs: MeanSquaresImageToImageMetric

Preliminary Results Sequential Runs: MeanSquaresImageToImageMetric Best run time: Nick 427.7 s, Optimus 522.3 s

Future Work Implement an attachable parallel image registration framework (one that also supports multi-core) for existing tools such as ITK Thorough testing on both clusters The usage of multiple cores in one node requires a new programming model Explore forms of data decomposition

Questions?

Photosynth Demo