Parallel Computing Majid Almeshari John Conklin. Outline The Challenge Available Parallelization Resources Status of Parallelization Plan & Next Step.

Slides:

Advertisements

Similar presentations

MPI Message Passing Interface

Advertisements

SLA-Oriented Resource Provisioning for Cloud Computing

Autonomic Systems Justin Moles, Winter 2006 Enabling autonomic behavior in systems software with hot swapping Paper by: J. Appavoo, et al. Presentation.

Chess Problem Solver Solves a given chess position for checkmate Problem input in text format.

Development of Parallel Simulator for Wireless WCDMA Network Hong Zhang Communication lab of HUT.

CSCI 8150 Advanced Computer Architecture Hwang, Chapter 1 Parallel Computer Models 1.2 Multiprocessors and Multicomputers.

Real-Time Video Analysis on an Embedded Smart Camera for Trafﬁc Surveillance Presenter: Yu-Wei Fan.

Progress in Two-second Filter Development Michael Heifetz, John Conklin.

1 Distributed Computing Algorithms CSCI Distributed Computing: everything not centralized many processors.

Outline Introduction Image Registration High Performance Computing Desired Testing Methodology Reviewed Registration Methods Preliminary Results Future.

Figure 2.8 Compiler phases Compiling. Figure 2.9 Object module Linking.

Multiprocessors ELEC 6200: Computer Architecture and Design Instructor : Agrawal Name: Nam.

Introduction to Systems Architecture Kieran Mathieson.

Multiprocessors ELEC 6200 Computer Architecture and Design Instructor: Dr. Agrawal Yu-Chun Chen 10/27/06.

Page 1 CS Department Parallel Design of JPEG2000 Image Compression Xiuzhen Huang CS Department UC Santa Barbara April 30th, 2003.

Science Advisory Committee Meeting - 20 September 3, 2010 Stanford University 1 04_Parallel Processing Parallel Processing Majid AlMeshari John W. Conklin.

CENG334 Introduction to Operating Systems Erol Sahin Dept of Computer Eng. Middle East Technical University Ankara, TURKEY URL:

Arquitectura de Sistemas Paralelos e Distribuídos Paulo Marques Dep. Eng. Informática – Universidade de Coimbra Ago/ Machine.

High Performance Communication using MPJ Express 1 Presented by Jawad Manzoor National University of Sciences and Technology, Pakistan 29 June 2015.

1 ES 314 Advanced Programming Lec 2 Sept 3 Goals: Complete the discussion of problem Review of C++ Object-oriented design Arrays and pointers.

Lecture 37: Chapter 7: Multiprocessors Today’s topic –Introduction to multiprocessors –Parallelism in software –Memory organization –Cache coherence 1.

CS 470/570:Introduction to Parallel and Distributed Computing.

Design and Implementation of a Single System Image Operating System for High Performance Computing on Clusters Christine MORIN PARIS project-team, IRISA/INRIA.

Advances in Language Design

What is Concurrent Programming? Maram Bani Younes.

A Secure Protocol for Spontaneous Wireless Ad Hoc Networks Creation.

UNIX System Administration OS Kernal Copyright 2002, Dr. Ken Hoganson All rights reserved. OS Kernel Concept Kernel or MicroKernel Concept: An OS architecture-design.

Parallel Programming Models Jihad El-Sana These slides are based on the book: Introduction to Parallel Computing, Blaise Barney, Lawrence Livermore National.

Day 4 Understanding Hardware Partitions Linux Boot Sequence.

Artdaq Introduction artdaq is a toolkit for creating the event building and filtering portions of a DAQ. A set of ready-to-use components along with hooks.

1 Fast Failure Recovery in Distributed Graph Processing Systems Yanyan Shen, Gang Chen, H.V. Jagadish, Wei Lu, Beng Chin Ooi, Bogdan Marius Tudor.

Improving Network I/O Virtualization for Cloud Computing.

A Transport Framework for Distributed Brokering Systems Shrideep Pallickara, Geoffrey Fox, John Yin, Gurhan Gunduz, Hongbin Liu, Ahmet Uyar, Mustafa Varank.

GBT Interface Card for a Linux Computer Carson Teale 1.

m-Privacy for Collaborative Data Publishing

Operating Systems. Without an operating system your computer would be useless! A computer contains an Operating System on its Hard Drive. This is loaded.

Optimal Client-Server Assignment for Internet Distributed Systems.

Cluster Workstations. Recently the distinction between parallel and distributed computers has become blurred with the advent of the network of workstations.

Processes and OS basics. RHS – SOC 2 OS Basics An Operating System (OS) is essentially an abstraction of a computer As a user or programmer, I do not.

المحاضرة الاولى Operating Systems. The general objectives of this decision explain the concepts and the importance of operating systems and development.

A performance evaluation approach openModeller: A Framework for species distribution Modelling.

Frank Casilio Computer Engineering May 15, 1997 Multithreaded Processors.

ESC499 – A TMD-MPI/MPE B ASED H ETEROGENEOUS V IDEO S YSTEM Tony Zhou, Prof. Paul Chow April 6 th, 2010.

April 26, CSE8380 Parallel and Distributed Processing Presentation Hong Yue Department of Computer Science & Engineering Southern Methodist University.

Content Sharing over Smartphone-Based Delay- Tolerant Networks.

Privacy Preserving Delegated Access Control in Public Clouds.

A Method for Mining Infrequent Causal Associations and Its Application in Finding Adverse Drug Reaction Signal Pairs.

7. CBM collaboration meetingXDAQ evaluation - J.Adamczewski1.

Parallelizing Video Transcoding Using Map-Reduce-Based Cloud Computing Speaker : 童耀民 MA1G0222 Feng Lao, Xinggong Zhang and Zongming Guo Institute of Computer.

Computing Simulation in Orders Based Transparent Parallelizing Pavlenko Vitaliy Danilovich, Odessa National Polytechnic University Burdeinyi Viktor Viktorovych,

Distributed simulation with MPI in ns-3 Joshua Pelkey and Dr. George Riley Wns3 March 25, 2011.

ICC Module 3 Lesson 1 – Computer Architecture 1 / 12 © 2015 Ph. Janson Information, Computing & Communication Computer Architecture Clip 6 – Logic parallelism.

m-Privacy for Collaborative Data Publishing

Distributed Systems 2 Distributed Processing. Process A process is a logical representation of a physical processor that executes program code and has.

OpenMP for Networks of SMPs Y. Charlie Hu, Honghui Lu, Alan L. Cox, Willy Zwaenepoel ECE1747 – Parallel Programming Vicky Tsang.

Harnessing the Cloud for Securely Outsourcing Large- Scale Systems of Linear Equations.

1 Parallel Applications Computer Architecture Ning Hu, Stefan Niculescu & Vahe Poladian November 22, 2002.

SMP Basics KeyStone Training Multicore Applications Literature Number: SPRPxxx 1.

1 of 14 Lab 2: Formal verification with UPPAAL. 2 of 14 2 The gossiping persons There are n persons. All have one secret to tell, which is not known to.

Load Rebalancing for Distributed File Systems in Clouds.

Background Computer System Architectures Computer System Software.

1 of 14 Lab 2: Design-Space Exploration with MPARM.

ENERGY EFFICIENT TIME SYNCHRONIZATION PROTOCOL FOR MOBILE UNDERWATER ACOUSTIC SENSOR NETWORKS Under the Guidance of Submitted by Mr. P. Mukunthan, AP/CSE.

CIT 140: Introduction to ITSlide #1 CSC 140: Introduction to IT Operating Systems.

Architecture Background

Wavelet “Block-Processing” for Reduced Memory Transfers

CS246 Search Engine Scale.

CS246: Search-Engine Scale

Mapping DSP algorithms to a general purpose out-of-order processor

Presentation transcript:

Parallel Computing Majid Almeshari John Conklin

Outline The Challenge Available Parallelization Resources Status of Parallelization Plan & Next Step

The Challenge 1. Sigma Point Filtering is used as our non-linear estimation filter Carrying this out on a 2-second time step is computationally intensive and requires handling a huge set of data 2. The data set is huge ~ 1 Terabyte Resources such as RAM may not be sufficient to carry out the computation tasks efficiently. Swapping to the disk delays the processing as well In a nutshell, A computationally intensive algorithm is applied on a huge set of data Testing takes a long time which delays the verification of the correctness of the algorithms

Available Parallelization Resources Hardware –2 Computer Clusters (Namely: Regelation and Nivation) »Regelation : bit-CPUs 64-bit enables us to address a memory space beyond 4 GB »Nivation: bit-CPUs »Both clusters run linux Software –Matlab –MatlabMPI »A toolbox for parallel computing using Matlab –Other parallel computing toolboxes have been investigated as alternatives when MatlabMPI proves to be inefficient

Status - 1 We managed to run a first test using MatlabMPI on the cluster –Master-Slave architecture –Single-Instruction Multiple-Data (SIMD) parallel processing style Results –Pure computation time was improved –However, Inter-Processor communication overhead took about 90% of the processing time. This is due to Large message size, 5 MB in average Total number of messages can go up to 16,000 (~4000 orbit X 4 gyros) That is ~80 GB of reading/writing to the disk + network

Status - 2 This required a re-arrangement of the processing procedure to: – Minimize communication time – Distribute the work non-uniformly among processors At the same time, we are looking deeper inside the code to improve efficiency (computation, RAM). Focus is on: – Modules that take most of the computation time (e.g. building the Jacobian, 80% of the time) – Modules that consume most of the RAM

Plan We are working closely with the development team to understand the code better (until the end of June) –Further understanding of Sigma-Point Kalman Filter will help us parallelize it better –Knowing the dependencies among modules will help us minimize inter- processor communication time We are running small tests whenever the code changes to see and record any improvement (until the end of June) On the long run, We plan to: – Design a parallel framework where we assign which modules to run where (by the end of July) – Implement and test this framework incrementally (Complete by the end of September)