Simple but slow: O(n 2 ) algorithms Serial Algorithm Algorithm compares each particle with every other particle and checks for the interaction radius Most.

Slides:



Advertisements
Similar presentations
Refining High Performance FORTRAN Code from Programming Model Dependencies Ferosh Jacob University of Alabama Department of Computer Science
Advertisements

CS4432: Database Systems II
N-Body I CS 170: Computing for the Sciences and Mathematics.
Dynamic Load Balancing for VORPAL Viktor Przebinda Center for Integrated Plasma Studies.
CSCE 3110 Data Structures & Algorithm Analysis
David Streader Computer Science Victoria University of Wellington Copyright: David Streader, Victoria University of Wellington Recursion COMP T1.
© Copyright 2012 by Pearson Education, Inc. All Rights Reserved. 1 Chapter 17 Sorting.
Geographic Gossip: Efficient Aggregations for Sensor Networks Author: Alex Dimakis, Anand Sarwate, Martin Wainwright University: UC Berkeley Venue: IPSN.
Analysis Meeting 31Oct 05T. Burnett1 Classification, etc. status at UW Implement “nested trees” Generate “boosted” trees for all of the Atwood categories.
Low Density Parity Check Codes LDPC ( Low Density Parity Check ) codes are a class of linear bock code. The term “Low Density” refers to the characteristic.
Parallel Processing – Final Project Performed by:Nitsan Mane Jonathan Distler PP9.
Complexity (Running Time)
Dovetail Killer? Implementing Jonathan’s New Idea Tarek Sherif.
Analysis of Algorithms 7/2/2015CS202 - Fundamentals of Computer Science II1.
1 A Coverage-Preserving & Hole Tolerant Based Scheme for the Irregular Sensing Range in WSN Azzedine Boukerche, Xin Fei, Regina B. Araujo PARADISE Research.
Monte Carlo Methods in Partial Differential Equations.
Parallelization: Conway’s Game of Life. Cellular automata: Important for science Biology – Mapping brain tumor growth Ecology – Interactions of species.
Gauss’ Law. Class Objectives Introduce the idea of the Gauss’ law as another method to calculate the electric field. Understand that the previous method.
Spring2012 Lecture#10 CSE 246 Data Structures and Algorithms.
MA/CS471 Lecture 8 Fall 2003 Prof. Tim Warburton
Securing Every Bit: Authenticated Broadcast in Wireless Networks Dan Alistarh, Seth Gilbert, Rachid Guerraoui, Zarko Milosevic, and Calvin Newport.
CS 1704 Introduction to Data Structures and Software Engineering.
Parallelization: Area Under a Curve. AUC: An important task in science Neuroscience – Endocrine levels in the body over time Economics – Discounting:
VIS Group, University of Stuttgart Tutorial T4: Programmable Graphics Hardware for Interactive Visualization Adaptive Terrain Slicing (Stefan Roettger)
Scheduling Many-Body Short Range MD Simulations on a Cluster of Workstations and Custom VLSI Hardware Sumanth J.V, David R. Swanson and Hong Jiang University.
Chapter 13 B Advanced Implementations of Tables – Balanced BSTs.
Digital Image Processing CCS331 Relationships of Pixel 1.
P ARALLELIZATION IN M OLECULAR D YNAMICS By Aditya Mittal For CME346A by Professor Eric Darve Stanford University.
George F Luger ARTIFICIAL INTELLIGENCE 5th edition Structures and Strategies for Complex Problem Solving Machine Learning: Social and Emergent Luger: Artificial.
Optimising Cuts for HLT George Talbot Supervisor: Stewart Martin-Haugh.
CSC 211 Data Structures Lecture 13
Simplified Smoothed Particle Hydrodynamics for Interactive Applications Zakiya Tamimi Richard McDaniel Based on work done at Siemens Corporate.
Sorting: Implementation Fundamental Data Structures and Algorithms Klaus Sutner February 24, 2004.
Getting Off the Earth Focus 2 Part 2.
Hank Childs, University of Oregon Isosurfacing (Part 3)
A Heap Implementation Chapter 26 Copyright ©2012 by Pearson Education, Inc. All rights reserved.
Comparison of Cell and POWER5 Architectures for a Flocking Algorithm A Performance and Usability Study CS267 Final Project Jonathan Ellithorpe Mark Howison.
C++ How to Program, 7/e © by Pearson Education, Inc. All Rights Reserved.
Tue. Jan. 27 – Physics Lecture #23 Electric Field, Continued (and Continuous!) 1. Electric Field due to Continuous Charge Distributions Warm-up: Consider.
Search Algorithms Written by J.J. Shepherd. Sequential Search Examines each element one at a time until the item searched for is found or not found Simplest.
Unit-8 Sorting Algorithms Prepared By:-H.M.PATEL.
D. M. Ceperley, 2000 Simulations1 Neighbor Tables Long Range Potentials A & T pgs , Today we will learn how we can handle long range potentials.
Algorithms Lecture #05 Uzair Ishtiaq. Asymptotic Notation.
Dave Lattanzi’s Laplace Planner. Laplace Solution Store node charges as a dictionary – Originally used state variables and nodeList Find Laplace solution.
Constant Electromagnetic Field Section 19. Constant fields E and H are independent of time t.  and A can be chosen time independent, too.
Gauss’ Law Chapter 23. Electric field vectors and field lines pierce an imaginary, spherical Gaussian surface that encloses a particle with charge +Q.
First INFN International School on Architectures, tools and methodologies for developing efficient large scale scientific computing applications Ce.U.B.
Hybrid Parallel Implementation of The DG Method Advanced Computing Department/ CAAM 03/03/2016 N. Chaabane, B. Riviere, H. Calandra, M. Sekachev, S. Hamlaoui.
Pitfalls: Time Dependent Behaviors CS433 Spring 2001 Laxmikant Kale.
Recursion COMP T1 #6.
Analysis of Algorithms
Tree based validation tool for track reconstruction
Analysis of Algorithms
Parallel Density-based Hybrid Clustering
Implementing Simplified Molecular Dynamics Simulation in Different Parallel Paradigms Chao Mei April 27th, 2006 CS498LVK.
Using Flow Textures to Visualize Unsteady Vector Fields
CS223 Advanced Data Structures and Algorithms
Optimizing Malloc and Free
Wednesday, April 18, 2018 Announcements… For Today…
Acceleration Changing velocity (non-uniform) means an acceleration is present Acceleration is the rate of change of the velocity Units are m/s² (SI)
Non-parametric Filters
Analysis of Algorithms
Non-parametric Filters
Non-parametric Filters
Partition Refinement 更多精品PPT模板下载: 1
CS223 Advanced Data Structures and Algorithms
Lists CSE 373 Data Structures.
Chapter 21, Electric Charge, and electric Field
Analysis of Algorithms
Image Enhancement in Spatial Domain: Point Processing
Presentation transcript:

Simple but slow: O(n 2 ) algorithms Serial Algorithm Algorithm compares each particle with every other particle and checks for the interaction radius Most comparisons are outside interaction range as shown in figure OpenMP, Pthreads Algorithm simply splits the particles into p equal sized subsets and accesses all non-owned particles in the interaction check MPI Algorithm Code gathers all particles on local node and then compares with local particles. Big Idea to go faster Before implementing parallelism work out best serial algorithm Legend Current particle Actual neighbor Checked particle Non-Checked particle Interaction Radius

Faster O(n) serial algorithm – “Binning” Main idea Since all far-field interactions are ignored creating a “local neighborhood” through binning can alleviate most of the unnecessary checks since all grey particles can be ignored Time complexity Checking only neighboring “bins“ reduces the checks from O(n) for each particle to O(9d) where d is the average number of particles in each cell (assume bin size >= cutoff) Since it is said within the statement of the problem that density is uniform and we can see in the common files that the domain size increases to maintain a constant density we can consider the d number to be a small constant (In practice d<3-4 for bin size ~= cutoff) Overall complexity becomes O(9d n) = O(n) Legend Current particle Actual neighbor Checked particle Non-Checked particle Interaction Radius Local Bin

O(n) serial algorithm – “Binning” Implementation details Particles need to be assigned to bins at every timestep which presents at least two different options: 1) deleting list and rebinning every timestep (depending on how bins are implemented potentially time-consuming) 2) maintaining bins and moving particles. (Can create a lot of overhead in checking for new particles being added to your bin – particles may “jump” past neighbor bin) Common problems If particles seem to be accelerating most likely scenario is that interactions are not happening when they should be In case of 2 nd implementation of binning remember to check for total particle count and ensure no particles are lost in moving Legend Current particle Actual neighbor Checked particle Non-Checked particle Interaction Radius Local Bin

O(n) Shared memory Implementation Race conditions While the algorithm doesn’t change much from the serial the biggest challenge is avoiding excessive synchronization between threads while ensuring that all threads are on the same step of the algorithm (ex: rebinning, calculating forces, moving particles, etc. ) or at least that they are not too far ahead to risk correctness Common problems Dead lock between competing threads for a particular set of bins if locking not implemented carefully (avoid acquiring multiple locks) Inaccurate results from race conditions for updating particle acceleration (harder to spot) Slower performance than serial because of excessive synchronization Legend Current particle Actual neighbor Checked particle Non-Checked particle Interaction Radius Local Bin

O(n) MPI Implementation Implementation options Can implement analogously to shared memory implementation with messages instead of shared variables Second implementation splits particles based on location onto processors in addition to bins and must implement proper particle movement between processors Common problems Takes much longer to code correctly Deadlock problems can occur if implemented with blocking send/receive pairs Particles not interacting with neighboring bins from other processors ( both N,S,E,W and diagonally) Particles disappearing at processor borders Must deal with ghostzones Legend Current particle Actual neighbor Checked particle Non-Checked particle Interaction Radius Local Bin Processor Boundary