Sharks and Fishes – The problem


Sharks and Fishes – The problem
- The ocean is divided into a 2-D grid.
- Each grid cell can be empty or hold one fish or one shark.
- The grid is initially populated with fishes and sharks at random.
- The population evolves over discrete time steps according to certain rules.

Sharks and Fishes - Rules
- At each time step, a fish tries to move to a neighboring empty cell. If no neighboring cell is empty, it stays.
- If a fish has reached the breeding age, it breeds when it moves, leaving behind a fish of age 0. A fish cannot breed if it does not move.
- A fish never starves.

Sharks and Fishes - Rules
- At each time step, if one of the neighboring cells holds a fish, the shark moves to that cell and eats the fish. Otherwise, if one of the neighboring cells is empty, the shark moves there. Otherwise, it stays.
- If a shark has reached the breeding age, it breeds when it moves, leaving behind a shark of age 0. A shark cannot breed if it does not move.
- Sharks eat only fish. If a shark reaches the starvation age (measured in time steps since it last ate), it dies.

Inputs and Data Structures
Inputs:
- Size of the grid
- Initial distribution of sharks and fishes
- Shark and fish breeding ages
- Shark starvation age

Data structures: a 2-D grid of cells and a linked list of swimmers.

    struct ocean {
        int type;                  /* shark, fish, or empty */
        struct swimmer *occupier;  /* NULL if the cell is empty */
    } ocean[MAXX][MAXY];

    struct swimmer {
        int type;                  /* shark or fish */
        int x, y;                  /* position in the grid */
        int age;
        int last_ate;              /* time steps since a shark last ate */
        int iteration;             /* last time step this swimmer was updated */
        struct swimmer *prev;
        struct swimmer *next;
    } *List;

Sequential code logic: initialize the ocean array and the swimmers list; in each time step, go through the swimmers in the order in which they are stored and perform the updates.

Towards a Parallel Code
A 2-D data distribution similar to the ones used for Laplace's equation and molecular dynamics is used: each processor holds a block of ocean cells. For communication, each processor needs data from its 4 neighboring processors. This raises 2 challenges: the potential for conflicts, and load balancing.

1st Challenge – Potential for Conflicts
Border cells may change during the updates due to fish or shark movement, so updated border cells need to be communicated back to the original processor; the update step therefore involves communication. In the meantime, the original processor may itself have updated the border cell, hence the potential for conflicts.
[Figure: grids of sharks (S) and fishes (F) at time T and at time T+1, illustrating conflicting updates to border cells]

2 Techniques
1. Roll back the updates for those particles (fish or sharks) that have crossed a processor boundary and are in potential conflict. This may lead to several rollbacks until a free space is found.
2. Synchronize during the updates so that conflicts are avoided in the first place.

2 Techniques
During an update, a processor x first sends its data to a neighboring processor y, lets y perform its updates, gets the updated data back from y, and only then performs its own updates. This synchronization can be organized by sub-partitioning: the grid owned by each processor is divided into sub-grids (numbered 1, 2, 3, 4), and all processors update the sub-grids with the same number in the same phase.

Load Imbalance
The workload distribution changes over time, so a static 2-D block distribution is not optimal. Techniques:
- Dynamic load balancing
- Statistical load balancing by a different data distribution

Dynamic load balancing
Performed at each time step, e.g., by orthogonal recursive bisection. Problem: the complexity of finding the neighbors and the data to be communicated.

Statistical Data Distribution
A cyclic distribution: blocks of cells are dealt out to processors P0, P1, P2, ... in round-robin fashion, which evens out the load statistically. Problem: an increase in boundary data, and hence an increase in communication.