All-Pairs Shortest Paths

Presentation transcript:

All-Pairs Shortest Paths

Directed graph with "costs" on edges; find least-cost paths between all reachable vertex pairs.
Several classical algorithms, with:
  - Work ~ matrix multiplication
  - Span ~ log² n
Case study: an implementation on a multicore architecture, the graphics processing unit (GPU).
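The "work ~ matrix multiplication" claim comes from the classical observation that APSP is a matrix closure over the (min, +) semiring: squaring the weight matrix about log₂ n times yields all shortest distances. Purely as a reference point, a minimal NumPy sketch of that primitive (the names min_plus and apsp_by_squaring are ours, not from the talk):

    import numpy as np

    def min_plus(X, Y):
        """(min, +) matrix 'product': out[i, j] = min over k of X[i, k] + Y[k, j]."""
        out = np.full((X.shape[0], Y.shape[1]), np.inf)
        for k in range(X.shape[1]):
            # Combine column k of X with row k of Y; keep the elementwise minimum.
            out = np.minimum(out, X[:, [k]] + Y[[k], :])
        return out

    def apsp_by_squaring(W):
        """All-pairs shortest distances by repeated (min, +) squaring.

        W[i, j] is the edge weight (np.inf where there is no edge).
        Assumes no negative-weight cycles. O(log n) squarings, each O(n^3) work.
        """
        D = W.copy()
        np.fill_diagonal(D, 0)        # zero-length path from each vertex to itself
        m = 1
        while m < D.shape[0] - 1:     # a shortest path uses at most n-1 edges
            D = min_plus(D, D)
            m *= 2
        return D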

GPU characteristics

  - Powerful: two Nvidia 8800s > 1 TFLOPS
  - Inexpensive: about $500 each

But:
  - Difficult programming model: one instruction stream drives 8 arithmetic units
  - Performance is counterintuitive and fragile: the memory access pattern has subtle effects on cost
  - Extremely easy to underutilize the device: doing it wrong easily costs 100x in time

Recursive All-Pairs Shortest Paths

Based on the R-Kleene algorithm. Partition the adjacency matrix into four blocks:

    [ A  B ]
    [ C  D ]

Well suited for the GPU architecture:
  - Fast matrix-multiply kernel
  - In-place computation => low memory bandwidth
  - Few, large MatMul calls => low GPU dispatch overhead
  - Recursion stack on host CPU, not on multicore GPU
  - Careful tuning of GPU code

Over the semiring where + is "min" and × is "add":

    A = A*;            % recursive call
    B = AB;   C = CA;
    D = D + CB;
    D = D*;            % recursive call
    B = BD;   C = DC;
    A = A + BC;
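For concreteness, here is a minimal sequential NumPy sketch of that blocked recursion. It is our own illustration, not the authors' GPU code: the names min_plus and r_kleene and the base-case threshold are assumptions, and in the real implementation the multiplies run as tuned CUDA matrix kernels driven from the host.

    import numpy as np

    def min_plus(X, Y):
        """(min, +) product, same operation as in the earlier sketch (broadcast form)."""
        return np.min(X[:, :, None] + Y[None, :, :], axis=1)

    def r_kleene(A):
        """Overwrite A with its closure A* (all-pairs shortest distances) over the
        (min, +) semiring, following the slide's schedule:
            A = A*; B = AB; C = CA; D = D + CB;
            D = D*; B = BD; C = DC; A = A + BC.
        """
        n = A.shape[0]
        if n <= 16:                                    # base-case size is a tuning knob
            np.fill_diagonal(A, np.minimum(A.diagonal(), 0.0))
            for k in range(n):                         # plain Floyd-Warshall base case
                np.minimum(A, A[:, [k]] + A[[k], :], out=A)
            return A
        h = n // 2
        r_kleene(A[:h, :h])                                            # A = A*
        A[:h, h:] = min_plus(A[:h, :h], A[:h, h:])                     # B = AB
        A[h:, :h] = min_plus(A[h:, :h], A[:h, :h])                     # C = CA
        A[h:, h:] = np.minimum(A[h:, h:],
                               min_plus(A[h:, :h], A[:h, h:]))         # D = D + CB
        r_kleene(A[h:, h:])                                            # D = D*
        A[:h, h:] = min_plus(A[:h, h:], A[h:, h:])                     # B = BD
        A[h:, :h] = min_plus(A[h:, h:], A[h:, :h])                     # C = DC
        A[:h, :h] = np.minimum(A[:h, :h],
                               min_plus(A[:h, h:], A[h:, :h]))         # A = A + BC
        return A

Calling r_kleene(W) on an n × n float matrix W (np.inf where there is no edge) overwrites W with the all-pairs distances, mirroring the "in-place computation => low memory bandwidth" point above, up to the temporaries the (min, +) products allocate.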

Execution of Recursive APSP

APSP: Experiments and observations

128-core Nvidia 8800. Speedup relative to:
  - 1-core CPU: 120x – 480x
  - 16-core CPU: 17x – 45x
  - Iterative, 128-core GPU: 40x – 680x
  - MSSSP, 128-core GPU: ~3x

[Figure: time vs. matrix dimension]

Conclusions:
  - High performance is achievable, but not simple
  - Carefully chosen and optimized primitives will be key