L21: “Irregular” Graph Algorithms November 11, 2010.


Administrative
- Class cancelled, November 16!
- Guest Lecture, November 18: Matt Might
- CUDA Projects status
  - Available on CADE Linux machines (lab1 and lab3) and Windows machines (lab5 and lab6)
  - libcutil.a (in SDK) only installed on Linux machines
  - Windows instructions provided, but timings cannot be done on Windows
11/11/10

Programming Assignment #3: Simple CUDA
Due Monday, November 22, 11:59 PM
Today we will cover Successive Over-Relaxation (SOR). Here is the sequential code for the core computation, which we will parallelize using CUDA:

  for (i = 1; i < N-1; i++) {
    for (j = 1; j < N-1; j++) {
      B[i][j] = (A[i-1][j] + A[i+1][j] + A[i][j-1] + A[i][j+1]) / 4;
    }
  }

You are provided with a CUDA template (sor.cu) that (1) provides the sequential implementation; (2) times the computation; and (3) verifies that the parallel output matches the sequential output.

Programming Assignment #3, cont.
Your mission:
- Write parallel CUDA code, including data allocation and copying to/from the CPU
- Measure speedup and report it
- 45 points for a correct implementation
- 5 points for performance
- Extra credit (10 points): use shared memory and compare performance

Programming Assignment #3, cont.
You can install CUDA on your own computer:
- How to compile under Linux and MacOS
- Visual Studio (no cutil library for validation)
- Version 3.1 in CADE labs lab1 and lab3: see the Makefile
Turn in:
- handin cs4961 proj3 (includes source file and explanation of results)

Outline
Irregular parallel computation:
- Sparse matrix operations (last time) and graph algorithms (this time)
- Dense graph algorithms (shortest path)
- Sparse graph algorithms (shortest path and maximal independent set)
Sources for this lecture:
- Kathy Yelick/Jim Demmel (UC Berkeley), CS 267, Spring 2007
- "Implementing Sparse Matrix-Vector Multiplication on Throughput-Oriented Processors," Bell and Garland (NVIDIA), SC09, November 2009
- Slides accompanying the textbook "Introduction to Parallel Computing" by Grama, Gupta, Karypis, and Kumar

Representing Graphs
[Figure: an undirected graph and its adjacency matrix representation]
[Figure: an undirected graph and its adjacency list representation]

Common Challenges in Graph Algorithms
Localizing portions of the computation:
- How do we partition the workload so that nearby nodes in the graph are mapped to the same processor?
- How do we partition the workload so that edges representing significant communication are co-located on the same processor?
Balancing the load:
- How do we give each processor a comparable amount of work?
- How much knowledge of the graph do we need to do this, given that a complete traversal may not be realistic?
All of these issues must be addressed by a graph partitioning algorithm that maps individual subgraphs to individual processors.

Start with Dense Graphs
- The partitioning problem is straightforward for dense graphs.
- We'll use an adjacency matrix and partition the matrix just as we would a data-parallel matrix computation.
- Examples: single-source shortest path and all-pairs shortest path

Single-Source Shortest Path (sequential)
[Figure: Dijkstra's sequential single-source shortest paths algorithm]

Single-Source Shortest Path (parallel)
- Each processor is assigned a portion of V, l, and w.
- Which part? Selecting the minimum involves a global reduction.

All-Pairs Shortest Path
Given a weighted graph G(V, E, w), the all-pairs shortest paths problem is to find the shortest paths between all pairs of vertices vi, vj ∈ V.
We discuss two versions:
- Extend single-source shortest path
- Matrix-multiply based solution (sparse or dense graph)

All-Pairs Shortest Paths: Parallel Formulations Based on Dijkstra's Algorithm
Source partitioned:
- Execute each of the n single-source shortest path problems on a different processor.
- That is, using n processors, each processor Pi finds the shortest paths from vertex vi to all other vertices by executing Dijkstra's sequential single-source shortest paths algorithm.
- Pros and cons? Data structures?
Source parallel:
- Use a parallel formulation of the shortest path problem to increase concurrency.
- Each of the n shortest path problems is itself executed in parallel, so we can use up to n^2 processors.
- Given p processors (p > n), each single-source shortest path problem is executed by p/n processors.

All-Pairs Shortest Paths: Matrix-Multiplication Based Algorithm
The "product" of the weighted adjacency matrix with itself, computed with (min, +) in place of the usual (+, x), yields a matrix containing the shortest paths of at most two edges between every pair of nodes. Since no shortest path uses more than n - 1 edges, it follows from this argument that A^n contains all shortest paths.

Sparse Shortest Path (using an adjacency list)
[Figure: Johnson's sequential single-source shortest paths algorithm for sparse graphs]
Note: w(u,v) is a field of the element in v's adjacency list representing u.

Parallelizing Johnson's Algorithm
- Selecting vertices from the priority queue is the bottleneck.
- We can improve this by using multiple queues, one per processor; each processor builds its priority queue using only its own vertices.
- When process Pi extracts the vertex u ∈ Vi, it sends a message to the processes that store vertices adjacent to u.
- Process Pj, upon receiving this message, sets the value of l[v] stored in its priority queue to min{l[v], l[u] + w(u,v)}.

Maximal Independent Set (used in Graph Coloring)
- A set of vertices I ⊂ V is called independent if no pair of vertices in I is connected via an edge in G.
- An independent set is called maximal if including any vertex not in I would violate the independence property.

Sequential MIS Algorithm
Simple algorithms start with an empty MIS I and assign all vertices to a candidate set C. A vertex v from C is moved into I, and all vertices adjacent to v are removed from C. This process is repeated until C is empty. This process is inherently serial!

Foundation for Parallel MIS Algorithms
Parallel MIS algorithms use randomization to gain concurrency (Luby's algorithm, originally developed for graph coloring). Initially, each node is in the candidate set C. Each node generates a (unique) random number and communicates it to its neighbors. If a node's number exceeds that of all its neighbors, it joins set I and all of its neighbors are removed from C. This process continues until C is empty. On average, this algorithm converges after O(log |V|) such steps.

Parallel MIS Algorithm (Shared Memory)
We use three arrays, each of length n:
- I, which stores the nodes in the MIS
- C, which stores the candidate set
- R, the random numbers
Partition C across p processors. Each processor generates the corresponding values in the R array and, from these, computes which candidate vertices can enter the MIS. The C array is updated by deleting all the neighbors of vertices that entered the MIS. The performance of this algorithm is dependent on the structure of the graph.

Definition of Graph Partitioning
Given a graph G = (N, E, W_N, W_E):
- N = nodes (or vertices)
- W_N = node weights
- E = edges
- W_E = edge weights
Example: N = {tasks}, W_N = {task costs}; edge (j,k) in E means task j sends W_E(j,k) words to task k.
Choose a partition N = N_1 U N_2 U ... U N_P such that:
- The sum of the node weights in each N_j is "about the same"
- The sum of the weights of all edges connecting different pairs N_j and N_k is minimized
Example: balance the work load while minimizing communication.
Special case of N = N_1 U N_2: graph bisection.
[Figure: an eight-node weighted example graph]

Summary of Lecture
Summary:
- Regular computations are easier to schedule, more amenable to data-parallel programming models, and easier to program.
- The performance of irregular computations depends heavily on the representation of the data.
- Choosing this representation may require knowledge of the problem that is only available at run time.
Next time:
- Introduction to parallel graph algorithms
- Minimizing bottlenecks