Parallel Graph Partitioning Using Simulated Annealing
Parallel and Distributed Computing I
Sadik Gokhan Caglar

Graph Partitioning Problem
Given a graph G = (N, E) and an integer p, find subsets N_1, N_2, …, N_p such that:
1. N_1 ∪ N_2 ∪ … ∪ N_p = N and N_i ∩ N_j = ∅ for i ≠ j
2. W(i) ≈ W / p for i = 1, 2, …, p, where W(i) and W are the sums of the node weights in N_i and N respectively
3. The cut size (the number of edges whose endpoints lie in different subsets) is minimized

A Partitioned Graph
A partitioned graph with an edge-cut of seven.

Solutions To The Problem
Geometric Algorithms: use the geometric coordinates
– Recursive coordinate (or orthogonal) bisection
– Recursive circle bisection
Structural Algorithms:
– Graph-Walking Algorithms
– Spectral Algorithms
Refinement Algorithms:
– Kernighan-Lin Algorithm
– Simulated Annealing Algorithm

Solutions To The Problem
Multilevel technique:
– Coarsen
– Partition
– Refinement

Simulated Annealing

Implementation of SA
Cost: the number of edges that have vertices in different sets
Acceptance: the new cost is less than the old
Rejection: the new cost is more than the old; a probabilistic calculation with probability e^(-Δcost / Temp) can turn a rejection into an acceptance
Equilibrium: number of rejections < (10 * vertex size of the graph * number of sets)

Implementation of SA
Frozen state: the temperature starts at 1, the cooling constant is 0.95, and the state is considered frozen at temperature 0.2.

currentcost = cost(graph);
printf("The cost of the graph1 is %f \n", currentcost);
while (temp > 0.2) {
    while (reject < (10 * graph.vertexsize * graph.setsize)) {
        makenewgraph(graph, &newgraph);
        tempcost = cost(newgraph);
        if (tempcost < currentcost) {   // lower cost: accept the new graph
            currentcost = tempcost;
            graphfree(&graph);
            graph = newgraph;
        }

Implementation of SA

        else {                          // cost did not improve
            reject++;
            if (tempcost == currentcost)
                prob = e(1, temp);
            else
                prob = e((tempcost - currentcost), temp);
            prob2 = drand48();
            if (prob > prob2) {         // probabilistic acceptance of an uphill move
                currentcost = tempcost;
                graphfree(&graph);
                graph = newgraph;
            } else
                graphfree(&newgraph);
        } // 1st else
    } // reject
    temp = temp * coolconst;
    reject = 0;
    printf("cooled!!! temp = %f \n", temp);
    printf("currentcost %f\n", currentcost);
}
printf("The cost of the graph2 is %f \n", currentcost);
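The helper e(delta, temp) is not shown in the transcript. A minimal sketch, assuming it returns the Metropolis acceptance probability for a cost increase delta at temperature temp:

#include <math.h>

// Hypothetical reconstruction of the e() helper used in the loop above.
// Assumed to return the standard simulated annealing acceptance probability
// exp(-delta / temp); the real definition is not shown in the slides.
double e(double delta, double temp)
{
    return exp(-delta / temp);
}

Since drand48() returns a uniform value in [0, 1), the comparison prob > prob2 then accepts an uphill move with probability e^(-Δcost / Temp), matching the rejection rule on the earlier slide.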

Input File Format

Data Structures

typedef struct Edge {
    int v1;
    int v2;
} Edge;

typedef struct Set {
    int size;
    int* vertex;
} Set;

typedef struct Graph {
    int vertexsize;
    int edgesize;
    int setsize;
    struct Edge* edgelist;
    struct Set* setlist;
} Graph;
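The cost function itself does not appear in the transcript. A minimal sketch of the cost described earlier (the number of edges whose endpoints lie in different sets), assuming a hypothetical setof() helper that finds which set a vertex belongs to:

// Hypothetical set lookup: returns the index of the set containing vertex v,
// or -1 if the vertex is not assigned. Not part of the original slides.
static int setof(Graph g, int v)
{
    for (int s = 0; s < g.setsize; s++)
        for (int i = 0; i < g.setlist[s].size; i++)
            if (g.setlist[s].vertex[i] == v)
                return s;
    return -1;
}

// Sketch of cost(): count the edges that cross between sets, returned as a
// double to match the printf("%f", ...) calls in the annealing loop.
double cost(Graph g)
{
    int cut = 0;
    for (int i = 0; i < g.edgesize; i++)
        if (setof(g, g.edgelist[i].v1) != setof(g, g.edgelist[i].v2))
            cut++;
    return (double)cut;
}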

Parallelization Approach 1
Problem independent: tries to implement a general parallel simulated annealing.
Every process generates a new graph and calculates its cost.
The results are sent to the root process.
The root process chooses the best result and broadcasts it.

Parallelization Approach 1
The array that the root process gathers:
0 – Acceptance (0 = no, 1 = yes, 2 = probabilistic)
1 – Cost
2 – The set number of the first vertex
3 – The set number of the second vertex
4 – The first vertex
5 – The second vertex

Parallelization Approach 1
The array that the root process broadcasts:
0 – Temperature update
1 – Change done
2 – The set number of the first vertex
3 – The set number of the second vertex
4 – The first vertex
5 – The second vertex
6 – The cost of the new graph

Parallelization Approach 1
The equilibrium condition has changed from
    number of rejections < (10 * vertex size of the graph * number of sets)
to
    number of rejections < (10 * vertex size of the graph * number of sets / number of processes).
The rest of the program is the same; the data is not distributed.
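A minimal sketch of the exchange just described, using the six-element result array and seven-element decision array from the previous slides. The use of MPI_Gather and MPI_Bcast, the double-valued packing, and the helpers propose_move(), choose_best(), and apply_move() are assumptions; the slides do not show the actual MPI calls.

#include <mpi.h>
#include <stdlib.h>

// Hypothetical helpers: propose a move and fill the 6-element result array,
// pick the best gathered proposal into the 7-element decision array, and
// apply the broadcast decision locally. None of these appear in the slides.
void propose_move(double result[6]);
void choose_best(const double *all, int nprocs, double decision[7]);
void apply_move(const double decision[7]);

// Sketch of approach 1: every process proposes a move and reports it,
// the root chooses the best result and broadcasts the decision to all.
void sa_step_approach1(int rank, int nprocs)
{
    double result[6], decision[7];
    double *all = NULL;

    if (rank == 0)
        all = malloc((size_t)nprocs * 6 * sizeof(double));

    propose_move(result);
    MPI_Gather(result, 6, MPI_DOUBLE, all, 6, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    if (rank == 0)
        choose_best(all, nprocs, decision);

    MPI_Bcast(decision, 7, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    apply_move(decision);

    if (rank == 0)
        free(all);
}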

Parallelization Approach 2
Problem dependent: works only for the graph partitioning problem.
Most of the work in the graph partitioning problem is calculating the cost of the graph.
Since the cost depends on the number of edges in the graph, the edge array can be scattered across the processes.
Each process only needs the edges it holds to calculate its partial sum, so the cost computation is perfectly parallelizable.

Parallelization Approach 2
After each process calculates its partial sum, an MPI_Reduce with the sum operation computes the total cost.
All of the simulated annealing logic runs on the root process; the other processes only calculate their partial sums. A sketch of this parallel cost calculation follows.
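A minimal sketch of the approach 2 cost calculation, assuming the Edge and Graph structures shown earlier, the hypothetical setof() helper from the cost() sketch, and that the root has already scattered a contiguous block of the edge list to each process; the actual code is not shown in the slides.

#include <mpi.h>

// Sketch of approach 2: each process counts the cut edges in its own slice
// of the scattered edge list, and MPI_Reduce sums the partial counts on the
// root, where the rest of the simulated annealing logic runs.
double parallel_cost(const Edge *local_edges, int local_count, Graph g)
{
    int local_cut = 0, total_cut = 0;

    for (int i = 0; i < local_count; i++)
        if (setof(g, local_edges[i].v1) != setof(g, local_edges[i].v2))
            local_cut++;

    MPI_Reduce(&local_cut, &total_cut, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    return (double)total_cut;   // meaningful only on the root process
}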

ParSA1 16 Nodes

ParSA1 100 Nodes

ParSA1 300 Nodes

ParSA1 500 Nodes

ParSA Nodes

ParSA2 16 Nodes

ParSA2 100 Nodes

ParSA2 300 Nodes

ParSA2 500 Nodes

ParSA Nodes

ParSA Nodes Edges

ParSA Nodes Edges