Finding dense components in weighted graphs Paul Horn 12-2-02.


Overview
- Addressing the problem
  - What is the problem
  - How it differs from other, already solved problems
- Building a solution
  - Already existing research
  - Preliminary work
  - Final solution

Overview: The Sequel
- Analysis
  - Testing
  - Effectiveness
  - Time complexity
- Future work
  - Trimming the data set more
  - Linking it with real data

The problem
- To find dense subgraphs of a graph
  - Not just the densest
  - Not necessarily all, but as many as possible of the subgraphs that are 'dense enough'
- The idea is to identify communities based on a communications network
  - The denser the communication within a subgraph, the more likely it is a community

Why is it hard
- The fastest flow-based methods for finding the single densest subgraph are cubic or worse
- We want more than one dense subgraph
- The greedy approximation algorithm is destructive (it deletes vertices as it runs) and thus returns only one subgraph
- The problem becomes harder when we allow subgraphs to overlap

Weighty Ideas
- Input graphs to the algorithm are weighted
- Edge weights represent the intensity of a communication
  - Intensity represents the duration and frequency of a communication
- This requires a new definition of density
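As a hedged illustration of how such weights might be derived, the sketch below assumes a hypothetical event record `(sender, receiver, duration)` and sets the weight of an undirected pair to the total duration of all communications between them, so that both more frequent and longer exchanges raise the weight. The record format and the choice of summing durations are assumptions, not part of the original work.

```python
from collections import defaultdict

def build_weighted_graph(events):
    """events: iterable of (sender, receiver, duration), a hypothetical format.

    Returns {(u, v): weight} with u <= v (undirected). The weight is the summed
    duration, so it grows with both the frequency and the length of contact.
    """
    weights = defaultdict(float)
    for a, b, duration in events:
        if a == b:
            continue  # ignore self-communication
        key = (a, b) if a <= b else (b, a)
        weights[key] += duration
    return dict(weights)
```

For example, two calls between alice and bob of lengths 5 and 3 produce a single edge of weight 8.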

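How dense can it get?
- Recall our old definition of density: for a vertex set $S$, $d(S) = \frac{|E(S)|}{|S|}$, where $E(S)$ is the set of edges with both endpoints in $S$
- We modify it to give a notion of density of a weighted graph: $d_w(S) = \frac{\sum_{e \in E(S)} w(e)}{|S|}$
- Note that if the weight of all edges is one, the two definitions coincide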
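A minimal Python sketch of the unweighted and weighted density of a vertex set, assuming edges are given as `(u, v, weight)` triples; the function names are illustrative only.

```python
def unweighted_density(edges, S):
    """d(S) = |E(S)| / |S|: edges with both endpoints in S, ignoring weights."""
    S = set(S)
    if not S:
        return 0.0
    return sum(1 for u, v, _ in edges if u in S and v in S) / len(S)

def weighted_density(edges, S):
    """d_w(S): total weight of edges with both endpoints in S, divided by |S|."""
    S = set(S)
    if not S:
        return 0.0
    return sum(w for u, v, w in edges if u in S and v in S) / len(S)
```

With every weight equal to one the two functions return the same value, matching the note on the slide.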

Done before?
- Discussed in the Charikar paper presentation
- Goldberg, A.V., Finding a Maximum Density Subgraph
  - A flow-based maximum density subgraph algorithm
- Charikar, Greedy Approximation Algorithms for Finding Dense Components in a Graph
  - Presented a linear-time greedy 2-approximation algorithm
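The greedy peeling idea from Charikar's paper repeatedly deletes a vertex of minimum degree and remembers the densest intermediate subgraph. The sketch below is an assumed weighted variant (it peels by weighted degree) that uses a lazy-deletion heap, so it runs in O((|V| + |E|) log |V|) rather than linear time. It also shows why the method is destructive: the graph is consumed and a single subgraph comes back.

```python
import heapq
from collections import defaultdict

def greedy_peel(edges):
    """edges: iterable of (u, v, w) with u != v. Returns (best density, vertex set)."""
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = adj[u].get(v, 0) + w
        adj[v][u] = adj[v].get(u, 0) + w
    if not adj:
        return 0.0, frozenset()
    deg = {u: sum(nbrs.values()) for u, nbrs in adj.items()}  # weighted degree
    total = sum(deg.values()) / 2  # total edge weight of the remaining graph
    alive = set(adj)
    heap = [(d, u) for u, d in deg.items()]
    heapq.heapify(heap)
    order = []  # vertices in the order they are peeled
    best_density, best_k = total / len(alive), 0
    while len(alive) > 1:
        d, u = heapq.heappop(heap)
        if u not in alive or d != deg[u]:
            continue  # stale heap entry
        alive.remove(u)
        total -= deg[u]
        for v, w in adj[u].items():
            if v in alive:
                deg[v] -= w
                heapq.heappush(heap, (deg[v], v))
        order.append(u)
        if total / len(alive) > best_density:
            best_density, best_k = total / len(alive), len(order)
    # the best subgraph is everything except the first best_k peeled vertices
    return best_density, frozenset(adj) - frozenset(order[:best_k])
```

On a heavy triangle attached by a light edge to a light triangle, the light side is peeled first and the heavy triangle is reported.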

Preliminary Work
- An implementation of Goldberg's and Charikar's algorithms
  - In test data (generated in a dual-probability Erdős-Rényi model), Charikar's algorithm identified a subgraph close to the actual densest subgraph
  - These graphs, however, were unweighted and thus ignored the weighted requirement, and each contained only one dense subgraph
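The phrase "dual-probability Erdős-Rényi model" is read here, as an assumption, as a planted-partition generator: pairs inside a hidden dense set are joined with probability `p_in`, all other pairs with a smaller `p_out`. A minimal sketch; the function name and parameters are hypothetical.

```python
import random

def planted_dense_graph(n, dense_size, p_in, p_out, seed=None):
    """Assumed reading of the test-data model: vertices 0..dense_size-1 form a
    planted set whose internal pairs get an edge with probability p_in; every
    other pair gets an edge with probability p_out. Returns (edges, planted set)."""
    rng = random.Random(seed)
    dense = set(range(dense_size))
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            p = p_in if (u in dense and v in dense) else p_out
            if rng.random() < p:
                edges.append((u, v))
    return edges, dense
```

Running a densest-subgraph method on such a graph and comparing its output with `dense` is one way to reproduce the kind of check described on the slide.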

A First Attempt
- A modification of Charikar's algorithm for weighted graphs
- At each step, remove a random edge of lowest weight
- Then find all connected components
- Recurse down on each component, and return the maximum-density subgraph
- By repeated executions of the algorithm, the hope is that different dense components, which may overlap, will be revealed
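One possible reading of this first attempt, sketched in Python under the assumption that edges are stored as a dict `{(u, v): weight}`. An explicit work stack stands in for the recursion so large graphs do not hit Python's recursion limit. Ties among lowest-weight edges are broken with the supplied `random.Random`, so repeated runs with different seeds can surface different components.

```python
import random
from collections import defaultdict

def density(vertices, edges):
    """Weighted density: total edge weight divided by the number of vertices."""
    return sum(edges.values()) / len(vertices) if vertices else 0.0

def components(vertices, edges):
    """Connected components (as sets) of the graph (vertices, edges)."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, comps = set(), []
    for s in vertices:
        if s in seen:
            continue
        seen.add(s)
        comp, stack = {s}, [s]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    comp.add(y)
                    stack.append(y)
        comps.append(comp)
    return comps

def edge_peeling(vertices, edges, rng):
    """Remove a random lowest-weight edge, split into components, and repeat on
    each component; return the densest (density, vertex set) seen anywhere."""
    best = (0.0, frozenset())
    work = [(set(vertices), dict(edges))]
    while work:
        vs, es = work.pop()
        d = density(vs, es)
        if d > best[0]:
            best = (d, frozenset(vs))
        if not es:
            continue
        lowest = min(es.values())
        victim = rng.choice([e for e, w in es.items() if w == lowest])
        del es[victim]
        for comp in components(vs, es):
            if len(comp) >= 2:  # isolated vertices have density 0
                work.append((comp, {e: w for e, w in es.items() if e[0] in comp}))
    return best
```

Because heavy edges are only removed after every lighter edge in their component, heavily weighted clusters tend to survive intact until they are split off.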

Seems Promising, but…
- In test cases generated similarly to those used to test Charikar's and Goldberg's algorithms, it successfully identified most, if not all, of the dense portions
- In simulated communication-network data, the graph was dense enough that large areas of it were denser than the smaller dense portions, so the smaller portions were not found
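A short calculation shows why a large region can outrank a small tight cluster under the density $|E(S)|/|S|$. A clique on $k$ vertices (unit weights) and a region with average degree $\bar{\delta}$ have

$$ d(K_k) = \frac{\binom{k}{2}}{k} = \frac{k-1}{2}, \qquad d(S) = \frac{\bar{\delta}\,|S|/2}{|S|} = \frac{\bar{\delta}}{2}. $$

With illustrative numbers (not taken from the simulated data), a 5-clique has density 2, while any region with average degree 6 has density 3, however large and loosely knit it is.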

Partitioning?
- By partitioning optimally, i.e. by finding a cut of minimum size, we can increase the density of the graph (to some extent)
  - Since we cut edges of low weight, the edges of high weight remain within each of the partitions
  - This (obviously) doesn't work forever
  - However, knowing approximately what size we want, we can find ideal candidates
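One way to make "to some extent" precise: write $w(S)$ for the total weight of edges inside $S$, $d(S) = w(S)/|S|$, and $c$ for the total weight of edges cut when $S$ is split into disjoint parts $A$ and $B$. Then

$$ |A|\,d(A) + |B|\,d(B) = w(A) + w(B) = w(S) - c = |S|\,d(S) - c, $$

$$ \max\{d(A), d(B)\} \ge \frac{|A|\,d(A) + |B|\,d(B)}{|S|} = d(S) - \frac{c}{|S|}. $$

A light cut therefore costs little density on average, and the denser side can exceed $d(S)$. Once the parts become smaller than the dense structure being sought, any split has to cut through it, $c$ becomes large relative to $w(S)$, and further partitioning stops helping.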

Rethinking our algorithm
- Partitioning-based algorithm idea
  - Uses Kernighan-Lin to find close-to-optimal partitions
  - Recurses down on the partitions until they are of the desired size
  - The densest of the remaining partitions are our output
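A minimal sketch of a weighted Kernighan-Lin bisection pass in Python. It assumes a weighted adjacency dict `adj[u][v] = weight`, uses the naive pair search (cubic per pass rather than the faster bookkeeping of optimized implementations), and breaks ties by sorted vertex order so results are reproducible. The helper names are hypothetical.

```python
from collections import defaultdict

def build_adj(edges):
    """Weighted adjacency: adj[u][v] = weight of edge {u, v}."""
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = adj[u].get(v, 0) + w
        adj[v][u] = adj[v].get(u, 0) + w
    return adj

def cut_weight(adj, A, B):
    return sum(w for u in A for v, w in adj[u].items() if v in B)

def kernighan_lin(adj, A, B, max_passes=10):
    """Refine the bisection (A, B) with Kernighan-Lin passes; returns new (A, B)."""
    A, B = set(A), set(B)
    for _ in range(max_passes):
        side = {v: 0 for v in A}
        side.update({v: 1 for v in B})
        # D[v] = (weight to the other side) - (weight to its own side)
        D = {v: sum(w if side[u] != side[v] else -w
                    for u, w in adj[v].items() if u in side) for v in side}
        free_a, free_b = set(A), set(B)
        gains, swaps = [], []
        for _ in range(min(len(A), len(B))):
            # pick the unlocked pair whose swap reduces the cut the most
            best = None
            for a in sorted(free_a):
                for b in sorted(free_b):
                    g = D[a] + D[b] - 2 * adj[a].get(b, 0)
                    if best is None or g > best[0]:
                        best = (g, a, b)
            g, a, b = best
            free_a.discard(a)
            free_b.discard(b)
            gains.append(g)
            swaps.append((a, b))
            # update D of still-free vertices as if a and b had been swapped
            for x in free_a:
                D[x] += 2 * adj[x].get(a, 0) - 2 * adj[x].get(b, 0)
            for y in free_b:
                D[y] += 2 * adj[y].get(b, 0) - 2 * adj[y].get(a, 0)
        # apply the prefix of swaps with the largest positive total gain
        best_k, best_sum, running = 0, 0, 0
        for i, g in enumerate(gains):
            running += g
            if running > best_sum:
                best_k, best_sum = i + 1, running
        if best_k == 0:
            break  # no improving prefix: local optimum reached
        for a, b in swaps[:best_k]:
            A.remove(a)
            B.add(a)
            B.remove(b)
            A.add(b)
    return A, B
```

Starting from a bisection that mixes two heavy 4-cliques, one pass moves each clique to its own side, leaving only the light bridge edge in the cut.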

Finalizing our thought
- Run the algorithm on more than one partition
  - Random partitions are likely to be close to orthogonal
- Generate k partitions, and take the best l partitions (after KL is applied) at the top level
- On each other level, generate k partitions, and take the top one
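The k/l scheme above can be sketched as follows, assuming "best" means "smallest cut weight" and that recursion stops once a part has at most `target` vertices. To keep the block self-contained, the refinement step is a pluggable `refine(adj, A, B)` callable that defaults to a hypothetical no-op; a Kernighan-Lin pass would be passed in its place. Because the l top-level bisections differ, their leaves can overlap.

```python
import random

def density(adj, S):
    """Total weight of edges inside S divided by |S| (each edge is seen twice)."""
    inside = sum(w for u in S for v, w in adj[u].items() if v in S) / 2
    return inside / len(S)

def cut_weight(adj, A, B):
    return sum(w for u in A for v, w in adj[u].items() if v in B)

def no_refine(adj, A, B):
    return A, B  # placeholder: plug a Kernighan-Lin pass in here

def best_bisections(adj, vertices, k, keep, refine, rng):
    """Try k random balanced bisections, refine each, keep the `keep` lightest cuts."""
    scored = []
    for _ in range(k):
        order = sorted(vertices)
        rng.shuffle(order)
        half = len(order) // 2
        A, B = refine(adj, set(order[:half]), set(order[half:]))
        scored.append((cut_weight(adj, A, B), A, B))
    scored.sort(key=lambda t: t[0])
    return [(A, B) for _, A, B in scored[:keep]]

def split_down(adj, vertices, target, k, refine, rng, out):
    """Below the top level, keep only the single best of k bisections."""
    if len(vertices) <= target:
        out.append(frozenset(vertices))
        return
    A, B = best_bisections(adj, vertices, k, 1, refine, rng)[0]
    split_down(adj, A, target, k, refine, rng, out)
    split_down(adj, B, target, k, refine, rng, out)

def dense_candidates(adj, target, k=5, l=2, refine=no_refine, seed=0):
    """Top level: keep the l best of k bisections, then recurse to `target` size."""
    assert target >= 1
    rng = random.Random(seed)
    vertices = set(adj)
    if len(vertices) <= target:
        return [frozenset(vertices)]
    leaves = []
    for A, B in best_bisections(adj, vertices, k, l, refine, rng):
        split_down(adj, A, target, k, refine, rng, leaves)
        split_down(adj, B, target, k, refine, rng, leaves)
    # leaves from different top-level bisections may overlap
    return sorted(set(leaves), key=lambda S: density(adj, S), reverse=True)
```

The returned list is ordered from densest to sparsest, so "the densest of the partitions left" is simply a prefix of it.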

Analyzing the Situation
- The 2-approximation bound that we had for Charikar's greedy algorithm no longer necessarily holds for the KL-based algorithm
- The algorithm has met with some success in identifying clusters in simulated data, but needs more tuning with respect to size and to the trimming of the data set
  - By trimming out small partitions that are found and are similar to one another, we reduce overlap
  - It may now find too many graphs, or incorrect graphs, but this problem can be relieved by keeping only the small portions above a certain density (say, some percentage of the final)
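A hedged sketch of the trimming step: candidates are ranked by density, anything below a fixed fraction of the best density is dropped (one possible reading of "some percentage"), and a candidate is skipped if its Jaccard overlap with an already-kept candidate exceeds a threshold. The Jaccard measure, both thresholds, and the `density` callable are assumptions for illustration.

```python
def jaccard(a, b):
    """Overlap of two vertex sets: |a & b| / |a | b|."""
    return len(a & b) / len(a | b)

def trim(candidates, density, min_fraction=0.5, max_overlap=0.6):
    """Keep dense, mutually dissimilar candidates.

    density: callable mapping a vertex set to its density.
    min_fraction: drop candidates below this fraction of the best density.
    max_overlap: drop candidates too similar to one already kept.
    """
    ranked = sorted(candidates, key=density, reverse=True)
    if not ranked:
        return []
    cutoff = min_fraction * density(ranked[0])
    kept = []
    for S in ranked:
        if density(S) < cutoff:
            break  # everything after this is even sparser
        if all(jaccard(S, T) <= max_overlap for T in kept):
            kept.append(S)
    return kept
```

Processing candidates in density order means that, of two near-duplicates, the denser one is the one kept.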

Time it.
- The original modification of Charikar's algorithm runs in approximately $O(|V||E|)$ time
- The new algorithm runs in approximately $O(kl\,|V|^2 \log |V|)$ time
  - $k$ and $l$ come from generating $k$ partitions each time and picking the top $l$ at each step
  - $|V|^2$ is a result of Kernighan-Lin
  - $\log |V|$ is the result of continuing to partition
- In practice it runs very fast: partitioning graphs of size … vertices is possible in a reasonable amount of time

In the future
- The algorithm still needs to trim the partitions it finds more effectively, and specifically needs to find partitions of more variable size
  - It could perhaps trim based on the density of the entire graph, or based on a maximum density subgraph (as found by the modified Charikar algorithm)
  - It already finds graphs of many sizes, but only considers the smallest at the end, so it could be modified to include more of the larger partitions

In the Future II
- Future data will not be simulated, but will instead come from online sources
  - Running on a newsgroup-induced graph, for instance, can hopefully help identify groups interested in particular topics
  - Finding graphs based on … or on portions of the web graph could help identify groups of friends or topic-related sites as well, and thus help predict communities

So What?
- By looking not just at a single graph but at a series of time-based graphs, we can identify communities and how they change over time
- Using this method, we can hope to identify rules that govern the changes of these communities and make predictions about their future actions
- The simulated data used was designed with this end in mind

Summing Up
- Finding multiple dense subgraphs of a graph is a relatively unexplored topic, especially finding dense subgraphs of large graphs (where exact algorithms are impractical)
- Prior work (such as Goldberg's and Charikar's) centered on finding a single densest subgraph

Summing down
- The first algorithm, a modification of Charikar's, centered on removing edges and finding connected components
- The second algorithm is based on the Kernighan-Lin algorithm for finding near-optimal partitions, recursing down to find small subgraphs that are separated by cutting a small number of low-weight edges

The Summing
- Still work to do:
  - Linking it back to the real data
    - Internet data from newsgroups, …, etc.
    - Using that to find communities over time
    - Finding microlaws that govern them based on how the communities change over time
  - Finding better ways to trim data to ensure that the best candidates are found