IDENTIFICATION OF DENSE SUBGRAPHS FROM MASSIVE SPARSE GRAPHS

Presentation on theme: "IDENTIFICATION OF DENSE SUBGRAPHS FROM MASSIVE SPARSE GRAPHS"— Presentation transcript:

1 IDENTIFICATION OF DENSE SUBGRAPHS FROM MASSIVE SPARSE GRAPHS
Presented by Sumeeth.K.C, under the guidance of Mr. Prashantha. N.R. Based on the work of Gibson et al., in Proc. 31st International Conference on Very Large Data Bases, 2005.

2 Overview Introduction. Graph Terminology.
Applications of Dense Subgraph Identification. Algorithm. Observations and Results.

3 Introduction Dense subgraphs are groups of nodes that are so heavily interlinked that the number of edges among them is close to the maximum possible number of edges. Dense subgraph extraction is therefore a natural tool for analysing a network thoroughly. Although many tools exist to identify and extract dense subgraphs, they do not perform satisfactorily once the number of nodes grows beyond a few tens of nodes. Gibson et al. developed a new algorithm that handles huge graphs of millions of nodes with a reasonable amount of resources. The algorithm computes fingerprints of the nodes from their sets of outlinks and compares the fingerprints to find dense subgraphs.

4 Graph Terminology A directed graph G=(V,E) consists of a set V of nodes or vertices and a set E of edges, where each edge is an ordered pair of nodes. A dense graph is a graph G=(V,E) in which |E| = Θ(|V|^2). A sparse graph is a graph G=(V,E) in which |E| = O(|V|). A connected component in an undirected graph is a subset of nodes such that for every pair of nodes in the subset, there is an undirected path between the pair. A host graph is a directed graph in which the nodes are hosts and an edge runs from a source host to a destination host whenever a page on the source host links to a page on the destination host.
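The host graph definition above can be illustrated with a minimal Python sketch. The function name `build_host_graph` and the sample URLs are hypothetical, and skipping links that stay within one host is an assumption of this sketch; the slides do not say how such links are treated.

```python
from urllib.parse import urlparse


def build_host_graph(page_links):
    """Collapse page-to-page links into host-to-host edges.

    page_links: iterable of (source_url, destination_url) pairs.
    Returns a dict mapping each source host to the set of hosts it links to.
    """
    host_graph = {}
    for source_url, destination_url in page_links:
        source_host = urlparse(source_url).netloc
        destination_host = urlparse(destination_url).netloc
        # Assumption: links that stay inside one host are ignored.
        if source_host == destination_host:
            continue
        host_graph.setdefault(source_host, set()).add(destination_host)
    return host_graph


links = [
    ("http://a.com/1", "http://b.com/x"),
    ("http://a.com/2", "http://b.com/y"),  # same host pair, so no new edge
    ("http://a.com/2", "http://a.com/3"),  # intra-host link, skipped
    ("http://b.com/x", "http://c.org/"),
]
graph = build_host_graph(links)
```

Many page-level links collapse into a single host-level edge, which is why the host graph is much smaller than the page graph it is derived from.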

5 Applications
Social Network Analysis: analysing the density of social networks.
Telecom Call Record Analysis: analysing call networks for business and management purposes.
Link Spam Detection: the algorithm can be incorporated in search engines to combat the problem of link spam.
Covert Network Analysis: the most famous example of this is the analysis of the 9/11 attack.

6 A Real World Example

7 Description In the network map above, the hijackers are color coded by the flight they were on. The dark grey nodes are others who were reported to have had direct, or indirect, interactions with the hijackers. The gray lines indicate the reported interactions -- a thicker line indicates a stronger tie between two nodes. Notice the clustering around the pilots.

8 Shingling Algorithm This algorithm takes as input the set of outlinks A1,...,An of a node and the parameters s and c, which affect the density.
Let H be a hash function from strings to integers.
Select a very large prime number p, say 32 bits.
Select random numbers a1,b1,...,ac,bc in [1..p].
For i=1 to n do Xi=H("Ai").
For j=1 to c do:
  For i=1 to n do Yi=(aj*Xi+bj) mod p.
  Let Y1,...,Ys be the s minimal elements of Y.
  Let Zj=H("Y1...Ys").
Output Z1,...,Zc.
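The steps of this slide can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: SHA-1 stands in for the unspecified hash function H, p is taken to be the largest prime below 2^32, and the `seed` parameter is added only so that every node is shingled with the same random pairs (aj, bj).

```python
import hashlib
import random

P = 4294967291  # largest prime below 2**32 (assumed choice for p)


def h(text):
    """Stand-in for H: hash a string to an integer (SHA-1 is an assumption)."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest()[:8], "big")


def shingle(outlinks, s, c, seed=0):
    """Return the c shingles Z1..Zc of a set of outlinks."""
    rng = random.Random(seed)
    # c random pairs (a_j, b_j); drawn from [1, P-1] to avoid a_j = 0 mod P.
    pairs = [(rng.randint(1, P - 1), rng.randint(1, P - 1)) for _ in range(c)]
    xs = [h(str(link)) for link in set(outlinks)]  # X_i = H(A_i)
    shingles = []
    for a_j, b_j in pairs:
        ys = sorted((a_j * x + b_j) % P for x in xs)  # Y_i = (a_j*X_i + b_j) mod p
        smallest = ys[:s]  # the s minimal elements (fewer if the set is small)
        shingles.append(h(" ".join(str(y) for y in smallest)))  # Z_j
    return shingles


z_first = shingle(["x.com", "y.com", "z.com"], s=2, c=4)
z_second = shingle(["z.com", "y.com", "x.com"], s=2, c=4)
```

Two nodes with the same outlink set always produce the same shingles, regardless of the order in which the outlinks are listed. Nodes whose outlink sets overlap heavily are likely to share at least some shingles.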

9 Recursive Shingling This algorithm takes as input the graph along with the density-affecting parameters (s1,c1) and (s2,c2). It first applies an (s1,c1) shingling algorithm to the outlinks of each node and then applies an (s2,c2) shingling algorithm to the first-level shingles. Initially the memory layout of the graph will be O(n*n). After applying the above algorithm the memory layout decreases to O(c*C). The output of the algorithm is the set of second-level shingles.
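A rough Python sketch of the two passes is given below. It assumes that the second pass shingles, for each first-level shingle, the set of nodes that produced it; the slide does not spell this out. The names `recursive_shingle`, `first` and `second`, the SHA-1 hash and the prime are all choices made for this sketch.

```python
import hashlib
import random
from collections import defaultdict

P = 4294967291  # largest prime below 2**32 (assumed choice for p)


def h(text):
    """Stand-in hash from strings to integers (SHA-1 is an assumption)."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest()[:8], "big")


def shingle(items, s, c, seed):
    """(s, c)-shingling of a set of items; returns c integer shingles."""
    rng = random.Random(seed)
    pairs = [(rng.randint(1, P - 1), rng.randint(1, P - 1)) for _ in range(c)]
    xs = [h(str(item)) for item in set(items)]
    out = []
    for a_j, b_j in pairs:
        smallest = sorted((a_j * x + b_j) % P for x in xs)[:s]
        out.append(h(" ".join(str(y) for y in smallest)))
    return out


def recursive_shingle(graph, s1, c1, s2, c2):
    """graph maps each node to its outlinks.

    Returns (first, second): first maps a first-level shingle to the nodes
    that produced it; second maps a second-level shingle to the first-level
    shingles that share it.
    """
    first = defaultdict(set)
    for node, outlinks in graph.items():
        for z1 in shingle(outlinks, s1, c1, seed=1):
            first[z1].add(node)
    second = defaultdict(set)
    for z1, nodes in first.items():
        # Assumed second pass: shingle the node set behind each first-level shingle.
        for z2 in shingle(nodes, s2, c2, seed=2):
            second[z2].add(z1)
    return dict(first), dict(second)


graph = {"a": ["x", "y", "z"], "b": ["x", "y", "z"], "c": ["p", "q", "r"]}
first, second = recursive_shingle(graph, s1=2, c1=3, s2=1, c2=2)
```

In this toy graph, nodes "a" and "b" have identical outlinks, so every first-level shingle of "a" is also a shingle of "b".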

10 Connected Components Algorithm
Algorithm CC() takes as input all the second-level shingles, each with the first-level shingles that share it. It uses the classical union-find data structure, which initially consists of singleton sets. For every edge (u,v), the sets containing u and v are merged. Its output is the clusters formed by the connected components.
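The union-find step can be sketched in Python as below. Treating each group of first-level shingles that share a second-level shingle as a set of items to merge is an assumption of this sketch (an edge (u,v) is simply a group of two), and the name `connected_components` is illustrative.

```python
def connected_components(groups):
    """Merge items that appear together in a group; return the clusters.

    groups: iterable of iterables of hashable items, for example the
    first-level shingles listed under each second-level shingle.
    """
    parent = {}

    def find(x):
        # Each unseen item starts as its own singleton set.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for group in groups:
        members = list(group)
        for member in members:
            find(member)  # register every item, including lone ones
        for other in members[1:]:
            root_a, root_b = find(members[0]), find(other)
            if root_a != root_b:
                parent[root_a] = root_b  # union the two sets

    clusters = {}
    for item in list(parent):
        clusters.setdefault(find(item), set()).add(item)
    return list(clusters.values())


components = connected_components([[1, 2], [2, 3], [4, 5], [6]])
```

Here items 1, 2 and 3 end up in one cluster because the groups [1, 2] and [2, 3] overlap, while 4 and 5 form a second cluster and 6 stays alone.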

11 Features of the Algorithm
The algorithm can handle a large number of nodes and can identify large dense subgraphs. The shingling algorithm can be applied again to the first-level shingles to obtain a hierarchy of subgraphs. It uses little memory and few computational resources. It is scalable compared to other algorithms for this problem.

12 Observations and Results
The algorithm was applied to a large collection of densely interlinked websites, and the host graph was studied. The data set used by the authors was provided by IBM's WebFountain project and contains over 2 billion pages and 50 million sites. After the second-level shingling and the application of the connected components algorithm, it produced 2.8 million connected components. Figures are from the work of Gibson et al.

