Measuring and Extracting Proximity in Networks. By Yehuda Koren, Stephen C. North and Chris Volinsky - Rahul Sehgal.

Presentation transcript:

Introduction
- Network
- Information hidden in a network
- What is proximity?
- Why do we need proximity in a network?
- Methods for measuring and extracting proximities in a network
- CFEC (cycle-free effective conductance) and how to compute the cycle-free escape probability
- Extracting proximity through proximity graphs obtained by CFEC
- Working with large networks
- Experiments
- Conclusion
- Questions

Network
- A collection of nodes
- A collection of edges (links)
- The edges help determine proximity in a given network.
[Figure: example network; labels include node 4, node 5 and node 6]

Hidden information in a network
[Figure: example network on nodes A, B, C, D, E and F]

If two people speak on the phone to many common friends, the probability is high that they will talk to each other in the future, or perhaps that they already communicate through some other medium.
[Figure: example network on nodes A, B, C, D, E and F]

Hidden information in a network
If two nodes are connected to a common node, they may have a strong relationship, or they may have no relationship at all. For example, in a telephone network many people call the same service center, yet two of those callers may not know each other at all.
[Figure: example network on nodes A, B, C, D, E and F]

What is proximity?
- Proximity is a measure of distance or closeness between different objects.
- It is a way of finding hidden relationships between objects.
- It is a way of finding similarities, and it helps in clustering objects or nodes.

Need for proximity
- It measures the potential information exchange between two non-linked objects through intermediaries.
- It can measure the extent to which two nodes belong together.
- It helps estimate the likelihood that a link will exist in the future.
- In social network settings, proximity helps to predict or track the propagation of an idea, product or disease.
- It helps in discovering unexpected communities in a network.

Various Methods
Graph-theoretic distance:
- It is the length of the shortest path connecting two nodes, measured either by the number of hops or by the sum of the edge weights along the path.
Premise and limitations:
- Proximity decays as nodes become farther apart.
- Information may be lost due to friction or noise at a particular node.
- The method does not account for proximity that arises via multiple paths.
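The two ways of measuring graph-theoretic distance can be sketched as follows. This is a minimal illustration, not code from the presentation: the function name `shortest_distance`, the dictionary layout and the example graph are all assumptions, and edge weights are treated as lengths (costs).

```python
import heapq

def shortest_distance(graph, s, t, use_weights=True):
    """Return the graph-theoretic distance from s to t, or None if unreachable.

    graph: dict mapping node -> {neighbor: edge weight, treated as a length}.
    With use_weights=False every edge counts as a single hop.
    """
    dist = {s: 0}
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == t:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry, a shorter route was already found
        for v, w in graph[u].items():
            nd = d + (w if use_weights else 1)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return None

# Hypothetical example: a direct but costly edge s-t, plus a cheap 3-hop detour.
example = {
    "s": {"a": 1, "t": 5},
    "a": {"s": 1, "b": 1},
    "b": {"a": 1, "t": 1},
    "t": {"s": 5, "b": 1},
}
hops = shortest_distance(example, "s", "t", use_weights=False)
length = shortest_distance(example, "s", "t")
```

Either way, only one path contributes to the result, which is the limitation noted in the last bullet.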

Various Methods (contd.)
Network flow (maximal network flow):
- A limited capacity is assigned to each edge, depending on the weight of that edge. We then compute the maximal number of units that can be delivered simultaneously from node s to node t.
- It prefers high-weight edges and captures the premise that an increasing number of alternative paths increases the proximity. For example, consider the adjacent figure.
[Figure: example networks on nodes A, B, a1 and b1]
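A small sketch of the maximal-flow computation may help here. It uses the Edmonds-Karp method (breadth-first augmenting paths), which is one standard way to compute a maximum flow and is not necessarily what the authors used; the function name `max_flow` and the toy graphs are assumptions made for illustration.

```python
from collections import deque

def max_flow(capacity, s, t):
    """Edmonds-Karp maximum flow.

    capacity: dict mapping node -> {neighbor: capacity of the directed edge}.
    """
    # Residual graph: copy the capacities and add reverse edges with capacity 0.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    total = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total  # no augmenting path is left
        # Find the bottleneck capacity along the path that was found.
        bottleneck = float("inf")
        v = t
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push the bottleneck amount along the path.
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck

# Hypothetical examples: one path, two parallel paths, and one long path.
one_path = {"s": {"a": 1}, "a": {"t": 1}, "t": {}}
two_paths = {"s": {"a": 1, "b": 1}, "a": {"t": 1}, "b": {"t": 1}, "t": {}}
long_path = {"s": {"a": 1}, "a": {"b": 1}, "b": {"c": 1}, "c": {"t": 1}, "t": {}}
```

In these toy graphs, adding a second path raises the flow, while lengthening a path leaves it unchanged.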

Various Methods (contd.)
Network flow (maximal network flow), limitations:
- It disregards the length of the path.
- The maximal s-t flow (the flow from node s to node t) in a graph equals the minimal s-t cut, that is, the minimal edge capacity that must be removed to disconnect s from t. The measure is therefore determined by a bottleneck and is not robust.

Various Methods (contd.)
Effective conductance (EC):
- The network is modeled as an electric circuit by treating the edges as resistors whose conductance is the given edge weight.
- The voltage of the starting node s is set to 1 and the voltage of the end node t is set to 0. We then solve the resulting linear equations to obtain the voltage at each node and the current on each edge.
- It accounts for both path length and the number of alternative paths.
- It avoids dependence on a single shortest path and on bottlenecks.
- It has a monotonicity property: in an electrical resistor network, increasing the conductance of any resistor or adding resistors can only increase the conductance between any two nodes.
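The circuit computation described above can be sketched in plain Python. This is an illustrative sketch under stated assumptions: the graph is undirected and connected, it is given as a symmetric dictionary of conductances, and the helper names `solve` and `effective_conductance` are made up for this example.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def effective_conductance(weights, s, t):
    """weights: dict node -> {neighbor: conductance}, symmetric and connected."""
    interior = [v for v in weights if v not in (s, t)]
    index = {v: k for k, v in enumerate(interior)}
    n = len(interior)
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    # Kirchhoff's current law at every interior node i:
    #   sum_j w_ij * (V_i - V_j) = 0, with V_s = 1 and V_t = 0 fixed.
    for i in interior:
        r = index[i]
        for j, w in weights[i].items():
            A[r][r] += w
            if j == s:
                b[r] += w            # known term w * V_s with V_s = 1
            elif j != t:             # V_t = 0 contributes nothing
                A[r][index[j]] -= w
    solution = solve(A, b)
    voltage = {s: 1.0, t: 0.0}
    for v in interior:
        voltage[v] = solution[index[v]]
    # The current leaving s equals the effective conductance (unit voltage drop).
    return sum(w * (1.0 - voltage[j]) for j, w in weights[s].items())

# Hypothetical examples with unit conductances.
series = {"s": {"a": 1}, "a": {"s": 1, "t": 1}, "t": {"a": 1}}
parallel = {"s": {"a": 1, "b": 1}, "a": {"s": 1, "t": 1},
            "b": {"s": 1, "t": 1}, "t": {"a": 1, "b": 1}}
dead_end = {"s": {"a": 1}, "a": {"s": 1, "t": 1, "c": 1},
            "c": {"a": 1}, "t": {"a": 1}}
```

In these toy circuits a second path doubles the conductance, while attaching a dead-end node `c` to the series circuit leaves it unchanged, because no current flows into a dead end.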

Various Methods (contd.)
Limitations of effective conductance:
- Monotonicity has its limitations. Consider the following example.
[Figure: example networks with nodes s, t, s1, t1 and a1, marked as having the same EC]

Various Methods (contd.)
Sink-augmented effective conductance:
- Each node in the network is connected to a universal sink, which is held at voltage zero.
- This universal sink competes with the node t.
- The universal sink "taxes" each node by absorbing a portion of its outgoing current. Consequently, every node has degree greater than 1, so the problem with degree-1 nodes no longer arises.

Various Methods (contd.)
Limitations of sink-augmented effective conductance:
- It requires deciding how much current the sink draws at each node. Understanding how such a parameter influences the proximity is a difficult task.
- It destroys monotonicity. Whenever a new node is added to the network, it has a direct link to the sink but not to t. This strengthens the sink, which competes with node t, and so the proximity between s and t is reduced.
So we look for a solution that can overcome the limitations of the above methods.

Random Walk
Definition:
- A random walk is a sequence of transitions from one state to another in which each transition depends only on the current state, not on earlier states. A transition may also lead back to the same state.
[Figure: network with nodes s, t, a1, a2, a3 and a4]

Random Walk
- In the network-proximity setting, a random walk makes an unbounded number of attempts to get from the starting node s to the end node t. While traversing the network, the walk may return to the starting node s.
[Figure: network with nodes s and t]

Random Walk Explanation
- During a random walk in a network, the walk may return to s without reaching t.
- As the diagram shows, the walk may return to s via the path s-a1-a2-a3-s without going to t.
- This is a cyclic path that leads back to the starting point s.

Random Walk Diagram
[Figure: network with nodes s, t, a1, a2, a3, a4 and a5]

What is our goal?
We want to improve the effective conductance measure by avoiding cyclic paths.

CFEC (Cycle-Free Effective Conductance)
- It is based on the random-walk interpretation.
- Definition: the cycle-free escape probability from s to t is the probability that a random walk originating at s will reach t without visiting any node more than once.
- In a random walk, the probability of a transition from node i to node j is $p_{ij} = w_{ij} / \deg_i$, where $p_{ij}$ is the probability of transition from node i to node j, $w_{ij}$ is the weight of the edge from i to j, and $\deg_i$ is the degree of node i.

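CFEC (Cycle-Free Effective Conductance)
- The probability that a random walk starting at v1 follows the path P = v1-v2-...-vr is the product of the transition probabilities along the path: $\mathrm{Prob}(P) = \prod_{i=1}^{r-1} \frac{w_{v_i v_{i+1}}}{\deg_{v_i}}$, where $w_{v_i v_{i+1}}$ is the weight of the edge from $v_i$ to $v_{i+1}$ and $\deg_{v_i}$ is the degree of node $v_i$.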

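Features of CFEC:
We have the following equalities, where R is the set of simple paths from s to t (simple paths are those that never visit the same node twice) and Prob(P) is the probability that the walk follows path P:
$P_{\text{cf.esc}}(s \to t) = \sum_{P \in R} \mathrm{Prob}(P)$
Multiplying by the degree of s:
$\mathrm{CFEC}(s, t) = \deg_s \cdot P_{\text{cf.esc}}(s \to t) = \deg_s \sum_{P \in R} \mathrm{Prob}(P)$
- CFEC discourages long paths, because the probability of following a path decays exponentially with its length: Prob(P) is a product with one factor of at most 1 per edge.
- It supports a proximity measure over multiple paths.
- Degree-1 nodes dilute the significance of paths from s to t. So when the main graph is cut down to a subgraph, the original degrees of the nodes are preserved.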
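For a very small graph, CFEC can be computed by brute force, which makes these features easy to check. The sketch below is not the authors' method. It assumes that the degree of a node is the sum of its edge weights, that the cycle-free escape probability is the sum of path probabilities over all simple s-t paths, and that CFEC is that sum multiplied by the degree of s; the function names are hypothetical.

```python
def degree(weights, i):
    """Weighted degree: the sum of the weights of the edges at node i (assumed)."""
    return sum(weights[i].values())

def simple_paths(weights, s, t, path=None):
    """Yield every simple path from s to t by depth-first search."""
    path = path or [s]
    if path[-1] == t:
        yield list(path)
        return
    for v in weights[path[-1]]:
        if v not in path:          # never visit the same node twice
            path.append(v)
            yield from simple_paths(weights, s, t, path)
            path.pop()

def path_probability(weights, path):
    """Product of the transition probabilities w_ij / deg_i along the path."""
    p = 1.0
    for u, v in zip(path, path[1:]):
        p *= weights[u][v] / degree(weights, u)
    return p

def cfec(weights, s, t):
    """Degree of s times the cycle-free escape probability from s to t."""
    escape = sum(path_probability(weights, p)
                 for p in simple_paths(weights, s, t))
    return degree(weights, s) * escape

# Hypothetical unit-weight examples.
chain = {"s": {"a": 1}, "a": {"s": 1, "t": 1}, "t": {"a": 1}}
chain_with_leaf = {"s": {"a": 1}, "a": {"s": 1, "t": 1, "c": 1},
                   "c": {"a": 1}, "t": {"a": 1}}
two_paths = {"s": {"a": 1, "b": 1}, "a": {"s": 1, "t": 1},
             "b": {"s": 1, "t": 1}, "t": {"a": 1, "b": 1}}
```

In these toy graphs the value drops from 1/2 to 1/3 when a degree-1 node `c` is attached to the middle of the chain, and a second parallel path raises it to 1. Enumerating every simple path is exponential in general, so this only suits tiny graphs.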

Explanation of CFEC features
CFEC discourages long paths:
[Figure: path from s to t through intermediate nodes c1, c2, ..., ck, labelled "Proximity decreases"]

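Computing Cycle-Free Escape Probability
- Restrict the sum over simple paths from s to t to the K most probable simple paths.
- Edge weights are transformed into edge lengths, establishing a one-to-one correspondence between path probability and path length.
- With the length of the edge from i to j taken as $\ell_{ij} = -\log(w_{ij} / \deg_i)$, the path probability is given as $\mathrm{Prob}(P) = e^{-\mathrm{length}(P)}$, where $\mathrm{length}(P)$ is the sum of the edge lengths along P.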
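One way to obtain the K most probable simple paths is a best-first search over partial paths. The sketch below is a stand-in for whatever K-shortest-paths routine the authors used, not a reproduction of it. It assumes the edge length from i to j is -log(w_ij / deg_i) with deg_i the sum of the weights at i, so a path of length L has probability exp(-L); the function name is hypothetical.

```python
import heapq
import math

def k_most_probable_paths(weights, s, t, k):
    """Return up to k simple s-t paths as (probability, path), most probable first.

    weights: dict node -> {neighbor: positive edge weight}.
    Assumed edge length: -log(w_ij / deg_i), which is never negative.
    """
    deg = {i: sum(nbrs.values()) for i, nbrs in weights.items()}
    heap = [(0.0, [s])]            # (length so far, partial path)
    found = []
    while heap and len(found) < k:
        length, path = heapq.heappop(heap)
        u = path[-1]
        if u == t:
            # Lengths are non-negative, so complete paths are popped
            # in order of increasing length (decreasing probability).
            found.append((math.exp(-length), path))
            continue
        for v, w in weights[u].items():
            if v not in path:      # keep the path simple
                step = -math.log(w / deg[u])
                heapq.heappush(heap, (length + step, path + [v]))
    return found

# Hypothetical example: a direct edge s-t plus a two-hop detour through a.
triangle = {"s": {"t": 1, "a": 1}, "a": {"s": 1, "t": 1}, "t": {"s": 1, "a": 1}}
best = k_most_probable_paths(triangle, "s", "t", 1)
both = k_most_probable_paths(triangle, "s", "t", 5)
```

Summing the probabilities returned for a given K yields a truncated estimate of the cycle-free escape probability. The search can still take exponential time in the worst case, so it is only a sketch.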

Extracting Proximity Through Proximity Graphs:
- Extract a subgraph that maximizes a ratio. [Formula not preserved; its legend identifies a subgraph and a constant.]
- Find the subset of R that is required to solve the above problem.
- The set of simple paths R is sorted in ascending order of path weight.

Extracting Proximity Through Proximity Graphs:
The given formula is optimized using a branch-and-bound path-merging algorithm.

Branch and Bound path merging algorithm

Working with large networks
Growing candidate graphs via {s, t} neighborhoods:
- Producing the candidate graph means finding a subgraph that contains the highest-weight paths originating at either s or t.
- Once edge weights are turned into lengths, the problem becomes finding a subgraph that contains the shortest paths originating at either s or t. The objective is to expand the neighborhoods of s and t, which is done with Dijkstra's algorithm for computing shortest paths on graphs with non-negative edge lengths.

Working with large networks (contd.)
Determining neighborhood size:
- We consider only paths that are no longer than $L - \log(\varepsilon)$. Paths longer than this are not useful. Here L is the length of the shortest s-t path.
- In practice, the $(L - \log(\varepsilon))/2$-neighborhoods might be too large.

Working with large networks (contd.)
Pruning the neighborhood:
- We prune from the neighborhood any node i for which dist(s,i) + dist(t,i) > β, where s is the starting node and t is the terminating node.
- We then exclude from the neighborhood any node i for which $\mathrm{dist}(s,i) + \mathrm{dist}(i,t) > L - \log(\varepsilon)$.
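The second pruning rule can be sketched with two runs of Dijkstra's algorithm, one from s and one towards t. This is an illustration under assumptions rather than the authors' implementation: edge lengths are given directly, L is taken to be dist(s, t), `eps` stands for ε, and the function names and the example graph are hypothetical.

```python
import heapq
import math

def dijkstra(lengths, source):
    """Shortest distances from source; lengths: node -> {neighbor: length >= 0}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, l in lengths[u].items():
            nd = d + l
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def reverse(lengths):
    """Reverse every edge, so that distances *to* a node can be computed."""
    rev = {u: {} for u in lengths}
    for u, nbrs in lengths.items():
        for v, l in nbrs.items():
            rev.setdefault(v, {})[u] = l
    return rev

def candidate_nodes(lengths, s, t, eps):
    """Keep every node i with dist(s, i) + dist(i, t) <= L - log(eps)."""
    from_s = dijkstra(lengths, s)
    to_t = dijkstra(reverse(lengths), t)
    if t not in from_s:
        return set()               # s and t are not connected
    bound = from_s[t] - math.log(eps)   # L is the shortest s-t length
    return {i for i in from_s
            if i in to_t and from_s[i] + to_t[i] <= bound}

# Hypothetical symmetric example: a short route s-a-t and a long route s-b-c-t.
example = {
    "s": {"a": 1, "b": 1},
    "a": {"s": 1, "t": 1},
    "b": {"s": 1, "c": 5},
    "c": {"b": 5, "t": 5},
    "t": {"a": 1, "c": 5},
}
tight = candidate_nodes(example, "s", "t", math.exp(-1))
loose = candidate_nodes(example, "s", "t", 0.1)
```

A smaller `eps` loosens the bound and admits more nodes, which is the trade-off between candidate-graph size and discarded path probability.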

Experiments:
- Online movie database (IMDB)
- Co-authorship graph
- Telecommunication graph

Telecommunication Graph
- The candidate graph is extracted by growing neighborhoods from the two nodes of interest.
- Random pairs of telephone numbers are used to calculate the CFEC value.
- For 1808 (90%) of these pairs, paths between them were found. The others have no known path between them, possibly because these numbers were not in frequent use.
- The parameter values used are 1, 5, 10 and 50.

Conclusion:
- CFEC proximity allows us to readily compute proximity graphs, which are small portions of the network aimed at capturing a related proximity value.
- An analyst studying proximity in a graph has to focus on the most relevant part of the graph.
- The proximity graph is an extension of the connection graph and is capable of presenting the relationship between objects of a network compactly.
- Its advantages include the ability to deduce relationships between more than two endpoints, the flexibility to handle edge direction, and the fact that proximity graphs are obtained by solving an intuitively tunable optimization problem.

QUESTIONS??????