Topic 13 Network Models Credits: C. Faloutsos and J. Leskovec Tutorial


1 Topic 13 Network Models
Credits: C. Faloutsos and J. Leskovec Tutorial; E. Kolaczyk Notes
Data Mining - Volinsky Columbia University

2 Social Networks
- Network: a collection of interconnected things
- Analysis of such data is also called "graph mining"
- Data consist of nodes and edges
- Note: different from "graphical models" (graphical representations of the dependence between random variables)
- Edges represent:
  - A relationship between nodes
  - Behavior observed between nodes
  - High similarity between nodes
- Edges are typically weighted
- Nodes and edges can both have associated attributes
- Can be directed or undirected
  - Directed: phone calls, emails
  - Undirected: collaboration, physical networks, friendship
Data Mining - Volinsky - 2011 - Columbia University

3 Examples

4 Networks are everywhere!

5 Layout
- Layout matters, especially with directed graphs

6 Facebook “Friend Wheel”

7 LinkedIn
- LinkedIn community, from LinkedIn Labs

8 Networks: A Matter of Scale

9 Measurements on networks: Nodes and Edges
- Node degree (node)
  - The number of edges attached to a node is its degree
  - If directed, in-degree and out-degree are counted separately and can differ
- Centrality (node)
  - How "central" is a given node?
  - Degree centrality: how many edges does it have?
  - Shortest-path (betweenness) centrality: how many shortest paths pass through it?
  - Centrality = importance
- Centrality (edge)
  - How central is an edge?
  - Similar "shortest path" definition
  - Does removing it create more clusters?
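
The node measures on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the method used in the course: the function names `degrees` and `betweenness`, the adjacency-dict representation, and the example graph are all assumptions made here. Shortest-path centrality is computed with Brandes' algorithm for unweighted, undirected graphs.

```python
from collections import deque

def degrees(adj):
    """Return {node: degree} for an undirected graph {node: set(neighbors)}."""
    return {v: len(nbrs) for v, nbrs in adj.items()}

def betweenness(adj):
    """Unnormalized shortest-path (betweenness) centrality, Brandes' algorithm."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}

# Hypothetical example: two triangles joined by a single bridge edge (3, 4).
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
adj = {v: set() for e in edges for v in e}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

print(degrees(adj))      # nodes 3 and 4 have degree 3, the others degree 2
print(betweenness(adj))  # bridge endpoints 3 and 4 score 6.0, the others 0.0
```

The bridge endpoints are not much better connected than the other nodes by degree, but every path between the two triangles runs through them, which is what the shortest-path definition picks up.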

10 Measurements on networks (graph)
- Degree distribution
  - The distribution of all node degrees characterizes the graph
  - Normal or highly skewed?
- Density (graph): how "dense" is the graph?
  - Given n nodes, how many edges are possible? n(n-1)/2 if undirected
  - Density = #edges / #possible edges
- Clustering coefficient (graph)
  - How likely is it that your friends are friends with each other?
  - Count: how many triangles, relative to connected triples?
- Diameter (graph)
  - The longest shortest path between any pair of nodes
- Shortest paths (graph)
  - Histogram of shortest-path lengths
- Connectivity (graph)
  - Is the graph connected?
  - Connected components
  - For directed graphs: strongly connected components
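
A rough sketch of these graph-level measures, assuming an undirected graph stored as a dict of neighbor sets. The function names and the example graph are illustrative choices made here, and the clustering coefficient shown is the global "fraction of closed triples" version, which is one of several definitions in use.

```python
from collections import deque
from itertools import combinations

def density(adj):
    """Edges present divided by edges possible, n(n-1)/2 for undirected graphs."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    return m / (n * (n - 1) / 2)

def global_clustering(adj):
    """Fraction of connected triples (two friends of one node) that are closed."""
    closed = triples = 0
    for nbrs in adj.values():
        for a, b in combinations(nbrs, 2):
            triples += 1
            if b in adj[a]:
                closed += 1
    return closed / triples if triples else 0.0

def bfs_distances(adj, source):
    """Shortest-path lengths (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def diameter(adj):
    """Largest shortest path over all pairs that are connected at all."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)

def connected_components(adj):
    """List of node sets, one per connected component."""
    seen, comps = set(), []
    for s in adj:
        if s not in seen:
            comp = set(bfs_distances(adj, s))
            seen |= comp
            comps.append(comp)
    return comps

# Hypothetical example: two triangles joined by a single bridge edge (3, 4).
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
adj = {v: set() for e in edges for v in e}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

print(density(adj))                    # 7 edges out of 15 possible
print(global_clustering(adj))          # 6 of 10 connected triples are closed
print(diameter(adj))                   # 3, e.g. the path 1-3-4-5
print(len(connected_components(adj)))  # 1
```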

11 Models on networks
- Random (Erdos-Renyi)
  - Each possible edge occurs independently with probability p
  - Degree distribution is binomial, approximately Poisson for large sparse graphs
- Exponential random graph (p*) models
  - Statistical model: extension of Erdos-Renyi
  - Defines a probability distribution over graphs in terms of graph properties
- Preferential attachment
  - Generative model: new nodes create m links (m based on a Poisson distribution)
  - Links attach to existing nodes with probability proportional to the degree of that node
  - "Rich get richer"
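
The two generative models can be sketched as follows. This is a simplified illustration under stated assumptions: the function names are made up here, the preferential-attachment version uses a fixed m per new node rather than a Poisson-distributed one, and it starts from a small complete graph so that early nodes have nonzero degree.

```python
import random

def erdos_renyi(n, p, rng):
    """Each of the n(n-1)/2 possible edges is included independently with probability p."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def preferential_attachment(n, m, rng):
    """Each new node links to m distinct existing nodes chosen proportional to degree."""
    # Assumption: seed the process with a complete graph on m + 1 nodes.
    adj = {v: set(range(m + 1)) - {v} for v in range(m + 1)}
    # Each node appears in 'stubs' once per incident edge, so a uniform draw
    # from the list is a degree-proportional draw over nodes.
    stubs = [v for v in adj for _ in adj[v]]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(stubs))
        adj[new] = set(chosen)
        for target in chosen:
            adj[target].add(new)
            stubs.extend([target, new])
    return adj

rng = random.Random(0)  # fixed seed so the sketch is repeatable
er = erdos_renyi(500, 0.008, rng)
pa = preferential_attachment(500, 2, rng)
# Both graphs have a mean degree near 4, but the largest degree differs a lot.
print(max(len(nbrs) for nbrs in er.values()))
print(max(len(nbrs) for nbrs in pa.values()))
```

With similar mean degree, the preferential-attachment graph typically grows a few very high-degree hubs while the Erdos-Renyi graph does not, which is the "rich get richer" effect.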

12 Real-world networks
- Degree distributions in real-world networks are heavily skewed to the right
  - Long tail of values above the mean
  - Mean well above the median; small diameter
  - Preferential attachment produces this kind of distribution
- Leads to a "power law"
  - Let k = degree and p_k = the number (or fraction) of nodes that have that degree
  - Power law: p_k is proportional to k^(-gamma), so a plot of log p_k vs. log k should be linear
- Many real-world data sets follow a power law:
  - Online sales
  - Word frequency distributions
  - Number of friends on Facebook!
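
The log-log check described on this slide can be sketched as a straight-line fit. The helper names are assumptions made here, and the degree list is synthetic data built to follow k^(-2) exactly. A least-squares fit on the log-log plot is the crude version of this check; it is shown only because it matches the slide's "should be linear" statement.

```python
import math
from collections import Counter

def degree_distribution(degree_list):
    """Return {k: p_k}, where p_k is the fraction of nodes with degree k."""
    counts = Counter(degree_list)
    n = len(degree_list)
    return {k: c / n for k, c in sorted(counts.items())}

def loglog_slope(pk):
    """Least-squares slope of log p_k against log k (degree 0 is skipped)."""
    pts = [(math.log(k), math.log(p)) for k, p in pk.items() if k > 0 and p > 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx

# Synthetic degrees with counts proportional to k^(-2): 64, 16, 4, 1.
synthetic = [1] * 64 + [2] * 16 + [4] * 4 + [8] * 1
pk = degree_distribution(synthetic)
print(loglog_slope(pk))  # close to -2, i.e. gamma is about 2
```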

13 More Power Law

14 Erdos-Renyi vs. Power-law
From Leskovec & Faloutsos

15 Small World
- Real-world data sets tend to have power-law degree distributions
- They also tend to have a "small world" property
  - Everyone is reachable via a small number of edges
  - Small diameters
- Stanley Milgram experiment, 1967
  - People were given a letter and asked to forward it to one friend
  - Source: random residents of Omaha
  - Target: a stockbroker in Boston
  - Completed chains averaged 6 hops; hence, "six degrees of separation"

16 Small World Networks
- Watts and Strogatz [1998] introduced the small-world network model
- Navigable Social Networks [Kleinberg 2000]
  - Showed how navigable small-world networks can be created
  - Put n people on a k-dimensional grid
  - Connect each to its immediate neighbors
  - Add one long-range link per person
  - Everyone will be connected via a short path
- This is the way the real world works!
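
The grid construction above can be sketched for the 2-dimensional case. Several choices here are assumptions and are not taken from the slide: the function names, treating the long-range link as undirected, and choosing its far end with probability proportional to distance^(-2) on the grid.

```python
import random
from collections import deque

def grid_graph(side):
    """side x side lattice; each node is linked to its immediate grid neighbors."""
    adj = {(r, c): set() for r in range(side) for c in range(side)}
    for r, c in adj:
        for dr, dc in ((0, 1), (1, 0)):
            nb = (r + dr, c + dc)
            if nb in adj:
                adj[(r, c)].add(nb)
                adj[nb].add((r, c))
    return adj

def add_long_range_links(adj, rng, exponent=2.0):
    """Return a copy in which every node gets one extra, distance-biased link."""
    nodes = list(adj)
    out = {v: set(nbrs) for v, nbrs in adj.items()}
    for u in nodes:
        others = [v for v in nodes if v != u]
        # Manhattan distance is at least 1 here, so the weight is well defined.
        weights = [(abs(u[0] - v[0]) + abs(u[1] - v[1])) ** -exponent for v in others]
        v = rng.choices(others, weights=weights, k=1)[0]
        out[u].add(v)
        out[v].add(u)
    return out

def mean_path_length(adj):
    """Average shortest-path length over all connected ordered pairs."""
    total = pairs = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

grid = grid_graph(10)
small_world = add_long_range_links(grid, random.Random(0))
print(mean_path_length(grid))         # plain lattice
print(mean_path_length(small_world))  # never longer, usually clearly shorter
```

Because links are only added, path lengths cannot grow; a handful of long-range shortcuts is usually enough to pull the average down noticeably.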

17 Small World Networks: Another look

18 Sampling Networks
- How do you sample from a massive network?
- Simplest method: induced subgraph
  - Randomly sample nodes, and keep the edges between them
  - Not so great!
  - The yellow nodes are randomly sampled, but their subgraph doesn't have the same graph properties as the full network
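
Induced-subgraph sampling is short enough to sketch directly. The function name and the ring-shaped example graph are assumptions made for illustration; the example shows how a uniform node sample loses most of the edges.

```python
import random

def induced_subgraph_sample(adj, k, rng):
    """Pick k nodes uniformly at random and keep only the edges among them."""
    keep = set(rng.sample(sorted(adj), k))
    return {v: adj[v] & keep for v in keep}

# Hypothetical example: a ring of 100 nodes, where every node has degree 2.
ring = {i: {(i - 1) % 100, (i + 1) % 100} for i in range(100)}
sample = induced_subgraph_sample(ring, 20, random.Random(0))

mean_degree = sum(len(nbrs) for nbrs in sample.values()) / len(sample)
print(mean_degree)  # well below the original mean degree of 2
```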

19 Sampling Networks
- Snowball sampling
  - Pick a random sample of nodes and then follow their "tree" of neighbors for a set number of "hops"
  - Still not perfect, but better
- Other ideas abound, but there is little agreement
- Great area for research!
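
Snowball sampling can be sketched as a breadth-first expansion from the seed nodes. The function name, the fixed seed list, and the ring example are assumptions for illustration; in practice the seeds would be drawn at random.

```python
def snowball_sample(adj, seeds, hops):
    """Start from the seed nodes and add all neighbors, repeating for 'hops' rounds."""
    visited = set(seeds)
    frontier = set(seeds)
    for _ in range(hops):
        # Nodes one step beyond the current frontier that are not yet included.
        frontier = {w for v in frontier for w in adj[v]} - visited
        visited |= frontier
    # Keep the edges among the collected nodes.
    return {v: adj[v] & visited for v in visited}

# Hypothetical example: a ring of 10 nodes, one seed, two hops.
ring = {i: {(i - 1) % 10, (i + 1) % 10} for i in range(10)}
sample = snowball_sample(ring, seeds=[0], hops=2)
print(sorted(sample))  # [0, 1, 2, 8, 9]
```

Unlike a uniform node sample, every sampled node is connected to the seed, so local structure around the seeds is preserved.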

20 Network Problems of Interest
- Link prediction: can we use existing network data to infer links where none are observed?
  - Links in the future?
  - Missing data
- Simple methods
  - Look for pairs of nodes with many common neighbors
- Complex methods
  - Stochastic blockmodels
  - Similar to using SVD to "fill in" a matrix
  - Agarwal and Pregibon '04
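
The "many common neighbors" heuristic can be sketched as a ranking over unlinked pairs. The function name and the toy graph are assumptions made here; the score is simply the number of shared neighbors.

```python
from itertools import combinations

def common_neighbor_scores(adj):
    """Rank unlinked node pairs by how many neighbors they share (highest first)."""
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:
            shared = len(adj[u] & adj[v])
            if shared:
                scores[(u, v)] = shared
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical example graph.
edges = [("a", "b"), ("a", "c"), ("d", "b"), ("d", "c"), ("d", "e")]
adj = {v: set() for e in edges for v in e}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

ranking = common_neighbor_scores(adj)
print(ranking)
# [(('a', 'd'), 2), (('b', 'c'), 2), (('b', 'e'), 1), (('c', 'e'), 1)]
```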

21 Network Problems of Interest
- Graph matching / similarity
  - Fraud ("repetitive debtors")
  - Citation de-noising
  - Need a metric to define the difference between graphs
- Collective inference
  - What can you learn about someone from their network?
  - Fraud ("guilt by association")
  - Viral marketing
- Following example courtesy of Sofus Macskassy

22-24 A Relational Neighbor Classifier (wvRN)
[Figure sequence: network diagram with "?" nodes] Sofus A. Macskassy

25-30 Collective wvRN
- Classify all entities in the network simultaneously because, if done well, inferences about neighbors can reduce statistical bias (cf. Jensen et al., KDD-04)
[Figure sequence: network diagram with "?" nodes] Sofus A. Macskassy
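
A rough sketch of the idea behind collective wvRN, assuming the usual reading of wvRN as a weighted-vote relational neighbor classifier: a node's class probability is the weighted average of its neighbors' class probabilities, and all unknown nodes are updated together until the estimates settle. The function name, the binary labels, the class-prior initialization, and the plain synchronous update are simplifications made here, not details from the slides.

```python
def wvrn(adj_w, known, iters=50):
    """adj_w: {node: {neighbor: weight}}; known: {node: 0 or 1}.

    Returns {node: estimated P(label = 1)}. Known labels stay fixed.
    """
    # Assumption: unknown nodes start at the class prior of the labeled nodes.
    prior = sum(known.values()) / len(known)
    prob = {v: float(known[v]) if v in known else prior for v in adj_w}
    for _ in range(iters):
        new = dict(prob)
        for v, nbrs in adj_w.items():
            if v in known or not nbrs:
                continue
            total = sum(nbrs.values())
            # Weighted vote of the neighbors' current estimates.
            new[v] = sum(w * prob[u] for u, w in nbrs.items()) / total
        prob = new
    return prob

# Hypothetical chain A - x - y - B with unit weights; A is class 1, B is class 0.
adj_w = {
    "A": {"x": 1.0},
    "x": {"A": 1.0, "y": 1.0},
    "y": {"x": 1.0, "B": 1.0},
    "B": {"y": 1.0},
}
prob = wvrn(adj_w, known={"A": 1, "B": 0})
print(prob["x"], prob["y"])  # close to 2/3 and 1/3
```

The node next to the class-1 neighbor leans toward class 1 and the node next to the class-0 neighbor leans the other way, even though neither has a label of its own.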

31 Network Problems of Interest
- Diffusion
  - Information or virus diffusion
- Community detection
  - Subgroups have a higher edge density within the subgroup
  - Can remove edges with high centrality to try to find communities
- Understanding of social networks
  - Facebook
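
Removing high-centrality edges to expose communities can be sketched as follows. This follows the general idea usually attributed to Girvan and Newman, which the slide does not name; the function names and example graph are assumptions, and edge centrality is computed as shortest-path (betweenness) counts with a Brandes-style pass.

```python
from collections import deque

def edge_betweenness(adj):
    """Number of shortest paths that cross each edge (undirected, unweighted)."""
    score = {}
    for s in adj:
        dist, sigma, preds, order = {s: 0}, {s: 1.0}, {}, []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    sigma[w] = 0.0
                    preds[w] = []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in order}
        for w in reversed(order):
            for v in preds.get(w, []):
                share = sigma[v] / sigma[w] * (1 + delta[w])
                edge = tuple(sorted((v, w)))
                score[edge] = score.get(edge, 0.0) + share
                delta[v] += share
    # Each pair of endpoints was counted from both sides.
    return {e: c / 2 for e, c in score.items()}

def components(adj):
    """List of node sets, one per connected component."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if w not in comp:
                    comp.add(w)
                    stack.append(w)
        seen |= comp
        comps.append(comp)
    return comps

def split_once(adj):
    """Remove the most central edge repeatedly until the graph gains a component."""
    g = {v: set(nbrs) for v, nbrs in adj.items()}
    start = len(components(g))
    while len(components(g)) == start:
        eb = edge_betweenness(g)
        if not eb:
            break
        u, v = max(eb, key=eb.get)
        g[u].discard(v)
        g[v].discard(u)
    return components(g)

# Hypothetical example: two triangles joined by a single bridge edge (3, 4).
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
adj = {v: set() for e in edges for v in e}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

print(edge_betweenness(adj)[(3, 4)])  # 9.0, the highest of any edge
print(split_once(adj))                # the two triangles: {1, 2, 3} and {4, 5, 6}
```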

32 References
- Leskovec / Faloutsos tutorial (mostly part 1)
- Eric Kolaczyk notes and book
- Watts and Strogatz: "Collective dynamics of 'small-world' networks", Nature 393 (1998)
- Networks, M. E. J. Newman (book)
- Linked: How Everything Is Connected to Everything Else and What It Means, Albert-László Barabási
- Enron data
Tools
- Graphviz.org for visualization
- igraph (R package)

