Universal Network Structure and Generative Models Networked Life CIS 112 Spring 2010 Prof. Michael Kearns.

A Little Warm-Up… Consider yourself “connected” to anyone in class whose first name you know (assume symmetric) On the resulting network, let’s examine: The degree distribution The number and size of connected components The diameter The “clustering coefficient”

“Natural” Networks and Universality Consider the many kinds of networks we have examined: –social, technological, business, economic, content,… These networks tend to share certain informal properties: –large scale; continual growth –distributed, organic growth: vertices “decide” who to link to –interaction (largely) restricted to links –mixture of local and long-distance connections –abstract notions of distance: geographical, content, social,… Do natural networks share more quantitative universals? What would these “universals” be? How can we make them precise and measure them? How can we explain their universality? This is the domain of network science

Some Interesting Quantities Connected components: –how many, and how large? Network diameter: –the small-world phenomenon Clustering: –to what extent do links tend to cluster “locally”? –what is the balance between local and long-distance connections? –what roles do the two types of links play? Degree distribution: –what is the typical degree in the network? –what is the overall distribution? Etc. etc. etc.

A “Canonical” Natural Network has… Few connected components: –often only 1 or a small number (compared to network size) Small diameter: –often a constant independent of network size (like 6…) –or perhaps growing only very slowly with network size –typically look at average; exclude infinite distances A high degree of edge clustering: –considerably more so than for a random network –in tension with small diameter A heavy-tailed degree distribution: –a small but reliable number of high-degree vertices –quantifies Gladwell’s connectors –often of power law form

Some Models of Network Formation Random graphs (Erdos-Renyi model): –gives few components and small diameter –does not give high clustering or heavy-tailed degree distributions –is the mathematically most well-studied and understood model Watts-Strogatz and related models: –give few components, small diameter and high clustering –do not give heavy-tailed degree distributions Preferential attachment: –gives few components, small diameter and heavy-tailed distribution –does not give high clustering Hierarchical networks: –few components, small diameter, high clustering, heavy-tailed Affiliation networks: –models group-actor formation Nothing “magic” about any of the measures or models

Approximate Roadmap Examine a series of models of network formation –macroscopic properties they do and do not entail –tipping behavior during network formation –pros and cons of each model Examine some “real life” case studies Study some dynamics issues (e.g. search/navigation) Move on to an in-depth study of the web as network

Models of Network Formation and Their Properties

Probabilistic Models of Networks Network formation models we will study are probabilistic or statistical –later in the course: economic formation models They can generate networks of any size –we will typically ask what happens when N is very large or N → infinity They often have various parameters that can be set/chosen: –size of network generated –probability of an edge being present or absent –fraction of long-distance vs. local connections –etc. etc. etc. The models each generate a distribution over networks Statements are always statistical in nature: –with high probability, diameter is small –on average, degree distribution has heavy tail

Optional Background on Probability and Statistics [Next three slides.]

Probability and Random Variables A random variable X is simply a variable that probabilistically assumes values in some set –set of possible values sometimes called the sample space S of X –sample space may be small and simple, or large and complex S = {Heads, Tails}; X is outcome of a coin flip S = {0,1,…,U.S. population size}; X is number voting democratic S = all networks of size N; X is generated by Erdos-Renyi Behavior of X determined by its distribution (or density) –for each specific value x in S, specify Pr[X = x] –these probabilities sum to exactly 1 (mutually exclusive outcomes) –complex sample spaces (such as large networks): distribution often defined implicitly by simpler components might specify the probability that each edge appears independently this induces a probability distribution over networks may be difficult to compute induced distribution

Some Basic Notions and Laws Independence: –let X and Y be random variables –independence: for any x and y, Pr[X=x & Y=y] = Pr[X=x]Pr[Y=y] –intuition: value of X does not “influence” value of Y, and vice-versa –dependence: e.g. X, Y coin flips, but Y is always opposite of X Expected (mean) value of X: –only makes sense for numeric random variables –“average” value of X according to its distribution –formally, E[X] = Σ (Pr[X = x] * x), sum is over all x in S –often denoted by μ –always true: E[X + Y] = E[X] + E[Y] –for independent random variables: E[XY] = E[X]E[Y] Variance of X: –Var(X) = E[(X - μ)^2]; often denoted by σ^2 –standard deviation is sqrt(Var(X)) = σ
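As a small worked illustration of the definitions above, the following sketch computes the expected value and variance directly from a finite distribution and checks linearity of expectation for two independent variables. The helper names (`expectation`, `variance`) and the fair-die example are assumptions made for illustration; they do not come from the slides.

```python
def expectation(dist):
    """E[X] = sum of Pr[X = x] * x over all x; dist maps value -> probability."""
    return sum(prob * x for x, prob in dist.items())


def variance(dist):
    """Var(X) = E[(X - mu)^2], where mu = E[X]."""
    mu = expectation(dist)
    return sum(prob * (x - mu) ** 2 for x, prob in dist.items())


# Hypothetical example: one roll of a fair six-sided die.
die = {face: 1 / 6 for face in range(1, 7)}

# Distribution of the sum of two independent rolls, built from
# Pr[X=x & Y=y] = Pr[X=x] * Pr[Y=y].
two_dice = {}
for x, px in die.items():
    for y, py in die.items():
        two_dice[x + y] = two_dice.get(x + y, 0.0) + px * py

print("E[X] =", expectation(die), " Var(X) =", variance(die))
print("E[X + Y] =", expectation(two_dice))  # matches E[X] + E[Y]
```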

Convergence to Expectations Let X1, X2,…, Xn be: –independent random variables –with the same distribution Pr[X=x] –expectation μ = E[X] and variance σ^2 –independent and identically distributed (i.i.d.) –essentially n repeated “trials” of the same experiment –natural to examine r.v. Z = (1/n) Σ Xi, where sum is over i=1,…,n –example: number of heads in a sequence of coin flips –example: degree of a vertex in the random graph model –E[Z] = E[X]; what can we say about the distribution of Z? Central Limit Theorem: –as n becomes large, Z becomes normally distributed with expectation μ and variance σ^2/n –here’s a demo
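The concentration of Z around its expectation can be checked empirically. The sketch below is only an illustration under assumed settings (fair coin flips, n = 400, 2000 repetitions, a fixed seed): it draws many independent copies of Z and compares their empirical mean and variance with the values the theorem predicts.

```python
import random
import statistics


def sample_mean(n, rng):
    """Z = (1/n) * sum of n fair coin flips (1 = heads, 0 = tails)."""
    return sum(rng.random() < 0.5 for _ in range(n)) / n


rng = random.Random(0)   # fixed seed so the run is repeatable
n = 400                  # flips per experiment (assumed value)
trials = 2000            # number of independent copies of Z (assumed value)

zs = [sample_mean(n, rng) for _ in range(trials)]
mean_z = statistics.fmean(zs)
var_z = statistics.pvariance(zs)

# For a fair coin, mu = 0.5 and sigma^2 = 0.25, so Var(Z) should be near 0.25 / n.
print("empirical mean of Z:", mean_z)
print("empirical variance of Z:", var_z, " predicted:", 0.25 / n)
```

A histogram of `zs` would show the bell shape; only the first two moments are compared here to keep the sketch short.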

The Erdos-Renyi Model

The Erdos-Renyi (E-R) Model (Random Networks) A model in which all edges: –are equally probable and appear independently Two parameters: NW size N > 1 and edge probability p: –each edge (u,v) appears with probability p, is absent with probability 1-p –N(N-1)/2 independent trials of a biased coin flip –results in a probability distribution over networks of size N –especially easy to generate networks from this distribution About the simplest (dumbest?) imaginable formation model The usual regime of interest is when p ~ 1/N, N is large –e.g. p = 1/2N, p = 1/N, p = 2/N, p=150/N, p = log(N)/N, etc. –in expectation, each vertex will have a “small” number of neighbors (~ pN) Gladwell’s “Magic Number 150” and cognitive bounds on degree mathematical interest: just near the boundary of connectivity –will then examine what happens when N → infinity –can thus study properties of large networks with bounded degree Degree distribution of a typical E-R network G: –draw G according to E-R with N, p; look at a random vertex u in G –what is Pr[deg(u) = k] for any fixed k? (or histogram of degrees) –Poisson distribution with mean = p(N-1) ~ pN –Sharply concentrated; not heavy-tailed
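Generating a network from this distribution takes one biased coin flip per vertex pair. The following is a minimal sketch, assuming an adjacency-set representation and the illustrative values N = 1000 and p = 5/N; the function name `erdos_renyi` is hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def erdos_renyi(n, p, rng):
    """Include each of the n(n-1)/2 possible edges independently with probability p."""
    adj = {u: set() for u in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


rng = random.Random(1)
n, p = 1000, 5 / 1000            # assumed values: p ~ 1/N regime, mean degree about 5
graph = erdos_renyi(n, p, rng)

degrees = [len(neighbors) for neighbors in graph.values()]
mean_degree = sum(degrees) / n
histogram = Counter(degrees)     # degree -> number of vertices with that degree

print("mean degree:", mean_degree, " expected:", p * (n - 1))
for k in sorted(histogram):
    print(k, histogram[k])
```

The printed histogram should be concentrated around p(N-1), with no vertices of very high degree.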

The Poisson Distribution The Poisson distribution: –often used to model counts of events number of phone calls placed in a given time period number of times a neuron fires in a given time period –single free parameter λ –probability of exactly x events: exp(-λ) λ^x / x! mean and variance are both λ here are some examples; again compare to heavy tails –similar to a normal (bell-shaped) distribution, but only takes on nonnegative integer values
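The Poisson probabilities are easy to evaluate numerically. This sketch, with the assumed parameter value λ = 5 and a hypothetical helper `poisson_pmf`, checks that the probabilities sum to 1 and that the mean and variance both come out equal to λ.

```python
import math


def poisson_pmf(x, lam):
    """Probability of exactly x events: exp(-lam) * lam^x / x!"""
    return math.exp(-lam) * lam ** x / math.factorial(x)


lam = 5.0                      # assumed parameter value
support = range(60)            # the tail beyond 60 is negligible for lam = 5

total = sum(poisson_pmf(x, lam) for x in support)
mean = sum(x * poisson_pmf(x, lam) for x in support)
var = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in support)

print("total probability:", total)
print("mean:", mean, " variance:", var)
```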

Another Version of Erdos-Renyi In Erdos-Renyi: –expected number of edges in the network = pN(N-1)/2 = m –actual number of edges will be “extremely close” to m –so suppose that instead of fixing p, we fix the number of edges m Incremental Erdos-Renyi model: –start with N vertices and no edges –at each time step, add a new edge, up to m edges total –choose new edge randomly from among all missing edges Allows study of the evolution or emergence of properties: –as the number of edges m grows (in relation to N) –equivalently, as p is increased (in relation to N) –let’s look at an Erdos-Renyi demo For our purposes, these models are equivalent under pN(N-1)/2 = m
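The incremental model can be simulated while tracking how the largest connected component grows. The sketch below is one possible implementation, not the demo referenced on the slide: it assumes a union-find structure for components, samples a missing edge by rejection, and uses the illustrative values N = 2000 and m = 2N.

```python
import random


def incremental_er(n, m, rng):
    """Add m distinct random edges one at a time (m must not exceed n(n-1)/2).

    Returns a list whose i-th entry is the size of the largest connected
    component after i + 1 edges have been added.
    """
    parent = list(range(n))
    size = [1] * n

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]   # path halving
            u = parent[u]
        return u

    edges = set()
    largest = 1
    history = []
    while len(edges) < m:
        u, v = rng.sample(range(n), 2)
        edge = (min(u, v), max(u, v))
        if edge in edges:                   # already present: draw again
            continue
        edges.add(edge)
        ru, rv = find(u), find(v)
        if ru != rv:
            if size[ru] < size[rv]:
                ru, rv = rv, ru
            parent[rv] = ru
            size[ru] += size[rv]
            largest = max(largest, size[ru])
        history.append(largest)
    return history


rng = random.Random(2)
n = 2000                                    # assumed network size
history = incremental_er(n, 2 * n, rng)

for m in (n // 4, n // 2, n, 2 * n):
    print("edges:", m, " largest component:", history[m - 1])
```

Under pN(N-1)/2 = m, the value p ~ 1/N corresponds to roughly m = N/2 edges, so the largest component should be small well before that point and a large fraction of N well after it.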

The Evolution of a Random Network We have a large number N of vertices We start randomly adding edges one at a time (or increasing p) At what point will the network: –have at least one “large” connected component? –have a single connected component? –have “small” diameter? –have a “large” clique? How gradually or suddenly do these properties appear?

Monotone Network Properties Often interested in monotone network properties: –suppose G has the property (e.g. G is connected) –now add edges to G to obtain G’ –then G’ must have the property also (e.g. G’ is connected) Examples: –G is connected –G has diameter <= d (not exactly d) –G has a clique of size >= k (not exactly k) Interesting/nontrivial monotone properties: –G has no edges → G does not have the property –G has all edges (complete) → G has the property –so we know as p goes from 0 to 1, property emerges

Formalizing Tipping for Monotone Properties Consider the standard Erdos-Renyi model –each edge appears with probability p, absent with probability 1-p Pick a monotone property P of networks (e.g. being connected) Say that P has a tipping point at q if: –when p < q, probability network obeys P is ~ 0 –when p > q probability network obeys P is ~ 1 Aside to math weenies: –formalize by asking that probabilities converge to 0 or 1 as N → infinity Incremental E-R version: –replace q by “tipping” number of edges A purely structural definition of tipping –tipping results from incremental increase in connectivity No obvious reason any given property should tip

So… Which Properties Tip? The following properties all have tipping points: –having a “giant component” –being connected –having “small” diameter –in fact… 1996: All monotone network properties have tipping points! –So at least in one setting, tipping is the rule, not the exception

More Precise… Connected component of size > N/2: –tipping point p ~ 1/N –note: full connectivity virtually impossible Fully connected: –tipping point is p ~ log(N)/N –NW remains extremely sparse: only ~ log(N) edges per vertex Small diameter: –tipping point is p ~ 2/sqrt(N) for diameter 2 –fraction of possible edges still ~ 2/sqrt(N) → 0 –generates very small worlds Upshot: right around/beyond p ~ 1/N, lots suddenly happens
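The tipping point for full connectivity can be probed with a small experiment. The sketch below assumes N = 300, 20 trials per setting, and two edge probabilities chosen for illustration (half of log(N)/N and three times log(N)/N); it estimates how often a sampled network is connected on each side of the threshold.

```python
import math
import random
from itertools import combinations


def is_connected_er(n, p, rng):
    """Draw one E-R network with parameters n, p and report whether it is connected."""
    adj = [[] for _ in range(n)]
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].append(v)
            adj[v].append(u)
    seen = {0}
    stack = [0]
    while stack:                      # depth-first search from vertex 0
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == n


def fraction_connected(n, p, trials, rng):
    return sum(is_connected_er(n, p, rng) for _ in range(trials)) / trials


rng = random.Random(3)
n = 300                               # assumed network size
threshold = math.log(n) / n           # natural logarithm

below = fraction_connected(n, 0.5 * threshold, 20, rng)
above = fraction_connected(n, 3.0 * threshold, 20, rng)
print("fraction connected below threshold:", below)
print("fraction connected above threshold:", above)
```

For finite N the transition is not perfectly sharp, which is why the two probabilities are placed some distance from the threshold.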

Erdos-Renyi Summary A model in which all connections are equally likely –each of the N(N-1)/2 edges chosen randomly & independently As we add edges, a precise sequence of events unfolds: –network acquires a giant component –network becomes connected –network acquires small diameter –etc. etc. etc. Properties appear very suddenly (tipping, thresholds) –… and this is the rule, not the exception! All statements are mathematically precise All happen shortly around/after edge density p ~ 1/N –very efficient use of edges! But… is this how natural networks form? If not, which aspects are unrealistic? –maybe all edges are not equally likely…

The Clustering Coefficient of a Network The clustering coefficient of u: –let k = degree of u = number of neighbors of u –k(k-1)/2 = max possible # of edges between neighbors of u –c(u) = (actual # of edges between neighbors of u)/[k(k-1)/2] –0 <= c(u) <= 1; measure of cliquishness of u’s neighborhood Clustering coefficient of a graph: –average of c(u) over all vertices u Example (figure): a vertex u with k = 4 neighbors, so k(k-1)/2 = 6 possible edges between them; with 4 of those edges present, c(u) = 4/6 = 0.666…
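The definition translates directly into code. In the sketch below, the graph is a hypothetical one built to match the numbers in the example (a vertex u with four neighbors, four of the six possible neighbor-to-neighbor edges present); the original figure's exact layout is not known. Treating c(u) as 0 for vertices with fewer than two neighbors is an assumption, since the slide's formula is undefined there.

```python
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency-set dictionary from a list of edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def clustering_coefficient(adj, u):
    """c(u) = (# edges between neighbors of u) / [k(k-1)/2], with k = degree of u."""
    neighbors = adj[u]
    k = len(neighbors)
    if k < 2:
        return 0.0   # assumption: no neighbor pairs, so report 0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)


def average_clustering(adj):
    """Clustering coefficient of the graph: average of c(u) over all vertices."""
    return sum(clustering_coefficient(adj, u) for u in adj) / len(adj)


# Hypothetical graph: u is linked to a, b, c, d, and the neighbors form a 4-cycle.
edges = [("u", "a"), ("u", "b"), ("u", "c"), ("u", "d"),
         ("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
graph = build_graph(edges)

print("c(u) =", clustering_coefficient(graph, "u"))
print("graph clustering coefficient =", average_clustering(graph))
```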

Erdos-Renyi: Clustering Coefficient Generate a network G according to Erdos-Renyi with N, p Examine a “typical” vertex u in G –choose u at random among all vertices in G –what do we expect c(u) to be? Answer: exactly p (in expectation)! In E-R, typical c(u) entirely determined by overall density Baseline for comparison with “more clustered” models –Erdos-Renyi has no bias towards clustered or local edges Clustering coefficient meaningless in isolation Must compare to the “background rate” of connectivity
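The reason the answer is p can be written out in one line. Suppose u has k ≥ 2 neighbors, and write Γ(u) for its set of neighbors (this notation is introduced here and is not from the slides). Each of the k(k-1)/2 possible edges between neighbors appears independently with probability p, regardless of which edges touch u, so by linearity of expectation:

```latex
\mathbb{E}\bigl[c(u) \mid \deg(u) = k\bigr]
  = \frac{1}{\binom{k}{2}} \sum_{\{v,w\} \subseteq \Gamma(u)} \Pr\bigl[(v,w) \text{ is an edge}\bigr]
  = \frac{1}{\binom{k}{2}} \cdot \binom{k}{2} \cdot p
  = p, \qquad k \ge 2 .
```

The result does not depend on k, which is why the expected clustering coefficient of a typical vertex equals the overall edge density.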

“Caveman and Solaria” Erdos-Renyi: –sharing a common neighbor makes two vertices no more likely to be directly connected than two very “distant” vertices –every edge appears entirely independently of existing structure But in many settings, the opposite is true: –you tend to meet new friends through your old friends –two web pages pointing to a third might share a topic –two companies selling goods to a third are in related industries –one form of homophily Watts’ Caveman world: –overall density of edges is low –but two vertices with a common neighbor are likely connected Watts’ Solaria world –overall density of edges low; no special bias towards local edges –“like” Erdos-Renyi

Making it More Precise: the α-model –An incremental formation model –Pick network size N –Throw down a few random “seed” edges –Then for each pair of vertices u and v: compute probability of adding edge between u and v probability will depend on current network structure the more common neighbors u and v have, more likely to add edge provide knobs that let us adjust how weak/strong the effect is

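Making it More Precise: the α-model Connection probability: y = p + (1-p)*(x/N)^α, where y = probability of connecting u & v, x = number of current common neighbors of u & v, p = “default” probability, and N = network size. [Figure: y plotted against x, rising from p toward 1.0, with curves for larger α, α = 1, and smaller α.]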
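The connection probability given by this formula, y = p + (1-p)*(x/N)^α, can be tabulated to see how the exponent acts as a knob. The sketch below assumes α > 0 and uses illustrative values (N = 100, p = 0.01); the function name `connect_probability` is hypothetical.

```python
def connect_probability(x, n, p, alpha):
    """y = p + (1 - p) * (x / n) ** alpha, for x common neighbors out of network size n.

    Assumes alpha > 0, so that x = 0 gives the default probability p.
    """
    return p + (1 - p) * (x / n) ** alpha


n, p = 100, 0.01                     # assumed network size and default probability
for alpha in (0.5, 1.0, 3.0):        # smaller, equal to one, larger
    row = [round(connect_probability(x, n, p, alpha), 3) for x in (0, 5, 25, 50, 100)]
    print("alpha =", alpha, row)
```

For a fixed number of common neighbors strictly between 0 and N, a smaller exponent gives a higher connection probability, so the effect of shared neighbors is stronger.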

Small Worlds and Occam’s Razor For small α, should generate large clustering coefficients –after all, we “programmed” the model to do so! But we do not want a new model for every little property –Erdos-Renyi → small diameter –α-model → high clustering coefficient –etc. etc. etc. In the interests of Occam’s Razor, we would like to find –a single, simple model of network generation… –… that simultaneously captures many properties Watts’ small world: small diameter and high clustering –here is a figure showing that this can be captured in the α-model

An Alternative Model The α-model programmed high clustering into the formation process –and then we got small diameter “for free” (at certain α) A different model: –start with all vertices arranged on a ring or cycle –connect each vertex to all others that are within k steps –with probability p, rewire each local connection to a random vertex Initial cyclical structure models “local” or “geographic” connectivity Long-distance rewiring models “long-distance” connectivity p=0: high clustering, high diameter p=1: low clustering, low diameter (E-R) In between: look at this simulation Which of these models do you prefer? –sociology vs. math
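The ring-and-rewire construction described above can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the simulation referenced on the slide: a rewired edge keeps its first endpoint and avoids self-loops and duplicate edges, and the values N = 200 and k = 2 are chosen arbitrarily. Only the clustering coefficient is measured; the diameter is not computed here.

```python
import random
from itertools import combinations


def ring_rewire(n, k, p, rng):
    """Ring lattice with each vertex linked to all others within k steps,
    then each lattice edge rewired to a random vertex with probability p."""
    adj = {u: set() for u in range(n)}
    for u in range(n):
        for step in range(1, k + 1):
            v = (u + step) % n
            adj[u].add(v)
            adj[v].add(u)
    for u in range(n):
        for step in range(1, k + 1):
            v = (u + step) % n
            if rng.random() < p:
                # Assumption: keep endpoint u; pick a new endpoint that is
                # neither u itself nor already a neighbor of u.
                candidates = [w for w in range(n) if w != u and w not in adj[u]]
                if candidates:
                    w = rng.choice(candidates)
                    adj[u].discard(v)
                    adj[v].discard(u)
                    adj[u].add(w)
                    adj[w].add(u)
    return adj


def average_clustering(adj):
    """Average over vertices of (edges among neighbors) / [k(k-1)/2]."""
    total = 0.0
    for u, neighbors in adj.items():
        deg = len(neighbors)
        if deg >= 2:
            links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
            total += links / (deg * (deg - 1) / 2)
    return total / len(adj)


rng = random.Random(4)
n, k = 200, 2                          # assumed size and neighborhood radius
for p in (0.0, 0.1, 1.0):
    g = ring_rewire(n, k, p, rng)
    print("p =", p, " clustering =", round(average_clustering(g), 3))
```

At p = 0 every vertex sees 3 of the 6 possible edges among its four neighbors, so the clustering coefficient is 0.5; at p = 1 the network is close to random and the coefficient falls toward the overall edge density.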

Meanwhile, Back in the “Real” World… Watts examines three real networks as case studies: –the Kevin Bacon graph –the Western states power grid –the C. elegans nervous system For each of these networks, he: –computes its size, diameter, and clustering coefficient –compares diameter and clustering to best Erdos-Renyi approx. –shows that the best α-model approximation is better –important to be “fair” to each model by finding best fit Overall moral: –if we care only about diameter and clustering, α is better than E-R

Case 1: Kevin Bacon Graph Vertices: actors and actresses Edge between u and v if they appeared in a film together Here is the data

Case 2: Western States Power Grid Vertices: power stations in Western U.S. Edges: high-voltage power transmission lines Here is the network and data

Case 3: C. Elegans Nervous System Vertices: neurons in the C. elegans worm Edges: axons/synapses between neurons Here is the network and data

Two More Examples M. Newman on scientific collaboration networks –coauthorship networks in several distinct communities –differences in degrees (papers per author) –empirical verification of giant components small diameter (mean distance) high clustering coefficient Alberich et al. on the Marvel Universe –purely fictional social network –two characters linked if they appeared together in an issue –“empirical” verification of heavy-tailed distribution of degrees (issues and characters) giant component rather small clustering coefficient