An introduction to exponential random graph models (ERGM)

Slides:



Advertisements
Similar presentations
Missing data in social networks - Problems and prospects for model-based inference Johan Koskinen The Social Statistics.
Advertisements

Statistical Social Network Analysis - Stochastic Actor Oriented Models Johan Koskinen The Social Statistics Discipline Area, School of Social Sciences.
Social Network Analysis - Part I basics Johan Koskinen Workshop: Monday, 29 August 2011 The Social Statistics Discipline Area, School of Social Sciences.
Network analysis Sushmita Roy BMI/CS 576
Complex Networks Luis Miguel Varela COST meeting, Lisbon March 27 th 2013.
Dr. Henry Hexmoor Department of Computer Science Southern Illinois University Carbondale Network Theory: Computational Phenomena and Processes Social Network.
Algorithmic and Economic Aspects of Networks Nicole Immorlica.
Where we are Node level metrics Group level metrics Visualization
Active Learning for Streaming Networked Data Zhilin Yang, Jie Tang, Yutao Zhang Computer Science Department, Tsinghua University.
CSE 5243 (AU 14) Graph Basics and a Gentle Introduction to PageRank 1.
The Statistical Analysis of the Dynamics of Networks and Behaviour. An Introduction to the Actor-based Approach. Christian Steglich and Tom Snijders ——————
Analysis and Modeling of Social Networks Foudalis Ilias.
Quantitative Network Analysis: Perspectives on mapping change in world system globalization Douglas White Robert Hanneman.
What is Statistical Modeling
Statistical inference for epidemics on networks PD O’Neill, T Kypraios (Mathematical Sciences, University of Nottingham) Sep 2011 ICMS, Edinburgh.
Directional triadic closure and edge deletion mechanism induce asymmetry in directed edge properties.
Link creation and profile alignment in the aNobii social network Luca Maria Aiello et al. Social Computing Feb 2014 Hyewon Lim.
Exponential random graph (p*) models for social networks Workshop Harvard University February 2002 Philippa Pattison Garry Robins Department of Psychology.
Correlation and Autocorrelation
The formation of coalitions in a new market: The case of Socially Responsible Investments Elise Penalva Icher, University of Lille Filip Agneessens, Ghent.
1 Image Completion using Global Optimization Presented by Tingfan Wu.
Joint social selection and social influence models for networks: The interplay of ties and attributes. Garry Robins Michael Johnston University of Melbourne,
Cluster Analysis.  What is Cluster Analysis?  Types of Data in Cluster Analysis  A Categorization of Major Clustering Methods  Partitioning Methods.
Exponential Random Graph Models (ERGM) Michael Beckman PAD777 April 9, 2010.
Sunbelt 2009statnet Development Team ERGM introduction 1 Exponential Random Graph Models Statnet Development Team Mark Handcock (UW) Martina.
Topic models for corpora and for graphs. Motivation Social graphs seem to have –some aspects of randomness small diameter, giant connected components,..
Actor Centrality Correlates to Project based Coordination Liaquat Hossain, Ph.D. Knowledge Management Research Lab
Social network analysis in business and economics Marko Pahor.
Bump Hunting The objective PRIM algorithm Beam search References: Feelders, A.J. (2002). Rule induction by bump hunting. In J. Meij (Ed.), Dealing with.
Personalization in Local Search Personalization of Content Ranking in the Context of Local Search Philip O’Brien, Xiao Luo, Tony Abou-Assaleh, Weizheng.
Spatial Statistics Applied to point data.
Modeling Relationship Strength in Online Social Networks Rongjing Xiang: Purdue University Jennifer Neville: Purdue University Monica Rogati: LinkedIn.
Composition of environmental decision-making networks – a case study in Helsinki, Finland M.Soc.Sc. Arho Toikka HERC Seminar Series Department.
Presentation: Random Walk Betweenness, J. Govorčin Laboratory for Data Technologies, Faculty of Information Studies, Novo mesto – September 22, 2011 Random.
Dan Ao Department of Sociology The Chinese University of Hong Kong May 30 th, 2008.
Lecture 8: Generalized Linear Models for Longitudinal Data.
ECON 3790 Statistics for Business and Economics
Social Network Analysis and Complex Systems Science
Principles of Social Network Analysis. Definition of Social Networks “A social network is a set of actors that may have relationships with one another”
Emergence of Scaling and Assortative Mixing by Altruism Li Ping The Hong Kong PolyU
Neighbourhood-based models for social networks: model specification issues Pip Pattison, University of Melbourne [with Garry Robins, University of Melbourne.
Susan O’Shea The Mitchell Centre for Social Network Analysis CCSR/Social Statistics, University of Manchester
A two minute introduction to: Exponential random graph (p*)models for social networks SNAC Workshop, Illinois, November 2005 Garry Robins, University of.
Complex Network Theory – An Introduction Niloy Ganguly.
Workshop on Optimization in Complex Networks, CNLS, LANL (19-22 June 2006) Application of replica method to scale-free networks: Spectral density and spin-glass.
Hierarchy Overview Background: Hierarchy surrounds us: what is it? Micro foundations of social stratification Ivan Chase: Structure from process Action.
Complex Network Theory – An Introduction Niloy Ganguly.
Introduction to Statistical Models for longitudinal network data Stochastic actor-based models Kayo Fujimoto, Ph.D.
Introduction to Matrices and Statistics in SNA Laura L. Hansen Department of Sociology UMB SNA Workshop July 31, 2008 (SOURCE: Introduction to Social Network.
1 CIS 4930/6930 – Recent Advances in Bioinformatics Spring 2014 Network models Tamer Kahveci.
Analysis of Social Media MLD , LTI William Cohen
1 Random Walks on the Click Graph Nick Craswell and Martin Szummer Microsoft Research Cambridge SIGIR 2007.
Importance Measures on Nodes Lecture 2 Srinivasan Parthasarathy 1.
Introduction to ERGM/p* model Kayo Fujimoto, Ph.D. Based on presentation slides by Nosh Contractor and Mengxiao Zhu.
Ing. Martina Majorová, FEM SUA Statistics Lecture 4 – Data sampling & Data sorting.
Structures of Networks
Hiroki Sayama NECSI Summer School 2008 Week 2: Complex Systems Modeling and Networks Network Models Hiroki Sayama
Groups of vertices and Core-periphery structure
Exponential random graph models for multilevel networks
Are people associating based on gender similarity?
A p* primer: logit models for social networks
Social Balance & Transitivity
Discrete Kernels.
CSE 4705 Artificial Intelligence
Network Science: A Short Introduction i3 Workshop
Statistical Analysis of Social Networks
Assortativity (people associate based on common attributes)
Topic models for corpora and for graphs
Topic models for corpora and for graphs
Longitudinal Social Network Data
Presentation transcript:

An introduction to exponential random graph models (ERGM) The Social Statistics Discipline Area, School of Social Sciences An introduction to exponential random graph models (ERGM) Johan Koskinen Mitchell Centre for Network Analysis Workshop: Monday,  29 August 2011

Part 1a Why an ERGM

Networks matter – ERGMS matter FEMALE Male 6018 grade 6 children 1966

Networks matter – ERGMS matter 6018 grade 6 children 1966 – 300 schools Stockholm

Networks matter – ERGMS matter 6018 grade 6 children 1966 – 200 schools Stockholm Koskinen and Stenberg (in press) JEBS

Networks matter – ERGMS matter 6018 grade 6 children 1966 – 200 schools Stockholm Koskinen and Stenberg (in press) JEBS

Networks matter – ERGMS matter 6018 grade 6 children 1966 – 200 schools Stockholm Koskinen and Stenberg (in press) JEBS

Networks matter – ERGMS matter 6018 grade 6 children 1966 – 200 schools Stockholm Koskinen and Stenberg (in press) JEBS

Part 1b Modelling graphs

Notational preliminaries Why and what is an ERGM Dependencies Social Networks Notational preliminaries Why and what is an ERGM Dependencies Estimation Geyer-Thompson Robins-Monro Bayes (The issue, Moller et al, LISA, exchange algorithm) Interpretation of effects Convergence and GOF Further issues

Numerous recent substantively driven studies Networks matter – ERGMS matter Numerous recent substantively driven studies Gondal, The local and global structure of knowledge production in an emergent research field, SOCNET 2011 Lomi and Palotti, Relational collaboration among spatial multipoint competitors , SOCNET 2011 Wimmer &  Lewis, Beyond and Below Racial Homophily, AJS 2010 Lusher, Masculinity, educational achievement and social status, GENDER & EDUCATION 2011 Rank et al. (2010). Structural logic of intra-organizational networks, ORG SCI, 2010. Book for applied researchers: Lusher, Koskinen, Robins ERGMs for SN, CUP, 2011

The exponential random graph model (p*) framework An ERGM (p*) model is a statistical model for the ties in a network Independent (pairs of) ties (p1, Holland and Leinhardt, 1981; Fienberg and Wasserman, 1979, 1981) Markov graphs (Frank and Strauss, 1986) Extensions (Pattison & Wasserman, 1999; Robins, Pattison & Wasserman, 1999; Wasserman & Pattison, 1996) New specifications (Snijders et al., 2006; Hunter & Handcock, 2006)

ERGMS – modelling graphs

ERGMS – modelling graphs We want to model tie variables But structure – overall pattern – is evident What kind of structural elements can we incorporate in the model for the tie variables?

the number of ties in our model ERGMS – modelling graphs If we believe that the frequency of interaction/density is an important aspect of the network i l j k We should include Counts of the number of ties in our model

the number of mutual ties in our model ERGMS – modelling graphs If we believe that the reciprocity is an important aspect of the (directed) network i l j k We should include Counts of the number of mutual ties in our model

ERGMS – modelling graphs If we believe that an important aspect of the network is that k two edge indicators {i,j} and {i’,k} are conditionally dependent if {i,j}  {i’,k}  i j We should include counts of degree distribution; preferential attachment, etc friends meet through friends; clustering; etc

ERGMS – modelling graphs If we believe that the attributes of the actors are important (selection effects, homophily, etc) We should include counts of Heterophily/homophily Distance/similarity

ERGMS – modelling graphs If we believe that (Snijders, et al., 2006) i k two edge indicators {i,k} and {l,j} are conditionally dependent if {i,l} , {l,j} E l j clustered regions

 +  2 = ERGMS – modelling graphs Pr given the rest log Pr k log Pr given the rest i k j k m n  1  # +2  # +3  # + +  # i l i +  2 = adding edge, e.g.: k j k

ERGMS – modelling graphs The conditional formulation Pr given the rest i k log Pr given the rest i k where Is the difference in counts of structure type k May be ”aggregated” for all dyads so that the model for the entire adjacency matrix can be written

ERGMS – modelling graphs For a model with edges and triangles The model for the adjacency matrix X is a weighted sum where The parameters 1 and  weight the relative importance of ties and triangles, respectively - graphs with many triangles but not too dense are more probable than dense graphs with few triangles

ERGMS – modelling graphs: example Padgett’s Florentine families (Padgett and Ansell, 1993) network of 8 employees in a business company BusyNet <- as.matrix(read.table( ”PADGB.txt”,header=FALSE))

ERGMS – modelling graphs: example BusyNetNet <- network(BusyNet, directed=FALSE) plot(BusyNetNet) Requires libraries 'sna’,'network’

ERGMS – modelling graphs: example Observed and Density:

ERGMS – modelling graphs: example For a model with only edges Equivalent with the model: For each pair flip a p-coin heads tails Where p is the probability coin comes up heads

ERGMS – modelling graphs: example The (Maximul likelihood) estimate of p is the density here: i.e., an estimated one in 8 pairs establish a tie For an ERGM model with edges and hence

ERGMS – modelling graphs: example Solving for 1, we have that the density parameter and for the MLE Let’s check in stanet

approx. standard error of MLE of 1, ERGMS – modelling graphs: example Estim1 <- ergm(BusyNetNet ~ edges) summary(Estim1) approx. standard error of MLE of 1, Parameter corresponding to Success:

ERGMS – modelling graphs: example Do we believe in the model: For each pair flip a p-coin heads tails Let’s fit a model that takes Markov dependencies into account where statnet

approx. standard error of MLE of , ERGMS – modelling graphs: example Estim2 <- ergm(BusyNetNet ~ kstar(1:3) + triangles) summary(Estim2) approx. standard error of MLE of ,

Part 3 Estimation

Sum over all 2n(n-1)/2 graphs Likelihood equations for exponential fam ”Aggregated” to a joint model for entire adjacency matrix X Sum over all 2n(n-1)/2 graphs The MLE solves the equation (cf. Lehmann, 1983):

Likelihood equations for exponential fam Solving Using the cumulant generating function (Corander, Dahmström, and Dahmström, 1998) Stochastic approximation (Snijders, 2002, based on Robbins-Monro, 1951) Importance sampling (Handcock, 2003; Hunter and Handcock, 2006, based on Geyer-Thompson 1992)

- Initialisation phase - Main estimation Robbins-Monro algorithm Solving Snijders, 2002, algorithm - Initialisation phase - Main estimation - convergence check and cal. of standard errors MAIN: Draw using MCMC

Robbins-Monro algorithm Parameter space Network space Observed data In this case, the distribution of graphs from the parameter values is not consistent with the observed data A set of parameter values We can simulate the distribution of graphs associated with any given set of parameters.

Robbins-Monro algorithm Parameter space Network space We want to find the parameter estimates for which the observed graph is central in the distribution of graphs

Robbins-Monro algorithm Parameter space Network space Observed data Based on this discrepancy, adjust parameter values So: 1. compare distribution of graphs with observed data 2. adjust parameter values slightly to get the distribution closer to the observed data 3. keep doing this until the estimates converge to the right point

Robbins-Monro algorithm Parameter space Network space Over many iterations we walk through the parameter space until we find the right estimates

Phase 1, Initialisation phase Robbins-Monro algorithm Phase 1, Initialisation phase Find good values of the initial parameter state And the scaling matrix (use the score-based method, Schweinberger & Snijders, 2006)

Phase 2, Main estimation phase Iteratively update θ Robbins-Monro algorithm Phase 2, Main estimation phase Iteratively update θ by drawing one realisation from the model defined by the current θ(m) Repeated in sub-phases with fixed

Phase 2, Main estimation phase Robbins-Monro algorithm Phase 2, Main estimation phase Relies on us being able to draw one realisation from the ERGM defined by the current θ We can NOT do this directly We have to simulate x More specifically use Markov chain Monte Carlo

What do we need to know about MCMC? Method: Robbins-Monro algorithm What do we need to know about MCMC? Method: Generate a sequence of graphs for arbitrary x(0), using an updating rule… so that as

What do we need to know about MCMC? Robbins-Monro algorithm What do we need to know about MCMC? So if we generate an infinite number of graphs in the “right” way we have the ONE draw we need to update θ once? Typically we can’t wait an infinite amount of time so we settle for multiplication factor In Pnet very large is

Phase 3, Convergence check and calculating standard errors Robbins-Monro algorithm Phase 3, Convergence check and calculating standard errors At the end of phase 2 we always get a value But is it the MLE? Does it satisfy ?

Phase 3, Convergence check and calculating standard errors Robbins-Monro algorithm Phase 3, Convergence check and calculating standard errors Phase 3 simulates a large number of graphs And checks if A minor discrepancy - due to numerical inaccuracy - is acceptable Convergence statistics:

Approximated using importance sample from MCMC Geyer-Thompson Solving Handcock, 2003, approximate Fisher scoring MAIN: Approximated using importance sample from MCMC

Sum over all 2n(n-1)/2 graphs Bayes: dealing with likelihood The normalising constant of the posterior not essential for Bayesian inference, all we need is: … but Sum over all 2n(n-1)/2 graphs

Bayes: MCMC? Consequently, in e.g. Metropolis-Hastings, acceptance probability of move to θ … which contains

Bayes: Linked Importance Sampler Auxiliary Variable MCMC LISA (Koskinen, 2008; Koskinen, Robins & Pattison, 2010): Based on Møller et al. (2006), we define an auxiliary variable  And produce draws from the joint posterior using the proposal distributions and

Bayes: alternative auxiliary variable LISA (Koskinen, 2008; Koskinen, Robins & Pattison, 2010): Based on Møller et al. (2006), we define an auxiliary variable  Many linked chains: Computation time storage (memory and time issues) Improvement: use exchange algorithm (Murray et al. 2006) and Accept θ* with log-probability: Caimo & Friel, 2011

Storing only parameters Bayes: Implications of using alternative auxiliary variable Storing only parameters No pre tuning – no need for good initial values Standard MCMC properties of sampler Less sensitive to near degeneracy in estimation Easier than anything else to implement QUICK and ROBUST Improvement: use exchange algorithm (Murray et al. 2006) and Accept θ* with log-probability: Caimo & Friel, 2011

Interpretation of effects Part 4 Interpretation of effects

Markov dependence assumption: Problem with Markov models Markov dependence assumption: k two edge indicators {i,j} and {i’,k} are conditionally dependent if {i,j}  {i’,k}  i j We have shown that the only effects are: degree distribution; preferential attachment, etc friends meet through friends; clustering; etc

hard Often for Markov model Matching or Problem with Markov models degree distribution; preferential attachment, etc friends meet through friends; clustering; etc Matching Impossible! hard or

hard Matching or Some statistic k or Problem with Markov models Impossible! hard or # triangles Some statistic k or # edges

hard Matching Problem with Markov models # triangles The number of triangles for a Markov model with edges (-3), 2-stars (.5), 3-stars (-.2) and triangles (1.1524) # triangles

hard Matching Problem with Markov models Exp. # triangles and edges for a Markov model with edges, 2-stars, 3-stars and triangles

Matching If for some statistic k Implies: Problem with Markov models Impossible! Matching If for some statistic k Implies: Similarly for max (or conditional min/max) e.g. A graph on 7 vertices with 5 edges # 2-stars # edges

First solutions (Snijders et al., 2006) Problem with Markov models First solutions (Snijders et al., 2006) Markov: adding k-star adds (k-1) stars alternating sign compensates but eventually zk(x)=0 Alternating stars: Restriction on star parameters - Alternating sign prevents explosion, and models degree distribution

First solutions (Snijders et al., 2006) Problem with Markov models First solutions (Snijders et al., 2006) Markov: adding k-star adds (k-1) stars alternating sign compensates but eventually zk(x)=0

First solutions (Snijders et al., 2006) Problem with Markov models First solutions (Snijders et al., 2006) Include all stars but restrict parameter: new Alternating stars: Restriction on star parameters - Alternating sign prevents explosion, and models degree distribution

First solutions (Snijders et al., 2006) Problem with Markov models First solutions (Snijders et al., 2006) Include all stars but restrict parameters: new Expressed in terms of degree distribution

First solutions (Snijders et al., 2006) Problem with Markov models First solutions (Snijders et al., 2006) Include all stars but restrict parameters: Interpretation: Positive parameter (λ ≥ 1) – graphs with some high degree nodes and larger degree variance more likely than graphs with more homogenous degree distribution Negative parameter (λ ≥ 1) – the converse…

First solutions (Snijders et al., 2006) Problem with Markov models First solutions (Snijders et al., 2006) Markov: triangles evenly spread out but one edge can add many triangles… Alternating triangles: Restrictions on different order triangles – alternating sign Prevents explosion, and Models multiply clustered regions Social circuit dependence assumption

First solutions (Snijders et al., 2006) Problem with Markov models First solutions (Snijders et al., 2006) Markov: triangles evenly spread out but one edge can add many triangles… k-triangles: 1-triangles: 2-triangles: 3-triangles:

First solutions Problem with Markov models Weigh the k-triangles: Where: Alternating triangles: Restrictions on different order triangles – alternating sign Prevents explosion, and Models multiply clustered regions

First solutions Problem with Markov models Underlying assumption: Social circuit dependence assumption l j two edge indicators {i,k} and {l,j} are conditionally dependent if {i,l} , {l,j} E Alternating triangles: Restrictions on different order triangles – alternating sign Prevents explosion, and Models multiply clustered regions

Alternative interpretation Problem with Markov models Alternative interpretation We may (geometrically) weight together : i j

Alternative intrepretation Problem with Markov models Alternative intrepretation We may also define the Edgewise Shared Partner Statistic: … and we can weigh together the ESP statistics using Geometrically decreasing weights: GWESP i j

Example Lazega’s law firm partners Part 4b Example Lazega’s law firm partners

Boston office: Hartford office: Providence off.: Lazega’s (2001) Lawyers Bayesian Data Augmentation Collaboration network among 36 lawyers in a New England law firm (Lazega, 2001) Boston office: Hartford office: Providence off.: least senior: most senior:

Bayesian Data Augmentation Lazega’s (2001) Lawyers Bayesian Data Augmentation Fit a model with ”new specifications” and covariates Edges: Seniority: Practice: Homophily Sex: Office: GWESP: Main effect

PNet: Lazega’s (2001) Lawyers Bayesian Data Augmentation Fit a model with ”new specifications” and covariates PNet:

Bayesian Data Augmentation Lazega’s (2001) Lawyers Bayesian Data Augmentation Fit a model with ”new specifications” and covariates Main effect Edges: Seniority: Practice: Homophily Practice: Sex: Office: GWESP:

Interpreting attribute-related effects Part 4c Interpreting attribute-related effects

Fitting an ERGM in Pnet: a business communications network Bayesian Data Augmentation For ”wwbusiness.txt” we have recorded wheather the employee works in the central office or is a traveling sales represenative office sales Rb for attribute R for attribute

Fitting an ERGM in Pnet: a business communications network Bayesian Data Augmentation Consider a dyad-independent model office sales Rb for attribute R for attribute

Fitting an ERGM in Pnet: a business communications network Bayesian Data Augmentation With log odds office sales Rb for attribute R for attribute

Interpreting higher order effects Part 4d Interpreting higher order effects

Alternating stars a way of Unpacking the alternating star effect Alternating stars a way of “fixing” the Markov problems (models all degrees) Controlling for paths in clustering 1-triangles: 2-triangles: 3-triangles: k-triangles measure clustering…

Alternating stars a way of Unpacking the alternating star effect Alternating stars a way of “fixing” the Markov problems (models all degrees) Controlling for paths in clustering 1-triangles: 2-triangles: 3-triangles: Is it closure or an artefact of many stars/2-paths?

Interpreting the alternating star parameter: Unpacking the alternating star effect Interpreting the alternating star parameter: centralization – + variance of degree measure centralization

alternating star parameter Unpacking the alternating star effect Interpreting the alternating star parameter: centralization – + alternating star parameter – +

alternating star parameter Unpacking the alternating star effect Statistics for graph (n = 16); fixed density; alt trian: 1.17 alternating star parameter – +

alternating star parameter Unpacking the alternating star effect Statistics for graph (n = 16); fixed density; alt trian: 1.17 alternating star parameter – +

alternating star parameter Unpacking the alternating star effect Graphs (n = 16); fixed density; alt trian: 1.17 alternating star parameter – +

Note also the influence of isolates: Problem with Markov models Note also the influence of isolates: Alt, stars in terms of degree distribution

alternating star parameter Problem with Markov models Note also the influence of isolates: alternating star parameter – +

Note also the influence of isolates: Problem with Markov models Note also the influence of isolates: This is because we impose a particular shape on the degree distribution

alternating star parameter Problem with Markov models The degree distribution for different values of: – alternating star parameter +

Alternating triangles measure clustering Unpacking the alternating triangle effect Alternating triangles measure clustering We may also define the Edgewise Shared Partner Statistic: … and we can weigh together the ESP statistics using Geometrically decreasing weights: GWESP i j

alternating Triangle parameter Unpacking the alternating triangle effect Statistics for graph (n = 16); fixed density; alt star: 2.37 alternating Triangle parameter – +

alternating Triangle parameter Unpacking the alternating triangle effect Statistics for graph (n = 16); fixed density; alt star: 2.37 alternating Triangle parameter – +

alternating Triangle parameter Unpacking the alternating triangle effect Statistics for graph (n = 16); fixed density; alt star: 2.37 + closed open clustering alternating Triangle parameter – +

alternating Triangle parameter Unpacking the alternating triangle effect Statistics for graph (n = 16); fixed density; alt star: 2.37 + closed open clustering alternating Triangle parameter – +

Alternating triangles model multiply clustered areas Unpacking the alternating triangle effect Alternating triangles model multiply clustered areas For multiply clustered areas triangles stick together We model how many others tied actors have Edgewise Shared Partner Statistic: i j

Statistics for graph (n = 16); fixed density; alt star: 2.37 + Unpacking the alternating triangle effect Statistics for graph (n = 16); fixed density; alt star: 2.37 + Alternating triangle para # edwise shared partners – +

Convergence check and goodness of fit Part 5 Convergence check and goodness of fit

Revisiting the Florentine families statnet Why difference? PNet

Pnet checks convergence in 3rd phase Revisiting the Florentine families Pnet checks convergence in 3rd phase

Revisiting the Florentine families Lets check For statnet

Estim3 <- ergm(BusyNetNet ~ kstar(1:3) + triangles , verbose=TRUE) mcmc.diagnostics(Estim3) ?

Simulation, GOF, and Problems Given a specific model we can simulate potential outcomes under that model. This is used for Estimation: (a) to match observed statistics and expected statistics (b) to check that we have “the solution” GOF: to check whether the model can replicated features of the data that we not explicitly modelled Investigate behaviour of model, e.g.: degeneracy and dependence on scale

For fitted effects this is true by construction Simulation, GOF, and Problems Standard goodness of fit procedures are not valid for ERGMs – no F-tests or Chi-square tests available If indeed the fitted model adequately describes the data generating process, then the graphs that the model produces should be similar to observed data For fitted effects this is true by construction For effects/structural feature that are not fitted this may not be true If it is true, the modelled effects are the only effects necessary to produce data – a “proof” of the concept or ERGMs

Example: for our fitted model for Lazega Simulation, GOF, and Problems Example: for our fitted model for Lazega We can look at how well the model reproduces For arbitrary function g

From the goodness-of-fit tab in Pnet we get Simulation, GOF, and Problems From the goodness-of-fit tab in Pnet we get

Simulation, GOF, and Problems Effect MLE s.e. Density -6.501 0.727 Main effect seniority 1.594 0.324 Main effect practice 0.902 0.163 Homophily effect practice 0.879 0.231 Homophily sex 1.129 0.349 Homophily office 1.654 0.254

Simulation, GOF, and Problems

Simulation, GOF, and Problems Effect MLE s.e. Density -6.501 0.727 -6.510 0.637 Main effect seniority 1.594 0.324 0.855 0.235 Main effect practice 0.902 0.163 0.410 0.118 Homophily effect practice 0.879 0.231 0.759 0.194 Homophily sex 1.129 0.349 0.704 0.254 Homophily office 1.654 1.146 0.195 GWEPS 0.897 0.304 Log-lambda 0.778 0.215

Simulation, GOF, and Problems

Part 2 Dependencies

Independence - Deriving the ERGM heads i l i l tails j k i l i m n heads k tails i k

does not help us predict SEK Independence - Deriving the ERGM AUD 0.5 SEK 0.5 0.25 l i k l i k Knowledge of AUD, e.g. does not help us predict SEK e.g. whether or

does not help us predict SEK Independence - Deriving the ERGM Knowledge of AUD, e.g. does not help us predict SEK e.g. whether or even though dyad {i,l} l i i and dyad {i,k} k have vertex i i in common

Independence - Deriving the ERGM May we find model such that knowledge of AUD, e.g. does help us predict SEK e.g. whether or ? AUD 0.5 i l SEK 0.5 0.4 0.1 k i l k

Consider the tie-variables that have Mary in common Deriving the ERGM: From Markov graph to Dependence graph Consider the tie-variables that have Mary in common paul mary john pete How may we make these “dependent”?

Deriving the ERGM: From Markov graph to Dependence graph paul mary mary pete john pete

Deriving the ERGM: From Markov graph to Dependence graph paul mary mary pete mary john john pete

Deriving the ERGM: From Markov graph to Dependence graph paul mary paul mary mary pete mary john john pete

Deriving the ERGM: From Markov graph to Dependence graph paul mary paul mary mary pete mary john john pete pete john

Deriving the ERGM: From Markov graph to Dependence graph paul mary paul paul john mary mary pete mary john john pete pete john

Deriving the ERGM: From Markov graph to Dependence graph paul mary paul paul paul john pete mary mary pete mary john john pete pete john

Deriving the ERGM: From Markov graph to Dependence graph The “probability structure” of a Markov graph is described by cliques of the dependence graph (Hammersley-Clifford)…. paul m,pa pa,pe pa,j mary m,pe m,j john pe,j pete

Deriving the ERGM: From Markov graph to Dependence graph paul m,pa pa,pe pa,j mary m,pe m,j pe,j pete

Deriving the ERGM: From Markov graph to Dependence graph paul mary m,pe john pete

Deriving the ERGM: From Markov graph to Dependence graph paul m,pa pa,pe pa,j mary m,pe m,j john pe,j pete

Deriving the ERGM: From Markov graph to Dependence graph paul m,pa pa,pe pa,j mary m,pe m,j john pe,j pete

Deriving the ERGM: From Markov graph to Dependence graph paul mary m,pe m,j john pete

Deriving the ERGM: From Markov graph to Dependence graph paul m,pa pa,pe pa,j mary m,pe m,j john pe,j pete

Deriving the ERGM: From Markov graph to Dependence graph paul m,pa pa,pe pa,j mary m,pe m,j john pe,j pete

Deriving the ERGM: From Markov graph to Dependence graph paul mary m,pe m,j john pe,j pete

Deriving the ERGM: From Markov graph to Dependence graph paul m,pa pa,pe pa,j mary m,pe m,j john pe,j pete

Deriving the ERGM: From Markov graph to Dependence graph paul m,pa pa,pe pa,j mary m,pe m,j john pe,j pete

Deriving the ERGM: From Markov graph to Dependence graph paul m,pa pa,j mary m,pe john pe,j pete

Deriving the ERGM: From Markov graph to Dependence graph paul m,pa pa,pe pa,j mary m,pe m,j john pe,j pete

too many statistics (parameters) From Markov graph to Dependence graph – distinct subgraphs? too many statistics (parameters)

The homogeneity assumption = = = =

Interaction terms in log-linear model of types A log-linear model (ERGM) for ties ”Aggregated” to a joint model for entire adjacency matrix Interaction terms in log-linear model of types

More than is explained by margins A log-linear model (ERGM) for ties By definition of (in-) dependence More than is explained by margins E.g. and co-occuring i j k Main effects interaction term

Summary of fitting routine Part 7 Summary of fitting routine

rerun if model not converged include more parameters? GOF The steps of fitting an ERGM fit base-line model check convergence rerun if model not converged include more parameters? GOF candidate models

Part 6 Bipartite data

Bi-partite networks: cocitation (Small 1973) 2 3 4 5 7 6 texts cite

Bi-partite networks: cooffending (. Sarnecki, 2001) offences offenders participating

Bi-partite networks people people directors participating in Social events (Breiger, 1974) people belonging to voluntary organisations (Bonachich, 1978) sitting on corporate boards (eg. Mizruchi, 1982 directors

Tie: If two directors share a board One-mode projection Two-mode Tie: If two directors share a board one-mode

Bi-partite networks texts cite 2 2 2 2 2 2 3 3 3 3 6 7 1 1 1 1 1 1 4 4 5 texts cite

Bi-partite networks Researchers texts cite

Bi-partite networks Researchers texts cite

One-mode projection What mode given priority? (Duality of social actors and social groups; e.g. Breiger 1974; Breiger and Pattison, 1986) Loss of information

ERGM for bipartite networks The model is the same as for one-mode networks (Wang et al., 2007) i j i j i j i j k ℓ k ℓ k ℓ k ℓ Edges ”people” 2-stars ”affiliation” 2-stars 3-paths 4-cycles

ERGM for bipartite networks j A form of bipartite clustering Also known as Bi-cliques One possible interpretation: k ℓ 4-cycles

ERGM for bipartite networks j A form of bipartite clustering Also known as Bi-cliques One possible interpretation: k ℓ Who to appoint? 4-cycles x How about x?

ERGM for bipartite networks Fitting the model in (B)Pnet straightforward extension These statistics are all Markov: i j i j i j i j k ℓ k ℓ k ℓ k ℓ Edges ”people” 2-stars ”affiliation” 2-stars 3-paths 4-cycles <BPNet>

Part 7 Missing data

Methods for missing network data Effects of missingness Perils Some investigations on the effects on indices of structural properties (Kossinets, 2006; Costenbader & Valente, 2003; Huisman, 2007) Problems with the “boundary specification issue” Few remedies Deterministically “complement” data (Stork & Richards, 1992) Stochastically Impute missing data (Huisman, 2007) Ad-hoc “likelihood” (score) for missing ties (Liben-Nowell and Kleinberg, 2007)

If you don’t have a model for what you have observed Model assisted treatment of missing network data If you don’t have a model for what you have observed How are you going to be able to say something about what you have not observed using what you have observed missing data observed data

Bayesian data augmentation (Koskinen, Robins & Pattison, 2010) Model assisted treatment of missing network data Importance sampling (Handcock & Gile 2010; Koskinen, Robins & Pattison, 2010) Stochastic approximation and the missing data principle (Orchard & Woodbury,1972) (Koskinen & Snijders, forthcoming) Bayesian data augmentation (Koskinen, Robins & Pattison, 2010)

We should include counts of: Marginalisation (Snijders, 2010; Koskinen et al, 2010) Subgraph of ERGM not ERGM Dependence in ERGM But if k ? j We may also have dependence k k l i i j j We should include counts of:

Bayesian Data Augmentation Bayesian Data Augmentation Simulate parameters  With missing data: In each iteration simulate graphs  missing

Most likely missing given current  Bayesian Data Augmentation Bayesian Data Augmentation Simulate parameters  With missing data:  Most likely missing given current  missing In each iteration simulate graphs

Most likely  given current missing Bayesian Data Augmentation Bayesian Data Augmentation Simulate parameters  With missing data:  Most likely  given current missing missing In each iteration simulate graphs

Most likely missing given current  Bayesian Data Augmentation Bayesian Data Augmentation Simulate parameters  With missing data:  Most likely missing given current  missing In each iteration simulate graphs

Most likely  given current missing Bayesian Data Augmentation Bayesian Data Augmentation Simulate parameters  With missing data:  Most likely  given current missing missing In each iteration simulate graphs

Most likely missing given current  Bayesian Data Augmentation Bayesian Data Augmentation Simulate parameters  With missing data:  Most likely missing given current  missing In each iteration simulate graphs

and so on… Bayesian Data Augmentation Bayesian Data Augmentation Simulate parameters  With missing data:  and so on… missing In each iteration simulate graphs

… until Bayesian Data Augmentation Bayesian Data Augmentation Simulate parameters  With missing data: … until  missing In each iteration simulate graphs

Bayesian Data Augmentation Bayesian Data Augmentation What does it give us? Distribution of parameters Distribution of missing data Subtle point Missing data does not depend on the parameters (we don’t have to choose parameters to simulate missing)  missing

Boston office: Hartford office: Providence off.: Lazega’s (2001) Lawyers Bayesian Data Augmentation Collaboration network among 36 lawyers in a New England law firm (Lazega, 2001) Boston office: Hartford office: Providence off.: least senior: most senior:

Bayesian Data Augmentation Lazega’s (2001) Lawyers Bayesian Data Augmentation Edges: (bi = 1, if i corporate, 0 litigation) Main effect Seniority: t1 : Practice: Homophily Practice: t2 : Sex: Office: t3 : GWESP: with 8 = log() etc.

Lazega’s (2001) Lawyers – ERGM posteriors (Koskinen, 2008) Bayesian Data Augmentation

Remove 200 of the 630 dyads at random Cross validation (Koskinen, Robins & Pattison, 2010) Bayesian Data Augmentation Remove 200 of the 630 dyads at random Fit inhomogeneous Bernoulli model obtain the posterior predictive tie-probabilities for the missing tie-variables Fit ERGM and obtain the posterior predictive tie-probabilities for the missing tie-variables (Koskinen et al., in press) Fit Hoff’s (2008) latent variable probit model with linear predictor Tz(xij) + wiwjT Repeat many times

ROC curve for predictive probabilities combined over 20 replications (Koskinen et al. 2010) Bayesian Data Augmentation

ROC curve for predictive probabilities combined over 20 replications (Koskinen et al. 2010) Bayesian Data Augmentation

ROC curve for predictive probabilities combined over 20 replications (Koskinen et al. 2010) Bayesian Data Augmentation

... but snowball sampling rarely used when population size is known... Bayesian Data Augmentation Snowball sampling design ignorable for ERGM (Thompson and Frank, 2000, Handcock & Gile 2010; Koskinen, Robins & Pattison, 2010) ... but snowball sampling rarely used when population size is known... Using the Sageman (2004) “clandestine” network as test-bed for unknown N

Part 7 Spatial embedding

306 actors in Victoria, Australia Spatial embedding – Daraganova et al. 2011 SOCNET 306 actors in Victoria, Australia

306 actors in Victoria, Australia Spatial embedding – Daraganova et al. 2011 SOCNET 306 actors in Victoria, Australia spatially embedded ... all living within 14 kilometres of each other

306 actors in Victoria, Australia Spatial embedding – Daraganova et al. 2011 SOCNET 306 actors in Victoria, Australia Bernoulli conditional on distance Empirical probability ... all living within 14 kilometres of each other

ERGM: distance and endogenous dependence explain different things Spatial embedding – Daraganova et al. 2011 SOCNET Edges -4.87* (0.13) 1.56* (0.65) -4.79* (0.66) -0.20 (0.87) Alt. star -0.86* (0.18) -0.86* (0.2) Alt. triangel 2.74* (0.15) 2.69* (0.14) Log distance -0.78* (0.08) -0.56* (0.07) Age heterophily -0.07* (0.01) 0.001 (0.07) 0.002 (0.06) Gender homophily -1.13* (0.61) -1.13 (0.69) 0.09 (0.83) 0.07 (0.47) ERGM: distance and endogenous dependence explain different things

Part 8 Further issues

modelling actor autoregressive attributes Other extensions There are ERGMs (ERGM-like modesl) for directed data valued data bipartite data multiplex data longitudinal data modelling actor autoregressive attributes

ERGMs typically assume homogeneity Further issues – relaxing homogeneity assumption ERGMs typically assume homogeneity Block modelling and ERGM (Koskinen, 2009) Latent class ERGM (Schweingberger & Handcock)

Bayes factors (Wang & Handcock…?) Further issues – model selection Assessing Goodness of Fit: Posterior predictive distributions (Koskinen, 2008; Koskinen, Robins & Pattison, 2010; Caimo & Friel, 2011) Non-Bayesian heuristic GOF (Robins et al., 2007; Hunter et al., 2008; Robins et al., 2009; Wang et al., 2009) Model selection Path sampling for AIC (Hunter & Handcock, 2006); Conceptual caveat: model complexity when variables dependent? Bayes factors (Wang & Handcock…?)

Bayes is the way of the future Legitimacy and dissemination Wrap-up ERGMs Increasingly being used Increasingly being understood Increasingly being able to handle imperfect data (also missing link prediction) Methods Plenty of open issues Bayes is the way of the future Legitimacy and dissemination - e.g. Lusher, Koskinen, Robins ERGMs for SN, CUP, 2011

Remaining question: used to be p*… ... why ERGM?