Exponential Random Graph Models (ERGM) Michael Beckman PAD777 April 9, 2010.

Introduction
“The purpose of ERGM, in a nutshell, is to describe parsimoniously the local selection forces that shape the global structure of a network.” “ERGM may then be used to understand a particular phenomenon or to simulate new random realizations of networks that retain the essential properties of the original.” (Hunter et al. 2008)
General characteristics of ERGM: a single observation of the network rather than successive waves; change statistics compare the observed network to random realizations; Markov or Markov-like statistics are still computed; both structural and attribute parameters can be modeled; assumptions and constraints are important to estimation; standard errors (SEs) improve even where pseudolikelihood produces acceptable estimates; goodness-of-fit statistics are reliable. Overall, ERGM is a significant move towards true stochastic modeling of networks.

Agenda
Wasserman and Robins (2005), An Introduction to Random Graphs, Dependence Graphs, and p*
Snijders (2002), Markov chain Monte Carlo estimation of ERGM
Robins et al. (2007), Recent developments in exponential random graph (p*) models for social networks
Hunter et al. (2008), A Package to Fit, Simulate and Diagnose Exponential-Family Models for Networks
Morris et al. (2008), Specification of Exponential-Family Random Graph Models: Terms and Computational Aspects
Andrew (2009), Regional integration through contracting networks

Wasserman & Robins - Intro
Wasserman and Robins (2005), An Introduction to Random Graphs, Dependence Graphs, and p*.
Historic development of the p* distribution for Markov random graphs: Frank and Strauss (1986); Strauss and Ikeda (1990), estimation of distribution parameters; Wasserman and Pattison (1996), extended parameter assumptions; Wasserman and Robins (2005), a family of models derived from dependence graphs, versus approximate autologistic regression (pseudolikelihood).
Standard network notation: r = 1 denotes a single relation with dichotomous data. The tie variables are random and assumed to be interdependent. Multivariate or valued relations can also be used. Dependence graphs allow testing for independent elements in the matrix X.

Wasserman & Robins - Intro
Model parameters are estimated from three new arrays: the converse, composition, and intersection of the measured relations. The complement relation has no tie coded from i to j; one can view this single variable as missing.

Wasserman & Robins - Intro
Consider the observed network as one member of the set of all possible configurations. Dependence graphs help distinguish among possible distributions by identifying ties that are statistically independent.
Dependence graph: a graph whose nodes are the tie variables and whose edges signify pairs of random variables that are assumed to be conditionally dependent.

Wasserman & Robins - Intro
Three classes of dependence graphs:
Bernoulli: conditional independence is assumed for every pair of ties, so the dependence graph is empty (complete independence); the related conditional uniform distribution fixes the number of ties.
Dyadic dependence: all dyads are assumed to be statistically independent, so the dependence graph has an edge only within each dyad (between Xij and Xji). This is the basis for the p1 model of Holland and Leinhardt (1977, 1981).
General dependence graph: an arbitrary edge set with a general probability distribution; this is the basis for p*.
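Here is a concrete version of the Bernoulli case. With an empty dependence graph and one homogeneous edge parameter θ, every tie is independent with probability e^θ / (1 + e^θ). The single-parameter form is an assumption made for illustration, since the slide does not fix a parameterization. The Python sketch below uses a hypothetical helper, `bernoulli_log_prob`. It computes log Pr(X = x) for an undirected graph given as a 0/1 adjacency matrix.

```python
import math
from itertools import combinations


def bernoulli_log_prob(adj, theta):
    """Log-probability of an undirected 0/1 adjacency matrix under a
    homogeneous Bernoulli graph (independent ties, one edge parameter theta).

    log Pr(X = x) = theta * L(x) - D * log(1 + exp(theta)),
    where L(x) is the number of edges and D = n(n-1)/2 the number of dyads.
    """
    n = len(adj)
    n_edges = sum(adj[i][j] for i, j in combinations(range(n), 2))
    n_dyads = n * (n - 1) // 2
    return theta * n_edges - n_dyads * math.log1p(math.exp(theta))


# With theta = 0, all 2**6 undirected graphs on 4 nodes are equally likely.
empty4 = [[0] * 4 for _ in range(4)]
print(math.exp(bernoulli_log_prob(empty4, 0.0)))  # approximately 0.015625 (= 1/64)
```

Because the ties are independent, the log-probability depends on the graph only through its edge count.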

Wasserman & Robins - Intro
Markov graphs and p*: any two relational ties are associated (conditionally dependent) if they involve the same actor. The observed network is considered a realization x of the random array X. The model is built from the complete subgraphs (cliques) of the dependence graph D. The Hammersley-Clifford theorem characterizes Pr(X = x) as an exponential family of distributions, and only configurations that are cliques of D can have non-zero parameters.
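The Hammersley-Clifford form can be written in the notation of Wasserman and Robins (2005) as the first line below. Here A ranges over the cliques of D, λ_A is the parameter for clique A, z_A(x) indicates whether every tie in A is present, and κ is the normalizing constant. The second line is a sketch of the homogeneous Markov special case for an undirected graph. It uses edges L, k-stars S_k and triangles T; the names θ, σ_k and τ are chosen here for illustration.

$$\Pr(X = x) = \frac{1}{\kappa} \exp\Big( \sum_{A} \lambda_A \, z_A(x) \Big), \qquad z_A(x) = \prod_{X_{ij} \in A} x_{ij}$$

$$\Pr(X = x) = \frac{1}{\kappa} \exp\Big( \theta \, L(x) + \sum_{k=2}^{n-1} \sigma_k \, S_k(x) + \tau \, T(x) \Big)$$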

Wasserman & Robins - Intro
Estimating every parameter can overwhelm the model, so constraints are needed:
Impose dependence assumptions on parameters.
Homogeneity, i.e., isomorphic dyads (MAN) share parameters.
Higher-order configurations (stars, triads, etc.) are typically set to zero.
Constrained social settings.
Exact differentiation of the log likelihood is mathematically challenging, because the normalizing constant sums over all possible graphs. With pseudolikelihood, measures of fit are problematic; with MCMC, model degeneracy may be a problem. MCMC is normally preferred, and improved algorithms are available or being developed.

Snijders – MCMC Estimation
Snijders (2002), Markov chain Monte Carlo Estimation of ERGM.
A random graph is a Markov graph if the number of nodes is fixed and non-incident edges are independent conditional on the rest of the graph.
Exponential family of probability functions (p*): $P_\theta\{Y = y\} = \exp\big(\theta' u(y) - \psi(\theta)\big)$, where y is the adjacency matrix of a digraph, the sufficient statistic u(y) is any vector of statistics of the digraph, and ψ(θ) is the normalizing function.
The pseudolikelihood estimator is not a function of the complete sufficient statistic u(Y), so it is not a “suitable” estimator. Dahmstrom and Dahmstrom (1993) proposed MCMC.
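Here is a small illustration of a sufficient statistic u(y). The sketch below computes three digraph statistics that appear in Snijders' example models: the number of edges, mutual dyads and out-two-stars. It also computes the unnormalized log-probability θ'u(y). The function names are hypothetical. ψ(θ) is deliberately left out, because computing it requires summing over every possible digraph.

```python
from math import comb


def digraph_stats(y):
    """u(y) = (edges, mutual dyads, out-two-stars) for a 0/1 adjacency matrix y."""
    n = len(y)
    edges = sum(y[i][j] for i in range(n) for j in range(n) if i != j)
    mutual = sum(y[i][j] * y[j][i] for i in range(n) for j in range(i + 1, n))
    out_degrees = [sum(y[i][j] for j in range(n) if j != i) for i in range(n)]
    out_two_stars = sum(comb(d, 2) for d in out_degrees)
    return edges, mutual, out_two_stars


def unnormalized_log_prob(y, theta):
    """theta' u(y); the normalizing term psi(theta) is omitted."""
    return sum(t * u for t, u in zip(theta, digraph_stats(y)))


y = [[0, 1, 1],
     [1, 0, 0],
     [0, 0, 0]]
print(digraph_stats(y))                            # (3, 1, 1)
print(unnormalized_log_prob(y, (-1.0, 2.0, 0.5)))  # -0.5
```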

Snijders – MCMC Estimation
Gibbs sampling: the elements Yij are updated one per draw, chosen at random, with all other elements left unchanged. Convergence is assumed as t → ∞. The conditional distribution toggles between Yij = 1 and Yij = 0.
This can result in “severe convergence problems”. The model may not simulate effects properly, or an ‘explosion of ties’ may occur after a long period of stasis. The result is a bimodal distribution consisting of high-density and low-density states, or regimes; a regime is defined as a subset of the outcome space. Other regimes (besides bimodal) are possible.
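A minimal random-scan Gibbs sampler for an undirected graph shows the toggle described above. For illustration only, it assumes a model with an edge parameter and a triangle parameter, theta = (edge, triangle). The full conditional of one tie is then logistic in the change statistics: P(Yij = 1 | rest) = 1 / (1 + exp(−θ·δij)). All function names are hypothetical.

```python
import math
import random


def expit(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def change_stats(adj, i, j):
    """Change in (edges, triangles) when the undirected tie i-j goes from 0 to 1."""
    shared = sum(1 for k in range(len(adj)) if adj[i][k] and adj[j][k])
    return (1, shared)


def gibbs_step(adj, theta, rng):
    """Redraw one randomly chosen tie from its full conditional distribution."""
    i, j = rng.sample(range(len(adj)), 2)
    eta = sum(t * d for t, d in zip(theta, change_stats(adj, i, j)))
    new_value = 1 if rng.random() < expit(eta) else 0  # P(Y_ij = 1 | rest)
    adj[i][j] = adj[j][i] = new_value


def gibbs_chain(n, theta, steps, seed=0):
    """Run the sampler from the empty graph, recording the density after each draw."""
    rng = random.Random(seed)
    adj = [[0] * n for _ in range(n)]
    densities = []
    for _ in range(steps):
        gibbs_step(adj, theta, rng)
        densities.append(sum(map(sum, adj)) / (n * (n - 1)))
    return adj, densities
```

One way to look for the long flat stretches and sudden density jumps described above is to run something like `gibbs_chain(20, (-3.0, 1.0), 20_000)` and plot `densities`. Whether and when a jump happens depends on the parameters, n, and the seed.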

Snijders – MCMC Estimation
Reciprocity p* model (number of edges and reciprocity): assumes dyadic independence, so probabilities are calculated for each MAN dyad state. The independence assumption precludes the ‘explosion’ effect.
Two-star p* model (number of edges and out-two-stars): rows of the adjacency matrix are statistically independent. If the total number of ties Y++ is fixed, the number of out-two-stars is a linear function of the out-degree variance.
Combined reciprocity and two-star p* model (density, reciprocity, out-two-stars): this model is examined under the transformation of the digraph into its complement, which changes each Yij to (1 − Yij). With the density set to 0.5, simulations produce graphs with density equal to, less than, or greater than 0.5. This can produce the “explosion effect”; in effect, results are determined by the initial state (high or low density).
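The reciprocity p* model assumes dyadic independence, so the MAN probabilities follow directly from the three dyad states. The sketch below assumes the parameterization P(y) ∝ exp(θ·edges + ρ·mutual); θ and ρ are names chosen here, not taken from the slides. It also shows the complement transformation Yij → 1 − Yij. Both helpers are hypothetical.

```python
import math


def man_probabilities(theta, rho):
    """Probabilities of the Mutual, Asymmetric and Null states of one dyad
    when P(y) is proportional to exp(theta * edges + rho * mutual)."""
    w_null = 1.0                         # no ties: statistics (0, 0)
    w_asym = 2.0 * math.exp(theta)       # i->j only or j->i only: (1, 0) each
    w_mut = math.exp(2.0 * theta + rho)  # both ties: (2, 1)
    total = w_null + w_asym + w_mut
    return {"M": w_mut / total, "A": w_asym / total, "N": w_null / total}


def complement(y):
    """Complement of a digraph: every off-diagonal Y_ij becomes 1 - Y_ij."""
    n = len(y)
    return [[0 if i == j else 1 - y[i][j] for j in range(n)] for i in range(n)]


print(man_probabilities(0.0, 0.0))  # {'M': 0.25, 'A': 0.5, 'N': 0.25}
```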

Snijders – MCMC Estimation
Gibbs sampling algorithm: for every two outcomes there is a positive probability of going from one to the other in a finite number of steps. However, one regime may be so dominant that the sojourn time before moving to the other regime is practically infinite. The initial state then determines the outcome, much like a coin toss with probability 0.5.
Three problems arise:
A bimodal distribution is undesirable when only a single network is observed.
With two regimes, convergence can be so slow that generating a random draw is practically impossible.
Expected values of the sufficient statistics are extremely sensitive to parameter values, which makes estimation unstable.
Other iteration procedures have been proposed and tested.

Snijders – MCMC Estimation
Detailed balance technique: defined on the set of all adjacency matrices Yg, it yields a unique stationary distribution.
Small updating steps: one element Yij is updated per step, as with Gibbs sampling. The cell being updated is chosen at random rather than deterministically, which is referred to as mixing rather than cycling.
Metropolis-Hastings algorithm: proposes changing Yij to (1 − Yij) with all other ties held constant. It changes the graph more frequently than Gibbs sampling, so it is more efficient.
Dyadic or triplet updating steps: several elements are updated per step, with the dyads or triplets chosen randomly (“groupwise” updating). These are slower to converge.
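The Metropolis-Hastings toggle can be sketched in a few lines. For illustration, the sketch assumes an undirected edges-plus-triangles model and a proposal that picks a dyad uniformly at random. Because that proposal is symmetric, the acceptance probability is min(1, exp(±θ·δ)). The sign is + when the tie is being added and − when it is being removed. All names are hypothetical.

```python
import math
import random


def delta_stats(adj, i, j):
    """Change in (edges, triangles) from adding tie i-j, with the tie treated as absent."""
    return (1, sum(adj[i][k] * adj[j][k] for k in range(len(adj))))


def mh_toggle_step(adj, theta, rng):
    """One Metropolis-Hastings toggle; returns the change in edge count (+1, -1 or 0)."""
    n = len(adj)
    i, j = rng.sample(range(n), 2)  # symmetric proposal: uniform random dyad
    log_ratio = sum(t * d for t, d in zip(theta, delta_stats(adj, i, j)))
    if adj[i][j]:                   # the proposal removes an existing tie
        log_ratio = -log_ratio
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        adj[i][j] = adj[j][i] = 1 - adj[i][j]
        return 1 if adj[i][j] else -1
    return 0


def mean_density(n, theta, steps, burn_in, seed=1):
    """Average density of the chain after burn-in, starting from the empty graph."""
    rng = random.Random(seed)
    adj = [[0] * n for _ in range(n)]
    n_dyads = n * (n - 1) // 2
    edges, running = 0, 0.0
    for t in range(steps):
        edges += mh_toggle_step(adj, theta, rng)
        if t >= burn_in:
            running += edges / n_dyads
    return running / (steps - burn_in)
```

As a sanity check, set the triangle parameter to zero. The model is then a Bernoulli graph with tie probability e^θ / (1 + e^θ). So `mean_density(10, (math.log(3), 0.0), 20_000, 2_000)` should land near 0.75.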

Snijders – MCMC Estimation
Large updating steps: Yij is updated from 0 to 1 (or vice versa) in blocks. The biggest step converts the graph to its complement (inversion). This satisfies the detailed balance equation, may be appropriate for bimodal distributions, and may reduce variance in estimation (conditioning).
Fixed density: only digraphs with a given number of ties are drawn.
Random undirected graphs: the procedures are applied to the half matrix of unique elements.
ML estimation is not easily applied to exponential random graphs, because the likelihood is problematic to calculate for complex models. Pseudolikelihood estimates can be good, but their standard errors are too low.
Markov chain Monte Carlo estimates: Monte Carlo simulation of the Markov graph estimates moments, and these moments are used to estimate parameter effects in a neighborhood of the current parameter values.
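The pseudolikelihood approach treats each dyad as a logistic-regression observation. The predictors are the dyad's change statistics, computed with the rest of the graph held fixed. The sketch below fits that logistic regression by Newton-Raphson in pure Python. The helper names and the edges/triangles terms are assumptions made for illustration. No standard errors are computed, since the slide notes they are too low for this estimator.

```python
import math
from itertools import combinations


def edge_change(adj, i, j):
    """Change in the edge count from adding tie i-j."""
    return 1.0


def triangle_change(adj, i, j):
    """Change in the triangle count from adding tie i-j (= number of shared partners)."""
    return float(sum(adj[i][k] * adj[j][k] for k in range(len(adj))))


def mple_data(adj, change_fns):
    """One row per dyad: change statistics (rest of graph fixed) and the observed tie."""
    rows, ys = [], []
    for i, j in combinations(range(len(adj)), 2):
        rows.append([fn(adj, i, j) for fn in change_fns])
        ys.append(adj[i][j])
    return rows, ys


def solve(a, b):
    """Solve the small linear system a x = b (Gauss-Jordan, partial pivoting)."""
    m = len(b)
    aug = [list(row) + [b[r]] for r, row in enumerate(a)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(m):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                for k in range(c, m + 1):
                    aug[r][k] -= factor * aug[c][k]
    return [aug[r][m] / aug[r][r] for r in range(m)]


def mple(rows, ys, iters=25):
    """Maximize the log pseudolikelihood (a logistic regression) by Newton-Raphson."""
    p = len(rows[0])
    theta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for x, y in zip(rows, ys):
            eta = sum(t * v for t, v in zip(theta, x))
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = mu * (1.0 - mu)
            for a in range(p):
                grad[a] += (y - mu) * x[a]
                for b in range(p):
                    hess[a][b] += w * x[a] * x[b]
        step = solve(hess, grad)
        theta = [t + s for t, s in zip(theta, step)]
    return theta


path = [[0, 1, 0, 0, 0],
        [1, 0, 1, 0, 0],
        [0, 1, 0, 1, 0],
        [0, 0, 1, 0, 1],
        [0, 0, 0, 1, 0]]
print(mple(*mple_data(path, [edge_change])))  # close to log(0.4 / 0.6)
```

For an edges-only model, the MPLE is just the logit of the observed density. Adding `triangle_change` gives an edges-plus-triangles fit. That fit may not exist for every graph, for example when the change statistics perfectly separate ties from non-ties.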

Snijders – MCMC Estimation
MCMC: the Newton-Raphson and Robbins-Monro algorithms are similar. The Robbins-Monro algorithm has three phases:
Phase 1: estimate a diagonal matrix D from the derivatives of the expected statistics at the initial parameter estimate.
Phase 2: iteratively determine provisional estimates, which leads quickly to a solution of the moment equation; large steps can lead to instability.
Phase 3: keep the parameter value constant and use a large number of steps to check the validity of the moment equation.
In theory, the Monte Carlo version of Robbins-Monro converges with probability 1. Snijders recommends inversion steps for models with triplet counts.
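A toy version of the phase-2 Robbins-Monro update can be run on the simplest model, the edges-only (Bernoulli) graph. For that model, graphs can be drawn exactly instead of by MCMC. This is an illustrative simplification, not Snijders' full procedure. Phase 1 is replaced by a closed-form derivative at the starting value, where Snijders estimates it by simulation, and phase 3 is omitted. The update is θ ← θ − a_k D⁻¹(u(y_k) − u_obs) with decreasing gain a_k = 1/k. The function names are hypothetical.

```python
import math
import random


def simulate_edge_count(n_dyads, theta, rng):
    """Exact draw of the edge count under the edges-only (Bernoulli) model."""
    p = 1.0 / (1.0 + math.exp(-theta))
    return sum(rng.random() < p for _ in range(n_dyads))


def robbins_monro(observed_edges, n_dyads, theta0=0.0, n_iter=2000, seed=42):
    """Toy phase-2 Robbins-Monro iteration for the edges-only model."""
    rng = random.Random(seed)
    # Simplified phase 1: D = dE[u]/dtheta = Var(u) at theta0, in closed form.
    p0 = 1.0 / (1.0 + math.exp(-theta0))
    d = n_dyads * p0 * (1.0 - p0)
    theta = theta0
    for k in range(1, n_iter + 1):
        u_sim = simulate_edge_count(n_dyads, theta, rng)
        theta -= (1.0 / k) * (u_sim - observed_edges) / d  # gain a_k = 1/k
    return theta


print(robbins_monro(15, 45))  # should end near log(0.5), the logit of 15/45
```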

Robins et al – Recent Developments
Robins et al. (2007), Recent developments in exponential random graph (p*) models for social networks.
For “near degenerate” models, MCMC estimation technically does not converge, and the problem becomes more acute as network size grows. Including suitable constraints on parameters allows estimation, and the parameters then provide information on structural effects.
Recall from Snijders the problem of bimodal distributions and model degeneracy. A gradual increase in the triangle parameter does not produce a gradual increase in graph triangulation, so including star and triangle parameters does not overcome the problem.

Robins et al – Recent Developments

Inclusion of higher-order structures: alternating k-stars, alternating k-triangles, and alternating independent two-paths. Of these, only alternating k-stars technically still define a Markov random graph.
The assumption allows stars of every order up to (n − 1). Recall that in previous models, higher-order star parameters were normally set to 0. In the alternating k-star, higher-order stars are allowed, but their impact is gradually diminished. In essence, configurations are weighted from simple to complex, which allows inference about the degree structure of the network. A positive parameter indicates “hubs” in the node structure; a negative parameter indicates smaller variance in degree (a decentralized network).
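One common way to write the alternating k-star statistic (Snijders et al. 2006) is u = S2 − S3/λ + S4/λ² − … + (−1)^(n−1) S_{n−1}/λ^(n−3). Here S_k is the number of k-stars and λ > 1 is a damping parameter. The sketch computes the statistic directly. It also computes it through the equivalent geometric form λ² Σ_i [(1 − 1/λ)^{d_i} − 1 + d_i/λ], which shows why higher-order stars are gradually down-weighted. The function names are hypothetical, and the sketch is for undirected graphs.

```python
from math import comb


def degrees(adj):
    return [sum(row) for row in adj]


def alt_kstar_direct(adj, lam):
    """S2 - S3/lam + S4/lam**2 - ... using k-star counts S_k = sum_i C(d_i, k)."""
    n = len(adj)
    deg = degrees(adj)
    total = 0.0
    for k in range(2, n):
        s_k = sum(comb(d, k) for d in deg)
        total += (-1) ** k * s_k / lam ** (k - 2)
    return total


def alt_kstar_geometric(adj, lam):
    """Equivalent closed form: lam**2 * sum_i [(1 - 1/lam)**d_i - 1 + d_i/lam]."""
    return lam ** 2 * sum((1 - 1 / lam) ** d - 1 + d / lam for d in degrees(adj))


star = [[0, 1, 1, 1],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0]]
print(alt_kstar_direct(star, 2.0), alt_kstar_geometric(star, 2.0))  # 2.5 2.5
```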

Robins et al – Recent Developments
Interpreting alternating k-star models: a positive parameter indicates a tendency toward a large number of low-degree nodes and a small number of high-degree nodes. Node degree may become saturated: the increase in “popularity” plateaus, so additional ties do not “add value”. This is indicative of a loose core-periphery structure. Alternating between positive and negative signs on successive star counts helps prevent the distribution from being forced onto the empty or complete graph (à la Snijders et al. 2006).

Robins et al – Recent Developments
Alternating k-triangles introduce a new form of conditional dependence. Two possible edges Yrs and Yuv, for distinct nodes r, s, u, v, are assumed to be conditionally dependent if yru = ysv = 1. In other words, if the two possible edges were actually observed, they would complete a 4-cycle. This defines social circuit dependence: the chance of a tie Yrs depends on the presence of Yuv.
Snijders et al. (2006) combine k-triangles with Markov dependence. A k-triangle is a combination of k individual triangles that share one edge (the base); the shared adjacencies with the other nodes form the triangle sides. The structure is conditionally dependent if it is either a Markov configuration (shared node) or a social circuit configuration (4-cycle).
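The alternating k-triangle statistic of Snijders et al. (2006) can be written as 3T1 − T2/λ + T3/λ² − …. Here T1 is the number of triangles. For k ≥ 2, T_k is the number of k-triangles, i.e. Σ over edges (i, j) of C(L_ij, k), where L_ij is the number of shared partners of i and j. The statistic equals λ Σ_edges [1 − (1 − 1/λ)^{L_ij}], which with λ = e^α is the geometrically weighted edgewise shared partner (GWESP) form listed in the Hunter et al. slides. The sketch below computes both forms for undirected graphs. The function names are hypothetical.

```python
from itertools import combinations
from math import comb


def shared_partners(adj, i, j):
    return sum(adj[i][k] * adj[j][k] for k in range(len(adj)))


def triangle_count(adj):
    n = len(adj)
    return sum(adj[i][j] * adj[j][k] * adj[i][k] for i, j, k in combinations(range(n), 3))


def alt_ktriangle_direct(adj, lam):
    """3*T1 + sum_{k=2}^{n-2} (-1)**(k+1) * T_k / lam**(k-1)."""
    n = len(adj)
    edges = [(i, j) for i, j in combinations(range(n), 2) if adj[i][j]]
    total = 3.0 * triangle_count(adj)
    for k in range(2, n - 1):
        t_k = sum(comb(shared_partners(adj, i, j), k) for i, j in edges)
        total += (-1) ** (k + 1) * t_k / lam ** (k - 1)
    return total


def alt_ktriangle_gwesp(adj, lam):
    """Equivalent form: lam * sum over edges of [1 - (1 - 1/lam)**L_ij]."""
    n = len(adj)
    return lam * sum(1 - (1 - 1 / lam) ** shared_partners(adj, i, j)
                     for i, j in combinations(range(n), 2) if adj[i][j])


k4 = [[int(i != j) for j in range(4)] for i in range(4)]
print(alt_ktriangle_direct(k4, 2.0), alt_ktriangle_gwesp(k4, 2.0))  # 9.0 9.0
```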

Robins et al – Recent Developments

Interpreting k-triangles: a positive parameter provides evidence of transitivity effects. It can also suggest a core-periphery structure, but one due to triangulation rather than popularity. It is more of a structural effect than an attribute effect, i.e., an outcome of the triangulation process.
Alternating k-two-paths: a lower-order structure, combined with k-triangles to distinguish the tendency to form ties at the base of a triangle from the tendency to form them at its sides. Side edges without a base edge indicate a precondition for transitivity; the presence of the base edge indicates transitive closure. The combination of parameters can indicate pressure towards closure.

Robins et al – Recent Developments Other possible parameters

Robins et al – Recent Developments
Estimating parameters: MCMC is the preferred method when available. When the model converges, simulation produces a distribution of graphs in which the observed graph is typical for all effects, and standard errors are reliable.
Snijders et al. (2006) conditioned on the number of edges (no density parameter). This diminishes the degeneracy problem, with a moderate impact on other parameters. Robins et al. find that, at least for smaller networks, conditioning on edges may not be needed.

Robins et al – Recent Developments
Modeling with SIENA: the output includes the estimates, their standard errors, and a t-ratio for each estimate that indicates how well the model has converged. A t-ratio close to zero indicates good convergence; large ratios may indicate that the model has not converged or is degenerate. For non-degenerate models, an absolute value below 0.1 indicates convergence. Other tests in SIENA include hysteresis analysis and simulating from the estimates to compare the simulated graphs with the observed graph.
Modeling with statnet: uses a Newton-Raphson algorithm with fewer simulation runs, then weights the simulated graphs for estimation. It incorporates advances from Metropolis-Hastings.
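The convergence check described above can be sketched as follows. For each statistic, take its values across the networks simulated from the fitted model. Subtract the observed value from their mean, and divide by their standard deviation. This is a simplified stand-in for what SIENA reports. Two choices are assumptions: using the sample standard deviation, and taking the 0.1 cutoff from the slide's rule of thumb. The function names are hypothetical.

```python
import statistics


def convergence_t_ratios(simulated, observed):
    """(mean of simulated statistic - observed statistic) / SD of simulated statistic.

    simulated: list of rows, one row of statistics per simulated network.
    observed: the statistics of the observed network, in the same order.
    """
    ratios = []
    for k, obs in enumerate(observed):
        values = [row[k] for row in simulated]
        ratios.append((statistics.mean(values) - obs) / statistics.stdev(values))
    return ratios


def looks_converged(ratios, threshold=0.1):
    """Rule of thumb from the slide: every |t| below 0.1."""
    return all(abs(t) < threshold for t in ratios)


sims = [[10, 3], [12, 5], [14, 4]]
ratios = convergence_t_ratios(sims, [11, 4])
print(ratios, looks_converged(ratios))  # [0.5, 0.0] False
```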

Robins et al – Recent Developments

Comparing pseudolikelihood to MCMC: UCINET datasets, SIENA modeling

Hunter et al – Package to Fit…
Hunter et al. (2008), A Package to Fit, Simulate and Diagnose Exponential-Family Models for Networks.
Implementing ERGM in R/statnet: specifying an ERGM, approximate or exact MLE, and goodness-of-fit tests.
“The purpose of ERGM, in a nutshell, is to describe parsimoniously the local selection forces that shape the global structure of a network.” “ERGM may then be used to understand a particular phenomenon or to simulate new random realizations of networks that retain the essential properties of the original.”

Hunter et al – Package to Fit…
Implementing ERGM in R/statnet, variables: endogenous terms result from network structure; exogenous terms are attribute based and can serve as predictors. Attributes can be treated as functions of nodal covariates, so statistics depend on both attribute and relationship information.
Change statistics: recall that we are comparing the conditional distribution toggled between Yij = 1 and Yij = 0 (or some other Markov configuration). A change statistic depends on a particular choice g() of statistics, a particular network y, and a particular pair of nodes (i, j). A seed can be specified for reproducibility.
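The definition of a change statistic can be made concrete by computing g(y⁺ij) − g(y⁻ij) by brute force. Here y⁺ij and y⁻ij are y with the tie i–j forced present and forced absent. The sketch uses three illustrative undirected terms. The first is edges. The second is a homophily count of ties between nodes with the same attribute value; it is similar in spirit to statnet's `nodematch` term, but it is not that term's implementation. The third is triangles. All function names and the attribute values are hypothetical.

```python
from itertools import combinations


def edges(y, attr):
    return sum(y[i][j] for i, j in combinations(range(len(y)), 2))


def same_attr_ties(y, attr):
    """Homophily count: ties between nodes with equal attribute values."""
    return sum(y[i][j] for i, j in combinations(range(len(y)), 2) if attr[i] == attr[j])


def triangles(y, attr):
    return sum(y[i][j] * y[j][k] * y[i][k] for i, j, k in combinations(range(len(y)), 3))


def change_stat(g, y, attr, i, j):
    """delta_g(y)_ij = g(y with tie i-j) - g(y without tie i-j), undirected."""
    with_tie = [row[:] for row in y]
    without_tie = [row[:] for row in y]
    with_tie[i][j] = with_tie[j][i] = 1
    without_tie[i][j] = without_tie[j][i] = 0
    return g(with_tie, attr) - g(without_tie, attr)


y = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]          # path 0-1-2
grade = ["a", "a", "b"]  # hypothetical nodal attribute
for g in (edges, same_attr_ties, triangles):
    print(g.__name__, change_stat(g, y, grade, 0, 2))
# prints: edges 1 / same_attr_ties 0 / triangles 1
```

Only the triangle term's change statistic depends on the rest of y. The edges and same-attribute terms depend only on (i, j) and the node attributes, and this is the property that makes them dyadic independence terms.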

Hunter et al – Package to Fit…
Dyadic independence models. A dyadic independence term is a term in an ERGM whose change statistic for (i, j) can be calculated without any knowledge of the ties in y outside the dyad (i, j). A dyadic independence ERGM is one in which all terms are dyadic independence terms; this model is purely stochastic. For undirected models, the unconditional (marginal) tie probability can then be computed.
It is important to distinguish between dyadic independence and linear independence: linear dependencies among terms can arise in either kind of model, which has implications for model specification. statnet eliminates, or allows the user to eliminate, statistics as needed.

Hunter et al – Package to Fit…
Dyadic dependence models: dyads that do not share a node are conditionally independent (analogous to nearest-neighbor dependence). A homogeneity condition may be added as a constraint, so that all isomorphic networks have the same probability. These models have the problems discussed previously.
Suggested correctives: combine endogenous and exogenous terms, and specify triad-based curved exponential family terms, namely geometrically weighted degree (GWD), geometrically weighted edgewise shared partner (GWESP), and geometrically weighted dyadwise shared partner (GWDSP).

Hunter et al – Package to Fit… Curved exponential family model

Hunter et al – Package to Fit…
Estimation and goodness of fit. Parameters: edges, a homophily term for grade, and a main effect for sex (p. 23).

Morris et al – Specification of ERGM
Morris et al. (2008), Specification of Exponential-Family Random Graph Models: Terms and Computational Aspects.
Where Hunter et al. focus more on theory and statistical formulas, Morris et al. provide basic instruction on implementing ERGM in R/statnet. This includes commands for basic effects, nodal attributes, relational attributes, structural configurations, higher-order configurations, actor-specific effects, and constraints, plus tips to fine-tune the algorithm and processing. Appendix A, Table of Model Terms, provides a quick reference for which terms are appropriate to a particular model, e.g., directed or undirected, bipartite, dyadic independence, etc.

Morris et al – Specification of ERGM
Constraints: the model must specify the space of all possible networks. Some networks are bipartite, with ties (communication) between but never within the two groups of nodes. ERGM automatically implements these constraints as needed.

Andrew – Regional Integration
Andrew (2009), Regional integration through contracting networks.
Research question: under what conditions do local governments choose to contract for services, or enter into regional agreements for the provision of services?
Two hypotheses are advanced:
Bonding hypothesis: in the presence of uncertainty and complexity in interjurisdictional activities, a highly dense network structure will emerge over time.
Bridging hypothesis: for interjurisdictional activities involving high asset specificity, a sparse, “core-periphery” network is anticipated.
Framework: institutional collective action, i.e., transaction cost analysis, enforcement and monitoring, and the free-rider problem.

Andrew – Regional Integration
Bonding: local officials are attracted to interjurisdictional, voluntary cooperation agreements, which are flexible and non-binding and foster a “norm of reciprocity”. These agreements can be constrained by local politics and coordination costs.
Bridging: in an asset-specific dilemma, local officials are likely to choose a strategic partner. They may produce services in-house, or induce competition to attenuate the opportunism of the central actor. They are expected to contract with a partner who already has ties with other jurisdictions.

Andrew – Regional Integration
Research design: contractual ties among the law enforcement community in Orlando-Kissimmee; five waves from 1986 to […], […] total actors. The list of goods and services is derived from International City/County Management Association surveys. Studying one metropolitan area controls for geographic variation and allows in-depth analysis of regional integration.

Andrew – Regional Integration

Parameters: transitive triads; geodesic distance-2; covariate effects. The covariates are the importance of level of government, where a municipality is coded 1 and higher-level government is the benchmark, and the importance of professionalism, indicated by accreditation. Both are coded as dummy variables and treated as control variables. Homophily effect.
Rate parameters were all positive and significant. T-ratios were less than 0.3, indicating no problems with convergence (?).

Andrew – Regional Integration (Andrew 2009, p. 392)

Andrew – Regional Integration (Andrew 2009, p. 392)