Joint social selection and social influence models for networks: The interplay of ties and attributes. Garry Robins Michael Johnston University of Melbourne,


Joint social selection and social influence models for networks: The interplay of ties and attributes. Garry Robins, Michael Johnston, University of Melbourne, Australia. Symposium on the dynamics of networks and behavior, Slovenia, May 10-11, 2004. Thanks to Pip Pattison, Tom Snijders, Henry Wong, Yuval Kalish, Antonietta Pane.

A thought experiment: Most models that purport to explain important global network properties are homogeneous across nodes. Might a simple model of interactions between node-level and tie-level effects be sufficient to explain global properties? 1. Develop a model that incorporates both social selection and social influence processes. 2. Identify which global properties of networks are important. 3. Simulate the model to see whether these properties can be reproduced in a substantial proportion of graphs.

1. Develop a model incorporating both social selection and social influence effects

Simple random graph models: For a fixed set of n nodes, edges are added between pairs of nodes independently and with fixed probability p (Erdős & Rényi, 1959). Bernoulli random graph distribution: X is a set of random binary network variables [X_ij]; X_ij = 1 when an edge is observed and 0 otherwise; x is a graph realization; θ is an edge parameter. This is an exponential random graph (p*) model. It is a homogeneous model (node homogeneity): p and θ are independent of node labels.
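For reference, the standard textbook form of the Bernoulli random graph distribution can be written with the symbols defined on this slide. The edge count L(x) and the normalizing constant κ(θ) are notation assumed here, not taken from the slide:

```latex
\Pr(X = x) = \frac{1}{\kappa(\theta)} \exp\{\theta\, L(x)\},
\qquad L(x) = \sum_{i<j} x_{ij},
\qquad \kappa(\theta) = \left(1 + e^{\theta}\right)^{\binom{n}{2}},
\qquad p = \frac{e^{\theta}}{1 + e^{\theta}}
```

Because θ is the log-odds of p, the single edge parameter carries the same information as the fixed edge probability.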

A Bernoulli random graph model will not fit this network well

Social selection: In this example, actor attributes are important to tie formation.

Social selection: Exogenous attributes affect network ties. In this example, actor attributes are important to tie formation. Yellow: Jewish; Blue: Arab (Kalish, 2003).

Binary variables: X_ij network ties; Z_i actor attributes. Exogenous attributes affect network ties (diagram nodes: Z_i, Z_j, X_ij). Robins, Elliott & Pattison, 2001

Effects in the model: Baseline edge effect irrespective of attributes. Propensity for actors with attribute z=1 to have more partners. Propensity for ties to form between actors who both have attribute z=1. Equivalent (blockmodel) parameterization:
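As a hedged sketch of how the three effects map onto a blockmodel, assume the conditional log-odds of a tie is theta + beta1 * (z_i + z_j) + beta2 * z_i * z_j. The parameter names and this exact form are assumptions for illustration, not taken from the slide. Each pair of attribute values then defines a block with its own tie probability:

```python
import math

def tie_probability(theta, beta1, beta2, z_i, z_j):
    """Tie probability for a pair with binary attributes z_i, z_j.

    Assumed parameterization: theta is the baseline edge effect,
    beta1 the propensity for attributed actors to have more partners,
    beta2 the propensity for ties between two attributed actors.
    """
    log_odds = theta + beta1 * (z_i + z_j) + beta2 * z_i * z_j
    return 1.0 / (1.0 + math.exp(-log_odds))

def block_probabilities(theta, beta1, beta2):
    """Tie probability within and between the two attribute blocks."""
    return {
        "0-0": tie_probability(theta, beta1, beta2, 0, 0),
        "0-1": tie_probability(theta, beta1, beta2, 0, 1),
        "1-1": tie_probability(theta, beta1, beta2, 1, 1),
    }
```

Under this assumed form, setting beta2 = -2 * beta1 gives the 0-0 and 1-1 blocks the same density.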

Social influence: Are actor attributes influenced by fixed network structure? Robins, Pattison & Elliott, 2001

Social influence: Are actor attributes influenced by fixed network structure? Exogenous network ties affect attributes. (Figure label: a cutpoint.)

Binary variables: X_ij network ties; Z_i actor attributes. Exogenous network ties affect attributes (diagram nodes: Z_i, Z_j, X_ij).

Effects in the model: Baseline effect for number of attributed nodes (z=1). Propensity for attributed nodes to have more partners. There is no effect for an actor being influenced by a network partner, so we need to introduce dependencies among attribute variables.

Assume attribute variables are dependent if the actors are tied: partial conditional dependence (Pattison & Robins, 2002). (Diagram nodes: Z_i, Z_j, X_ij.)

Effects in the model: Baseline effect for number of attributed nodes (z=1). Propensity for attributed nodes to have more partners. Propensity for attributed nodes to be connected.

Friendship network for training squad in 12th week of training (Pane, 2003). Green: detached; Yellow: team oriented; Red: positive. Why should attributes or ties be exogenous?

Models for joint social selection/social influence (diagram nodes: Z_i, Z_j, Z_k, X_ij, X_ik).

Effects in the model: Quadratic effect in no. of attributed nodes. Propensity for attributed nodes to have more partners. Propensity for attributed nodes to be connected. Baseline effect for no. of edges. Equivalent (blockmodel) parameterization:

Change statistics Conditional log-odds for a tie to be observed: Conditional log-odds for an attribute to be observed:
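A hedged reconstruction of the two change statistics, assuming the joint model has a baseline edge parameter θ, a quadratic effect α_1 N + α_2 N² in the number of attributed nodes N = Σ_i z_i, an expansiveness parameter β_1 on Σ x_ij (z_i + z_j), and a connection parameter β_2 on Σ x_ij z_i z_j. The symbols α_1 and α_2, and the exact form of the quadratic effect, are assumptions made here for illustration:

```latex
\operatorname{logit} \Pr(X_{ij} = 1 \mid \text{rest})
  = \theta + \beta_1 (z_i + z_j) + \beta_2\, z_i z_j

\operatorname{logit} \Pr(Z_i = 1 \mid \text{rest})
  = \alpha_1 + \alpha_2 \,(2 N_{-i} + 1)
  + \beta_1 \sum_{j} x_{ij} + \beta_2 \sum_{j} x_{ij} z_j,
\qquad N_{-i} = \sum_{k \neq i} z_k
```

The term 2 N_{-i} + 1 is the change in N² when node i switches from z_i = 0 to z_i = 1, since (N_{-i} + 1)² − N_{-i}² = 2 N_{-i} + 1.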

2. Which global network properties are important? Small worlds (short average geodesics, high clustering); skewed degree distributions; regions of higher density among nodes (cohesive subsets, “community structures”).

An example network (without attributes): Confiding (trust) network (Pane, 2003).

Observed networks: Path lengths. Many observed networks have short average geodesics (small worlds). The confiding network has a median geodesic (G50) of 2: not extreme compared to a distribution of Bernoulli graphs. The confiding network has a third quartile geodesic (G75) of 2: also not extreme compared to a distribution of Bernoulli graphs.
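A minimal sketch of how G50 and G75 might be computed for an undirected graph stored as a dict of neighbour sets. Two conventions are assumptions here rather than the authors' stated method: disconnected pairs are given infinite length, and the quantile is the smallest length d such that at least a fraction q of pairs have geodesic length at most d.

```python
from collections import deque
import math

def geodesic_lengths(adj):
    """Shortest path length for every unordered pair of nodes.

    adj maps each node to the set of its neighbours (undirected graph).
    Pairs with no connecting path get math.inf (an assumed convention).
    """
    nodes = sorted(adj)
    lengths = []
    for idx, source in enumerate(nodes):
        # Breadth-first search from source.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for target in nodes[idx + 1:]:
            lengths.append(dist.get(target, math.inf))
    return lengths

def geodesic_quantile(adj, q):
    """Smallest length d such that a fraction q of pairs are within d."""
    lengths = sorted(geodesic_lengths(adj))
    k = max(math.ceil(q * len(lengths)) - 1, 0)
    return lengths[k]
```

With this helper, G50 is geodesic_quantile(adj, 0.5) and G75 is geodesic_quantile(adj, 0.75).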

Observed networks: Clustering. Many observed networks have high clustering (small worlds). Global clustering coefficient: 3 × (no. of triangles in graph) / (no. of 2-paths in graph) = 3T / S_2. The confiding network has a global clustering coefficient of 0.41; a comparable Bernoulli graph sample has a mean clustering coefficient of 0.25 (sd = 0.03).
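The global coefficient 3T / S_2 can be sketched as follows, assuming an undirected graph stored as a dict of neighbour sets. The function name and the choice to return 0.0 when there are no 2-paths are illustrative assumptions.

```python
from itertools import combinations

def global_clustering(adj):
    """Global clustering coefficient 3T / S_2 of an undirected graph.

    adj maps each node to the set of its neighbours.
    """
    closed_corners = 0   # 2-paths whose end nodes are tied (3 per triangle)
    two_paths = 0        # S_2: all 2-paths, counted at their centre node
    for i, nbrs in adj.items():
        k = len(nbrs)
        two_paths += k * (k - 1) // 2
        for j, h in combinations(nbrs, 2):
            if h in adj[j]:
                closed_corners += 1
    triangles = closed_corners // 3
    return 3 * triangles / two_paths if two_paths else 0.0
```

For a triangle with one pendant node attached, there is 1 triangle and 5 two-paths, so the coefficient is 3/5.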

Observed networks: Clustering. Many observed networks have high clustering (small worlds). Local clustering coefficient: For each node i, compute the density among nodes adjacent to i, then average across the entire graph. The confiding network has a local clustering coefficient of 0.58; a comparable Bernoulli graph sample has a mean clustering coefficient of 0.25 (sd = 0.04).
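A comparable sketch for the local coefficient, again assuming a dict of neighbour sets. Nodes with fewer than two neighbours have no defined neighbourhood density. Counting them as 0 in the average is an assumption made here, since the slide does not say how they are handled.

```python
from itertools import combinations

def local_clustering(adj):
    """Average, over all nodes, of the density among each node's neighbours.

    adj maps each node to the set of its neighbours (undirected graph).
    Nodes with fewer than two neighbours contribute 0 (assumed convention).
    """
    total = 0.0
    for i, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        # Number of ties present among the neighbours of i.
        links = sum(1 for j, h in combinations(nbrs, 2) if h in adj[j])
        total += links / (k * (k - 1) / 2)
    return total / len(adj) if adj else 0.0
```

The global and local coefficients weight nodes differently, which is why they can differ on the same graph.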

Observed networks: Clustering. Many observed networks have high clustering (small worlds). The confiding network has a global clustering coefficient of 0.41 and a local clustering coefficient of 0.58.

Observed networks: Degree distribution. Many observed networks have skewed degree distributions, as is the case for the confiding network.

Observed networks: Higher order clustering. k-triangles (Snijders, Pattison, Robins & Handcock, 2004): 1-triangle, 2-triangle, 3-triangle. Alternating k-triangles permit modeling of (semi) cohesive subsets of nodes (cf. community structures).

Observed networks: Higher order clustering. Observed networks often exhibit regions (subsets of nodes) with higher density, in which case we will see an alternating k-triangle statistic higher than for Bernoulli graphs. The k-triangle statistic is not simply equivalent to global clustering.
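A sketch of the alternating k-triangle statistic, using the closed form from the published specification by Snijders and colleagues: λ times the sum, over tied pairs (i, j), of 1 − (1 − 1/λ)^s, where s is the number of partners shared by i and j. The default λ = 2 is an assumed value, as the slides do not state one.

```python
def alternating_k_triangles(adj, lam=2.0):
    """Alternating k-triangle statistic of an undirected graph.

    adj maps each node to the set of its neighbours; node labels must be
    comparable so each edge is visited once. lam is the damping
    parameter (lambda); 2.0 is an assumed default.
    """
    total = 0.0
    for i in adj:
        for j in adj[i]:
            if i < j:
                shared = len(adj[i] & adj[j])   # partners common to i and j
                total += lam * (1.0 - (1.0 - 1.0 / lam) ** shared)
    return total
```

On a single triangle the statistic equals 3 (three edges with one shared partner each). On a 2-triangle with λ = 2 it equals 5.5, matching the alternating sum 3 × T_1 − T_2 / λ = 6 − 0.5, where T_1 = 2 is the number of triangles and T_2 = 1 the number of 2-triangles.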

Summary: Some global features not uncommon in observed networks. Short median geodesics (G50); short third quartile geodesics (G75), perhaps; high clustering; high k-triangle statistics; skewed degree distributions. Bernoulli distributions tend to have short median geodesics, low clustering and low k-triangles, hence they provide a basis for comparison.

3. Simulate the model to see whether global properties can be reproduced

Simulation of the model: Use the Metropolis algorithm, with a procedure similar to Robins, Pattison & Woolcock (in press). Typically 300,000 iterations; reject initial simulations for burn-in; sample every 1000th graph. Inspect degree distributions across the sample. Compare each graph in the sample with a Bernoulli graph distribution with the same expected density, and hence determine whether the graph has short G50 and G75, is highly clustered, and has high k-triangles. Define a graph that is highly clustered with short G50 as SW50 (small world); similarly define SW75.
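A minimal sketch of a Metropolis sampler for a joint tie/attribute model, not the authors' implementation. It assumes the tie log-odds theta + beta1 * (z_i + z_j) + beta2 * z_i * z_j and a quadratic attribute effect alpha1 * N + alpha2 * N**2. The parameter names, the 50/50 split between tie and attribute proposals, the empty starting state and the burn_in default are all assumptions; only the 300,000 iterations and the sampling of every 1000th graph come from the slide.

```python
import math
import random

def metropolis_joint(n, theta, alpha1, alpha2, beta1, beta2,
                     iterations=300_000, burn_in=100_000, thin=1000, seed=0):
    """Sample (tie matrix, attribute vector) pairs from the assumed model."""
    rng = random.Random(seed)
    x = [[0] * n for _ in range(n)]   # symmetric 0/1 tie matrix, empty start
    z = [0] * n                       # binary attributes, all 0 at start
    samples = []

    def accept(delta):
        # Metropolis rule: accept if the log ratio is >= 0,
        # otherwise accept with probability exp(delta).
        return delta >= 0 or rng.random() < math.exp(delta)

    for step in range(1, iterations + 1):
        if rng.random() < 0.5:
            # Propose toggling one tie variable.
            i, j = rng.sample(range(n), 2)
            log_odds = theta + beta1 * (z[i] + z[j]) + beta2 * z[i] * z[j]
            delta = log_odds if x[i][j] == 0 else -log_odds
            if accept(delta):
                x[i][j] = x[j][i] = 1 - x[i][j]
        else:
            # Propose toggling one attribute variable.
            i = rng.randrange(n)
            others = sum(z) - z[i]                 # attributed nodes besides i
            degree = sum(x[i])
            attributed_partners = sum(x[i][j] * z[j] for j in range(n))
            log_odds = (alpha1 + alpha2 * (2 * others + 1)
                        + beta1 * degree + beta2 * attributed_partners)
            delta = log_odds if z[i] == 0 else -log_odds
            if accept(delta):
                z[i] = 1 - z[i]
        # Keep every thin-th state once the burn-in period has passed.
        if step > burn_in and step % thin == 0:
            samples.append(([row[:] for row in x], z[:]))
    return samples
```

Each sampled graph could then be compared against a set of Bernoulli graphs of the same expected density, for example through a t-statistic on the k-triangle count.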

First simulation series: 30 node graphs. Quadratic effect in no. of attributed nodes. Propensity for attributed nodes to have more partners. Propensity for attributed nodes to be connected. Baseline effect for no. of edges.

Change statistics Conditional log-odds for a tie to be observed: Expect density to be the same among: non-attributed nodes (z_i = z_j = 0); attributed nodes (z_i = z_j = 1)

Numbers of edges and attributed nodes

Assortative and disassortative mixing

Acceptance rates

Clustering

k-triangles

Geodesics and clustering

Small worlds

Degree distributions

Graph is SW50 (but not SW75); t-statistic for k-triangles (relative to Bernoulli) = 2.02

The graph also has a skewed degree distribution, although this is unusual for graphs in this distribution.

Conclusions for this series of simulations: The parameter estimates result in approximately equal numbers of attributed and non-attributed nodes; density within the two sets of nodes is similar and high. As the “attribute expansiveness” (β_1) parameter becomes more negative, and the “attribute connection” (β_2) parameter more positive: the acceptance rate for attributes decreases; clustering and community structure increase; 3rd quartile geodesics decrease, but median geodesics remain relatively short. Graphs with “small world” features, but not with skewed degree distributions, are common within a medium range of the “attribute expansiveness” parameter.

Second simulation series: 30 node graphs. Quadratic effect in no. of attributed nodes. Propensity for attributed nodes to have more partners. Propensity for attributed nodes to be connected. Baseline effect for no. of edges.

Numbers of edges and attributed nodes

Assortative and disassortative mixing

Acceptance rates

Clustering

k-triangles

Geodesics and clustering

Small worlds

Degree distributions

Graph is SW50 (but not SW75); t-statistic for k-triangles (relative to Bernoulli) = 3.98

Graph is SW50 (but not SW75); t-statistic for k-triangles (relative to Bernoulli) = 3.98; and with skewed degree distribution.

Conclusions for the second series of simulations: The parameter estimates result in a minority of attributed nodes with high internal density, and a majority of non-attributed nodes with lower density. As the “attribute connection” (β_2) parameter increases: the numbers of edges and attributes increase somewhat, and the acceptance rate for attributes decreases; clustering and community structure increase; 3rd quartile and median geodesics become longer; degree distributions become skewed, and then bimodal. Graphs with “small world” features, and with skewed degree distributions, make up a sizeable proportion of distributions with large “attribute similarity” parameter.

Some final comments: This “thought experiment” demonstrates that several important global features of social networks may be emergent from attribute-based processes of mutually interacting social influence and social selection: short average paths; high clustering; small world properties; community structures; skewed degree distribution. Moreover, the models do not presume fixed attributes, although the structural properties begin to emerge as attributes become “sticky” (changing more slowly).

Some final comments: Network models typically assume homogeneity across graphs. This assumption may not be appropriate to the actual processes that are generating the network. One way that homogeneity may break down is through attribute-based processes. Other possibilities include social settings and geographic proximity. Network studies may require a careful conceptualisation of “process” to ensure that models are properly specified. Because process is (usually) local, with global implications, the possibility of node-level effects should not be excluded.