A p* primer: logit models for social networks


A p* primer: logit models for social networks Carolyn J. Anderson, Stanley Wasserman & Bradley Crouch (1999)

1. Predictive Models: Problems. Explanatory variables are specific to the social relation. The response variable is dichotomous or discrete (actor i does or does not have a relational tie to actor j). There is a strong association between actor i’s relational tie to actor j and actor j’s relational tie to actor i. Explanatory variables can be of several different types.

2. New Family of Models: p*. p* brings logit models and logistic regression to social network analysis. The models have the standard response/explanatory variable structure. The response variable is the logit, or log odds, of the probability that a tie is present. Explanatory variables can be quite general. p* is easily fit, approximately, using logistic regression software.

3. Example: Friendship Data (Parker & Asher, 1993). The study investigates the relationship between elementary school children’s friendship and peer group acceptance. Network data: children in 36 classrooms (12 3rd grade, 10 4th grade and 14 5th grade) in 5 public elementary schools, 881 children in total.

Data collected: 3 relations (very best friendship, best friendship, friendship); attribute information (gender, age, race, loneliness); acceptance (‘roster and rating’: how much they liked to play with each classmate).

Illustration of p*: 3 classrooms (one from each grade), analyzed individually and simultaneously (differences among classes). The fitted models emphasize dyadic effects: reciprocity and mutuality. Distinctions among the three friendship relations were ignored, creating one relation reflecting friendship that is either present or not.

Fig. 1. The directed graph of the 5th grade classroom where the squares and circles respectively represent the boys and girls, and the arrows depict the presence of a directed friendship relational tie.

Goal: after studying each classroom individually, look at similarities and differences simultaneously across the classrooms, and develop the first p*-type models for multiple networks.

g × g Sociomatrix: represents a social relation. The (i, j) entry in the matrix is the value of the tie from actor i to actor j on that relation. The relation here is dichotomous.

Table 1: 3rd grade classroom—friendship relation

Graph-theoretic Characteristics of the Sociomatrix: number of ties; number of mutual dyads; outdegree (number of relational ties sent). Actor-level qualities can also be defined: gender, age, grade, race, ethnicity, seniority.

4. An Introduction to p*: in the p* model, θ is the vector of the r model parameters, z(x) is the vector of the r explanatory variables, and k is a normalizing constant that ensures that the probabilities sum to unity. The θ parameters are the unknown ‘regression’ coefficients and must be estimated.
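The equation this slide refers to did not survive in the transcript. As a hedged reconstruction from the definitions given on the slide (a parameter vector θ, a vector of explanatory variables z(x), and a normalizing constant, written κ(θ) here for the slide's "k"), the standard log-linear form of p* would read:

```latex
\[
  \Pr(\mathbf{X} = \mathbf{x})
    = \frac{\exp\{\boldsymbol{\theta}'\,\mathbf{z}(\mathbf{x})\}}{\kappa(\boldsymbol{\theta})},
  \qquad
  \kappa(\boldsymbol{\theta})
    = \sum_{\mathbf{x}^{*}} \exp\{\boldsymbol{\theta}'\,\mathbf{z}(\mathbf{x}^{*})\}
\]
```

The sum in κ(θ) runs over every possible sociomatrix on the g actors, which is why the constant is impractical to compute directly and why a form that does not depend on it is attractive.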

4. An Introduction to p* - p*

4. An Introduction to p* - Logit Models: the alternative version of model 2 is a logit model; it does not depend on k. In a logit or logistic regression model, the response variable is dichotomous and is coded as a binary variable (for example, Y* = 1 or 0). Probabilities are modeled as a function of a linear combination, or linear predictor, of the explanatory variables.

4. An Introduction to p* - Logit p*: p* is a model not for the probabilities of individual ties, but for the entire collection of ties; we work with probabilities conditional on all the other relational ties in the network, not with the ‘marginal’ probabilities. The logit version of p* is obtained by taking the logarithm of the odds.

4. An Introduction to p* - Fitting and evaluating logit p*: p* is fitted to single or multiple networks by maximizing the pseudo-likelihood function. The difference in pseudo-likelihood ratio statistics between two nested models can be evaluated approximately by referring its value to a chi-squared distribution with degrees of freedom equal to the number of parameters associated with the variable in question. For a single parameter, the square of the ratio of the estimate to its approximate standard error is known as the Wald statistic and is labeled WaldPL.

5. Example: a Single Class - Model Building Process: the first step in the modeling process is the identification of the effects or variables that are potentially important or interesting. The next step is to fit simpler, more restrictive models until no simpler model can be found that does not substantially decrease the fit of the model to the data. The last step is to analyze the data and interpret the results. When an estimated choice parameter is positive, the probability that a tie is present is larger than the probability that it is absent.
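The logit form itself is missing from the transcript. A hedged sketch of what it usually looks like, assuming the notation of Wasserman and Pattison used later in this deck (x_ij^+ is the sociomatrix with the tie from i to j forced to 1, x_ij^- the same matrix with that tie forced to 0, and X_ij^c all ties other than the one from i to j):

```latex
\[
  \log\!\left[
    \frac{\Pr(X_{ij} = 1 \mid \mathbf{X}_{ij}^{c})}
         {\Pr(X_{ij} = 0 \mid \mathbf{X}_{ij}^{c})}
  \right]
  = \boldsymbol{\theta}'\,
    \bigl[\mathbf{z}(\mathbf{x}_{ij}^{+}) - \mathbf{z}(\mathbf{x}_{ij}^{-})\bigr]
\]
```

The normalizing constant cancels in the ratio, so only the change in each explanatory variable when the tie is switched on enters the model.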

5. Example: a Single Class - Model Building Process

5. Example: a single class - Model Building Process. For example, the odds that a tie is present given that two children are both girls or both boys are exp(-2.26-(-4.17))=exp(1.91)=6.75 times larger than the odds when one child is a boy and the other is a girl.

Part 6-7: p* model extension to multiple networks. Objective: to study differences among groups (the different classes in Parker and Asher’s data set), for example whether the tendencies toward mutuality are the same across networks/classes. “We need to fit all the classes in a single analysis so that we can place equality restrictions on parameters across models” (p. 55).
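The odds-ratio arithmetic on this slide can be checked in a few lines. This is only a sketch: the two coefficient values are taken from the slide, while the variable names and the reading of which value belongs to same-gender and which to mixed-gender pairs are assumptions inferred from the order of the subtraction.

```python
import math

# Values copied from the slide; the names are illustrative assumptions.
log_odds_same_gender = -2.26   # both girls or both boys
log_odds_mixed_gender = -4.17  # one boy and one girl

# A difference of log odds is a log odds ratio; exponentiate to get the ratio.
log_odds_ratio = log_odds_same_gender - log_odds_mixed_gender
odds_ratio = math.exp(log_odds_ratio)

print(f"log odds ratio = {log_odds_ratio:.2f}")
print(f"odds ratio     = {odds_ratio:.2f}")
```

Both log odds are negative, so ties are uncommon in either case; the ratio only says how much less uncommon they are within gender.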

6. Construct multiple networks p* model Step 1) Integrate multiple networks into one network:

6. Construct multiple networks p* model. Step 2) p* model for multiple networks. Note: the explanatory statistics are not measured at the individual actor level, but at the graph level. Statistics that are only meaningful at the individual actor level (non-homogeneous effects) are not considered here.

7. Model selection: the homogeneous effects listed in Table 4 that are likely to be important and interesting for modeling friendship relations are choice, mutuality, degree centralization, degree group prestige, transitivity, cyclicity, 2-in-stars, 2-out-stars, and 2-mixed-stars (p. 57).

7. Model selection

7. Model selection: to tease apart the effects resulting from age (or grade level) and classes requires the use of multiple classes from each grade level. Gender is used again as a blocking factor for the homogeneous effects. Loneliness and acceptance are measured on individuals and may predict friendship ties between children. These variables can be restricted across classes. The loneliness of both sending and receiving actors is included. Only the acceptance ratings of the receiver are included.

7. Model selection: a tally of the number of parameters for our initial model reveals that we have 108 parameters, 36 per class: 2 for choice, 3 for mutuality, 4 for degree centralization, 4 for degree group prestige, 4 for transitivity, 4 for cyclicity, 4 for 2-in-stars, 4 for 2-out-stars, 4 for 2-mixed-stars (the count implied by the per-class total of 36), 1 for acceptance ratings of the receiver, and 2 for loneliness of the sender and receiver.

7. Model selection - to simplify. Simplifying the model: (a) when no restrictions are placed on parameters across classes, the distributions for the individual classrooms are statistically independent; (b) so we first fit the 36-parameter model for each class separately, and place restrictions within classes; (c) in the second round we test whether restrictions between classes are needed and whether further restrictions are possible.

7. Model selection-to simplify

7. Estimate Model 4

8. Conclusion: 1) The p* model for a single social network and the multiple network extension introduced here overcome the severely limiting assumption of independence of dyads made by earlier statistical models for such data. 2) p* is easily fit to data using logistic regression. 3) The p* model can help researchers study the underlying process generating ties both within a single network and across multiple networks.

A Practical Guide To Fitting p* Social Network Models Via Logistic Regression. Bradley Crouch and Stanley Wasserman (1997). RPAD 777, Catherine Dumas.

Linear Regression Review One goal in regression analysis is to relate potentially “important” explanatory variables to the response variable of interest

Estimates of the β coefficients can be found such that the sum of the squared differences between the observed responses (Yi) and the responses predicted by the model (Ŷi) is at a minimum. The least squares estimates of the regression coefficients minimize this quantity.

Review Cont. Some information about the importance of each explanatory variable in a regression can be obtained by inspecting the sign and magnitude of the estimated regression coefficients. The model states that the response Yi changes by βj units when the j th explanatory variable increases by one unit while the remaining explanatory variables are held constant.

Assessing The Fit Of A Logistic Regression Model: R² is a natural measure of fit for linear regression models, as it is directly related to the least squares criterion used to obtain the “best” estimates of the regression parameters. Logistic regression coefficients are estimated by maximum likelihood, using an iteratively reweighted least squares computational procedure.

The “natural” measure of model fit is given by the maximized log likelihood of the model given the observed data, denoted by L. We can compare the fit of two logistic regression models by inspecting the likelihood ratio statistic, where L_F is the log likelihood of the full model and L_R is the log likelihood of the reduced model (obtained by setting q of the parameters in the full model to zero).
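The quantity being minimized was shown as an image and is not in the transcript. Using the slide's own symbols (observed responses Y_i, predicted responses Ŷ_i) and assuming n observations, the least squares criterion is:

```latex
\[
  \sum_{i=1}^{n} \bigl(Y_i - \hat{Y}_i\bigr)^{2}
\]
```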

When the full model “fits” and the number of observations is large, LR is distributed as a chi-squared random variable with q degrees of freedom
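The likelihood ratio statistic referred to on these two slides is not reproduced in the transcript. Writing L_F and L_R for the full-model and reduced-model log likelihoods, as defined above, a hedged reconstruction of the usual form is:

```latex
\[
  LR = -2\,\bigl(L_R - L_F\bigr) \;\overset{\text{approx.}}{\sim}\; \chi^{2}_{q}
\]
```

Because the reduced model can never fit better than the full model, L_R is at most L_F and LR is non-negative; large values count against the q restrictions.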

A Small Artificial Network Dataset: a fictitious network of 6 organizations of two types, Gov’t Research Organization (circles) and Private R&D lab (squares).

Simulation: suppose that the directional relation X = “provides programming support to” was measured on the 6 actors involved in a software collaboration. The reason for forming the collaboration was to provide equal access to programming efforts. A researcher may be interested in whether the organizations do provide programming support to others with equal frequency.

There may exist a tendency to provide programming assistance more frequently to those of their own type. One can describe the presence or absence of these tendencies in a number of ways. However, p* models provide a statistical framework to test hypotheses like that of “unequal access”.

Presence or Absence of Certain Network Structures If the density of ties within organizations of a certain type is greater than that outside of their type, this lends evidence to unequal access—one might call this the presence of positive differential “Choice Within Positions” p* models postulate that the probability of an observed graph is proportional to an exponential function of a linear combination of the network statistics

The Logit p* Representation The log linear form of p* can be reformulated as a logit model for the probability of each network tie, rather than the probability of the sociomatrix as a whole (Wasserman, Pattison) WP defines 3 new sociomatrices:

Statistical interpretation of logistic regression models depends on the assumption that the logits are independent of one another. In p*, the logits are clearly not independent. Measures such as the likelihood ratio statistic therefore do not carry a strict statistical interpretation, but are useful as a liberal guide for evaluating model goodness-of-fit.

Model Fitting With Logistic Regression

Summary SPSS output/Interpretation Summary with different parameters

Recall the ‘unequal access’ hypothesis from the description of the 6-actor network. It was conjectured that organizations may tend to support the programming efforts of other organizations of their own type more often than those of other types. Inspection of -2L suggests that Models 1-3 do not differ greatly with respect to overall fit, lending evidence against both the presence of differential choice within positions and a tendency for (or against) the transitivity of ties.

It appears that there is no strong evidence to conclude that these fictitious organizations tend to differentially support those in either network position. It is clear that programming support is often reciprocated.

Additional Topics: the MDS CRADA network (Cooperative Research and Development Agreements enable government and industry to negotiate patent rights and royalties before entering into joint research and development projects); preprocessing network data with PREPSTAR; some model comparisons using SAS.

References Crouch, B., & Wasserman, S. (1997). A Practical Guide to Fitting p* Social Network Models Via Logistic Regression.

Robins et al. (2007). An introduction to exponential random graph (p*) models for social networks. Social Networks 29, 173-191.

Introduction: this article provides an introductory summary of the formulation and application of exponential random graph models for social networks. The possible ties among nodes of a network are regarded as random variables, and assumptions about dependencies among these random tie variables determine the general form of the exponential random graph model for the network. Examples of different dependence assumptions and their associated models are given, including Bernoulli, dyad-independent and Markov random graph models. The incorporation of actor attributes in social selection models is also reviewed.

The logic behind p* model - assumption: the assumption is that the network is generated by a stochastic process in which relational ties come into being in ways that may be shaped by the presence or absence of other ties (and possibly node-level attributes). In other words, the network is conceptualized as a self-organizing system of relational ties.

The logic behind p* model Once we have defined a probability distribution on the set of all graphs with a fixed number of nodes, we can also draw graphs at random from the distribution according to their assigned probabilities, and we can compare the sampled graphs to the observed one on any other characteristic of interest. If the model is a good one for the data, then the sampled graphs will resemble the observed one in many different respects. In this ideal case, we might even hypothesize that the modeled structural effects could explain the emergence of the network. And we can examine the properties of the sampled graphs in order to understand the nature of networks that are likely to emerge from these effects.

The logic behind p* model - example: as an example, consider friendship in a school classroom. The observed network is the network for which we have measured friendship relations. There are many possible networks that could have been observed for that particular classroom. We examine the observed friendship structure in the classroom in the context of all possible network structures for the classroom. Some structures in the classroom may be quite likely and some very unlikely to happen, and the set of all possible structures with some assumption about their associated probabilities is a probability distribution of graphs. We are placing the observed network within this distribution.

p* model - A general framework for model construction
Step 1: each network tie is regarded as a random variable
Step 2: a dependence hypothesis is proposed, defining contingencies among the network variables
Step 3: the dependence hypothesis implies a particular form to the model
Step 4: simplification of parameters through homogeneity or other constraints
Step 5: estimate and interpret model parameters

The general form of the exponential random graph model: dependence assumptions and parameter constraints. The model describes a general probability distribution of graphs on n nodes. The dependence assumption is crucial in the model: the possible ties among nodes of a network are regarded as random variables, and assumptions about dependencies among these random tie variables determine the general form of the exponential random graph model for the network.

The general form of the exponential random graph model: dependence assumptions and parameter constraints. Constraints on parameters: there could be too many parameters in a model, hence we impose a homogeneity assumption by equating parameters when they refer to the same type of configuration. The resulting error is then absorbed into the model as statistical noise.

The general form of the exponential random graph model: dependence assumptions and parameter constraints - cont. For instance, in considering reciprocity, Paul may tend very strongly to reciprocate friendship offers from others, but Mary might be more cautious. For the purpose of constructing a simpler model, however, we may assume that there is a single tendency for reciprocity shared by both Mary and Paul. The resulting error is then absorbed into the model as statistical noise.

The general form of the exponential random graph model: dependence assumptions and parameter constraints - cont. Other kinds of constraints: we can equate parameters for isomorphic configurations involving similar types of actors. For example, we could propose one reciprocity parameter for girl–girl configurations, one for girl–boy configurations and another for boy–boy configurations. Different kinds of constraints result in different models.

Dependence assumptions and models Bernoulli graphs: the simplest dependence assumption Dyadic models: the dyadic independence assumption Markov random graphs Dependence structures with node-level variables More complex dependence assumptions New model specifications

Dependence assumptions and models Bernoulli graphs: the simplest dependence assumption Bernoulli random graph distributions are generated when we assume that edges are independent, for instance if they occur randomly according to a fixed probability α

Dependence assumptions and models Dyadic models: the dyadic independence assumption A somewhat more complicated (but not usually very realistic) assumption for directed networks is that dyads, rather than edges, are independent of one another. We have two configurations in the model: single edges and reciprocated edges

Dependence assumptions and models Dyadic models: the dyadic independence assumption

Dependence assumptions and models Markov random graphs Frank and Strauss (1986) introduced Markov dependence, in which a possible tie from i to j is assumed to be contingent on any other possible tie involving i or j, even if the status of all other ties in the network is known. In this case, the two ties are said to be conditionally dependent, given the values of all other ties. For instance, the relationship between Peter and Mary may well be dependent on the presence or absence of a relationship between Mary and John (especially if the relationship is a romantic one!)

Dependence assumptions and models Markov random graphs

Dependence assumptions and models Dependence structures with node-level variables Including node-level effects into the model We assume a vector X of binary attribute variables with Xi = 1 if actor i has the attribute and Xi = 0, otherwise. The vector x is then the set of observations on X. It is possible to generalize to polytomous and continuous attribute measures but we will restrict the current discussion to binary attributes.

Dependence assumptions and models Dependence structures with node-level variables - For example: We can investigate a similarity or homophily hypothesis as a basis for social selection – that social ties tend to develop between actors with the same attributes – by looking at the distribution of ties given the distribution of attributes.

Dependence assumptions and models: More complex dependence assumptions. Elaborations of exponential random graph models that go beyond Markov random graphs have been developed. One elaboration is dependencies within social settings. Possible examples: settings based on a spatiotemporal context, such as a group of people gathered together at the same time and place; settings based on a more abstract sociocultural space, such as pairs of persons linked by their political commitments; settings that reflect external “design” constraints, such as organizational structure.

Dependence assumptions and models: More complex dependence assumptions. Another elaboration is non-Markov dependencies among ties that do not share an actor but might be interdependent through third-party links. For instance, Yij may be conditionally dependent on Yrs for four distinct actors if there is an observed tie between either i or j and either r or s. These realization-dependent models can be developed through what Pattison and Robins (2002) described as partial dependence structures. These models also permit the introduction of triangles involving attribute effects.

Dependence assumptions and models: New model specifications. Based on realization-dependence structures, Snijders et al. (in press) developed new specifications for ERGMs that include new higher order terms. These models introduce constraints on k-star parameters, as well as new higher-order k-triangle configurations, which allow for the measurement of highly clustered regions of the network where two individuals may be connected to a large number k of others (a k-triangle).

Estimation: Pseudo-likelihood estimation, an approximate technique. Each possible tie Yij becomes a case in a standard logistic regression procedure, with yij predicted from the set of change statistics.
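To make "each possible tie becomes a case" concrete, the following is a minimal sketch of how the cases could be assembled for a small directed network. It assumes a model with only two statistics (number of ties and number of mutual dyads); the function names and the 3-actor sociomatrix are hypothetical and are not taken from the slides or from any package.

```python
from itertools import permutations

def count_stats(x):
    """Return (number of ties, number of mutual dyads) for a 0/1 sociomatrix."""
    g = len(x)
    ties = sum(x[i][j] for i in range(g) for j in range(g) if i != j)
    mutual = sum(x[i][j] * x[j][i] for i in range(g) for j in range(i + 1, g))
    return ties, mutual

def change_statistics(x):
    """One case per ordered pair (i, j): the observed tie and z(x+) - z(x-)."""
    rows = []
    for i, j in permutations(range(len(x)), 2):
        observed = x[i][j]
        x[i][j] = 1                      # force the tie on
        with_tie = count_stats(x)
        x[i][j] = 0                      # force the tie off
        without_tie = count_stats(x)
        x[i][j] = observed               # restore the observed value
        delta = tuple(a - b for a, b in zip(with_tie, without_tie))
        rows.append((i, j, observed, delta))
    return rows

# Hypothetical 3-actor network: 0 and 1 choose each other, 1 also chooses 2.
x = [[0, 1, 0],
     [1, 0, 1],
     [0, 0, 0]]
rows = change_statistics(x)
for i, j, y, delta in rows:
    print(f"tie {i}->{j}: y={y}, change in (ties, mutual)={delta}")
```

Each row's observed tie would be the response and its change statistics the predictors in an ordinary logistic regression routine; the fit is a pseudo-likelihood fit because the rows are not independent.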

Estimation cont. This procedure looks like a logistic regression – or indeed, a loglinear model – but it is not. Logistic regression assumes independent observations, an assumption we explicitly do not make with Markov and higher order models. So the parameter estimates may be biased; and the standard errors are approximate at best, and may be too small.

Estimation cont. One should not rely on the Wald statistic as a means to decide whether a parameter is significant or not. Likewise, one cannot assume that the pseudo-likelihood deviance is asymptotically distributed as chi-squared (which would be the case in ordinary logistic regression). When the dependence among observations is not so strong, it is generally the case that PL estimates will be more accurate.

Estimation cont. Markov chain Monte Carlo maximum likelihood estimation (MCMCMLE) Simulation is at the heart of Monte Carlo maximum likelihood estimation

Estimation cont. Markov chain Monte Carlo maximum likelihood estimation (MCMCMLE): a distribution of random graphs is simulated from a starting set of parameter values, and the parameter values are then refined by comparing the distribution of graphs against the observed graph, with this process repeated until the parameter estimates stabilize.

Estimation cont. Near degeneracy, a problem: it occurs when a model implies that only a few graphs have other than very low probability (often these are the full graph or the empty graph). If a model implies only these rather uninteresting outcomes, it will not be useful for modeling real networks.

Estimation cont. Near degeneracy, a problem: including at least a non-zero three-star parameter could decrease near degeneracy but is not sufficient… The primary problem in these cases is that the model is not well specified. Reference and citation: Handcock, M., Hunter, D., Butts, C., Goodreau, S., Morris, M., 2006. Statnet: An R Package for the Statistical Analysis and Simulation of Social Networks. Manual. University of Washington, http://www.csde.washington.edu/statnet.
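The simulation half of this procedure can be sketched with a simple Metropolis sampler that proposes toggling one tie at a time. This is an illustrative assumption about how such a simulation might be written for an undirected graph with edge and triangle terms; it is not the algorithm used by statnet or any other package, the parameter refinement step is omitted, and all names are hypothetical.

```python
import math
import random

def simulate_graph(n, theta_edge, theta_triangle, steps, seed=0):
    """Metropolis sampler for an undirected graph with edge and triangle terms."""
    rng = random.Random(seed)
    adj = [[0] * n for _ in range(n)]    # start from the empty graph
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)   # propose toggling the tie between i and j
        # Change statistics for adding the tie: one edge, plus one triangle
        # for every neighbour that i and j already share.
        shared = sum(1 for h in range(n) if adj[i][h] and adj[j][h])
        log_odds_add = theta_edge + theta_triangle * shared
        log_ratio = -log_odds_add if adj[i][j] else log_odds_add
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            adj[i][j] = adj[j][i] = 1 - adj[i][j]
    return adj

def edge_count(adj):
    n = len(adj)
    return sum(adj[i][j] for i in range(n) for j in range(i + 1, n))

# Extreme edge parameters illustrate how parameter values drive the sampled graphs.
sparse = simulate_graph(n=5, theta_edge=-50.0, theta_triangle=0.0, steps=2000)
dense = simulate_graph(n=5, theta_edge=50.0, theta_triangle=0.0, steps=2000)
print(edge_count(sparse), edge_count(dense))
```

In a full MCMCMLE routine, statistics of graphs sampled this way would be compared with those of the observed graph and the parameters adjusted until the two agree. The two extreme runs also show the empty and full graphs that the near-degeneracy discussion warns about.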

A short example: a Markov random graph model for Medici business network Dataset used: We fit a Markov random graph model for the well-known non-directed network of business connections among 16 Florentine families, available in UCINET 5 (Borgatti et al., 1999)

A short example: a Markov random graph model for Medici business network Parameters included: The model includes edge, two-star, three-star and triangle parameters

A short example: a Markov random graph model for Medici business network Estimation: MCMCMLE parameter estimation This model is not degenerate for this data set and parameter estimates successfully converge.

A short example: a Markov random graph model for Medici business network Result:

A short example: a Markov random graph model for Medici business network. Result: the density and triangle parameters are substantial in magnitude in comparison with their standard errors. Interpretation is therefore relatively simple. The negative density parameter indicates that edges occur relatively rarely, especially if they are not part of higher order structures such as stars and triangles. The positive triangle parameter can be interpreted as providing evidence that the business ties tend to occur in triangular structures and hence to cluster into clique-like forms. The star effects are not significant, so perhaps do not merit interpretation. But the parameter values suggest that there is a tendency for multiple network partners (the positive two-star estimate) but with a ceiling on this tendency (the negative three-star parameter). So, while there is a tendency for network actors to have multiple partners, there are few actors with very many partners.

Conclusion: recent work on the Markov random graph models of Frank and Strauss (1986) shows that they may be inadequate for many observed networks. In reviewing developments in these models to this point, we have deliberately made no more than very summary comments on improved model specification. The new specifications of Snijders et al. (in press) offer substantial improvement in the practical use of exponential random graph models. Reference and citation: Snijders, T.A.B., Pattison, P.E., Robins, G.L., Handcock, M. New specifications for exponential random graph models. Sociological Methodology, 2006.

Introduction to Stochastic Actor-based Models for Network Dynamics Snijders et al (2010) Minyoung Ku Doctoral Student Rockefeller College of Public Affairs and Policy University at Albany, State University of New York

Overview Introduction Model Assumptions Issues Arising in Statistical Modeling More Complicated Models Dynamics of Networks and Behavior Cross-sectional and Longitudinal Modeling

Introduction Tutorial introduction to “Stochastic Actor-based Models” for network dynamics - represent network dynamics on the basis of longitudinal data - evaluate network dynamics according to the paradigm of statistical inference

Introduction DV: Creation or Termination of different network ties IV: Relational Tendencies e.g. reciprocity, transitivity, homophily, and assortative matching Actor-based models allow to test hypotheses about these tendencies and to estimate parameters expressing their strengths, while controlling for other tendencies

Model Assumptions Network ties are not brief events, but can be regarded as states with a tendency to endure over time. - Any changes in a network can be interpreted the outcomes of a Markov process; the current state of a network determines probabilistically its future evolution. - All relevant information is assumed to be included in the current state.

Model Assumptions – Basic Assumptions Given that i: ego and j: alter 1. Continuous time: the network changes in continuous time but is observed only at some discrete points 2. Markov process: the total network structure is the social context that influences the probabilities of its own change. 3. Actor-based: the actors control their outgoing ties, on the basis of their and others’ attributes, their position in the network, and their perceptions about the rest of the network. 4. Ties change one at a time: changes are not coordinated but occur sequentially

Model Assumptions – Basic Assumptions 5. Tie creation rate may depend on the network position of the actors and actor covariates 6. Tie termination may depend on the network position, as well as actor covariates

Model Assumptions – Change Determination Model The probabilities for a choice (tie creation, tie termination, or no change) depend on the objective function. The objective function must represent the research questions and relevant theoretical and field-related knowledge.

Model Assumptions – Specification for the Objective Function “Rule of network behavior”: it determines the probabilities of change in the network. A linear combination of a set of components, the so-called “effects”: $f_i(\beta, x) = \sum_k \beta_k s_{ki}(x)$, where the $s_{ki}(x)$ are the effects (‘tendencies’) and the weights $\beta_k$ are the statistical parameters. This function represents aspects of the network as ‘viewed’ from the point of view of actor i.
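A minimal sketch of how such an objective function can drive a choice: the objective is a weighted sum of effects evaluated for actor i, and each option open to i (toggle one outgoing tie, or change nothing) is chosen with probability proportional to the exponential of the objective in the resulting network. The multinomial-logit form, the two effects used (outdegree and reciprocity), the toy network and the weights are assumptions made for this example.

```python
from math import exp

def outdegree(x, i):
    """Number of outgoing ties of actor i."""
    return sum(x[i][j] for j in range(len(x)) if j != i)

def reciprocity(x, i):
    """Number of reciprocated ties of actor i."""
    return sum(1 for j in range(len(x)) if j != i and x[i][j] and x[j][i])

def objective(x, i, beta, effects):
    """Linear combination of effects, as seen from actor i."""
    return sum(b * s(x, i) for b, s in zip(beta, effects))

def toggle(x, i, j):
    """Copy of x with the tie i -> j switched on or off."""
    y = [row[:] for row in x]
    y[i][j] = 1 - y[i][j]
    return y

def choice_probabilities(x, i, beta, effects):
    """Map each option to its probability; key j == i means 'no change'."""
    options = {j: (x if j == i else toggle(x, i, j)) for j in range(len(x))}
    scores = {j: exp(objective(y, i, beta, effects)) for j, y in options.items()}
    total = sum(scores.values())
    return {j: s / total for j, s in scores.items()}

# Hypothetical 3-actor network: 0 -> 1 and 2 -> 0.
x = [[0, 1, 0],
     [0, 0, 0],
     [1, 0, 0]]
probs = choice_probabilities(x, 0, beta=(-1.0, 2.0), effects=(outdegree, reciprocity))
```

Here creating the tie 0 -> 2 reciprocates an existing tie, so under these weights it is more attractive to actor 0 than leaving the network unchanged.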

Model Assumptions – Basic Effects Two types of effects: - Structural or endogenous effects: effects depending only on the network - Covariate or exogenous effects: effects depending only on externally given attributes Basic effects: - The outdegree of actor i – the basic tendency to have ties - The tendency toward reciprocity: the number of reciprocated ties of actor i.

Model Assumptions – Transitivity and Other Triadic Effects Tendency toward transitivity (transitive closure, clustering): friends of friends become friends Graph-theoretic terminology: i → j → h is closed by the tie i → h. Transitive ties effect: the number of other actors h for which there is at least one intermediary j forming a transitive triplet The three-cycle effect: the number of three-cycles that actor i is involved in.  
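The triadic effects above can be sketched as counts on a directed adjacency matrix, where `x[i][j] == 1` means i → j. The function names and the toy network are illustrative assumptions, not the notation of any particular software package.

```python
def transitive_triplets(x, i):
    """Number of ordered pairs (j, h) with i -> j, j -> h and i -> h."""
    n = len(x)
    return sum(x[i][j] * x[j][h] * x[i][h]
               for j in range(n) for h in range(n)
               if len({i, j, h}) == 3)

def transitive_ties(x, i):
    """Number of actors h with i -> h and at least one intermediary j (i -> j -> h)."""
    n = len(x)
    return sum(1 for h in range(n)
               if h != i and x[i][h]
               and any(x[i][j] and x[j][h] for j in range(n) if j not in (i, h)))

def three_cycles(x, i):
    """Number of cycles i -> j -> h -> i that actor i is involved in."""
    n = len(x)
    return sum(x[i][j] * x[j][h] * x[h][i]
               for j in range(n) for h in range(n)
               if len({i, j, h}) == 3)

# Hypothetical 4-actor network.
x = [[0, 1, 1, 1],
     [0, 0, 1, 1],
     [1, 0, 0, 1],
     [0, 0, 0, 0]]
```

In this toy network actor 0 has three transitive triplets but only two transitive ties, because the tie 0 → 3 is closed through two different intermediaries and the transitive ties effect counts it once.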

Model Assumptions – Degree-related Effects In- and outdegree are primary characteristics of nodal position and important driving factors in the network dynamics. Degree-related popularity: attractiveness - measured by the sum of the indegrees of the targets of i’s outgoing ties Degree-related activity: propensity to form ties to others - measured by the indegree (or outdegree) of i times i’s outdegree
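A small sketch of the degree-related measures as described on this slide, using raw degrees on a directed adjacency matrix. The function names and the toy network are assumptions for illustration; software implementations may use transformed versions of the degrees, which is not shown here.

```python
def indegree(x, j):
    """Number of incoming ties of actor j."""
    return sum(x[h][j] for h in range(len(x)) if h != j)

def outdegree(x, i):
    """Number of outgoing ties of actor i."""
    return sum(x[i][j] for j in range(len(x)) if j != i)

def indegree_popularity(x, i):
    """Sum of the indegrees of the actors that i has ties to."""
    return sum(indegree(x, j) for j in range(len(x)) if j != i and x[i][j])

def indegree_activity(x, i):
    """Indegree of i times outdegree of i."""
    return indegree(x, i) * outdegree(x, i)

def outdegree_activity(x, i):
    """Outdegree of i times outdegree of i."""
    return outdegree(x, i) ** 2

# Hypothetical 4-actor network.
x = [[0, 1, 1, 1],
     [0, 0, 1, 1],
     [1, 0, 0, 1],
     [0, 0, 0, 0]]
```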

Model Assumptions – Degree-related Effects The degree-related effects represent global hierarchy while the triadic effects represent local hierarchy. Assortativity-related effects: actors might have preferences for other actors based on their own and the others’ degree.

Model Assumptions – Covariates: Exogenous Effects Basic Covariates - The ego effect: whether actors with higher V values tend to nominate more friends and hence have a higher outdegree - The alter effect: whether actors with higher V values tend to be nominated by more others and hence have a higher indegree - The similarity effect: whether ties tend to occur more often between actors with similar values on V (tendency to homophily)

Model Assumptions – Covariates: Exogenous Effects Others - The ego-alter interaction effect: actors with higher V values have a greater preference for other actors who likewise have higher V values - The same V effect: the tendency to have ties between actors with exactly the same value of V A dyadic covariate effect expresses the extent to which a tie between two actors is more likely when the dyadic covariate is larger.
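The covariate effects for an actor covariate V can be sketched as follows. This is a simplified version under stated assumptions: V is used uncentered, similarity is taken as 1 minus the absolute difference divided by the observed range of V, and the function names, network and covariate values are hypothetical.

```python
def ego_effect(x, v, i):
    """V of ego times ego's outdegree."""
    return v[i] * sum(x[i][j] for j in range(len(x)) if j != i)

def alter_effect(x, v, i):
    """Sum of V over the actors that i has ties to."""
    return sum(x[i][j] * v[j] for j in range(len(x)) if j != i)

def similarity_effect(x, v, i):
    """Sum over i's ties of 1 - |v_i - v_j| / range(V); assumes V is not constant."""
    v_range = max(v) - min(v)
    return sum(x[i][j] * (1 - abs(v[i] - v[j]) / v_range)
               for j in range(len(x)) if j != i)

def same_v_effect(x, v, i):
    """Number of i's ties to actors with exactly the same value of V."""
    return sum(1 for j in range(len(x)) if j != i and x[i][j] and v[i] == v[j])

def ego_alter_interaction(x, v, i):
    """V of ego times the sum of V over i's alters."""
    return v[i] * alter_effect(x, v, i)

# Hypothetical 4-actor network and covariate values.
x = [[0, 1, 1, 1],
     [0, 0, 1, 1],
     [1, 0, 0, 1],
     [0, 0, 0, 0]]
v = [2, 1, 3, 2]
```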

Model Assumptions – Interactions The ego-alter interaction effect: actors with higher V values have a greater preference for other actors who likewise have higher V values Interaction of a covariate with reciprocity e.g. A friendship network between exchange students; negative interaction between reciprocity and having the same nationality → Theory-driven Effective Selection

Issues Arising in Statistical Modeling How to specify the model How to interpret the results

Issues Arising in Statistical Modeling – Data Requirements The number of observation moments (panel waves) – at least 2, usually less than 10 The number of actors – larger than 20, but if the data contain many waves, a smaller number of actors could be acceptable. The total number of changes between consecutive observations – large enough, but not too high
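A quick way to inspect the amount of change between two consecutive waves is to count the tie variables that differ. The sketch below also computes the Jaccard index (ties present in both waves divided by ties present in at least one), a common summary of stability; the function names and the two toy waves are assumptions for illustration, and no threshold is implied.

```python
def tie_changes(wave_a, wave_b):
    """Number of directed tie variables that differ between two waves."""
    n = len(wave_a)
    return sum(1 for i in range(n) for j in range(n)
               if i != j and wave_a[i][j] != wave_b[i][j])

def jaccard_index(wave_a, wave_b):
    """Ties present in both waves divided by ties present in at least one."""
    n = len(wave_a)
    both = either = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            a, b = wave_a[i][j], wave_b[i][j]
            both += 1 if (a and b) else 0
            either += 1 if (a or b) else 0
    return both / either if either else float("nan")

# Hypothetical waves: tie 0 -> 2 is created and tie 1 -> 0 is dropped.
wave_1 = [[0, 1, 0],
          [1, 0, 0],
          [0, 1, 0]]
wave_2 = [[0, 1, 1],
          [0, 0, 0],
          [0, 1, 0]]
```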

Issues Arising in Statistical Modeling – Testing and Model Selection Parameters are tested with a t-test: the t-ratio (estimate divided by its standard error) is referred to a standard normal distribution. For actor-based models for network dynamics, information-theoretic model selection criteria have not yet generally been developed. The current best way is to use ad hoc stepwise procedures, combining forward steps with backward steps by adding effects to or deleting effects from the model.
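The t-ratio test can be sketched in a few lines: divide the estimate by its standard error and refer the result to the standard normal distribution. The helper names are illustrative, and any numbers passed in are the user's own estimates.

```python
from math import erfc, sqrt

def t_ratio(estimate, standard_error):
    """Parameter estimate divided by its standard error."""
    return estimate / standard_error

def two_sided_p_value(t):
    """Two-sided p-value of t under a standard normal reference distribution."""
    return erfc(abs(t) / sqrt(2))
```

For example, an estimate that is twice its standard error gives a t-ratio of 2 and a two-sided p-value of roughly 0.046.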

Example: Friendship Dynamics 26 students (17 girls and 9 boys, aged 11-13) 4 time periods between September 2003 and June 2004 Students nominated up to 12 classmates whom they considered good friends Friendship formation tends to be reciprocal, shows tendencies toward network closure, and in this age group is strongly segregated by gender.

Example: Friendship Dynamics

Example: Friendship Dynamics The parameters in the objective function can be interpreted in two ways: - attractiveness - the chance of becoming friends when actor i has the opportunity to make a change in her or his outgoing ties.

More Complicated Models – Different Rates of Change Depending on actor attributes or on positional characteristics, actors might change their ties at different frequencies.
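One way to picture actor-specific rates of change, as a sketch under assumptions not stated on the slide: each actor i gets a rate that depends on a covariate through an exponential link, rate_i = rho * exp(alpha * v_i), and actors wait independent exponentially distributed times for their next opportunity. The actor who moves next is then i with probability rate_i divided by the sum of all rates. The names `rho`, `alpha` and the covariate values are hypothetical.

```python
from math import exp, log

def change_rates(v, rho, alpha):
    """Actor-specific rates rho * exp(alpha * v_i) (assumed exponential link)."""
    return [rho * exp(alpha * v_i) for v_i in v]

def next_actor_probabilities(rates):
    """Probability that each actor is the next one to get an opportunity for change."""
    total = sum(rates)
    return [r / total for r in rates]

def expected_waiting_time(rates):
    """Expected time until the next opportunity for change by any actor."""
    return 1.0 / sum(rates)

# Hypothetical covariate: the third actor changes ties twice as often as the others.
rates = change_rates([0, 0, 1], rho=1.0, alpha=log(2))
```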

More Complicated Models – Differences b/t Creating and Terminating Ties Creating ties: the evaluation function Terminating ties: the endowment function

More Complicated Models – Differences b/t Creating and Terminating Ties

Dynamics of Networks and Behavior Social networks are relevant for behavior and other actor-level outcomes. Behavior refers to “endogenously” changing actor variables; these variables are assumed to be ordinal and discrete with values 1, 2, etc., e.g. several levels of smoking. The dependence of the network dynamics on the total network-behavior configuration: the social selection process

Dynamics of Networks and Behavior The dependence of the behavior dynamics on the total network-behavior configuration: the social influence process. Both of them also can be modeled by an extension of the actor-based model to a structure where the dependent variables consist not only of the tie variables but also of the actors’ behavior variables. Assumptions for these extensions are the extensions of the assumptions for network dynamics.  

Dynamics of Networks and Behavior – the Objective Function $f_i^Z(\beta, x, z) = \sum_k \beta_k^Z s_{ki}^Z(x, z)$ The $s_{ki}^Z(x, z)$ are functions depending on the behavior of the focal actor i, but also on the behavior of his network partners, network position, etc. $\beta_k^Z$ is the strength of the effects of these functions on behavior choices. The superscript Z is used to distinguish the effects and parameters for behavior change from those for network change.

Dynamics of Networks and Behavior – Basic Shape Effect Basic tendencies determining behavior change that are independent of actor attributes and network position.  

Dynamics of Networks and Behavior – Influence and Position-dependent Effects Ego’s behavior is affected by alters’ behavior (social influence). To capture social network effects, additional terms in the behavior evaluation function are needed. - The average similarity effect: the preference of actors to be similar in behavior to their alters, with the total influence of the alters the same regardless of the number of alters - The total similarity effect: the preference of actors to be similar in behavior to their alters, with the total influence of the alters proportional to the number of alters - The average alter effect: actors whose alters have a higher average value of the behavior also have themselves a stronger tendency toward high values on the behavior.
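The influence effects can be sketched as follows, together with a behavior choice in which actor i moves one step down, stays, or moves one step up with probability proportional to the exponential of a weighted effect. This is a simplified illustration under assumptions: behavior is uncentered, similarity is 1 minus the absolute difference divided by the behavior range, only the average similarity effect enters the choice, and the names, network and values are hypothetical.

```python
from math import exp

def total_similarity(x, z, i, z_range):
    """Sum of behavior similarities between i and each of i's alters."""
    return sum(1 - abs(z[i] - z[j]) / z_range
               for j in range(len(x)) if j != i and x[i][j])

def average_similarity(x, z, i, z_range):
    """Mean behavior similarity between i and i's alters (0 if i has no alters)."""
    n_alters = sum(1 for j in range(len(x)) if j != i and x[i][j])
    return total_similarity(x, z, i, z_range) / n_alters if n_alters else 0.0

def average_alter(x, z, i):
    """Ego's behavior times the mean behavior of ego's alters."""
    alters = [z[j] for j in range(len(x)) if j != i and x[i][j]]
    return z[i] * sum(alters) / len(alters) if alters else 0.0

def behavior_step_probabilities(x, z, i, beta, z_min, z_max):
    """Probabilities of a step of -1, 0 or +1, weighting average similarity by beta."""
    z_range = z_max - z_min  # assumes z_max > z_min
    scores = {}
    for step in (-1, 0, 1):
        new_value = z[i] + step
        if not z_min <= new_value <= z_max:
            continue  # steps outside the behavior scale are not available
        z_new = z[:i] + [new_value] + z[i + 1:]
        scores[step] = exp(beta * average_similarity(x, z_new, i, z_range))
    total = sum(scores.values())
    return {step: s / total for step, s in scores.items()}

# Hypothetical data: actor 0 (behavior 1) names actors 1 and 2 (both behavior 3).
x = [[0, 1, 1],
     [0, 0, 0],
     [0, 0, 0]]
z = [1, 3, 3]
steps = behavior_step_probabilities(x, z, 0, beta=2.0, z_min=1, z_max=3)
```

With a positive weight, a step that brings ego's behavior closer to that of the alters is more likely than staying put, which is the social influence reading of these effects.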

Dynamics of Networks and Behavior – Specification of Model Natural advantage of the network part of the model over the behavior part - The number of measurements: n behavior variables vs n(n-1) tie variables - It is more difficult to estimate a model of network-behavior co-evolution than to estimate a model of only network evolution Model specification determines the power of detecting selection and influence through the model.

Example: Dynamics of Friendship and Delinquency Network influence processes playing a role in the spread of delinquency through the group (the classroom) - Social influence with respect to delinquency  - Delinquency similarity for the network dynamics - Average similarity for the behavior dynamics  

Example: Dynamics of Friendship and Delinquency

Cross-sectional and Longitudinal Modeling For cross-sectional modeling, the ERGM can be best understood as a model of a process in equilibrium. The advantage of longitudinal over cross-sectional modeling: the parameter estimates provide a model for the rules governing the dynamic change in the network, which often are better reflections of social rules and regularities than what can be derived from a single cross-sectional observation.