# The statistical analysis of personal network data Part I: Cross-sectional analysis Part II: Dynamic analysis.

A word about quantitative and qualitative approaches

Quantitative and qualitative approaches play complementary roles in personal network analysis:
- A qualitative pilot study can help to identify important predictors, and qualitative analyses can provide insights into sources of error and temporal instability.
- Quantitative analyses are crucial to determine the statistical effect of characteristics, since individuals do not know how, for example, their own constant characteristics influence their network.

In summary, the types of information collected with Egonet are:
- Information about the respondent (ego; e.g., age, sex, nationality)
- Information about the associates (alters) to whom ego is connected (e.g., age, sex, nationality)
- Information about the ego-alter pairs (e.g., closeness, frequency and/or means of contact, time of knowing each other, geographic distance, whether they discuss a certain topic, and type of relation, such as family, friend, neighbour or workmate)
- Information about the relations among alters as perceived by ego (simply whether they are related or not, or strong/weak/no relation)

The statistical analysis of personal versus sociocentric networks: what are the differences?

Whereas sociocentric network researchers often (though not always) concentrate on a single network, personal network researchers typically investigate a sample of networks. The dependency structure of sociocentric networks is complex, which leads to the need for specialized social network software. Personal network researchers, who often make little use of the data on alter-alter relations*, face a simpler dependency structure...

Personal network data have a “multilevel structure”. E.g., in a sample of 20 respondents, for each of whom we collected data on 45 alters, we have a total of 900 ego-alter dyads (alters nested within egos).
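A minimal sketch of how such data are commonly stored in "long" format: one row per ego-alter dyad, with the ego identifier marking the level-2 unit each alter belongs to. The field names (`ego_id`, `alter_id`) are hypothetical and are not Egonet's export format.

```python
from collections import Counter

N_EGOS = 20            # respondents (level 2)
ALTERS_PER_EGO = 45    # alters named by each respondent (level 1)

# One row per ego-alter dyad; ego_id identifies the network an alter is nested in.
# Field names are hypothetical, chosen only for illustration.
dyads = [
    {"ego_id": ego, "alter_id": f"{ego}-{alter}"}
    for ego in range(1, N_EGOS + 1)
    for alter in range(1, ALTERS_PER_EGO + 1)
]

alters_per_network = Counter(row["ego_id"] for row in dyads)
print(len(dyads))               # 900 dyads in total
print(alters_per_network[1])    # 45 alters nested within ego 1
```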

Three types of analysis have been used in past research:
- Type I: Aggregated analysis
- Type II: Disaggregated analysis (not okay, forget about it quickly!)
- Type III: Multilevel analysis

Type 1: Aggregated analysis

First, aggregate all information to the ego level:
- Compositional variables (aggregated characteristics of alters or ego-alter relations): e.g., percentage of women, average age of the alters, average time of knowing each other, average closeness
- Structural variables (aggregated characteristics of alter-alter relations): e.g., network size, density of the network, betweenness, number of isolates, cliques

Then use standard statistical procedures to, e.g.:
- Describe the network composition or structure, or compare them across populations
- Explain the networks (network as a dependent variable)
- Relate the networks to some variable of interest (network as an explanatory variable)

This is statistically correct, provided that you are aware of your level of analysis.
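As a rough sketch of the aggregation step, the function below turns one ego's alter list and alter-alter ties into ego-level variables: compositional ones (percentage of women, average closeness) and a structural one (density of the undirected alter-alter network, i.e., observed ties divided by n(n-1)/2). The data layout and key names are hypothetical.

```python
from typing import Dict, List, Tuple

def aggregate_network(alters: List[Dict], alter_ties: List[Tuple[str, str]]) -> Dict[str, float]:
    """Aggregate one personal network (ego excluded) to ego-level variables.

    alters: one dict per alter with hypothetical keys 'id', 'sex' ('F'/'M'), 'closeness'.
    alter_ties: undirected alter-alter ties as (id, id) pairs, as perceived by ego.
    """
    n = len(alters)
    if n == 0:
        raise ValueError("network has no alters")
    pct_women = 100.0 * sum(a["sex"] == "F" for a in alters) / n
    avg_closeness = sum(a["closeness"] for a in alters) / n
    # Count each undirected tie once, ignoring duplicates and self-loops.
    unique_ties = {frozenset(t) for t in alter_ties if t[0] != t[1]}
    density = len(unique_ties) / (n * (n - 1) / 2) if n > 1 else 0.0
    return {"size": float(n), "pct_women": pct_women,
            "avg_closeness": avg_closeness, "density": density}

alters = [
    {"id": "a", "sex": "F", "closeness": 1},
    {"id": "b", "sex": "M", "closeness": 2},
    {"id": "c", "sex": "F", "closeness": 3},
    {"id": "d", "sex": "M", "closeness": 4},
]
ties = [("a", "b"), ("b", "c"), ("c", "a")]
print(aggregate_network(alters, ties))
# {'size': 4.0, 'pct_women': 50.0, 'avg_closeness': 2.5, 'density': 0.5}
```

Applied to every respondent, this yields one row per ego, so ordinary regression or ANOVA can then be used because each network contributes a single observation.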

Example: an effect at the network level cannot be interpreted at the tie level

[Figure: three personal networks, with each alter labelled by sex (F/M) and tie strength]
- Network A: 20% female, average tie strength 1.2
- Network B: 50% female, average tie strength 1.4
- Network C: 80% female, average tie strength 1.6

At the tie level (all ties pooled): 50% female, 50% male; average tie strength of women 1.3, of men 1.5. So, across networks, a higher share of women goes with stronger ties on average, whereas at the tie level ties with women are weaker than ties with men.
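The reversal can be checked numerically. The sketch below reuses the 30 sex/strength labels that are legible on the slide, but their allocation to the three networks (10 alters each) is hypothetical: it was chosen only so that it reproduces the slide's summary figures, and it is not the original figure layout.

```python
from statistics import mean

# Hypothetical allocation of (sex, tie strength) pairs to three networks; it
# reproduces the slide's summaries but is NOT the original figure layout.
networks = {
    "A": [("F", 0.5), ("F", 0.5)] + [("M", s) for s in (0.5, 0.5, 0.5, 1.0, 1.0, 1.5, 2.0, 4.0)],
    "B": [("F", s) for s in (0.5, 1.0, 1.0, 1.0, 1.0)] + [("M", s) for s in (1.0, 1.5, 2.0, 2.0, 3.0)],
    "C": [("F", s) for s in (1.0, 1.5, 1.5, 1.5, 2.0, 2.0, 2.0, 2.0)] + [("M", s) for s in (1.0, 1.5)],
}

# Network level: one observation per network.
for name, ties in networks.items():
    share_female = sum(sex == "F" for sex, _ in ties) / len(ties)
    print(name, f"{share_female:.0%} female", f"mean strength {mean(s for _, s in ties):.1f}")

# Tie level: pool all ties and compare women with men.
pooled = [tie for ties in networks.values() for tie in ties]
for sex in ("F", "M"):
    print(sex, f"mean strength {mean(s for x, s in pooled if x == sex):.2f}")
```

The network-level lines show 20%/50%/80% female with mean strengths 1.2/1.4/1.6, while the pooled lines show about 1.27 for women and 1.53 for men: the sign of the association flips between levels.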

Type 2: Disaggregated analysis

Disaggregated analysis of dyadic relations (e.g., running a linear regression on the 900 alters) is statistically not correct, even though it has been done (e.g., Wellman et al., 1997; Suitor et al., 1997):
- Observations of alters are not statistically independent, as standard statistical procedures assume.
- Standard errors are therefore underestimated, and consequently significance is overestimated.
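One way to gauge how large the problem can be is Kish's design effect for a mean estimated from equal-sized clusters, deff = 1 + (m - 1)ρ, where m is the number of alters per ego and ρ the intraclass correlation. The ICC of 0.2 below is an assumed value, not an estimate from the migrant data; for regression coefficients the inflation also depends on how strongly the predictor itself clusters within networks.

```python
import math

def design_effect(cluster_size: int, icc: float) -> float:
    """Kish design effect for a mean with equal cluster sizes: 1 + (m - 1) * icc."""
    return 1 + (cluster_size - 1) * icc

n_dyads, alters_per_ego = 900, 45
icc = 0.2  # assumed intraclass correlation of, e.g., tie strength within networks

deff = design_effect(alters_per_ego, icc)
effective_n = n_dyads / deff   # information equivalent to this many independent alters
se_ratio = math.sqrt(deff)     # factor by which a naive standard error is too small
print(f"deff = {deff:.1f}, effective n = {effective_n:.0f}, naive SE too small by x{se_ratio:.2f}")
# deff = 9.8, effective n = 92, naive SE too small by x3.13
```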

Type 3: Multilevel analysis

Multilevel analysis of dyadic relations:
- Multilevel analysis is a generalization of linear regression in which the variance in outcome variables can be analyzed at multiple hierarchical levels. In our case, alters (level 1) are nested within egos/networks (level 2), so the variance is decomposed into variance between and within networks.
- Software: e.g., MLwiN, HLM, VarCL.
- Dependent variable: some characteristic of the dyadic relation (e.g., strength of tie), i.e., personal networks as dependent variables. Note: special multilevel models have been developed for discrete dependent variables.
- Explanatory variables can be (among others): characteristics of ego (level 2), characteristics of alters (level 1), and characteristics of the ego-alter pairs (level 1).
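The between/within decomposition can be illustrated without multilevel software. The sketch below uses the classical one-way random-effects ANOVA (moment) estimator for a balanced design, where every ego has the same number of alters, and reports the intraclass correlation. This is not the (restricted) maximum likelihood estimation used by packages such as MLwiN or HLM, and the data are made up.

```python
from statistics import mean
from typing import Dict, List

def variance_components(groups: Dict[str, List[float]]) -> Dict[str, float]:
    """Between- and within-network variance via one-way random-effects ANOVA.

    groups maps each ego to the outcome values of its alters; this simple
    estimator requires the same number of alters for every ego.
    """
    k = len(groups)
    n = len(next(iter(groups.values())))
    if k < 2 or n < 2 or any(len(v) != n for v in groups.values()):
        raise ValueError("need >= 2 egos with the same number (>= 2) of alters each")
    group_means = {ego: mean(v) for ego, v in groups.items()}
    grand = mean(group_means.values())  # equals the overall mean in a balanced design
    ms_between = n * sum((m - grand) ** 2 for m in group_means.values()) / (k - 1)
    ms_within = sum((x - group_means[ego]) ** 2
                    for ego, v in groups.items() for x in v) / (k * (n - 1))
    var_between = max(0.0, (ms_between - ms_within) / n)  # negative estimates set to 0
    var_within = ms_within
    total = var_between + var_within
    return {"between": var_between, "within": var_within,
            "icc": var_between / total if total > 0 else 0.0}

# Made-up tie strengths for two egos with three alters each.
tie_strength = {"ego1": [1.0, 2.0, 3.0], "ego2": [4.0, 5.0, 6.0]}
print(variance_components(tie_strength))
```

Here most of the variance lies between the two networks (ICC of about 0.81), which is exactly the situation in which a disaggregated analysis would be most misleading.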

For a good article on the possibilities of multilevel analysis of personal networks (including a brief comparison with aggregated and disaggregated analysis), see: Van Duijn, M. A. J., Van Busschbach, J. T., & Snijders, T. A. B. (1999). Multilevel analysis of personal networks as dependent variables. Social Networks, 21, 187-209.

In summary, cross-sectional analysis...

| Unit of analysis | Focus of analysis: content |
|---|---|
| A tie | Multilevel analysis |
| A personal network | Aggregated analysis |

The two types of analysis, even when focusing on the same variable, address different types of questions:
- Multilevel analysis: e.g., what predicts the strength of ties?
- Aggregated analysis: e.g., what predicts the average strength of ties in personal networks?

Illustration of Type I: Aggregated analysis. The case of migrants in Spain

We collected information on about 300 migrants from four countries of origin in Catalonia with Egonet (in 2004-2005). For each respondent, information was collected about:
- Ego (country of origin, years of residence in Spain, sex, age, marital status, level of education, etc.)
- Alters (country of origin, country of living, etc.)
- Ego-alter pairs (closeness, tie strength, type of relation, etc.)
- Relations among alters

Illustration: The case of migrants in Spain

Our research questions were:
- Can we distinguish different types of personal networks (profiles) among migrants?
- Can the type of personal network be predicted by a migrant's years of residence?
- If so, do years of residence still predict network profiles when controlling for other important background characteristics?

Method

For each personal network (excluding ego), we first calculated compositional and structural characteristics (aggregate level). Then, we used the following statistical procedures to analyse the 286 valid cases:
- K-means cluster analysis based on various network characteristics (see next slide), to identify homogeneous groups of networks (“network profiles”)
- ANOVA, to see whether the profiles differ in years of residence
- Multinomial logistic regression, to predict profile membership from years of residence, controlling for the background variables age, sex, country of origin and employment

K-means cluster analysis (SPSS), based on the following network variables (all standardized):
1. Proportion of alters whose country of origin is Spain
2. Proportion of fellow migrants
3. Density
4. Network betweenness centralization
5. Number of clusters (“subgroups”) within the network
6. Subgroup homogeneity regarding living in Spain
7. Average frequency of contact (7-point scale)
8. Average closeness (5-point scale)
9. Proportion of family in the network
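A compact sketch of the procedure: z-standardize each network variable so that no variable dominates the Euclidean distance, then alternate between assigning networks to the nearest centroid and recomputing centroids (Lloyd's algorithm). It starts from the first k networks as centroids, a naive choice made here for simplicity; SPSS and other packages choose starting centres differently, and solutions can depend on them. The two variables and six networks in the usage lines are invented.

```python
import math
from statistics import mean, pstdev
from typing import List, Tuple

Row = List[float]

def standardize(rows: List[Row]) -> List[Row]:
    """z-score each column; population vs. sample SD only rescales all columns equally."""
    cols = list(zip(*rows))
    means = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def kmeans(points: List[Row], k: int, max_iter: int = 100) -> Tuple[List[int], List[Row]]:
    """Lloyd's k-means with Euclidean distance; initial centroids = first k points."""
    centroids = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [mean(col) for col in zip(*members)]
    return labels, centroids

# Invented (density, % Spanish alters) for six personal networks.
networks = [[0.75, 10], [0.80, 5], [0.70, 8], [0.20, 60], [0.25, 55], [0.15, 65]]
labels, _ = kmeans(standardize(networks), k=2)
print(labels)  # [1, 1, 1, 0, 0, 0]
```

In practice one would rerun with several starting configurations and several values of k, then pick the most interpretable and reasonably balanced solution, as was done for the five profiles.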

Results of the cluster analysis

The five-cluster solution was the most interpretable and reasonably balanced. Cluster sizes:
- Profile 1, “the scarce network”: N = 54
- Profile 2, “the dense family network”: N = 28
- Profile 3, “the multiple subgroups network”: N = 73
- Profile 4, “the two worlds connected network”: N = 75
- Profile 5, “the embedded network”: N = 50

The characteristics that contributed most to the cluster partition are:
- density
- homogeneity of the subgroups regarding living in Spain
- percentage of Spanish alters in the network

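Description of profiles

| | Scarce | Dense family | Multiple subgroups | Two worlds connected | Embedded |
|---|---|---|---|---|---|
| % Spanish | 8 | 9 | 26 | 16 | 49 |
| % migrants | 17 | 20 | 48 | 35 | 29 |
| N subgroups | 2¼ | 1 | 3¼ | 1¼ | 1½ |
| Density | .28 | .76 | .16 | .36 | .30 |
| Betweenness | high | low | high | middle | high |
| Closeness | high | middle | low | high | middle |
| % family | 32 | 54 | 22 | 40 | 28 |

Also reported, but only for some of the profiles: homogeneity of subgroups (high, low, high) and average frequency of contact (1/3 weeks, 3/month, 2/month, 1/week).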

Profile 1: Scarce network. Color: country of origin (white = foreign, black = Spain); size: country of living (large = Spain, small = other country).

Profile 2: Dense family network. Color: country of origin (white = foreign, black = Spain); size: country of living (large = Spain, small = other country).

Profile 3: Multiple subgroups network. Color: country of origin (white = foreign, black = Spain); size: country of living (large = Spain, small = other country).

Profile 4: Two worlds connected. Color: country of origin (white = foreign, black = Spain); size: country of living (large = Spain, small = other country).

Profile 5: Embedded network. Color: country of origin (white = foreign, black = Spain); size: country of living (large = Spain, small = other country).

Is the partition related to years of residence? (ANOVA in SPSS)

Overall: F(4, 2.67) = 6.634, p < .001. Per profile: there are two homogeneous subsets that differ significantly in years of residence: profiles 1 and 2 versus profiles 3, 4 and 5.
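To show what the F statistic summarises, here is a minimal one-way ANOVA in plain Python: the between-profile mean square divided by the within-profile mean square, with df = (k - 1, N - k). With k = 5 profiles the first df is 4, as reported. The years-of-residence values below are invented; `scipy.stats.f_oneway` returns the same F together with its p-value.

```python
from statistics import mean
from typing import List, Sequence, Tuple

def one_way_anova(groups: Sequence[List[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand = mean(all_values)
    k, n_total = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Invented years of residence for two profiles, just to exercise the function.
years_by_profile = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
print(one_way_anova(years_by_profile))  # (13.5, 1, 4)
```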

Is the partition also related to years of residence when controlling for background characteristics? Multinomial logistic regression (SPSS)

Age and employment status did not have significant effects. Sex and country of origin, however, influenced profile membership significantly: e.g., Senegambians had a higher probability than others of having a “dense family network”. Even when controlling for these background characteristics, however, years of residence still predicts cluster membership.
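To make the multinomial model concrete: one profile serves as the reference category (its linear predictor fixed at 0), each other profile j gets η_j = b0_j + b1_j · years, and membership probabilities follow from the softmax of these predictors. The choice of "scarce" as reference and all coefficient values below are hypothetical placeholders, not the SPSS estimates, which are not reported here.

```python
import math
from typing import Dict, Tuple

def profile_probabilities(years: float, coefs: Dict[str, Tuple[float, float]],
                          reference: str = "scarce") -> Dict[str, float]:
    """Multinomial-logit membership probabilities for one migrant.

    coefs maps each non-reference profile to a hypothetical (intercept, slope) pair;
    the reference profile's linear predictor is fixed at 0.
    """
    etas = {profile: b0 + b1 * years for profile, (b0, b1) in coefs.items()}
    etas[reference] = 0.0
    top = max(etas.values())  # subtract the maximum for numerical stability
    weights = {profile: math.exp(eta - top) for profile, eta in etas.items()}
    total = sum(weights.values())
    return {profile: w / total for profile, w in weights.items()}

# Hypothetical (intercept, years-of-residence slope) per non-reference profile.
hypothetical_coefs = {
    "dense family": (0.5, -0.3),
    "multiple subgroups": (-0.5, 0.2),
    "two worlds connected": (-0.4, 0.25),
    "embedded": (-1.0, 0.3),
}
for years in (1, 5):
    probs = profile_probabilities(years, hypothetical_coefs)
    print(years, {p: round(v, 2) for p, v in probs.items()})
```

Only differences relative to the reference profile are identified, which is why one profile's coefficients are fixed at zero; control variables such as sex or country of origin would simply add further terms to each η_j.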

Conclusion of our illustration

The network profiles give valuable information about adaptation to a host country: the scarce network and the dense family network seem to be “transitional networks”, whereas the other three seem more settled.

But... in order to investigate whether the networks of migrants really follow a certain pattern of change (or multiple patterns depending on, for example, country of origin or entry situation), we need a longitudinal model.

... and what about the analysis of alter-alter relations?

Most researchers are only interested in alter-alter relations in order to say something about the structure of the respondents' personal networks:
- Use structural measures (density, betweenness, number of cliques, etc.) in an aggregated analysis
- Apply triad census analysis (Kalish & Robins, 2006)

If you are interested in predicting who is related to whom (among the alters):
- Specify an Exponential Random Graph Model (ERGM) for each network and then run a meta-analysis over the results (cf. Lubbers, 2003; Lubbers & Snijders, 2007)
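For undirected alter-alter data (tie present or absent), a triad census reduces to classifying every triple of alters by how many of its three possible ties are present. The sketch below is a minimal brute-force version of that count (O(n³), unproblematic for 45 alters); the typology used by Kalish & Robins (2006) may be more detailed, so treat this as an illustration of the idea only.

```python
from itertools import combinations
from typing import Dict, Hashable, Iterable, Tuple

def undirected_triad_census(nodes: Iterable[Hashable],
                            ties: Iterable[Tuple[Hashable, Hashable]]) -> Dict[int, int]:
    """Count alter triples by the number of ties among them (0, 1, 2 or 3)."""
    tie_set = {frozenset(t) for t in ties if t[0] != t[1]}
    census = {0: 0, 1: 0, 2: 0, 3: 0}
    for a, b, c in combinations(list(nodes), 3):
        n_ties = sum(frozenset(pair) in tie_set for pair in ((a, b), (b, c), (a, c)))
        census[n_ties] += 1
    return census

# Four alters: a closed triangle a-b-c plus an isolate d.
print(undirected_triad_census("abcd", [("a", "b"), ("b", "c"), ("c", "a")]))
# {0: 0, 1: 3, 2: 0, 3: 1}
```

Summaries such as the share of 3-tie triads among triads with at least 2 ties give a simple ego-level closure measure that can enter an aggregated analysis.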

ERGMs

ERGMs are available in, among others, the software StOCNET (which also contains SIENA).
- Dependent variable: whether alters are related or not
- Independent variables: characteristics of alters, the relation alters have with ego, the alter-alter pair, and endogenous network characteristics such as transitivity (in the meta-analysis, characteristics of ego can be added as well)
- Type of analysis: apply a common ERGM to each network (leaving ego out), then run a meta-analysis (cf. Lubbers, 2003; Snijders & Baerveldt, 2003; Lubbers & Snijders, 2007).
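The second step, pooling one parameter (say, transitivity) across many per-network ERGM fits, can be sketched with the standard DerSimonian-Laird random-effects estimator. It weights each network's estimate by the inverse of its sampling variance plus an estimated between-network variance τ². This generic meta-analysis technique is used here as a stand-in; the procedure in the cited papers may differ in detail, and the input values are invented.

```python
import math
from typing import List, Tuple

def random_effects_meta(estimates: List[float],
                        std_errors: List[float]) -> Tuple[float, float, float]:
    """DerSimonian-Laird random-effects pooling of per-network estimates.

    Returns (pooled estimate, its standard error, tau^2 = between-network variance).
    """
    if len(estimates) < 2 or len(estimates) != len(std_errors):
        raise ValueError("need at least two estimates, each with a standard error")
    variances = [se ** 2 for se in std_errors]
    w = [1 / v for v in variances]
    fixed = sum(wi * b for wi, b in zip(w, estimates)) / sum(w)
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, estimates))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c)
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * b for wi, b in zip(w_star, estimates)) / sum(w_star)
    return pooled, math.sqrt(1 / sum(w_star)), tau2

# Invented transitivity estimates (with SEs) from three personal networks.
print(random_effects_meta([0.8, 1.1, 0.5], [0.30, 0.25, 0.40]))
```

A meta-regression of the per-network estimates on ego characteristics would extend this in the direction the slide mentions.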

Part II. Dynamic analysis

How do personal networks change over time? To answer this question, data on personal networks are collected in two or more waves of a panel study.

Interest in dynamic analysis
- “Networks at one point in time are snapshots, the results of an untraceable history” (Snijders). E.g., personal communities in Toronto (Wellman et al.)
- Changes following a focal life event (individual level). E.g., transition from high school to university (Degenne & Lebeaux, 2005); childbearing, moving, and return to school in midlife (Suitor & Keeton, 1997); retirement (Van Tilburg, 1992); marriage (Kalmijn et al., 2003); divorce (Terhell, Broese Van Groenou, & Van Tilburg, 2007); widowhood (Morgan, Neal, & Carder, 2000); migration (Molina et al.)
- Broader studies of social change: social and cultural changes in countries with dramatic institutional changes. E.g., post-communism in Russia, compared with Finland (Lonkila, 1998), and in Eastern Germany (Völker & Flap, 1995)

Types of dynamic personal network research (networks as dependent variables), following Feld et al. (2007), Field Methods 19, 218-236:

| Level of analysis | Type of change: existence of ties | Type of change: nature of ties that exist |
|---|---|---|
| A tie | Type 1 | Type 2 |
| A personal network | Type 3 | Type 4 |

Types of dynamic personal network research, following Feld et al. (2007), Field Methods 19, 218-236:

| Level of analysis | Existence of ties | Nature of ties that exist |
|---|---|---|
| A tie | Type 1: which ties come and go | Type 2: how characteristics of ties change |
| A personal network | Type 3: expansion and contraction of networks | Type 4: change in the overall characteristics of the networks |

Illustration: The case of migrants in Spain
- Migrants in Catalonia (Barcelona, Vic, Girona).
- We collected information about the personal networks of about 300 migrants (in 2004-2005).
- A sample of 90 individuals was interviewed for the second wave (on average 1.5 to 2 years later).
- The questionnaire at t2 was identical to that at t1, but supplemented with questions about the changes, such as about alters who disappeared from the network.
- For the present illustration, we focus on Argentinean migrants only (a subset of the interviews, N = 22).

Type 1: Persistence of ties with alters across time
- Dependent variable: whether or not a tie persists to a subsequent time (dichotomous)
- Explanatory variables: characteristics of ego, alter, the ego-alter pair and the situation, especially in combination with the initial characteristics of the relationship
- Type of analysis: logistic multilevel analysis

Illustration of Type 1: The case of migrants in Spain

Cases: 900 alters nested within 20 respondents.
- Descriptive: how persistent are ties over time? 53% of these alters were nominated again in Wave 2 (N = 473), whereas 47% of the nominations were not repeated (N = 427).
- Explanatory: what predicts the persistence of ties over time? Logistic multilevel analysis (see Table 1).

Table 1. Regression coefficients and standard errors (in brackets) of the logistic multilevel regression model predicting persistence of ties (N = 900).

| Predictor | Model 1 | Model 2 |
|---|---|---|
| Constant | 0.315 (0.706) | -1.550 (0.591) |
| Characteristics of ego | | |
| Age | -0.005 (0.019) | |
| Sex (i.e., ego is a man) | -0.396 (0.238) | |
| Never married | -0.212 (0.283) | |
| Years of residence | 0.053 (0.098) | 0.208 (0.121) |
| Characteristics of alter / ego-alter pair | | |
| Frequency of contact | | 0.341 (0.053)* |
| Closeness | | 0.519 (0.082)* |
| Time alter and ego know each other | | 0.074 (0.035)* |
| Same sex | | 0.098 (0.156) |
| Alter is a family member | | 0.815 (0.229)* |
| Alter is Spanish | | 1.511 (0.619)* |
| Interaction Spanish × years of residence | | -0.406 (0.154)* |
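Because Model 2 contains the interaction Spanish × years of residence, the effect of an alter being Spanish is not a single number: on the log-odds scale it equals 1.511 - 0.406 × years. The short calculation below uses the Table 1 estimates but assumes that years of residence was entered uncentred and in years, which the table does not state; it also ignores the random intercept, so the odds ratios are conditional on ego.

```python
import math

B_SPANISH = 1.511        # Table 1, Model 2: alter is Spanish
B_INTERACTION = -0.406   # Table 1, Model 2: Spanish x years of residence

def spanish_effect(years: float) -> float:
    """Log-odds effect of a Spanish alter on tie persistence at a given residence length.

    Assumes years of residence is uncentred and measured in years (not stated in the table).
    """
    return B_SPANISH + B_INTERACTION * years

for years in (0, 2, 4, 6):
    effect = spanish_effect(years)
    print(f"{years} years: log-odds {effect:+.3f}, odds ratio {math.exp(effect):.2f}")

# Residence length at which the Spanish-alter advantage disappears (about 3.7 years).
print(f"crossover at about {-B_SPANISH / B_INTERACTION:.1f} years")
```

Under that assumption, ties with Spanish alters are more likely to persist for recent arrivals, and the advantage shrinks as residence lengthens.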

Additionally: differences between dissolved and new ties

Are the new ties qualitatively better than the broken ones? Alters newly nominated in Wave 2 were contacted somewhat more frequently (3.2 versus 2.8 on the frequency-of-contact scale; t = 5.32, df = 888, p < .001) and were somewhat closer (2.9 versus 2.4 on closeness; t = 3.70, df = 888, p < .001) than the alters who were not nominated again in Wave 2. Furthermore, new relations were somewhat more often family members (18%) than relations that were broken (12%; χ² = 6.03, df = 1, p < .05). Involution?
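The χ² comparison of family shares can be computed from a 2 × 2 table of counts (new vs. broken ties × family vs. non-family), with df = (rows - 1)(columns - 1) = 1. The sketch below computes the uncorrected Pearson statistic; the counts in the usage line are invented, since the slide reports only percentages.

```python
from typing import List

def pearson_chi_square(table: List[List[int]]) -> float:
    """Pearson chi-square for an r x c table of counts (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            chi2 += (observed - expected) ** 2 / expected
    return chi2

# Invented counts: rows = new / broken ties, columns = family / non-family.
print(pearson_chi_square([[10, 20], [20, 10]]))
```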

Type 2: Changes in characteristics of persistent ties across time
- Dependent variable: change in some characteristic of the relationship (e.g., change in strength of tie)
- Explanatory variables: characteristics of ego, alter, the ego-alter pair and the situation, especially in combination with the initial characteristics of the relationship
- Type of analysis: multilevel analysis

Illustration of Type 2: The case of migrants in Spain

Cases: 473 persistent ties.
- Descriptive: in these stable ties, there was a fair amount of change in frequency of contact (Mt1 = 3.50, Mt2 = 2.94; t = 8.231, df = 472, p < .05) and less change in closeness (Mt1 = 3.68, Mt2 = 3.87; t = -4.065, df = 472, p < .05).
- Explanatory: multilevel analysis (see Table 2).
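The descriptive comparison is a paired t-test over the persistent ties: for each tie take d = value at t1 - value at t2, then t = mean(d) / (sd(d) / √n) with n - 1 degrees of freedom (hence df = 472 for 473 ties). With this sign convention a decline, as for frequency of contact, gives a positive t. A minimal sketch with invented scores:

```python
import math
from statistics import mean, stdev
from typing import List, Tuple

def paired_t(t1: List[float], t2: List[float]) -> Tuple[float, int]:
    """Paired t statistic for d = t1 - t2, and its degrees of freedom."""
    if len(t1) != len(t2):
        raise ValueError("need one t1 and one t2 score per tie")
    d = [a - b for a, b in zip(t1, t2)]
    t = mean(d) / (stdev(d) / math.sqrt(len(d)))  # stdev uses the n - 1 denominator
    return t, len(d) - 1

# Invented frequency-of-contact scores for four persistent ties.
print(paired_t([4.0, 3.0, 4.0, 5.0], [3.0, 3.0, 3.0, 3.0]))
```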

Table 2. Regression coefficients and standard errors (in brackets) of the multilevel regression model predicting changes in frequency of contact and closeness in stable ties (N = 473).

| Predictor | Amount of change in frequency of contact | Amount of change in closeness |
|---|---|---|
| Constant | -.081 (.599) | 1.249 (.359) |
| Characteristics of ego | | |
| Age | .034 (.008)* | -.002 (.009) |
| Sex (i.e., ego is a man) | .306 (.109)* | .151 (.118) |
| Never married | .371 (.116)* | -.103 (.135) |
| Years of residence | -.144 (.041)* | -.084 (.047) |
| Characteristics of alter / ego-alter pair | | |
| Time alter and ego know each other | -.043 (.019)* | -.022 (.013) |
| Same sex | -.050 (.103) | -.118 (.071) |
| Alter is a family member | -.220 (.125) | -.103 (.087) |
| Alter is Spanish | -.004 (.139) | .063 (.099) |

Note: * p < .05

Type 3: Changes in the size of the network across time
- Dependent variable: change in the number of ties in the personal network
- Explanatory variables: characteristics of ego, of the set of alters, and of the situation, especially in combination with the initial characteristics of the network
- Type of analysis: regression analysis

Illustration of Type 3: The case of migrants in Spain

The size of the network was fixed at 45 alters in both waves, so this type of analysis cannot be illustrated with our data.

Type 4: Changes in overall network characteristics across time
- Dependent variable: change in a compositional or structural variable (e.g., percentage of alters with higher education, density of the network)
- Explanatory variables: characteristics of ego, of the set of alters, and of the situation, especially in combination with the initial characteristics of the network
- Type of analysis: regression analysis

Illustration of Type 4: The case of migrants in Spain

Cases: 22 respondents. The network stability of the 22 respondents was on average 53% (SD = 13.6), varying between 29% and 76% across respondents. How do the composition and structure of the networks (the stable and unstable parts together) change over time?
- Descriptive: overall, the network characteristics hardly changed over time (Table 3). The only characteristics that differed significantly between Waves 1 and 2 were average closeness and betweenness, both of which increased slightly over the years.
- Explanatory: these changes could not be predicted by ego characteristics (in a regression analysis at the ego level); the most important predictor of change was the value of the same variable at t1 (regression to the mean).
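Why would the t1 value predict change? If a network variable is measured with error or fluctuates, networks that scored high at t1 tend to score lower at t2 and vice versa, so regressing change (t2 - t1) on the t1 score gives a negative slope even when nothing systematic happens. The sketch below fits that ordinary least-squares slope on invented density values, constructed so that each network moves halfway back to the t1 mean.

```python
from statistics import mean
from typing import List

def ols_slope(x: List[float], y: List[float]) -> float:
    """Ordinary least-squares slope of y on x (model with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Invented densities at t1; at t2 each network moves halfway back to the t1 mean.
density_t1 = [0.10, 0.20, 0.30, 0.40]
center = mean(density_t1)
density_t2 = [center + 0.5 * (d - center) for d in density_t1]
change = [b - a for a, b in zip(density_t1, density_t2)]
print(round(ols_slope(density_t1, change), 3))  # -0.5
```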

Table 3. Means and standard deviations of the compositional variables of the personal networks at t1 and t2 (N = 22), correlations between the two waves, and t-tests of the differences between the two waves.

| Variable | Wave 1 M | Wave 1 SD | Wave 2 M | Wave 2 SD | r | t |
|---|---|---|---|---|---|---|
| Percentage of Spanish | 27.5 | 14.9 | 31.4 | 17.1 | .77* | -1.674 |
| Percentage living in Spain | 59.4 | 21.4 | 61.9 | 17.0 | .72* | -0.773 |
| Average closeness | 2.1 | 0.3 | 2.3 | 0.3 | .37 | -2.755* |
| Average frequency of contact | 2.9 | 0.8 | 3.0 | 0.7 | .59* | -0.519 |
| Percentage of family | 22.0 | 12.6 | 24.6 | 9.1 | .66* | -1.246 |
| Density | 0.19 | 0.11 | 0.17 | 0.06 | .59* | 0.851 |
| Betweenness | 22.8 | 12.3 | 32.2 | 15.7 | .06 | -2.278* |

Note: * p < .05

Conclusions from the illustration
- There is considerable instability in the personal relations of Argentinean immigrants in Catalonia, mostly in their peripheral relations.
- Relational characteristics predict the persistence of ties, whereas demographic characteristics of ego affect the changes within their persistent ties.
- These quantitative analyses suggest that substantial changes in the number of active contacts and/or in ties (30-70%) are compatible with overall stability in network composition.

Further analyses

We will investigate (based on all 90 respondents) whether persons with different network profiles at t1 have different patterns of change in their networks, indicating different ways of assimilation to Spain.

So what about the dynamics of alter-alter relations?... Let’s propose a type 5?

Type 5: Changes in ties among alters across time
- Dependent variable: whether alters make new ties or break existing ties with other alters across time
- Independent variables: characteristics of alters, the relation alters have with ego, the alter-alter pair, and endogenous network characteristics such as transitivity (in the meta-analysis, characteristics of ego can be added as well)
- Type of analysis: apply a common SIENA model to each network (leaving ego out), then run a meta-analysis (cf. Lubbers, 2003; Snijders & Baerveldt, 2003; Lubbers & Snijders, 2007). A multilevel version of SIENA is on the agenda.

SIENA makes assumptions that seem to be violated in personal networks

It is assumed that people act strategically/rationally within the network, so the network should make sense to them and they should know who the alters are. Thoughts on strategic behaviour and robustness:
- Strategic behaviour among alters also occurs in personal networks, e.g., “befriend the friends of friends”.
- In sociocentric networks, people are also influenced by others outside the network (e.g., out-of-school friends).
- In large sociocentric networks (e.g., an organisation), people do not know all alters either.

Illustration of Type 5: Changes in ties among alters across time

We are currently applying SIENA to each case. In a meta-analysis, we can then investigate whether, for example, a significant tendency towards transitivity among alters is related to more stability in the relations between ego and the alters.

Case study: Norma’s network at t1

Case study: Norma’s network at t2

Case study: Norma’s network at t2 (new contacts depicted in red)

Case study: SIENA analysis of Norma’s network

In Norma’s network, there are 62 actors (28 stable actors, 17 who come and 17 who go). Of the 378 pairs of stable actors (28 × 27 / 2), 292 are not related at either moment, 64 are related at both moments, 15 only at t1 and 7 only at t2.

Statistical results: the following effects were significant (apart from degree):
- Similarity in frequency of contact between alters: if two alters had about the same frequency of contact with ego, they had a higher probability of being related themselves.
- Transitivity: if A and B are related, and B and C as well, then A and C are likely to become related (but note that A and C already had a transitive relation via the invisible ego…!).
- Alter is family of ego or not: the family members of ego have a lower tendency to contact other alters than the other network members do.
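The tie-change counts above are commonly summarised by SIENA users with the Jaccard index, N11 / (N11 + N10 + N01): the share of pairs tied at either wave that are tied at both. The RSiena manual suggests values preferably above about 0.3 for reliable estimation. The sketch below computes the counts from two undirected edge lists over the stable actors and the index for Norma's reported counts; the function names are hypothetical and not part of SIENA.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]

def tie_change_counts(actors: Iterable[str], ties_t1: Iterable[Edge],
                      ties_t2: Iterable[Edge]) -> Dict[str, int]:
    """Classify every pair of stable actors by tie status at t1 and t2 (undirected)."""
    e1: Set[frozenset] = {frozenset(e) for e in ties_t1}
    e2: Set[frozenset] = {frozenset(e) for e in ties_t2}
    counts = {"0->0": 0, "0->1": 0, "1->0": 0, "1->1": 0}
    for pair in combinations(sorted(actors), 2):
        key = f"{int(frozenset(pair) in e1)}->{int(frozenset(pair) in e2)}"
        counts[key] += 1
    return counts

def jaccard(counts: Dict[str, int]) -> float:
    """Share of pairs tied at t1 or t2 that are tied at both waves."""
    n11 = counts["1->1"]
    return n11 / (n11 + counts["1->0"] + counts["0->1"])

# Norma's reported counts for the 378 pairs of stable actors.
norma = {"0->0": 292, "0->1": 7, "1->0": 15, "1->1": 64}
print(sum(norma.values()), round(jaccard(norma), 3))  # 378 0.744
```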

Sources of change in (personal) networks (Leik & Chalkley, 1997, Social Networks 19, 63-74):
- Unreliability due to measurement error
- Inherent instability
- Systemic change
- External change

Sources of change in (personal) networks: error sources

Of the four sources (unreliability due to measurement error, inherent instability, systemic change and external change), the first two are error sources. Researchers should consider the potential impact of measurement error and inherent instability on the substantive conclusions! E.g., plan a pilot study, supplement with qualitative analyses, and calculate the test-retest reliability of the network and of scales such as closeness.
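A simple test-retest check is the Pearson correlation between two measurements of the same scale, such as average closeness per respondent, taken shortly apart. The r column in Table 3 is a correlation of this kind, although between waves that are also affected by real change. A minimal sketch with invented scores (Python 3.10+ also provides `statistics.correlation`):

```python
import math
from statistics import mean
from typing import List

def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation, e.g., between test and retest scores of the same respondents."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Invented average-closeness scores for five respondents, measured twice shortly apart.
test = [2.0, 2.4, 1.8, 2.6, 2.2]
retest = [2.1, 2.3, 1.9, 2.7, 2.0]
print(round(pearson_r(test, retest), 2))
```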

Conclusion
- Multiple statistical methods are available for personal network research, depending on your research interest.
- Combining several methods probably gives the greatest insight.

Thanks! My e-mail: MirandaJessica.Lubbers@uab.es
