Multilevel models for family data. A presentation to the Research Methods Festival, Oxford, July 2004. Tom O'Connor, Jon Rasbash. Work conducted for the ESRC research methods programme project: Methodologies for Studying Families and Family Effects: the systematic assessment of research designs and data analytic strategies.
The presentation looks at three applications of multilevel modelling to family data: 1. Using multilevel models to explore the determinants of differential parental treatment of children. 2. Extending multilevel models to include genetic effects. 3. Applying multilevel models developed for social network data to family relationship data.
Application 1: Understanding the sources of differential parenting: the role of child and family level effects. Jenny Jenkins, Jon Rasbash and Tom O'Connor, Developmental Psychology, 2003 (1), 99-113.
Background. Recent studies in developmental psychology and behavioural genetics have emphasised that the non-shared environment is much more important than the shared environment in explaining children's adjustment (Plomin et al., 1994; Turkheimer & Waldron, 2000), and this has led to a focus on the non-shared environment. Has this meant that we have ignored the role of the shared family context, both empirically and conceptually?
Background. One key aspect of the non-shared environment that has been investigated is differential parental treatment of siblings. Differential treatment predicts differences in sibling adjustment. What are the sources of differential treatment? Child specific (non-shared) sources include age, temperament and biological relatedness. Can family level, shared environmental factors also influence differential treatment?
The Stress/Resources Hypothesis. Do family contexts (the shared environment) increase or decrease the extent to which children within the same family are treated differently? Parents have a finite amount of resources in terms of time, attention, patience and support to give their children. In families in which most of these resources are devoted to coping with economic stress, depression and/or marital conflict, parents may become less consciously or intentionally equitable and more driven by preferences or child characteristics in their childrearing efforts (Henderson et al., 1996). This is the hypothesis we wish to test. We operationalised the stress/resources hypothesis using four contextual variables: socioeconomic status, single parenthood, large family size and marital conflict.
How differential parental treatment has been analysed. Previous analyses in the literature exploring the sources of differential parental treatment ask the mother to rate two siblings in terms of the treatment (positive or negative) she gives to each child. The difference between these two treatment scores is then analysed. This approach has several major limitations.
The sibling pair difference model for exploring determinants of differential parenting:
$$y_{1i} - y_{2i} = \beta_0 + \beta_1 x_{1i} + e_i$$
where $y_{1i}$ and $y_{2i}$ are the parental ratings for siblings 1 and 2 in family $i$, and $x_{1i}$ is a family level variable, for example family SES. Problems: (1) With one measurement per family it is impossible to separate shared and non-shared random effects. (2) All information about the magnitude of the response is lost: the pairs (2, 4) and (22, 24) give the same difference. (3) It is not possible to introduce level 1 (non-shared) variables, since the data have been aggregated to level 2. (4) Family sizes larger than two cannot be handled.
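With a multilevel model:
$$y_{ij} = \beta_0 + \beta_1 x_{1ij} + \beta_2 x_{2j} + u_j + e_{ij}, \quad u_j \sim N(0, \sigma_u^2), \quad e_{ij} \sim N(0, \sigma_e^2)$$
where $y_{ij}$ is the $j$th mother's rating of her treatment of her $i$th child, $x_{1ij}$ are child level (non-shared) variables, $x_{2j}$ are family level (shared) variables, and $u_j$ and $e_{ij}$ are the family and child (shared and non-shared environment) random effects. Note that the level 1 variance, $\sigma_e^2$, is now a measure of differential parenting.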
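To make the reading of the level 1 variance as differential parenting concrete, the sketch below simulates the null version of this model (no covariates) for families of two, three and four children, then recovers $\sigma_e^2$ with the pooled within-family variance. This is a simple moment (ANOVA) estimator, not the full multilevel fit used in the analyses. The value 3.8 for $\sigma_e^2$ is the null-model estimate reported for this study. The family variance of 2.0 and the function names are hypothetical choices for illustration.

```python
import numpy as np

def simulate_families(family_sizes, sigma_u2, sigma_e2, beta0=0.0, seed=0):
    """Simulate y_ij = beta0 + u_j + e_ij for families of the given sizes (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    family_ids = np.repeat(np.arange(len(family_sizes)), family_sizes)
    u = rng.normal(0.0, np.sqrt(sigma_u2), len(family_sizes))  # shared (family) effects
    e = rng.normal(0.0, np.sqrt(sigma_e2), family_ids.size)    # non-shared (child) effects
    return family_ids, beta0 + u[family_ids] + e

def within_family_variance(family_ids, y):
    """Pooled within-family variance: a moment estimate of the level 1 variance."""
    n_families = family_ids.max() + 1
    counts = np.bincount(family_ids, minlength=n_families)
    means = np.bincount(family_ids, weights=y, minlength=n_families) / counts
    ss_within = np.sum((y - means[family_ids]) ** 2)
    # one degree of freedom is used per family mean
    return ss_within / (y.size - n_families)

sizes = [4] * 60 + [3] * 630 + [2] * 3157   # family size mix from the survey description
ids, y = simulate_families(sizes, sigma_u2=2.0, sigma_e2=3.8, seed=42)
print(round(within_family_variance(ids, y), 2))  # should be close to 3.8
```

Because the family effect $u_j$ cancels out of every deviation from the family mean, families of any size contribute to the estimate. The pair difference model cannot do this.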
Advantages of the multilevel approach: it can handle more than two children per family; it unconfounds family and child, allowing estimation of family and child level fixed and random effects; and it can model the level of parenting and differential parenting in the same model.
Overall survey design. National Longitudinal Survey of Children and Youth (NLSCY), a Statistics Canada survey with a representative sample of children across the provinces. The nested design includes up to 4 children per family. Respondent: the PMK. Children aged 4-11 years. Selection criteria: another sibling in the age range, living with at least one biological parent, aged 4 years or older. Sample: 8,474 children in 3,860 families (4-child families = 60, 3-child families = 630, 2-child families = 3,157).
Measures of parental treatment of the child, derived from factor analyses. PMK report of positive parenting: frequency of praise of the child, talk or play focusing on the child, activities enjoyed together (α = .81). PMK report of negative parenting: frequency of disapproval, annoyance, anger, mood-related punishment (α = .71). Today we will talk about positive parenting. The PMK is the person most knowledgeable about the child.
Child specific factors: age, gender, child position in family, negative emotionality, biological relatedness to father and mother. Family context factors: socioeconomic status, family size, single parent status, marital dissatisfaction.
Model 1: null model. The baseline estimate of differential parenting (the level 1 variance) is 3.8. We can now add further shared and non-shared explanatory variables and judge their effect on differential parenting by the reduction in the level 1 variance.
Positive parenting. Child level predictors: the strongest predictor of positive parenting is age; younger siblings get more attention, and this relationship is moderated by family membership. Having a non-biological mother or a non-biological father reduces positive parenting. By birth position, oldest siblings receive more positive parenting than youngest siblings, who receive more than middle siblings. Family level predictors: household SES increases positive parenting; marital dissatisfaction, increasing family size, and mixed or all-girl sibships all decrease positive parenting; lone parenthood has no effect.
Differential parenting. Modelling age reduced the level 1 variance (our measure of differential parenting) from 3.8 to 2.3, a reduction of about 40%. Other explanatory variables, both child specific and family level (shared environment), provide no significant reduction in the level 1 variation. Does this mean that there is no evidence to support the stress/resources hypothesis?
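As a worked check of the arithmetic, the proportional reduction in the level 1 variance uses only the two variance estimates on this slide:

```latex
\frac{\hat\sigma^2_{e,\mathrm{null}} - \hat\sigma^2_{e,\mathrm{age}}}{\hat\sigma^2_{e,\mathrm{null}}}
  = \frac{3.8 - 2.3}{3.8} = \frac{1.5}{3.8} \approx 0.39
```

That is, roughly 40% of the baseline differential parenting is accounted for by age.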
Testing the stress/resources hypothesis. The mean and the variance are modelled simultaneously. So far we have modelled the mean in terms of the shared environment, but not the variance. We can elaborate model 2 by allowing the level 1 variance to be a function of the family level variables household socioeconomic status, large family size and marital conflict. Adding these terms reduces the deviance by 78 on 7 degrees of freedom.
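The deviance change is a likelihood ratio statistic. The sketch below turns it into a p-value using only the standard library. It assumes the usual chi-square reference distribution, with degrees of freedom equal to the number of added variance parameters. `chi2_sf` is a hypothetical helper built from the closed forms for integer degrees of freedom, not a library function.

```python
import math

def chi2_sf(x, df):
    """Upper tail P(X > x) of a chi-square distribution with integer df.

    Uses the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    for the regularised upper incomplete gamma function Q, with y = x / 2.
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        q, a = math.exp(-y), 1.0              # Q(1, y) = exp(-y)
    else:
        q, a = math.erfc(math.sqrt(y)), 0.5   # Q(1/2, y) = erfc(sqrt(y))
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q

def likelihood_ratio_p(deviance_reduction, extra_params):
    """p-value for a drop in deviance when extra_params parameters are added."""
    return chi2_sf(deviance_reduction, extra_params)

p = likelihood_ratio_p(78, 7)
print(f"p = {p:.1e}")
```

For a drop of 78 on 7 df the p-value is far below any conventional threshold. The 5% critical value of chi-square with 7 df is about 14.07. Where SciPy is available, `scipy.stats.chi2.sf(78, 7)` computes the same quantity.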
Conclusion. We have found strong support for the stress/resources hypothesis. That is, although differential parenting is a child specific factor that drives differential adjustment, differential parenting is itself influenced by family as well as child specific factors. This challenges the current tendency in developmental psychology and behavioural genetics to focus on child specific factors. Multilevel models that fit complex level 1 variation are needed to uncover these relationships.
Application 2: Including genetic effects in multilevel models
Background. Through our recent work applying multilevel models to family data in collaboration with developmental psychologists, we were asked whether genetic effects can be included in these models. There is a long tradition of quantitative genetics, arguably beginning with Fisher's 1918 paper "The correlation between relatives on the supposition of Mendelian inheritance". This work has been developed by others and applied in animal and plant genetics, evolutionary biology, human genetics and behavioural genetics.
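The basic multilevel model for children within families:
$$y_{ij} = \beta_0 + u_j + e_{ij}, \quad u_j \sim N(0, \sigma_u^2), \quad e_{ij} \sim N(0, \sigma_e^2)$$
Given the standard independence assumptions of multilevel models, the covariance of two children ($i_1$ and $i_2$) in the same family is
$$\operatorname{cov}(y_{i_1 j}, y_{i_2 j}) = \sigma_u^2.$$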
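Extending the basic model to include genetic effects:
$$y_{ij} = \beta_0 + u_j + g_{ij} + e_{ij}$$
where $g_{ij}$ is a genetic effect for the $i$th child in the $j$th family. For two individuals ($i_1$, $i_2$) in the same family,
$$\operatorname{cov}(y_{i_1 j}, y_{i_2 j}) = \sigma_u^2 + F, \quad F = \operatorname{cov}(g_{i_1 j}, g_{i_2 j}).$$
The genetic covariance F of two individuals in the same family is clearly not zero, since there is a non-zero probability that they share the same genes. What is F? This is where Fisher's 1918 paper comes in.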
A very little genetics background. First remember that humans have 23 pairs of chromosomes. A gene is a sequence of DNA at a given location (locus) on a chromosome. In a population there might be multiple different versions of a gene. For example, with two versions of a gene, denoted A and a, there are 3 possible genotypes: AA, Aa and aa (note that Aa is functionally equivalent to aA). We can think of the genes as conferring values on an individual for a trait.
Given a number of (strong) assumptions: 1. A metric trait is influenced by a large (effectively infinite) number of genes at a large number of loci. 2. The effects of the genes add up within and across loci. 3. The genes are transmitted independently from parents to progeny. 4. The population being studied is mating at random. 5. The population being studied is in evolutionary equilibrium; that is, gene frequencies are not changing across generations. 6. There is no correlation between genetic and environmental effects. Corrections to the theory exist for all these assumptions, but I fear they are seldom used in behavioural genetics, are often difficult to implement and have not been thoroughly evaluated.
Then, with a lot of complicated argument and algebra, Fisher shows that
$$F = \operatorname{cov}(g_{i_1 j}, g_{i_2 j}) = r(i_1, i_2)\,\sigma_g^2$$
where $r(i_1, i_2)$ is the relationship coefficient between two individuals and equals 0, 0.125, 0.25, 0.5 and 1 for unrelated individuals, cousins, half-sibs, full sibs and MZ twins respectively. Thus the greater the relationship between individuals, the greater their genetic covariance and therefore their phenotypic covariance. An individual's $g_{ij}$ is the sum of the effects of all their genes. The variance of these $g_{ij}$ is the additive genetic variance ($\sigma_g^2$). The size of the additive genetic variance compared with the environmental variances is often of interest.
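Fisher's result can be checked by simulation under the assumptions listed above: many loci, purely additive effects, independent (Mendelian) transmission and random mating. The sketch is an illustration only, not the estimation method used in the analyses. The number of loci, the allele frequency of 0.5 and the normally distributed per-locus effects are arbitrary assumptions. The correlations of the simulated genetic values should come out close to the relationship coefficients: about 0.5 for full sibs (and for parent and offspring), 0.25 for half-sibs and 0 for unrelated individuals.

```python
import numpy as np

def simulate_relative_correlations(n_families=4000, n_loci=200, p=0.5, seed=1):
    """Correlations of additive genetic values between simulated relatives (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    effects = rng.normal(0.0, 1.0, n_loci)  # additive effect of each copy of allele A

    def random_parents(n):
        # two alleles per locus (1 = A, 0 = a), drawn independently: random mating
        return rng.binomial(1, p, size=(n, n_loci, 2))

    def child_of(mothers, fathers):
        # Mendelian transmission: one randomly chosen allele from each parent at each locus
        n = mothers.shape[0]
        pick_m = rng.integers(0, 2, size=(n, n_loci, 1))
        pick_f = rng.integers(0, 2, size=(n, n_loci, 1))
        from_m = np.take_along_axis(mothers, pick_m, axis=2)[..., 0]
        from_f = np.take_along_axis(fathers, pick_f, axis=2)[..., 0]
        return from_m + from_f  # number of A alleles (0, 1 or 2) per locus

    def genetic_value(allele_counts):
        # effects add up within and across loci
        return allele_counts @ effects

    def corr(x, y):
        return np.corrcoef(x, y)[0, 1]

    mothers = random_parents(n_families)
    fathers = random_parents(n_families)
    other_fathers = random_parents(n_families)

    child = genetic_value(child_of(mothers, fathers))
    full_sib = genetic_value(child_of(mothers, fathers))        # same two parents
    half_sib = genetic_value(child_of(mothers, other_fathers))  # same mother only
    unrelated = genetic_value(child_of(random_parents(n_families), random_parents(n_families)))
    mother = genetic_value(mothers.sum(axis=2))

    return {
        "full sibs": corr(child, full_sib),
        "half sibs": corr(child, half_sib),
        "unrelated": corr(child, unrelated),
        "parent-offspring": corr(child, mother),
    }

for pair, r in simulate_relative_correlations().items():
    print(f"{pair:>16}: {r:.2f}")
```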
Data example: 277 full sib pairs, 109 half sib pairs, 130 unrelated pairs, 93 DZ twin pairs and 99 MZ twin pairs aged between 9 and 18 years. Analysis of depression scores:
Parameter | Model 1 Est (se) | Model 2 Est (se)
Fixed: Intercept | 0.008 (0.017) | 0.02 (0.017)
Random: Shared env | 0.086 (0.011) | 0.018 (0.017)
Random: Non-shared env | 0.198 (0.011) | 0.069 (0.010)
Random: Genetic | - | 0.209 (0.022)
Deviance | 2165.88 | 2129.2
The total variance in the two models is effectively the same: 0.275 in model 1 and 0.285 in model 2. In model 2, which includes genetic effects, 70% of the family level variation and 60% of the child level variation are reassigned to the genetic variance. The model resembles an autocorrelated time-series model, except that the covariance decays as a function of genetic distance rather than distance in time between measurements, so the same estimation machinery can be used.
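The covariance structure behind model 2 can be written down directly. The sketch below builds the 2 × 2 covariance matrix for a pair of children in the same family as the sum of shared, genetic and non-shared parts. It uses the relationship coefficients above and the model 2 variance estimates, and reports the implied correlation for each pair type. The function names are hypothetical. Giving DZ twins the full sib coefficient of 0.5 follows standard quantitative genetics.

```python
import numpy as np

# Relationship coefficients r(i1, i2) for the pair types in the data
RELATIONSHIP = {
    "unrelated": 0.0,
    "half sibs": 0.25,
    "full sibs": 0.5,
    "DZ twins": 0.5,
    "MZ twins": 1.0,
}

def pair_covariance(r, sigma_u2, sigma_e2, sigma_g2):
    """Covariance matrix of (y_1j, y_2j) for two children in the same family."""
    shared = sigma_u2 * np.ones((2, 2))                  # u_j is common to both children
    genetic = sigma_g2 * np.array([[1.0, r], [r, 1.0]])  # cov(g_1j, g_2j) = r * sigma_g^2
    nonshared = sigma_e2 * np.eye(2)                     # e_ij independent across children
    return shared + genetic + nonshared

def implied_correlation(r, sigma_u2, sigma_e2, sigma_g2):
    v = pair_covariance(r, sigma_u2, sigma_e2, sigma_g2)
    return v[0, 1] / np.sqrt(v[0, 0] * v[1, 1])

# Model 2 estimates: shared env 0.018, non-shared env 0.069, genetic 0.209
for pair, r in RELATIONSHIP.items():
    rho = implied_correlation(r, sigma_u2=0.018, sigma_e2=0.069, sigma_g2=0.209)
    print(f"{pair:>10}: {rho:.2f}")
```

Only the off-diagonal genetic entry changes with the pair type. This is why, as in a time-series model, a single structured covariance matrix per family is all the estimation needs.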
Adding covariates. From the fixed effects we see that depression scores increase with child age and with paternal and maternal negativity; girls and children in stepfamilies also have higher depression scores. When these explanatory variables are introduced, the largest drop in variance occurs in the genetic variance.
Parameter | Model 3 Est (se)
Fixed: Intercept | -0.285 (0.087)
Fixed: Age | 0.011 (0.006)
Fixed: Mat_neg | 0.157 (0.024)
Fixed: Pat_neg | 0.216 (0.026)
Fixed: Girl | 0.158 (0.028)
Fixed: Stepfam | 0.105 (0.029)
Random: Shared env | 0.0035 (0.014)
Random: Non-shared env | 0.70 (0.096)
Random: Genetic | 0.148 (0.020)
Deviance | 1780.95
Why the drop in the genetic variance? The largest drop in the genetic variance occurs when paternal and maternal negativity are added to the model as covariates. Pike et al. (1996) analyse the same data using a series of genetically calibrated bivariate structural equation models. Two of the models they consider are bivariate models for maternal negativity and depression and for paternal negativity and depression. In each of these two models they find that 15% of the genetic variance in depression is due to a genetic component shared with parental negativity. When we add paternal and maternal negativity to our model as fixed effects, we sweep out any common genetic effects shared by parental negativity and adolescent depression. We also take account of any environmental correlations whereby sibling pairs of greater relatedness experience more similar parental treatment. Both of these factors reduce the remaining additive genetic variance in the model.
Complex variation and gene-environment interactions. Currently our model partitions the variance into three sources: family, child and genetic. The model for the variance can be further elaborated so that each of the three sources of variation is modelled as a function of explanatory variables, where the variables may be measured at any level.
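Gene-environment interaction with paternal negativity. We now elaborate model 3 to allow all three variances to be functions of paternal negativity, $x_{ij}$ (model 4):
$$\operatorname{var}(u_j) = \alpha_0^{(u)} + \alpha_1^{(u)} x_{ij}, \quad \operatorname{var}(e_{ij}) = \alpha_0^{(e)} + \alpha_1^{(e)} x_{ij}, \quad \operatorname{var}(g_{ij}) = \alpha_0^{(g)} + \alpha_1^{(g)} x_{ij} \qquad (4)$$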
Results from model 4. Including the three extra parameters reduces the deviance by 19.5. This reduction is almost entirely driven by the gene-environment interaction term, $\alpha_1^{(g)}$: removing the $\alpha_1^{(e)}$ and $\alpha_1^{(u)}$ terms from model 4 changes the deviance by only 1.5. The significant $\alpha_1^{(g)}$ coefficient constitutes a gene-environment interaction, because it implies that the genetic variance changes as a function of paternal negativity.
Parameter | Model 4 Est (se) | Pat_neg term Est (se)
Fixed: Intercept | -0.273 (0.080) |
Fixed: Age | 0.011 (0.005) |
Fixed: Mat_neg | 0.170 (0.024) |
Fixed: Pat_neg | 0.210 (0.028) |
Fixed: Girl | 0.159 (0.027) |
Fixed: Stepfam | 0.097 (0.028) |
Random: Shared env | 0.0006 (0.014) | -0.017 (0.019)
Random: Non-shared env | 0.073 (0.009) | 0.0078 (0.010)
Random: Genetic | 0.155 (0.021) | 0.093 (0.023)
Deviance | 1740.42 |
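A graph of the interaction plots the genetic variance against paternal negativity. The sketch below tabulates the fitted genetic variance under the assumption of a linear variance function, $\alpha_0^{(g)} + \alpha_1^{(g)} x$, with the model 4 estimates 0.155 and 0.093. The scale of paternal negativity and the grid of values are not given in the transcript. Here they are assumptions, with $x = 0$ treated as the reference value.

```python
# Model 4 genetic variance parameters (assumed linear variance function)
ALPHA0_G = 0.155   # genetic variance at pat_neg = 0
ALPHA1_G = 0.093   # change in genetic variance per unit of paternal negativity

def genetic_variance(pat_neg):
    """Genetic variance as a linear function of paternal negativity (hypothetical helper)."""
    return ALPHA0_G + ALPHA1_G * pat_neg

for x in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(f"pat_neg = {x:+.1f}: genetic variance = {genetic_variance(x):.3f}")
```

A linear variance function turns negative for sufficiently low values (here below about -1.67). For that reason it should only be read over the observed range of paternal negativity.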
Graphing the gene-environment interaction. One explanation of G×E interactions is conditional gene expression. Suppose we have a gene A that is switched on when an individual is subject to persistent high levels of cortisol. If some of the population have gene A and some do not, then this genetic variation manifests only in individuals under persistent high levels of stress.
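The conditional expression idea can be illustrated with a toy simulation in which a hypothetical gene A adds a fixed amount to the trait only when stress is high. The carrier frequency, effect size and 50% high-stress rate are invented for illustration. The genetic variance is exactly zero in the low-stress group and about p(1 - p) times the squared effect in the high-stress group. A variance function for the genetic effect is designed to pick up this pattern.

```python
import numpy as np

def simulate_conditional_expression(n=100_000, p_gene=0.3, effect=1.0, p_high_stress=0.5, seed=0):
    """Toy model: a hypothetical gene A only adds to the trait under persistent high stress."""
    rng = np.random.default_rng(seed)
    has_gene = rng.random(n) < p_gene               # carriers of gene A
    high_stress = rng.random(n) < p_high_stress     # persistent high cortisol
    genetic_part = effect * has_gene * high_stress  # A is switched on only under high stress
    return {
        "low stress": genetic_part[~high_stress].var(),
        "high stress": genetic_part[high_stress].var(),
    }

print(simulate_conditional_expression())
```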
Model extensions. The multilevel model with genetic effects is flexible. It can be adapted to a variety of situations where population structures have further nested or crossed random classifications, in addition to the standard behavioural genetics situation of children within families. For example: time (repeated measures on children within families); institutions (schools, hospitals); space (areas); multiple observers. A complex example is given in the next section.
Application 3: Applying social network models to family relationship data: some preliminary work
Background. The basic units of analysis are directional scores on dyads, e.g. the amount of aggression from individual A to individual B and vice versa. Often in family studies we have data on how individuals relate or behave towards each other, and the same structures occur in social network analysis. Snijders and Kenny (1999) develop a cross-classified multilevel model for handling these structures. In this presentation we explore the use of these models to analyse family level data from the Non-Shared Environment in Adolescent Development (NEAD) data set (Reiss et al., 1994).
The NEAD data. A two-wave longitudinal family study, designed for testing hypotheses about genetic and environmental effects: 277 full sib pairs, 109 half sib pairs, 130 unrelated pairs, 93 DZ twin pairs and 99 MZ twin pairs, aged between 9 and 18 years. Wave 2 followed 3 years after wave 1, and families in which the older sib was older than 18 were not followed up. In wave 2: 150 full sib pairs, 58 half sib pairs, 43 unrelated pairs, 63 DZ twin pairs and 72 MZ twin pairs. A wide range of self-report, parent-report and observer variables were collected. We focus here on the family-wide directed relational behaviour data collected by observers.
Data collection and construction. At each wave every family was given a topic to discuss. Observers watched these discussions and counted the frequencies of physical and verbal behaviours from each individual to every other individual. From these basic count data, positivity and negativity variables were constructed. Thus for each family (all families have 2 parents and 2 children), for each response, we have 12 directed measures (4 actors × 3 partners).
Directed response scores for a family: c1→c2, c1→m, c1→f, c2→c1, c2→m, c2→f, m→c1, m→c2, m→f, f→c1, f→c2, f→m (m = mother, f = father, c1 = child 1, c2 = child 2). Snijders and Kenny describe these as behaviours from actors to partners. The directed scores can be classified according to actors and partners, and also according to dyad: c1→c2 and c2→c1: dyad 1; c1→m and m→c1: dyad 2; c1→f and f→c1: dyad 3; c2→m and m→c2: dyad 4; c2→f and f→c2: dyad 5; m→f and f→m: dyad 6.
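The classification of the 12 directed scores can be generated mechanically. Ordered (actor, partner) pairs give the directed scores, and unordered pairs give the dyads. In the sketch below the dyad numbering follows the order used on the slide. The data structures are hypothetical, chosen only to show the cross-classification.

```python
from itertools import combinations, permutations

MEMBERS = ["c1", "c2", "m", "f"]   # child 1, child 2, mother, father

# Dyads are unordered pairs, numbered 1-6 in the order used on the slide
DYADS = {frozenset(pair): i + 1 for i, pair in enumerate(combinations(MEMBERS, 2))}

# Each directed score is an ordered (actor, partner) pair, cross-classified by dyad
directed_scores = [
    {"actor": a, "partner": p, "dyad": DYADS[frozenset((a, p))]}
    for a, p in permutations(MEMBERS, 2)
]

for s in directed_scores:
    print(f"{s['actor']}->{s['partner']}: dyad {s['dyad']}")
```

Each actor appears in three scores, each partner in three, and each dyad in exactly two (one per direction). Actors and partners are therefore crossed rather than nested classifications within the family.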
[Figure: unit diagram for one family (f1), in which each of the 12 directed scores (c1→c2 to f→m) is classified by dyad (d1 to d6), by actor (c1, c2, m, f) and by partner (c1, c2, m, f).]
A concern about the data: results for negativity.
Parameter | Wave 1 negativity | Wave 2 negativity
Intercept | 2.85 (0.018) | 2.73 (0.018)
Var(family) | 0.098 (0.014) | 0.064 (0.012)
Var(actor) | 0.130 (0.011) | 0.057 (0.010)
Var(partner) | 0.065 (0.010) | 0.022 (0.009)
Cov(actor, partner) | 0.048 (0.009) | 0.009 (0.008)
Var(dyad) | 0.231 (0.012) | 0.174 (0.012)
Var(directed score) | 0.167 (0.005) | 0.119 (0.005)
Wave 1 has strong family, actor, partner and dyad effects. All the effects in wave 2 are weaker.
Possible causes of smaller variances in w2. There were fewer raters in w2 than in w1, so the reduction in variance between the waves may be due to the number or type of raters in each wave, for example if the w2 raters were more reliable. Because the w2 individuals are all three years older, age could have accounted for the smaller variances, but there is no obvious dependency of the actor, partner or dyad variances on actor age, partner age or mean dyad age. We do not at the moment have rater identifications; if we did, we could include them as a classification in the model and remove rater effects. At the moment there is a possibility that unmodelled rater effects are biasing the results.
Actor/partner and reciprocity correlations. In addition to breaking down the variance in relationship quality into family, actor, partner, dyad and directed score components, there are two correlations of interest. The actor/partner correlation (behaviour across dyads) indicates the extent to which individuals who act negatively in their relationships with all the other family members also elicit a high amount of negativity from the other family members. The reciprocity correlation (dyad-specific behaviour), after removing family, actor and partner effects, measures the correlation between a's behaviour to b and b's behaviour to a.
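Total variance. Write $y_{jk}$ for the directed score $y_{i(j,k,l)m}$ with actor $j$ and partner $k$. In the 12 × 12 family covariance matrix of the directed scores, the actor/partner covariance $c = \sigma_{ap}$ contributes once for each way in which the actor of one score is the partner of the other:
$$\operatorname{cov}_{ap}(y_{j_1 k_1}, y_{j_2 k_2}) = c\left(\mathbb{1}[j_1 = k_2] + \mathbb{1}[k_1 = j_2]\right)$$
So a reciprocal pair such as 1→2 and 2→1 receives $2c$, while a pair such as 1→2 and 3→1 receives $c$. No directed score has the same actor and partner, so $\sigma_{ap}$ makes no contribution to the diagonal and therefore no contribution to $\operatorname{var}(y_{i(j,k,l)m})$.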
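The pattern of actor/partner contributions can be generated for all 144 pairs of directed scores. The sketch below assumes each score contains an actor effect for its actor and a partner effect for its partner, with covariance $\sigma_{ap}$ between the actor and partner effects of the same person. It builds the matrix of multipliers of $c$. The names are hypothetical.

```python
from itertools import permutations
import numpy as np

MEMBERS = ["c1", "c2", "m", "f"]
SCORES = list(permutations(MEMBERS, 2))   # the 12 (actor, partner) pairs

def actor_partner_multiplier(s1, s2):
    """Coefficient of sigma_ap in cov(y_s1, y_s2).

    cov(a_j1 + p_k1, a_j2 + p_k2) picks up sigma_ap once if j1 == k2
    and once if k1 == j2 (the actor of one score is the partner of the other).
    """
    (j1, k1), (j2, k2) = s1, s2
    return int(j1 == k2) + int(k1 == j2)

C = np.array([[actor_partner_multiplier(s1, s2) for s2 in SCORES] for s1 in SCORES])
print(C)
```

The diagonal of `C` is all zeros, which is the reason $\sigma_{ap}$ does not enter the total variance.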
Correlations and variance partition coefficients.
Quantity | W1 neg | W2 neg | W1 pos | W2 pos
Actor/partner correlation | 0.52 | 0.25 | 0.27 | 0
Reciprocity correlation | 0.58 | 0.59 | 0.36 | 0.29
Var(family) | 0.14 | 0.15 | 0.09 | 0.10
Var(actor) | 0.19 | 0.13 | 0.47
Var(partner) | 0.09 | 0.05 | 0.02 | 0.00
Var(dyad) | 0.33 | 0.40 | 0.15 | 0.12
Var(directed score) | 0.24 | 0.27 | 0.25 | 0.30
The biggest effects for negativity are from the dyads, indicating that the dyad is an important structure for determining negativity in relationships. This leads to a high reciprocity correlation for negativity. For positivity, 47% of the variation in relationship quality is attributable to actors, which means that people act in a very consistent way across dyads for positivity. The model can also be elaborated to allow differential actor and partner effects, for example separate actor and partner variances for children and parents, or different dyadic variances for marital, parent/child and sibling dyads. The actor/partner correlations are less stable across waves than the reciprocity correlations. We need to think about why this is.
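The negativity columns of this table can be reproduced from the variance estimates in the wave 1 / wave 2 negativity results. The sketch rests on three assumptions. First, the actor/partner correlation is $\sigma_{ap}/\sqrt{\sigma_a^2 \sigma_p^2}$. Second, the reciprocity correlation is the dyad variance divided by the dyad plus directed score variance, since the dyad effect is common to a→b and b→a. Third, the total variance excludes $\sigma_{ap}$, because it never reaches the diagonal. Under these assumptions the computed values round to the tabulated 0.52, 0.58, 0.14, 0.19 and so on. The dictionary layout and function name are hypothetical.

```python
import math

# Negativity variance estimates from the wave 1 / wave 2 results table
ESTIMATES = {
    "W1 neg": {"family": 0.098, "actor": 0.130, "partner": 0.065, "cov_ap": 0.048,
               "dyad": 0.231, "directed": 0.167},
    "W2 neg": {"family": 0.064, "actor": 0.057, "partner": 0.022, "cov_ap": 0.009,
               "dyad": 0.174, "directed": 0.119},
}
COMPONENTS = ("family", "actor", "partner", "dyad", "directed")

def summarise(est):
    # sigma_ap never reaches the diagonal, so it is left out of the total variance
    total = sum(est[k] for k in COMPONENTS)
    out = {
        "actor/partner corr": est["cov_ap"] / math.sqrt(est["actor"] * est["partner"]),
        # the dyad effect is shared by a->b and b->a; the directed-score residual is not
        "reciprocity corr": est["dyad"] / (est["dyad"] + est["directed"]),
    }
    for k in COMPONENTS:
        out[f"VPC {k}"] = est[k] / total
    return out

for wave, est in ESTIMATES.items():
    print(wave, {k: round(v, 2) for k, v in summarise(est).items()})
```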
Including genetic effects in the model: previous work. Bussell et al. (1999) conduct a genetic analysis on a subset of the negativity measurements, using a bivariate structural equation model to explore adolescents' relationships with siblings and mothers. Their focus of interest is the extent to which patterns of negative relating in the parent-child subsystem are replicated in the sibling subsystem. They use 4 of the 12 directed measurements in their analysis. The first trait they consider is negativity to the sibling, the c1→c2 and c2→c1 measures. The second trait is the mother's negativity to the adolescent, the m→c1 and m→c2 measurements. They regard this second trait as a measurement of the child's ability to elicit negativity from the mother, that is, a partner effect. In their conceptualisation both traits are measurements on the children, and genetic correlations and cross-correlations can be estimated from the relationship between the two children being measured. In their analysis they make no separation between actor and partner effects, and dyad effects are not included in the model. They find a large shared environment component of variation and moderate non-shared, additive genetic and non-additive genetic components of variation.
Including genetic effects in the multilevel model. The actor effect for individual $j$ in family $m$ is divided into two parts: an environmental effect, $a_{jm}$, and a genetic effect, $g_{jm}$. Likewise, the partner effect for individual $k$ in family $m$ is divided into an environmental effect, $p_{km}$, and a genetic effect, $g_{km}$. The actor and partner effects represent different behaviours with separate genetic variances, and the actor and partner genetic variances are further decomposed into additive and non-additive components.
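Covariance structure. The covariance between two relationship measurements in family $m$ is built up from the variance components the two measurements share. Here one measurement has actor $j_1$ and partner $k_1$ in dyad $l_1$, and the other has actor $j_2$ and partner $k_2$ in dyad $l_2$. The shared components are: the family variance always; the actor, partner and dyad variances when $j_1 = j_2$, $k_1 = k_2$ and $l_1 = l_2$ respectively; the actor/partner covariance when the actor of one measurement is the partner of the other; and genetic covariances that depend on how the individuals involved are related.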
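Expanding the term for the genetic covariance. For individuals $b$ and $c$, the covariance between their actor genetic effects is
$$r_A(b,c)\,\sigma^2_{A,\mathrm{actor}} + r_D(b,c)\,\sigma^2_{D,\mathrm{actor}}$$
and likewise for partner genetic effects with $\sigma^2_{A,\mathrm{partner}}$ and $\sigma^2_{D,\mathrm{partner}}$. These are the additive and dominance actor genetic variance components and the additive and dominance partner genetic variance components respectively. The values for the additive ($r_A$) and dominance ($r_D$) relationship coefficients are those given by standard population genetics theory:
Relationship between individuals b and c | Additive | Dominance
Parent-offspring | 1/2 | 0
Half-sibs | 1/4 | 0
Full sibs, DZ | 1/2 | 1/4
MZ | 1 | 1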
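The sketch below shows how the relationship coefficients enter the genetic covariance. It uses the model 2 actor estimates from the results (additive variance fixed at 0, dominance variance 0.077). The "same individual" and "unrelated" rows (coefficients 1, 1 and 0, 0), and treating the two parents as unrelated, are standard assumptions added for completeness. The helper name is hypothetical.

```python
# Additive (r_A) and dominance (r_D) relationship coefficients
COEFFICIENTS = {
    "same individual": (1.0, 1.0),
    "MZ twins": (1.0, 1.0),
    "full sibs / DZ twins": (0.5, 0.25),
    "parent-offspring": (0.5, 0.0),
    "half sibs": (0.25, 0.0),
    "unrelated (e.g. the two parents)": (0.0, 0.0),
}

def genetic_covariance(relationship, var_additive, var_dominance):
    """cov(g_b, g_c) = r_A(b, c) * var_additive + r_D(b, c) * var_dominance."""
    r_a, r_d = COEFFICIENTS[relationship]
    return r_a * var_additive + r_d * var_dominance

# Model 2 actor components: additive fixed at 0, dominance 0.077
for rel in COEFFICIENTS:
    print(f"{rel:>32}: {genetic_covariance(rel, 0.0, 0.077):.4f}")
```

With the additive components fixed at zero, only pairs with a non-zero dominance coefficient (MZ twins, full sibs and DZ twins) share any actor genetic covariance. Parent-offspring and half-sib pairs share none.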
Results.
Parameter | Model 1 | Model 2
Intercept | 2.85 (0.018) |
Family variance | 0.099 (0.015) | 0.096 (0.015)
Actor variance | 0.130 (0.011) | 0.055 (0.015)
Partner variance | 0.064 (0.010) | 0.014 (0.013)
Actor, partner covariance | 0.047 (0.01) | 0.045 (0.010)
Dyad variance | 0.230 (0.013) | 0.232 (0.013)
Directed score variance | 0.167 (0.005) |
Additive actor genetic | | 0.0
Dominance actor genetic | | 0.077 (0.013)
Additive partner genetic | | 0.0
Dominance partner genetic | | 0.049 (0.011)
-2 log likelihood | 17669.7 | 17595.0
Model 1's actor variance of 0.130 is split into two parts in model 2: an environmental component of 0.055 and a genetic component of 0.077 (0.055 + 0.077 = 0.132). For partner effects, 0.064 in model 1 compares with 0.014 + 0.049 = 0.063 in model 2. Total variance: 0.737 (model 1) and 0.735 (model 2). The additive genetic variances are set to zero because their estimated variance components were negative (but not significant).
Results (continued). Individuals do have common actor and partner effects across all their relationships within a family. The actor effects are stronger than the partner effects; that is, the actor variance components are larger. Both an individual's propensity to act negatively in relationships and their propensity to elicit negativity in relationships have a genetic component. There are family level factors that affect the quality of all the relationships in a family. For relationship negativity the dyad is the single most important classification: 31% of the total variability in relationship quality is attributable to dyad level factors.
Model extensions. We can, of course, fit covariates at the actor, partner, dyad and family levels. Examples are actor and partner role (mother, father, son, daughter, brother, sister), family type (step or nuclear) and family SES, and dyad type (biological or non-biological relationship). We can also allow the variance components for different classifications to be functions of explanatory variables: for example, different actor and partner variances by role, different dyad variances by dyad type (marital, parent/child or sibling), or genetic variation that interacts with environmental variables. Many of these effects are statistically significant, and we are currently exploring the uses of this model.