Introduction to SAS® Proc Mixed

1 Introduction to SAS® Proc Mixed
CSCAR Workshop May 19 & 21, 2010 Kathy Welch, Instructor ©CSCAR, 2010: Proc Mixed

2 Workshop Goals Introduce the Linear Mixed Model (LMM) and some key concepts (theory) Learn what types of data are appropriate for a LMM analysis Learn how to set up data for analysis using SAS Proc Mixed Learn how to set up Proc Mixed syntax to fit LMMs for different types of data Interpret output from Proc Mixed Get diagnostic plots from Proc Mixed to check LMM assumptions ©CSCAR, 2010: Proc Mixed

3 Lab Examples Randomized block design Two-level clustered data
Three-level clustered data Repeated measures data Longitudinal data

4 What is a Linear Mixed Model (LMM)?
A parametric linear model for a normally distributed response, appropriate for non-independent data Clustered data Responses for units in same cluster may be correlated. Repeated Measures / Longitudinal data Residuals for the same subject may be correlated Differs from ordinary linear regression and ANOVA, where we assume independent observations ©CSCAR, 2010: Proc Mixed

5 What is a Linear Mixed Model? (Cont)
Predictors may be fixed or random Fixed predictors have fixed effects parameters and specify the mean structure Random effects are associated with individual subjects or clusters and determine the covariance structure Variances and covariances can differ by group Differs from the general linear model, where we assume constant variance ©CSCAR, 2010: Proc Mixed

6 Data Appropriate for a LMM Analysis
Clustered data Subjects are nested in clusters, such as classrooms, families, litters, neighborhoods Repeated Measures data Multiple measurements for the same subjects over time or another dimension Longitudinal data Multiple measures for the same subject over a long period of time Observations for the same cluster/subject are likely to be correlated (non-independent) ©CSCAR, 2010: Proc Mixed

7 General Linear Model vs Linear Mixed Model
The parameters in a GLM are β and σ2 (independent observations with constant variance) The parameters in a LMM are β and the variances and covariances in D and Ri; this variance-covariance matrix captures the non-independence ©CSCAR, 2010: Proc Mixed

8 Linear Mixed Model for the ith Subject
β are fixed effects parameters ui are random variables, with a normal distribution and variance-covariance matrix D ϵi are random residuals, with a normal distribution and variance-covariance matrix Ri ui and ϵi are independent ©CSCAR, 2010: Proc Mixed
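Put together, the pieces above give the standard LMM form (a reconstruction; the slide's equation image did not survive the transcript):

```latex
Y_i = X_i\beta + Z_i u_i + \epsilon_i,
\qquad u_i \sim N(0, D),
\qquad \epsilon_i \sim N(0, R_i),
\qquad u_i \perp \epsilon_i
```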

9 Yi : Response for Subject i
Yi is a vector of Responses for the ith cluster/subject The ni responses for cluster/subject i are set up in long format, with each response on a separate row of data, along with all of the covariates (predictors) for that response. Each subject/cluster may have a different number of responses Yi is approximately normally distributed (actually the residuals must be normal) If residuals are not normally distributed, we may consider a transformation to improve normality ©CSCAR, 2010: Proc Mixed

10 Yi for Clustered Data There are ni units within cluster i
Response measured once for each unit in the cluster Number of units per cluster can vary Some clusters may have only one unit Example 1 Mathgain score measured once for each sampled student in 130 classrooms Number of sampled students per classroom can vary What is the cluster? What is the unit? ©CSCAR, 2010: Proc Mixed

11 Clustered Data Examples
Example 2 First, schools are sampled Next, classrooms within schools are sampled Finally, students within classrooms are sampled Mathgain score for each student is measured What is the cluster? What is the unit? Example 3 Birth weights of litters of rat pups are measured More examples? ©CSCAR, 2010: Proc Mixed

12 Clustered Data Table
Y1 : response vector for the 3 students in classid 160 ©CSCAR, 2010: Proc Mixed

13 Yi for Repeated Measures Data
The classic repeated measures design is multiple measures of a response over time Multiple measures for the same subject ni measures of the response for subject i ni can vary across subjects Data can vary over time, space, or other dimension/s Example 1 Insulin levels at 1 min, 5 min, 20 min, and 60 min after injection of a drug for diabetes What is the subject? What is the repeated measures factor? ©CSCAR, 2010: Proc Mixed

14 Repeated Measures Data Examples
Chemical concentration in 5 different brain regions in rats What is the response? What is the subject? What is the repeated measures factor? More examples? ©CSCAR, 2010: Proc Mixed

15 Repeated Measures Data Table
Y1 : response vector for animal R111097 ©CSCAR, 2010: Proc Mixed

16 Yi for Longitudinal Data
Longitudinal data are measures made on the same subject over a longer period of time ni measures of the response for a given subject Number per subject can vary due to attrition Example 1 Socialization score for autistic children measured at ages 2, 3, 5, 9, and 13 What is the subject? What is the time frame? Do you think attrition will be a problem? Other examples? ©CSCAR, 2010: Proc Mixed

17 Longitudinal Data Table
Y1 : response vector for subject 1 ©CSCAR, 2010: Proc Mixed

18 Fixed Predictors Fixed predictors can be categorical (factors) or continuous For factors, all levels of interest included Treatment Gender Levels of fixed factors can be defined to represent contrasts of interest High Dose vs. Control, Medium Dose vs. Control Female vs. Male Fixed continuous predictors can be included as linear, quadratic or other terms ©CSCAR, 2010: Proc Mixed

19 Fixed Predictor Examples
Age, Age2 Income Gender Drug Treatment Region Examples from your work… ©CSCAR, 2010: Proc Mixed

20 Xi : Design Matrix for Fixed Effects
Xi contains values of the fixed predictor variables (X variables) for subject i. e.g. Age, Sex, Treatment The X matrix can include continuous and/or indicator variables for categorical predictors Xi has one row for each of the ni observations for the ith subject, and one column for each of the predictors We implicitly include an intercept (a column of ones) for most models We do not need to have an intercept variable in the data ©CSCAR, 2010: Proc Mixed

21 X Matrix Example X Matrix for subject 1 The X matrix is formed from variables in the dataset. We usually don’t include a variable for the intercept in the dataset. The intercept is included in the model by default by Proc Mixed. The X matrix variables must be present for each row of data for that observation to be included in the analysis ©CSCAR, 2010: Proc Mixed

22 β : Fixed Effects Parameters
β are fixed-effects parameters or regression coefficients unknown fixed quantities β describe how the mean of Y depends on the predictor variables for an entire population or subpopulation of subjects The value of β does not vary across individual subjects We usually include an intercept (β0) as one of the components of β ©CSCAR, 2010: Proc Mixed

23 Random Factors Random Factor: A classification variable
Random factors do not represent conditions chosen to meet the needs of the study, but arise from sampling a larger population Variation in the dependent variable across levels of a random factor can be estimated and assessed Results can be generalized to the greater population A random factor may have different random effects associated with it (e.g. random intercept and random slope) ©CSCAR, 2010: Proc Mixed

24 Random Factor Examples
Clustered Data: Classroom Hospital Neighborhood More examples… Repeated Measures/Longitudinal Data: Person (subject) More examples…. The random factor is the cluster in clustered data, and subject in repeated measures/longitudinal data ©CSCAR, 2010: Proc Mixed

25 ui : Random Effects ui are unobserved random variables (not parameters) ui are specific for the ith subject (random factor) in a LMM ui vary across clusters/subjects ui are random deviations in the relationships described by fixed effects ui are assumed to have a normal distribution mean=0 and variance-covariance matrix, D The parameters associated with the ui are the variances and covariances of these random variables ©CSCAR, 2010: Proc Mixed

26 Normal Distribution Refresher
-infinity < Y < infinity E(Y) = μ = Mean, or Expected value balance point center of symmetric distribution Var(Y) = σ2 = Variance Measure of variability, or spread σ2 must be nonnegative (σ2 >= 0) σ is the standard deviation ©CSCAR, 2010: Proc Mixed

27 Normal Distribution Refresher II
Same , Different  Same , Different  ©CSCAR, 2010: Proc Mixed

28 Covariance Refresher Covariance (denoted by σY1,Y2) is a measure of how much two random variables change together. It can be positive or negative. Jointly normal random variables with zero covariance are independent ©CSCAR, 2010: Proc Mixed

29 Variance-Covariance Matrix
A variance-covariance matrix contains the variances and covariances between a set of random variables. The dimension of the var-covar matrix is the same as the number of random variables. If there is only one r.v., the dimension would be 1 by 1 (just a single value). If there are two r.v.s, the matrix would be 2 by 2 (or 2x2) The set of variances and covariances are called the covariance parameters. ©CSCAR, 2010: Proc Mixed

30 The D matrix D is the variance-covariance matrix for the random effects (ui) for subject i D contains the covariance parameters for the random effects The dimension of D is based on the number of random effects per subject, not the number of observations per subject If there is one random effect (e.g., a random intercept) per subject, D would be 1x1 If there are two random effects (e.g., a random intercept and random slope) per subject, D would be 2x2 ©CSCAR, 2010: Proc Mixed

31 Form of the D Matrix Variances of random effects are on the diagonal
Covariances between different random effects within the same subject are on the off-diagonal Symmetric, positive-definite matrix Variances must all be positive SAS calls this G. We use D and G interchangeably ©CSCAR, 2010: Proc Mixed

32 Two Common Structures for D
Variance components type=vc (Independent) Unstructured type=un Although there are many other possible structures for D, one of these two structures is almost always used ©CSCAR, 2010: Proc Mixed
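As a sketch of how these two structures are requested in Proc Mixed (the dataset and variable names here are hypothetical, not from the lab examples):

```sas
/* Variance components (independent random effects): diagonal D */
proc mixed data = mydata;
  class subjid;
  model y = week / solution;
  random intercept week / subject = subjid type = vc;
run;

/* Unstructured D: all variances and covariances estimated freely */
proc mixed data = mydata;
  class subjid;
  model y = week / solution;
  random intercept week / subject = subjid type = un;
run;
```

type=vc is the Proc Mixed default for the random statement; type=un is the usual choice once random effects are allowed to covary.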

33 Covariance Matrices How would you fill these in, if you had a random intercept and random slope per subject and 10 obs per subject? Scenario A: σ2intercepts = .28, σ2slopes = .10 Scenario B: σ2intercepts = .28, σ2slopes = .10, σintercepts,slopes = -.01 ©CSCAR, 2010: Proc Mixed
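Written out with the slide's values (rows and columns ordered intercept, then slope), the two D matrices are:

```latex
D_{A} = \begin{pmatrix} 0.28 & 0 \\ 0 & 0.10 \end{pmatrix}
\qquad
D_{B} = \begin{pmatrix} 0.28 & -0.01 \\ -0.01 & 0.10 \end{pmatrix}
```

Note that D is 2 x 2 in both scenarios regardless of the 10 observations per subject: its dimension follows the number of random effects, not the number of observations.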

34 Zi : Design Matrix for Random Effects
The Zi matrix can include both continuous and indicator variables for subject i Zi has one row for each observation for the ith subject Number of columns in Zi depends on the number of random effects in the model We often include a random intercept for each subject In a model with one random intercept per subject, Zi would have one column per subject In a model with a random intercept and random slope for each subject, Zi would have two columns per subject. ©CSCAR, 2010: Proc Mixed

35 Z Matrix Example Z Matrix for Subject 1 Z Matrix for Subject 3
We don’t include variables for Z in our dataset. Note that Zi has all zero values for other subjects. ©CSCAR, 2010: Proc Mixed

36 Random Residuals: εi The εi vector contains the residuals for the ith subject There is one value of εi for each observation for the ith subject We assume that the εi are normally distributed, with mean = 0 and variance-covariance matrix, Ri There are a large number of possible structures for Ri, some of which we will examine later For example, we can allow the variances of the residuals at different time points to differ. ©CSCAR, 2010: Proc Mixed

37 The Ri matrix Ri contains the variances and covariances of residuals for the same subject residual covariance parameters The dimension of Ri depends on the number of observations (ni) for subject i. For a subject with 5 repeated measures, the Ri matrix would be 5 X 5. For a subject with only one measure, the Ri matrix would be 1 X 1. The default assumption in Proc Mixed for the Ri matrix is that the variance of all residuals is the same and that the covariances are all zero. ©CSCAR, 2010: Proc Mixed

38 Form of the R Matrix The diagonal elements are variances of residuals for the same subject The off-diagonal elements are covariances between two residuals for the same subject Symmetric, positive-definite matrix The variances must all be > 0 ©CSCAR, 2010: Proc Mixed

39 Form of the R Matrix II Proc Mixed has many possibilities for the structure of the R matrix We will discuss some of these later in the workshop. ©CSCAR, 2010: Proc Mixed
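A sketch of requesting a non-default R structure with the repeated statement (hypothetical dataset and variable names; the type= values are standard Proc Mixed structures):

```sas
proc mixed data = mydata;
  class subjid time;
  model y = time / solution;
  /* compound symmetry: constant variance, constant covariance */
  repeated time / subject = subjid type = cs r;
run;
/* Other common choices on the repeated statement:
   type = ar(1)  first-order autoregressive
   type = un     unstructured
   type = vc     independent (the default) */
```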

40 Covariance Parameters
We estimate a set of covariance parameters θ, which are the variances and covariances for the D and R matrices For D we estimate θD (variances and covariances of the random effects) For R we estimate θR (variances and covariances of the random residuals) The number of covariance parameters that we estimate depends on the number of random effects, and the structure we specify for D and R ©CSCAR, 2010: Proc Mixed

41 Covariance Summary We use Proc Mixed to estimate the variance of random effects, and the covariance between pairs of random effects in a LMM. We also use Proc Mixed to estimate the variances and covariances of the random residuals in a LMM. We assume that the random effects and the random residuals are independent. ©CSCAR, 2010: Proc Mixed

42 Implied Marginal Model
A LMM uses random effects explicitly to model between-subject variance Subject-specific model Includes D matrix and R matrix Implied marginal model Marginal model that results from fitting a LMM The marginal variance-covariance matrix is called V V is derived from D and R We can get the distribution of the population mean using the implied marginal model ©CSCAR, 2010: Proc Mixed

43 Implied Marginal Distribution of Yi Based on a LMM
Vi, the marginal variance-covariance matrix, is derived from D and Ri ©CSCAR, 2010: Proc Mixed
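In symbols (the slide's formulas did not survive the transcript; this is the standard result):

```latex
Y_i \sim N\!\left(X_i\beta,\; V_i\right),
\qquad V_i = Z_i D Z_i' + R_i
```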

44 Proc Mixed in SAS Is an appropriate tool to fit models for clustered or repeated measures / longitudinal data Allows users to fit LMMs with both fixed and random effects Accommodates models with a wide variety of correlation (covariance) structures Can be used when there are unequal numbers of observations per cluster/subject Can be used when there are unequal variances for different subgroups of observations Has a rich array of graphical and analytic tools to assess the fit of LMMs ©CSCAR, 2010: Proc Mixed

45 Data Structure for Proc Mixed
We structure the data in “long” form, so multiple observations for the same subject/cluster are on separate rows of data Each row contains all information specific to the cluster or subject Some variables vary across clusters/subjects but are constant within a given cluster/subject; these are repeated for all rows for the same subject/cluster In repeated measures, these are time-invariant Some variables change for different subjects within a cluster In repeated measures, these are time-varying ©CSCAR, 2010: Proc Mixed

46 Randomized Block Design
A block is a group of relatively homogeneous experimental units The use of blocks reduces variability (within-block variability should be low, between-block variability can be high) Individual blocks are independent Observations within a block are correlated Blocks are usually random factors They represent a random sample from a population We wish to make inferences to the population, not to the individual blocks Examples of blocks include batches, machines, plots, mice, people, clinics, and bananas (the next example) ©CSCAR, 2010: Proc Mixed

47 Lab Example 1: Randomized Block Design Banana Data

48 (Hypothetical) Banana Example
Purpose: To compare shelf life of bananas, when treated with three different food preservatives (A, B, C) Experimental material: 5 bananas Experimental design, bananas are blocks: Cut each banana into three pieces. Randomly assign one of the three preservatives to each piece ©CSCAR, 2010: Proc Mixed

49 Fixed and Random Factors
Treatment is a fixed factor; contrasts between treatments are of interest Bananas are a random sample from a population of bananas We want conclusions for the study to apply to the whole population of bananas, not just these particular bananas Banana is a random factor that will have random effects We will fit a LMM with fixed effects for treatment and a random effect for each banana ©CSCAR, 2010: Proc Mixed

50 Model for Banana Data Yti = β0 + βt + bi + εti Note: The D matrix is 1 by 1, because we have only 1 random effect per banana where Yti = shelf life of banana i, treated with preservative t β0 = intercept βt = fixed effect of treatment t, t = 1, 2, 3 bi = random effect (intercept) for banana i, εti = residual for banana i, treatment t We estimate five parameters: β0, β1, β2, σb2, σ2 β3 = 0, set to zero restriction ©CSCAR, 2010: Proc Mixed

51 D matrix for Banana Data
The bi are random intercepts, one for each banana σ2b is the variance of the random banana intercepts, and captures the between-banana variance In this case (1 random effect per subj), D is 1 x 1 σ2b is the only random effects parameter we need to estimate The covariance of observations on the same banana depends on the variance of the random effects, σ2b ©CSCAR, 2010: Proc Mixed

52 Ri Matrix for Banana Data
There are 3 observations per banana The Ri matrix will be a 3 x 3 matrix for each banana We assume the default structure (σ2I) for Ri The residual variance is constant Var(εti) = σ2 ©CSCAR, 2010: Proc Mixed

53 Model for each Banana
y11 = β0 + β1 + b1 + ε11
y21 = β0 + β2 + b1 + ε21
y31 = β0 + β3 + b1 + ε31
y12 = β0 + β1 + b2 + ε12
y22 = β0 + β2 + b2 + ε22
y32 = β0 + β3 + b2 + ε32
yti = β0 + βt + bi + εti
y15 = β0 + β1 + b5 + ε15
y25 = β0 + β2 + b5 + ε25
y35 = β0 + β3 + b5 + ε35
β = treat (Fixed Effects): β1 β2 β3
b = banana (Random Effects): b1 b2 b3 b4 b5
Data (long form): columns y, treat, banana; treat takes the values A B C within each banana ©CSCAR, 2010: Proc Mixed

54 Example 1: SAS Code
data fruit;
  input shelf treat $ banana;
  cards;
proc mixed data = fruit;
  class treat banana;
  model shelf = treat / solution;
  random banana;
run; ©CSCAR, 2010: Proc Mixed

55 Proc Mixed Syntax Class statement sets up categorical factors for both fixed and random effects Model statement specifies the fixed factors in the model Random statement specifies the random factors to be included in the model, and specifies the structure for the D matrix (called G matrix by SAS) Repeated statement specifies the structure of the R matrix of residual variances and covariances ©CSCAR, 2010: Proc Mixed

56 Example 1: Proc Mixed Syntax
proc mixed data = fruit;
  class treat banana;
  model shelf = treat / solution;
  random banana;
run;
Note: Proc Mixed will automatically include a dummy variable for each level of a class variable. The highest level of the class variable is given a coefficient of 0 for the dummy variable by default. This makes the highest level the reference. ©CSCAR, 2010: Proc Mixed
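If a different reference level is wanted, one common approach is to control the ordering of the class levels, for example with the order= option on the proc statement (a sketch using the banana example; with order=data, levels are ordered by first appearance in the dataset, so arranging the data so the desired reference level appears last makes it the reference):

```sas
proc mixed data = fruit order = data;
  class treat banana;
  model shelf = treat / solution;
  random banana;
run;
```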

57 Ex 1: Proc Mixed Output Part 1
The Mixed Procedure
Model Information
Data Set WORK.FRUIT
Dependent Variable shelf
Covariance Structure Variance Components
Estimation Method REML
Residual Variance Method Profile
Fixed Effects SE Method Model-Based
Degrees of Freedom Method Containment
Class Level Information
Class Levels Values
treat A B C
banana ©CSCAR, 2010: Proc Mixed

58 Ex 1: Proc Mixed Output (Cont)
Dimensions
Covariance Parameters
Columns in X
Columns in Z
Subjects
Max Obs Per Subject
Number of Observations
Number of Observations Read
Number of Observations Used
Number of Observations Not Used
Iteration History
Iteration Evaluations Res Log Like Criterion
Convergence criteria met.
SAS assumes all obs are for the same subject, because we did not specify a subject in the random statement ©CSCAR, 2010: Proc Mixed

59 Ex 1: Proc Mixed Output (Cont)
Covariance Parameter Estimates
Cov Parm Estimate
banana (estimated between-banana variance: σb2)
Residual (estimated within-banana variance: σ2)
Fit Statistics
-2 Res Log Likelihood
AIC (smaller is better)
AICC (smaller is better)
BIC (smaller is better)
Solution for Fixed Effects
Effect treat Estimate Standard Error DF t Value Pr > |t|
Intercept <.0001 (estimated mean for Treat C)
treat A (estimated diff in mean of treat A vs. C)
treat B (estimated diff in mean of treat B vs. C)
treat C
Type 3 Tests of Fixed Effects
Effect Num DF Den DF F Value Pr > F
treat (significance of overall treat effect) ©CSCAR, 2010: Proc Mixed

60 Ex 1: Interpreting Fixed Effects Estimates
The estimated effect of each treatment represents a contrast between that level of treatment and the last level The intercept represents the estimated mean for the last level of treatment The estimated shelf life for treatment C = 9.72 days The effect of treatment A = -0.28 Treatment A reduces shelf life by 0.28 days, as compared to treatment C (the reference group), p = The effect of treatment B = -0.22 Treatment B reduces shelf life by 0.22 days, as compared to treatment C, p = ©CSCAR, 2010: Proc Mixed

61 Ex 1: Fixed Effects Estimates (Cont)
We can substitute the estimated fixed effects parameters into the model equation to get the predicted mean for each treatment ©CSCAR, 2010: Proc Mixed

62 Ex 1: Covariance Parameter Estimates
There are two covariance parameters in this model, the estimated between-banana variance: And the estimated within-banana, or residual variance: ©CSCAR, 2010: Proc Mixed

63 Ex 1: Estimated G Matrix We can view G matrix (what we call the D matrix) by using the g option: random banana / g; The D matrix for each Banana is 1x1 Estimated G Matrix Row Effect banana Col1 Col2 Col3 Col4 Col5 1 banana 2 banana 3 banana 4 banana 5 banana ©CSCAR, 2010: Proc Mixed

64 Ex 1: Estimated Ri Matrix
We can view the estimated 3 x 3 Ri matrix for the three observations for the first banana by adding a repeated statement: repeated / subject=banana r; Estimated R Matrix for Banana 1 Row Col1 Col2 Col3 ©CSCAR, 2010: Proc Mixed

65 Estimated V Matrix for Subject 1
Ex 1: Estimated V matrix We can view the estimated V matrix of marginal variances and covariances for all bananas by using the v option in the random statement (note V is block-diagonal, with obs from different bananas being indep.): random banana / v; Estimated V Matrix for Subject 1 row Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9 Col10 Col11 Col12 Col13 Col14 Col15 ©CSCAR, 2010: Proc Mixed

66 Ex 1: Calculation of V matrix from D and R
The V matrix is derived from the D and R matrices The covariance, and hence correlation, among observations within the same banana is due to the between-banana variation If there is zero between-banana variation, there is no correlation among obs for the same banana ©CSCAR, 2010: Proc Mixed

67 Ex 1: Calculation of V Matrix (Cont)
We first illustrate these calculations for the ith banana We then show how these calculations work for the entire data set, assuming we have only two bananas in the study This can then be generalized to any number of bananas We will then have tools to help understand more complicated models ©CSCAR, 2010: Proc Mixed

68 Ex 1: Step 1 of Calculation of V Matrix for the ith banana
©CSCAR, 2010: Proc Mixed

69 Ex 1: Step 2 of Calculation of V Matrix
©CSCAR, 2010: Proc Mixed

70 Ex 1: Step 3 of Calculation of V Matrix
Covariance of obs on same banana is the between-banana variance Marginal variance of each obs is the between-banana variance plus within-banana variance ©CSCAR, 2010: Proc Mixed
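For the three observations on one banana, the steps above give (a reconstruction of the slide's matrix):

```latex
V_i = Z_i D Z_i' + R_i
    = \sigma_b^2 J_3 + \sigma^2 I_3
    = \begin{pmatrix}
        \sigma_b^2 + \sigma^2 & \sigma_b^2            & \sigma_b^2 \\
        \sigma_b^2            & \sigma_b^2 + \sigma^2 & \sigma_b^2 \\
        \sigma_b^2            & \sigma_b^2            & \sigma_b^2 + \sigma^2
      \end{pmatrix}
```

Here J_3 is the 3 x 3 matrix of ones and I_3 the 3 x 3 identity matrix.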

71 Ex 1: Calculation of V Matrix for two bananas
©CSCAR, 2010: Proc Mixed

72 Ex 1: Calculation of V Matrix for two bananas (Cont)
©CSCAR, 2010: Proc Mixed

73 Ex 1: Calculation of V for two bananas (Cont)
The V matrix is block-diagonal Observations within the same banana are correlated Observations for different bananas are independent This can be extended for more bananas ©CSCAR, 2010: Proc Mixed

74 Ex 1: Intraclass Correlation ICC
The ICC estimates the correlation of observations within the same subject It is very high for this example Because it is based on variances, the ICC can only be positive or zero (more on this later) ©CSCAR, 2010: Proc Mixed
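For this random-intercept model the ICC has the standard form (the slide's formula image did not survive the transcript):

```latex
\mathrm{ICC} = \frac{\sigma_b^2}{\sigma_b^2 + \sigma^2}
```

That is, the between-banana variance as a fraction of the total variance; since both variance components are nonnegative, the ICC cannot be negative, matching the remark above.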

75 Ex 1: ICC for the Banana Data
You can get an estimate of the marginal correlation matrix by adding the vcorr option: random banana / v vcorr; Estimated V Correlation Matrix for Subject 1 row Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9 Col10 Col11 Col12 Col13 Col14 Col15 ©CSCAR, 2010: Proc Mixed

76 Ex 1: Using Subject in the Random Statement
The subject option tells SAS the V matrix is block diagonal, allowing computationally efficient methods of estimation. random intercept / subject=banana; SAS will now say that there are 5 subjects, instead of one. With the subject option, if you ask SAS to print out the G, V, or Vcorr matrices, you will only get one block of the matrix, for a single subject ©CSCAR, 2010: Proc Mixed

77 Ex 1: Using Subject in the Random Statement (Cont)
Without the subject option, the V matrix will still be block diagonal, but SAS won’t know that in advance and will use less efficient methods of computation If you have a large sample size, using the subject option may be crucial for efficiency. ©CSCAR, 2010: Proc Mixed

78 Ex 1: Summary Random statement allows us to fit a model with correlated observations for the same banana Including the subject option is more efficient We get estimates of the between-banana variation, the within-banana variation and the intraclass correlation SAS will print estimates of the marginal variance-covariance matrices and the marginal correlation matrices for the model We estimate the fixed effects of treatment after adjusting for the random effects of banana ©CSCAR, 2010: Proc Mixed

79 Model-Building Strategies
Top-Down: Start with a well-defined mean structure Select a structure for the random effects Select a covariance structure for residuals Reduce the fixed effects in the model, removing non-significant effects Step-Up: Often used when fitting HLM models Starts with “unconditional” or means-only model Add fixed effects and random effects to the different levels of the model ©CSCAR, 2010: Proc Mixed

80 Estimation in LMMs Use either ML (Maximum Likelihood) or REML (Residual or Restricted Max Likelihood) to estimate covariance parameters in V Then use Generalized Least Squares (GLS) to estimate  ©CSCAR, 2010: Proc Mixed

81 REML Estimation REML is a way of estimating covariance parameters
Produces unbiased estimates of covariance parameters Takes into account loss of df resulting from estimating fixed effects Used when carrying out hypothesis tests about covariance parameters Less important to use REML when sample size is large ©CSCAR, 2010: Proc Mixed

82 ML Estimation Maximize a profile log likelihood function lML(θ)
Nonlinear optimization Constraints applied to the covariance parameters in D and R Iterates to a solution ML estimation used for hypothesis tests (LRT) about fixed effects parameters, β ©CSCAR, 2010: Proc Mixed

83 Hypothesis tests We may want to test hypotheses about the fixed effects in β Test whether a particular fixed effect is zero, Compare the fixed effects of two (or more) treatments or groups Get an overall test of the fixed effects for a categorical predictor We may want to test hypotheses about the covariance parameters in θ Is the variance of a particular random effect zero? Should we include covariances between random effects, or specify them to be zero? ©CSCAR, 2010: Proc Mixed

84 F-tests Often approximate in LMMs, except for balanced designs or partially balanced designs We do not base F-tests on sums of squares as in traditional ANOVA models Denominator df for F-tests may be estimated in a number of ways H0: Lβ = 0 HA: Lβ ≠ 0 ©CSCAR, 2010: Proc Mixed

85 Denominator df in F-tests
df method can be specified as an option in Model statement (ddfm= ) Default method for model with random statement is contain: ddfm=contain (SAS uses syntax rules to figure out correct error term for fixed effects) Satterthwaite has good small sample properties: ddfm=sat Kenward-Roger tries to correct for the fact that we do not know (and so must estimate) variance of random effects: ddfm=kr Between-within divides df into between and within components: ddfm=bw (default for repeated measures) With a large number of subjects, the method used doesn’t make much difference ©CSCAR, 2010: Proc Mixed
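A sketch of specifying the df method, reusing the banana example's syntax (ddfm=kr shown; the other values substitute the same way):

```sas
proc mixed data = fruit;
  class treat banana;
  /* Kenward-Roger adjustment for estimated covariance parameters */
  model shelf = treat / solution ddfm = kr;
  random banana;
run;
```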

86 t-tests t-tests are also usually approximate in a LMM
Degrees of freedom (df) for the t-test may also be approximated as for F-test H0: β = 0 HA: β ≠ 0 ©CSCAR, 2010: Proc Mixed

87 Likelihood Ratio Tests (LRT)
Likelihood Ratio Tests compare the likelihood for a nested (reduced) model to that for a reference (full) model. One or more parameters in the nested model are constrained (i.e., certain parameters may be set to zero) The df for the test are derived by subtracting the number of parameters in the nested model from the number in the reference model ©CSCAR, 2010: Proc Mixed

88 Likelihood Ratio Tests for Fixed Effects
Use Maximum Likelihood (ML) as the estimation method Fit the reference model Fit the nested model Subtract -2 log likelihood of reference model from that of nested model Calculate the p-value of the test, using appropriate df This can be done using SAS code, illustrated in Lab example 2 ©CSCAR, 2010: Proc Mixed
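A minimal sketch of the subtraction and p-value step in a data step (the -2 log likelihood values and df here are hypothetical placeholders; in practice they come from the two ML fits):

```sas
data lrt_fixed;
  m2ll_nested    = 160.4;  /* hypothetical -2 log likelihood, nested model    */
  m2ll_reference = 154.2;  /* hypothetical -2 log likelihood, reference model */
  lrt = m2ll_nested - m2ll_reference;
  df  = 2;                 /* number of fixed-effects parameters constrained  */
  p   = 1 - probchi(lrt, df);  /* chi-square tail probability */
run;
proc print data = lrt_fixed; run;
```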

89 Likelihood Ratio Tests for Covariance Parameters
Use Restricted Maximum Likelihood (REML) estimation to get unbiased estimates of covariance parameters Fit the reference model Fit the nested model Calculate the p-value using the appropriate χ2 distribution or mixture of χ2 distributions, and appropriate df This can be done using SAS code, illustrated later ©CSCAR, 2010: Proc Mixed
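A sketch for a test of a single variance component (e.g., a random intercept variance), where the null value sits on the boundary of the parameter space, so a 50:50 mixture of chi-square(0) and chi-square(1) is commonly used (the likelihood difference here is a hypothetical placeholder):

```sas
data lrt_cov;
  lrt = 6.8;  /* hypothetical difference in -2 REML log likelihoods */
  /* chi-square(0) is a point mass at zero, so it contributes nothing
     to the tail probability when lrt > 0 */
  p = 0.5 * (1 - probchi(lrt, 1));
run;
proc print data = lrt_cov; run;
```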

90 Residual Diagnostics Assess model residuals for Normality
Constant variance Outliers Use histograms, normal q-q plots, residual vs. predicted plots, and other diagnostic plots Two basic kinds of residuals Conditional residuals / Studentized conditional residuals (conditional on random effects)-better for model diagnostics Unconditional residuals (not conditional on random effects)-not as good for model diagnostics We will use conditional / studentized conditional residuals for this workshop ©CSCAR, 2010: Proc Mixed
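A sketch of requesting residual diagnostic panels for the banana model (ODS Graphics must be enabled; the residual option on the model statement requests the panels):

```sas
ods graphics on;
proc mixed data = fruit;
  class treat banana;
  /* residual requests panels of residual diagnostics:
     histograms, q-q plots, and residual-vs-predicted plots */
  model shelf = treat / solution residual;
  random banana;
run;
ods graphics off;
```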

91 Conditional Residuals
Difference between the observed value and the conditional predicted value (conditional on random effects) May not be as well suited for verifying model assumptions and detecting outliers as studentized conditional residuals Variances may be different for different subgroups ©CSCAR, 2010: Proc Mixed

92 Studentized Conditional Residuals
Scaled residuals, where each conditional residual is divided by its estimated standard deviation Because residuals are scaled, different residuals for subgroups with unequal variances will have similar scales Two types of studentization: Internal studentization the observation itself is included in calculation of its standard deviation (studentized residuals) External studentization the observation itself is excluded when calculating its standard deviation (studentized deleted residuals) Well suited for verifying model assumptions and detecting outliers ©CSCAR, 2010: Proc Mixed

93 Influence Diagnostics
Can identify observations that are influential in estimation of β (fixed effects) or θ (covariance parameters) Examine effect of omission of each observation (or cluster) on analysis of the entire data set Proc Mixed includes many ways to study influence diagnostics for LMMs Active area of research ©CSCAR, 2010: Proc Mixed
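A sketch of the influence option on the model statement, deleting one banana (cluster) at a time (effect= and iter= are standard influence suboptions; the iteration count here is an arbitrary choice):

```sas
ods graphics on;
proc mixed data = fruit;
  class treat banana;
  /* effect=banana omits one banana at a time; iter=5 re-estimates the
     covariance parameters for each deletion (iterative influence analysis) */
  model shelf = treat / solution influence(effect = banana iter = 5);
  random banana;
run;
ods graphics off;
```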

94 Hierarchical Data Structure
A nice way to visualize data sets appropriate for an LMM analysis is to think about them in a hierarchical sense. This way of thinking about the data is largely due to the HLM program of Bryk and Raudenbush. Each level of the data represents a different degree of summarization. ©CSCAR, 2010: Proc Mixed

95 Clustered Data Hierarchy
Dependent variable measured once for each unit of analysis
Units of analysis are nested within clusters of units
Observations for units in the same cluster may be correlated
Students in classrooms
Rat pups in litters
Patients in clinics
Students in classrooms and classrooms within schools ©CSCAR, 2010: Proc Mixed

96 Two-level Clustered Data Structure
(Rat Pup Data)
[Diagram: Level 2 (Litters): Litter 1, Litter 2; Level 1 (Rat pups): rat pups nested within each litter]
Dependent variable is measured once for each rat pup ©CSCAR, 2010: Proc Mixed

97 Clustered Data Setup
Data in long format
One row for each unit within a cluster
Unit-specific information:
Response variable
Unit-specific covariates to be included in the model
Cluster-specific information:
Cluster ID
Cluster-specific covariates to be included in the model (these values are repeated for each row for a cluster)
All rows with complete data will be used in fitting the model ©CSCAR, 2010: Proc Mixed
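With the data in this long format, a two-level LMM like the one in the Rat Pup lab takes one random statement; a minimal sketch, where the variable names (weight, treatment, litter) are placeholders for the actual rat pup variables:

```sas
proc mixed data=ratpup covtest;
   class treatment litter;
   /* Fixed effects specify the mean structure */
   model weight = treatment / solution;
   /* A random intercept per litter induces correlation
      among pups from the same litter */
   random intercept / subject=litter;
run;
```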

98 Two-Level Clustered Data Rat Pup Data
Lab Example 2 Two-Level Clustered Data Rat Pup Data

99 Rat pup data Setup for SAS
©CSCAR, 2010: Proc Mixed

100 Ex 2: Summary We can use various tests (F-tests, t-tests, and likelihood ratio tests) for the LMM that we fit. The appropriate test depends on the hypothesis that we are testing Model diagnostics in Proc Mixed are very extensive and provide helpful information about influential cases/clusters ©CSCAR, 2010: Proc Mixed

101 Three-Level Clustered Data Structure
[Diagram: Level 3 (Clusters of Clusters): School 1; Level 2 (Clusters of Units): Classroom 1, Classroom 2; Level 1 (Units of Analysis): students nested within each classroom]
Dependent variable is measured once for each student ©CSCAR, 2010: Proc Mixed

102 Levels of Data: Level 1 The most detailed level of the data
Response is always measured at Level 1
For a clustered data set: Level 1 represents the units of analysis (e.g., students in a classroom)
Unit-specific covariates (e.g., SES, minority status, sex of each child) are measured at Level 1 ©CSCAR, 2010: Proc Mixed

103 Levels of Data: Level 2 The next level of the hierarchy in the data
For a Clustered data set: Level 2 represents clusters of units (e.g., classrooms) Includes cluster-specific covariates (e.g., classroom size, teacher experience) ©CSCAR, 2010: Proc Mixed

104 Levels of Data: Level 3
The third level of the hierarchy in the data, if it exists
For a clustered data set: Level 3 represents clusters of Level 2 units / clusters of clusters (e.g., schools, which are clusters of classrooms)
Level 3-specific covariates (e.g., neighborhood characteristics of the school, such as household poverty in the school neighborhood)
Other examples of three-level clustered data? ©CSCAR, 2010: Proc Mixed
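A three-level model like the one in the Classroom lab can be specified with two random statements, one per grouping level; a hedged sketch, where the data set and variable names (mathscore, schoolid, classid) are placeholders:

```sas
proc mixed data=classroom covtest;
   class schoolid classid;
   model mathscore = ses / solution;
   /* Random intercept for each school (Level 3) */
   random intercept / subject=schoolid;
   /* Random intercept for each classroom nested within school (Level 2) */
   random intercept / subject=classid(schoolid);
run;
```

The classid(schoolid) syntax tells Proc Mixed that classroom IDs are nested within schools.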

105 Predicting Random Effects: BLUPs
Even though classrooms and schools are a random sample from some larger population, we may still want to estimate the classroom- or school-specific means
For example, to identify classrooms with poor math scores that might be candidates for an intervention
Or to identify, post hoc, attributes of poorly performing schools
Classroom is an element of the Z vector
We can interpret ui as the "effect" of classroom i (random classroom effect, or random intercept per classroom) ©CSCAR, 2010: Proc Mixed

106 Predicting Random Effects: BLUPs (Cont)
If V is known, the estimates of u are Best Linear Unbiased Predictors (BLUPs)
When V is unknown, the estimates of u are referred to as Empirical Best Linear Unbiased Predictors (EBLUPs)
Add the solution option to the random statement to produce EBLUPs for the u's (predictions for each ui, i.e., the effect of each classroom): random int / subject=classid solution; or random classid / solution; ©CSCAR, 2010: Proc Mixed

107 Predicting Random Effects: BLUPs (Cont)
The ui are the conditional expectations of the random effects, given the observed response values, yi
We predict, rather than estimate, the values of the EBLUPs
Recall that the assumed distribution of the random effects is normal, and we can check that assumption ©CSCAR, 2010: Proc Mixed
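The EBLUPs produced by the solution option can be captured in a data set via ODS so the normality assumption can be checked; a sketch, with placeholder data set and variable names:

```sas
/* Capture the EBLUPs (the "Solution for Random Effects" table) */
ods output solutionr=eblups;

proc mixed data=classroom;
   class classid;
   model mathscore = ses / solution;
   random intercept / subject=classid solution;
run;

/* Check the normality assumption for the predicted random effects */
proc univariate data=eblups normal;
   var estimate;
   histogram estimate / normal;
run;
```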

108 Three-Level Clustered Data Classroom Data
Lab Example 3 Three-Level Clustered Data Classroom Data

109 Ex 3: Summary
We can fit models for clustered data with three or more levels
We can check the distribution of the EBLUPs (predicted random effects) to look for outliers (schools or classrooms that are doing particularly well or poorly) ©CSCAR, 2010: Proc Mixed

110 Repeated Measures Data
Dependent variable measured multiple times for each unit of analysis
Repeated measures factor may be time or another observational or experimental factor
May be more than one repeated measures factor
Examples:
Regions/treatments within rat brain
Insulin levels measured at various time points within patient after injection of drug
Observations made for the same subject may be correlated ©CSCAR, 2010: Proc Mixed

111 Repeated Measures Data Structure
[Diagram: Level 2 (Units of Analysis): Rat 1, Rat 2, ...; Level 1 (Repeated Measures): brain regions (Region 1, Region 2, Region 3) within each rat]
Dependent variable is measured more than once for each rat ©CSCAR, 2010: Proc Mixed

112 Data Setup: Repeated Measures / Longitudinal Data
Data are in long format
One row for each time point for each subject
Each row contains:
Time-varying information:
Dependent variable
Time-varying covariates to be included in the model
Time-invariant information:
Unit / subject ID
Time-invariant covariates to be included in the model (values repeated for each row for a subject)
All rows with complete data will be used in fitting the model
Number of rows per subject can vary ©CSCAR, 2010: Proc Mixed
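With the data in long format, the residual covariance structure for repeated measures is specified on the repeated statement; a hedged sketch for data like the rat brain example, with placeholder data set and variable names:

```sas
proc mixed data=ratbrain;
   class animal region treatment;
   model activate = region treatment region*treatment / solution;
   /* Model the R matrix directly: one residual covariance structure
      (here compound symmetry) per animal; r and rcorr print the
      estimated covariance and correlation matrices */
   repeated region / subject=animal type=cs r rcorr;
run;
```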

113 Recall: Form of the Ri Matrix
Variance-covariance matrix of the residuals Many different possible structures for R ©CSCAR, 2010: Proc Mixed

114 Commonly Used Structures for R
Unstructured: type=UN
Variance Components: type=VC
Compound Symmetry: type=CS
Banded: type=UN(2) ©CSCAR, 2010: Proc Mixed

115 More structures for R
First-order Autoregressive: type=AR(1)
Toeplitz: type=TOEP
Toeplitz (2): type=TOEP(2)
Heterogeneous Compound Symmetry: type=CSH
Heterogeneous Toeplitz: type=TOEPH
Heterogeneous First-order Autoregressive: type=ARH(1) ©CSCAR, 2010: Proc Mixed

116 Model Fit: Akaike Information Criterion
SAS calculates the AIC based on the (ML or REML) log likelihood
The penalty is 2p, where in SAS, p represents the total number of parameters being estimated for both fixed and random effects
Can be used to compare any two models fit for the same observations; they need not be nested
Smaller is better
Often used to help choose an appropriate structure for the R matrix ©CSCAR, 2010: Proc Mixed

117 Model Fit: Bayes Information Criterion
BIC applies a greater penalty for models with more parameters than does AIC
The penalty is the number of parameters, p, times ln(n), where n is the total number of observations in the data set
Can be used to compare two models for the same observations; they need not be nested models
Smaller is better
Often used, in conjunction with AIC, to help choose an appropriate structure for the R matrix ©CSCAR, 2010: Proc Mixed
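In symbols, writing the two slides' verbal descriptions with the maximized (ML or REML) log likelihood as l, the number of parameters as p, and the number of observations as n:

```latex
\mathrm{AIC} = -2\,\ell + 2p, \qquad
\mathrm{BIC} = -2\,\ell + p\,\ln(n)
```

For both criteria, the model with the smaller value is preferred.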

118 Marginal Model vs. LMM
LMM uses random effects explicitly to model between-subject variance
Subject-specific model
Includes D matrix and R matrix
Implied marginal model (discussed earlier): the marginal model that results from fitting a LMM
The marginal variance-covariance matrix is called V
V is derived from D and R
Marginal model does not use random effects in its specification
Population-averaged model
Uses only the R matrix, no random effects, so no D ©CSCAR, 2010: Proc Mixed

119 A Marginal Model With no random effects
We do not include random effects in this model; therefore, D is zero
Covariances, and hence correlations, among residuals are specified directly through the Ri matrix
Vi (the marginal variance-covariance matrix for Yi) = Ri ©CSCAR, 2010: Proc Mixed
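In Proc Mixed, a marginal model is specified with a repeated statement and no random statement; a hedged sketch, with placeholder data set and variable names:

```sas
proc mixed data=ratbrain;
   class animal region treatment;
   model activate = region treatment region*treatment / solution;
   /* No random statement: D is absent, so Vi = Ri.
      Correlation among an animal's observations is modeled
      directly through the R matrix (unstructured here). */
   repeated region / subject=animal type=un r rcorr;
run;
```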

120 LSMEANS
Lsmeans (least squares means) give estimates of the mean of Y for each level of the fixed predictors (that are in the class statement), after adjusting for all other fixed covariates in the model
Assumes all groups based on categorical predictors are balanced
Assumes continuous covariates are fixed at their means
Post-hoc tests can be carried out on lsmeans, using different adjustments for multiple comparisons, e.g., Bonferroni, Tukey, Dunnett, Scheffe, etc.
Slices can be used to get simple effects for interactions ©CSCAR, 2010: Proc Mixed
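The lsmeans statement goes inside the Proc Mixed step; a sketch of the adjustments and slices described above, with placeholder data set and variable names:

```sas
proc mixed data=ratbrain;
   class animal region treatment;
   model activate = region treatment region*treatment;
   repeated region / subject=animal type=un;
   /* Pairwise comparisons of treatment lsmeans, Tukey-adjusted */
   lsmeans treatment / diff adjust=tukey;
   /* Simple effects: compare treatments separately within each region */
   lsmeans region*treatment / slice=region;
run;
```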

121 Repeated Measures Design The Rat Brain Data
Lab Example 4 Repeated Measures Design The Rat Brain Data

122 Ex 4: Summary There are many possible ways to fit a model for repeated measures data A LMM with a single random intercept is equivalent to a marginal model with a compound symmetric variance-covariance structure, but only if the between-subject variance is > 0. Post-hoc comparisons can be easily carried out using lsmeans statements Many different post-hoc methods are available in SAS ©CSCAR, 2010: Proc Mixed

123 Types of Data: Longitudinal Data
Dependent variable measured multiple times for each unit of analysis
Repeated measures factor is time
May be over an extended period of time (e.g., years)
Example: autistic children measured at different ages
Observations made on the same child may be correlated ©CSCAR, 2010: Proc Mixed

124 Longitudinal Data Structure
[Diagram: Level 2 (Subjects: Units of Analysis): Child ID 1, Child ID 2; Level 1 (Repeated Measures): measurements at different ages (2, 3, 5, and 9 years) nested within each child]
Dependent variable is measured more than once for each child
Number of measurements does not need to be equal for all subjects
Spacing of intervals is not required to be equal for all measurement times
Measurement times do not have to be the same for all subjects ©CSCAR, 2010: Proc Mixed
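The Autism lab fits a random coefficients model to longitudinal data like this; a hedged sketch, where the data set and variable names (autism, vsae, age, childid) are placeholders:

```sas
proc mixed data=autism covtest;
   class childid;
   model vsae = age / solution;
   /* Random intercept and random slope for age per child,
      with an unstructured 2x2 D matrix (printed by the g option) */
   random intercept age / subject=childid type=un g;
run;
```

Because the data are in long format, children with different numbers of visits and different measurement ages are handled automatically.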

125 Missing Data Assume data are Missing at Random (MAR)
Probability of having missing data on a given variable may depend on other observed information
It does not depend on the data that would have been observed but are missing
Include in the model other covariates that are predictive of missingness ©CSCAR, 2010: Proc Mixed

126 Random Coefficients Models for Longitudinal Data The Autism Data
Lab Example 5 Random Coefficients Models for Longitudinal Data The Autism Data

127 Ex 5: Summary Random coefficient models can be used to model both the trajectory of change and the variance of the trajectories Variance of random intercepts may not be estimable in all situations If a problem comes up, it is best to investigate it thoroughly ©CSCAR, 2010: Proc Mixed

128 References I Pinheiro, J. C. and Bates, D. M., Mixed-Effects Models in S and S-PLUS, Springer-Verlag, Berlin, 2000. Laird, N.M. and Ware, J.H., Random-effects models for longitudinal data. Biometrics, 38, 963-974, 1982. Oti, R., Anderson, D., and Lord, C. (submitted) Social trajectories among individuals with autism spectrum disorders, Journal of Developmental Psychopathology. ©CSCAR, 2010: Proc Mixed

129 References II West, Brady T., Welch, Kathleen B., Galecki, Andrzej T., Linear Mixed Models: A Practical Guide Using Statistical Software, Chapman & Hall/CRC, 2006. Verbeke, G. and Molenberghs, G., Linear Mixed Models for Longitudinal Data, Springer, 2000. Littell, R.C., Milliken, G.A., Stroup, W.W., Wolfinger, R.D., Schabenberger, O., SAS for Mixed Models, 2nd Edition, Cary, NC: SAS Institute Inc., 2006. ©CSCAR, 2010: Proc Mixed

130 References III Little, R.J.A., and Rubin, D.B., Statistical Analysis with Missing Data: 2nd Edition, Wiley, 2002. ©CSCAR, 2010: Proc Mixed

